Database2026-W205 min read

Bounding PostgreSQL Ordered-Set Aggregates Before They Eat Your Database

percentile_cont looks elegant — no intermediate materialization in the plan. It's still a full sort under the planner. Two guards (TTL clamp + row cap) keep the aggregate bounded regardless of customer size.

The problem

percentile_cont(0.5) WITHIN GROUP (ORDER BY value) is one of PostgreSQL's more elegant constructs — pass it a column, get the median back, no intermediate materialization in the query plan. The problem is that "no intermediate materialization" doesn't mean "no scan." The aggregate still has to read and sort every row in the input set to find the 50th percentile. In a tight CTE that feeds it a filtered subquery, this is fine as long as the filter actually removes most of the rows. When it doesn't — say, when the time range is unbounded or the org has years of usage data — the aggregate quietly turns into a full-table sort under the planner's feet.

I found this in a violation-detection service. The function used percentile_cont to find the median prompt token count per model, then flagged requests where a user's input was >3× that median. The math was correct. The query plan was not: no TTL on the lookback window, no row cap on the CTE that fed the aggregate. For a small dataset it ran in milliseconds. For a large one it would scan and sort every event row for that org since the beginning of time.

The approach

Two guards, not one.

TTL guard: Clamp the start date. If the caller requests a window starting three years ago, override it to 90 days back. The 90th-day data is still statistically valid for detecting prompt inflation — a model's median input length doesn't drift that slowly. The key insight is that the aggregate doesn't need the full history to be accurate; it needs a representative sample.

PROMPT_BLOAT_MAX_LOOKBACK_DAYS = 90
lookback_floor = end_date - timedelta(days=PROMPT_BLOAT_MAX_LOOKBACK_DAYS)
effective_start = max(start_date, lookback_floor)

Row cap: Even with a 90-day window, a high-volume org can have millions of rows. Cap the CTE at 50,000 rows, ordered by ts DESC. This keeps the aggregate's input bounded regardless of org size, and the recency bias is intentional — recent behavior is more predictive of current norms than behavior from 89 days ago.

WITH scoped AS (
    SELECT input_tokens, model
    FROM events
    WHERE org_id = :org_id
      AND model = :model
      AND ts BETWEEN :start_date AND :end_date
    ORDER BY ts DESC
    LIMIT :scoped_limit
)
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY input_tokens) AS median_input
FROM scoped

The two guards compose: the TTL shrinks the time window, the row cap bounds the worst case within that window. Either alone is insufficient — a high-volume org can blow past 50K rows in 90 days, and a low-volume org with a very old start date might have 90 days of data but concentrated in an old cohort.

What I learned

The planner doesn't warn you when an aggregate is scanning more than you expect. EXPLAIN ANALYZE will show you the sort node and its cost, but only after you know to look. The symptom in production is usually "this query was fast for months then suddenly slow" — which is a new customer, not a new bug. Adding explicit caps at the query-construction layer (not as a Postgres hint, but as a hard parameter) keeps the performance envelope predictable regardless of customer size. The 90-day TTL and 50K cap aren't arbitrary — they're the smallest values that preserve statistical validity for the use case. Change the use case, change the constants, but always have both.

Start a build