aggregates Aggregates#

What are Aggregates (in PostgreSQL)?#

An aggregate is a function that:

Takes many rows → produces one result

In simple words:

“Summarize data”

Common aggregates you already know

These are built-in aggregates (they live here):

Aggregate

What it does

COUNT()

Number of rows

SUM()

Total value

AVG()

Average

MIN()

Smallest value

MAX()

Largest value

BOOL_AND()

All true?

BOOL_OR()

Any true?

Example:

SELECT COUNT(*) FROM users;

Takes many user rows → returns one number


Why pgAdmin shows “Aggregates” separately

Because in PostgreSQL:
  • Aggregates are schema objects

  • They are not “just functions”

  • They can be custom-defined

So PostgreSQL treats them as first-class database objects

Difference: Function vs Aggregate

Function

Aggregate

Works on 1 row

Works on many rows

LOWER(name)

COUNT(name)

Called per row

Called per group

Simple

Stateful


How aggregates actually work internally#

An aggregate has 3 phases:
  1. State initialization

  2. State transition (per row)

  3. Final result

Example: SUM(amount)
  • Start at 0

  • Add each amount

  • Return final value

Example with GROUP BY

SELECT department, COUNT(*)
FROM employees
GROUP BY department;

department

count

IT

5

HR

2

Each group uses the aggregate separately


Custom Aggregates (advanced but powerful)

You can define your own aggregate:

Example idea:
  • Median

  • Custom weighted score

  • JSON merge

  • Financial rollups

Very simplified example (conceptual):

CREATE AGGREGATE my_sum(integer) (
    SFUNC = int4pl,
    STYPE = integer
);

Used in advanced analytics systems, not day-one apps.


Security & schema rule

Never create app aggregates in public in production

Why?
  • Aggregates affect query planning

  • Can override built-ins

  • Can cause unexpected behavior

Best practice:

analytics.sum_revenue()

instead of:

public.sum_revenue()

Mental model

Aggregates are “reducers”

They collapse many rows into one meaning.


Anatomy of CREATE AGGREGATE#

Before examples, understand this once:

CREATE AGGREGATE name (input_type) (
    SFUNC     = state_function,
    STYPE     = state_type,
    INITCOND  = initial_value,   -- optional
    FINALFUNC = final_function   -- optional
);

Meaning

Part

Purpose

STYPE

Type of internal state

SFUNC

Runs once per row

INITCOND

Initial state value

FINALFUNC

Converts state → final output


🟢 Example 1: Custom SUM (integer)

Step 1: State function

CREATE OR REPLACE FUNCTION my_int_sum(state integer, value integer)
RETURNS integer
LANGUAGE sql
AS $$
    SELECT COALESCE(state, 0) + COALESCE(value, 0);
$$;

Step 2: Aggregate

CREATE AGGREGATE my_sum(integer) (
    SFUNC = my_int_sum,
    STYPE = integer,
    INITCOND = 0
);

Usage

SELECT my_sum(salary) FROM employees;

🟢 Example 2: STRING CONCAT Aggregate (custom GROUP_CONCAT)

(PostgreSQL already has string_agg, but this teaches the concept.)

State function

CREATE OR REPLACE FUNCTION concat_state(state text, value text)
RETURNS text
LANGUAGE sql
AS $$
    SELECT
        CASE
            WHEN state IS NULL THEN value
            ELSE state || ', ' || value
        END;
$$;

Aggregate

CREATE AGGREGATE my_concat(text) (
    SFUNC = concat_state,
    STYPE = text
);

Usage

SELECT my_concat(name) FROM users;

🟡 Example 3: Average (manual implementation)

State = (sum, count)

CREATE TYPE avg_state AS (
    total numeric,
    count integer
);

State transition

CREATE OR REPLACE FUNCTION avg_sfunc(state avg_state, value numeric)
RETURNS avg_state
LANGUAGE sql
AS $$
    SELECT
        (COALESCE(state.total, 0) + value,
        COALESCE(state.count, 0) + 1);
$$;

Final function

CREATE OR REPLACE FUNCTION avg_final(state avg_state)
RETURNS numeric
LANGUAGE sql
AS $$
    SELECT state.total / state.count;
$$;

Aggregate

CREATE AGGREGATE my_avg(numeric) (
    SFUNC     = avg_sfunc,
    STYPE     = avg_state,
    FINALFUNC = avg_final
);

Usage

SELECT my_avg(price) FROM products;

🔵 Example 4: MEDIAN (classic interview example)

State = array of numbers

CREATE OR REPLACE FUNCTION median_state(state numeric[], value numeric)
RETURNS numeric[]
LANGUAGE sql
AS $$
    SELECT array_append(state, value);
$$;

Final function

CREATE OR REPLACE FUNCTION median_final(state numeric[])
RETURNS numeric
LANGUAGE sql
AS $$
    SELECT percentile_cont(0.5)
    WITHIN GROUP (ORDER BY x)
    FROM unnest(state) x;
$$;

Aggregate

CREATE AGGREGATE median(numeric) (
    SFUNC     = median_state,
    STYPE     = numeric[],
    INITCOND  = '{}',
    FINALFUNC = median_final
);

Usage

SELECT median(salary) FROM employees;

đź”´ Example 5: JSON Merge Aggregate (very practical)

State function

CREATE OR REPLACE FUNCTION json_merge(state jsonb, value jsonb)
RETURNS jsonb
LANGUAGE sql
AS $$
    SELECT COALESCE(state, '{}'::jsonb) || value;
$$;

Aggregate

CREATE AGGREGATE json_merge_agg(jsonb) (
    SFUNC = json_merge,
    STYPE = jsonb,
    INITCOND = '{}'
);

Usage

SELECT json_merge_agg(metadata)
FROM events;

Production Rules

❌ Do NOT create aggregates in public in prod

âś” Create a dedicated schema:

CREATE SCHEMA analytics;
CREATE AGGREGATE analytics.median(numeric) (...)

When should you create custom aggregates?

  • âś… Analytics

  • âś… Reporting engines

  • âś… Financial rollups

  • âś… JSON / event systems

  • ❌ Normal CRUD apps

  • ❌ Beginner projects


Mental Model

Aggregate = Function + Memory + Finalizer