Aggregates#

What are Aggregates (in PostgreSQL)?#

An aggregate is a function that:: Takes many rows → produces one result
In simple words:: “Summarize data”

Common aggregates you already know

These are built-in aggregates (they live here):

Aggregate	What it does
COUNT()	Number of rows
SUM()	Total value
AVG()	Average
MIN()	Smallest value
MAX()	Largest value
BOOL_AND()	All true?
BOOL_OR()	Any true?

Example:

SELECT COUNT(*) FROM users;

Takes many user rows → returns one number

Why pgAdmin shows “Aggregates” separately

Because in PostgreSQL:

Aggregates are schema objects
They are not “just functions”
They can be custom-defined

So PostgreSQL treats them as first-class database objects

Difference: Function vs Aggregate

Function	Aggregate
Works on 1 row	Works on many rows
LOWER(name)	COUNT(name)
Called per row	Called per group
Simple	Stateful

How aggregates actually work internally#

An aggregate has 3 phases:

State initialization
State transition (per row)
Final result

Example: SUM(amount)

Start at 0
Add each amount
Return final value

Example with GROUP BY

SELECT department, COUNT(*)
FROM employees
GROUP BY department;

department	count
IT	5
HR	2

Each group uses the aggregate separately

Custom Aggregates (advanced but powerful)

You can define your own aggregate:

Example idea:

Median
Custom weighted score
JSON merge
Financial rollups

Very simplified example (conceptual):

CREATE AGGREGATE my_sum(integer) (
    SFUNC = int4pl,
    STYPE = integer
);

Used in advanced analytics systems, not day-one apps.

Security & schema rule

Never create app aggregates in public in production

Why?

Aggregates affect query planning
Can override built-ins
Can cause unexpected behavior

Best practice:

analytics.sum_revenue()

instead of:

public.sum_revenue()

Mental model

Aggregates are “reducers”

They collapse many rows into one meaning.

Anatomy of CREATE AGGREGATE#

Before examples, understand this once:

CREATE AGGREGATE name (input_type) (
    SFUNC     = state_function,
    STYPE     = state_type,
    INITCOND  = initial_value,   -- optional
    FINALFUNC = final_function   -- optional
);

Meaning

Part	Purpose
STYPE	Type of internal state
SFUNC	Runs once per row
INITCOND	Initial state value
FINALFUNC	Converts state → final output

🟢 Example 1: Custom SUM (integer)

Step 1: State function

CREATE OR REPLACE FUNCTION my_int_sum(state integer, value integer)
RETURNS integer
LANGUAGE sql
AS $$
    SELECT COALESCE(state, 0) + COALESCE(value, 0);
$$;

Step 2: Aggregate

CREATE AGGREGATE my_sum(integer) (
    SFUNC = my_int_sum,
    STYPE = integer,
    INITCOND = 0
);

Usage

SELECT my_sum(salary) FROM employees;

🟢 Example 2: STRING CONCAT Aggregate (custom GROUP_CONCAT)

(PostgreSQL already has string_agg, but this teaches the concept.)

State function

CREATE OR REPLACE FUNCTION concat_state(state text, value text)
RETURNS text
LANGUAGE sql
AS $$
    SELECT
        CASE
            WHEN state IS NULL THEN value
            ELSE state || ', ' || value
        END;
$$;

Aggregate

CREATE AGGREGATE my_concat(text) (
    SFUNC = concat_state,
    STYPE = text
);

Usage

SELECT my_concat(name) FROM users;

🟡 Example 3: Average (manual implementation)

State = (sum, count)

CREATE TYPE avg_state AS (
    total numeric,
    count integer
);

State transition

CREATE OR REPLACE FUNCTION avg_sfunc(state avg_state, value numeric)
RETURNS avg_state
LANGUAGE sql
AS $$
    SELECT
        (COALESCE(state.total, 0) + value,
        COALESCE(state.count, 0) + 1);
$$;

Final function

CREATE OR REPLACE FUNCTION avg_final(state avg_state)
RETURNS numeric
LANGUAGE sql
AS $$
    SELECT state.total / state.count;
$$;

Aggregate

CREATE AGGREGATE my_avg(numeric) (
    SFUNC     = avg_sfunc,
    STYPE     = avg_state,
    FINALFUNC = avg_final
);

Usage

SELECT my_avg(price) FROM products;

🔵 Example 4: MEDIAN (classic interview example)

State = array of numbers

CREATE OR REPLACE FUNCTION median_state(state numeric[], value numeric)
RETURNS numeric[]
LANGUAGE sql
AS $$
    SELECT array_append(state, value);
$$;

Final function

CREATE OR REPLACE FUNCTION median_final(state numeric[])
RETURNS numeric
LANGUAGE sql
AS $$
    SELECT percentile_cont(0.5)
    WITHIN GROUP (ORDER BY x)
    FROM unnest(state) x;
$$;

Aggregate

CREATE AGGREGATE median(numeric) (
    SFUNC     = median_state,
    STYPE     = numeric[],
    INITCOND  = '{}',
    FINALFUNC = median_final
);

Usage

SELECT median(salary) FROM employees;

🔴 Example 5: JSON Merge Aggregate (very practical)

State function

CREATE OR REPLACE FUNCTION json_merge(state jsonb, value jsonb)
RETURNS jsonb
LANGUAGE sql
AS $$
    SELECT COALESCE(state, '{}'::jsonb) || value;
$$;

Aggregate

CREATE AGGREGATE json_merge_agg(jsonb) (
    SFUNC = json_merge,
    STYPE = jsonb,
    INITCOND = '{}'
);

Usage

SELECT json_merge_agg(metadata)
FROM events;

Production Rules

❌ Do NOT create aggregates in public in prod

✔ Create a dedicated schema:

CREATE SCHEMA analytics;

CREATE AGGREGATE analytics.median(numeric) (...)

When should you create custom aggregates?

✅ Analytics
✅ Reporting engines
✅ Financial rollups
✅ JSON / event systems
❌ Normal CRUD apps
❌ Beginner projects

Mental Model

Aggregate = Function + Memory + Finalizer

Aggregates

Contents

Aggregates#

What are Aggregates (in PostgreSQL)?#

How aggregates actually work internally#

Anatomy of CREATE AGGREGATE#