FTS Parsers#

What a parser does#

Given this text:

Email me at test@example.com on 2026-01-01.
The parser:
  1. Walks through the text character by character

  2. Splits it into tokens

  3. Assigns each token a type

Example tokens:
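  • Email → asciiword

  • me → asciiword

  • test@example.com → email

  • 2026-01-01 → uint (2026) followed by two int tokens (-01, -01); the default parser has no date type

  • . → blank (whitespace and punctuation share this type)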
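The split can be checked against a live server. A minimal sketch using the built-in ts_parse and ts_token_type functions; the column alias pos is just a name chosen here to keep the tokens in input order:

```sql
-- Raw parser output for the sample sentence: one row per token,
-- joined to the parser's token-type table to get a readable alias.
SELECT p.pos, tt.alias, p.token
FROM ts_parse('default', 'Email me at test@example.com on 2026-01-01.')
         WITH ORDINALITY AS p(tokid, token, pos)
JOIN ts_token_type('default') AS tt ON tt.tokid = p.tokid
ORDER BY p.pos;
```

No dictionary is involved at this stage, so the tokens come back exactly as they appear in the input.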

The parser does not:
  • Stem words

  • Remove stop-words

  • Normalize text

That’s the dictionary’s job, later.
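The division of labor becomes visible when single tokens are handed straight to a dictionary with the built-in ts_lexize function. A small sketch, assuming the stock english_stem dictionary that a default installation provides:

```sql
-- Stemming happens in the dictionary, not in the parser.
SELECT ts_lexize('english_stem', 'stars');  -- {star}

-- An empty array means the dictionary treats the token as a stop-word.
SELECT ts_lexize('english_stem', 'a');      -- {}
```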


Parser vs Dictionary#

Component | Responsibility
Parser | Splits text + labels token types
Dictionary | Decides how each token is processed
Configuration | Connects token types → dictionaries
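How a given configuration wires token types to dictionaries can be read from the system catalogs. A sketch, assuming the stock english configuration; the column aliases are chosen here for readability:

```sql
-- For each token type the parser can emit, list the dictionaries
-- the 'english' configuration consults, in lookup order.
SELECT tt.alias    AS token_type,
       m.mapseqno  AS lookup_order,
       d.dictname  AS dictionary
FROM pg_ts_config AS c
JOIN pg_ts_config_map AS m ON m.mapcfg = c.oid
JOIN pg_ts_dict AS d ON d.oid = m.mapdict
JOIN LATERAL ts_token_type(c.cfgparser) AS tt ON tt.tokid = m.maptokentype
WHERE c.cfgname = 'english'
ORDER BY tt.tokid, m.mapseqno;
```

Token types that do not appear in the result have no mapping, so tokens of those types are dropped from the index.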


Built-in parser: default#

PostgreSQL ships with a single built-in parser, called default.

It understands:
  • Words

  • Numbers

  • URLs

  • Emails

  • Hyphenated words

  • Punctuation

  • Whitespace

In practice:

Most databases use the default parser unchanged


Token types produced by the parser#

Some common token types:
  • asciiword (word, all ASCII letters)

  • word (word, all letters)

  • numword

  • email

  • url

  • host

  • sfloat

  • version

  • hword (hyphenated word)

You can see them with:

SELECT alias, tokid
FROM ts_token_type('default');

How parsers are used (indirectly)#

In normal use you never call a parser directly (ts_parse and ts_debug exist mainly for inspection).

When you run:

SELECT to_tsvector('english', 'Some sample text');
Internally:
  1. Parser splits the text into tokens

  2. Configuration maps each token type to its dictionaries

  3. Dictionaries normalize/filter words

  4. A tsvector is produced
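These steps can be watched one token at a time with the built-in ts_debug function. A minimal sketch using the same sample text:

```sql
-- One row per token: the type the parser assigned (alias), the
-- dictionaries the configuration lists for that type, the dictionary
-- that recognized the token, and the lexemes it produced.
SELECT token, alias, dictionaries, dictionary, lexemes
FROM ts_debug('english', 'Some sample text');

-- The end result of the same pipeline.
SELECT to_tsvector('english', 'Some sample text');
```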


Why you almost never change parsers#

Parsers are:
  • Low-level

  • Hard to customize

  • Written in C (not SQL)

Most customization happens at:
  • Dictionary level

  • Configuration level

If you want different behavior:
  • Change dictionaries

  • Change token mappings

Do not change the parser.
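As an illustration of configuration-level changes, the sketch below copies the built-in english configuration and adjusts two mappings. The name my_english is hypothetical, and the particular choices (no stemming for ASCII words, no indexing of emails and URLs) are only examples:

```sql
-- Start from a copy so the built-in configuration stays untouched.
CREATE TEXT SEARCH CONFIGURATION my_english (COPY = english);

-- Send plain ASCII words to the simple dictionary (lowercasing only).
ALTER TEXT SEARCH CONFIGURATION my_english
    ALTER MAPPING FOR asciiword WITH simple;

-- Tokens of unmapped types are discarded, so this stops indexing them.
ALTER TEXT SEARCH CONFIGURATION my_english
    DROP MAPPING FOR email, url;

-- The parser is unchanged; only the handling of its tokens differs.
SELECT to_tsvector('my_english', 'Parsers mail test@example.com');
```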


pgAdmin: what you’re seeing#

In Schemas → public → FTS Parsers, pgAdmin shows:
  • Parsers created in that schema (the built-in default parser lives in pg_catalog, so this node is often empty)

  • Their token types

  • Internal metadata

This mirrors:

SELECT * FROM pg_ts_parser;
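A slightly more readable variant names the columns and resolves the schema. The alias parser_schema is chosen here; the rest are catalog columns:

```sql
-- Each parser is a bundle of C functions registered in the catalog.
SELECT p.prsname,
       p.prsnamespace::regnamespace AS parser_schema,
       p.prsstart,      -- initializes parsing of a document
       p.prstoken,      -- returns the next token
       p.prsend,        -- cleans up
       p.prslextype,    -- lists the token types the parser emits
       p.prsheadline    -- supports ts_headline
FROM pg_ts_parser AS p;
```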

When would someone create a custom parser? (rare)#

Only if:
  • Text is not natural language

  • You have custom syntax (logs, DSLs, code)

  • Tokens must be split in a non-standard way

Examples:
  • Source code search

  • Structured log formats

  • Custom markup languages
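For orientation, registering a parser looks roughly like the sketch below. It needs superuser rights, and the name my_parser is hypothetical. To stay runnable without any C code it simply reuses the functions behind the default parser; a real custom parser would supply its own C functions in their place.

```sql
-- Register a parser from its component functions.
CREATE TEXT SEARCH PARSER my_parser (
    START    = prsd_start,
    GETTOKEN = prsd_nexttoken,
    END      = prsd_end,
    LEXTYPES = prsd_lextype,
    HEADLINE = prsd_headline
);

-- It can now be inspected like the built-in one.
SELECT * FROM ts_parse('my_parser', 'hello world');

-- Clean up the demonstration object.
DROP TEXT SEARCH PARSER my_parser;
```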


Summary:

FTS Parsers split text into labeled pieces; dictionaries decide what those pieces mean.