# FTS Parsers

## What a parser does
Given this text:
Email me at test@example.com on 2026-01-01.
- The parser:
  - Walks through the text character by character
  - Splits it into tokens
  - Assigns each token a type
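- Example tokens, with the type names the default parser actually reports:
  - `Email` → asciiword
  - `me` → asciiword
  - `test@example.com` → email
  - `2026-01-01` → there is no date type; the parser splits it into numeric tokens (uint, int)
  - `.` → blank (spaces and punctuation are both reported as blank)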
- The parser does not:
  - Stem words
  - Remove stop-words
  - Normalize text

That is the dictionaries' job, in a later step.
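`ts_parse` runs only the parser, with no dictionaries involved, so it is a direct way to check the labels for the example sentence above. A minimal sketch: it joins each numeric token id to `ts_token_type('default')` to get a readable name, and uses `WITH ORDINALITY` to keep the tokens in their original order.

```sql
-- Run only the default parser (no dictionaries) over the example sentence.
SELECT p.pos, t.alias, p.token
FROM ts_parse('default', 'Email me at test@example.com on 2026-01-01.')
       WITH ORDINALITY AS p(tokid, token, pos)
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid  -- numeric id → type name
ORDER BY p.pos;
```

The spaces between words come back as tokens too, labeled `blank`.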
## Parser vs dictionary
| Component | Responsibility |
|---|---|
| Parser | Splits text and labels each token with a type |
| Dictionary | Decides how each token is processed (normalized, stemmed, or discarded) |
| Configuration | Picks the parser and connects token types → dictionaries |
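The "configuration" row can be seen directly in the catalogs: `pg_ts_config_map` stores one row per (configuration, token type, dictionary). A sketch that lists the mappings of the built-in `english` configuration, assuming it uses the `default` parser (the built-in configurations do):

```sql
-- Which dictionaries does the 'english' configuration use for each token type?
SELECT t.alias                  AS token_type,
       m.mapseqno               AS try_order,   -- dictionaries are consulted in this order
       m.mapdict::regdictionary AS dictionary
FROM pg_ts_config_map AS m
JOIN pg_ts_config     AS c ON c.oid = m.mapcfg
JOIN ts_token_type('default') AS t ON t.tokid = m.maptokentype
WHERE c.cfgname = 'english'
ORDER BY t.alias, m.mapseqno;
```

In psql, `\dF+ english` prints the same mapping in a more compact form.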
## Built-in parser: default

PostgreSQL ships with a single built-in parser, named `default` (stored in the `pg_catalog` schema).
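- It understands:
  - Words (ASCII and non-ASCII)
  - Numbers (integers, decimals, scientific notation, version numbers)
  - URLs and host names
  - Emails
  - Hyphenated words
  - Punctuation and whitespace (both labeled blank)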
- In practice:
  - Nearly every database uses the default parser unchanged
## Token types produced by the parser
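- Some common token types:
  - asciiword (word made only of ASCII letters; the most common type)
  - word (word containing non-ASCII letters)
  - numword (word mixing letters and digits)
  - email
  - url
  - host
  - sfloat (number in scientific notation)
  - version (version number such as 8.3.0)
  - hword (hyphenated word; asciihword when all ASCII)
  - blank (whitespace and punctuation)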
You can list them all, with a short description of each:

```sql
SELECT tokid, alias, description
FROM ts_token_type('default');
```
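To see several of the types listed above on real input, `ts_debug` reports the type of every token the parser produces. A sketch with an arbitrary sample string chosen to hit several types; the `blank` rows (spaces) are filtered out for readability:

```sql
-- Label each token of a mixed sample string with its parser token type.
SELECT alias, description, token
FROM ts_debug('english', 'postgresql.org v8 1.5e3 8.3.2 state-of-the-art')
WHERE alias <> 'blank';
```

Hyphenated words are reported twice over: once as the whole word and once as each of its parts (for example `hword_asciipart`).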
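## How parsers are used (indirectly)

In normal use you never call a parser directly; it runs behind the text search functions. (`ts_parse` and `ts_debug` do expose its output, mainly for debugging.)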
When you run:

```sql
SELECT to_tsvector('english', 'Some sample text');
```
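- Internally:
  - The parser (the one chosen by the `english` configuration) splits the text into tokens
  - The configuration maps each token type to a list of dictionaries
  - The dictionaries normalize or discard each token
  - The surviving lexemes, with their positions, form a tsvector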
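`ts_debug` exposes each of the steps above for a single call. In this sketch, `alias` is the parser's token type, `dictionaries` is what the configuration maps that type to, `dictionary` is the one that actually handled the token, and `lexemes` is what ends up in the tsvector (an empty array means the token was discarded as a stop word; NULL means no dictionary recognized it).

```sql
-- Trace parser → configuration → dictionary for each token.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('english', 'Some sample text');

-- The final result of the same pipeline.
SELECT to_tsvector('english', 'Some sample text');
```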
## Why you almost never change parsers
- Parsers are:
  - Low-level
  - Hard to customize
  - Written in C (they cannot be written in SQL)
- Most customization happens at the:
  - Dictionary level
  - Configuration level
- If you want different behavior, change the dictionaries or the token-type mappings, not the parser.
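Changing the token mappings is plain SQL. A minimal sketch, assuming you want a variant of `english` that ignores e-mail addresses and keeps whole hyphenated ASCII words unstemmed; `my_english` is a hypothetical name, and working on a copy leaves the built-in configuration untouched. The parser is the same throughout.

```sql
-- Work on a copy so the built-in 'english' configuration is not modified.
CREATE TEXT SEARCH CONFIGURATION my_english (COPY = english);  -- hypothetical name

-- A token type with no mapping is silently dropped from the tsvector.
ALTER TEXT SEARCH CONFIGURATION my_english
    DROP MAPPING FOR email;

-- Send whole hyphenated ASCII words to 'simple' (lowercase only, no stemming).
ALTER TEXT SEARCH CONFIGURATION my_english
    ALTER MAPPING FOR asciihword WITH simple;

SELECT to_tsvector('my_english', 'Mail test@example.com about state-of-the-art work');
```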
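## pgAdmin: what you’re seeing

- Under Schemas → (a schema) → FTS Parsers, pgAdmin shows:
  - The parsers defined in that schema (the built-in default parser lives in pg_catalog, so under public this node is usually empty unless you created a parser there)
  - Each parser's support functions (start, get-token, end, lextypes, headline), which is its internal metadata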
This mirrors the system catalog:

```sql
SELECT * FROM pg_ts_parser;
```
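A more readable version of that catalog query joins `pg_namespace`, so you can confirm which schema each parser lives in (for `default`, `pg_catalog`). The `prs*` columns are the C support functions that together make up a parser:

```sql
-- List every text search parser with its schema and C support functions.
SELECT n.nspname     AS schema,
       p.prsname     AS parser,
       p.prsstart    AS start_fn,     -- initializes parsing
       p.prstoken    AS gettoken_fn,  -- returns the next token and its type id
       p.prsend      AS end_fn,       -- cleans up after parsing
       p.prslextype  AS lextypes_fn,  -- describes the token types (used by ts_token_type)
       p.prsheadline AS headline_fn   -- used by ts_headline
FROM pg_ts_parser AS p
JOIN pg_namespace AS n ON n.oid = p.prsnamespace
ORDER BY 1, 2;
```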
## When would someone create a custom parser? (rare)

- Only if:
  - The text is not natural language
  - You have custom syntax (logs, DSLs, code)
  - Tokens must be split in a non-standard way
- Examples:
  - Source code search
  - Structured log formats
  - Custom markup languages
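A parser is registered with `CREATE TEXT SEARCH PARSER`, which only wires together C functions (it cannot express tokenization rules in SQL) and requires superuser. A real custom parser would point at functions from your own compiled extension. This runnable sketch instead reuses the built-in default parser's functions (`prsd_start`, `prsd_nexttoken`, `prsd_end`, `prsd_lextype`, `prsd_headline`), so `my_parser` and `my_config` (hypothetical names) behave exactly like the default parser; it only illustrates the wiring.

```sql
-- Register a parser from its C support functions (superuser only).
-- These are the default parser's own functions; a real custom parser
-- would reference functions from your compiled C extension instead.
CREATE TEXT SEARCH PARSER my_parser (          -- hypothetical name
    START    = prsd_start,
    GETTOKEN = prsd_nexttoken,
    END      = prsd_end,
    LEXTYPES = prsd_lextype,
    HEADLINE = prsd_headline                   -- optional
);

-- A parser is only ever used through a configuration that names it.
CREATE TEXT SEARCH CONFIGURATION my_config (PARSER = my_parser);  -- hypothetical name
ALTER TEXT SEARCH CONFIGURATION my_config
    ADD MAPPING FOR asciiword WITH simple;

SELECT to_tsvector('my_config', 'Hello world');
```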
Summary: FTS parsers split text into labeled pieces; dictionaries decide what those pieces mean.