FTS Dictionaries#
What is an FTS Dictionary?#
An FTS dictionary is the component that processes each token (word-like piece) after the text is split by the parser. Its job is to decide what to keep, how to normalize it, or whether to drop it.
- In short:
Parser → splits text
Dictionary → understands words
Configuration → connects everything
What a dictionary actually does#
For every token it receives, a dictionary can:
Normalize: - Lowercase - Remove punctuation - Convert variants to a base form
- Reduce:
Convert related forms to one base form (stemming)
Filter: - Drop stop-words (the, is, and, etc.) - Drop tokens it doesn’t recognize
- Accept as-is:
Keep the word unchanged
Common built-in dictionary types#
- 1️⃣ simple (most important to understand)
Keeps words as they are
No stemming
No stop-words
Just lowercases
SELECT to_tsvector('simple', 'Running runs runner');
-- 'running' 'runs' 'runner'
- Use when:
Exact words matter
You don’t want linguistic changes
- 2️⃣ Language stem dictionaries (e.g., english_stem)
Converts words to a root form
Removes common stop-words
SELECT to_tsvector('english', 'running runs runner');
-- 'run'
- Use when:
You want conceptual matching
“run”, “running”, “runs” should match
- 3️⃣ synonym dictionary
Maps words to equivalent terms
Example synonym file:
dbms database
pgsql postgresql
- Result:
Searching for database also matches dbms
- Used for:
Aliases
Abbreviations
Product names
- 4️⃣ thesaurus dictionary
More advanced synonyms
Supports phrases
Can expand terms into multiple tokens
- Used for:
Advanced search UX
Domain-specific vocabulary
- 5️⃣ ispell dictionary
Morphological analysis
External dictionary + affix rules
- Used when:
You need fine-grained linguistic control
You have custom dictionary files
How dictionaries are wired#
Dictionaries never work alone.
They are used by a configuration, mapped per token type.
Example mapping:
word → english_stem
numword → simple
email → simple
url → simple
That mapping lives inside the FTS Configuration.
Seeing dictionaries in SQL#
List dictionaries
SELECT dictname, dicttemplate
FROM pg_ts_dict;
SELECT *
FROM pg_ts_dict
WHERE dictname = 'english_stem';
Using a dictionary indirectly (real life)
You don’t call dictionaries directly.
You use them through:
to_tsvector('english', text)
to_tsquery('english', query)
The configuration decides which dictionary is used.
Typical real-world setups#
- Exact-match search
Dictionary: simple
No stemming
Predictable behavior
- Natural language search
Dictionary: english_stem
Better recall
Slightly less precision
- Mixed content (IDs, emails, text)
simple for structured tokens
english_stem for words
pgAdmin view#
- In Schemas → public → FTS Dictionaries, pgAdmin shows:
Dictionary name
Template (simple, ispell, synonym, etc.)
Options
- It’s just a visual layer over:
pg_ts_dict
pg_ts_template
- Summary:
FTS Dictionaries decide what a “word” means, how it is normalized, and whether it is searchable at all.