Series: Typo-Tolerant Autocomplete
In the previous post, we looked at the constraints that make autocomplete difficult in real Lucene-based systems: large indices, high reindexing costs, tight latency budgets, and partial user input.
Given those constraints, the most important design decision is where correction logic should live.
This post describes a query-time architecture for typo-tolerant autocomplete that works around existing indices rather than modifying them.
Why Query-Time Instead of Index-Time
Index-time solutions optimize query execution by pushing complexity into preprocessing.
That trade-off only works when reindexing is cheap.
When reindexing is expensive or impossible, the balance shifts:
- Index-time complexity becomes operational risk
- Query-time complexity becomes acceptable, provided it is tightly bounded
Autocomplete is a good candidate for this trade-off because:
- Queries are short
- Corrections can be constrained
- Results are approximate by nature
High-Level Architecture
At a high level, the system introduces a lightweight preprocessing step before queries reach the search engine.
Conceptually:
- User types a partial query
- The query is analyzed and tokenized
- Each token is validated or corrected
- A rewritten query is sent to Lucene
- Lucene executes a standard prefix query
The key idea is that Lucene remains unchanged.
All correction logic lives outside the index.
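The steps above can be sketched end to end. This is a minimal illustration under assumptions, not the actual implementation: the vocabulary, the `rewrite` function, and the use of `difflib` for fuzzy matching are all stand-ins chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical external vocabulary; in practice this would be curated
# separately from the index (see the next section).
VOCABULARY = ["search", "engine", "ranking", "query", "prefix"]

def rewrite(partial_query: str) -> str:
    """Rewrite a partial user query into a standard Lucene prefix query."""
    tokens = partial_query.lower().split()  # analyze / tokenize
    if not tokens:
        return ""
    corrected = []
    for tok in tokens[:-1]:  # completed tokens: validate or correct
        match = get_close_matches(tok, VOCABULARY, n=1, cutoff=0.8)
        corrected.append(match[0] if match else tok)
    corrected.append(tokens[-1])  # last token stays a prefix (see below)
    # Trailing * is standard Lucene syntax for a prefix clause.
    return " ".join(corrected[:-1] + [corrected[-1] + "*"])
```

Lucene receives only the rewritten string, so nothing downstream has to change.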
Separating Vocabulary from Indexed Terms
A critical architectural choice is to decouple correction vocabulary from indexed data.
Rather than enumerating terms from the inverted index:
- An external vocabulary is maintained
- This vocabulary represents possible user input, not indexed terms
This has several advantages:
- Vocabulary can be curated independently
- Domain-specific terms can be added without reindexing
- Correction logic is not tied to index internals
Lucene continues to do what it does best: efficient term lookup and document retrieval.
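To make the decoupling concrete, here is one possible shape for such a vocabulary, sketched under assumptions (the class name and methods are illustrative, not from any real library):

```python
class CorrectionVocabulary:
    """Curated correction vocabulary maintained outside the Lucene index."""

    def __init__(self, terms=()):
        # Normalize once so lookups are case-insensitive.
        self._terms = {t.lower() for t in terms}

    def add(self, term: str) -> None:
        # Domain-specific terms can be added at runtime; no reindexing involved.
        self._terms.add(term.lower())

    def contains(self, term: str) -> bool:
        return term.lower() in self._terms
```

Because this structure is independent of the index, curating it is an ordinary data-maintenance task rather than a search-infrastructure change.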
Prefix-Aware Correction as a Design Principle
Autocomplete input is incomplete by definition.
The architecture treats partial tokens differently from completed ones:
- Completed tokens are matched conservatively
- The last token is treated as a prefix candidate
- Correction is applied with prefix-awareness
This avoids two common failure modes:
- Correcting words the user hasn't finished typing
- Generating suggestions that cannot be matched as prefixes
The system does not attempt to "guess the final word"; it only tries to make the current prefix usable.
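One way to express this prefix-awareness is sketched below. It is a simplified model, assuming a plain word list and Levenshtein distance; the function names and the `max_distance` threshold are illustrative:

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def correct_prefix(prefix, vocabulary, max_distance=1):
    """Return prefix candidates for the token still being typed."""
    # If the prefix already extends to a vocabulary word, leave it alone:
    # the user may simply be mid-word, and "correcting" would be harmful.
    if any(word.startswith(prefix) for word in vocabulary):
        return [prefix]
    # Otherwise, compare against vocabulary-word prefixes of the same
    # length, so every candidate is guaranteed to be matchable as a prefix.
    n = len(prefix)
    candidates = {word[:n] for word in vocabulary
                  if len(word) >= n
                  and edit_distance(prefix, word[:n]) <= max_distance}
    return sorted(candidates)
```

Comparing against vocabulary-word *prefixes*, rather than whole words, is what rules out both failure modes listed above.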
Candidate Generation, Not Prediction
It's important to distinguish this approach from suggestion systems.
The goal is not to predict what the user intends to type.
The goal is to generate valid prefix candidates that Lucene can match.
This keeps the system:
- deterministic
- explainable
- easy to reason about
Every generated candidate must be:
- a plausible correction
- a valid prefix
- compatible with standard prefix queries
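The last two requirements can be enforced with a simple filter. This is a hedged sketch (the function name and word-list vocabulary are assumptions for the example):

```python
def valid_candidates(candidates, vocabulary):
    """Keep only candidates that some vocabulary word actually extends,
    so each one is guaranteed to match as a standard prefix query."""
    return [c for c in candidates
            if any(word.startswith(c) for word in vocabulary)]
```

Any candidate that survives this filter is, by construction, a valid prefix of at least one known term.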
Controlled Query Expansion
Once correction candidates are generated, the query is rewritten using a bounded OR expansion.
For example:
- One token produces a small set of corrected prefixes
- These are combined into a single prefix clause
- Other tokens remain unchanged or minimally corrected
This ensures:
- predictable query size
- bounded execution cost
- stable latency characteristics
Unbounded expansion is explicitly avoided.
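A bounded expansion might look like the following sketch. The `MAX_CANDIDATES` cap and function name are assumptions; the point is that the bound is a hard constant, not a function of the input:

```python
MAX_CANDIDATES = 3  # hard cap on expansion; an assumed tuning knob

def expand_last_token(completed, prefix_candidates):
    """Combine a bounded set of corrected prefixes into one OR clause,
    leaving completed tokens untouched. Output is standard Lucene syntax."""
    bounded = prefix_candidates[:MAX_CANDIDATES]  # never unbounded
    clause = " OR ".join(p + "*" for p in bounded)
    if len(bounded) > 1:
        clause = "(" + clause + ")"
    return " ".join(completed + [clause])
```

Because the number of prefix clauses can never exceed the cap, query size, and therefore execution cost, stays predictable regardless of how noisy the input is.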
Integration with Existing Search Pipelines
Because the output is a standard Lucene query:
- No analyzer changes are required
- No custom index structures are needed
- Existing ranking logic remains intact
From Lucene's perspective, nothing special is happening; it is simply executing a prefix query.
This makes the approach:
- easy to integrate
- easy to roll back
- compatible with existing monitoring and tuning practices
What This Architecture Optimizes For
This design optimizes for:
- operational safety
- backward compatibility
- predictable performance
It deliberately does not optimize for:
- maximum recall
- linguistic completeness
- language-specific intelligence
Those trade-offs are intentional and discussed later in this series.
Architecture as a Constraint-Driven Choice
This approach is not universally applicable.
It exists because:
- reindexing is expensive
- autocomplete must remain fast
- existing infrastructure must remain untouched
Once those constraints are accepted, query-time correction becomes a natural architectural choice.
In the next post, we'll zoom into one of the core building blocks used to make this feasible:
a prefix-oriented vocabulary structure that supports fast candidate generation.
Next: Using Tries for Prefix-Aware Typo Correction