In the previous post, we looked at the constraints that make autocomplete difficult in real Lucene-based systems: large indices, high reindexing costs, tight latency budgets, and partial user input.

Given those constraints, the most important design decision is where correction logic should live.

This post describes a query-time architecture for typo-tolerant autocomplete that works around existing indices rather than modifying them.


Why Query-Time Instead of Index-Time

Index-time solutions optimize query execution by pushing complexity into preprocessing.
That trade-off only works when reindexing is cheap.

When reindexing is expensive or impossible, the balance shifts:

  • Index-time complexity becomes operational risk
  • Query-time complexity becomes acceptable—if it is tightly bounded

Autocomplete is a good candidate for this trade-off because:

  • Queries are short
  • Corrections can be constrained
  • Results are approximate by nature

High-Level Architecture

At a high level, the system introduces a lightweight preprocessing step before queries reach the search engine.

Conceptually:

  1. User types a partial query
  2. The query is analyzed and tokenized
  3. Each token is validated or corrected
  4. A rewritten query is sent to Lucene
  5. Lucene executes a standard prefix query

The key idea is that Lucene remains unchanged.
All correction logic lives outside the index.
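The five steps above can be sketched in a few lines. This is a minimal illustration with a tiny in-memory vocabulary; all names here (`VOCABULARY`, `rewrite_query`, and so on) are illustrative assumptions, not Lucene APIs. Lucene only ever sees the final rewritten query string.

```python
# Hypothetical sketch of the query-time pipeline described above.
VOCABULARY = {"machine", "learning", "lucene", "query", "autocomplete"}

def tokenize(raw_query):
    # Step 2: analyze and tokenize (here: lowercase + whitespace split).
    return raw_query.lower().split()

def within_one_edit(a, b):
    # True if b is reachable from a by one substitution or one deletion.
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) - len(shorter) != 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))

def correct_token(token):
    # Step 3: keep known tokens; otherwise take the first vocabulary
    # term within one edit; otherwise leave the token unchanged.
    if token in VOCABULARY:
        return token
    for term in sorted(VOCABULARY):
        if within_one_edit(token, term):
            return term
    return token

def rewrite_query(raw_query):
    # Steps 1-4: validate/correct each token and emit a rewritten query.
    # The trailing "*" marks the last token as a prefix, so step 5 is a
    # standard prefix query from Lucene's point of view.
    tokens = [correct_token(t) for t in tokenize(raw_query)]
    tokens[-1] += "*"
    return " ".join(tokens)
```

For example, `rewrite_query("machne learn")` produces `machine learn*`: the completed token is corrected, the partial one is passed through as a prefix.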


Separating Vocabulary from Indexed Terms

A critical architectural choice is to decouple correction vocabulary from indexed data.

Rather than enumerating terms from the inverted index:

  • An external vocabulary is maintained
  • This vocabulary represents possible user input, not indexed terms

This has several advantages:

  • Vocabulary can be curated independently
  • Domain-specific terms can be added without reindexing
  • Correction logic is not tied to index internals

Lucene continues to do what it does best: efficient term lookup and document retrieval.
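A vocabulary decoupled from the index can be as simple as a curated set with its own lifecycle. The class and the seed terms below are illustrative assumptions; the point is that updating it never involves Lucene or reindexing.

```python
# Sketch of an external correction vocabulary kept apart from the index.
class CorrectionVocabulary:
    """Curated set of terms representing expected *user input*,
    not indexed terms. Editable at any time without reindexing."""

    def __init__(self, terms=()):
        self._terms = {t.lower() for t in terms}

    def add_domain_terms(self, terms):
        # Domain-specific additions never touch Lucene or the index.
        self._terms.update(t.lower() for t in terms)

    def contains(self, term):
        return term.lower() in self._terms

# Seed with general terms, then layer in domain vocabulary separately.
vocab = CorrectionVocabulary({"search", "index", "query"})
vocab.add_domain_terms(["kubernetes", "observability"])
```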


Prefix-Aware Correction as a Design Principle

Autocomplete input is incomplete by definition.

The architecture treats partial tokens differently from completed ones:

  • Completed tokens are matched conservatively
  • The last token is treated as a prefix candidate
  • Correction is applied with prefix-awareness

This avoids two common failure modes:

  • Correcting words the user hasn’t finished typing
  • Generating suggestions that cannot be matched as prefixes

The system does not attempt to “guess the final word”—it only tries to make the current prefix usable.
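The split between completed tokens and the trailing prefix can be sketched as follows. The vocabulary and helper names are illustrative assumptions; what matters is that the last token is never corrected as if it were a finished word, only checked for prefix matchability.

```python
# Sketch of prefix-aware handling of the trailing token.
VOCAB = {"machine", "learning", "models"}

def is_matchable_prefix(prefix):
    # A suggestion is only useful if some vocabulary term starts with it.
    return any(term.startswith(prefix) for term in VOCAB)

def analyze_input(raw_query):
    tokens = raw_query.lower().split()
    completed, last = tokens[:-1], tokens[-1]
    # Completed tokens are matched conservatively; the last token is
    # treated purely as a prefix candidate, not as a whole word.
    return {
        "completed": completed,
        "prefix": last,
        "prefix_is_matchable": is_matchable_prefix(last),
    }
```

Here `analyze_input("machine lear")` reports a matchable prefix (`lear` begins `learning`), while an unmatchable prefix like `zzz` is flagged instead of being "corrected" into a word the user never typed.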


Candidate Generation, Not Prediction

It’s important to distinguish this approach from suggestion systems.

The goal is not to predict what the user intends to type.
The goal is to generate valid prefix candidates that Lucene can match.

This keeps the system:

  • deterministic
  • explainable
  • easy to reason about

Every generated candidate must be:

  • a plausible correction
  • a valid prefix
  • compatible with standard prefix queries
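One way to satisfy all three constraints at once is to cut candidate prefixes directly from vocabulary terms, so every candidate is a valid prefix by construction. The function name, the one-substitution threshold, and the cap are illustrative assumptions.

```python
# Sketch of bounded candidate generation under the three constraints.
def candidate_prefixes(user_prefix, vocab, max_candidates=3):
    candidates = []
    for term in sorted(vocab):
        prefix = term[:len(user_prefix)]
        if len(prefix) < len(user_prefix):
            continue  # term is shorter than the typed prefix
        substitutions = sum(a != b for a, b in zip(user_prefix, prefix))
        if substitutions <= 1 and prefix not in candidates:
            candidates.append(prefix)  # plausible AND a valid prefix
        if len(candidates) == max_candidates:
            break  # generation is bounded, never exhaustive
    return candidates
```

For example, `candidate_prefixes("pyth", {"python", "pytorch", "pandas"})` yields `["pyth", "pyto"]`; both are real prefixes of vocabulary terms and both are compatible with a standard prefix query.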

Controlled Query Expansion

Once correction candidates are generated, the query is rewritten using a bounded OR expansion.

For example:

  • One token produces a small set of corrected prefixes
  • These are combined into a single prefix clause
  • Other tokens remain unchanged or minimally corrected

This ensures:

  • predictable query size
  • bounded execution cost
  • stable latency characteristics

Unbounded expansion is explicitly avoided.
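The rewrite step itself can be sketched as below: corrected prefixes for the last token become one parenthesized clause with a hard cap, while other tokens pass through untouched. The cap value is an assumption; the output uses standard Lucene query-string syntax.

```python
# Sketch of bounded OR expansion into a prefix clause.
MAX_EXPANSIONS = 3  # hard cap keeps query size and latency predictable

def expand_query(completed_tokens, prefix_candidates):
    bounded = prefix_candidates[:MAX_EXPANSIONS]  # never unbounded
    clause = " OR ".join(p + "*" for p in bounded)
    if len(bounded) > 1:
        clause = "(" + clause + ")"
    return " ".join(list(completed_tokens) + [clause])
```

So `expand_query(["machine"], ["lear", "leae"])` yields `machine (lear* OR leae*)`, and a list of ten candidates would still produce at most three prefix terms.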


Integration with Existing Search Pipelines

Because the output is a standard Lucene query:

  • No analyzer changes are required
  • No custom index structures are needed
  • Existing ranking logic remains intact

From Lucene’s perspective, nothing special is happening—it is simply executing a prefix query.

This makes the approach:

  • easy to integrate
  • easy to roll back
  • compatible with existing monitoring and tuning practices
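Because correction is a pure string-to-string step, the integration point is a thin wrapper in front of whatever search call already exists. Both `search_backend` and `rewrite` below are hypothetical placeholders standing in for the existing pipeline and the rewriter.

```python
# Sketch of the integration point: wrap, don't modify.
def make_typo_tolerant(search_backend, rewrite):
    """Wrap an existing search function with query-time correction."""
    def search(raw_query):
        # Lucene-side code, ranking, and monitoring are untouched;
        # only the query string changes before it is sent.
        return search_backend(rewrite(raw_query))
    return search
```

Rolling back means calling `search_backend` directly again, which is what makes the approach easy to adopt incrementally.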

What This Architecture Optimizes For

This design optimizes for:

  • operational safety
  • backward compatibility
  • predictable performance

It deliberately does not optimize for:

  • maximum recall
  • linguistic completeness
  • language-specific intelligence

Those trade-offs are intentional and discussed later in this series.


Architecture as a Constraint-Driven Choice

This approach is not universally applicable.

It exists because:

  • reindexing is expensive
  • autocomplete must remain fast
  • existing infrastructure must remain untouched

Once those constraints are accepted, query-time correction becomes a natural architectural choice.

In the next post, we’ll zoom into one of the core building blocks used to make this feasible:
a prefix-oriented vocabulary structure that supports fast candidate generation.


Next: Using Tries for Prefix-Aware Typo Correction