In the previous post, we looked at the constraints that make autocomplete difficult in real Lucene-based systems: large indices, high reindexing costs, tight latency budgets, and partial user input.

Given those constraints, the most important design decision is where correction logic should live.

This post describes a query-time architecture for typo-tolerant autocomplete that works around existing indices rather than modifying them.


Why Query-Time Instead of Index-Time

Index-time solutions optimize query execution by pushing complexity into preprocessing.
That trade-off only works when reindexing is cheap.

When reindexing is expensive or impossible, the balance shifts:

  • Index-time complexity becomes operational risk
  • Query-time complexity becomes acceptable—if it is tightly bounded

Autocomplete is a good candidate for this trade-off because:

  • Queries are short
  • Corrections can be constrained
  • Results are approximate by nature

High-Level Architecture

At a high level, the system introduces a lightweight preprocessing step before queries reach the search engine.

Conceptually:

  1. User types a partial query
  2. The query is analyzed and tokenized
  3. Each token is validated or corrected
  4. A rewritten query is sent to Lucene
  5. Lucene executes a standard prefix query

The key idea is that Lucene remains unchanged.
All correction logic lives outside the index.
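The five steps above can be sketched in a few lines. This is a minimal illustration with a tiny in-memory vocabulary; all names here (`VOCABULARY`, `rewrite_query`, and so on) are illustrative assumptions, not Lucene APIs. Lucene only ever sees the final rewritten query string.

```python
# Hypothetical sketch of the query-time pipeline described above.
VOCABULARY = {"machine", "learning", "lucene", "query", "autocomplete"}

def tokenize(raw_query):
    # Step 2: analyze and tokenize (here: lowercase + whitespace split).
    return raw_query.lower().split()

def within_one_edit(a, b):
    # True if b is reachable from a by one substitution or one deletion.
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) - len(shorter) != 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))

def correct_token(token):
    # Step 3: keep known tokens; otherwise take the first vocabulary
    # term within one edit; otherwise leave the token unchanged.
    if token in VOCABULARY:
        return token
    for term in sorted(VOCABULARY):
        if within_one_edit(token, term):
            return term
    return token

def rewrite_query(raw_query):
    # Steps 1-4: validate/correct each token and emit a rewritten query.
    # The trailing "*" marks the last token as a prefix, so step 5 is a
    # standard prefix query from Lucene's point of view.
    tokens = [correct_token(t) for t in tokenize(raw_query)]
    tokens[-1] += "*"
    return " ".join(tokens)
```

For example, `rewrite_query("machne learn")` produces `machine learn*`: the completed token is corrected, the partial one is passed through as a prefix.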


Separating Vocabulary from Indexed Terms

A critical architectural choice is to decouple correction vocabulary from indexed data.

Rather than enumerating terms from the inverted index:

  • An external vocabulary is maintained
  • This vocabulary represents possible user input, not indexed terms

This has several advantages:

  • Vocabulary can be curated independently
  • Domain-specific terms can be added without reindexing
  • Correction logic is not tied to index internals

Lucene continues to do what it does best: efficient term lookup and document retrieval.
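A vocabulary decoupled from the index can be as simple as a curated set with its own lifecycle. The class and the seed terms below are illustrative assumptions; the point is that updating it never involves Lucene or reindexing.

```python
# Sketch of an external correction vocabulary kept apart from the index.
class CorrectionVocabulary:
    """Curated set of terms representing expected *user input*,
    not indexed terms. Editable at any time without reindexing."""

    def __init__(self, terms=()):
        self._terms = {t.lower() for t in terms}

    def add_domain_terms(self, terms):
        # Domain-specific additions never touch Lucene or the index.
        self._terms.update(t.lower() for t in terms)

    def contains(self, term):
        return term.lower() in self._terms

# Seed with general terms, then layer in domain vocabulary separately.
vocab = CorrectionVocabulary({"search", "index", "query"})
vocab.add_domain_terms(["kubernetes", "observability"])
```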


Prefix-Aware Correction as a Design Principle

Autocomplete input is incomplete by definition.

The architecture treats partial tokens differently from completed ones:

  • Completed tokens are matched conservatively
  • The last token is treated as a prefix candidate
  • Correction is applied with prefix-awareness

This avoids two common failure modes:

  • Correcting words the user hasn’t finished typing
  • Generating suggestions that cannot be matched as prefixes

The system does not attempt to “guess the final word”—it only tries to make the current prefix usable.
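The split between completed tokens and the trailing prefix can be sketched as follows. The vocabulary and helper names are illustrative assumptions; what matters is that the last token is never corrected as if it were a finished word, only checked for prefix matchability.

```python
# Sketch of prefix-aware handling of the trailing token.
VOCAB = {"machine", "learning", "models"}

def is_matchable_prefix(prefix):
    # A suggestion is only useful if some vocabulary term starts with it.
    return any(term.startswith(prefix) for term in VOCAB)

def analyze_input(raw_query):
    tokens = raw_query.lower().split()
    completed, last = tokens[:-1], tokens[-1]
    # Completed tokens are matched conservatively; the last token is
    # treated purely as a prefix candidate, not as a whole word.
    return {
        "completed": completed,
        "prefix": last,
        "prefix_is_matchable": is_matchable_prefix(last),
    }
```

Here `analyze_input("machine lear")` reports a matchable prefix (`lear` begins `learning`), while an unmatchable prefix like `zzz` is flagged instead of being "corrected" into a word the user never typed.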


Candidate Generation, Not Prediction

It’s important to distinguish this approach from suggestion systems.

The goal is not to predict what the user intends to type.
The goal is to generate valid prefix candidates that Lucene can match.

This keeps the system:

  • deterministic
  • explainable
  • easy to reason about

Every generated candidate must be:

  • a plausible correction
  • a valid prefix
  • compatible with standard prefix queries
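One way to satisfy all three constraints at once is to cut candidate prefixes directly from vocabulary terms, so every candidate is a valid prefix by construction. The function name, the one-substitution threshold, and the cap are illustrative assumptions.

```python
# Sketch of bounded candidate generation under the three constraints.
def candidate_prefixes(user_prefix, vocab, max_candidates=3):
    candidates = []
    for term in sorted(vocab):
        prefix = term[:len(user_prefix)]
        if len(prefix) < len(user_prefix):
            continue  # term is shorter than the typed prefix
        substitutions = sum(a != b for a, b in zip(user_prefix, prefix))
        if substitutions <= 1 and prefix not in candidates:
            candidates.append(prefix)  # plausible AND a valid prefix
        if len(candidates) == max_candidates:
            break  # generation is bounded, never exhaustive
    return candidates
```

For example, `candidate_prefixes("pyth", {"python", "pytorch", "pandas"})` yields `["pyth", "pyto"]`; both are real prefixes of vocabulary terms and both are compatible with a standard prefix query.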

Controlled Query Expansion

Once correction candidates are generated, the query is rewritten using a bounded OR expansion.

For example:

  • One token produces a small set of corrected prefixes
  • These are combined into a single prefix clause
  • Other tokens remain unchanged or minimally corrected

This ensures:

  • predictable query size
  • bounded execution cost
  • stable latency characteristics

Unbounded expansion is explicitly avoided.
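The rewrite step itself can be sketched as below: corrected prefixes for the last token become one parenthesized clause with a hard cap, while other tokens pass through untouched. The cap value is an assumption; the output uses standard Lucene query-string syntax.

```python
# Sketch of bounded OR expansion into a prefix clause.
MAX_EXPANSIONS = 3  # hard cap keeps query size and latency predictable

def expand_query(completed_tokens, prefix_candidates):
    bounded = prefix_candidates[:MAX_EXPANSIONS]  # never unbounded
    clause = " OR ".join(p + "*" for p in bounded)
    if len(bounded) > 1:
        clause = "(" + clause + ")"
    return " ".join(list(completed_tokens) + [clause])
```

So `expand_query(["machine"], ["lear", "leae"])` yields `machine (lear* OR leae*)`, and a list of ten candidates would still produce at most three prefix terms.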


Integration with Existing Search Pipelines

Because the output is a standard Lucene query:

  • No analyzer changes are required
  • No custom index structures are needed
  • Existing ranking logic remains intact

From Lucene’s perspective, nothing special is happening—it is simply executing a prefix query.

This makes the approach:

  • easy to integrate
  • easy to roll back
  • compatible with existing monitoring and tuning practices
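Because correction is a pure string-to-string step, the integration point is a thin wrapper in front of whatever search call already exists. Both `search_backend` and `rewrite` below are hypothetical placeholders standing in for the existing pipeline and the rewriter.

```python
# Sketch of the integration point: wrap, don't modify.
def make_typo_tolerant(search_backend, rewrite):
    """Wrap an existing search function with query-time correction."""
    def search(raw_query):
        # Lucene-side code, ranking, and monitoring are untouched;
        # only the query string changes before it is sent.
        return search_backend(rewrite(raw_query))
    return search
```

Rolling back means calling `search_backend` directly again, which is what makes the approach easy to adopt incrementally.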

What This Architecture Optimizes For

This design optimizes for:

  • operational safety
  • backward compatibility
  • predictable performance

It deliberately does not optimize for:

  • maximum recall
  • linguistic completeness
  • language-specific intelligence

Those trade-offs are intentional and discussed later in this series.


Architecture as a Constraint-Driven Choice

This approach is not universally applicable.

It exists because:

  • reindexing is expensive
  • autocomplete must remain fast
  • existing infrastructure must remain untouched

Once those constraints are accepted, query-time correction becomes a natural architectural choice.

In the next post, we’ll zoom into one of the core building blocks used to make this feasible:
a prefix-oriented vocabulary structure that supports fast candidate generation.


Next: Using Tries for Prefix-Aware Typo Correction