Why I Built This: When “Good Search” Still Feels Wrong
Most enterprise search stacks do a decent job at matching words. OpenSearch (and similar engines) commonly rely on lexical ranking families like TF‑IDF/BM25. These are strong baselines, but they tend to miss something important: user-specific intent.
Three failure modes show up repeatedly in real systems:
- Ambiguity: the same term can mean different things to different users (or even the same user on different days).
- Intent drift: users’ interests change over time; yesterday’s context matters.
- One-size-fits-all ranking: a uniform ranking is applied across all users even though patterns differ by role, team, or experience level.
A typical reaction is: “Let’s use embeddings” (dense retrieval, sentence transformers, vector databases, LTR models). That can work—but it also brings cost, latency, new infra, and sometimes reduced explainability.
This case study explores a different approach:
A lightweight, explainable personalization reranker that learns from a user’s own query history using hash-based word correlation vectors—no pretraining, no embeddings, and no external vector store.
Design Goals & Constraints
This architecture was driven by very practical constraints:
- Low memory per user: must scale to many users without storing full query logs.
- Low operational complexity: avoid adding heavy ML services or vector DBs.
- Explainability: keep a clear “why did this rank higher?” story.
- Compatibility: should work as a reranking layer on top of standard OpenSearch results.
- Incremental learning: adapt in real time as users search, not through offline training.
High-Level Architecture
At a high level, the system wraps around OpenSearch like this:
- User submits a query
- The system updates a per-user word map (a compact model of word-to-word correlations)
- OpenSearch returns top‑N results using its standard relevance score
- We compute a similarity score between:
  - the user's current query context, and
  - each candidate result title (or snippet)
- We blend the similarity with the OpenSearch score and re-rank.
Visually:
- Learning path (write): Query → tokenize → hash → update sparse vectors
- Serving path (read): Query + Top‑N results → vectorize → cosine similarity → blend scores → re-rank
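As a sketch, the whole thing can sit behind a thin interface in front of your existing search client. The names below (PersonalizedReranker, SearchHit) are placeholders I use throughout this post, not part of any OpenSearch API:

```java
import java.util.List;

/** Minimal stand-in for whatever your OpenSearch client returns per hit. */
record SearchHit(String title, double openSearchScore) {}

/** Hypothetical seam around the existing search flow; names are illustrative. */
interface PersonalizedReranker {
    /** Learning path (write): tokenize, hash, and update the user's sparse word vectors. */
    void learn(String userId, String query);

    /** Serving path (read): blend OpenSearch scores with user-context similarity and re-sort. */
    List<SearchHit> rerank(String userId, String query, List<SearchHit> topN);
}
```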
Data Model: The Per-User Word Store
The core state is a per-user map:
```java
Map<String, float[]> userWordMap
```

- Key: word (or a hashed representation of the word)
- Value: a fixed-size float vector representing correlations to other words.
Each vector is intentionally:
- Fixed-size (bounded memory)
- Sparse (most entries stay 0)
- Incrementally updated on every query.
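Here is a minimal sketch of that store in Java. The two-level map (per user, then per word), the class name, and the fixed VECTOR_SIZE of 100 are illustrative choices for this post, not hard requirements:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class UserWordStore {
    static final int VECTOR_SIZE = 100;   // fixed, bounded dimension per word vector

    // userId -> (word -> correlation vector); most vector entries stay 0.0f
    private final Map<String, Map<String, float[]>> byUser = new ConcurrentHashMap<>();

    /** New word: start from a zero vector. Existing word: return it for in-place updates. */
    float[] vectorFor(String userId, String word) {
        return byUser
                .computeIfAbsent(userId, u -> new ConcurrentHashMap<>())
                .computeIfAbsent(word, w -> new float[VECTOR_SIZE]);
    }
}
```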
Vector Interpretation
Think of each word as having a “memory” of other words it has appeared near in the user’s searches.
- Vector index = hash slot of a co-occurring word
- Vector value = strength of correlation (based on proximity within queries)
This is not an embedding in the neural sense. It’s closer to a user-personalized, hashed co-occurrence graph stored as sparse arrays.
Step 1 — Tokenization and Hashing
When a query arrives, we tokenize it into words, then hash each token into a bounded range:
```java
int hash(String word) {
    // MurmurHash3 in the original; floorMod keeps the slot in [0, VECTOR_SIZE)
    return Math.floorMod(word.hashCode(), VECTOR_SIZE);
}
```
The disclosure's example uses a vector size of 100 and notes that the specific hash values it shows are illustrative.
Why hashing?
- Keeps memory bounded regardless of vocabulary growth
- Avoids maintaining a global dictionary
- Enables fast $O(1)$ updates into a fixed vector slot
- Trade-off: Hashing can cause collisions. In practice, you manage this by tuning vector size and observing impact.
Step 2 — Learning: Updating Word Correlation Vectors
Here’s the learning rule:
For each query, for each pair of words, update the vector of the source word at the target word’s hash slot.
The update weight is inverse distance within the query:
```java
float weight = 1.0f / distance;   // closer words get a larger update
vector[targetHash] += weight;
```
So, closer words get stronger correlation updates.
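Putting the rule together with the store and hash function above, one learning pass might look like this. Whitespace tokenization and the class name are assumptions of the sketch:

```java
import java.util.List;

class CorrelationLearner {
    private final UserWordStore store = new UserWordStore();

    /** Learning path: update this user's word vectors from a single query. */
    void learn(String userId, String query) {
        List<String> tokens = List.of(query.toLowerCase().trim().split("\\s+"));
        for (int i = 0; i < tokens.size(); i++) {
            // New word -> zero vector is created; existing word -> updated in place
            float[] source = store.vectorFor(userId, tokens.get(i));
            for (int j = 0; j < tokens.size(); j++) {
                if (i == j) continue;
                int distance = Math.abs(i - j);
                source[hash(tokens.get(j))] += 1.0f / distance;   // inverse-distance weight
            }
        }
    }

    // Same bounded hash as in Step 1
    private int hash(String word) {
        return Math.floorMod(word.hashCode(), UserWordStore.VECTOR_SIZE);
    }
}
```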
Concrete Example (Intuition)
Query: “create oci object storage bucket”
- If “create” is 1 position away from “oci”, distance = 1 implies weight = 1.0
- If “create” is 4 positions away from “bucket”, distance = 4 implies weight = 0.25
This builds a correlation profile that captures the structure of the query, not just the bag-of-words.
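Continuing the sketch above, here is the trace of updates applied to the vector for "create" in that query (positions 0 through 4):

```java
// Query: "create oci object storage bucket"   (positions 0..4)
float[] v = store.vectorFor(userId, "create");
v[hash("oci")]     += 1.0f;       // distance 1
v[hash("object")]  += 0.5f;       // distance 2
v[hash("storage")] += 1.0f / 3;   // distance 3
v[hash("bucket")]  += 0.25f;      // distance 4
```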
Why Inverse Distance?
Inverse distance is simple and explainable:
- It encodes that adjacency matters (“object storage” is a meaningful phrase).
- It is stable and fast.
- It avoids needing any ML model training.
Step 3 — Storage Behavior Over Time (Incremental Learning)
Each time a user searches:
- New word -> create a new zero vector and start learning.
- Existing word -> update its vector “in-place”.
Over time, the userWordMap becomes a compact summary of the user’s search behavior: which words tend to appear together (and how tightly). This is what enables personalization without storing full query history.
Serving: Re-Ranking Search Results Using User Context
Learning builds the user context. Serving uses it.
Serving Step 1 — Get Candidate Results
OpenSearch returns the top‑N results with its standard relevance score (BM25, or whatever scoring your stack is configured to use). We do not replace OpenSearch. We treat it as the retrieval stage and add personalization after.
Serving Step 2 — Vectorize the Current Query and Result Titles
We tokenize:
- The user’s current query
- Each result’s title (or another short text field)
Then for each word token, we look up its vector from the userWordMap. At this point, we have vector sets representing:
- Current query context
- Each candidate result’s title context
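The simplest way to collapse each set of word vectors into a single context vector is to sum them. That pooling choice is an assumption of this sketch (averaging would work too); the helper below reads from the per-user map built during learning:

```java
import java.util.Map;

final class ContextVectors {
    /** Sum-pool the per-word vectors of one text (query or title) into a single context vector. */
    static float[] of(Map<String, float[]> userWordMap, String text, int vectorSize) {
        float[] context = new float[vectorSize];
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            float[] v = userWordMap.get(token);
            if (v == null) continue;                  // word this user has never searched
            for (int i = 0; i < vectorSize; i++) {
                context[i] += v[i];
            }
        }
        return context;
    }
}
```

Since cosine similarity ignores magnitude, summing and averaging give the same direction, so the choice mostly affects readability rather than ranking.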
Serving Step 3 — Cosine Similarity
We compute similarity between the query context and candidate result context using cosine similarity.
Cosine similarity is a natural fit because:
- Vectors are sparse and high-dimensional.
- Direction (co-occurrence pattern) matters more than raw magnitude.
- It’s easy to implement and optimize.
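A straightforward implementation over the fixed-size float arrays, returning 0 when either context is empty (for example, a cold-start user):

```java
final class Cosine {
    /** Cosine similarity of two equal-length vectors; returns 0 if either is all zeros. */
    static double similarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0;     // cold start or all-unseen words
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```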
Serving Step 4 — Blend with OpenSearch Score
Final ranking score is a linear blend:
finalScore = α * openSearchScore + β * similarityScore
Example values:
- α = 0.6
- β = 0.4
Then we sort by finalScore and return the re-ranked results. This blending is important:
- OpenSearch score preserves global relevance (query/document match).
- Similarity score injects user-specific context.
- α and β let you tune how strong the personalization should be.
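Tying the serving steps together, a sketch of the blend-and-sort using the SearchHit record and the helpers above (α and β are hard-coded here only for illustration):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class BlendedReranker {
    static final double ALPHA = 0.6;   // weight on the global OpenSearch relevance score
    static final double BETA  = 0.4;   // weight on the user-specific similarity score

    record Scored(SearchHit hit, double finalScore) {}

    /** finalScore = ALPHA * openSearchScore + BETA * cosine(queryContext, titleContext) */
    static List<SearchHit> rerank(Map<String, float[]> userWordMap, String query,
                                  List<SearchHit> topN, int vectorSize) {
        float[] queryCtx = ContextVectors.of(userWordMap, query, vectorSize);
        return topN.stream()
                .map(hit -> {
                    float[] titleCtx = ContextVectors.of(userWordMap, hit.title(), vectorSize);
                    double sim = Cosine.similarity(queryCtx, titleCtx);
                    return new Scored(hit, ALPHA * hit.openSearchScore() + BETA * sim);
                })
                .sorted(Comparator.comparingDouble(Scored::finalScore).reversed())
                .map(Scored::hit)
                .toList();
    }
}
```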
Walkthrough Example: Why “oci object storage” Rises to the Top
The disclosure provides a scenario like:
- Query history:
  - “put data to oci object storage”
  - “get data from oci object storage bucket”
- Later query: “storage”
OpenSearch might return generic storage results:
- “File storage encryption”
- “persistent volume storage”
- “oci object storage”
After reranking, “oci object storage” rises because the user’s history strongly correlates oci -> object -> storage.
The “Aha” Moment: Even when the query is short and ambiguous (“storage”), the system can infer that this user often means “OCI object storage”.
Memory & Performance Characteristics
This design is intentionally lightweight.
Memory Optimization
Key properties from the implementation notes:
- Only distinct words stored per user.
- No redundant storage of duplicate queries.
- Vector size is bounded and fixed.
- Updates are sparse (most entries are zeros).
This is why the approach can be attractive where embeddings + external vector stores are too heavy.
Runtime Overhead
There is real overhead: a cosine similarity computation per candidate result. The disclosure notes this is mitigated by limiting reranking to the top‑N results.
In practice, you can also:
- Rerank only the top N results (e.g., 20–50).
- Cache vectorized titles.
- Prune low-activity users.
- Periodically decay old correlations.
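For that last item, one illustrative decay scheme (not something the disclosure prescribes) is to scale every stored correlation by a factor below 1 on a schedule, so stale interests fade unless they are reinforced:

```java
import java.util.Map;

final class CorrelationDecay {
    /** Multiply every correlation by a decay factor (e.g., nightly with factor = 0.95f). */
    static void apply(Map<String, float[]> userWordMap, float factor) {
        for (float[] vector : userWordMap.values()) {
            for (int i = 0; i < vector.length; i++) {
                vector[i] *= factor;
            }
        }
    }
}
```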
Strengths: Why This Works in Real Systems
- No training pipeline: The model updates in real time from queries—no offline training loop, no model registry, no feature store.
- No external vector store: Everything is stored as small per-user sparse vectors.
- Explainable personalization: If a result rises, you can often explain it: “This user has historically searched oci with storage frequently.”
- Works with existing OpenSearch: It’s a reranker, not a replacement. That reduces adoption friction.
Limitations & Practical Considerations
No architecture is free. The disclosure highlights several realistic constraints:
- Cold start: If the user has no history, the similarity score will be near zero. In that case, rely mostly on the OpenSearch score (α close to 1) and gradually increase β as history accumulates (a simple weighting scheme is sketched at the end of this section).
- Query quality dependency: If user queries are inconsistent or noisy, learned correlations may be noisy.
- Runtime cost: Cosine similarity for each candidate adds computation. Limiting to top‑N helps.
- Hash collisions: hashing keeps memory bounded, but different words can land in the same slot.
Mitigations:
- Increase vector size (within memory budget).
- Use multiple hashes (like a Bloom-style approach).
- Monitor impact with offline evaluation.
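On the cold-start point above, one illustrative way to phase personalization in is to ramp β with how much history a user has accumulated. The linear ramp and the 50-word threshold below are assumptions, not part of the disclosure:

```java
final class BlendWeights {
    /** Returns {alpha, beta}: beta ramps up as the user's learned vocabulary grows. */
    static double[] forUser(int distinctWordsLearned) {
        final double maxBeta = 0.4;   // target personalization weight once history is rich
        final int rampWords  = 50;    // illustrative threshold for "enough history"
        double beta  = maxBeta * Math.min(1.0, distinctWordsLearned / (double) rampWords);
        double alpha = 1.0 - beta;    // with no history: alpha = 1.0, beta = 0.0
        return new double[] {alpha, beta};
    }
}
```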