Hybrid Search: Why Combining Vector and Keyword Search Beats Either Alone
Learn how Praevia's hybrid search approach combines semantic understanding with precise keyword matching for superior context retrieval.
When building Praevia's context selection engine, we faced a critical question: Should we use vector search for semantic understanding, or keyword search for precision? The answer: both.
The Search Problem
Effective context retrieval requires answering two questions simultaneously:
- What does this query mean? (Semantic understanding)
- What exact terms matter? (Lexical precision)
Neither pure vector search nor pure keyword search excels at both.
The Limitations of Vector-Only Search
Vector embeddings capture semantic meaning beautifully, but have blind spots.
Example: Technical Queries
Query: "How do I configure the API timeout parameter?"
A pure vector search might return:
- "Setting up API timeouts" (Good)
- "Configuring network settings" (Too broad)
- "API setup guide" (Missing specific parameter)
The model understands "configuration" and "API" semantically, but may miss the exact term "timeout parameter."
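For intuition, vector search ranks chunks by cosine similarity between embeddings. The toy 3-dimensional vectors below are invented for illustration (real embeddings have hundreds of dimensions), but they show why merely related documents can score almost as high as the exact match:

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, near 0.0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up "embeddings" standing in for the example documents above
query = [0.9, 0.4, 0.15]        # "How do I configure the API timeout parameter?"
timeout_doc = [0.9, 0.4, 0.1]   # "Setting up API timeouts"
network_doc = [0.8, 0.5, 0.2]   # "Configuring network settings"
unrelated_doc = [0.1, 0.2, 0.9]

# Both configuration docs land close to the query; similarity alone
# cannot single out the one containing the exact term "timeout parameter".
print(cosine_similarity(query, timeout_doc) > cosine_similarity(query, unrelated_doc))  # True
print(cosine_similarity(query, network_doc) > cosine_similarity(query, unrelated_doc))  # True
```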
The Cold Start Problem
# Vector search example
query = "What is the default timeout?"
query_embedding = model.encode(query)

# Cosine similarity search using pgvector (<=> is its cosine distance operator)
results = db.query("""
    SELECT content,
           1 - (embedding <=> $1::vector) AS similarity
    FROM chunks
    ORDER BY similarity DESC
    LIMIT 10
""", query_embedding)

# Results might include semantically similar but irrelevant docs
# about "default settings" in general
The Limitations of Keyword-Only Search
Keyword search (BM25, TF-IDF) excels at exact matches but struggles with synonyms and paraphrasing.
Example: Natural Language Queries
Query: "How can I reduce my LLM expenses?"
Keyword search might miss documents containing:
- "Lower your AI costs" (synonym: reduce → lower, expenses → costs)
- "Optimize token usage" (related concept, different words)
- "Save money on language models" (paraphrase)
Scoring Limitations
from typing import List

# Simple keyword scoring
def keyword_score(text: str, query_terms: List[str]) -> float:
    score = 0.0
    text_lower = text.lower()
    for term in query_terms:
        # Count occurrences of each query term
        score += text_lower.count(term.lower())
    return score

# Problem: "reduce" won't match "reduction" or "reducing"
# Problem: doesn't understand that "LLM" means "language model"
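Production keyword search mitigates the morphology problem with stemming or lemmatization. The deliberately naive suffix stripper below is an illustration only (real systems use a proper stemmer such as Porter, or lemmatization), but it shows the idea of collapsing surface forms to a shared stem:

```python
def naive_stem(word: str) -> str:
    """Toy suffix stripper, for illustration only; real systems use a
    proper stemmer (e.g., Porter) or lemmatization."""
    for suffix in ("ing", "tion", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# All three surface forms collapse to one stem, so they can now match:
print(naive_stem("reduce"), naive_stem("reducing"), naive_stem("reduction"))
# → reduc reduc reduc
```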
The Hybrid Solution
Praevia combines both approaches with weighted scoring.
Architecture
async def hybrid_search(
    query: str,
    vector_weight: float = 0.6,
    keyword_weight: float = 0.4,
    top_k: int = 50
) -> List[SearchResult]:
    """
    Hybrid search combining vector and keyword approaches.

    Args:
        query: User query string
        vector_weight: Weight for vector similarity (0-1)
        keyword_weight: Weight for keyword matching (0-1)
        top_k: Number of results to return

    Returns:
        List of ranked search results
    """
    # Step 1: Extract keywords and generate embedding
    keywords = extract_keywords(query)
    query_embedding = await generate_embedding(query)

    # Step 2: Run vector search and fetch candidate chunks
    vector_results = await vector_search(query_embedding, limit=100)
    all_chunks = await get_candidate_chunks()

    # Step 3: Score all chunks with both methods
    final_scores = {}
    for chunk in all_chunks:
        # Vector score (0-1)
        vector_score = get_vector_similarity(chunk, vector_results)

        # Keyword score (normalized 0-1)
        keyword_score = calculate_keyword_score(chunk.content, keywords)

        # Weighted combination
        combined_score = (
            vector_weight * vector_score +
            keyword_weight * keyword_score
        )
        final_scores[chunk.id] = combined_score

    # Step 4: Rank and return
    return rank_by_score(final_scores, top_k)
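The weighted combination above assumes both scores sit on a comparable 0-1 scale, yet raw cosine similarities and raw BM25 scores live on very different scales. One common way to reconcile them is min-max normalization over the candidate set; the sketch below (with made-up document names and scores, not Praevia's actual pipeline) shows why that step matters:

```python
from typing import Dict

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores into [0, 1] so the two signals are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(vector_scores: Dict[str, float],
         keyword_scores: Dict[str, float],
         vector_weight: float = 0.6,
         keyword_weight: float = 0.4) -> Dict[str, float]:
    """Normalize each signal, then combine with the configured weights."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    return {
        doc: vector_weight * v.get(doc, 0.0) + keyword_weight * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }

# Hypothetical raw scores: cosine similarities vs. BM25 values
vector_scores = {"doc_a": 0.91, "doc_b": 0.85, "doc_c": 0.40}
keyword_scores = {"doc_a": 2.1, "doc_b": 7.8, "doc_c": 0.5}

fused = fuse(vector_scores, keyword_scores)
print(max(fused, key=fused.get))  # → doc_b (strong on both signals)
```

Note how doc_b wins only after normalization puts its dominant keyword score on the same footing as the vector scores.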
Smart Keyword Extraction
Not all query terms are equally important:
from typing import List

import spacy

nlp = spacy.load("en_core_web_sm")

def extract_keywords(query: str, max_keywords: int = 10) -> List[str]:
    """
    Extract important keywords from a query.
    Prioritizes: named entities, technical terms, rare words.
    """
    doc = nlp(query)
    keywords = []

    # Named entities (highest priority)
    for ent in doc.ents:
        keywords.append(ent.text.lower())

    # Technical terms (all-caps acronyms)
    for token in doc:
        if token.text.isupper() and len(token.text) > 1:
            keywords.append(token.text.lower())

    # Nouns and verbs (excluding stop words)
    for token in doc:
        if token.pos_ in ('NOUN', 'VERB') and not token.is_stop:
            keywords.append(token.lemma_.lower())

    # Remove duplicates (preserving order), return top N
    return list(dict.fromkeys(keywords))[:max_keywords]
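The final line relies on a Python idiom worth calling out: dicts preserve insertion order (Python 3.7+), so `dict.fromkeys` deduplicates while keeping each keyword's first-seen position, and therefore the entity-first priority order built above:

```python
def dedupe_keep_order(items: list) -> list:
    """Dicts preserve insertion order, so dict.fromkeys drops
    duplicates while keeping each item's first-seen position."""
    return list(dict.fromkeys(items))

keywords = ["api", "timeout", "api", "configure", "timeout"]
print(dedupe_keep_order(keywords))  # → ['api', 'timeout', 'configure']
```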
BM25 Scoring Implementation
We use a simplified BM25 variant for keyword scoring:
import math
from collections import Counter
from typing import Dict, List

class BM25Scorer:
    """
    BM25 scoring for keyword relevance.
    """

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1 = k1  # Term-frequency saturation
        self.b = b    # Length normalization
        self.avg_doc_length = 500  # Calibrated for our corpus

    def score(
        self,
        document: str,
        query_terms: List[str],
        idf_scores: Dict[str, float]
    ) -> float:
        """
        Calculate the BM25 score for a document given query terms.
        """
        doc_length = len(document.split())
        term_freqs = Counter(document.lower().split())

        score = 0.0
        for term in query_terms:
            if term not in term_freqs:
                continue

            # Term frequency component
            tf = term_freqs[term]

            # Length normalization
            norm = 1 - self.b + self.b * (doc_length / self.avg_doc_length)

            # BM25 saturation formula
            numerator = tf * (self.k1 + 1)
            denominator = tf + self.k1 * norm

            # Weight by inverse document frequency
            idf = idf_scores.get(term, 0.0)
            score += idf * (numerator / denominator)

        return score
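A useful property of this formula is term-frequency saturation: because of k1, each additional occurrence of a term contributes less than the last, so keyword stuffing has diminishing returns. A self-contained sketch of the per-term contribution (same formula and default parameters as the scorer above) makes the flattening visible:

```python
def bm25_term_score(tf: int, doc_length: int, idf: float,
                    k1: float = 1.5, b: float = 0.75,
                    avg_doc_length: float = 500.0) -> float:
    """Per-term BM25 contribution, matching the formula above."""
    norm = 1 - b + b * (doc_length / avg_doc_length)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# With doc_length == avg_doc_length the score rises toward idf * (k1 + 1)
# as tf grows, and each extra occurrence helps less than the previous one.
scores = [bm25_term_score(tf, doc_length=500, idf=2.0) for tf in (1, 2, 3, 4)]
gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
print(all(g2 < g1 for g1, g2 in zip(gains, gains[1:])))  # → True
```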
Tuning the Weights
The ratio between vector and keyword weights depends on your use case:
| Use Case | Vector Weight | Keyword Weight | Reasoning |
|----------|---------------|----------------|-----------|
| General Q&A | 0.7 | 0.3 | Semantic understanding is key |
| Technical docs | 0.5 | 0.5 | Exact terms matter |
| Code search | 0.4 | 0.6 | Function/variable names |
| Legal docs | 0.3 | 0.7 | Precise language is critical |
Dynamic Weight Adjustment
import re
from typing import Tuple

def determine_weights(query: str) -> Tuple[float, float]:
    """
    Automatically adjust weights based on query characteristics.
    """
    # Detect query type
    has_code = bool(re.search(r'[(){}\[\];]', query))
    has_quotes = '"' in query or "'" in query
    has_technical_terms = any(
        len(term) > 1 and term.isupper() for term in query.split()
    )

    # Start with the defaults
    vector_weight = 0.6
    keyword_weight = 0.4

    # Code-like queries favor exact matching
    if has_code:
        vector_weight -= 0.2
        keyword_weight += 0.2

    # Quoted phrases signal exact-match intent
    if has_quotes:
        vector_weight -= 0.15
        keyword_weight += 0.15

    # Acronyms and technical terms benefit from keyword matching
    if has_technical_terms:
        vector_weight -= 0.1
        keyword_weight += 0.1

    # Ensure the weights sum to 1
    total = vector_weight + keyword_weight
    return vector_weight / total, keyword_weight / total
Real-World Performance
We benchmarked hybrid search against pure approaches on 10,000 queries:
| Metric | Vector Only | Keyword Only | Hybrid |
|--------|-------------|--------------|--------|
| Precision@10 | 0.72 | 0.68 | 0.84 |
| Recall@50 | 0.65 | 0.61 | 0.78 |
| NDCG@20 | 0.69 | 0.64 | 0.82 |
| Avg Latency | 25 ms | 8 ms | 18 ms |
Hybrid search outperforms both approaches while maintaining acceptable latency.
Implementation Tips
1. Cache Embeddings
from functools import lru_cache
from typing import List

@lru_cache(maxsize=10000)
def get_embedding(text: str) -> List[float]:
    """
    Cache embeddings to avoid redundant model calls.
    lru_cache keys on the text argument itself.
    """
    return embedding_model.encode(text)
2. Precompute IDF Scores
import math
from collections import defaultdict
from typing import Dict, List

def precompute_idf(all_documents: List[str]) -> Dict[str, float]:
    """
    Compute IDF scores once at startup.
    Store in memory for fast keyword scoring.
    """
    n_docs = len(all_documents)
    term_doc_freq = defaultdict(int)

    for doc in all_documents:
        unique_terms = set(doc.lower().split())
        for term in unique_terms:
            term_doc_freq[term] += 1

    idf_scores = {}
    for term, doc_freq in term_doc_freq.items():
        # Smoothed IDF: rarer terms get higher weights
        idf_scores[term] = math.log((n_docs + 1) / (doc_freq + 1))

    return idf_scores
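On a toy corpus, the smoothed formula behaves as expected: terms that appear in fewer documents receive higher weights. (The document strings below are invented for illustration.)

```python
import math
from collections import defaultdict
from typing import Dict, List

def idf_table(docs: List[str]) -> Dict[str, float]:
    """Smoothed IDF over a toy corpus, same formula as above."""
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc.lower().split()):
            df[term] += 1
    n = len(docs)
    return {term: math.log((n + 1) / (freq + 1)) for term, freq in df.items()}

docs = [
    "configure the api timeout",
    "api setup guide",
    "reduce llm costs",
]
idf = idf_table(docs)
# "api" appears in two documents, "timeout" in only one
print(idf["timeout"] > idf["api"])  # → True: rarer terms weigh more
```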
3. Use Approximate Vector Search
For large datasets, exact cosine similarity is slow. Use approximate methods:
-- Create an IVFFlat index for faster approximate search
CREATE INDEX ON chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Query with approximate search (<=> is pgvector's cosine distance operator)
SET ivfflat.probes = 10;

SELECT * FROM chunks
ORDER BY embedding <=> query_embedding
LIMIT 50;
Conclusion
Hybrid search isn't just a nice-to-have; it's essential for production-grade retrieval systems. By combining the semantic understanding of vector search with the precision of keyword matching, Praevia delivers superior context selection that translates directly to better LLM responses and lower token costs.
The key is finding the right balance for your use case and implementing smart optimizations to keep latency low.
Interested in implementing hybrid search in your application? Get started with Praevia or read our API documentation.