Search Strategy
Multi-strategy search implementation: semantic, keyword, facts, temporal, and hybrid approaches.
Overview
Cortex provides four primary search strategies that can be used independently or combined:
- Semantic Search - Vector similarity (meaning-based)
- Facts Search - Structured knowledge from Layer 3 (most efficient)
- Keyword Search - Full-text matching (exact words)
- Temporal Search - Recent/time-based retrieval
The recall() method (v0.24.0+) automatically combines strategies for optimal results.
| Strategy | Characteristics |
|---|---|
| Facts Search | Layer 3 - Most token-efficient |
| Semantic Search | Layer 2 - Semantic similarity |
| Temporal Search | Layer 2 - Time-based retrieval |
| recall() (hybrid) | Multi-signal scoring algorithm |
Strategy 1: Semantic Search
How It Works
Uses vector embeddings and cosine similarity:
// User query
const query = "What is the user's preferred communication method?";
// Generate query embedding
const queryEmbedding = await embed(query);
// SDK vector search
const results = await cortex.vector.search(memorySpaceId, query, {
embedding: queryEmbedding,
limit: 20,
});
// Returns:
// 1. "User prefers email over phone" (score: 0.92)
// 2. "Email is best way to reach user" (score: 0.88)
// 3. "User doesn't like phone calls" (score: 0.85)
// Even though exact words don't match!
Strengths:
- Finds meaning, not just words
- Handles synonyms ("car" matches "vehicle")
- Understands context
- Works across languages (with multilingual models)
Weaknesses:
- Requires embeddings (cost + latency)
- Can miss exact matches
- Less predictable than keywords
Best for:
- Natural language queries
- Conceptual matching
- Cross-language search
- Fuzzy matching
SDK Usage
// Using the Cortex SDK
const query = "What is the user's preferred communication method?";
const embedding = await embed(query);
const results = await cortex.vector.search(memorySpaceId, query, {
embedding,
userId: "user-123",
limit: 20,
});
// Results include score field (0-1 similarity)
results.forEach((result) => {
console.log(`${result.content} (score: ${result.score})`);
});
All searches are automatically filtered by tenantId for multi-tenant isolation. This is handled at the database level via filterFields configuration.
Conceptual Example: Internal Implementation
The following shows how semantic search works internally. Use cortex.vector.search() or cortex.memory.recall() in your application code.
// Internal Convex query implementation (conceptual)
export const semanticSearch = query({
args: {
memorySpaceId: v.string(),
embedding: v.array(v.float64()),
filters: v.any(),
},
handler: async (ctx, args) => {
let q = ctx.db.query("memories").withIndex("by_embedding", (q) => {
let search = q
.similar("embedding", args.embedding, args.filters.limit || 20)
.eq("memorySpaceId", args.memorySpaceId);
// Tenant isolation (pre-filtered via filterFields)
if (args.filters.tenantId) {
search = search.eq("tenantId", args.filters.tenantId);
}
return search;
});
// Apply additional filters
if (args.filters.userId) {
q = q.filter((q) => q.eq(q.field("userId"), args.filters.userId));
}
if (args.filters.participantId) {
q = q.filter((q) =>
q.eq(q.field("participantId"), args.filters.participantId),
);
}
const results = await q.collect();
// Calculate similarity scores
return results.map((memory) => ({
...memory,
score: cosineSimilarity(memory.embedding, args.embedding),
strategy: "semantic",
}));
},
});
Strategy 2: Facts Search (Layer 3)
How It Works
Searches structured facts from Layer 3 - the most token-efficient strategy:
// User query
const query = "What is the user's communication preference?";
// SDK facts search
const facts = await cortex.facts.search(memorySpaceId, query, {
limit: 20,
});
// Returns:
// 1. "User prefers email for communication" (confidence: 95%)
// 2. "User dislikes phone calls" (confidence: 88%)
// 3. "User checks email twice daily" (confidence: 82%)
// Each fact is ~8 tokens vs 50+ tokens for raw conversation
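To make the savings concrete: 20 facts at ~8 tokens each is roughly 160 tokens, versus 1,000+ tokens for 20 raw excerpts at 50+ tokens each, an ~84% reduction consistent with the 60-90% savings cited below.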
Strengths:
- Most token-efficient (60-90% savings)
- Structured knowledge (subject-predicate-object)
- High confidence (LLM-extracted)
- Deduplication via Belief Revision System
- Enables infinite context capability
Weaknesses:
- Requires LLM for extraction (initial cost)
- May miss nuance from raw conversation
- Confidence scoring needed
Best for:
- Long-running conversations (infinite context)
- Entity-centric queries (subject-based)
- Knowledge retrieval across sessions
- Token-constrained environments
SDK Usage
// Using the Cortex SDK
const facts = await cortex.facts.search(
memorySpaceId,
"What is the user's communication preference?",
{
userId: "user-123",
factType: "preference",
minConfidence: 70,
limit: 20,
}
);
// Results include confidence scores
facts.forEach((fact) => {
console.log(`${fact.fact} (confidence: ${fact.confidence}%)`);
});
Conceptual Example: Internal Implementation
The following shows how facts search works internally. Use cortex.facts.search() or cortex.memory.recall() in your application code.
// Internal Convex query implementation (conceptual)
export const factsSearch = query({
args: {
memorySpaceId: v.string(),
query: v.string(),
filters: v.any(),
},
handler: async (ctx, args) => {
let results = await ctx.db
.query("facts")
.withSearchIndex("by_content", (q) =>
q.search("fact", args.query).eq("memorySpaceId", args.memorySpaceId),
)
.take(args.filters.limit || 20);
// Apply filters
if (args.filters.userId) {
results = results.filter((f) => f.userId === args.filters.userId);
}
if (args.filters.factType) {
results = results.filter((f) => f.factType === args.filters.factType);
}
// Filter by confidence
if (args.filters.minConfidence) {
results = results.filter(
(f) => f.confidence >= args.filters.minConfidence,
);
}
return results.map((fact) => ({
...fact,
score: fact.confidence / 100, // Confidence as score
strategy: "facts",
}));
},
});
Facts by Subject (Entity-Centric)
The following shows entity-centric fact retrieval. Use cortex.facts.search() with subject filters or cortex.memory.recall() for orchestrated search.
// Conceptual: Query facts about a specific entity
// In practice, use cortex.facts.search() with subject filters
const userFacts = await cortex.facts.search(memorySpaceId, "*", {
subject: "user-123",
limit: 100,
});
// Returns: All preferences, identities, relationships for user-123
Strategy 3: Keyword Search
How It Works
Uses Convex full-text search indexes:
// User query
const query = "password Blue";
// SDK vector search (uses keyword matching when no embedding provided)
const results = await cortex.vector.search(memorySpaceId, query, {
limit: 20,
});
// Returns:
// 1. "The password is Blue" (exact match)
// 2. "User's password: Blue123" (contains words)
// 3. "Blue is the password color" (contains both)
// Only returns if words actually appear
Strengths:
- Exact word matching
- Fast (no embedding needed)
- Predictable results
- No API costs
Weaknesses:
- Misses synonyms
- Order matters somewhat
- Doesn't understand meaning
Best for:
- Exact phrase lookup
- Technical terms
- IDs, codes, names
- When embeddings unavailable
SDK Usage
// Using the Cortex SDK
const results = await cortex.vector.search(memorySpaceId, "password Blue", {
userId: "user-123",
limit: 20,
});
// Results are relevance-ranked by Convex search index
results.forEach((result) => {
console.log(`${result.content} (score: ${result.score})`);
});
Conceptual Example: Internal Implementation
The following shows how keyword search works internally. Use cortex.vector.search() or cortex.memory.recall() in your application code.
// Internal Convex query implementation (conceptual)
export const keywordSearch = query({
args: {
memorySpaceId: v.string(),
keywords: v.string(),
filters: v.any(),
},
handler: async (ctx, args) => {
let results = await ctx.db
.query("memories")
.withSearchIndex("by_content", (q) =>
q
.search("content", args.keywords)
.eq("memorySpaceId", args.memorySpaceId),
)
.take(args.filters.limit || 20);
// Convex returns relevance-ranked results
return results.map((memory, index) => ({
...memory,
score: 1 - index / results.length, // Approximate relevance
strategy: "keyword",
}));
},
});
Strategy 4: Temporal Search
How It Works
Prioritizes recent or time-relevant memories:
// Get recent memories using recall()
const result = await cortex.memory.recall({
memorySpaceId,
query: "*", // Empty query for recent-only
sources: {
vector: { limit: 0 }, // Disable vector search
facts: { limit: 0 }, // Disable facts search
},
limit: 20,
});
// Returns most recent memories, regardless of content
// Results are automatically sorted by recency
Strengths:
- Recent items are usually relevant in conversation
- Very fast (simple index)
- No embeddings needed
- Great for conversations
Weaknesses:
- Ignores content relevance
- May miss important old information
Best for:
- Conversation context (last N messages)
- Recent user interactions
- Time-sensitive information
- When recency matters more than content
SDK Usage
// Using cortex.memory.recall() for recent memories
const result = await cortex.memory.recall({
memorySpaceId: memorySpaceId,
query: "*", // Empty query for recent-only
sources: {
vector: { limit: 0 }, // Disable vector search
facts: { limit: 0 }, // Disable facts search
},
limit: 20,
});
// Results sorted by recency
result.items.forEach((item) => {
console.log(`${item.content} (created: ${new Date(item.memory?.createdAt || 0)})`);
});
Conceptual Example: Internal Implementation
The following shows how temporal search works internally. Use cortex.memory.recall() in your application code.
// Internal Convex query implementation (conceptual)
export const recentSearch = query({
args: {
memorySpaceId: v.string(),
filters: v.any(),
},
handler: async (ctx, args) => {
let q = ctx.db
.query("memories")
.withIndex("by_memorySpace_created", (q) =>
q.eq("memorySpaceId", args.memorySpaceId),
)
.order("desc");
// Filter by importance (flattened field)
if (args.filters.minImportance) {
q = q.filter((q) =>
q.gte(q.field("importance"), args.filters.minImportance),
);
}
// Filter by user
if (args.filters.userId) {
q = q.filter((q) => q.eq(q.field("userId"), args.filters.userId));
}
// Filter by participant (Hive Mode)
if (args.filters.participantId) {
q = q.filter((q) =>
q.eq(q.field("participantId"), args.filters.participantId),
);
}
const results = await q.take(args.filters.limit || 20);
return results.map((memory, index) => ({
...memory,
score: 1 - index / results.length, // Recency score
strategy: "recent",
}));
},
});
Strategy 5: Orchestrated Search (recall())
Using recall() for Multi-Strategy Search
The cortex.memory.recall() method automatically combines multiple search strategies:
// Automatic multi-strategy search
const result = await cortex.memory.recall({
memorySpaceId: memorySpaceId,
query: "What are the user's preferences?",
embedding: await embed("What are the user's preferences?"),
userId: "user-123",
limit: 10,
});
// Results include items from multiple sources
console.log(`Found ${result.items.length} items`);
console.log(`- ${result.sources.vector?.count || 0} from vector search`);
console.log(`- ${result.sources.facts?.count || 0} from facts search`);
console.log(`- ${result.sources.graph?.count || 0} from graph expansion`);
// Items are ranked using multi-signal scoring
result.items.forEach((item) => {
console.log(`${item.content} (score: ${item.score})`);
});
What recall() does automatically:
- Searches facts (Layer 3) for structured knowledge
- Searches vectors (Layer 2) for semantic matches
- Expands via graph connections for related context
- Merges and deduplicates results
- Ranks using multi-signal scoring algorithm
- Returns LLM-ready context string
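If you just need that context string for a prompt, you can pass it straight to your model call. A minimal sketch, assuming the assembled string is exposed as result.context (check your SDK version for the exact field) and using a hypothetical callLLM helper:
// Sketch: using recall() output in a prompt (result.context and callLLM
// are assumptions for illustration, not confirmed API)
const recallResult = await cortex.memory.recall({
  memorySpaceId,
  query: userMessage,
  embedding: await embed(userMessage),
  limit: 10,
});
const reply = await callLLM({
  system: `Relevant memory:\n${recallResult.context ?? ""}`,
  user: userMessage,
});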
Conceptual Example: Internal Hybrid Logic
The following shows how recall() combines strategies internally. Use cortex.memory.recall() directly in your application code.
// Conceptual: How recall() combines strategies internally
// In practice, use cortex.memory.recall() which handles this automatically
// 1. Search facts (most token-efficient)
const facts = await cortex.facts.search(memorySpaceId, query, {
minConfidence: 70,
limit: 20,
});
// 2. Search vectors (semantic similarity)
const vectors = await cortex.vector.search(memorySpaceId, query, {
embedding,
limit: 20,
});
// 3. recall() merges, deduplicates, and ranks automatically
const result = await cortex.memory.recall({
memorySpaceId,
query,
embedding,
});
Ranking and Scoring Algorithm
The recall() method uses a multi-signal scoring algorithm with the following weights (from resultProcessor.ts):
| Signal | Weight | Description |
|---|---|---|
| Semantic similarity | 0.35 | Vector similarity score (0-1) |
| Confidence | 0.20 | Fact confidence (0-100 → 0-1) |
| Importance | 0.15 | Memory importance (0-100 → 0-1) |
| Recency | 0.15 | Time decay (exponential, 30-day half-life) |
| Graph connectivity | 0.15 | Number of connected entities (logarithmic) |
Additional boosts:
- Highly connected items (>3 entities): ×1.2 multiplier
- User messages: ×1.1 multiplier (more likely to contain preferences)
Conceptual Example: Scoring Implementation
The following shows how scoring works internally. cortex.memory.recall() applies this automatically.
// Conceptual: Multi-signal scoring algorithm (from resultProcessor.ts)
const RANKING_WEIGHTS = {
semantic: 0.35, // Vector similarity
confidence: 0.20, // Fact confidence
importance: 0.15, // Memory importance
recency: 0.15, // Time decay
graphConnectivity: 0.15, // Graph connections
};
function calculateScore(item: RecallItem): number {
let score = 0;
// Base semantic score
score += (item.score || 0.5) * RANKING_WEIGHTS.semantic;
// Confidence (facts) or default (memories)
const confidence = item.type === "fact"
? item.fact!.confidence / 100
: 0.8;
score += confidence * RANKING_WEIGHTS.confidence;
// Importance
const importance = item.type === "memory"
? item.memory!.importance / 100
: item.fact!.confidence / 100; // Use confidence as proxy for facts
score += importance * RANKING_WEIGHTS.importance;
// Recency (exponential decay, 30-day half-life)
const age = Date.now() - (item.memory?.createdAt || item.fact?.createdAt || Date.now());
const recencyScore = Math.pow(2, -age / (30 * 24 * 60 * 60 * 1000));
score += recencyScore * RANKING_WEIGHTS.recency;
// Graph connectivity (logarithmic scale)
const connectedEntities = item.graphContext?.connectedEntities || [];
const connectivityScore = Math.min(1.0, Math.log2(connectedEntities.length + 1) / Math.log2(11));
score += connectivityScore * RANKING_WEIGHTS.graphConnectivity;
// Boosts
if (connectedEntities.length > 3) {
score *= 1.2; // Highly connected boost
}
if (item.type === "memory" && item.memory?.messageRole === "user") {
score *= 1.1; // User message boost
}
return Math.min(1.0, Math.max(0, score));
}
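Worked example: a 30-day-old user message with similarity 0.9, importance 70, and 2 connected entities scores 0.9 × 0.35 + 0.8 × 0.20 + 0.70 × 0.15 + 0.5 × 0.15 + 0.46 × 0.15 ≈ 0.72 (the 0.5 recency term reflects exactly one half-life; 0.46 is log2(3)/log2(11)). The ×1.1 user-message boost lifts the final score to ≈ 0.80.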
Filter Application
Pre-Filtering (Convex Index)
Applied BEFORE search for efficiency. Tenant isolation is automatically handled:
// Pre-filtered (fast) - Conceptual example
await ctx.db
.query("memories")
.withIndex(
"by_embedding",
(q) =>
q
.similar("embedding", vector, 20)
.eq("memorySpaceId", memorySpaceId) // ← Pre-filter
.eq("tenantId", tenantId) // ← Automatic tenant isolation
.eq("userId", userId), // ← Pre-filter
)
.collect();
// Only searches vectors for this memorySpace+tenant+user
// filterFields: ["memorySpaceId", "tenantId", "userId"] enables this
All searches automatically filter by tenantId at the database level. This ensures complete data isolation between tenants. The SDK handles this transparently - you don't need to pass tenantId explicitly.
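The filterFields configuration is declared on the vector index in the Convex schema. A conceptual sketch of what such a declaration might look like (the table shape and dimensions are illustrative, not the actual Cortex schema):
// Conceptual sketch (convex/schema.ts); field names and dimensions are
// illustrative, not the actual Cortex schema
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";
export default defineSchema({
  memories: defineTable({
    memorySpaceId: v.string(),
    tenantId: v.optional(v.string()),
    userId: v.optional(v.string()),
    content: v.string(),
    embedding: v.array(v.float64()),
  }).vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 1536, // e.g. OpenAI text-embedding-3-small
    filterFields: ["memorySpaceId", "tenantId", "userId"],
  }),
});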
Post-Filtering (Application Logic)
Applied AFTER search for complex logic that can't be expressed in indexes:
// Post-filter (after vector search) - Conceptual example
const results = await ctx.db
.query("memories")
.withIndex("by_embedding", (q) =>
q
.similar("embedding", vector, 50) // Get more results
.eq("memorySpaceId", memorySpaceId),
)
.collect();
// Complex filters in code
const filtered = results.filter((memory) => {
// Importance range
if (memory.importance < 50 || memory.importance > 90) {
return false;
}
// Tag matching (AND logic)
if (!memory.metadata?.tags?.includes("password")) {
return false;
}
// Date range
if (memory.createdAt < someDate) {
return false;
}
// Custom logic
if (memory.metadata?.customField !== "value") {
return false;
}
return true;
});
return filtered.slice(0, 10);
Rule: Use pre-filtering when possible, post-filtering for complex logic. The SDK automatically applies tenant isolation via pre-filtering.
Universal Filters Implementation
Filter Processing Pipeline
function applyUniversalFilters(
results: MemoryEntry[],
filters: UniversalFilters,
): MemoryEntry[] {
let filtered = results;
// userId (usually pre-filtered)
if (filters.userId) {
filtered = filtered.filter((m) => m.userId === filters.userId);
}
// Tags (any or all)
if (filters.tags && filters.tags.length > 0) {
if (filters.tagMatch === "all") {
// Must have ALL tags
filtered = filtered.filter((m) =>
filters.tags.every((tag) => m.metadata.tags.includes(tag)),
);
} else {
// Must have ANY tag (default)
filtered = filtered.filter((m) =>
filters.tags.some((tag) => m.metadata.tags.includes(tag)),
);
}
}
// Importance range
if (filters.importance) {
if (typeof filters.importance === "number") {
filtered = filtered.filter(
(m) => m.metadata.importance === filters.importance,
);
} else {
// RangeQuery
if (filters.importance.$gte !== undefined) {
filtered = filtered.filter(
(m) => m.metadata.importance >= filters.importance.$gte,
);
}
if (filters.importance.$lte !== undefined) {
filtered = filtered.filter(
(m) => m.metadata.importance <= filters.importance.$lte,
);
}
}
}
// Date ranges
if (filters.createdAfter) {
filtered = filtered.filter(
(m) => m.createdAt >= filters.createdAfter.getTime(),
);
}
if (filters.createdBefore) {
filtered = filtered.filter(
(m) => m.createdAt <= filters.createdBefore.getTime(),
);
}
// Source type
if (filters["source.type"]) {
filtered = filtered.filter((m) => m.source.type === filters["source.type"]);
}
// Access patterns
if (filters.accessCount) {
filtered = applyRangeFilter(filtered, "accessCount", filters.accessCount);
}
return filtered;
}
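The applyRangeFilter helper above isn't shown in the pipeline. A minimal sketch, assuming the same number-or-RangeQuery shape used for importance:
// Sketch of the applyRangeFilter helper (assumed to mirror the importance
// handling above; RangeQuery is a number or { $gte?, $lte? })
type RangeQuery = number | { $gte?: number; $lte?: number };
function applyRangeFilter(
  results: MemoryEntry[],
  field: "accessCount",
  range: RangeQuery,
): MemoryEntry[] {
  if (typeof range === "number") {
    return results.filter((m) => m[field] === range);
  }
  return results.filter(
    (m) =>
      (range.$gte === undefined || m[field] >= range.$gte) &&
      (range.$lte === undefined || m[field] <= range.$lte),
  );
}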
Ranking and Scoring
Base Similarity Score
function cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
// Returns -1 to 1 (1 = identical, 0 = orthogonal, -1 = opposite);
// embedding vectors in practice score mostly in the 0-1 range
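A quick sanity check of the range:
// Identical direction → 1, orthogonal → 0, 45° apart → ≈ 0.707
cosineSimilarity([1, 0], [2, 0]); // 1.0
cosineSimilarity([1, 0], [0, 1]); // 0.0
cosineSimilarity([1, 0], [1, 1]); // ≈ 0.707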
Multi-Factor Ranking
The following shows simplified scoring logic. The actual implementation in resultProcessor.ts uses the weights shown in the Ranking and Scoring Algorithm section above.
// Conceptual: Simplified scoring (actual weights differ)
function calculateFinalScore(
memory: MemoryEntry,
baseScore: number,
options: SearchOptions,
): number {
let score = baseScore;
// Factor 1: Importance (weight: 0.15 in actual implementation)
if (options.boostImportance) {
const importanceFactor = memory.importance / 100;
score = score * 0.85 + importanceFactor * 0.15;
}
// Factor 2: Recency (weight: 0.15, exponential decay)
if (options.boostRecent) {
const age = Date.now() - memory.createdAt;
const halfLife = 30 * 24 * 60 * 60 * 1000; // 30 days
const recencyFactor = Math.pow(2, -age / halfLife);
score = score * 0.85 + recencyFactor * 0.15;
}
// Factor 3: Graph connectivity (weight: 0.15 in actual implementation)
// This is handled automatically by recall() when graph expansion is enabled
return score;
}
Strategy Selection Logic
Using recall() for Automatic Strategy Selection
The cortex.memory.recall() method automatically selects and combines strategies:
// Automatic strategy selection via recall()
const result = await cortex.memory.recall({
memorySpaceId: memorySpaceId,
query: "user preferences",
embedding: await embed("user preferences"), // Optional: enables semantic search
userId: "user-123",
limit: 10,
});
// recall() automatically:
// 1. Searches facts (always, most efficient)
// 2. Searches vectors (if embedding provided)
// 3. Expands via graph (if enabled)
// 4. Merges, deduplicates, and ranks results
Conceptual Example: Strategy Selection Logic
The following shows how strategy selection works internally. Use cortex.memory.recall() directly.
// Conceptual: How recall() selects strategies internally
// In practice, use cortex.memory.recall() which handles this automatically
async function recall(memorySpaceId: string, query: string, options: any) {
const results: RecallItem[] = [];
// Always search facts (most token-efficient)
const facts = await cortex.facts.search(memorySpaceId, query, options);
results.push(...facts.map(f => factToRecallItem(f, "facts")));
// Search vectors if embedding provided
if (options.embedding) {
const vectors = await cortex.vector.search(memorySpaceId, query, options);
results.push(...vectors.map(v => memoryToRecallItem(v, "vector")));
}
// Graph expansion (if enabled)
if (options.enableGraph) {
// ... graph expansion logic
}
// Merge, deduplicate, and rank
return processRecallResults(results);
}
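The converters and processRecallResults above are placeholders. A sketch of what factToRecallItem might look like, reusing the confidence-as-score convention from the factsSearch example (the Fact and RecallItem shapes are assumed from the scoring code earlier):
// Hypothetical converter; mirrors the confidence-as-score convention used
// in factsSearch above
function factToRecallItem(fact: Fact, strategy: string): RecallItem {
  return {
    type: "fact",
    fact,
    score: fact.confidence / 100,
    strategy,
  };
}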
Search Optimization Techniques
1. Limit Before Filter
The following shows optimization techniques. The SDK handles these optimizations automatically.
// Slow: Filter all, then limit (conceptual)
const all = await ctx.db.query("memories").collect();
const filtered = all.filter((m) => m.importance >= 80);
const limited = filtered.slice(0, 10);
// Fast: Limit during query (conceptual)
const results = await ctx.db
.query("memories")
.withIndex("by_memorySpace", (q) => q.eq("memorySpaceId", memorySpaceId))
.filter((q) => q.gte(q.field("importance"), 80))
.take(10); // ← Stop after 10 matches
2. Use Specific Indexes
// Generic index (conceptual)
await ctx.db
.query("memories")
.withIndex("by_memorySpace", (q) => q.eq("memorySpaceId", memorySpaceId))
.filter((q) => q.eq(q.field("source.type"), "a2a"))
.collect();
// Specific compound index (conceptual)
await ctx.db
.query("memories")
.withIndex("by_memorySpace_source", (q) =>
q.eq("memorySpaceId", memorySpaceId).eq("source.type", "a2a"),
)
.collect();
3. Cache Frequent Queries
const searchCache = new Map<string, { results: any[]; timestamp: number }>();
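// Note: a module-level cache is per-instance and best-effort in serverless
// runtimes; Convex also caches reactive query results automatically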
export const cachedSearch = query({
handler: async (ctx, args) => {
const cacheKey = JSON.stringify(args);
const cached = searchCache.get(cacheKey);
// Cache for 60 seconds
if (cached && Date.now() - cached.timestamp < 60000) {
return cached.results;
}
// Execute search
const results = await doSearch(ctx, args);
// Cache results
searchCache.set(cacheKey, {
results,
timestamp: Date.now(),
});
return results;
},
});
Real-World Search Patterns
Pattern 1: Conversational Context
// Get recent relevant context for conversation
async function getConversationContext(
memorySpaceId: string,
userId: string,
currentMessage: string,
) {
const embedding = await embed(currentMessage);
const result = await cortex.memory.recall({
memorySpaceId,
query: currentMessage,
embedding,
userId, // User-specific
importance: { $gte: 50 }, // Skip trivial
limit: 5,
});
return result.items;
}
Pattern 2: Knowledge Retrieval
// Find best answer from knowledge base
async function findAnswer(memorySpaceId: string, query: string) {
const embedding = await embed(query);
const result = await cortex.memory.recall({
memorySpaceId,
query,
embedding,
importance: { $gte: 70 }, // Only important articles
tags: ["kb-article"],
limit: 3,
});
return result.items;
}
Pattern 3: Multi-Criteria Search
// Complex search with multiple criteria
async function advancedSearch(memorySpaceId: string, criteria: any) {
const embedding = await embed(criteria.query);
const result = await cortex.memory.recall({
memorySpaceId,
query: criteria.query,
embedding,
userId: criteria.userId,
tags: criteria.tags,
tagMatch: "all", // Must have all tags
importance: { $gte: criteria.minImportance },
createdAfter: criteria.since,
"source.type": criteria.sourceType,
limit: criteria.limit,
});
return result.items;
}
Fallback Strategies
Graceful Degradation
async function searchWithFallback(
memorySpaceId: string,
query: string,
embedding?: number[],
) {
try {
// Use recall() which automatically combines strategies
const result = await cortex.memory.recall({
memorySpaceId,
query: query || "*", // Wildcard when no query (falls back to recent)
embedding, // Optional: enables semantic search if provided
limit: 10,
importance: { $gte: 50 },
});
if (result.items.length > 0) {
return result.items;
}
// Fall back to recent only
const recentResult = await cortex.memory.recall({
memorySpaceId,
query: "*",
sources: {
vector: { limit: 0 }, // Disable vector search
facts: { limit: 0 }, // Disable facts search
},
limit: 10,
importance: { $gte: 50 },
});
return recentResult.items;
} catch (error) {
console.error("All search strategies failed:", error);
return [];
}
}
Performance Benchmarks
Search Latency by Strategy
| Strategy | Index Used | Typical Latency | Dataset Size Impact |
|---|---|---|---|
| Semantic | Vector index | 50-100ms | Logarithmic (with filterFields) |
| Keyword | Search index | 20-50ms | Logarithmic |
| Recent | Regular index | 10-20ms | Logarithmic |
| Hybrid | Multiple | 100-200ms | Logarithmic (parallel) |
Throughput
Queries per second (estimated):
- Semantic: 50-100 QPS (embedding generation is bottleneck)
- Keyword: 200-500 QPS (no embedding needed)
- Recent: 500-1000 QPS (simple index lookup)
Optimizations:
- Cache query embeddings (5-10× faster; see the sketch below)
- Batch embedding generation
- Use Convex reactive queries (cached by Convex)
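A minimal sketch of embedding caching, assuming embed() is your embedding provider call and that the same queries recur often:
// Sketch: in-memory cache for query embeddings (embed() is assumed to be
// your embedding provider call; the cache is per-process and best-effort)
const embeddingCache = new Map<string, number[]>();
async function embedCached(query: string): Promise<number[]> {
  const cached = embeddingCache.get(query);
  if (cached) return cached; // Cache hit: skips the embedding API round-trip
  const embedding = await embed(query);
  embeddingCache.set(query, embedding);
  return embedding;
}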
Next Steps
- Context Chain Design - Context propagation architecture
- Performance - Optimization techniques
- Semantic Search Guide - Usage patterns
Questions? Ask in GitHub Discussions.