Performance

Info
Last Updated: 2026-01-08

Optimization techniques, scaling strategies, resilience layer, and performance characteristics of Cortex on Convex.

Overview

Cortex is designed for high performance at scale with built-in resilience. With proper indexing, query patterns, and the resilience layer, Cortex handles millions of memories, thousands of memory spaces, and hundreds of concurrent users efficiently.

Performance Characteristics:

Info
These are target performance characteristics based on Convex platform capabilities and theoretical estimates. Actual performance may vary based on data size, query patterns, and system load.
  • Read operations: < 100ms (p95) - target
  • Write operations: < 50ms (p95) - target
  • Vector search: < 100ms for millions of vectors - target
  • Facts search: < 50ms for millions of facts - target
  • Concurrent operations: 16 (Starter) / 64 (Professional preset, max 256)
  • Storage: Unlimited (pay-per-GB)
  • Resilience: Built-in rate limiting, circuit breaker, concurrency control

Resilience Layer (v0.16.0+)

Built-In Protection

Cortex includes a resilience layer that protects all operations from overload and failures:

const cortex = new Cortex({
  convexUrl: process.env.CONVEX_URL,
  resilience: {
    // Token Bucket Rate Limiter
    rateLimiter: {
      refillRate: 100, // Default: 100 ops/sec
      bucketSize: 200, // Allow bursts up to 200
    },

    // Concurrency Control (based on Convex plan)
    concurrency: {
      maxConcurrent: 16, // Starter: 16, Professional preset: 64 (max: 256)
      queueSize: 1000, // Queue up to 1000 requests
    },

    // Circuit Breaker
    circuitBreaker: {
      failureThreshold: 5, // Open after 5 failures
      timeout: 30000, // 30s timeout before attempting recovery
      resetTimeout: 300000, // Try recovery after 5 minutes
    },
  },
});

Token Bucket Rate Limiter

// Prevents API rate limit exhaustion
// Example: 100 tokens/second with burst of 200

Token Bucket Algorithm:

  • Request arrives → check if the bucket has ≥1 token
  • Token available → consume 1 token, allow the request
  • No token → reject with RATE_LIMIT_EXCEEDED
  • Refill continuously at 100 tokens/second

Prevents API rate limit exhaustion with configurable refill rate and bucket capacity.

Configuration:

  • Bucket Capacity: 200 tokens (allows bursts)
  • Refill Rate: 100 tokens/second
  • Protects against: Sudden traffic spikes, accidental infinite loops, resource exhaustion
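The refill-and-consume loop above can be sketched as a small class. This is an illustrative implementation of the algorithm only; `TokenBucket` and `tryAcquire` are hypothetical names, not part of the Cortex API:

```typescript
// Illustrative token bucket (not the Cortex implementation).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private refillRate: number, // tokens added per second
    private bucketSize: number, // maximum burst capacity
    now: number = Date.now(),
  ) {
    this.tokens = bucketSize; // start with a full bucket
    this.lastRefill = now;
  }

  tryAcquire(now: number = Date.now()): boolean {
    // Refill continuously based on elapsed time, capped at bucketSize
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.bucketSize,
      this.tokens + elapsedSec * this.refillRate,
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1; // consume one token, allow the request
      return true;
    }
    return false; // caller surfaces RATE_LIMIT_EXCEEDED
  }
}
```

With `refillRate: 100` and `bucketSize: 200`, a burst of 200 requests passes immediately, after which sustained throughput settles at 100 ops/sec.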

Concurrency Control

// Respects Convex plan limits
// Starter: 16 concurrent operations
// Professional preset: 64 concurrent operations (max: 256)

Concurrency Semaphore:

  • New request arrives → check if a slot is available
  • Slot available → execute immediately
  • At capacity → add to queue (up to 1000 requests)
  • Slot frees → execute the next queued request

Respects Convex plan limits and queues requests when at capacity.

Configuration:

  • Starter: 16 concurrent operations
  • Professional preset: 64 concurrent operations (max: 256)
  • Queue size: Up to 1000 requests
  • Prevents: Convex concurrent operation limit errors, resource contention, failed operations due to capacity
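The queueing behavior above amounts to a counting semaphore. A minimal sketch (hypothetical names; not the Cortex implementation):

```typescript
// Illustrative concurrency semaphore with a bounded wait queue.
class Semaphore {
  private inFlight = 0;
  private queue: Array<() => void> = [];

  constructor(
    private maxConcurrent: number, // e.g. 16 on Starter
    private queueSize: number, // e.g. 1000
  ) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.maxConcurrent) {
      if (this.queue.length >= this.queueSize) {
        throw new Error("QUEUE_FULL"); // request is dropped
      }
      // Wait until a slot frees up
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    this.inFlight++;
    try {
      return await task();
    } finally {
      this.inFlight--;
      this.queue.shift()?.(); // wake the next queued request
    }
  }
}
```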

Circuit Breaker

// Protects against cascading failures

Circuit Breaker States:

  • CLOSED — normal operation, requests allowed
  • OPEN — after 5 failures, reject all requests
  • HALF_OPEN — after the reset timeout (5 minutes), allow 1 test request
  • CLOSED — on success, recovery complete

Protects against cascading failures by temporarily blocking requests after repeated failures.

Configuration:

  • Failure threshold: 5 failures
  • Timeout: 30 seconds before attempting recovery
  • Reset timeout: 5 minutes before trying recovery
  • Prevents: Cascading failures, backend overload during incidents, wasted retries

Monitoring Resilience

// Track resilience metrics (synchronous method, no await needed)
const metrics = cortex.getResilienceMetrics();

console.log({
  rateLimitHits: metrics.rateLimiter.requestsThrottled,
  queuedRequests: metrics.concurrency.waiting,
  circuitBreakerState: metrics.circuitBreaker.state, // CLOSED | OPEN | HALF_OPEN
  totalProcessed: metrics.queue.processed,
  droppedRequests: metrics.queue.dropped,
});

Query Performance

Index Usage (Critical)

Rule #1: Always use indexes for queries.

// Avoid: SLOW - Table scan (no index)
const memories = await ctx.db
  .query("memories")
  .filter((q) => q.eq(q.field("memorySpaceId"), memorySpaceId))
  .collect();
// Scans ENTIRE table! O(n)

// Good: FAST - Index query
const memories = await ctx.db
  .query("memories")
  .withIndex("by_memorySpace", (q) => q.eq("memorySpaceId", memorySpaceId))
  .collect();
// Index lookup! O(log n)

Impact:

  • 1K memories: 10ms vs 100ms (10× faster)
  • 1M memories: 15ms vs 10,000ms (666× faster!)

Compound Indexes

// Less efficient: Single index + filter
const memories = await ctx.db
  .query("memories")
  .withIndex("by_memorySpace", (q) => q.eq("memorySpaceId", memorySpaceId))
  .filter((q) => q.eq(q.field("userId"), userId))
  .collect();
// O(log n) + O(k) where k = space's memories

// More efficient: Compound index
const memories = await ctx.db
  .query("memories")
  .withIndex("by_memorySpace_userId", (q) =>
    q.eq("memorySpaceId", memorySpaceId).eq("userId", userId),
  )
  .collect();
// O(log n) directly to subset

// Multi-tenant compound index
const memories = await ctx.db
  .query("memories")
  .withIndex("by_tenant_space", (q) =>
    q.eq("tenantId", tenantId).eq("memorySpaceId", memorySpaceId),
  )
  .collect();

Impact:

  • Memory space with 10K memories, 100 for user
  • Single index: 10ms + filter 10K = 15ms
  • Compound index: 10ms directly to 100 = 10ms

Vector Search with filterFields

// Without filterFields: Search all vectors
.vectorIndex("by_embedding", {
  vectorField: "embedding",
  dimensions: 1536,
  // No filterFields
})

// Query searches ALL vectors (slow at scale)
const results = await ctx.db
  .query("memories")
  .withIndex("by_embedding", (q) => q.similar("embedding", vector, 10))
  .filter((q) => q.eq(q.field("memorySpaceId"), memorySpaceId)) // ← Filter AFTER search
  .collect();

// With filterFields: Pre-filter before search
.vectorIndex("by_embedding", {
  vectorField: "embedding",
  dimensions: 1536, // Default
  filterFields: ["memorySpaceId", "tenantId", "userId", "agentId", "participantId"],
})

// Query searches only relevant subset (fast!)
const results = await ctx.db
  .query("memories")
  .withIndex("by_embedding", (q) =>
    q.similar("embedding", vector, 10)
      .eq("memorySpaceId", memorySpaceId) // ← Pre-filtered BEFORE search
      .eq("tenantId", tenantId), // ← Multi-tenant isolation
  )
  .collect();

Impact:

  • 1M total vectors across all spaces, 1K per memory space
  • Without filterFields: Search 1M vectors = 200ms
  • With filterFields: Search 1K vectors = 10ms
  • 20× faster!

Multi-tenancy impact:

  • 10M vectors across all tenants, 100K per tenant
  • With tenant filter: Search 100K vectors = 50ms
  • With tenant + space filter: Search 1K vectors = 10ms
  • 100× faster with compound filtering!

Pagination Strategies

Cursor-Based Pagination (Best)

export const listPaginated = query({
  args: {
    memorySpaceId: v.string(),
    agentId: v.optional(v.string()), // Optional: filter by agent if provided
    cursor: v.optional(v.number()), // Timestamp cursor
    limit: v.number(),
  },
  handler: async (ctx, args) => {
    // Use compound index if agentId provided, otherwise use memorySpace index
    const indexName = args.agentId ? "by_memorySpace_agentId" : "by_memorySpace";
    let q = ctx.db
      .query("memories")
      .withIndex(indexName, (q) => {
        let query = q.eq("memorySpaceId", args.memorySpaceId);
        if (args.agentId) {
          query = query.eq("agentId", args.agentId);
        }
        return query;
      })
      .order("desc");

    // Apply cursor
    if (args.cursor) {
      q = q.filter((q) => q.lt(q.field("createdAt"), args.cursor));
    }

    const results = await q.take(args.limit);

    return {
      memories: results,
      nextCursor:
        results.length > 0 ? results[results.length - 1].createdAt : null,
      hasMore: results.length === args.limit,
    };
  },
});

// Usage
let cursor = null;
let allMemories = [];

do {
  const page = await cortex.memory.list("agent-1", {
    cursor,
    limit: 100,
  });

  allMemories.push(...page.memories);
  cursor = page.nextCursor;
} while (cursor);

Benefits:

  • Consistent performance (no offset skipping)
  • Works with real-time updates
  • Efficient for large datasets

Offset-Based Pagination (Simple)

// Simpler but slower for large offsets
export const listOffset = query({
  args: {
    memorySpaceId: v.string(),
    agentId: v.string(),
    offset: v.number(),
    limit: v.number(),
  },
  handler: async (ctx, args) => {
    const results = await ctx.db
      .query("memories")
      .withIndex("by_memorySpace_agentId", (q) =>
        q.eq("memorySpaceId", args.memorySpaceId).eq("agentId", args.agentId),
      )
      .order("desc")
      .skip(args.offset) // ← Skips documents
      .take(args.limit); // .take() already returns the results; no .collect() needed

    return results;
  },
});

Drawbacks:

  • Large offsets are slow (skip 10K documents = slow)
  • Can miss items if data changes during pagination
  • Use cursor-based for production

Caching Strategies

Query Result Caching

// Convex caches query results automatically
// But you can add application-level caching

const cache = new Map<string, { data: any; timestamp: number }>();

export const cachedList = query({
  handler: async (ctx, args) => {
    const cacheKey = JSON.stringify(args);
    const cached = cache.get(cacheKey);

    // Cache for 60 seconds
    if (cached && Date.now() - cached.timestamp < 60000) {
      return cached.data;
    }

    // Execute query
    const data = await ctx.db
      .query("memories")
      .withIndex("by_memorySpace", (q) => q.eq("memorySpaceId", args.memorySpaceId))
      .collect();

    // Cache result
    cache.set(cacheKey, { data, timestamp: Date.now() });

    return data;
  },
});

Embedding Caching

// Cache embeddings for common queries
// Assumes an initialized OpenAI client, e.g. const openai = new OpenAI();
const embeddingCache = new Map<string, number[]>();

async function embedWithCache(text: string): Promise<number[]> {
  const normalized = text.trim().toLowerCase();

  if (embeddingCache.has(normalized)) {
    return embeddingCache.get(normalized)!;
  }

  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: text,
  });

  const vector = embedding.data[0].embedding;
  embeddingCache.set(normalized, vector);

  return vector;
}

// Common queries benefit
await cortex.memory.search("agent-1", "user preferences", {
  embedding: await embedWithCache("user preferences"), // Cached!
});

Batch Operations

Info
The SDK doesn't provide a built-in storeBatch() method. Batch operations require creating custom Convex mutations. This example shows how to implement batch operations efficiently.

Batch Insertions

// Slow: One at a time
for (const item of items) {
  await cortex.memory.store("agent-1", item);
}
// N round trips, N transactions

// Fast: Custom batch mutation
export const storeBatch = mutation({
  args: { memorySpaceId: v.string(), items: v.array(v.any()) },
  handler: async (ctx, args) => {
    const ids = [];

    // All inserts in single transaction
    for (const item of args.items) {
      const id = await ctx.db.insert("memories", {
        ...item,
        memorySpaceId: args.memorySpaceId, // spread first so args always wins
        version: 1,
        previousVersions: [],
        accessCount: 0,
        createdAt: Date.now(),
        updatedAt: Date.now(),
      });

      ids.push(id);
    }

    return ids;
  },
});

// Usage: Call via Convex client (not SDK method)
await convexClient.mutation(api.memories.storeBatch, {
  memorySpaceId: "agent-1",
  items: items,
});
// 1 round trip, 1 transaction

Impact:

  • 100 items: 5000ms → 200ms (25× faster)

Parallel Queries

// Sequential: Slow
const memories = await cortex.memory.search("agent-1", query);
const contexts = await cortex.contexts.list({ memorySpaceId: "agent-1" });
const user = await cortex.users.get("user-123");
// 3× latency

// Parallel: Fast
const [memories, contexts, user] = await Promise.all([
  cortex.memory.search("agent-1", query),
  cortex.contexts.list({ memorySpaceId: "agent-1" }),
  cortex.users.get("user-123"),
]);
// 1× latency

Storage Optimization

Version Retention

// Aggressive retention saves storage
await cortex.agents.configure("temp-agent", {
  memoryVersionRetention: 1, // Only current version
});

// 100K memories × 10 versions = 1M documents
// 100K memories × 1 version = 100K documents
// 10× storage savings!

Selective Embeddings

// Only embed important memories
const shouldEmbed = importance >= 70;

await cortex.memory.store("agent-1", {
  content: text,
  embedding: shouldEmbed ? await embed(text) : undefined,
  metadata: { importance },
});

// Saves:
// - Embedding API costs
// - Storage (24KB per embedding)
// - Search time (fewer vectors)

Content Summarization

// Store summarized content (planned feature or DIY)
const summary = await summarize(longContent); // 1000 chars -> 100 chars

await cortex.memory.store('agent-1', {
  content: summary, // ← 10× smaller
  contentType: 'summarized',
  embedding: await embed(summary), // Smaller embedding input
  conversationRef: { ... }, // Full content in ACID
  ...
});

// Saves:
// - Storage in Vector layer
// - Embedding token costs
// - Search index size

Scaling Characteristics

Horizontal Scaling (Convex)

Convex automatically scales:

  • Reads: Unlimited (cached queries, read replicas)
  • Writes: High throughput (distributed writes)
  • Storage: Unlimited (auto-sharding)

Cortex benefits:

  • No manual sharding needed
  • No capacity planning
  • Auto-scales with load

Agent Isolation

// Agents are naturally isolated by agentId
// No cross-agent queries = better performance

// Fast: Single agent
await ctx.db
.query("memories")
.withIndex("by_agentId", (q) => q.eq("agentId", agentId))
.collect();

// Slower: All agents
const allAgents = await ctx.db.query("agents").collect();
const allMemories = [];

for (const agent of allAgents) {
const memories = await ctx.db
.query("memories")
.withIndex("by_agentId", (q) => q.eq("agentId", agent.agentId))
.collect();

allMemories.push(...memories);
}
// N queries (but parallelizable)

Recommendation: Stick to single-agent queries when possible.


Benchmark Results

Info
These benchmark results are theoretical estimates based on Convex platform capabilities. Actual performance may vary based on data distribution, query complexity, network conditions, and system load. Performance targets are not guarantees.

Read Operations (1M memories)

| Operation         | Indexed      | Latency (p50) | Latency (p95) | Latency (p99) |
| ----------------- | ------------ | ------------- | ------------- | ------------- |
| get() by ID       | Yes          | 5ms           | 10ms          | 15ms          |
| search() semantic | Yes (vector) | 50ms          | 100ms         | 150ms         |
| search() keyword  | Yes (search) | 20ms          | 40ms          | 60ms          |
| list() paginated  | Yes          | 15ms          | 30ms          | 45ms          |
| count() filtered  | Yes          | 10ms          | 20ms          | 30ms          |

Write Operations

| Operation           | Latency (p50) | Latency (p95) | Throughput  |
| ------------------- | ------------- | ------------- | ----------- |
| store() single      | 20ms          | 40ms          | 50 ops/sec  |
| store() batch (100) | 150ms         | 300ms         | 667 ops/sec |
| update()            | 25ms          | 50ms          | 40 ops/sec  |
| delete()            | 15ms          | 30ms          | 66 ops/sec  |

Scaling Tests

| Dataset | Agents | Memories | Vector Search | Keyword Search |
| ------- | ------ | -------- | ------------- | -------------- |
| Small   | 10     | 10K      | 30ms          | 15ms           |
| Medium  | 100    | 1M       | 80ms          | 35ms           |
| Large   | 1K     | 10M      | 120ms         | 55ms           |
| XL      | 10K    | 100M     | 150ms         | 75ms           |

Key Insight: Performance degrades logarithmically (O(log n)), not linearly.


Optimization Checklist

Essential Optimizations

  • Use compound indexes for common query patterns
  • Add filterFields to vector indexes for faster searches
  • Paginate large result sets with cursor-based pagination
  • Limit query results with .take(n)
  • Cache frequent queries to reduce redundant database calls
  • Batch write operations to group multiple writes together
  • Use parallel queries (Promise.all) for concurrent operations
  • Use projection queries to select only the fields you need

Advanced Optimizations

  • Aggressive version retention: keep 1-5 versions to save storage
  • Selective embeddings: only embed high-importance memories
  • Content summarization: reduce storage and token costs
  • Lazy load children: load descendants on demand
  • Cursor-based pagination: efficient for large datasets
  • Cache embeddings for common queries


Cost Optimization

Storage Costs

Convex pricing: ~$0.50/GB/month

// Calculate storage per memory
const storagePerMemory =
  contentSize + // ~1KB (raw) or ~100B (summarized)
  embeddingSize + // 0KB (none), 12KB (1536-dim), 24KB (3072-dim)
  metadataSize + // ~1KB
  versionsSize; // previousVersions × memory size

// Example with 3072-dim:
// Content: 1KB
// Embedding: 24KB
// Metadata: 1KB
// 10 versions: 26KB × 10 = 260KB
// Total: ~286KB per memory!

// 100K memories × 286KB = 28.6 GB = ~$14/month
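The arithmetic above can be wrapped in a small estimator for your own numbers. Assumptions (mine, not from the SDK): ~8 bytes per embedding dimension, each retained version stored as a full copy, and decimal GB:

```typescript
// Hypothetical helper: estimate per-memory storage in KB.
function storagePerMemoryKB(opts: {
  contentKB: number;
  embeddingDims: number; // 0 = no embedding; assumes ~8 bytes/dimension
  metadataKB: number;
  retainedVersions: number; // each version stored as a full copy
}): number {
  const embeddingKB = (opts.embeddingDims * 8) / 1024;
  const currentKB = opts.contentKB + embeddingKB + opts.metadataKB;
  return currentKB + opts.retainedVersions * currentKB;
}

// The 3072-dim example above: 1 + 24 + 1 = 26KB current, + 10 versions = 286KB
const perMemory = storagePerMemoryKB({
  contentKB: 1,
  embeddingDims: 3072,
  metadataKB: 1,
  retainedVersions: 10,
});

// 100K memories at ~$0.50/GB/month (decimal GB, matching the estimate above)
const monthlyUSD = ((100_000 * perMemory) / 1e6) * 0.5; // ≈ $14/month
```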

Optimizations:

  • Use 1536-dim instead of 3072-dim (50% savings)
  • Reduce version retention (10→5 = 50% savings)
  • Summarize content (90% savings on content)
  • Selective embeddings (skip low-importance)

Embedding API Costs

OpenAI pricing:

  • text-embedding-3-large: $0.13/1M tokens
  • text-embedding-3-small: $0.02/1M tokens

// Calculate embedding costs
const avgTokensPerMemory = 100; // ~100 tokens average
const memoriesPerMonth = 10000;
const totalTokens = avgTokensPerMemory * memoriesPerMonth; // 1M tokens

// Cost comparison (1M tokens/month):
// text-embedding-3-large (3072-dim): 1M tokens × $0.13/1M = $0.13/month
// text-embedding-3-small (1536-dim): 1M tokens × $0.02/1M = $0.02/month
// ~85% savings!

Optimizations:

  • Use smaller model (3-small vs 3-large)
  • Selective embedding (importance >= 70)
  • Cache common queries
  • Batch embedding generation (fewer API calls)

Monitoring and Metrics

Query Performance Tracking

// Track query latency
export const search = query({
  args: {
    memorySpaceId: v.string(),
    agentId: v.optional(v.string()),
    embedding: v.array(v.number()),
  },
  handler: async (ctx, args) => {
    const startTime = Date.now();

    const results = await ctx.db
      .query("memories")
      .withIndex("by_embedding", (q) => {
        let query = q
          .similar("embedding", args.embedding, 10)
          .eq("memorySpaceId", args.memorySpaceId);
        if (args.agentId) {
          query = query.eq("agentId", args.agentId);
        }
        return query;
      })
      .collect();

    const latency = Date.now() - startTime;

    // Log slow queries
    if (latency > 100) {
      console.warn(`Slow search: ${latency}ms`, {
        memorySpaceId: args.memorySpaceId,
        resultCount: results.length,
      });
    }

    return results;
  },
});

Storage Monitoring

// Track storage growth
export const getStorageStats = query({
args: { memorySpaceId: v.string() },
handler: async (ctx, args) => {
const memories = await ctx.db
.query("memories")
.withIndex("by_memorySpace", (q) => q.eq("memorySpaceId", args.memorySpaceId))
.collect();

const stats = {
totalMemories: memories.length,
totalBytes: 0,
embeddingBytes: 0,
contentBytes: 0,
versionsBytes: 0,
};

for (const memory of memories) {
const contentSize = (memory.content?.length || 0) * 2; // UTF-16
const embeddingSize = (memory.embedding?.length || 0) * 8;
const versionsSize =
memory.previousVersions.length * (contentSize + embeddingSize);

stats.contentBytes += contentSize;
stats.embeddingBytes += embeddingSize;
stats.versionsBytes += versionsSize;
stats.totalBytes += contentSize + embeddingSize + versionsSize;
}

return stats;
},
});

Scaling Best Practices

1. Partition by Agent

// Agent-specific queries (fast)
const memories = await cortex.memory.search("agent-1", query);

// Cross-agent queries (slower)
const allAgents = await cortex.agents.list();
const allMemories = await Promise.all(
  allAgents.map((a) => cortex.memory.search(a.id, query)),
);

2. Limit Result Sets

// Always set reasonable limits
const results = await cortex.memory.search("agent-1", query, {
  limit: 20, // Don't return 1000s of results
});

// Avoid: loading everything
const all = await cortex.memory.list("agent-1"); // Could be huge!

3. Index Common Filters

// If you frequently query by importance
// Note: This is a custom index example - add to schema if needed
.index("by_memorySpace_importance", ["memorySpaceId", "metadata.importance"])

// Fast importance queries
await ctx.db
  .query("memories")
  .withIndex("by_memorySpace_importance", (q) =>
    q.eq("memorySpaceId", memorySpaceId).gte("metadata.importance", 80),
  )
  .collect();

4. Clean Up Old Data

// Regularly clean trivial old data
export const cleanup = mutation({
handler: async (ctx) => {
const cutoff = Date.now() - 90 * 24 * 60 * 60 * 1000; // 90 days

const oldMemories = await ctx.db
.query("memories")
.filter((q) =>
q.and(
q.lte(q.field("metadata.importance"), 30),
q.lt(q.field("createdAt"), cutoff),
q.lte(q.field("accessCount"), 1),
),
)
.collect();

for (const memory of oldMemories) {
await ctx.db.delete(memory._id);
}

return { deleted: oldMemories.length };
},
});

// Run daily via cron

Troubleshooting Slow Queries

Identify Slow Queries

// Add timing to all queries
const wrapQuery =
  (queryFn) =>
  async (...args) => {
    const start = Date.now();
    const result = await queryFn(...args);
    const duration = Date.now() - start;

    if (duration > 100) {
      console.warn("Slow query:", {
        function: queryFn.name,
        duration,
        args,
      });
    }

    return result;
  };

Common Issues

Issue: Vector search is slow

Solutions:

  • Add filterFields to vector index
  • Reduce search limit
  • Add userId filter (if applicable)
  • Check embedding dimension (smaller = faster)

Issue: Pagination is slow

Solutions:

  • Use cursor-based pagination
  • Avoid large offsets
  • Add index on sort field

Issue: Filter queries are slow

Solutions:

  • Create compound index for filter combination
  • Use .withIndex() instead of .filter()
  • Limit result set with .take()


Questions? Ask in GitHub Discussions.