
Vector Embeddings

Last Updated: 2025-10-28

Embedding strategy, dimension choices, and integration with Convex vector search.

Overview

Cortex is embedding-agnostic - bring your own embeddings or use Cloud Mode for automatic generation. We support any embedding model and any dimension size that fits your needs.

Key Principle: Cortex provides the storage and search infrastructure; you choose the embedding strategy.

┌─────────────────────────────────────────────────┐
│ Direct Mode (You Manage) │
├─────────────────────────────────────────────────┤
│ 1. Choose embedding model (OpenAI, Cohere, etc.)│
│ 2. Generate embeddings in your code │
│ 3. Store with cortex.memory.store() │
│ 4. Cortex indexes in Convex vector search │
└─────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────┐
│ Cloud Mode (Cortex Manages) │
├─────────────────────────────────────────────────┤
│ 1. Set autoEmbed: true │
│ 2. Cortex generates embeddings automatically │
│ 3. Cortex stores and indexes │
│ 4. Automatic model upgrades over time │
└─────────────────────────────────────────────────┘

Embedding Models

Supported Models (All)

Cortex supports any embedding model that produces float64 arrays:

| Provider | Model | Dimensions | Max Tokens | Quality | Cost |
| --- | --- | --- | --- | --- | --- |
| OpenAI | text-embedding-3-large | 3072 | 8191 | Excellent | $0.13/1M |
| OpenAI | text-embedding-3-small | 1536 | 8191 | Good | $0.02/1M |
| OpenAI | text-embedding-ada-002 | 1536 | 8191 | Good | $0.10/1M |
| Cohere | embed-english-v3.0 | 1024 | 512 | Excellent | $0.10/1M |
| Cohere | embed-multilingual-v3.0 | 1024 | 512 | Excellent | $0.10/1M |
| Voyage AI | voyage-2 | 1024 | 16000 | Excellent | $0.12/1M |
| Google | text-embedding-004 | 768 | 2048 | Good | Free (quota) |
| Local | sentence-transformers | 384-1024 | Varies | Good | Free |
| Local | instructor-xl | 768 | 512 | Good | Free |

Cortex is flexible:

  • ✅ Any provider
  • ✅ Any dimension (1536, 3072, 384, etc.)
  • ✅ Mix models (per agent or per deployment)
  • ✅ Upgrade models anytime

Dimension Strategy

Choosing Dimensions

Trade-offs:

| Dimensions | Storage per Vector | Search Speed | Quality | Use Case |
| --- | --- | --- | --- | --- |
| 384 | ~3KB | Fastest | Good | High-volume, cost-sensitive |
| 768 | ~6KB | Fast | Good | Balanced |
| 1024 | ~8KB | Fast | Excellent | Cohere, Voyage |
| 1536 | ~12KB | Medium | Good | OpenAI standard |
| 3072 | ~24KB | Medium | Excellent | OpenAI large, best quality |

Recommendations:

  • High Quality: 3072-dim (OpenAI text-embedding-3-large)
  • Balanced: 1536-dim (OpenAI text-embedding-3-small)
  • Cost-Optimized: 768-dim (local models)
  • High-Volume: 384-dim (sentence-transformers)
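
If you want these recommendations encoded in one place, a small preset map keeps model and dimension paired; the preset names and the map below are illustrative sketches, not part of the Cortex SDK:

// Hypothetical presets pairing each recommendation with its dimension
const EMBEDDING_PRESETS = {
  highQuality: { model: "text-embedding-3-large", dimensions: 3072 },
  balanced: { model: "text-embedding-3-small", dimensions: 1536 },
  costOptimized: { model: "Xenova/all-mpnet-base-v2", dimensions: 768 }, // local
  highVolume: { model: "Xenova/all-MiniLM-L6-v2", dimensions: 384 },     // local
} as const;

// Pick one preset and use it everywhere (model and dimension must match)
const { model, dimensions } = EMBEDDING_PRESETS.balanced;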

Setting Dimensions in Convex

// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  memories: defineTable({
    // ...
    embedding: v.optional(v.array(v.float64())),
  }).vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 3072, // ← Set your dimension here
    filterFields: ["agentId", "userId"],
  }),
});

Important: Once set, all embeddings must be this dimension. Changing requires re-embedding all documents.
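
A small runtime guard catches mismatches before they reach Convex. This is a sketch, assuming you keep the expected dimension in a shared constant alongside your schema:

// Sketch: validate embedding length against the schema's dimension
const EXPECTED_DIMENSIONS = 3072; // must match vectorIndex dimensions

function assertDimensions(embedding: number[]): number[] {
  if (embedding.length !== EXPECTED_DIMENSIONS) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, expected ${EXPECTED_DIMENSIONS}`,
    );
  }
  return embedding;
}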

Per-Agent Dimensions (Advanced)

You can use different dimensions per agent by having separate vector indexes:

// Not recommended, but possible
memories_1536: defineTable({ ... })
  .vectorIndex("by_embedding", { dimensions: 1536, ... }),

memories_3072: defineTable({ ... })
  .vectorIndex("by_embedding", { dimensions: 3072, ... }),

// Better: Use one dimension for consistency

Direct Mode Integration

Example: OpenAI Embeddings

import OpenAI from "openai";
import { Cortex } from "@cortex-platform/sdk";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const cortex = new Cortex({ convexUrl: process.env.CONVEX_URL });

// Generate embedding
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: text,
    dimensions: 3072,
  });

  return response.data[0].embedding;
}

// Store with embedding
await cortex.memory.store("agent-1", {
  content: "User prefers dark mode",
  contentType: "raw",
  embedding: await embed("User prefers dark mode"), // ← Your embedding
  source: { type: "system", timestamp: new Date() },
  metadata: { importance: 60, tags: ["preferences"] },
});

// Search with embedding
const results = await cortex.memory.search("agent-1", "user preferences", {
  embedding: await embed("user preferences"), // ← Your embedding
  limit: 10,
});

Example: Cohere Embeddings

import { CohereClient } from 'cohere-ai';
import { Cortex } from '@cortex-platform/sdk';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });
const cortex = new Cortex({ convexUrl: process.env.CONVEX_URL });

async function embed(text: string): Promise<number[]> {
  const response = await cohere.embed({
    texts: [text],
    model: 'embed-english-v3.0',
    inputType: 'search_document',
  });

  return response.embeddings[0];
}

// Use like OpenAI
await cortex.memory.store('agent-1', {
  content: 'Important information',
  embedding: await embed('Important information'),
  ...
});

Example: Local Models

import { pipeline } from '@xenova/transformers';

// Load model once
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2' // 384-dim
);

async function embed(text: string): Promise<number[]> {
  const output = await embedder(text, {
    pooling: 'mean',
    normalize: true,
  });

  return Array.from(output.data);
}

// Use with Cortex
await cortex.memory.store('agent-1', {
  content: text,
  embedding: await embed(text),
  ...
});

Cloud Mode Integration

Auto-Embeddings

const cortex = new Cortex({
  mode: "cloud",
  apiKey: process.env.CORTEX_CLOUD_KEY,
});

// Zero-config embeddings
await cortex.memory.store("agent-1", {
  content: "User prefers dark mode",
  contentType: "raw",
  autoEmbed: true, // ← Cortex Cloud generates embedding
  source: { type: "system", timestamp: new Date() },
  metadata: { importance: 60 },
});
// No API keys, no embedding code, automatic upgrades ✅

// Search (Cloud provides query embedding too)
const results = await cortex.memory.search("agent-1", "user preferences", {
  autoEmbed: true, // ← Cortex Cloud generates query embedding
  limit: 10,
});

Cloud Mode handles:

  • Embedding generation (OpenAI text-embedding-3-large)
  • Model selection and upgrades
  • Rate limiting and retries
  • Cost optimization
  • Dimension management

Embedding Storage

In Convex Documents

{
  _id: "mem_abc123",
  memorySpaceId: "agent-1",
  content: "User prefers dark mode",
  embedding: [
    0.123456,
    -0.234567,
    0.345678,
    // ... 3072 values
  ],
  // ~24KB for 3072-dim embedding
}

Storage calculation:

  • 3072 dimensions × 8 bytes (float64) = 24,576 bytes = ~24KB
  • Total memory: content (< 1KB) + embedding (24KB) + metadata (< 1KB) = ~26KB

Optional Embeddings

// Embeddings are optional - can store without
await cortex.memory.store("agent-1", {
  content: "Debug log: Agent started",
  // No embedding - keyword search only
  source: { type: "system", timestamp: new Date() },
  metadata: { importance: 10 },
});

// Strategies:
// 1. Always embed (best search, higher cost)
// 2. Selective embed (importance >= 50)
// 3. Lazy embed (store now, embed later; see the sketch below)
// 4. Never embed (keyword search only)
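
A minimal sketch of strategy 3, assuming `list()` returns stored memories and `update()` accepts an embedding (as shown later on this page); the importance cutoff is an arbitrary assumption:

// Lazy-embed backfill: embed previously stored memories after the fact
const page = await cortex.memory.list("agent-1");

for (const memory of page.memories) {
  if (!memory.embedding && memory.metadata.importance >= 50) {
    await cortex.memory.update("agent-1", memory.id, {
      embedding: await embed(memory.content),
    });
  }
}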

Search Behavior

// Query with embedding
const results = await cortex.memory.search("agent-1", "user password", {
  embedding: await embed("user password"),
  limit: 10,
});

// Under the hood: Convex vector search (runs in an action)
const matches = await ctx.vectorSearch("memories", "by_embedding", {
  vector: queryEmbedding,
  limit: 10,
  filter: (q) => q.eq("agentId", "agent-1"),
});
// matches hold _id and _score; fetch full documents afterwards

// Returns memories ranked by cosine similarity,
// even if exact keywords don't match!

// Query without embedding
const keywordResults = await cortex.memory.search("agent-1", "user password", {
  // No embedding provided
  limit: 10,
});

// Convex keyword search
await ctx.db
  .query("memories")
  .withSearchIndex("by_content", (q) =>
    q.search("content", "user password").eq("agentId", "agent-1"),
  )
  .collect();

// Returns memories with exact keyword matches;
// requires actual words to match

Hybrid Search (Best of Both)

// Cortex can combine strategies
const results = await cortex.memory.search("agent-1", "user password", {
  embedding: await embed("user password"),
  strategy: "auto", // Combines semantic + keyword
  limit: 10,
});

// Behind the scenes:
// 1. Semantic search (top 20)
// 2. Keyword search (top 20)
// 3. Merge and re-rank by combined score (see the sketch below)
// 4. Return top 10
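
If you implement the merge step yourself instead of using `strategy: "auto"`, a weighted score combination is one option; the 0.7 semantic weight below is an arbitrary assumption, not the SDK's behavior:

// Illustrative weighted merge of semantic and keyword result lists
interface Scored {
  id: string;
  score: number;
}

function mergeAndRerank(
  semantic: Scored[],
  keyword: Scored[],
  semanticWeight = 0.7,
  limit = 10,
): Scored[] {
  const combined = new Map<string, number>();

  for (const r of semantic) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + semanticWeight * r.score);
  }
  for (const r of keyword) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + (1 - semanticWeight) * r.score);
  }

  // Sort by combined score and keep the top results
  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}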

Embedding Best Practices

1. Consistent Dimensions

// ✅ Good: Same model throughout
const EMBEDDING_MODEL = "text-embedding-3-large";
const DIMENSIONS = 3072;

const cortex = new Cortex({
  convexUrl: process.env.CONVEX_URL,
  embeddingDimensions: DIMENSIONS, // Enforced
});

async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: text,
    dimensions: DIMENSIONS,
  });

  return response.data[0].embedding;
}

// ❌ Bad: Mixed dimensions
await cortex.memory.store("agent-1", {
  embedding: await embed1536(text), // 1536-dim
});

await cortex.memory.store("agent-1", {
  embedding: await embed3072(text), // 3072-dim ❌ Error!
});

2. Normalize Embeddings

// Some models return normalized, some don't
function normalizeEmbedding(embedding: number[]): number[] {
  const magnitude = Math.sqrt(
    embedding.reduce((sum, val) => sum + val * val, 0)
  );

  return embedding.map(val => val / magnitude);
}

// Use with local models
const raw = await localEmbed(text);
const normalized = normalizeEmbedding(raw);

await cortex.memory.store('agent-1', {
  embedding: normalized, // ← Normalized for better similarity
  ...
});

3. Batch Embedding Generation

// ✅ Efficient: Batch embeddings
async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-large',
    input: texts, // ← Multiple texts at once
  });

  return response.data.map(d => d.embedding);
}

// Store many memories efficiently
const contents = ['Memory 1', 'Memory 2', 'Memory 3'];
const embeddings = await embedBatch(contents);

for (let i = 0; i < contents.length; i++) {
  await cortex.memory.store('agent-1', {
    content: contents[i],
    embedding: embeddings[i],
    ...
  });
}

4. Cache Embeddings

// Cache frequently-used embeddings
const embeddingCache = new Map<string, number[]>();

async function embedWithCache(text: string): Promise<number[]> {
  if (embeddingCache.has(text)) {
    return embeddingCache.get(text)!;
  }

  const embedding = await embed(text);
  embeddingCache.set(text, embedding);

  return embedding;
}

// Useful for common queries
const userPrefsEmbedding = await embedWithCache("user preferences");

5. Handle Embedding Errors

async function safeEmbed(text: string): Promise<number[] | undefined> {
  try {
    return await embed(text);
  } catch (error) {
    console.error('Embedding failed:', error);

    // Store without embedding (keyword search still works)
    return undefined;
  }
}

// Store with fallback
await cortex.memory.store('agent-1', {
  content: text,
  embedding: await safeEmbed(text), // undefined if fails
  ...
});

Dimension Considerations

Storage Impact

// Embedding storage per dimension
const BYTES_PER_DIMENSION = 8; // float64

function calculateEmbeddingSize(dimensions: number): number {
  return dimensions * BYTES_PER_DIMENSION;
}

console.log("384-dim:", calculateEmbeddingSize(384), "bytes");   // 3072 bytes (~3KB)
console.log("768-dim:", calculateEmbeddingSize(768), "bytes");   // 6144 bytes (~6KB)
console.log("1536-dim:", calculateEmbeddingSize(1536), "bytes"); // 12288 bytes (~12KB)
console.log("3072-dim:", calculateEmbeddingSize(3072), "bytes"); // 24576 bytes (~24KB)

// For 100K memories:
// 384-dim: 100K × 3KB = 300MB
// 3072-dim: 100K × 24KB = 2.4GB
// 8× storage difference!

Quality vs Cost

// High quality (expensive)
{
  model: 'text-embedding-3-large',
  dimensions: 3072,
  cost: '$0.13/1M tokens',
  storage: '24KB per embedding',
  useCase: 'Critical information, high accuracy needed',
}

// Balanced (recommended)
{
  model: 'text-embedding-3-small',
  dimensions: 1536,
  cost: '$0.02/1M tokens',
  storage: '12KB per embedding',
  useCase: 'Most applications',
}

// Cost-optimized (free)
{
  model: 'sentence-transformers/all-MiniLM-L6-v2',
  dimensions: 384,
  cost: 'Free (local)',
  storage: '3KB per embedding',
  useCase: 'High-volume, budget-constrained',
}

Migration Strategy

// Migrating from 1536 to 3072 dimensions

// 1. Create new index (different dimension)
memories_3072: defineTable({ ... })
  .vectorIndex("by_embedding", { dimensions: 3072, ... });

// 2. Re-embed all memories
// (Embedding calls an external API, so in Convex this loop belongs in an action)
const old = await ctx.db.query("memories").collect();

for (const memory of old) {
  // Generate new embedding
  const newEmbedding = await embed3072(memory.content);

  // Store in new table
  await ctx.db.insert("memories_3072", {
    ...memory,
    embedding: newEmbedding,
  });
}

// 3. Switch SDK to new table
// 4. Delete old table when ready

Convex Vector Index Configuration

Basic Configuration

.vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 3072,
})
.vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 3072,
filterFields: ["agentId", "userId"], // ← Pre-filter before similarity
})

// Query with pre-filtering (fast!)
await ctx.db
.query("memories")
.withIndex("by_embedding", (q) =>
q.similar("embedding", queryVector, 10)
.eq("agentId", agentId) // ← Filtered BEFORE similarity search
.eq("userId", userId) // ← Much faster for large datasets
)
.collect();

Why filterFields matter:

  • Searches only relevant subset
  • Faster (fewer vectors to compare)
  • Better isolation
  • Lower latency

Multiple Vector Indexes (Advanced)

// Different indexes for different use cases
memories: defineTable({ ... })
  .vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 3072,
    filterFields: ["agentId"], // General search
  })
  .vectorIndex("by_embedding_user", {
    vectorField: "embedding",
    dimensions: 3072,
    filterFields: ["agentId", "userId"], // User-specific search
  });

// Use the appropriate index:
// User-specific: by_embedding_user (faster)
// All memories: by_embedding

Embedding-Optional Architecture

Progressive Enhancement

Cortex works with or without embeddings:

// Level 1: No embeddings (keyword search only)
await cortex.memory.store("agent-1", {
  content: "User email is alex@example.com",
  // No embedding
  source: { type: "system", timestamp: new Date() },
  metadata: { importance: 60 },
});

// Can still search with keywords
const results = await cortex.memory.search("agent-1", "email", {
  // No embedding provided - uses keyword search
  limit: 10,
});

// Level 2: Add embeddings later
const memory = await cortex.memory.get("agent-1", "mem-123");
await cortex.memory.update("agent-1", "mem-123", {
  embedding: await embed(memory.content), // ← Add embedding
});

// Now semantic search works too!

Selective Embedding

// Embed only important memories
async function storeWithSelectiveEmbedding(
  memorySpaceId: string,
  data: MemoryInput,
) {
  const shouldEmbed = data.metadata.importance >= 70;

  await cortex.memory.store(memorySpaceId, {
    ...data,
    embedding: shouldEmbed ? await embed(data.content) : undefined,
  });
}

// Saves embedding costs for low-importance memories

Query Embedding Strategy

Query Expansion

// Generate multiple query variations
async function expandedSearch(memorySpaceId: string, query: string) {
  const variations = [
    query,
    `user ${query}`,
    `agent ${query}`,
    `${query} information`,
  ];

  const embeddings = await Promise.all(variations.map((v) => embed(v)));

  // Search with each variation
  const allResults = await Promise.all(
    embeddings.map((emb) =>
      cortex.memory.search(memorySpaceId, query, {
        embedding: emb,
        limit: 10,
      }),
    ),
  );

  // Merge and deduplicate
  const seen = new Set<string>();
  const unique = [];

  for (const results of allResults) {
    for (const result of results) {
      if (!seen.has(result.id)) {
        seen.add(result.id);
        unique.push(result);
      }
    }
  }

  return unique;
}

Query Rewriting

// Use an LLM to improve the query
async function semanticQuery(memorySpaceId: string, userQuery: string) {
  // Rewrite for better search
  const improved = await llm.complete({
    prompt: `Rewrite this query for semantic search: "${userQuery}"`,
    maxTokens: 50,
  });

  // Search with improved query
  return await cortex.memory.search(memorySpaceId, improved, {
    embedding: await embed(improved),
    limit: 10,
  });
}

Performance Considerations

Embedding Generation

Latency:

  • OpenAI API: 50-200ms
  • Cohere API: 50-150ms
  • Local models: 10-50ms

Throughput:

  • OpenAI: 3000 RPM (batching helps; see the sketch below)
  • Cohere: 10,000 RPM
  • Local: Unlimited (CPU-bound)

Cost:

  • OpenAI 3072-dim: $0.13/1M tokens (~$0.0001 per embedding)
  • OpenAI 1536-dim: $0.02/1M tokens (~$0.00002 per embedding)
  • Local: Free (hardware cost)
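
When throughput limits bite, batching plus simple exponential backoff is usually enough. A sketch using the OpenAI client from the examples above; the retry counts and delays are arbitrary assumptions:

// Sketch: batched embedding with retry on failure
async function embedBatchWithRetry(
  texts: string[],
  maxRetries = 3,
): Promise<number[][]> {
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await openai.embeddings.create({
        model: "text-embedding-3-large",
        input: texts, // one request for many texts stays under RPM limits
      });
      return response.data.map((d) => d.embedding);
    } catch (error) {
      // Retries on any error for brevity; in practice check for 429s
      if (attempt >= maxRetries) throw error;
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
    }
  }
}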

Vector Search Performance

Convex vector search:

  • < 100ms for millions of vectors
  • Pre-filtering with filterFields (much faster)
  • Cosine similarity (optimized)

// With filterFields (fast)
.vectorIndex("by_embedding", {
  vectorField: "embedding",
  dimensions: 3072,
  filterFields: ["agentId"], // ← Searches only this agent's vectors
})

// 1M total vectors, 1K per agent
// Without filtering: searches 1M vectors (~200ms)
// With filtering: searches 1K vectors (~10ms)
// 20× faster!

Embedding Strategies by Use Case

Chatbot (User-Agent)

// Store user message with embedding
await cortex.memory.remember({
  memorySpaceId: "chatbot",
  conversationId: "conv-123",
  userMessage: req.body.message,
  agentResponse: response,
  userId: req.user.id,
  userName: req.user.name,
  generateEmbedding: embed, // Provide embedding function
});

// Semantic search for context
const context = await cortex.memory.search("chatbot", req.body.message, {
  embedding: await embed(req.body.message),
  userId: req.user.id,
  limit: 5,
});

Recommendation: 1536-dim (balanced quality/cost)

Knowledge Base

// Store KB articles
await cortex.immutable.store({
  type: "kb-article",
  id: "refund-policy",
  data: {
    title: "Refund Policy",
    content: "Detailed refund policy...",
  },
});

// Index for search (with embedding)
await cortex.vector.store("kb-agent", {
  content: "Refund Policy: Detailed refund policy...",
  embedding: await embed("refund policy content"),
  immutableRef: { type: "kb-article", id: "refund-policy" },
  metadata: { importance: 90, tags: ["policy"] },
});

// All agents can search
const results = await cortex.memory.search("any-agent", "refund policy", {
  embedding: await embed("refund policy"),
});

Recommendation: 3072-dim (best quality for reused content)

High-Volume Logs

// System logs without embeddings
await cortex.vector.store("agent-1", {
  content: "System started at 10:00 AM",
  // No embedding - saves cost
  source: { type: "system", timestamp: new Date() },
  metadata: { importance: 10, tags: ["debug"] },
});

// Keyword search works fine
const logs = await cortex.memory.search("agent-1", "started", {
  // No embedding - keyword search
  tags: ["debug"],
  limit: 100,
});

Recommendation: No embeddings (keyword search sufficient)


Cloud Mode Embedding Pipeline

How Auto-Embed Works

graph LR
  A[SDK: autoEmbed=true] --> B[Cortex Cloud API]
  B --> C[Generate Embedding]
  C --> D[OpenAI API]
  D --> C
  C --> E[Store in Convex]
  E --> F[Your Convex Instance]

Process:

  1. Developer calls cortex.memory.store({ autoEmbed: true })
  2. Cortex Cloud receives request
  3. Generates embedding (OpenAI text-embedding-3-large, 3072-dim)
  4. Stores in your Convex instance with embedding
  5. Returns memory ID to developer

Benefits:

  • No API key management
  • Automatic retries and error handling
  • Model upgrades (when better models release)
  • Usage-based pricing ($0.02/1K tokens)

Dimension Migration

Changing Dimensions

If you need to change embedding dimensions:

// 1. Update schema
memories: defineTable({ ... })
  .vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 3072, // Was 1536
  });

// 2. Mark memories for re-embedding
export const reEmbedAll = mutation({
  handler: async (ctx) => {
    const memories = await ctx.db.query("memories").collect();

    for (const memory of memories) {
      if (memory.embedding) {
        // Mark for re-embedding
        await ctx.db.patch(memory._id, {
          metadata: {
            ...memory.metadata,
            needsReEmbedding: true,
          },
        });
      }
    }
  },
});

// 3. Re-embed action (external API calls must run in an action)
export const reEmbedMemory = action({
  args: { memoryId: v.id("memories") },
  handler: async (ctx, args) => {
    const memory = await ctx.runQuery(api.memories.get, {
      memoryId: args.memoryId,
    });

    // Generate new embedding
    const newEmbedding = await embed3072(memory.content);

    // Update
    await ctx.runMutation(api.memories.patch, {
      memoryId: args.memoryId,
      embedding: newEmbedding,
      metadata: {
        ...memory.metadata,
        needsReEmbedding: false,
      },
    });
  },
});

// 4. Run for all memories
const memories = await cortex.memory.list('agent-1');
for (const memory of memories.memories) {
  if (memory.metadata.needsReEmbedding) {
    await convexClient.action(api.memories.reEmbedMemory, {
      memoryId: memory.id,
    });
  }
}


Questions? Ask in GitHub Discussions or Discord.