Vector Embeddings
Last Updated: 2025-10-28
Embedding strategy, dimension choices, and integration with Convex vector search.
Overview
Cortex is embedding-agnostic - bring your own embeddings or use Cloud Mode for automatic generation. We support any embedding model and any dimension size that fits your needs.
Key Principle: Cortex provides the storage and search infrastructure; you choose the embedding strategy.
┌─────────────────────────────────────────────────┐
│ Direct Mode (You Manage) │
├─────────────────────────────────────────────────┤
│ 1. Choose embedding model (OpenAI, Cohere, etc.)│
│ 2. Generate embeddings in your code │
│ 3. Store with cortex.memory.store() │
│ 4. Cortex indexes in Convex vector search │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Cloud Mode (Cortex Manages) │
├─────────────────────────────────────────────────┤
│ 1. Set autoEmbed: true │
│ 2. Cortex generates embeddings automatically │
│ 3. Cortex stores and indexes │
│ 4. Automatic model upgrades over time │
└─────────────────────────────────────────────────┘
Embedding Models
Supported Models (All)
Cortex supports any embedding model that produces float64 arrays:
| Provider | Model | Dimensions | Max Tokens | Quality | Cost |
|---|---|---|---|---|---|
| OpenAI | text-embedding-3-large | 3072 | 8191 | Excellent | $0.13/1M |
| OpenAI | text-embedding-3-small | 1536 | 8191 | Good | $0.02/1M |
| OpenAI | text-embedding-ada-002 | 1536 | 8191 | Good | $0.10/1M |
| Cohere | embed-english-v3.0 | 1024 | 512 | Excellent | $0.10/1M |
| Cohere | embed-multilingual-v3.0 | 1024 | 512 | Excellent | $0.10/1M |
| Voyage AI | voyage-2 | 1024 | 16000 | Excellent | $0.12/1M |
| Google | text-embedding-004 | 768 | 2048 | Good | Free (quota) |
| Local | sentence-transformers | 384-1024 | Varies | Good | Free |
| Local | instructor-xl | 768 | 512 | Good | Free |
Cortex is flexible:
- ✅ Any provider
- ✅ Any dimension (1536, 3072, 384, etc.)
- ✅ Mix models (per agent or per deployment)
- ✅ Upgrade models anytime
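Because Cortex only ever sees a number[], provider flexibility comes down to hiding the model behind one small interface. A minimal sketch (the Embedder interface and class names here are illustrative, not part of the SDK):
import OpenAI from "openai";
// Hypothetical provider-agnostic wrapper - swap providers
// without touching any storage or search code
interface Embedder {
  readonly dimensions: number;
  embed(text: string): Promise<number[]>;
}
class OpenAIEmbedder implements Embedder {
  readonly dimensions = 3072;
  private client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  async embed(text: string): Promise<number[]> {
    const response = await this.client.embeddings.create({
      model: "text-embedding-3-large",
      input: text,
      dimensions: this.dimensions,
    });
    return response.data[0].embedding;
  }
}
// The only provider-specific binding in the codebase
const embedder: Embedder = new OpenAIEmbedder();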
Dimension Strategy
Choosing Dimensions
Trade-offs:
| Dimensions | Storage per Vector | Search Speed | Quality | Use Case |
|---|---|---|---|---|
| 384 | ~3KB | Fastest | Good | High-volume, cost-sensitive |
| 768 | ~6KB | Fast | Good | Balanced |
| 1024 | ~8KB | Fast | Excellent | Cohere, Voyage |
| 1536 | ~12KB | Medium | Good | OpenAI standard |
| 3072 | ~24KB | Medium | Excellent | OpenAI large, best quality |
Recommendations:
- High Quality: 3072-dim (OpenAI text-embedding-3-large)
- Balanced: 1536-dim (OpenAI text-embedding-3-small)
- Cost-Optimized: 768-dim (local models)
- High-Volume: 384-dim (sentence-transformers)
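One way to keep these choices consistent across a codebase is a single profile constant that pairs model and dimension, so they always travel together (names illustrative):
// Illustrative profiles mirroring the recommendations above
const EMBEDDING_PROFILES = {
  highQuality: { model: "text-embedding-3-large", dimensions: 3072 },
  balanced: { model: "text-embedding-3-small", dimensions: 1536 },
  costOptimized: { model: "instructor-xl", dimensions: 768 },
  highVolume: { model: "Xenova/all-MiniLM-L6-v2", dimensions: 384 },
} as const;
// Derive both the embedding call and the Convex schema from one profile
const ACTIVE_PROFILE = EMBEDDING_PROFILES.balanced;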
Setting Dimensions in Convex
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";
export default defineSchema({
memories: defineTable({
// ...
embedding: v.optional(v.array(v.float64())),
}).vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 3072, // ← Set your dimension here
filterFields: ["agentId", "userId"],
}),
});
Important: Once set, all embeddings must be this dimension. Changing requires re-embedding all documents.
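Because every stored vector must match the index dimension, it's worth failing fast in your own code before the write. A small guard, assuming a DIMENSIONS constant that mirrors the schema:
const DIMENSIONS = 3072; // Must match the vectorIndex definition
function assertDimensions(embedding: number[]): number[] {
  if (embedding.length !== DIMENSIONS) {
    throw new Error(
      `Got ${embedding.length}-dim embedding, expected ${DIMENSIONS}. ` +
        `Did the model or its dimensions parameter change?`,
    );
  }
  return embedding;
}
// Use at every store call:
// embedding: assertDimensions(await embed(text)),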
Per-Agent Dimensions (Advanced)
You can use different dimensions per agent by giving each dimension its own table, each with its own vector index:
// Not recommended, but possible
memories_1536: defineTable({ ... })
.vectorIndex("by_embedding", { dimensions: 1536, ... }),
memories_3072: defineTable({ ... })
.vectorIndex("by_embedding", { dimensions: 3072, ... }),
// Better: Use one dimension for consistency
Direct Mode Integration
Example: OpenAI Embeddings
import OpenAI from "openai";
import { Cortex } from "@cortex-platform/sdk";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const cortex = new Cortex({ convexUrl: process.env.CONVEX_URL });
// Generate embedding
async function embed(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: "text-embedding-3-large",
input: text,
dimensions: 3072,
});
return response.data[0].embedding;
}
// Store with embedding
await cortex.memory.store("agent-1", {
content: "User prefers dark mode",
contentType: "raw",
embedding: await embed("User prefers dark mode"), // ← Your embedding
source: { type: "system", timestamp: new Date() },
metadata: { importance: 60, tags: ["preferences"] },
});
// Search with embedding
const results = await cortex.memory.search("agent-1", "user preferences", {
embedding: await embed("user preferences"), // ← Your embedding
limit: 10,
});
Example: Cohere Embeddings
import { CohereClient } from 'cohere-ai';
import { Cortex } from '@cortex-platform/sdk';
const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });
const cortex = new Cortex({ convexUrl: process.env.CONVEX_URL });
async function embed(text: string): Promise<number[]> {
const response = await cohere.embed({
texts: [text],
model: 'embed-english-v3.0',
inputType: 'search_document',
});
return response.embeddings[0];
}
// Use like OpenAI
await cortex.memory.store('agent-1', {
content: 'Important information',
embedding: await embed('Important information'),
...
});
Example: Local Models
import { pipeline } from '@xenova/transformers';
// Load model once
const embedder = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2' // 384-dim
);
async function embed(text: string): Promise<number[]> {
const output = await embedder(text, {
pooling: 'mean',
normalize: true,
});
return Array.from(output.data);
}
// Use with Cortex
await cortex.memory.store('agent-1', {
content: text,
embedding: await embed(text),
...
});
Cloud Mode Integration
Auto-Embeddings
const cortex = new Cortex({
mode: "cloud",
apiKey: process.env.CORTEX_CLOUD_KEY,
});
// Zero-config embeddings
await cortex.memory.store("agent-1", {
content: "User prefers dark mode",
contentType: "raw",
autoEmbed: true, // ← Cortex Cloud generates embedding
source: { type: "system", timestamp: new Date() },
metadata: { importance: 60 },
});
// No API keys, no embedding code, automatic upgrades ✅
// Search (Cloud provides query embedding too)
const results = await cortex.memory.search("agent-1", "user preferences", {
autoEmbed: true, // ← Cortex Cloud generates query embedding
limit: 10,
});
Cloud Mode handles:
- Embedding generation (OpenAI text-embedding-3-large)
- Model selection and upgrades
- Rate limiting and retries
- Cost optimization
- Dimension management
Embedding Storage
In Convex Documents
{
_id: "mem_abc123",
memorySpaceId: "agent-1",
content: "User prefers dark mode",
embedding: [
0.123456,
-0.234567,
0.345678,
// ... 3072 values
],
// ~24KB for 3072-dim embedding
}
Storage calculation:
- 3072 dimensions × 8 bytes (float64) = 24,576 bytes = ~24KB
- Total memory: content (< 1KB) + embedding (24KB) + metadata (< 1KB) = ~26KB
Optional Embeddings
// Embeddings are optional - can store without
await cortex.memory.store("agent-1", {
content: "Debug log: Agent started",
// No embedding - keyword search only
source: { type: "system", timestamp: new Date() },
metadata: { importance: 10 },
});
// Strategies:
// 1. Always embed (best search, higher cost)
// 2. Selective embed (importance >= 50)
// 3. Lazy embed (store now, embed later)
// 4. Never embed (keyword search only)
Search Behavior
With Embeddings (Semantic Search)
// Query with embedding
const results = await cortex.memory.search("agent-1", "user password", {
embedding: await embed("user password"),
limit: 10,
});
// Convex vector search (runs inside an action)
const matches = await ctx.vectorSearch("memories", "by_embedding", {
  vector: queryEmbedding,
  limit: 10,
  filter: (q) => q.eq("agentId", "agent-1"),
});
// matches are { _id, _score } pairs - load full documents via a query
// Returns memories ranked by cosine similarity
// Even if exact keywords don't match!
Without Embeddings (Keyword Search)
// Query without embedding
const results = await cortex.memory.search("agent-1", "user password", {
// No embedding provided
limit: 10,
});
// Convex keyword search
await ctx.db
.query("memories")
.withSearchIndex("by_content", (q) =>
q.search("content", "user password").eq("agentId", "agent-1"),
)
.collect();
// Returns memories with exact keyword matches
// Requires actual words to match
Hybrid Search (Best of Both)
// Cortex can combine strategies
const results = await cortex.memory.search("agent-1", "user password", {
embedding: await embed("user password"),
strategy: "auto", // Combines semantic + keyword
limit: 10,
});
// Behind the scenes:
// 1. Semantic search (top 20)
// 2. Keyword search (top 20)
// 3. Merge and re-rank by combined score
// 4. Return top 10
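If you implement the merge yourself, reciprocal rank fusion (RRF) is a simple, robust way to combine the two ranked lists. A sketch of one reasonable merge (not necessarily the SDK's internal scoring):
// Merge semantic + keyword results with reciprocal rank fusion
function mergeResults(
  semantic: { id: string }[],
  keyword: { id: string }[],
  limit = 10,
): { id: string; score: number }[] {
  const k = 60; // Standard RRF damping constant
  const scores = new Map<string, number>();
  for (const [rank, r] of semantic.entries()) {
    scores.set(r.id, (scores.get(r.id) ?? 0) + 1 / (k + rank + 1));
  }
  for (const [rank, r] of keyword.entries()) {
    scores.set(r.id, (scores.get(r.id) ?? 0) + 1 / (k + rank + 1));
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}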
Embedding Best Practices
1. Consistent Dimensions
// ✅ Good: Same model throughout
const EMBEDDING_MODEL = "text-embedding-3-large";
const DIMENSIONS = 3072;
const cortex = new Cortex({
convexUrl: process.env.CONVEX_URL,
embeddingDimensions: DIMENSIONS, // Enforced
});
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: text,
    dimensions: DIMENSIONS,
  });
  return response.data[0].embedding; // Return the vector, not the whole response
}
// ❌ Bad: Mixed dimensions
await cortex.memory.store("agent-1", {
embedding: await embed1536(text), // 1536-dim
});
await cortex.memory.store("agent-1", {
embedding: await embed3072(text), // 3072-dim ❌ Error!
});
2. Normalize Embeddings
// Some models return normalized, some don't
function normalizeEmbedding(embedding: number[]): number[] {
const magnitude = Math.sqrt(
embedding.reduce((sum, val) => sum + val * val, 0)
);
return embedding.map(val => val / magnitude);
}
// Use with local models
const raw = await localEmbed(text);
const normalized = normalizeEmbedding(raw);
await cortex.memory.store('agent-1', {
embedding: normalized, // ← Normalized for better similarity
...
});
3. Batch Embedding Generation
// ✅ Efficient: Batch embeddings
async function embedBatch(texts: string[]): Promise<number[][]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-large',
input: texts, // ← Multiple texts at once
});
return response.data.map(d => d.embedding);
}
// Store many memories efficiently
const contents = ['Memory 1', 'Memory 2', 'Memory 3'];
const embeddings = await embedBatch(contents);
for (let i = 0; i < contents.length; i++) {
await cortex.memory.store('agent-1', {
content: contents[i],
embedding: embeddings[i],
...
});
}
4. Cache Embeddings
// Cache frequently-used embeddings
const embeddingCache = new Map<string, number[]>();
async function embedWithCache(text: string): Promise<number[]> {
if (embeddingCache.has(text)) {
return embeddingCache.get(text)!;
}
const embedding = await embed(text);
embeddingCache.set(text, embedding);
return embedding;
}
// Useful for common queries
const userPrefsEmbedding = await embedWithCache("user preferences");
5. Handle Embedding Errors
async function safeEmbed(text: string): Promise<number[] | undefined> {
try {
return await embed(text);
} catch (error) {
console.error('Embedding failed:', error);
// Store without embedding (keyword search still works)
return undefined;
}
}
// Store with fallback
await cortex.memory.store('agent-1', {
content: text,
embedding: await safeEmbed(text), // undefined if fails
...
});
Dimension Considerations
Storage Impact
// Embedding storage per dimension
const BYTES_PER_DIMENSION = 8; // float64
function calculateEmbeddingSize(dimensions: number): number {
return dimensions * BYTES_PER_DIMENSION;
}
console.log("384-dim:", calculateEmbeddingSize(384), "bytes"); // 3KB
console.log("768-dim:", calculateEmbeddingSize(768), "bytes"); // 6KB
console.log("1536-dim:", calculateEmbeddingSize(1536), "bytes"); // 12KB
console.log("3072-dim:", calculateEmbeddingSize(3072), "bytes"); // 24KB
// For 100K memories:
// 384-dim: 100K × 3KB = 300MB
// 3072-dim: 100K × 24KB = 2.4GB
// 8× storage difference!
Quality vs Cost
// High quality (expensive)
{
model: 'text-embedding-3-large',
dimensions: 3072,
cost: '$0.13/1M tokens',
storage: '24KB per embedding',
useCase: 'Critical information, high accuracy needed',
}
// Balanced (recommended)
{
model: 'text-embedding-3-small',
dimensions: 1536,
cost: '$0.02/1M tokens',
storage: '12KB per embedding',
useCase: 'Most applications',
}
// Cost-optimized (free)
{
model: 'sentence-transformers/all-MiniLM-L6-v2',
dimensions: 384,
cost: 'Free (local)',
storage: '3KB per embedding',
useCase: 'High-volume, budget-constrained',
}
Migration Strategy
// Migrating from 1536 to 3072 dimensions
// 1. Create new index (different dimension)
memories_3072: defineTable({ ... })
.vectorIndex("by_embedding", { dimensions: 3072, ... });
// 2. Re-embed all memories (embedding calls need an action;
//    see Dimension Migration below for the full pattern)
const old = await ctx.db.query("memories").collect();
for (const memory of old) {
// Generate new embedding
const newEmbedding = await embed3072(memory.content);
// Store in new table
await ctx.db.insert("memories_3072", {
...memory,
embedding: newEmbedding,
});
}
// 3. Switch SDK to new table
// 4. Delete old table when ready
Convex Vector Index Configuration
Basic Configuration
.vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 3072,
})
With Filter Fields (Recommended)
.vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 3072,
filterFields: ["agentId", "userId"], // ← Pre-filter before similarity
})
// Query with pre-filtering (fast!) - runs inside an action
const matches = await ctx.vectorSearch("memories", "by_embedding", {
  vector: queryVector,
  limit: 10,
  filter: (q) => q.eq("agentId", agentId), // ← Filtered BEFORE similarity search
  // userId is also a filterField and can be filtered the same way
});
Why filterFields matter:
- Searches only relevant subset
- Faster (fewer vectors to compare)
- Better isolation
- Lower latency
Multiple Vector Indexes (Advanced)
// Different indexes for different use cases
memories: defineTable({ ... })
.vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 3072,
filterFields: ["agentId"], // General search
})
.vectorIndex("by_embedding_user", {
vectorField: "embedding",
dimensions: 3072,
filterFields: ["agentId", "userId"], // User-specific search
});
// Use appropriate index
// User-specific: use by_embedding_user (faster)
// All memories: use by_embedding
Embedding-Optional Architecture
Progressive Enhancement
Cortex works with or without embeddings:
// Level 1: No embeddings (keyword search only)
await cortex.memory.store("agent-1", {
content: "User email is alex@example.com",
// No embedding
source: { type: "system", timestamp: new Date() },
metadata: { importance: 60 },
});
// Can still search with keywords
const results = await cortex.memory.search("agent-1", "email", {
// No embedding provided - uses keyword search
limit: 10,
});
// Level 2: Add embeddings later
const memory = await cortex.memory.get("agent-1", "mem-123");
await cortex.memory.update("agent-1", "mem-123", {
embedding: await embed(memory.content), // ← Add embedding
});
// Now semantic search works too!
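To upgrade an existing memory space in bulk, the same update call can backfill every memory that was stored without a vector (a sketch, assuming list results expose the stored embedding field):
// Backfill embeddings for memories stored without one
async function backfillEmbeddings(memorySpaceId: string) {
  const { memories } = await cortex.memory.list(memorySpaceId);
  for (const memory of memories) {
    if (!memory.embedding) {
      await cortex.memory.update(memorySpaceId, memory.id, {
        embedding: await embed(memory.content),
      });
    }
  }
}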
Selective Embedding
// Embed only important memories
async function storeWithSelectiveEmbedding(
memorySpaceId: string,
data: MemoryInput,
) {
const shouldEmbed = data.metadata.importance >= 70;
await cortex.memory.store(memorySpaceId, {
...data,
embedding: shouldEmbed ? await embed(data.content) : undefined,
});
}
// Saves embedding costs for low-importance memories
Query Embedding Strategy
Query Expansion
// Generate multiple query variations
async function expandedSearch(memorySpaceId: string, query: string) {
const variations = [
query,
`user ${query}`,
`agent ${query}`,
`${query} information`,
];
const embeddings = await Promise.all(variations.map((v) => embed(v)));
// Search with each variation
const allResults = await Promise.all(
embeddings.map((emb) =>
cortex.memory.search(memorySpaceId, query, {
embedding: emb,
limit: 10,
}),
),
);
// Merge and deduplicate
const seen = new Set();
const unique = [];
for (const results of allResults) {
for (const result of results) {
if (!seen.has(result.id)) {
seen.add(result.id);
unique.push(result);
}
}
}
return unique;
}
Query Rewriting
// Use LLM to improve query
async function semanticQuery(memorySpaceId: string, userQuery: string) {
// Rewrite for better search
const improved = await llm.complete({
prompt: `Rewrite this query for semantic search: "${userQuery}"`,
maxTokens: 50,
});
// Search with improved query
return await cortex.memory.search(memorySpaceId, improved, {
embedding: await embed(improved),
limit: 10,
});
}
Performance Considerations
Embedding Generation
Latency:
- OpenAI API: 50-200ms
- Cohere API: 50-150ms
- Local models: 10-50ms
Throughput:
- OpenAI: 3000 RPM (batching helps)
- Cohere: 10,000 RPM
- Local: Unlimited (CPU-bound)
Cost:
- OpenAI 3072-dim: $0.13/1M tokens (~$0.0001 per embedding)
- OpenAI 1536-dim: $0.02/1M tokens (~$0.00002 per embedding)
- Local: Free (hardware cost)
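At these rate limits, transient 429s are normal under load, so wrap generation in retries with exponential backoff. A generic sketch (not SDK functionality):
// Retry with exponential backoff for transient embedding failures
async function embedWithRetry(text: string, maxAttempts = 5): Promise<number[]> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await embed(text);
    } catch (error) {
      if (attempt === maxAttempts - 1) throw error;
      const delayMs = 500 * 2 ** attempt; // 500ms, 1s, 2s, 4s...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("unreachable");
}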
Vector Search Performance
Convex vector search:
- < 100ms for millions of vectors
- Pre-filtering with filterFields (much faster)
- Cosine similarity (optimized)
// With filterFields (fast)
.vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 3072,
filterFields: ["agentId"], // ← Searches only agent's vectors
})
// 1M total vectors, 1K per agent
// Without filtering: Searches 1M vectors (~200ms)
// With filtering: Searches 1K vectors (~10ms)
// 20× faster!
Embedding Strategies by Use Case
Chatbot (User-Agent)
// Store user message with embedding
await cortex.memory.remember({
memorySpaceId: "chatbot",
conversationId: "conv-123",
userMessage: req.body.message,
agentResponse: response,
userId: req.user.id,
userName: req.user.name,
generateEmbedding: embed, // Provide embedding function
});
// Semantic search for context
const context = await cortex.memory.search("chatbot", req.body.message, {
embedding: await embed(req.body.message),
userId: req.user.id,
limit: 5,
});
Recommendation: 1536-dim (balanced quality/cost)
Knowledge Base
// Store KB articles
await cortex.immutable.store({
type: "kb-article",
id: "refund-policy",
data: {
title: "Refund Policy",
content: "Detailed refund policy...",
},
});
// Index for search (with embedding)
await cortex.vector.store("kb-agent", {
content: "Refund Policy: Detailed refund policy...",
embedding: await embed("refund policy content"),
immutableRef: { type: "kb-article", id: "refund-policy" },
metadata: { importance: 90, tags: ["policy"] },
});
// All agents can search
const results = await cortex.memory.search("any-agent", "refund policy", {
embedding: await embed("refund policy"),
});
Recommendation: 3072-dim (best quality for reused content)
High-Volume Logs
// System logs without embeddings
await cortex.vector.store("agent-1", {
content: "System started at 10:00 AM",
// No embedding - saves cost
source: { type: "system", timestamp: new Date() },
metadata: { importance: 10, tags: ["debug"] },
});
// Keyword search works fine
const logs = await cortex.memory.search("agent-1", "started", {
// No embedding - keyword search
tags: ["debug"],
limit: 100,
});
Recommendation: No embeddings (keyword search sufficient)
Cloud Mode Embedding Pipeline
How Auto-Embed Works
graph LR
A[SDK: autoEmbed=true] --> B[Cortex Cloud API]
B --> C[Generate Embedding]
C --> D[OpenAI API]
D --> C
C --> E[Store in Convex]
E --> F[Your Convex Instance]
Process:
1. Developer calls cortex.memory.store({ autoEmbed: true })
2. Cortex Cloud receives the request
3. Generates the embedding (OpenAI text-embedding-3-large, 3072-dim)
4. Stores it in your Convex instance with the embedding
5. Returns the memory ID to the developer
Benefits:
- No API key management
- Automatic retries and error handling
- Model upgrades (when better models release)
- Usage-based pricing ($0.02/1K tokens)
Dimension Migration
Changing Dimensions
If you need to change embedding dimensions:
// 1. Update schema
memories: defineTable({ ... })
.vectorIndex("by_embedding", {
dimensions: 3072, // Was 1536
});
// 2. Re-embed all memories
export const reEmbedAll = mutation({
handler: async (ctx) => {
const memories = await ctx.db.query("memories").collect();
for (const memory of memories) {
if (memory.embedding) {
// Mark for re-embedding
await ctx.db.patch(memory._id, {
metadata: {
...memory.metadata,
needsReEmbedding: true,
},
});
}
}
},
});
// 3. Re-embed action
export const reEmbedMemory = action({
args: { memoryId: v.id("memories") },
handler: async (ctx, args) => {
const memory = await ctx.runQuery(api.memories.get, {
memoryId: args.memoryId,
});
// Generate new embedding
const newEmbedding = await embed3072(memory.content);
// Update
await ctx.runMutation(api.memories.patch, {
memoryId: args.memoryId,
embedding: newEmbedding,
metadata: {
...memory.metadata,
needsReEmbedding: false,
},
});
},
});
// 4. Run for all memories
const memories = await cortex.memory.list('agent-1');
for (const memory of memories.memories) {
if (memory.metadata.needsReEmbedding) {
await convexClient.action(api.memories.reEmbedMemory, {
memoryId: memory.id,
});
}
}
Next Steps
- Search Strategy - Multi-strategy search implementation
- Dimension Strategies - Detailed dimension guide
- Performance - Optimization techniques
Questions? Ask in GitHub Discussions or Discord.