Memory Orchestration
The heart of Cortex: automatic orchestration across all memory layers with batteries-included defaults.
Overview
Cortex is an AI memory orchestration platform that manages the complete lifecycle of agent memory. Instead of manually coordinating conversations, vector stores, facts, and graphs, you call a single method and Cortex handles everything.
Building AI agents with persistent memory typically requires:
// Without Cortex: Manual coordination
await storeConversation(conversationId, userMessage, agentResponse);
await generateEmbedding(userMessage + agentResponse);
await storeInVectorDB(embedding, content, metadata);
await extractFacts(userMessage, agentResponse);
await storeFacts(facts, conversationId);
await syncToGraph(entities, relationships);
await linkEverythingTogether(ids);
// ...and handle failures, retries, consistency
// With Cortex: One call, full orchestration
await cortex.memory.remember({
memorySpaceId: "user-123-personal",
conversationId: "conv-456",
userMessage: "I prefer TypeScript for backend work",
agentResponse: "I'll remember that preference!",
userId: "user-123",
userName: "Alex",
});
// Conversations, vectors, facts, and graph all updated automatically
Core Concept: The 4-Layer Architecture
Cortex organizes memory into four layers, plus an optional graph, each serving a specific purpose:
- Layer 4 (Orchestration): cortex.memory.remember() • cortex.memory.recall() • Start here
- Layer 3 (Facts): LLM-extracted knowledge • Belief revision • 60-90% token savings
- Layer 2 (Vector): Semantic search • Embedded memories • Fast retrieval
- Layer 1 (ACID stores): Conversations (L1a) • Immutable KB (L1b) • Mutable Config (L1c)
- Graph (optional): Neo4j/Memgraph • Multi-hop traversal • Entity relationships
The Four Layers
Layer 4: Orchestration API
Purpose: Developer experience - orchestrates all layers
cortex.memory.* handles everything automatically - recommended starting point for all use cases
// Does L1a + L2 + L3 + graph sync
await cortex.memory.remember({...});
// Searches L2 + L3 + graph expansion
await cortex.memory.recall({...});
Layer 3: Facts
Purpose: Extracted knowledge with 60-90% token savings
- LLM-extracted structured facts
- Versioned and searchable
- Belief revision for conflicting facts
await cortex.facts.store(memorySpaceId, {
fact: "User prefers dark mode",
factType: "preference",
confidence: 95,
});
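Belief revision deserves a closer look. Below is a minimal sketch of how conflicting facts could be reconciled: a new fact about the same subject and predicate supersedes the old one instead of duplicating it. The Fact shape, version field, and reviseBeliefs helper are illustrative assumptions, not the SDK's internals.

```typescript
// Hypothetical belief-revision sketch: a new fact about the same
// subject + predicate supersedes the old one, preserving history.
interface Fact {
  fact: string;
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
  version: number;
  supersededBy?: number; // version of the fact that replaced this one
}

function reviseBeliefs(existing: Fact[], incoming: Omit<Fact, "version">): Fact[] {
  // Find the current (non-superseded) fact on the same subject + predicate
  const conflict = existing.find(
    (f) =>
      f.subject === incoming.subject &&
      f.predicate === incoming.predicate &&
      f.supersededBy === undefined,
  );
  const nextVersion = (conflict?.version ?? 0) + 1;
  // Mark the old belief as superseded rather than deleting it
  const revised = existing.map((f) =>
    f === conflict ? { ...f, supersededBy: nextVersion } : f,
  );
  return [...revised, { ...incoming, version: nextVersion }];
}

// Usage: the user first preferred light mode, then switched to dark mode
let facts: Fact[] = [];
facts = reviseBeliefs(facts, {
  fact: "User prefers light mode", subject: "user-123",
  predicate: "theme", object: "light", confidence: 90,
});
facts = reviseBeliefs(facts, {
  fact: "User prefers dark mode", subject: "user-123",
  predicate: "theme", object: "dark", confidence: 95,
});
```

Keeping superseded versions around (rather than overwriting) is what makes facts "versioned and searchable" while still answering queries with the current belief.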
Layer 2: Vector Memory
Purpose: Semantic search
- Embedded memories for fast retrieval
- References Layer 1 stores
- Scoped per memory space
await cortex.vector.store(memorySpaceId, {
content: "...",
embedding: [...],
conversationRef: {...}, // Links to L1a
});
await cortex.vector.search(memorySpaceId, query);
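For intuition, semantic search over Layer 2 boils down to ranking stored embeddings by similarity to a query vector. A toy sketch with 3-dimensional vectors follows; Cortex performs this server-side, and the cosine and search helpers here are illustrative, not SDK functions.

```typescript
// Toy sketch of vector search: rank stored embeddings by cosine
// similarity to the query vector, highest first.
interface VectorMemory {
  content: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function search(store: VectorMemory[], query: number[], limit = 2): VectorMemory[] {
  return [...store]
    .sort((a, b) => cosine(b.embedding, query) - cosine(a.embedding, query))
    .slice(0, limit);
}

const store: VectorMemory[] = [
  { content: "User prefers TypeScript", embedding: [0.9, 0.1, 0.0] },
  { content: "User works at Acme Corp", embedding: [0.0, 0.2, 0.9] },
];
const top = search(store, [1, 0, 0], 1);
```

Real embeddings have hundreds or thousands of dimensions, but the ranking principle is the same.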
Layer 1: ACID Stores
Purpose: Immutable source of truth
Three sub-stores:
- L1a: Conversations - Raw message history (per memory space)
- L1b: Immutable - KB articles, policies (shared)
- L1c: Mutable - Config, counters (shared, live)
// Layer 1a
await cortex.conversations.addMessage(conversationId, {...});
// Layer 1b
await cortex.immutable.store({...});
// Layer 1c
await cortex.mutable.set(namespace, key, value);
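The Layer 1c idea, namespaced live key-value state, can be pictured with an in-memory Map. The set and get helpers below are illustrative stand-ins for cortex.mutable.*, not the SDK's implementation.

```typescript
// Toy sketch of a namespaced mutable store (config, counters, flags).
const mutableStore = new Map<string, unknown>();

function set(namespace: string, key: string, value: unknown): void {
  // Namespacing keeps unrelated state (config vs. counters) separated
  mutableStore.set(`${namespace}:${key}`, value);
}

function get(namespace: string, key: string): unknown {
  return mutableStore.get(`${namespace}:${key}`);
}

set("config", "theme", "dark");
set("counters", "logins", 42);
```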
Start with Layer 4 (cortex.memory.*). It handles all orchestration for you.
Use lower layers only when you need:
- Direct vector control (Layer 2)
- Custom fact management (Layer 3)
- Raw conversation access (Layer 1)
The Two Core Methods
remember() - Store with Full Orchestration
remember() is the primary method for storing conversations. It orchestrates all layers automatically:
const result = await cortex.memory.remember({
// Required
memorySpaceId: "user-123-personal",
conversationId: "conv-456",
userMessage: "My favorite color is blue and I work at Acme Corp",
agentResponse: "Nice! I'll remember your preference and workplace.",
userId: "user-123",
userName: "Alex",
// Optional: Custom fact extraction
extractFacts: async (userMsg, agentResp) => [
{
fact: "User's favorite color is blue",
factType: "preference",
subject: "user-123",
predicate: "favoriteColor",
object: "blue",
confidence: 95,
},
],
});
What remember() does automatically:
1. Validates inputs - checks memorySpaceId and userId/agentId ownership
2. Auto-registers entities - creates the memory space, user profile, and agent if needed
3. Stores in Layer 1a - adds messages to the ACID conversation store
4. Indexes in Layer 2 - creates vector memories with a conversationRef
5. Extracts facts (Layer 3) - if an LLM is configured, extracts facts with belief revision
6. Syncs to the graph - if a graph adapter is configured, syncs entities and relationships
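The sequence above can be sketched as a plain pipeline, with each layer stubbed as an in-memory array. The wiring is illustrative only: the real SDK is asynchronous and adds validation, retries, and failure handling.

```typescript
// Stub stores, one per layer touched by remember()
const layers = {
  conversations: [] as string[], // L1a: raw messages
  vectors: [] as string[],       // L2: searchable entries
  facts: [] as string[],         // L3: extracted knowledge
  graph: [] as string[],         // optional graph sync
};

function remember(input: {
  userMessage: string;
  agentResponse: string;
  extractFacts?: (userMsg: string, agentResp: string) => string[];
}): void {
  // Steps 1-3: validate, auto-register, store raw messages in L1a
  layers.conversations.push(input.userMessage, input.agentResponse);
  // Step 4: index in L2 (embedding elided; entry keeps a conversation ref)
  layers.vectors.push(`${input.userMessage} ${input.agentResponse}`);
  // Step 5: extract facts into L3 only when an extractor (LLM) is configured
  if (input.extractFacts) {
    layers.facts.push(...input.extractFacts(input.userMessage, input.agentResponse));
  }
  // Step 6: sync entities to the graph when an adapter is configured
  layers.graph.push("entity:user-123");
}

remember({
  userMessage: "I prefer TypeScript",
  agentResponse: "Noted!",
  extractFacts: () => ["User prefers TypeScript"],
});
```

The point of the sketch: one call fans out to every layer, so the caller never coordinates them by hand.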
recall() - Retrieve with Full Orchestration
recall() searches across all layers, merges results, and returns ready-to-use context:
const result = await cortex.memory.recall({
memorySpaceId: "user-123-personal",
query: "What are user's preferences?",
// Optional: Pre-computed embedding for better semantic search
embedding: await embed("What are user's preferences?"),
// Optional: Filter options
userId: "user-123",
limit: 10,
});
// Result includes unified context from all layers
console.log(result.items); // Merged, deduped, ranked results
console.log(result.sources.vector); // Vector memories breakdown
console.log(result.sources.facts); // Structured facts breakdown
console.log(result.sources.graph); // Graph-expanded entities
console.log(result.context); // Ready-to-inject LLM context string
What recall() does automatically:
1. Searches vector memories (Layer 2) - semantic search with an optional embedding
2. Searches facts directly (Layer 3) - facts are a primary source, not just enrichment
3. Queries graph relationships - if configured, discovers related context via entity connections
4. Merges and ranks results - multi-signal scoring algorithm with deduplication
5. Formats for LLM injection - returns a ready-to-use context string
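The merge-and-rank step can be illustrated in a few lines: pool results from all sources, keep the best-scoring copy of each duplicate, and sort by score. The item shape and scoring here are assumptions for illustration, not Cortex's actual algorithm.

```typescript
// Sketch of merging recall results from multiple layers
interface RecallItem {
  content: string;
  source: "vector" | "facts" | "graph";
  score: number; // per-source relevance in [0, 1]
}

function mergeAndRank(items: RecallItem[]): RecallItem[] {
  // Deduplicate by content, keeping the highest-scoring occurrence
  const best = new Map<string, RecallItem>();
  for (const item of items) {
    const seen = best.get(item.content);
    if (!seen || item.score > seen.score) best.set(item.content, item);
  }
  // Rank the surviving items by score, highest first
  return [...best.values()].sort((a, b) => b.score - a.score);
}

const ranked = mergeAndRank([
  { content: "User prefers TypeScript", source: "vector", score: 0.82 },
  { content: "User prefers TypeScript", source: "facts", score: 0.95 }, // duplicate
  { content: "Works at Acme Corp", source: "graph", score: 0.6 },
]);

// Formatting for LLM injection is then a simple join over ranked items
const context = ranked.map((r) => `- ${r.content}`).join("\n");
```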
Batteries-Included Defaults
Both remember() and recall() work out-of-the-box with sensible defaults:
remember() Defaults
| Feature | Default | What It Does |
|---|---|---|
| Memory Space | Auto-register | Creates space if it doesn't exist |
| User Profile | Auto-create | Creates user profile for new users |
| Conversation | Always stored | Messages saved to ACID layer |
| Vector Memory | Always created | Searchable memory entry |
| Facts | Extracted if configured | LLM extracts structured facts |
| Graph Sync | Enabled if configured | Syncs entities and relationships |
| Belief Revision | Enabled if LLM configured | Handles conflicting facts |
recall() Defaults
| Feature | Default | What It Does |
|---|---|---|
| Vector Search | Enabled | Searches Layer 2 |
| Facts Search | Enabled | Searches Layer 3 directly |
| Graph Expansion | Enabled if configured | Discovers related context |
| LLM Context | Generated | Ready-to-inject string |
| Deduplication | Enabled | Removes duplicates |
| Ranking | Enabled | Multi-signal scoring |
Quick Start
Basic Usage
import { Cortex } from "@cortexmemory/sdk";
const cortex = new Cortex({
convexUrl: process.env.CONVEX_URL!,
});
// Store a conversation
await cortex.memory.remember({
memorySpaceId: "my-agent",
conversationId: "conv-123",
userMessage: "Hello! I'm building a Next.js app",
agentResponse: "Great! I can help with Next.js development.",
userId: "user-1",
userName: "Developer",
});
// Retrieve relevant context
const result = await cortex.memory.recall({
memorySpaceId: "my-agent",
query: "What is the user working on?",
});
console.log(result.context);
// "User is building a Next.js app..."
With Embeddings
import { embed } from "ai";
import { openai } from "@ai-sdk/openai";
await cortex.memory.remember({
memorySpaceId: "my-agent",
conversationId: "conv-123",
userMessage: "I prefer TypeScript over JavaScript",
agentResponse: "TypeScript is great for type safety!",
userId: "user-1",
userName: "Developer",
// Add embedding for semantic search
generateEmbedding: async (content) => {
const { embedding } = await embed({
model: openai.embedding("text-embedding-3-small"),
value: content,
});
return embedding;
},
});
Streaming Support
For real-time AI responses, use rememberStream():
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
const response = await streamText({
model: openai("gpt-4"),
messages: [{ role: "user", content: "Tell me about AI" }],
});
// Store streaming response automatically
const result = await cortex.memory.rememberStream({
memorySpaceId: "my-agent",
conversationId: "conv-123",
userMessage: "Tell me about AI",
responseStream: response.textStream,
userId: "user-1",
userName: "User",
});
console.log(result.fullResponse); // Complete response text
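Under the hood, storing a stream reduces to consuming chunks as they arrive and persisting the accumulated text once the stream completes. A self-contained sketch follows; the toy async generator stands in for response.textStream, and the collectStream helper is illustrative, not an SDK function.

```typescript
// Sketch of stream accumulation: consume chunks, keep the full text
async function collectStream(stream: AsyncIterable<string>): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    full += chunk; // chunks could also be forwarded to the client here
  }
  return full;
}

// Toy stream standing in for an AI provider's text stream
async function* toyStream(): AsyncGenerator<string> {
  yield "AI is ";
  yield "a broad field.";
}

const fullResponse = await collectStream(toyStream());
```

This is why rememberStream() can both serve the response in real time and still persist the complete text (result.fullResponse) afterward.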
The Power of Orchestration
Automatic Cross-Layer Linking
When you call remember(), all layers are automatically linked:
const result = await cortex.memory.remember({
memorySpaceId: "my-agent",
conversationId: "conv-123",
userMessage: "I like Python",
agentResponse: "Python is great!",
userId: "user-1",
userName: "Dev",
});
// Result contains IDs from all layers
console.log(result.conversation); // ACID message IDs
console.log(result.memories); // Vector memory entries
console.log(result.facts); // Extracted facts
// Each vector memory links back to its ACID source
// Each fact links to its source conversation
// Graph nodes link to their Convex counterparts
GDPR Cascade Deletion
One call deletes from all layers:
// Delete user and all their data across ALL layers
await cortex.users.delete("user-123", { cascade: true });
// Automatically deletes from:
// - User profile
// - All conversations
// - All vector memories
// - All facts
// - All graph nodes
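Cascade deletion can be pictured as a single pass over every store, removing records keyed to the user. In this stub sketch, maps stand in for the real stores; the actual SDK also covers graph nodes and partial-failure handling.

```typescript
// Stub stores keyed by record id, with records tagged by owning user
const stores = {
  users: new Map([["user-123", { name: "Alex" }]]),
  conversations: new Map([["conv-456", { userId: "user-123" }]]),
  vectors: new Map([["vec-1", { userId: "user-123" }]]),
  facts: new Map([["fact-1", { userId: "user-123" }]]),
};

function deleteUser(userId: string, opts: { cascade: boolean }): void {
  stores.users.delete(userId);
  if (!opts.cascade) return;
  // Sweep every layer for records owned by this user
  for (const store of [stores.conversations, stores.vectors, stores.facts]) {
    for (const [id, record] of store) {
      if (record.userId === userId) store.delete(id);
    }
  }
}

deleteUser("user-123", { cascade: true });
```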
Multi-Tenancy
All operations are automatically scoped to the tenant:
const cortex = new Cortex({
convexUrl: process.env.CONVEX_URL!,
auth: {
userId: "user-123",
tenantId: "tenant-acme", // All operations scoped to this tenant
},
});
// Everything is automatically filtered by tenant
await cortex.memory.remember({...}); // Stored in tenant-acme
await cortex.memory.recall({...}); // Only searches tenant-acme
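The scoping mechanism can be sketched as writing a tenantId onto every record and filtering on it at read time. The ScopedClient class and shared array below are illustrative, not the SDK's storage model.

```typescript
// Shared backing store holding records from every tenant
interface Row {
  tenantId: string;
  content: string;
}
const allRows: Row[] = [];

class ScopedClient {
  constructor(private tenantId: string) {}
  // Writes stamp the client's tenantId onto each record
  remember(content: string): void {
    allRows.push({ tenantId: this.tenantId, content });
  }
  // Reads only ever see this tenant's records
  recall(): string[] {
    return allRows
      .filter((r) => r.tenantId === this.tenantId)
      .map((r) => r.content);
  }
}

const acme = new ScopedClient("tenant-acme");
const globex = new ScopedClient("tenant-globex");
acme.remember("Acme user prefers TS");
globex.remember("Globex user prefers Go");
```

Because the filter is applied inside every read path, tenant isolation holds without callers passing the tenantId on each call.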
API Namespaces
cortex.memory.*
Layer 4: Convenience API (start here)
cortex.vector.*
Layer 2: Direct vector operations
cortex.facts.*
Layer 3: Fact operations
cortex.conversations.*
Layer 1a: ACID conversations
cortex.immutable.*
Layer 1b: Shared knowledge
cortex.mutable.*
Layer 1c: Live state
Advanced: Skipping Layers
For specific use cases, you can skip certain layers:
await cortex.memory.remember({
memorySpaceId: "my-agent",
conversationId: "conv-123",
userMessage: "Quick note",
agentResponse: "Got it!",
userId: "user-1",
userName: "User",
// Skip specific layers
skipLayers: ["facts", "graph"],
});
| Skip | Effect |
|---|---|
| users | Don't auto-create user profile |
| agents | Don't auto-register agent |
| conversations | Don't store in ACID layer |
| vector | Don't create vector memory |
| facts | Don't extract facts |
| graph | Don't sync to graph |
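One way to picture skipLayers is as a filter over the ordered pipeline steps: each layer runs only if its name is absent from the skip list. The plan helper below is hypothetical, shown only to make the gating concrete.

```typescript
// Hypothetical sketch of how skipLayers gates the remember() pipeline
type Layer = "users" | "agents" | "conversations" | "vector" | "facts" | "graph";

function plan(skipLayers: Layer[] = []): Layer[] {
  const all: Layer[] = ["users", "agents", "conversations", "vector", "facts", "graph"];
  // Only the layers not listed in skipLayers are executed
  return all.filter((layer) => !skipLayers.includes(layer));
}

const steps = plan(["facts", "graph"]);
// steps: ["users", "agents", "conversations", "vector"]
```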
Summary
Memory Orchestration is the core value of Cortex:
- One method to store (remember()) - handles all layers automatically
- One method to retrieve (recall()) - unified search across all sources
- Batteries included - sensible defaults, minimal configuration
- Full control when needed - drop to lower layers for specific use cases
- Enterprise ready - GDPR, multi-tenancy, resilience built-in