Fact Extraction

60-90% Token Savings

Instead of feeding entire conversation histories to your LLM, extract salient facts and store them efficiently. This enables:

  • Infinite context - Recall from millions of messages
  • Cost reduction - 60-90% fewer tokens per request
  • Better retrieval - Structured facts are more searchable
Tip

v0.30.0: Facts now support semantic search via embeddings. Configure an embedding provider once at SDK init (or set CORTEX_EMBEDDING=true) and embeddings are auto-generated for all remember() and recall() calls; no manual embedding code is needed.

Tip

v0.30.1: LLM extraction now returns enriched entities with semantic types (person, organization, place, product, concept) and relation triples that sync to the graph as typed edges (e.g., WORKS_AT, LOCATED_IN). A new EXTRACTED_WITH edge links Facts to their source Memory for bidirectional traceability.


Quick Example

Raw Conversation (402 tokens):
User: "I moved from Paris to London last week. I'm working at Acme Corp
as a senior engineer. My commute is 30 minutes on the tube."

Extracted Facts (45 tokens):
1. User moved from Paris to London (last week)
2. User works at Acme Corp as Senior Engineer
3. User's commute: 30 minutes via tube

Storage: 89% reduction
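The quoted reduction is straight arithmetic on the two token counts:

```typescript
// Percentage of tokens saved by storing extracted facts instead of raw text.
function savingsPercent(rawTokens: number, factTokens: number): number {
  return Math.round(((rawTokens - factTokens) / rawTokens) * 100);
}

console.log(savingsPercent(402, 45)); // 89
```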

Extraction Modes

Configure an LLM once and facts are extracted automatically:

import { Cortex } from '@cortex-platform/sdk';
import { openai } from '@ai-sdk/openai';

// Configure Cortex with LLM client
const cortex = new Cortex({
  convexUrl: process.env.CONVEX_URL!,
  llm: openai('gpt-4'), // Enables automatic fact extraction
});

// Facts extracted automatically - no callback needed!
await cortex.memory.remember({
  memorySpaceId: 'user-123-space',
  conversationId: 'conv-123',
  userMessage: 'My favorite color is blue',
  agentResponse: 'Got it!',
  userId: 'user-123',
  userName: 'Alice',
  // No extractFacts needed - LLM extracts automatically!
});

// Result: Facts extracted + belief revision applied + stored
Zero Code per Call

Configure LLM once, get automatic extraction + deduplication + belief revision for all remember() calls. This is the batteries-included approach.

Skip extraction for specific calls:

// Skip fact extraction for this conversation
await cortex.memory.remember({
  // ... params
  skipLayers: ['facts'], // Don't extract facts this time
});

Fact Schema

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `fact` | string | Yes | | The extracted fact statement |
| `factType` | string | Yes | | One of `'preference' \| 'identity' \| 'knowledge' \| 'relationship' \| 'event'` |
| `subject` | string | No | | Entity the fact is about (e.g., `'user-123'`) |
| `predicate` | string | No | | Relationship type (e.g., `'favorite_color'`) |
| `object` | string | No | | Related value (e.g., `'blue'`) |
| `confidence` | number | Yes | | Confidence score (0-100) |
| `tags` | string[] | No | | Categorization tags |
| `aliases` | string[] | No | | Alternative phrasings for search |
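The schema maps to a TypeScript shape along these lines (field names come from the table above; the `ExtractedFact` interface name itself is illustrative, not the SDK's exported type):

```typescript
type FactType = 'preference' | 'identity' | 'knowledge' | 'relationship' | 'event';

// Illustrative shape mirroring the fact schema table.
interface ExtractedFact {
  fact: string;         // the extracted statement
  factType: FactType;
  subject?: string;     // e.g. 'user-123'
  predicate?: string;   // e.g. 'favorite_color'
  object?: string;      // e.g. 'blue'
  confidence: number;   // 0-100
  tags?: string[];
  aliases?: string[];
}

const example: ExtractedFact = {
  fact: 'User prefers blue',
  factType: 'preference',
  subject: 'user-123',
  predicate: 'favorite_color',
  object: 'blue',
  confidence: 95,
};
```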

Belief Revision

Automatic Conflict Resolution (v0.24.0+)

When you extract a fact that conflicts with an existing fact, Cortex handles it automatically:

  • ADD: New fact (no conflict)
  • SUPERSEDE: Old fact marked as superseded, new fact is current
  • MERGE: Compatible facts combined
  • IGNORE: Duplicate fact skipped
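As a rough mental model, the choice between these four outcomes can be sketched by comparing the new fact's structure and text against the existing one. This is illustrative only; the SDK's actual revision logic also uses semantic similarity:

```typescript
type Revision = 'ADD' | 'SUPERSEDE' | 'MERGE' | 'IGNORE';

interface FactShape {
  fact: string;
  subject?: string;
  predicate?: string;
  object?: string;
}

// Structural intuition behind belief revision (not the SDK's real algorithm).
function decideRevision(existing: FactShape | undefined, incoming: FactShape): Revision {
  if (!existing) return 'ADD';                          // nothing to conflict with
  if (existing.fact === incoming.fact) return 'IGNORE'; // exact duplicate
  const sameTopic =
    existing.subject === incoming.subject &&
    existing.predicate === incoming.predicate;
  if (sameTopic && existing.object !== incoming.object) return 'SUPERSEDE'; // value changed
  if (sameTopic) return 'MERGE';                        // same topic, compatible phrasing
  return 'ADD';                                          // unrelated fact
}
```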
// Old fact: "User prefers blue"
// New conversation: "Actually, I like purple now"

await cortex.memory.remember({
  memorySpaceId: 'user-123-space',
  conversationId: 'conv-456',
  userMessage: 'Actually, I like purple now',
  agentResponse: 'Updated!',
  userId: 'user-123',
  userName: 'Alice',

  extractFacts: async () => [{
    fact: 'User prefers purple',
    factType: 'preference',
    subject: 'user-123',
    predicate: 'favorite_color',
    object: 'purple',
    confidence: 95,
  }],

  beliefRevision: true, // Automatically supersedes old "blue" fact
});

Preview Conflicts

const conflicts = await cortex.facts.checkConflicts({
  memorySpaceId: 'user-123-space',
  fact: 'User prefers purple',
  subject: 'user-123',
  predicate: 'favorite_color',
});

if (conflicts.hasConflicts) {
  console.log('Would supersede:', conflicts.conflictingFacts);
}

Querying Facts

// Search facts directly
const facts = await cortex.facts.search(spaceId, 'color preference');

// Or semantic search via memory API (embedding auto-generated!)
const memories = await cortex.memory.search(spaceId, 'what colors?', {
  // No manual embedding needed - auto-generated from query!
  contentType: 'fact',
  limit: 5,
});

Deduplication

Automatic Deduplication (v0.22.0+)

memory.remember() automatically deduplicates facts using semantic matching, so the same fact stated differently won't create duplicates.

| Strategy | How it Works | Speed | Accuracy |
| --- | --- | --- | --- |
| `semantic` | Embedding similarity (default) | Slower | Highest |
| `structural` | Subject + predicate + object match | Fast | Medium |
| `exact` | Normalized text match | Fastest | Low |
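For intuition, a structural match boils down to comparing a normalized (subject, predicate, object) key. A minimal sketch, not the SDK's implementation:

```typescript
interface StructuredFact {
  subject?: string;
  predicate?: string;
  object?: string;
}

// Normalize each component so variants like 'Dark-Mode' and 'dark mode' collide.
function structuralKey(f: StructuredFact): string {
  const norm = (s?: string) => (s ?? '').toLowerCase().replace(/[\s_-]+/g, ' ').trim();
  return [norm(f.subject), norm(f.predicate), norm(f.object)].join('|');
}

const a = { subject: 'user-123', predicate: 'prefers', object: 'dark-mode' };
const b = { subject: 'user-123', predicate: 'prefers', object: 'Dark Mode' };
console.log(structuralKey(a) === structuralKey(b)); // true: treated as duplicates
```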
// Use structural deduplication (faster)
await cortex.memory.remember({
  // ... params
  factDeduplication: 'structural',
});

// Disable deduplication
await cortex.memory.remember({
  // ... params
  factDeduplication: false,
});

Fact History

// Get change history for a specific fact
const changes = await cortex.facts.history('fact-123');

changes.forEach(event => {
  console.log(`${event.action} at ${new Date(event.timestamp).toISOString()}`);
  console.log(`  Reason: ${event.reason}`);
});

// Get the supersession chain (evolution over time)
const chain = await cortex.facts.getSupersessionChain('fact-123');
// Returns: [oldest] -> [older] -> [current]

// Get all versions of a fact
const versions = await cortex.facts.getHistory(spaceId, 'fact-123');
versions.forEach(v => console.log(`v${v.version}: ${v.fact}`));

LLM Model Benchmarks

Benchmark Methodology

All models tested under identical conditions using Cortex's standard extraction prompt with structured JSON output. Metrics include extracted facts, enriched entities, relation triples, and end-to-end latency. Results may vary based on conversation complexity and model updates.

Cortex supports multiple OpenAI models for fact extraction. Here are the performance characteristics:

| Model | Facts | Entities | Relations | Avg Latency |
| --- | --- | --- | --- | --- |
| gpt-4o-2024-11-20 | 13 | 15 | 6 | 11.6s |
| gpt-5.1 | 12 | 15 | 13 | 18.6s |
| gpt-5.2 | 12 | 18 | 14 | 20.3s |
| gpt-5-mini-2025-08-07 | 12 | 26 | 19 | 50.1s |
| gpt-5-nano | 12 | 22 | 14 | 100.5s |
| gpt-4o | 11 | 10 | 1 | 13.0s |
| gpt-4o-mini-2024-07-18 | 11 | 13 | 0 | 15.8s |
| gpt-4o-mini | 11 | 15 | 0 | 16.0s |
| gpt-5 | 9 | 18 | 11 | 82.7s |

OpenAI model comparison for fact extraction (January 2026)

Recommendations

Best Balance

gpt-4o-2024-11-20 offers top-tier quality (13 facts) with fastest latency (11.6s)

Best Quality

gpt-5-mini extracts more entities/relations but at 5x latency cost

Budget Option

gpt-4o-mini provides good extraction at lower cost, but extracts fewer relations

Production Tip

Use faster models for high-volume workloads; premium models for complex extraction

GPT-5 Series

gpt-5.1 and gpt-5.2 offer good balance between quality and speed for mid-tier workloads


Best Practices

Tune Confidence Threshold
// Discard low-confidence facts
extractFacts: async (user, agent) => {
  const facts = await extractWithLLM(user, agent);
  return facts.filter(f => f.confidence >= 70);
},
Use Structured Facts for Dedup
// Good: Enables structural deduplication
{ fact: 'User prefers dark mode', subject: 'user-123', predicate: 'prefers', object: 'dark-mode' }

// Less good: No structure, relies on semantic dedup only
{ fact: 'User prefers dark mode' }
Always Link to Conversations
// Facts automatically get conversationRef when using remember()
// This enables tracing facts back to source for audit

Token Savings Analysis

async function analyzeTokenSavings(conversationId: string) {
  const conv = await cortex.conversations.get(conversationId);
  const rawTokens = estimateTokens(conv.messages.map(m => m.text).join('\n'));

  const facts = await cortex.facts.list({
    'conversationRef.conversationId': conversationId,
  });
  const factTokens = estimateTokens(facts.map(f => f.fact).join('\n'));

  const savings = ((rawTokens - factTokens) / rawTokens) * 100;

  return { rawTokens, factTokens, savingsPercent: savings.toFixed(1) };
}

// Example: { rawTokens: 1250, factTokens: 125, savingsPercent: '90.0' }
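The `estimateTokens` helper used above is not part of the SDK; one common rough heuristic for English prose is ~4 characters per token:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// For exact counts, use a real tokenizer (e.g., tiktoken) instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens('My favorite color is blue')); // 7
```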

Next Steps