Fact Integration in Memory API

Last Updated: 2025-10-31

Automatic extraction and storage of structured facts from conversations, integrated into the complete three-layer memory pipeline.

Overview

The Facts layer completes Cortex's three-layer architecture:

  • Layer 1 (ACID): Raw conversation storage
  • Layer 2 (Vector): Searchable memory embeddings
  • Layer 3 (Facts): Structured knowledge extraction

When you call memory.remember() with an extractFacts callback, Cortex automatically:

  1. Stores the conversation in ACID
  2. Creates searchable memory embeddings in Vector
  3. Extracts and stores structured facts
  4. Links all three layers together via sourceRef
  5. Syncs to graph database (if configured)
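
For a concrete sense of step 4, here is a sketch of how a stored fact points back at the other two layers through its sourceRef. The memoryId and conversationId fields mirror the enrichment examples later on this page; the full stored-fact record may carry additional fields.

// Illustrative only: how one extracted fact relates to the other two layers
const storedFact = {
  fact: "User prefers dark mode",
  factType: "preference",
  subject: "user-123",
  confidence: 95,
  sourceRef: {
    conversationId: "conv-123", // Layer 1: the ACID conversation the fact came from
    memoryId: "mem-123", // Layer 2: the vector memory created by the same remember() call
  },
};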

Quick Start

import { Cortex } from "@cortexclick/cortex";

const cortex = new Cortex({ client: convexClient });

// Define fact extraction function
const extractFacts = async (userMessage: string, agentResponse: string) => {
  // Use an LLM or custom logic to extract facts
  return [
    {
      fact: "User prefers dark mode",
      factType: "preference",
      subject: "user-123",
      predicate: "prefers",
      object: "dark mode",
      confidence: 95,
      tags: ["ui", "settings"],
    },
  ];
};

// Remember with fact extraction
const result = await cortex.memory.remember({
  memorySpaceId: "agent-1",
  conversationId: "conv-123",
  userMessage: "I like dark mode better",
  agentResponse: "I'll remember your preference!",
  userId: "user-123",
  userName: "Alex",
  extractFacts, // Facts automatically extracted and stored
});

console.log(`Extracted ${result.facts.length} facts`);
// result.facts contains the extracted, stored facts with IDs

Fact Structure

Each fact includes:

interface ExtractedFact {
  fact: string; // Human-readable fact statement
  factType: FactType; // Category of fact
  subject?: string; // Primary entity (e.g., "user-123")
  predicate?: string; // Relationship (e.g., "prefers")
  object?: string; // Secondary entity (e.g., "dark mode")
  confidence: number; // 0-100: extraction confidence
  tags?: string[]; // Classification tags
}

type FactType =
  | "preference"
  | "identity"
  | "knowledge"
  | "relationship"
  | "event"
  | "custom";

Fact Types

Type          Use Case                       Example
preference    User likes/dislikes            "User prefers dark mode"
identity      Who/what someone is            "User is a software engineer"
knowledge     Information/skills             "User knows Python and TypeScript"
relationship  Connections between entities   "User works_at Google"
event         Time-based occurrences         "User joined on 2025-01-15"
custom        Domain-specific facts          "User has_subscription premium"
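
As a sketch, rows from this table map onto ExtractedFact objects like the ones below. The subject/predicate/object triple is most useful for relationship facts; the confidence values and tags here are illustrative, not prescribed.

// Illustrative ExtractedFact values for two of the types above
const exampleFacts: ExtractedFact[] = [
  {
    fact: "User works at Google",
    factType: "relationship",
    subject: "user-123",
    predicate: "works_at",
    object: "Google",
    confidence: 95,
    tags: ["employment"],
  },
  {
    fact: "User joined on 2025-01-15",
    factType: "event",
    subject: "user-123",
    confidence: 90,
    tags: ["milestone"],
  },
];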

Integration with Memory Operations

Remember with Facts

Facts are automatically stored and linked:

const result = await cortex.memory.remember({
  memorySpaceId: "agent-1",
  conversationId: "conv-123",
  userMessage: "I'm a developer at Microsoft working on AI",
  agentResponse: "Great! What projects are you working on?",
  userId: "user-123",
  userName: "Alex",
  extractFacts: async (user, agent) => {
    return [
      {
        fact: "User is a developer",
        factType: "identity",
        subject: "user-123",
        confidence: 95,
        tags: ["profession"],
      },
      {
        fact: "User works at Microsoft",
        factType: "relationship",
        subject: "user-123",
        predicate: "works_at",
        object: "Microsoft",
        confidence: 98,
        tags: ["employment"],
      },
      {
        fact: "User works on AI projects",
        factType: "knowledge",
        subject: "user-123",
        confidence: 90,
        tags: ["expertise"],
      },
    ];
  },
});

// Access extracted facts
result.facts.forEach((fact) => {
  console.log(`Fact: ${fact.fact}`);
  console.log(`Linked to memory: ${fact.sourceRef?.memoryId}`);
  console.log(`Linked to conversation: ${fact.sourceRef?.conversationId}`);
});

Retrieve with Fact Enrichment

Facts are automatically included in enriched results:

// Get single memory with facts
const enriched = await cortex.memory.get("agent-1", "mem-123", {
  includeConversation: true, // Enables enrichment, including facts
});

if (enriched.facts) {
  console.log(`Found ${enriched.facts.length} related facts`);
}

// Search with fact enrichment
const results = await cortex.memory.search("agent-1", "user preferences", {
  embedding: await embed("user preferences"),
  enrichConversation: true, // Facts automatically included
});

results.forEach((result) => {
  console.log(`Memory: ${result.memory.content}`);
  if (result.facts) {
    result.facts.forEach((fact) => {
      console.log(`  - Fact: ${fact.fact} (${fact.confidence}% confidence)`);
    });
  }
});

Cascade Delete

Facts are automatically deleted when memories are removed:

// Delete memory - facts cascade deleted
const deleteResult = await cortex.memory.forget("agent-1", "mem-123", {
  deleteConversation: true,
});

console.log(`Deleted ${deleteResult.factsDeleted} facts`);
console.log(`Fact IDs: ${deleteResult.factIds.join(", ")}`); // Audit trail

Fact Extraction Strategies

Strategy 1: LLM-Based Extraction

Use an LLM to intelligently extract facts:

import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const extractFactsWithLLM = async (
  userMessage: string,
  agentResponse: string,
) => {
  const prompt = `Extract structured facts from this conversation:
User: ${userMessage}
Agent: ${agentResponse}

Return a JSON object with a "facts" array in this format:
{
  "facts": [{
    "fact": "clear fact statement",
    "factType": "preference|identity|knowledge|relationship|event|custom",
    "subject": "who/what this is about",
    "predicate": "relationship type (optional)",
    "object": "related entity (optional)",
    "confidence": 0-100,
    "tags": ["relevant", "tags"]
  }]
}`;

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    response_format: { type: "json_object" },
  });

  const extracted = JSON.parse(response.choices[0].message.content ?? "{}");
  return extracted.facts || [];
};

// Use in remember
await cortex.memory.remember({
  memorySpaceId: "agent-1",
  conversationId: "conv-123",
  userMessage: "I'm allergic to peanuts and love Italian food",
  agentResponse: "I'll remember your dietary needs!",
  userId: "user-123",
  userName: "Alex",
  extractFacts: extractFactsWithLLM,
});

Strategy 2: Rule-Based Extraction

Use patterns and keywords for deterministic extraction:

const extractFactsWithRules = async (
  userMessage: string,
  agentResponse: string,
) => {
  const facts = [];
  const text = userMessage.toLowerCase();

  // Preference patterns
  if (
    text.includes("prefer") ||
    text.includes("like") ||
    text.includes("love")
  ) {
    const preferenceMatch = text.match(/(prefer|like|love)s?\s+([^.,!?]+)/);
    if (preferenceMatch) {
      facts.push({
        fact: `User prefers ${preferenceMatch[2]}`,
        factType: "preference",
        confidence: 85,
        tags: ["auto-extracted"],
      });
    }
  }

  // Identity patterns
  if (text.includes("i am") || text.includes("i'm")) {
    const identityMatch = text.match(/i(?:'m| am)\s+(?:a|an)?\s*([^.,!?]+)/);
    if (identityMatch) {
      facts.push({
        fact: `User is ${identityMatch[1]}`,
        factType: "identity",
        confidence: 90,
        tags: ["auto-extracted"],
      });
    }
  }

  return facts;
};

Strategy 3: Hybrid Approach

Combine rule-based and LLM extraction:

const extractFactsHybrid = async (
  userMessage: string,
  agentResponse: string,
) => {
  // Quick rule-based extraction for common patterns
  const ruleFacts = await extractFactsWithRules(userMessage, agentResponse);

  // Fall back to the LLM only if the message is complex and rules found nothing
  if (ruleFacts.length === 0 && userMessage.length > 50) {
    return await extractFactsWithLLM(userMessage, agentResponse);
  }

  return ruleFacts;
};

Confidence Scoring

Confidence indicates extraction reliability (0-100):

Range    Interpretation          Use Case
90-100   Very high confidence    Direct quotes, explicit statements
70-89    High confidence         Clear implications, strong context
50-69    Medium confidence       Reasonable inferences
30-49    Low confidence          Weak signals, ambiguous
0-29     Very low confidence     Speculative, requires verification

const extractFactsWithConfidence = async (
  userMsg: string,
  agentMsg: string,
) => {
  const facts = [];

  // Direct statement - high confidence
  if (userMsg.includes("My email is")) {
    facts.push({
      fact: "User provided email address",
      factType: "identity",
      confidence: 98, // Very high - explicit
      tags: ["contact"],
    });
  }

  // Implicit preference - medium confidence
  if (userMsg.includes("usually use")) {
    facts.push({
      fact: "User has usage preference",
      factType: "preference",
      confidence: 65, // Medium - implied
      tags: ["behavior"],
    });
  }

  return facts;
};

Error Handling

Fact extraction errors don't break memory storage:

const extractFacts = async (userMessage: string, agentResponse: string) => {
  try {
    // Attempt extraction
    const response = await llm.extract(userMessage, agentResponse);
    return response.facts;
  } catch (error) {
    // Extraction failed - log but don't throw
    console.warn("Fact extraction failed:", error);
    return []; // Return empty array, memory still saved
  }
};

// Memory is stored even if fact extraction fails
const result = await cortex.memory.remember({
  memorySpaceId: "agent-1",
  conversationId: "conv-123",
  userMessage: "Test",
  agentResponse: "Response",
  userId: "user-123",
  userName: "Alex",
  extractFacts, // May fail; that's OK
});

// result.memories will always be populated
// result.facts may be empty if extraction failed

Querying Facts

Access facts through the Facts API or via memory enrichment:

// Direct fact query
const userFacts = await cortex.facts.list({
  memorySpaceId: "agent-1",
  subject: "user-123",
  factType: "preference",
});

// Search facts
const foodFacts = await cortex.facts.search("agent-1", "food preferences", {
  factType: "preference",
  minConfidence: 70,
});

// Via memory enrichment (recommended)
const memories = await cortex.memory.search("agent-1", "user preferences", {
  enrichConversation: true, // Facts automatically included
});

Best Practices

1. Extract Atomic Facts

// ❌ Bad: Compound fact
{
  fact: "User is 25 years old from California and likes hiking",
  factType: "identity",
  confidence: 80,
}

// ✅ Good: Atomic facts
[
  {
    fact: "User is 25 years old",
    factType: "identity",
    confidence: 95,
  },
  {
    fact: "User is from California",
    factType: "identity",
    confidence: 95,
  },
  {
    fact: "User likes hiking",
    factType: "preference",
    confidence: 90,
  },
]

2. Use Consistent Subject IDs

// ✅ Good: Consistent subject identification
const extractFacts = async (userMsg, agentMsg, userId) => {
  return [
    {
      fact: "User prefers email notifications",
      factType: "preference",
      subject: userId, // Always use the same ID format
      confidence: 90,
    },
  ];
};

3. Tag for Organization

const facts = [
  {
    fact: "User speaks English and Spanish",
    factType: "knowledge",
    confidence: 95,
    tags: ["language", "communication", "multilingual"], // Multiple relevant tags
  },
];

4. Set Appropriate Confidence

// Base confidence on source reliability
const extractFacts = async (userMsg, agentMsg) => {
  if (userMsg.startsWith("My name is")) {
    return [
      {
        fact: "User stated their name",
        confidence: 99, // Direct statement
      },
    ];
  }

  if (userMsg.includes("might prefer")) {
    return [
      {
        fact: "User may have preference",
        confidence: 50, // Uncertain
      },
    ];
  }

  return []; // Nothing confident to extract
};

Performance Considerations

Async Extraction

Fact extraction happens in parallel with other operations:

// Extraction doesn't block memory storage
const result = await cortex.memory.remember({
  memorySpaceId: "agent-1",
  conversationId: "conv-123",
  userMessage: "Long message with lots of facts...",
  agentResponse: "Response",
  userId: "user-123",
  userName: "Alex",
  extractFacts: async (user, agent) => {
    // This can take time without blocking
    return await slowLLMExtraction(user, agent);
  },
});
// Memories stored immediately, facts stored in parallel

Batch Extraction

For high-volume scenarios, batch fact extraction:

const pendingExtractions = [];

// Queue extractions
for (const conversation of conversations) {
  pendingExtractions.push(
    cortex.memory.remember({
      ...conversation,
      extractFacts: quickExtraction, // Fast, simple extraction
    }),
  );
}

// Process in parallel
await Promise.all(pendingExtractions);

Next Steps


Questions? Ask in GitHub Discussions or Discord.