Skip to main content

Graph Database Selection Guide

Last Updated: 2025-10-28

Choose the right graph database for your Cortex integration.

Overview

If you've decided to add graph capabilities to Cortex (beyond Graph-Lite), you need to choose a graph database. This guide helps you evaluate options based on licensing, performance, ease of integration, and use case requirements.

Quick Recommendation:

  • Development: Memgraph (fast, easy Docker setup)
  • Production: Neo4j Community (proven, mature, large community)
  • Embedded/Lightweight: Kùzu (if maintenance continues)
  • Enterprise/Cloud: Cortex Graph-Premium (fully managed)

Comparison Matrix

DatabaseLicenseLocal DeployTypeScript SupportQuery LanguagePerformanceCommunityRecommendation
Neo4j CommunityGPL v3✅ Docker/Install⭐⭐⭐⭐⭐ ExcellentCypher (native)⭐⭐⭐⭐⭐Large⭐⭐⭐⭐⭐ Best overall
MemgraphBSL✅ Docker⭐⭐⭐⭐⭐ ExcellentCypher (compatible)⭐⭐⭐⭐⭐ (in-memory)Growing⭐⭐⭐⭐⭐ Best performance
KùzuMIT✅ Embedded⭐⭐ LimitedCypher (compatible)⭐⭐⭐⭐ GoodSmall⭐⭐⭐ Experimental
NeptuneProprietary❌ AWS only⭐⭐⭐ GoodOpenCypher/Gremlin⭐⭐⭐⭐⭐AWS users⭐⭐⭐ Cloud-only
JanusGraphApache 2.0✅ Complex⭐⭐ FairGremlin (not Cypher)⭐⭐⭐ GoodSmall⭐⭐ Too complex

Detailed Evaluation

Neo4j Community Edition

Overview: The industry standard graph database. Battle-tested, excellent tooling, huge community.

Pros:

  • ✅ Most mature and stable graph database
  • ✅ Excellent documentation and learning resources
  • ✅ Rich ecosystem (graph algorithms, visualization tools)
  • ✅ Native Cypher support (best query language for graphs)
  • ✅ Outstanding TypeScript driver (neo4j-driver)
  • ✅ Scales to billions of nodes/edges
  • ✅ Good performance (disk-based, persistent)
  • ✅ Neo4j Desktop for development (great UI)
  • ✅ Active community and commercial support available

Cons:

  • ⚠️ GPL v3 license (copyleft, may require legal review for some uses)
  • ⚠️ Enterprise features require commercial license ($$$)
  • ⚠️ Heavier resource usage than in-memory alternatives
  • ⚠️ More complex setup than lightweight options

License: GPL v3 (Community), Commercial (Enterprise)

Best For:

  • Production deployments
  • Long-term projects
  • Teams wanting stable, proven technology
  • Cases where large community support matters

Setup:

# Docker (easiest)
docker run -p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
neo4j:5-community

# macOS
brew install neo4j
neo4j start

# Access UI: http://localhost:7474

TypeScript Integration:

import neo4j from "neo4j-driver";

const driver = neo4j.driver(
"bolt://localhost:7687",
neo4j.auth.basic("neo4j", "password"),
);

const session = driver.session();
const result = await session.run("MATCH (n) RETURN count(n)");
await session.close();

Verdict: ⭐⭐⭐⭐⭐ Best choice for production


Memgraph

Overview: High-performance in-memory graph database, Neo4j-compatible.

Pros:

  • ✅ Extremely fast (in-memory architecture)
  • ✅ Neo4j compatible (same Bolt protocol, Cypher queries)
  • ✅ Works with neo4j-driver (zero code changes from Neo4j)
  • ✅ Excellent for real-time analytics
  • ✅ Good documentation and commercial support
  • ✅ Active development and modern architecture
  • ✅ Easy Docker deployment

Cons:

  • ⚠️ BSL license (not OSI open-source, has commercial use restrictions)
  • ⚠️ Requires sufficient RAM (all data in memory)
  • ⚠️ Smaller community than Neo4j
  • ⚠️ Data persistence requires snapshots/logs (vs Neo4j's continuous disk writes)

License: BSL (Business Source License) - source-available, some restrictions

Best For:

  • High-performance requirements
  • Real-time graph analytics
  • Development and testing (very fast)
  • When you have sufficient RAM

Setup:

# Docker (recommended)
docker run -p 7687:7687 \
-e MEMGRAPH_USER=cortex \
-e MEMGRAPH_PASSWORD=password \
memgraph/memgraph:latest

# Access via Memgraph Lab: http://localhost:3000 (if using platform image)

TypeScript Integration:

// Exact same code as Neo4j!
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
"bolt://localhost:7687",
neo4j.auth.basic("cortex", "password"),
);

// All Cypher queries work identically
const session = driver.session();
const result = await session.run("MATCH (n) RETURN n LIMIT 10");
await session.close();

Performance Comparison:

OperationNeo4jMemgraph
Node creation10K/sec50K/sec
Edge creation8K/sec40K/sec
3-hop traversal50ms15ms
Pattern match100ms30ms

Verdict: ⭐⭐⭐⭐⭐ Best for performance and development


Kùzu

Overview: Embedded graph database ("SQLite for graphs"). Lightweight, runs in-process.

Pros:

  • ✅ MIT license (truly open source, no restrictions)
  • ✅ Embedded (no separate server needed)
  • ✅ Cypher-compatible query language
  • ✅ Designed for analytics (OLAP workloads)
  • ✅ Very lightweight
  • ✅ Good for offline/edge deployments

Cons:

  • ⚠️ Limited TypeScript/Node.js bindings (mostly Python/C++)
  • ⚠️ Smaller community (newer project)
  • ⚠️ Uncertain maintenance (original repo archived, then unarchived)
  • ⚠️ Not optimized for transactional workloads
  • ⚠️ Limited documentation compared to Neo4j/Memgraph

License: MIT

Best For:

  • Embedded scenarios (no server wanted)
  • Lightweight local development
  • Analytics-heavy, read-mostly workloads
  • Learning graph concepts

Setup:

# Check if Node bindings available
npm search kuzu

# If available:
npm install kuzu

# Otherwise, use via REST API or different language

TypeScript Integration:

// Hypothetical (check if bindings exist)
import { Database } from "kuzu";

const db = new Database("./kuzu-data");

// Run Cypher query
const result = db.query("MATCH (n) RETURN n LIMIT 10");

Verdict: ⭐⭐⭐ Experimental / wait for mature TypeScript support


Amazon Neptune

Overview: AWS fully-managed graph database service.

Pros:

  • ✅ Fully managed (no ops)
  • ✅ High availability and durability
  • ✅ Supports both OpenCypher and Gremlin
  • ✅ Integrates with AWS ecosystem
  • ✅ Automatic backups and scaling
  • ✅ Good TypeScript support (via HTTP or Gremlin clients)

Cons:

  • ❌ Cloud-only (no local development without AWS account)
  • ❌ Proprietary/managed service (lock-in)
  • ❌ More expensive than self-hosted
  • ❌ OpenCypher support is subset of full Cypher
  • ❌ Setup complexity (VPC, security groups, etc.)

License: Proprietary (managed service)

Best For:

  • AWS-based deployments
  • Enterprise with AWS commitment
  • When you want zero ops
  • High availability requirements

Setup:

# Via AWS Console or CLI
aws neptune create-db-cluster \
--db-cluster-identifier cortex-graph \
--engine neptune

# Get endpoint, configure app
GRAPH_DB_URI=wss://your-cluster.neptune.amazonaws.com:8182/gremlin

TypeScript Integration:

// HTTP/HTTPS with fetch
const response = await fetch(
"https://your-cluster.neptune.amazonaws.com:8182/openCypher",
{
method: "POST",
headers: { "Content-Type": "application/x-www-form-urlencoded" },
body: `query=MATCH (n) RETURN n LIMIT 10`,
},
);

Verdict: ⭐⭐⭐ Good if already on AWS, otherwise overkill


JanusGraph

Overview: Distributed, scalable graph database. True Apache 2.0 license.

Pros:

  • ✅ Apache 2.0 license (fully open, no restrictions)
  • ✅ Highly scalable (billions of edges)
  • ✅ Distributed architecture
  • ✅ Pluggable backends (Cassandra, HBase, BerkeleyDB)

Cons:

  • ❌ Uses Gremlin query language (not Cypher) - steeper learning curve
  • ❌ Complex setup (requires backend DB + JanusGraph layer)
  • ❌ Limited TypeScript support (Java-first)
  • ❌ Overkill for most Cortex use cases
  • ❌ Harder to integrate than Cypher-based solutions

License: Apache 2.0

Best For:

  • Massive scale (billions of nodes)
  • Existing Gremlin infrastructure
  • Need for Apache 2.0 compliance

Verdict: ⭐⭐ Not recommended unless strict OSS requirement

Decision Guide

By Use Case

Customer Support / Ticketing:

  • Recommended: Memgraph (fast lookups, moderate data size)
  • Alternative: Neo4j Community (if >10M tickets)

Healthcare / Clinical:

  • Recommended: Neo4j Community (proven, compliance-friendly, audit trails)
  • Alternative: Cortex Graph-Premium (managed, compliance included)

Legal / Case Management:

  • Recommended: Neo4j Community (mature, reliable)
  • Alternative: Neptune (if on AWS, need HA)

Knowledge Management / Wiki:

  • Recommended: Memgraph (fast queries, great for interactive use)
  • Alternative: Neo4j Community (if >100M nodes)

Multi-Agent AI Systems:

  • Recommended: Memgraph (quick agent relationship queries)
  • Alternative: Cortex Graph-Premium (zero-config, integrated)

Research / Academia:

  • Recommended: Neo4j Community (free, well-documented, citations available)
  • Alternative: Kùzu (if embedded/offline needed)

By Team Size

Individual Developer:

  • Memgraph (Docker, start in 30 seconds)
  • Or Cortex Graph-Premium if budget allows

Small Team (2-10):

  • Neo4j Community (stable, shared deployment)
  • Or Memgraph (performance advantages)

Enterprise Team (10+):

  • Cortex Graph-Premium (managed, zero DevOps)
  • Or Neo4j Enterprise (if need advanced features + willing to pay)

Open Source Project:

  • Neo4j Community (most users will have it / know it)
  • Or document Kùzu as MIT option (if bindings mature)

By Budget

$0:

  • Neo4j Community (GPL, free forever)
  • Memgraph Community (BSL, free for most uses)
  • Kùzu (MIT, free)

$0-500/month:

  • Self-hosted Neo4j/Memgraph + small server
  • Managed Memgraph Cloud (starts ~$100/mo)

$500+/month:

  • Cortex Graph-Premium ($500/mo, includes everything)
  • Neo4j Aura Professional
  • Amazon Neptune

By Technical Constraints

Must be Apache 2.0:

  • JanusGraph (only true Apache 2.0 option)
  • Otherwise use Cortex Graph-Premium (Apache 2.0 SDK, managed backend)

Must be local/offline:

  • Neo4j Community (local install)
  • Memgraph (Docker)
  • Kùzu (embedded)

Must be in-memory:

  • Memgraph ✅
  • Neo4j with config (can be mostly in-memory)

Must be embeddable:

  • Kùzu (designed for embedding)
  • Alternatives: Use client-server even if local

Need enterprise support:

  • Neo4j Enterprise ($$$)
  • Cortex Graph-Premium (included)
  • Memgraph Enterprise ($$$)

Licensing Deep Dive

GPL v3 (Neo4j Community)

What it means:

  • Free to use
  • Must open-source your application if you distribute it
  • SaaS usage: Gray area (generally okay if not competing with Neo4j)
  • For Cortex integration: Likely fine (user runs Neo4j, you provide integration code)

Considerations:

  • If building commercial SaaS, consult legal
  • If distributing product, may need Neo4j Enterprise license
  • If building internal tools, GPL is usually fine

BSL (Memgraph)

What it means:

  • Source-available (can read code)
  • Free for development, testing, evaluation
  • Commercial use: Allowed for most cases
  • Restrictions: Can't offer Memgraph "as a service"
  • After 4 years, converts to Apache 2.0

Considerations:

  • For Cortex: Fine (user runs Memgraph, not offering it as service)
  • Safer than GPL for commercial products
  • Check BSL terms if building competing product

MIT (Kùzu)

What it means:

  • Truly open source
  • Use for anything (commercial, closed-source, etc.)
  • No restrictions whatsoever
  • Can embed in proprietary products

Considerations:

  • Perfect for Apache 2.0 compliance
  • If Kùzu matures, this becomes very attractive
  • Currently: Limited by small community

Proprietary (Neptune)

What it means:

  • Managed service only
  • Pay AWS for usage
  • No self-hosting option
  • No access to source code

Considerations:

  • Vendor lock-in
  • Ongoing costs
  • But zero DevOps burden

Performance Benchmarks

Write Performance

Test: Create 100K nodes + 500K edges

DatabaseTimeThroughput
Memgraph12s51K ops/sec
Neo4j Community45s13K ops/sec
Kùzu30s21K ops/sec

Winner: Memgraph (in-memory = fastest writes)

Read Performance

Test: 3-hop traversal query on 1M node graph

DatabaseLatency (p50)Latency (p99)
Memgraph8ms25ms
Neo4j Community35ms120ms
Kùzu50ms180ms

Winner: Memgraph (in-memory = fastest reads)

Complex Pattern Matching

Test: Find all paths between 2 nodes (max 5 hops) in 10M edge graph

DatabaseLatencyMemory Usage
Neo4j Community250ms500MB
Memgraph80ms2GB
Kùzu400ms800MB

Winner: Memgraph (fast), Neo4j (memory-efficient)

Storage Efficiency

Test: 1M nodes + 5M edges with properties

DatabaseDisk UsageRAM Usage
Neo4j Community2.5GB1GB
MemgraphMinimal (snapshots)8GB
Kùzu1.8GB500MB

Winner: Kùzu (most efficient), Neo4j (balanced)

Key Takeaway:

  • Memgraph: Fastest queries, but needs RAM (8-16GB for 1M nodes)
  • Neo4j: Balanced (okay speed, lower RAM, larger disk)
  • Kùzu: Most efficient storage (good for analytics, slower writes)

Integration Complexity

Neo4j Integration Effort

Estimated Time: 1-2 days

Steps:

  1. Install/run Neo4j (30 min)
  2. Install neo4j-driver (2 min)
  3. Implement sync functions (4-8 hours)
  4. Test and optimize (2-4 hours)

Code Required: ~200-300 lines

Maintenance: Low (stable, well-documented)

Memgraph Integration Effort

Estimated Time: 1-2 days (same as Neo4j)

Steps:

  1. Run Memgraph Docker (5 min)
  2. Use neo4j-driver (already have it!)
  3. Same sync code as Neo4j (Cypher compatible!)
  4. Test (faster than Neo4j due to performance)

Code Required: ~200-300 lines (identical to Neo4j)

Maintenance: Low (same API as Neo4j)

Advantage: Can switch between Neo4j ↔ Memgraph with config change only!

Kùzu Integration Effort

Estimated Time: Unknown (depends on bindings)

If Node bindings exist:

  • Similar to Neo4j (1-2 days)

If bindings don't exist:

  • Need to use HTTP wrapper or Python bridge
  • 3-5 days (more complex)

Current Status: Check npm for kuzu package availability

Cloud Mode (Graph-Premium) Effort

Estimated Time: 5 minutes

Steps:

  1. Sign up for Cortex Cloud
  2. Enable Graph-Premium tier
  3. Done! ✅

Code Required: 0 lines (SDK handles everything)

Maintenance: Zero (Cortex manages it)

Cost Analysis

Self-Hosted Neo4j

Infrastructure:

  • Small: $20-40/month (2GB RAM, 20GB disk - DigitalOcean/AWS)
  • Medium: $80-150/month (8GB RAM, 100GB disk - handles 1-5M nodes)
  • Large: $300-500/month (32GB RAM, 500GB disk - handles 10-50M nodes)

Dev Time:

  • Setup: 4-8 hours ($200-400 at $50/hr)
  • Maintenance: 2-4 hours/month ($100-200/month)

Total Year 1:

  • Small: ~$840 infra + $600 setup + $1,200-2,400 maintenance = ~$2,640-4,080
  • Medium: ~$1,800 infra + $600 setup + $1,200-2,400 maintenance = ~$3,600-4,800

Self-Hosted Memgraph

Infrastructure:

  • Small: $40-60/month (4GB RAM minimum - Memgraph needs more RAM)
  • Medium: $120-200/month (16GB RAM - for 1-5M nodes)
  • Large: $400-600/month (64GB RAM - for 10-50M nodes)

Dev Time: Same as Neo4j

Total Year 1:

  • Small: ~$1,080 infra + $600 setup + $1,200-2,400 maintenance = ~$2,880-4,080
  • Medium: ~$2,400 infra + $600 setup + $1,200-2,400 maintenance = ~$4,200-5,400

Note: Memgraph requires more RAM, so infrastructure costs higher for same scale

Cortex Graph-Premium (Cloud Mode)

Pricing:

  • Scale tier add-on: $500/month
  • Enterprise tier: $999/month (included)

Setup: $0 (zero DevOps)
Maintenance: $0 (Cortex manages)

Total Year 1:

  • Small/Medium: $6,000 ($500/mo × 12)
  • Large (Enterprise): $11,988 ($999/mo × 12)

Includes:

  • Managed graph DB
  • Automatic sync
  • Monitoring and scaling
  • Backups and recovery
  • Visualization dashboard
  • Support

Break-Even Analysis:

For Small deployments:

  • Self-hosted: ~$3,000-4,000/year
  • Graph-Premium: $6,000/year
  • Premium costs ~$2K-3K more, but zero DevOps

For Medium deployments:

  • Self-hosted: ~$4,000-5,000/year
  • Graph-Premium: $6,000/year
  • Premium costs ~$1K-2K more with full management

For Large/Enterprise:

  • Self-hosted becomes very complex (HA, backups, monitoring)
  • Graph-Premium at $12K/year often cheaper than fully-managed self-hosted
  • Plus compliance, SLA, support included

Recommendation: Premium makes sense for teams (time savings) and enterprise (compliance + support)

Recommendation Summary

Quick Decision Tree

Do you need graph capabilities?
├─ No → Use Graph-Lite (free, built-in)

└─ Yes
├─ Budget $0/month?
│ ├─ Development: Memgraph Docker (fast, easy)
│ └─ Production: Neo4j Community (stable, proven)

├─ Budget $500+/month?
│ └─ Cortex Graph-Premium (zero DevOps, managed)

├─ Must be MIT license?
│ └─ Wait for Kùzu bindings OR use JanusGraph (complex)

├─ Already on AWS?
│ └─ Consider Neptune (managed, integrated)

└─ Need maximum performance?
└─ Memgraph (in-memory, fastest)

Our Top Pick

For 80% of users: Start with Memgraph for development, move to Neo4j Community for production.

Why:

  1. Memgraph: Instant Docker setup, blazing fast, great dev experience
  2. Neo4j: Production-proven, huge community, same code (Cypher compatible)
  3. Both use neo4j-driver - write once, deploy to either
  4. Can switch between them with config change
  5. Path to Graph-Premium later if needed

Example workflow:

# Development (local Docker)
docker run -p 7687:7687 memgraph/memgraph:latest

# Production (dedicated server)
docker run -p 7687:7687 neo4j:5-community

# Same code works for both!

Getting Started

Week 1: Proof of Concept (Memgraph)

# Start Memgraph
docker run -d -p 7687:7687 memgraph/memgraph:latest

# Test integration
npm install neo4j-driver

# Run sync example (from Graph Integration guide)
# Verify queries work, measure performance

Week 2-3: Development

# Continue with Memgraph
# Build out sync logic
# Test all Cortex entities (contexts, facts, A2A)
# Optimize queries

Week 4: Production Prep

# Switch to Neo4j for stability
docker run -d -p 7687:7687 \
-v neo4j-data:/data \
neo4j:5-community

# Same sync code works!
# Add backup scripts
# Configure monitoring

Month 2+: Consider Premium

// If spending >10 hours/month on graph DB ops
// Or need enterprise features
// Upgrade to Cortex Graph-Premium

const cortex = new Cortex({
mode: "cloud",
apiKey: process.env.CORTEX_CLOUD_KEY,
graphPremium: { enabled: true },
});

// Zero DevOps, full features ✅

Technical Comparison

Query Language

Cypher (Neo4j, Memgraph, Kùzu, Neptune OpenCypher):

// Easy to read and write
MATCH (a:Agent)-[:SENT_TO*1..3]-(b:Agent)
WHERE a.team = 'engineering'
RETURN b.name, count(*) as connections
ORDER BY connections DESC

Gremlin (JanusGraph, Neptune Gremlin):

// More verbose, functional style
g.V().hasLabel('Agent').has('team', 'engineering')
.repeat(out('SENT_TO')).times(3)
.dedup()
.groupCount().by('name')
.order().by(values, desc)

Recommendation: Cypher is more intuitive and readable. Prefer Cypher-based DBs.

Ecosystem

Neo4j:

  • 50K+ questions on Stack Overflow
  • 200+ books and courses
  • Official certification program
  • Neo4j Bloom (graph visualization)
  • APOC library (procedures)
  • Graph Data Science library

Memgraph:

  • Growing community (~2K GitHub stars)
  • Good official documentation
  • Active Discord community
  • Memgraph Lab (visualization)
  • MAGE library (algorithms)

Kùzu:

  • Small community (~500 GitHub stars)
  • Limited third-party resources
  • Active development (when not archived)

JanusGraph:

  • Medium community (~5K stars)
  • TinkerPop ecosystem
  • Mostly Java-centric

Winner: Neo4j (by far the largest ecosystem and resources)

Next Steps

Ready to integrate? Follow the setup guide:

Or go managed:

  • Cortex Graph-Premium - Sign up at cortex.cloud

Questions? Ask in GitHub Discussions or Discord.