Beyond RAG: Building an AI Companion with Deep Memory using Knowledge Graphs
By Nino, Senior Tech Editor
In the current landscape of Large Language Model (LLM) development, we often talk about RAG (Retrieval-Augmented Generation) as the gold standard for giving AI access to external data. However, as AI transitions from generic chatbots to deeply personalized companions, the limitations of standard Vector RAG are becoming apparent.
I recently faced a challenge: my wife uses LLMs as a therapist, life coach, and sounding board. Over a year, she built a 35,000-token 'Master Prompt' in Notion containing her medical history, emotional triggers, and life goals. Copying this into every new chat was a 'context tax' that made the AI feel less like a companion and more like a forgetful assistant. She didn't need a search engine; she needed a continuous brain.
To solve this, I built Synapse AI Chat. This architecture moves beyond simple vector similarity to create a "Deep Memory" system using Knowledge Graphs. By leveraging high-performance APIs like those found on n1n.ai, developers can implement similar architectures that understand causality, not just keywords.
The Problem with Vector-Only RAG
Most AI memory systems today rely on Vector RAG. You chunk text, convert it into embeddings (vectors), and retrieve chunks based on semantic similarity. While this is efficient for finding a specific clause in a 500-page PDF, it fails at modeling human life.
Vectors find similarity, but Knowledge Graphs find structure. If a user says, "Project A caused me stress, which led to a burnout," a vector search might find 'Project A' and 'burnout' separately. A Knowledge Graph, however, understands the relationship: Project A -> CAUSED -> Stress -> RESULTED_IN -> Burnout.
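This difference can be made concrete with a toy example. The sketch below uses plain Python dicts as a stand-in for a real graph store like Neo4j; the entity and relation names come from the scenario above, everything else is illustrative:

```python
# A toy knowledge graph: each node maps to a list of (relation, target) edges.
# In production this would live in Neo4j; dicts keep the sketch self-contained.
GRAPH = {
    "Project A": [("CAUSED", "Stress")],
    "Stress": [("RESULTED_IN", "Burnout")],
}

def trace_causal_chain(entity: str, graph: dict) -> list[str]:
    """Follow outgoing edges from an entity, collecting the causal path."""
    path = [entity]
    while graph.get(entity):
        relation, target = graph[entity][0]
        path.append(f"-{relation}-> {target}")
        entity = target
    return path

print(" ".join(trace_causal_chain("Project A", GRAPH)))
# Project A -CAUSED-> Stress -RESULTED_IN-> Burnout
```

A vector index would return the "Project A" and "burnout" chunks as unrelated hits; the traversal recovers the chain between them in one hop-by-hop walk.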
The Architecture: Body vs. Brain
To build a system that can handle this complexity, I split the project into two distinct parts:
- The Body (Frontend): Built with React 19 and Convex. Convex handles real-time data syncing, ensuring the chat feels snappy and responsive.
- The Cortex (Brain): A Python FastAPI backend that handles the heavy lifting of graph indexing and entity extraction.
For the models, I used the Gemini family for its massive context windows, but for production stability and high-speed access to multiple models (like Claude 3.5 Sonnet or DeepSeek-V3), an aggregator like n1n.ai is highly recommended to avoid single-provider downtime.
Phase 1: Context Hydration
When the user starts a session, we don't just send a blank prompt. We "hydrate" the system prompt with a natural language summary of the Knowledge Graph.
| Feature | Vector RAG | Knowledge Graph RAG |
|---|---|---|
| Data Structure | Unstructured Chunks | Structured Entities & Relations |
| Search Logic | Semantic Similarity | Relational Traversal |
| Context | Localized Snippets | Holistic Narrative |
| Best Use Case | Documentation Search | Personal CRM / Life Coaching |
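The hydration step described above can be sketched in a few lines. The function and section header below are illustrative, not taken from the actual Synapse codebase; the point is simply that the compiled graph summary is prepended to the persona prompt before the first turn:

```python
BASE_PERSONA = "You are a supportive, long-term companion. Use the user manual below."

def hydrate_system_prompt(graph_summary: str, persona: str = BASE_PERSONA) -> str:
    """Prepend the flattened Knowledge Graph ('user manual') to the persona prompt."""
    if not graph_summary:
        return persona  # Cold start: no memory has been consolidated yet.
    return (
        f"{persona}\n\n"
        "#### USER MANUAL (compiled from Knowledge Graph) ####\n"
        f"{graph_summary}"
    )

prompt = hydrate_system_prompt("Allergic to penicillin. Project A -CAUSED-> Stress.")
```

Because the summary is natural language rather than raw triples, the model treats it as background knowledge instead of a retrieval result to quote back.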
Phase 2: Memory Consolidation (The "Nap" Phase)
Memory isn't updated in real-time during the conversation because graph extraction is computationally expensive and slow (60-200 seconds). Instead, we use a "Consolidation" phase. When the user stops chatting, the system "takes a nap" to process the transcript.
We use a high-reasoning model (like Gemini 1.5 Pro or OpenAI o1 via n1n.ai) to extract entities. The logic follows a specific pattern:
- Identify new entities (e.g., "Medication Y").
- Identify state changes (e.g., "Stopped Medication X").
- Update relationships.
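The three extraction steps above can be sketched as a small state machine. The `GraphOp` schema and `apply_ops` helper are illustrative (in the real system, Graphiti and the reasoning model handle this); they show how "new entity", "state change", and "relationship" operations fold into memory:

```python
from dataclasses import dataclass

@dataclass
class GraphOp:
    """One mutation extracted from a transcript during the 'nap' phase."""
    kind: str     # "add_entity" | "update_state" | "add_relation"
    subject: str  # The entity the operation applies to.
    detail: str   # New state, or the relation being added.

def apply_ops(memory: dict, ops: list[GraphOp]) -> dict:
    """Fold extracted operations into the in-memory graph state."""
    for op in ops:
        if op.kind == "add_entity":
            memory.setdefault(op.subject, {"state": op.detail, "relations": []})
        elif op.kind == "update_state":
            memory[op.subject]["state"] = op.detail
        elif op.kind == "add_relation":
            memory[op.subject]["relations"].append(op.detail)
    return memory

memory = apply_ops({}, [
    GraphOp("add_entity", "Medication Y", "started"),
    GraphOp("add_entity", "Medication X", "active"),
    GraphOp("update_state", "Medication X", "stopped"),
])
```

The key design choice is that consolidation emits *operations* rather than a full rewrite of the graph, which keeps the nap phase idempotent and cheap to audit.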
Implementation: The Compilation Logic
Once the graph is updated in Neo4j (using the Graphiti framework), we need to flatten it back into text for the LLM to read. Here is the Python logic used to format the graph into a "User Manual":
```python
def _format_compilation(definitions: list[str], relationships: list[str]) -> str:
    sections = []
    if definitions:
        sections.append(
            "#### 1. CONCEPTUAL DEFINITIONS & IDENTITY ####\n"
            "# (Understanding what these concepts mean specifically for this user)\n"
            + "\n".join(definitions)
        )
    if relationships:
        sections.append(
            "#### 2. RELATIONAL DYNAMICS & CAUSALITY ####\n"
            "# (How these concepts interact and evolve over time)\n"
            + "\n".join(relationships)
        )
    return "\n\n".join(sections) if sections else ""
```
Engineering for Reliability: The Retry Loop
When dealing with heavy graph processing, LLM APIs can occasionally return 503 errors. To handle this, I implemented an event-driven retry system with exponential backoff using Convex's internal scheduler.
```typescript
import { v } from "convex/values";
import { internal } from "./_generated/api";
import { internalAction } from "./_generated/server";

export const RETRY_DELAYS_MS = [
  0, // Immediate
  2 * 60_000, // +2 minutes
  10 * 60_000, // +10 minutes
  30 * 60_000, // +30 minutes
];

export const processJob = internalAction({
  args: { jobId: v.id("cortex_jobs") },
  handler: async (ctx, args) => {
    const job = await ctx.runQuery(internal.cortexJobs.get, { id: args.jobId });
    if (!job) return; // Job was deleted; nothing to process.
    try {
      await ingestGraphData(ctx, job.payload);
      await ctx.runMutation(internal.cortexJobs.complete, { jobId: args.jobId });
    } catch (error) {
      const nextAttempt = job.attempts + 1;
      if (nextAttempt < job.maxAttempts) {
        // Fall back to the 30-minute delay if we run past the table.
        const delay = RETRY_DELAYS_MS[nextAttempt] ?? 30 * 60_000;
        await ctx.scheduler.runAfter(delay, internal.processor.processJob, { jobId: args.jobId });
      }
    }
  },
});
```
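The Convex scheduler handles job-level retries; on the Python side of the Cortex, the same philosophy applies to individual LLM calls. Here is a hedged sketch of per-request exponential backoff with jitter (`TransientAPIError` and the flaky demo callable are illustrative stand-ins for whatever exceptions your actual client raises on a 503):

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 502, 503}

class TransientAPIError(Exception):
    """Placeholder for the transient errors your LLM client raises."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

def with_backoff(fn, max_attempts: int = 4, base_delay: float = 2.0):
    """Retry fn() on transient errors, doubling the delay and adding jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientAPIError as err:
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise  # Non-retryable, or out of attempts: surface the error.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)

# Demo: a callable that fails with 503 twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientAPIError(503)
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
```

The jitter term matters in practice: without it, a batch of consolidation jobs that failed together will all retry at the same instant and hit the same 503 again.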
Pro Tip: Human-in-the-Loop
AI memory shouldn't be a black box. I built a visualizer using react-force-graph so my wife can see her "brain." If the AI incorrectly identifies a relationship—for example, thinking she likes mushrooms when she actually hates them—she can manually edit the node. This builds trust and ensures the "Deep Memory" remains accurate over years of use.
Conclusion
Moving from horizontal AI (knowing a little about everything) to vertical AI (knowing everything about one person) requires a shift in architecture. While Vector RAG is great for retrieval, Knowledge Graphs provide the reasoning framework necessary for true companionship. By using stable API endpoints from n1n.ai, you can greatly reduce the chance that your memory consolidation jobs fail due to rate limits or provider instability.
Get a free API key at n1n.ai