Building an Incremental LLM Pipeline to Transform Meeting Notes into a Knowledge Graph with CocoIndex and Neo4j

Author: Nino, Senior Tech Editor

In the fast-paced world of tech, meeting notes are where critical institutional knowledge goes to die. They are scattered across Google Docs, buried in Notion pages, or lost in Slack threads. Recently, we decided to solve this by building an automated, open-source pipeline that converts these unstructured notes into a live Knowledge Graph. The result? Our LinkedIn post documenting the process exploded with over 200,000 impressions. Developers and enterprises are hungry for a way to make their data actionable without breaking the bank on LLM costs.

The Problem: Why Traditional Pipelines Fail

Most RAG (Retrieval-Augmented Generation) systems rely on vector databases. While vector search is great for finding similar text, it struggles with complex relationships—like 'Which project did Sarah mention in the meeting three weeks ago that is also linked to the Q4 budget?' This is where a Knowledge Graph shines.

However, building a Knowledge Graph from a constantly updating source like Google Drive presents a major challenge: cost and efficiency. Traditional pipelines are 'stateless': whenever a single file changes, they re-process the entire directory. If you have 1,000 documents and edit one, you pay an LLM to extract entities from all 1,000 again. To solve this, we integrated n1n.ai for high-speed LLM processing and used CocoIndex for incremental indexing. With n1n.ai handling the extraction layer, processing stayed stable and cost-efficient, so the Knowledge Graph can keep growing without its processing costs growing with it.

The Architecture of an Incremental Knowledge Graph

The pipeline consists of four main components:

  1. Data Connector: Monitors Google Drive for new or modified files.
  2. Incremental Processor (CocoIndex): Tracks the hash of every document. It only triggers the LLM for files that have actually changed.
  3. LLM Extraction Layer (n1n.ai): Uses advanced models to identify entities (People, Projects, Decisions) and relationships (Works_On, Decided_In).
  4. Graph Database (Neo4j): Stores the structured data as a queryable Knowledge Graph.
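As a rough sketch of component 4, extracted entities and relationships can be written to Neo4j with idempotent MERGE statements, so re-processing a file never duplicates nodes. The query-builder functions and connection details below are our own illustration, not CocoIndex's API:

```python
def merge_entity(label: str) -> str:
    """Build an idempotent Cypher MERGE for a single entity node.

    The label is interpolated (Cypher cannot parameterize labels);
    the entity name is passed separately as a query parameter.
    """
    return f"MERGE (n:{label} {{name: $name}}) RETURN n"


def merge_relationship(src_label: str, rel: str, dst_label: str) -> str:
    """Build a MERGE that links two existing entities by name."""
    return (
        f"MATCH (a:{src_label} {{name: $src}}), (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )


# Usage with the official neo4j driver (credentials are placeholders):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     session.run(merge_entity("Person"), name="Alice")
#     session.run(merge_relationship("Person", "WORKS_ON", "Project"),
#                 src="Alice", dst="Q4 Budget")
```

MERGE (rather than CREATE) is what makes the write path safe to re-run: an incremental pipeline will inevitably see the same entity twice.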

Step-by-Step Implementation

To build your own Knowledge Graph, you first need to define your schema. A typical meeting-focused Knowledge Graph includes nodes for Person, Meeting, Decision, and Topic.
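One way to pin that schema down is in code, so the extraction prompt and the graph writer agree on the allowed labels. The label and relationship names below follow this article's examples; the dataclass shapes are our own illustration:

```python
from dataclasses import dataclass

# Node labels and relationship types the LLM is allowed to emit.
NODE_LABELS = {"Person", "Meeting", "Decision", "Topic"}
REL_TYPES = {"WORKS_ON", "DECIDED_IN", "DECIDED", "PART_OF"}


@dataclass
class Entity:
    label: str   # must be one of NODE_LABELS
    name: str    # display name, used as the merge key in Neo4j


@dataclass
class Relationship:
    src: str     # name of the source entity
    rel: str     # must be one of REL_TYPES
    dst: str     # name of the target entity


def is_valid_entity(entity: Entity) -> bool:
    """Reject hallucinated labels before they reach the graph."""
    return entity.label in NODE_LABELS
```

Validating extractions against a closed set of labels is a cheap guard against the LLM inventing node types that would fragment the graph.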

1. Setting up the LLM Client

We recommend using n1n.ai because it aggregates the best models (like GPT-4o or Claude 3.5 Sonnet) into a single, high-performance API. This is critical for the Knowledge Graph extraction phase where consistency is key.

import openai

# Configure the client to use n1n.ai aggregator
client = openai.OpenAI(
    api_key="YOUR_N1N_API_KEY",
    base_url="https://api.n1n.ai/v1"
)

def extract_graph_data(text):
    """Ask the LLM to extract entities and relationships from raw meeting notes."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # deterministic output keeps entity names consistent across runs
        messages=[
            {"role": "system", "content": "Extract entities and relationships for a Knowledge Graph..."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

2. Incremental Logic

The 'magic' happens in the incremental check: by processing only the changed documents, you can cut Knowledge Graph maintenance costs by up to 90% when only a small fraction of the corpus changes between runs.
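The check itself can be sketched with nothing but the standard library. CocoIndex manages this state for you internally; the JSON state file below is purely illustrative of the underlying idea, a content hash per document:

```python
import hashlib
import json
from pathlib import Path

STATE_FILE = Path("processed_hashes.json")  # illustrative; CocoIndex tracks state itself


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_documents(docs: dict[str, str]) -> list[str]:
    """Return the IDs of documents whose content changed since the last run."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    changed = [doc_id for doc_id, text in docs.items()
               if state.get(doc_id) != content_hash(text)]
    # Persist the new hashes so the next run only sees future edits.
    state.update({doc_id: content_hash(text) for doc_id, text in docs.items()})
    STATE_FILE.write_text(json.dumps(state))
    return changed
```

Only the IDs returned by `changed_documents` are sent to the LLM extraction layer; everything else is skipped, which is exactly where the cost savings come from.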

| Feature | Traditional Pipeline | Incremental Knowledge Graph |
| --- | --- | --- |
| Compute cost | High (re-processes everything) | Low (processes changes only) |
| Latency | Minutes/hours | Near real-time |
| LLM API usage | Redundant | Optimized via n1n.ai |
| Scalability | Cost grows with corpus size | Cost grows with change volume |

Pro Tip: Entity Disambiguation in a Knowledge Graph

One of the hardest parts of building a Knowledge Graph is ensuring 'John Doe' in Meeting A is the same 'John Doe' in Meeting B. We solved this by providing the LLM with a 'Global Context' from the existing Knowledge Graph. Before extraction, the pipeline queries Neo4j for existing entities, helping the LLM map new notes to the correct nodes in the Knowledge Graph.
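A minimal version of that context injection looks like the sketch below. The Cypher query and prompt wording are our own assumptions; adapt both to your schema:

```python
# Run against Neo4j before each extraction pass (query is an illustrative example).
KNOWN_ENTITY_QUERY = "MATCH (p:Person) RETURN p.name AS name"


def build_extraction_prompt(known_people: list[str], notes: str) -> str:
    """Prepend existing graph entities so the LLM reuses canonical names
    instead of minting near-duplicates like 'J. Doe' vs 'John Doe'."""
    context = ", ".join(sorted(known_people)) or "none yet"
    return (
        f"Known people already in the Knowledge Graph: {context}.\n"
        "When a note mentions one of them, use the exact known name.\n\n"
        f"Meeting notes:\n{notes}"
    )
```

For large graphs, sending every known entity quickly blows the context window; a practical refinement is to include only entities that fuzzily match strings in the current note.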

Why the Knowledge Graph Went Viral

The reason this project garnered 200,000 impressions is simple: it moves beyond the hype of 'Chat with your PDF.' It provides a structured, enterprise-grade way to visualize corporate memory. When a new employee joins, they don't need to read 500 docs; they can query the Knowledge Graph to see the evolution of a project.

Using a Knowledge Graph allows for complex Cypher queries like:

MATCH (p:Person {name: 'Alice'})-[:DECIDED]->(d:Decision)<-[:PART_OF]-(m:Meeting)
RETURN d.description, m.date

This level of precision is out of reach for standard vector RAG, making the Knowledge Graph a uniquely powerful tool for organizational intelligence.

Conclusion

Building a live Knowledge Graph is no longer a luxury for big tech. By combining open-source tools like CocoIndex and Neo4j with the robust LLM infrastructure provided by n1n.ai, any developer can build a viral-worthy data pipeline. The shift from static documents to an evolving Knowledge Graph is the next frontier in AI-driven productivity.

Get a free API key at n1n.ai