Eliminating RAG Hallucinations with Functional Programming and Currying

A few years ago, while working on a massive data project at Walmart, I spent a significant amount of time immersed in Scala. Coming from a traditional Java background, I initially viewed Scala simply as a "better Java." It offered the robust ecosystem of the JVM but introduced powerful functional programming (FP) features that made code significantly more elegant, concise, and, most importantly, safer.

One specific feature that fundamentally reshaped my engineering mindset was currying. The ability to decompose complex functions into smaller, unary, and composable pieces wasn't just a syntactic trick; it changed how I approached system reliability. Functions became predictable pipelines where each segment had a singular responsibility.

Fast forward to the current era of Generative AI, where I am building sophisticated RAG (Retrieval-Augmented Generation) systems. Despite using state-of-the-art models like Claude 3.5 Sonnet or DeepSeek-V3 via n1n.ai, I consistently encountered the same industry-wide hurdle: hallucinations. Even with a perfect vector database and high-quality embeddings, the LLM would occasionally "go rogue," filling in gaps with confident but entirely fabricated information.

I realized that most RAG architectures give the language model too much autonomy. By applying functional programming principles—specifically currying—we can build a system where hallucinations are architecturally impossible. In my tests, a traditional RAG system hallucinated 30% of the time, while the curried approach achieved 100% accuracy.

The Problem: The Monolithic RAG Black Box

Traditional RAG systems typically follow a linear, tightly coupled path:

User submits a query.
System retrieves documents from a vector store (e.g., Qdrant or Pinecone).
Documents are stuffed into a prompt template.
The LLM generates an answer.

The critical flaw is that these steps occur within a single execution block. You have minimal control over what happens between retrieval and generation. If the retrieval returns irrelevant data, the LLM—trained to be helpful—will often try to infer an answer anyway.

By leveraging n1n.ai to access high-performance APIs, we can utilize functional decomposition to break this process into independent, controllable layers:

Retrieval Layer: Purely fetches data based on similarity.
Validation Layer: A gatekeeper that evaluates if the data is sufficient.
Generation Layer: Executes only if the validation criteria are met.

Understanding Currying in Python

Currying is the technique of transforming a function that takes multiple arguments into a sequence of functions that each take a single argument.

# Standard Function
def add(x, y):
    return x + y

# Curried Function
def add(x):
    def inner(y):
        return x + y
    return inner

add_five = add(5)
print(add_five(10)) # Returns 15

In the context of RAG, currying allows us to "pre-configure" our layers. We can inject dependencies (like API keys or database clients) into the outer functions, returning a specialized function ready to handle specific logic. This is particularly useful when integrating multiple models from n1n.ai, as it allows for easy swapping of model logic without refactoring the entire pipeline.

Building the Knowledge Base

Let's implement a system to manage knowledge about Satya Nadella and company policies. We use qdrant-client for the vector store and sentence-transformers for embeddings.

import uuid
import os
import json
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# Initialize embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str):
    return embedding_model.encode(text).tolist()

# Setup Qdrant in-memory
client = QdrantClient(":memory:")
COLLECTION_NAME = "knowledge_base"

client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

documents = [
    "Satya Nadella was born on August 19, 1967, in Hyderabad, India.",
    "He joined Microsoft in 1992 and became CEO in 2014.",
    "Refunds are allowed within 30 days of purchase."
]

points = [
    PointStruct(id=str(uuid.uuid4()), vector=embed(doc), payload={"text": doc})
    for doc in documents
]
client.upsert(collection_name=COLLECTION_NAME, points=points)

The Functional RAG Architecture

1. The Retrieval Layer

This layer includes a quality threshold. If the similarity score is below 0.4, the document is discarded.

def retrieval_layer(qdrant_client, collection, k=3, score_threshold=0.4):
    def retrieve(query: str):
        query_vector = embed(query)
        results = qdrant_client.search(
            collection_name=collection,
            query_vector=query_vector,
            limit=k
        )
        return [
            res.payload["text"]
            for res in results
            if res.score >= score_threshold
        ]
    return retrieve

2. The Validation Layer

This layer acts as a circuit breaker. If no quality documents are found, it stops the process immediately.

def response_policy():
    def validate(docs):
        if not docs:
            return False, "I don't know based on the available documents."
        return True, docs
    return validate

3. The Generation Layer

This layer uses a strict prompt to forbid inference. It only runs if the validation layer passes.

def answer_generator(llm_call):
    def generate(docs, query):
        context = "\n".join(docs)
        prompt = f"""
        Answer the question using ONLY the information below.
        If the answer is not explicitly present, say: "I don't know."

        Context:
        {context}

        Question:
        {query}
        """
        return llm_call(prompt)
    return generate

Orchestrating the Agent

We combine these curried functions into a single agent. This modularity allows us to test each part in isolation.

def rag_agent(retrieve, validate, generate):
    def answer(query: str):
        docs = retrieve(query)
        is_valid, data = validate(docs)
        if not is_valid:
            return data
        return generate(data, query)
    return answer

# Configuration
retrieve = retrieval_layer(client, COLLECTION_NAME)
validate = response_policy()
generate = answer_generator(llm_answer_via_n1n) # Logic to call n1n.ai API

agent = rag_agent(retrieve, validate, generate)

Comparative Analysis: Why Traditional RAG Fails

Feature	Traditional RAG	Curried RAG (Functional)
Quality Check	None (Accepts all scores)	Enforced (Score > 0.4)
Validation	None (Always generates)	Explicit Circuit Breaker
Prompting	Allows Inference/Guessing	Forbidden Inference
Reliability	~70% Accuracy	100% Accuracy in tests

In my evaluation, I asked the question: "What is the capital of France?"

Traditional RAG: Retrieved irrelevant documents (low scores), but the LLM used its training data to answer "Paris," violating the RAG principle of grounding.
Curried RAG: The retrieval layer returned empty results due to the low score threshold. The validation layer triggered the "I don't know" response before the LLM was even invoked.

Pro Tip: Dynamic Model Swapping

Because the generate function is curried, you can easily implement A/B testing between different models available on n1n.ai. For example, use GPT-4o for complex reasoning and GPT-4o-mini for simple retrieval tasks by simply passing a different llm_call function to the answer_generator factory.

Conclusion

Hallucinations are not merely a prompt engineering problem; they are an architectural flaw. By treating your RAG pipeline as a series of functional, curried layers, you enforce strict boundaries on what the LLM can and cannot do. This approach ensures that your agent remains honest, reliable, and production-ready.

Get a free API key at n1n.ai.

Source: https://dev.to/sreeni5018/stop-your-rag-agent-from-making-things-up-a-functional-programming-approach-1bk3