Addressing Hallucinations in Generative AI Agents using Observability and Dual Memory Knowledge Graphs
Implementation repository for the paper:
Matharaarachchi et al., 2026 — Knowledge-Based Systems
Addressing Hallucinations in Generative AI Agents using Observability and Dual Memory Knowledge Graphs
https://www.sciencedirect.com/science/article/pii/S0950705126002121
This framework reduces hallucinations in agentic LLM systems using:
- Observability logging
- Diagnostics modules
- Root Cause Analysis (RCA)
- Knowledge-Based Verification (KBV)
- Human-In-the-Loop review (HIL)
- Dual Memory Knowledge Graph
  - Experience memory (successful traces)
  - Insight memory (failure explanations)
- Reasoning agents
  - ReAct
  - Reflexion
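To make the split between the two memories concrete, here is a minimal in-memory sketch. It is not the repository's Neo4j implementation: the names `MemoryEntry`, `DualMemory`, and `add_trace` are hypothetical and exist only for illustration. It assumes a successful trace is stored as experience and a failed one is stored as an insight carrying its failure explanation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryEntry:
    """One stored item. All field names here are illustrative assumptions."""
    query: str
    content: str
    kind: str  # "experience" or "insight"


@dataclass
class DualMemory:
    """Toy stand-in for the dual memory knowledge graph (no Neo4j involved)."""
    experiences: List[MemoryEntry] = field(default_factory=list)
    insights: List[MemoryEntry] = field(default_factory=list)

    def add_trace(self, query: str, trace: str, succeeded: bool,
                  explanation: str = "") -> MemoryEntry:
        if succeeded:
            # Successful traces are kept as reusable experience.
            entry = MemoryEntry(query, trace, "experience")
            self.experiences.append(entry)
        else:
            # Failures are kept as insights; fall back to the raw trace
            # when no explanation is available.
            entry = MemoryEntry(query, explanation or trace, "insight")
            self.insights.append(entry)
        return entry
```

In the actual framework both memories live in Neo4j and are retrieved through a vector index, so this sketch only mirrors the routing decision, not storage or retrieval.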
agents/ # ReAct + Reflexion agents (baseline + dual memory)
classifier/ # Intent / entity / attribute classification
diagnostics/ # RCA, KBV, HIL
eval/ # Experimental evaluation
knowledge_graph/ # Dual memory Neo4j KG
log_transformation/ # LangSmith → ReAct trace pipeline
scripts/ # End-to-end execution scripts
common/ # Shared config, models, logging
- Python 3.10+
- Neo4j (vector index support)
- LangSmith account
- Azure OpenAI configured as the LLM provider
python -m venv .venv
Activate it:
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Create a .env file following the .env.example file:
OPENAI_API_KEY=...
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=...
NEO4J_DATABASE=neo4j
LANGSMITH_PROJECT_ID=...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_DEPLOYMENT_NAME=...
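Before running the workflow it can help to confirm that none of these variables is missing. The snippet below is a small sketch, not part of the repository. It assumes every variable listed above is required and that the .env file has already been loaded into the process environment (for example by your shell or a dotenv loader). The name `missing_vars` is hypothetical.

```python
import os
from typing import List, Mapping

# Variable names copied from the .env listing above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
    "LANGSMITH_PROJECT_ID",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_vars()` with no arguments checks the current environment; an empty list means every variable is set.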
Below is the recommended end-to-end workflow.
python scripts/run_export.py
Output:
output/langsmith_runs_<...>.json
python scripts/run_format_trace.py
Output:
output/langsmith_runs_<...>.react.txt
python scripts/run_rca.py
Output:
output/langsmith_runs_<...>.rca.json
python scripts/run_kbv.py
Output:
output/langsmith_runs_<...>.kbv.json
python scripts/run_hil_streamlit.py
This launches the Streamlit interface for rating traces.
Output:
output/langsmith_runs_<...>.hil.json
python scripts/run_classify.py
Outputs:
output/langsmith_runs_<...>.classified.json
output/langsmith_runs_<...>.classified_insights.json
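The output files listed so far appear to share the stem of the exported runs file and differ only in suffix. Assuming that naming pattern holds, the sketch below derives every expected artifact path from the export path, which is handy for checking which stages have already run. The helper `artifact_paths` and the stage keys are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Dict

# Stage name -> suffix, taken from the output names listed in this workflow.
SUFFIXES = {
    "export": ".json",
    "react_trace": ".react.txt",
    "rca": ".rca.json",
    "kbv": ".kbv.json",
    "hil": ".hil.json",
    "classified": ".classified.json",
    "classified_insights": ".classified_insights.json",
}


def artifact_paths(export_file: str) -> Dict[str, Path]:
    """Map each stage to its expected output path, given the export file."""
    export = Path(export_file)
    if export.suffix != ".json":
        raise ValueError(f"expected a .json export file, got {export.name!r}")
    stem = export.name[: -len(".json")]  # file name without the .json suffix
    return {stage: export.with_name(stem + suffix)
            for stage, suffix in SUFFIXES.items()}
```

For example, `artifact_paths("output/langsmith_runs_demo.json")["rca"]` gives `output/langsmith_runs_demo.rca.json`, where `demo` is a made-up placeholder for the elided part of the real file name.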
Make sure Neo4j is running.
python scripts/run_insert_obs.py
This:
- Creates vector index
- Inserts embeddings
- Stores experience + insight memory
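The observability and memory-building steps up to this point can be chained from a single driver. The following is a sketch under a few assumptions: the scripts take no command-line arguments (as in the commands shown here), they are run from the repository root, and each one exits with a non-zero code on failure. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

# Script order follows the workflow described in this README.
PIPELINE = [
    "scripts/run_export.py",
    "scripts/run_format_trace.py",
    "scripts/run_rca.py",
    "scripts/run_kbv.py",
    "scripts/run_hil_streamlit.py",
    "scripts/run_classify.py",
    "scripts/run_insert_obs.py",
]


def build_commands(python: str = sys.executable,
                   skip: Sequence[str] = ()) -> List[List[str]]:
    """Build one command per script, leaving out any script listed in `skip`."""
    return [[python, script] for script in PIPELINE if script not in skip]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run the commands in order and stop at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Because the HIL step opens an interactive Streamlit interface, an unattended run would likely pass `skip=["scripts/run_hil_streamlit.py"]` to `build_commands` and perform the rating separately.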
python scripts/run_react_agent.py
python scripts/run_reflexion_agent.py
Evaluation compares:
- ReAct
- Reflexion
- ReAct + Dual Memory
- Reflexion + Dual Memory
Metrics:
- Exact match accuracy
- Relevancy
- Faithfulness
- Consistency
- Latency
- Cost
By default, results are written to CSV files in eval/data.
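As an illustration of the first metric, exact match accuracy can be computed as the fraction of predictions that equal their reference answer. The sketch below assumes a light normalization (lowercasing and collapsing whitespace); the scripts in eval/ may normalize differently, and the function names are hypothetical.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match_accuracy(predictions: Sequence[str],
                         references: Sequence[str]) -> float:
    """Fraction of predictions that match their reference after normalizing."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)
```

The other metrics (relevancy, faithfulness, consistency, latency, cost) are not covered by this sketch.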
If you use this repository, please cite:
@article{matharaarachchi2026addressing,
  title={Addressing Hallucinations in Generative AI Agents using Observability and Dual Memory Knowledge Graphs},
  author={Matharaarachchi, Amali and Moraliyage, Harsha and Mills, Nishan and Gamage, Gihan and De Silva, Daswin and Manic, Milos},
  journal={Knowledge-Based Systems},
  pages={115469},
  year={2026},
  publisher={Elsevier}
}