# AIM

A local-first GraphRAG system that turns engineering context into a queryable, provenance-aware memory layer.

Run locally · Benchmark · Architecture · Slack ingest · Limitations
Most workplace RAG systems are document search with a chat box. AIM is built for questions where the answer is a relationship, not a paragraph:
- Which service did this incident affect?
- Who responded, and which team reported it?
- Which decision superseded the policy that caused the outage?
- What path connects a Slack message, a runbook, a service, and an owner?
AIM ingests operational context, extracts typed entities and relationships, stores them in Neo4j and Qdrant, and answers with evidence paths instead of flat snippets. The goal is not just retrieval. The goal is institutional memory that can explain where an answer came from.
| Capability | What it means |
|---|---|
| Live Slack ingest | Slack Events API -> signed FastAPI webhook -> extractor -> Neo4j + Qdrant in seconds. |
| GraphRAG + vector retrieval | Typed graph traversal is combined with semantic vector search. |
| Multi-hop reasoning | The agent decomposes a query, expands graph neighborhoods, scores paths, and synthesizes a grounded answer. |
| Provenance maps | Responses carry graph nodes, graph edges, source IDs, citations, and a reasoning trace. |
| Exact-incident guardrails | Direct incident questions answer from recorded facts or abstain; nearby incidents are not allowed to bleed in. |
| Local-first inference | The default path runs with Ollama and local embeddings. API-backed models are optional. |
| Demo-grade frontend | Next.js console with retrieved sources, streaming status, and a 3D knowledge nebula. |
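The exact-incident guardrail can be illustrated with a toy sketch. The names and the in-memory fact store below are hypothetical stand-ins for the actual Neo4j lookup; the point is only the answer-or-abstain behavior:

```python
import re

# Toy fact store keyed by incident ID (a stand-in for recorded graph facts).
FACTS = {
    "INC-2025-100": {"affected_service": "Auth Service", "lead": "Marcus"},
}

def answer_exact_incident(question: str) -> str:
    """Answer a direct incident question from recorded facts, or abstain."""
    match = re.search(r"INC-\d{4}-\d+", question)
    if not match:
        return "NOT_AN_INCIDENT_QUESTION"
    facts = FACTS.get(match.group(0))
    if facts is None:
        # Abstain instead of borrowing facts from nearby incidents.
        return "I don't have recorded facts for that incident."
    return f"{match.group(0)} affected {facts['affected_service']}; {facts['lead']} is leading."
```

The key design choice is the `None` branch: a near-miss incident ID never falls back to semantically similar incidents.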
| Live ingest | Streaming retrieval | Grounded answer |
|---|---|---|
| ![]() | ![]() | ![]() |
The important part: there is no nightly re-indexing job in this demo path. A Slack message arrives, AIM extracts entities and relationships, writes graph and vector records, and the next query can retrieve the new fact.
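That loop can be illustrated with a minimal in-memory sketch, with a list of triples and a document list standing in for the Neo4j and Qdrant writes (all names here are hypothetical):

```python
# In-memory stand-ins for the graph store and vector store.
graph_edges: list[tuple[str, str, str]] = []   # (subject, relation, object)
documents: list[str] = []                       # stand-in for vector records

def ingest(message: str, triples: list[tuple[str, str, str]]) -> None:
    """Write graph and 'vector' records as soon as a message arrives."""
    graph_edges.extend(triples)
    documents.append(message)

def query_edges(subject: str) -> list[tuple[str, str, str]]:
    """Graph lookup: every recorded edge starting at `subject`."""
    return [t for t in graph_edges if t[0] == subject]

# A Slack message lands; the very next query can see the new fact.
ingest(
    "INC-2025-100 was caused by the Auth Service rate limiter.",
    [("INC-2025-100", "CAUSED_BY", "Auth Service rate limiter")],
)
```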
## Benchmark

- Latest saved run: `docs/benchmarks/eval_report_after_teacher_bfs.md`
- Fixture: 34 gold-labeled items in `tests/eval/fixtures/ground_truth.yaml`
| System | Overall NDCG@10 | Multi-hop NDCG@10 | Multi-hop Path Acc | Multi-hop Citation | p50 Latency |
|---|---|---|---|---|---|
| `vector_only` | 0.344 | 0.460 | 0.000 | 0.393 | 6.2s |
| `graph_only` | 0.548 | 0.799 | 0.720 | 0.500 | 6.3s |
| `aim_full` | 0.659 | 0.836 | 0.839 | 0.363 | 29.1s |
What this supports:

- AIM beats `vector_only` by +37.7 percentage points on multi-hop NDCG.
- AIM beats `graph_only` on overall NDCG, multi-hop NDCG, and multi-hop path accuracy.
- `graph_only` still wins on citation precision and latency.
This is not a SOTA claim. It is evidence that AIM is more than a standard vector
RAG wrapper on this fixture. Full methodology, ablations, and per-category
tables are in BENCHMARKS.md.
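NDCG@10 is the headline metric above. For readers unfamiliar with it, here is a minimal, generic implementation of the standard formula; this is a reference sketch, not the project's eval code:

```python
import math

def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    # DCG = sum of rel_i / log2(i + 1) over the first k ranked results (1-indexed).
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    # Normalize by the DCG of the ideal (best possible) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; a relevant item buried lower in the list discounts the score logarithmically, which is why lifting path intermediates into the top 10 moves this metric.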
## Architecture

```text
Slack / Jira / Confluence
        |
        v
FastAPI signed webhooks
        |
        v
Ingest worker -> LLM extractor -> deduplicator -> Neo4j + Qdrant
```

```text
User query
    |
    v
FastAPI /query or /query/stream
    |
    v
LangGraph agent
  decomposer       -> sub-queries, intent, entity pairs
  graph_searcher   -> Neo4j hybrid search, path scoring, exact-incident checks
  vector_retriever -> Qdrant approximate nearest-neighbor retrieval
  mcp_fetcher      -> optional live tool/context fetch
  synthesizer      -> grounded answer, citations, provenance graph
    |
    v
Next.js frontend -> decision console, sources, knowledge nebula
```
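The agent flow can be sketched in plain Python. This is an illustrative stub with hypothetical state shapes and stub node bodies, not the LangGraph implementation:

```python
def decomposer(query: str) -> dict:
    # Splits the query into sub-queries and classifies intent (stubbed).
    return {"query": query, "sub_queries": [query], "intent": "lookup"}

def graph_searcher(state: dict) -> dict:
    # Stand-in for Neo4j hybrid search plus path scoring.
    return {**state, "paths": [("INC-2025-100", "AFFECTED", "Auth Service")]}

def vector_retriever(state: dict) -> dict:
    # Stand-in for Qdrant approximate nearest-neighbor retrieval.
    return {**state, "chunks": ["runbook excerpt"]}

def synthesizer(state: dict) -> dict:
    # Grounded answer assembled from whatever evidence the retrievers found.
    answer = f"{len(state['paths'])} path(s), {len(state['chunks'])} chunk(s) used."
    return {**state, "answer": answer}

def run_agent(query: str) -> dict:
    state = decomposer(query)
    # Sequential here; in LangGraph the retrievers fan out in parallel
    # and reducers merge their state updates.
    state = vector_retriever(graph_searcher(state))
    return synthesizer(state)
```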
Single-worker note: the compiled LangGraph and in-process token buckets are module-level singletons. Run one worker per process and scale horizontally behind a load balancer.
| Layer | Choice | Why |
|---|---|---|
| API | FastAPI | Async routes, signed webhooks, SSE streaming. |
| Agent orchestration | LangGraph | Stateful graph pipeline with reducers and parallel fan-out. |
| Knowledge graph | Neo4j 5.24 + APOC | Cypher path queries with fulltext and vector indexes nearby. |
| Vector store | Qdrant by default, Pinecone optional | Local-first by default; hosted option available. |
| LLM | Ollama-compatible local model by default | Keeps the full demo runnable without paid API keys. |
| API LLMs | Anthropic or OpenAI optional | Higher-quality synthesis path when keys are available. |
| Embeddings | nomic-embed-text | Local 768-dimensional embeddings. |
| Frontend | Next.js standalone | Production build can run with Node directly. |
| Cache and threads | Redis optional | Falls back to in-memory behavior when Redis is absent. |
| Webhook security | HMAC-SHA256 | Slack-style signing-secret verification with replay checks. |
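The HMAC verification follows Slack's signing scheme: the signature is an HMAC-SHA256 of `v0:<timestamp>:<body>` under the signing secret, prefixed with `v0=`, compared in constant time, with a timestamp tolerance window as the replay check. A minimal sketch (the function name is illustrative, not AIM's actual handler):

```python
import hashlib
import hmac
import time

def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, tolerance_s: int = 300) -> bool:
    # Replay check: reject requests whose timestamp is outside the window.
    if abs(time.time() - int(timestamp)) > tolerance_s:
        return False
    # Slack's v0 basestring: "v0:{timestamp}:{raw request body}".
    base = f"v0:{timestamp}:".encode() + body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(expected, signature)
```

Note that the raw request body must be hashed before any JSON parsing; re-serialized JSON will not reproduce the signature.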
## Run locally

Requires Python 3.12+, Node 22+, Neo4j 5.24+, Qdrant 1.11+, and either Ollama for local-LLM mode or an Anthropic/OpenAI API key.

Backend:

```bash
pip install -e ".[dev]"
cp .env.example .env
# Set NEO4J_PASSWORD at minimum.
# Start Neo4j on bolt://localhost:7687
# Start Qdrant on http://localhost:6333
# Start Ollama on http://localhost:11434/v1, or configure an API LLM provider.
python -m aim.scripts.seed_demo
uvicorn aim.main:app --workers 1 --port 8000
```

Frontend:
```bash
cd frontend
cp .env.local.example .env.local
npm install
npm run build
node .next/standalone/server.js
```

Open http://localhost:3000.
Do not use `next start` for this project. The frontend is configured for the Next.js standalone runner.
For a temporary public demo without committing secrets:

```bash
# Backend
cloudflared tunnel --url http://localhost:8000

# Frontend, in a separate terminal
cloudflared tunnel --url http://localhost:3000
```

Share the frontend tunnel URL. Use the backend tunnel URL for webhook providers:
https://<your-backend-tunnel>/webhooks/slack/events
Quick-tunnel URLs are temporary and change when cloudflared restarts. For a
permanent demo, use a named Cloudflare tunnel with a custom domain, or deploy
the Next.js frontend to a platform such as Vercel and host the backend
separately.
## Slack ingest

1. Create a Slack app.
2. Set `WEBHOOK_SLACK_SIGNING_SECRET` and `SLACK_BOT_TOKEN` in `.env`.
3. Expose the backend with a tunnel.
4. Set Slack Event Subscriptions to `https://<tunnel>/webhooks/slack/events`.
5. Subscribe to `message.channels`, reinstall the app, and invite the bot to a channel.
6. Post a relationship-explicit incident message:

   > INC-2025-100 was caused by the Auth Service rate limiter rejecting requests after the 10am deploy. Marcus from the SRE team is leading the rollback.

7. Ask AIM: "Which service did INC-2025-100 affect, and who is leading the response?"
Live extraction works best when messages include explicit relationship language: `caused by`, `impacted`, `owned by`, `approved by`, `reported by`, `leading`, or `superseded`. If the graph does not contain the edge, AIM is designed to answer narrowly or abstain instead of borrowing facts from nearby incidents.
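To see why explicit phrasing helps, here is a toy pattern-based extractor over trigger phrases like those above. The real system uses an LLM extractor; this sketch (patterns and relation names are invented for illustration) only shows how explicit relationship language maps cleanly onto typed edges:

```python
import re

# Toy trigger-phrase patterns -> relation types (hypothetical, not AIM's schema).
PATTERNS = [
    (r"(INC-\d{4}-\d+) was caused by (?:the )?(.+?)(?: rejecting| after|\.|$)", "CAUSED_BY"),
    (r"(\w+) from the (\w+) team is leading", "LEADING_FROM_TEAM"),
]

def extract_triples(message: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples found by the toy patterns."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in re.finditer(pattern, message):
            triples.append((m.group(1), relation, m.group(2)))
    return triples
```

When the phrasing is implicit ("the rate limiter thing again..."), neither a pattern nor a small local model extracts a reliable edge, which is exactly when AIM should abstain.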
You can also replay a signed Slack event locally:

```bash
python scripts/replay_slack_event.py
```

Run the test suite and the live benchmark:

```bash
pytest
PYTHONIOENCODING=utf-8 python scripts/eval_live.py --out eval_report.md
```

Targeted checks for incident guardrails and streaming:
```bash
pytest tests/unit/test_exact_incident_fast_path.py \
       tests/unit/test_extraction.py \
       tests/integration/test_streaming.py
```

The live benchmark runs `vector_only`, `graph_only`, and `aim_full` against the same fixture and reports NDCG, path accuracy, citation behavior, negative rejection, and p50 latency.
## Limitations

AIM is ready to demo and evaluate, but it is still a research-grade system. The core graph retrieval loop is working; the remaining work is larger evaluation, production hardening, and reducing dependence on small local models.
- The benchmark is intentionally transparent but small: 34 labeled questions. The next serious validation step is HotpotQA, MuSiQue, or 2WikiMultihopQA.
- Citation quality is the weakest measured area on the local model path. The graph often finds the right path, but the local synthesizer is not always disciplined about citing it.
- Deep multi-hop answers are not instant. The saved eval run has a p50 latency of 29.1 seconds for `aim_full`.
- Slack ingest has been exercised end-to-end. Jira and Confluence support are represented in the architecture but need real-workspace soak testing.
- Before production use, the security layer needs a deployment-specific pass for prompt injection, PII redaction, tenant access policy, retention, and audit logging.
The detailed roadmap is in LIMITATIONS.md.
The most important retrieval improvement is in `aim/agents/nodes/graph_searcher.py`: score boosting by path participation.
Multi-hop answers are often complete paths, not isolated nodes. Hybrid search can find an intermediate node as a 2-hop neighbor, but because the intermediate name does not always text-match the query, it can land outside NDCG@10. AIM boosts entities that appear on discovered paths and re-sorts the graph results. That lifts path intermediates into the top results without inventing new facts.
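A minimal sketch of the idea, with hypothetical result and path shapes (the real logic lives in `graph_searcher.py` and uses richer scoring):

```python
def boost_by_path_participation(results: list[tuple[str, float]],
                                paths: list[list[str]],
                                bonus: float = 0.3) -> list[tuple[str, float]]:
    """Boost entities that appear on any discovered path, then re-sort."""
    on_path = {node for path in paths for node in path}
    boosted = [(name, score + bonus if name in on_path else score)
               for name, score in results]
    # Re-sort so path intermediates can climb into the top-k cutoff.
    return sorted(boosted, key=lambda r: r[1], reverse=True)
```

Because the boost only reorders entities that were already retrieved or discovered on paths, it improves NDCG@10 without introducing facts that are not in the graph.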
This repository is structured so it can be shared publicly without exposing
local credentials or runtime state. Real API keys and deployment-specific
configuration belong in .env and frontend/.env.local; both are excluded from
version control. The committed example files document the required variables
without containing working secrets.
Recommended release checks:

```bash
npm --prefix frontend audit --audit-level=moderate
python -m pip_audit
python -m pytest -p no:cacheprovider tests/unit tests/eval -q
```

Built solo in April 2026 as an exploration of what graph-backed retrieval can add beyond vector RAG, and what an institutional-memory tool would need before it becomes useful inside an engineering organization.