You're tired of AI agents writing code that "just works" but still misses how your team actually builds things. They search too broadly, pick generic examples, and spend tokens exploring before they understand the shape of the repo.
`codebase-context` changes the first step. Start with a bounded conventions map that shows the architecture, dominant patterns, and strongest local examples. Then search for the exact file, symbol, or workflow you need.
Here's what codebase-context does:
**Starts with a bounded conventions map** - The first call shows architecture layers, active patterns, golden files, and suggested next calls, without dumping vendored repos, fixtures, generated output, or oversized entrypoint lists into the default surface.
**Finds the right local example** - Search does not just return code. Each result comes back with pattern signals, file relationships, and quality indicators so the agent can move from the map to the most relevant local example instead of wandering through raw hits.
**Knows what is current** - Conventions are detected from your code and git history, not only from rules you wrote. The map distinguishes what is common from what is rising or declining, and points at the files that best represent the current direction.
**Adds support signals when you need them** - Team memory and edit-readiness checks stay available, but as supporting context after the map and search have already narrowed the work.
Map first, search second, local-first throughout. Your code never leaves your machine by default.
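For intuition on how "common" and "current" can diverge, here is a minimal Python sketch. It is an illustration only, not the shipped implementation: the function name, the per-period usage counts, and the 5% threshold are all invented assumptions about how a trend could be derived from git history.

```python
# Illustrative sketch (not the tool's real code): classify a pattern as
# rising, declining, or stable by comparing its adoption share in older
# commits vs. recent commits.
def classify_trend(older_uses, older_total, recent_uses, recent_total,
                   threshold=0.05):
    """Return (trend, current_adoption_rate) for a detected pattern."""
    older_rate = older_uses / older_total if older_total else 0.0
    recent_rate = recent_uses / recent_total if recent_total else 0.0
    delta = recent_rate - older_rate
    if delta > threshold:
        trend = "rising"
    elif delta < -threshold:
        trend = "declining"
    else:
        trend = "stable"
    return trend, recent_rate

# A pattern in 4 of 20 older files but 9 of 10 recent ones is "current",
# even though it is not yet the most common form in the repo overall.
print(classify_trend(4, 20, 9, 10))  # ('rising', 0.9)
```

The point of the sketch is the distinction itself: a pattern can dominate by raw count while the team is actively moving away from it, and recency-weighted adoption is one way to surface that.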
See the [current discovery benchmark](https://github.com/PatrickSys/codebase-context/blob/master/docs/benchmark.md) for the checked-in discovery-only proof. The gate is still `pending_evidence`, and `claimAllowed` remains `false`.
### What it looks like
When the agent searches with edit intent, it gets a compact decision card: confidence, whether it's safe to proceed, which patterns apply, the best example, and which files are likely to be affected.
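As a rough sketch of that idea, the decision card can be modeled as a small structure gating on evidence coverage. The field names (`coverage`, `safeToProceed`, `whatWouldHelp`) and the thresholds below are illustrative assumptions, not the tool's actual schema.

```python
# Hypothetical decision-card builder: if a symbol has callers that never
# appeared in the search results, the card exposes that coverage gap and
# suggests what to search for before editing.
def build_decision_card(total_callers, callers_in_results, confidence):
    coverage = callers_in_results / total_callers if total_callers else 1.0
    card = {
        "confidence": confidence,
        "coverage": coverage,
        "safeToProceed": confidence >= 0.7 and coverage >= 0.75,
    }
    if not card["safeToProceed"]:
        # List the follow-up searches to run before touching anything.
        missing = total_callers - callers_in_results
        card["whatWouldHelp"] = [f"search for the {missing} uncovered caller(s)"]
    return card

card = build_decision_card(total_callers=4, callers_in_results=2, confidence=0.8)
print(card["safeToProceed"])  # False: only 2 of 4 callers are covered
```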
More CLI examples in [`docs/cli.md`](./docs/cli.md). Full walkthrough: [demo.md on GitHub](https://github.com/PatrickSys/codebase-context/blob/master/docs/demo.md).
## Quick Start
## First Use
Get a conventions map of your codebase before exploring or editing:
```bash
# See your codebase conventions — architecture layers, patterns, golden files
```

Your AI agent uses the same map via the `codebase://context` MCP resource on first call.
## Common First Commands
Three commands to understand a repo before you edit it:
```bash
# What are the main conventions and best examples?
```
## docs/benchmark.md
From `results/gate-evaluation.json`:

- `claimAllowed`: `false`
- `totalTasks`: `24`
- `averageUsefulness`: `0.75`
- `averageEstimatedTokens`: `1827.0833`
- `bestExampleUsefulnessRate`: `0.125`
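For intuition, the averaged fields above can be derived from per-task discovery results along these lines. This is a hedged sketch with made-up task data, not the benchmark's real 24 tasks or its exact scoring code.

```python
# Illustrative aggregation: each task contributes a usefulness score, an
# estimated token payload, and whether its best example was actually useful.
def summarize(tasks):
    n = len(tasks)
    return {
        "totalTasks": n,
        "averageUsefulness": sum(t["usefulness"] for t in tasks) / n,
        "averageEstimatedTokens": sum(t["estimatedTokens"] for t in tasks) / n,
        "bestExampleUsefulnessRate": sum(t["bestExampleUseful"] for t in tasks) / n,
    }

tasks = [
    {"usefulness": 1.0, "estimatedTokens": 1800, "bestExampleUseful": True},
    {"usefulness": 0.5, "estimatedTokens": 1900, "bestExampleUseful": False},
]
print(summarize(tasks)["averageUsefulness"])  # 0.75
```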
Repo-level outputs from the same rerun:
The gate is intentionally still blocked:
- The combined suite covers both public repos.
- `claimAllowed` remains `false` because comparator evidence still does not support a benchmark-win claim.
- Two comparator artifacts now return `status: "ok"`, but that does not yet close the gate:
  - `raw Claude Code` still leaves the baseline `pending_evidence` because `averageFirstRelevantHit` is `null`.
  - `codebase-memory-mcp` now has real current metrics, but the gate still marks it `failed` on the frozen tolerance rule.
- Three comparator lanes still fail setup entirely: `GrepAI`, `jCodeMunch`, and `CodeGraphContext`.
## Comparator Reality
The current comparator artifact records incomplete comparator evidence, not benchmark wins.
| Comparator | Status | Current reason |
| --- | --- | --- |
| `codebase-memory-mcp` | comparator artifact: `ok`; gate: `failed` | Runs through the repaired graph-backed path and now records real metrics (`averageUsefulness: 0.1875`, `averageFirstRelevantHit: 1.2857`, `bestExampleUsefulnessRate: 0.5`), but the frozen gate still fails it on the required usefulness comparisons. |
| `raw Claude Code` | comparator artifact: `ok`; gate: `pending_evidence` | The explicit Haiku CLI runner now returns current metrics (`averageUsefulness: 0.0278`, `averageEstimatedTokens: 32.1667`), but the baseline still lacks `averageFirstRelevantHit`, so the gate keeps this lane as missing evidence. |
`CodeGraphContext` remains part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.
## Important Limitations
- This benchmark measures discovery usefulness and payload cost only.
- It does not measure implementation correctness, patch quality, or end-to-end task completion.
- Comparator setup remains environment-sensitive, and the checked-in comparator outputs still do not satisfy the frozen claim gate.
- The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness.
- `averageFirstRelevantHit` remains `null` in the current gate output, which is enough to keep the raw-Claude baseline in `pending_evidence`.
## docs/capabilities.md
# Capabilities Reference
Technical reference for what `codebase-context` ships today. The public product posture is map first, find second: the bounded conventions map is the first-call surface, and search narrows to the right local example after that. For the user-facing overview, see [README.md](../README.md).
## Transport Modes
Reproducible evaluation is shipped as a CLI entrypoint backed by shared scoring:
- **Retrieval metrics:** Top-1 accuracy, Top-3 recall, spec contamination rate, and a gate pass/fail
- **Discovery metrics:** usefulness score, payload bytes, estimated tokens, first relevant hit, and best-example usefulness
- **Discovery gate:** discovery mode evaluates the frozen ship gate only when the full public suite and comparator metrics are available; missing comparator evidence is reported as pending, not silently treated as pass/fail
- **Current checked-in gate truth:** `results/gate-evaluation.json` remains `pending_evidence` with `claimAllowed: false`; the raw-Claude baseline still lacks `averageFirstRelevantHit`, `codebase-memory-mcp` still fails the frozen usefulness comparisons, and the remaining named lanes are still `setup_failed`
- **Limits:** discovery mode is discovery-only, uses current shipped surfaces only, and does not claim implementation quality; named competitor runs remain a documented hybrid/manual lane rather than a built-in automated benchmark
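The pending-vs-failed distinction in the discovery gate can be sketched as follows. This is a simplified model under stated assumptions: the real gate applies frozen tolerance rules per metric, and the function and argument names here are invented for illustration.

```python
# Illustrative gate rule: a lane with a missing required metric is reported
# as pending evidence rather than silently counted as a pass or a fail.
def lane_status(required_metrics, tolerance_ok):
    if any(value is None for value in required_metrics.values()):
        return "pending_evidence"
    return "ok" if tolerance_ok else "failed"

# raw-Claude-style baseline: averageFirstRelevantHit is null -> pending.
print(lane_status({"averageUsefulness": 0.0278,
                   "averageFirstRelevantHit": None}, tolerance_ok=True))
# memory-mcp-style lane: real metrics, but outside tolerance -> failed.
print(lane_status({"averageUsefulness": 0.1875,
                   "averageFirstRelevantHit": 1.2857}, tolerance_ok=False))
```

The design point is that "no evidence" and "losing evidence" stay distinct states, so a comparator that never produced metrics can never be read as a win.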
The conventions map - run this first on an unfamiliar repo. It shows architecture layers, active patterns with adoption rates and trend direction, and the golden files the team treats as the strongest examples. This is also what the MCP server delivers to AI agents via the `codebase://context` resource on first call, before search narrows to a specific local example.
## docs/client-setup.md
# Client Setup
Full setup instructions for each AI client. This guide is about transport and wiring, not a different product mode: each client gets the same bounded conventions map first and local-pattern discovery second. For the quick-start summary, see [README.md](../README.md).
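As a concrete sketch of that wiring, most MCP clients register a stdio server with a JSON block like the one below. The top-level key and config file location vary by client, so treat this as an assumed common shape rather than any client's canonical config:

```json
{
  "mcpServers": {
    "codebase-context": {
      "command": "npx",
      "args": ["-y", "codebase-context"]
    }
  }
}
```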
## docs/comparison-table.md

It is a setup-status table first, not a marketing scoreboard.
| Comparator | Intended role in gate | Current status | Evidence summary |
| --- | --- | --- | --- |
| `raw Claude Code` | Baseline for payload cost and at least one usefulness comparison | comparator artifact: `ok`; gate: `pending_evidence` | The Haiku-backed Claude CLI runner now returns current payloads, but the checked-in baseline still has `averageFirstRelevantHit: null`, so the gate still records missing baseline metrics. |
| `GrepAI` | Named MCP comparator | `setup_failed` | Requires the GrepAI binary plus a local Ollama embedding setup that is not present in this proof environment. |
| `jCodeMunch` | Named MCP comparator | `setup_failed` | The MCP server still closes on startup during the current rerun, so no comparable discovery metrics were produced. |
| `codebase-memory-mcp` | Named MCP comparator | comparator artifact: `ok`; gate: `failed` | The repaired graph-backed runner now produces real current metrics, but the frozen gate still fails this lane because `codebase-context` does not stay within tolerance on every required usefulness metric. |
| `CodeGraphContext` | Graph-native comparator in the relaunch frame | `setup_failed` | The MCP server still closes on startup during the current rerun, so this lane remains missing evidence. |
## Reading This Table
- `setup_failed` means the lane was attempted and did not reach a credible metric-producing state.
- `pending_evidence` in the gate means the lane is still missing one or more required metrics.
- `failed` in the gate means the lane has real metrics, but the frozen comparison rule still does not pass.
- A missing metric is not treated as a win for `codebase-context`.
- The combined gate in `results/gate-evaluation.json` remains `pending_evidence`, and `claimAllowed` stays `false`, until these lanes produce real metrics.
## Current codebase-context result
For reference, the current combined discovery output across `angular-spotify` an…