Commit db80888

Merge pull request #114 from PatrickSys/launch-readiness-safer-edits
feat: consolidate launch readiness improvements
2 parents 0458be8 + 77ae70b commit db80888


61 files changed: +4871, -702 lines

.github/workflows/publish-npm-on-release.yml

Lines changed: 2 additions & 2 deletions

@@ -8,9 +8,9 @@ on:
   workflow_dispatch:
     inputs:
       tag:
-        description: 'Tag to publish (e.g. v1.6.2)'
+        description: 'Tag to publish (e.g. v2.2.0)'
         required: true
-        default: 'v1.6.2'
+        default: 'v2.2.0'
 
 permissions:
   contents: read

.release-please-manifest.json

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
 {
-  ".": "1.10.0"
+  ".": "2.2.0"
 }

CHANGELOG.md

Lines changed: 13 additions & 1 deletion

@@ -1,6 +1,18 @@
 # Changelog
 
-## Unreleased
+## [2.2.0](https://github.com/PatrickSys/codebase-context/compare/v1.10.0...v2.2.0) (2026-04-17)
+
+### Features
+
+* relaunch around a bounded conventions map and local-pattern discovery for `map + find`
+* add explicit full-map resources while keeping the default first-call map bounded and action-oriented
+* align public proof surfaces to the discovery-only benchmark posture (`pending_evidence`, `claimAllowed: false`)
+
+### Bug Fixes
+
+* make the packaged README tarball-safe by sending benchmark, demo, motivation, and contributing links to stable GitHub URLs
+* quarantine historical v1.8.x launch-planning docs so they no longer read as current release guidance
+* stop the built CLI entrypoint from eagerly importing MCP server runtime modules before CLI subcommand dispatch
 
 ## [1.10.0](https://github.com/PatrickSys/codebase-context/compare/v1.9.0...v1.10.0) (2026-04-14)

README.md

Lines changed: 25 additions & 25 deletions

@@ -1,26 +1,26 @@
 # codebase-context
 
-## Stop paying for AI agents to explore your codebase. codebase-context pre-maps the architecture, conventions, and team memory so they don't have to.
+## Map your team's conventions before your AI agent starts searching.
 
-[![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](./package.json)
+[![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](https://github.com/PatrickSys/codebase-context/blob/master/package.json)
 
-You're tired of AI agents writing code that 'just works' but fits like a square peg in a round hole - not your conventions, not your architecture, not your repo. Even with well-curated instructions. You correct the agent, it doesn't remember. Next session, same mistakes.
+You're tired of AI agents writing code that "just works" but still misses how your team actually builds things. They search too broadly, pick generic examples, and spend tokens exploring before they understand the shape of the repo.
 
-This MCP gives agents _just enough_ context so they match _how_ your team codes, know _why_, and _remember_ every correction.
+`codebase-context` changes the first step. Start with a bounded conventions map that shows the architecture, dominant patterns, and strongest local examples. Then search for the exact file, symbol, or workflow you need.
 
 Here's what codebase-context does:
 
-**Finds the right context** - Search that doesn't just return code. Each result comes back with analyzed and quantified coding patterns and conventions, related team memories, file relationships, and quality indicators. It knows whether you're looking for a specific file, a concept, or how things wire together - and filters out the noise (test files, configs, old utilities) before the agent sees them. The agent gets curated context, not raw hits.
+**Starts with a bounded conventions map** - The first call shows architecture layers, active patterns, golden files, and next calls without dumping vendored repos, fixtures, generated output, or oversized entrypoint lists into the default surface.
 
-**Knows your conventions** - Detected from your code and git history, not only from rules you wrote. Seeks team consensus and direction by adoption percentages and trends (rising/declining), golden files. Tells the difference between code that's _common_ and code that's _current_ - what patterns the team is moving toward and what's being left behind.
+**Finds the right local example** - Search does not just return code. Each result comes back with pattern signals, file relationships, and quality indicators so the agent can move from the map to the most relevant local example instead of wandering through raw hits.
 
-**Remembers across sessions** - Decisions, failures, workarounds that look wrong but exist for a reason - the battle scars that aren't in the comments. Recorded once, surfaced automatically so the agent doesn't "clean up" something you spent a week getting right. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.
+**Knows what is current** - Conventions are detected from your code and git history, not only from rules you wrote. The map distinguishes what is common from what is rising or declining, and points at the files that best represent the current direction.
 
-**Checks before editing** - Before editing something, you get a decision card showing whether there's enough evidence to proceed. If a symbol has four callers and only two appear in your search results, the card shows that coverage gap. If coverage is low, `whatWouldHelp` lists the specific searches to run before you touch anything.
+**Adds support signals when you need them** - Team memory and edit-readiness checks stay available, but as supporting context after the map and search have already narrowed the work.
 
-One tool call returns all of it. Local-first - your code never leaves your machine by default.
+Map first, search second, local-first throughout. Your code never leaves your machine by default.
 
-See the [current discovery benchmark](./docs/benchmark.md) for the checked-in proof results and current gate truth.
+See the [current discovery benchmark](https://github.com/PatrickSys/codebase-context/blob/master/docs/benchmark.md) for the checked-in discovery-only proof. The gate is still `pending_evidence`, and `claimAllowed` remains `false`.
 
 ### What it looks like
 

@@ -38,7 +38,7 @@ This is the part most tools miss: what the team is doing now, what it is moving
 
 When the agent searches with edit intent, it gets a compact decision card: confidence, whether it's safe to proceed, which patterns apply, the best example, and which files are likely to be affected.
 
-More CLI examples in [`docs/cli.md`](./docs/cli.md). Full walkthrough: [`docs/demo.md`](./docs/demo.md).
+More CLI examples in [`docs/cli.md`](./docs/cli.md). Full walkthrough: [demo.md on GitHub](https://github.com/PatrickSys/codebase-context/blob/master/docs/demo.md).
 
 ## Quick Start
 

@@ -71,7 +71,7 @@ Full per-client setup, HTTP server instructions, and local build testing: [`docs
 
 ## First Use
 
-Get a conventions map of your codebase before exploring or searching:
+Get a conventions map of your codebase before exploring or editing:
 
 ```bash
 # See your codebase conventions — architecture layers, patterns, golden files

@@ -85,20 +85,20 @@ Your AI agent uses the same map via the `codebase://context` MCP resource on fir
 
 ## Common First Commands
 
-Three commands to get what usually takes a new developer weeks to piece together:
+Three commands to understand a repo before you edit it:
 
 ```bash
-# What tech stack, architecture, and file count?
-npx -y codebase-context metadata
+# What are the main conventions and best examples?
+npx -y codebase-context map
 
-# What does the team actually code like right now?
-npx -y codebase-context patterns
+# Then search for the local example you need
+npx -y codebase-context search --query "auth middleware"
 
-# What team decisions were made (and why)?
-npx -y codebase-context memory list
+# What patterns is the team actually using right now?
+npx -y codebase-context patterns
 ```
 
-This is also what your AI agent consumes automatically via MCP tools; the CLI is the human-readable version.
+This is also what your AI agent consumes automatically via MCP tools; the CLI is the human-readable version of the same map-plus-search flow.
 
 ## What it does
 

@@ -224,14 +224,14 @@ These are the behaviors that make the most difference day-to-day. Copy, trim wha
 
 ## Links
 
-- [Benchmark](./docs/benchmark.md) — current discovery suite results and gate truth
-- [Demo](./docs/demo.md) — real CLI walkthrough
+- [Benchmark](https://github.com/PatrickSys/codebase-context/blob/master/docs/benchmark.md) — current discovery suite results and gate truth
+- [Demo](https://github.com/PatrickSys/codebase-context/blob/master/docs/demo.md) — real CLI walkthrough
 - [Client Setup](./docs/client-setup.md) — per-client config, HTTP setup, local build testing
 - [Capabilities Reference](./docs/capabilities.md) — tool API, retrieval pipeline, decision card schema
 - [CLI Gallery](./docs/cli.md) — formatted command output examples
-- [Motivation](./MOTIVATION.md) — research and design rationale
-- [Contributing](./CONTRIBUTING.md) — dev setup and eval harness
-- [Changelog](./CHANGELOG.md)
+- [Motivation](https://github.com/PatrickSys/codebase-context/blob/master/MOTIVATION.md) — research and design rationale
+- [Contributing](https://github.com/PatrickSys/codebase-context/blob/master/CONTRIBUTING.md) — dev setup and eval harness
+- [Changelog](https://github.com/PatrickSys/codebase-context/blob/master/CHANGELOG.md)
 
 ## License

docs/benchmark.md

Lines changed: 9 additions & 7 deletions

@@ -37,7 +37,7 @@ From `results/gate-evaluation.json`:
 - `claimAllowed`: `false`
 - `totalTasks`: `24`
 - `averageUsefulness`: `0.75`
-- `averageEstimatedTokens`: `1822.25`
+- `averageEstimatedTokens`: `1827.0833`
 - `bestExampleUsefulnessRate`: `0.125`
 
 Repo-level outputs from the same rerun:

@@ -53,30 +53,32 @@ The gate is intentionally still blocked.
 
 - The combined suite covers both public repos.
 - `claimAllowed` remains `false` because comparator evidence still does not support a benchmark-win claim.
-- Two comparator lanes now return `status: "ok"`, but both are effectively near-empty on the frozen tasks and contribute `0` average usefulness.
-- Three comparator lanes still fail setup entirely.
+- Two comparator artifacts now return `status: "ok"`, but that does not yet close the gate:
+  - `raw Claude Code` still leaves the baseline `pending_evidence` because `averageFirstRelevantHit` is `null`
+  - `codebase-memory-mcp` now has real current metrics, but the gate still marks it `failed` on the frozen tolerance rule
+- Three comparator lanes still fail setup entirely: `GrepAI`, `jCodeMunch`, and `CodeGraphContext`.
 
 ## Comparator Reality
 
 The current comparator artifact records incomplete comparator evidence, not benchmark wins.
 
 | Comparator | Status | Current reason |
 | --- | --- | --- |
-| `codebase-memory-mcp` | `ok` | Runs, but the checked-in artifact still averages `0` usefulness and `5` estimated tokens per task, so it does not yet contribute meaningful benchmark evidence |
+| `codebase-memory-mcp` | comparator artifact: `ok`; gate: `failed` | Runs through the repaired graph-backed path and now records real metrics (`averageUsefulness: 0.1875`, `averageFirstRelevantHit: 1.2857`, `bestExampleUsefulnessRate: 0.5`), but the frozen gate still fails it on the required usefulness comparisons |
 | `jCodeMunch` | `setup_failed` | `MCP error -32000: Connection closed` |
 | `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
 | `CodeGraphContext` | `setup_failed` | `MCP error -32000: Connection closed` |
-| `raw Claude Code` | `ok` | Runs, but the checked-in artifact still averages `0` usefulness and only `18.5` estimated tokens per task, so it does not yet contribute meaningful benchmark evidence |
+| `raw Claude Code` | comparator artifact: `ok`; gate: `pending_evidence` | The explicit Haiku CLI runner now returns current metrics (`averageUsefulness: 0.0278`, `averageEstimatedTokens: 32.1667`), but the baseline still lacks `averageFirstRelevantHit`, so the gate keeps this lane as missing evidence |
 
 `CodeGraphContext` remains part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.
 
 ## Important Limitations
 
 - This benchmark measures discovery usefulness and payload cost only.
 - It does not measure implementation correctness, patch quality, or end-to-end task completion.
-- Comparator setup remains environment-sensitive, and the checked-in comparator outputs are still too weak to justify a claim.
+- Comparator setup remains environment-sensitive, and the checked-in comparator outputs still do not satisfy the frozen claim gate.
 - The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness.
-- `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.
+- `averageFirstRelevantHit` remains `null` in the current gate output, which is enough to keep the raw-Claude baseline in `pending_evidence`.
 
 ## What This Proof Can Support

docs/capabilities.md

Lines changed: 2 additions & 1 deletion

@@ -1,6 +1,6 @@
 # Capabilities Reference
 
-Technical reference for what `codebase-context` ships today. For the user-facing overview, see [README.md](../README.md).
+Technical reference for what `codebase-context` ships today. The public product posture is map first, find second: the bounded conventions map is the first-call surface, and search narrows to the right local example after that. For the user-facing overview, see [README.md](../README.md).
 
 ## Transport Modes
 

@@ -298,6 +298,7 @@ Reproducible evaluation is shipped as a CLI entrypoint backed by shared scoring/
 - **Retrieval metrics:** Top-1 accuracy, Top-3 recall, spec contamination rate, and a gate pass/fail
 - **Discovery metrics:** usefulness score, payload bytes, estimated tokens, first relevant hit, and best-example usefulness
 - **Discovery gate:** discovery mode evaluates the frozen ship gate only when the full public suite and comparator metrics are available; missing comparator evidence is reported as pending, not silently treated as pass/fail
+- **Current checked-in gate truth:** `results/gate-evaluation.json` remains `pending_evidence` with `claimAllowed: false`; the raw-Claude baseline still lacks `averageFirstRelevantHit`, `codebase-memory-mcp` still fails the frozen usefulness comparisons, and the remaining named lanes are still `setup_failed`
 - **Limits:** discovery mode is discovery-only, uses current shipped surfaces only, and does not claim implementation quality; named competitor runs remain a documented hybrid/manual lane rather than a built-in automated benchmark
 
 ## Limitations

docs/cli.md

Lines changed: 4 additions & 3 deletions

@@ -1,8 +1,9 @@
 # CLI Gallery (Human-readable)
 
-`codebase-context` exposes its tools as a local CLI so humans can:
+`codebase-context` exposes its tools as a local CLI so humans can follow the same map-first workflow the MCP server gives to agents:
 
-- Get the conventions map before exploring or editing (`map`)
+- Get the bounded conventions map before exploring or editing (`map`)
+- Search for the right local example after the map narrows the repo shape
 - Onboard themselves onto an unfamiliar repo
 - Debug what the MCP server is doing
 - Use outputs in CI/scripts (via `--json`)

@@ -50,7 +51,7 @@ CODEBASE_CONTEXT_ASCII=1 npx -y codebase-context patterns
 npx -y codebase-context map
 ```
 
-The conventions map run this first on an unfamiliar repo. Shows architecture layers, active patterns with adoption rates and trend direction, and the golden files the team treats as the strongest examples. This is also what the MCP server delivers to AI agents via the `codebase://context` resource on first call.
+The conventions map - run this first on an unfamiliar repo. It shows architecture layers, active patterns with adoption rates and trend direction, and the golden files the team treats as the strongest examples. This is also what the MCP server delivers to AI agents via the `codebase://context` resource on first call, before search narrows to a specific local example.
 
 Example output (truncated):

docs/client-setup.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # Client Setup
 
-Full setup instructions for each AI client. For the quick-start summary, see [README.md](../README.md).
+Full setup instructions for each AI client. This guide is about transport and wiring, not a different product mode: each client gets the same bounded conventions map first and local-pattern discovery second. For the quick-start summary, see [README.md](../README.md).
 
 ## Transport modes

docs/comparison-table.md

Lines changed: 7 additions & 5 deletions

@@ -5,17 +5,19 @@ It is a setup-status table first, not a marketing scoreboard.
 
 | Comparator | Intended role in gate | Current status | Evidence summary |
 | --- | --- | --- | --- |
-| `raw Claude Code` | Baseline for payload cost and at least one usefulness comparison | `setup_failed` | The local `claude` CLI baseline is unavailable in this environment, so the gate records missing baseline metrics. |
+| `raw Claude Code` | Baseline for payload cost and at least one usefulness comparison | comparator artifact: `ok`; gate: `pending_evidence` | The Haiku-backed Claude CLI runner now returns current payloads, but the checked-in baseline still has `averageFirstRelevantHit: null`, so the gate still records missing baseline metrics. |
 | `GrepAI` | Named MCP comparator | `setup_failed` | Requires the GrepAI binary plus a local Ollama embedding setup that is not present in this proof environment. |
 | `jCodeMunch` | Named MCP comparator | `setup_failed` | The MCP server still closes on startup during the current rerun, so no comparable discovery metrics were produced. |
-| `codebase-memory-mcp` | Named MCP comparator | `setup_failed` | The documented install path still depends on the external shell installer instead of a working local benchmark path. |
+| `codebase-memory-mcp` | Named MCP comparator | comparator artifact: `ok`; gate: `failed` | The repaired graph-backed runner now produces real current metrics, but the frozen gate still fails this lane because `codebase-context` does not stay within tolerance on every required usefulness metric. |
 | `CodeGraphContext` | Graph-native comparator in the relaunch frame | `setup_failed` | The MCP server still closes on startup during the current rerun, so this lane remains missing evidence. |
 
 ## Reading This Table
 
 - `setup_failed` means the lane was attempted and did not reach a credible metric-producing state.
+- `pending_evidence` in the gate means the lane is still missing one or more required metrics.
+- `failed` in the gate means the lane has real metrics, but the frozen comparison rule still does not pass.
 - A missing metric is not treated as a win for `codebase-context`.
-- The combined gate in `results/gate-evaluation.json` remains `pending_evidence` until these lanes produce real metrics.
+- The combined gate in `results/gate-evaluation.json` remains `pending_evidence`, and `claimAllowed` stays `false`, until these lanes produce real metrics.
 
 ## Current codebase-context result
 

@@ -25,8 +27,8 @@ For reference, the current combined discovery output across `angular-spotify` an
 | --- | ---: |
 | `totalTasks` | 24 |
 | `averageUsefulness` | 0.75 |
-| `averagePayloadBytes` | 3613.6667 |
-| `averageEstimatedTokens` | 903.7083 |
+| `averagePayloadBytes` | 7306.4583 |
+| `averageEstimatedTokens` | 1827.0833 |
 | `bestExampleUsefulnessRate` | 0.125 |
 | `gate.status` | `pending_evidence` |
