Commit ab04ab5

docs: dedupe cross-links; canonical fingerprints in architecture
- Single layering pointer in README; CONTRIBUTING line merged with JSDoc note
- docs/README: one conventions + contributors line
- architecture § Schema: Fingerprints paragraph; files.content_hash points to it
- benchmark: Results blurb links to architecture; fix fixtures anchor; trim Key Takeaways
- why-codemap: one benchmark link from Solution
1 parent ff0b7d8 commit ab04ab5

5 files changed

Lines changed: 23 additions & 32 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -73,7 +73,7 @@ await cm.index({ quiet: true });
 const rows = cm.query("SELECT name FROM symbols LIMIT 5");
 ```
 
-`createCodemap` configures a process-global runtime (`initCodemap`); only **one active project per process** is supported. Advanced: `runCodemapIndex` for an open DB handle. Layering (`cli` → `application` → infrastructure): [docs/architecture.md](docs/architecture.md).
+`createCodemap` configures a process-global runtime (`initCodemap`); only **one active project per process** is supported. Advanced: `runCodemapIndex` for an open DB handle. **Module layout:** [docs/architecture.md § Layering](docs/architecture.md#layering).
 
 ---
 
@@ -94,7 +94,7 @@ bun run check # build + format:check + lint + test + typecheck
 bun run fix # oxlint --fix, then oxfmt
 ```
 
-**Readability & DX:** Prefer clear names and small functions over cleverness. **Public API** surface (`createCodemap`, `Codemap`, config types, `runCodemapIndex`, adapter exports) should stay **documented with JSDoc** so consumers get good hovers and published `.d.ts` stay useful. **Layering** (`cli` → `application` → `adapters` / parsers → SQLite): see [docs/architecture.md](docs/architecture.md). More for contributors: [.github/CONTRIBUTING.md](.github/CONTRIBUTING.md).
+**Readability & DX:** Prefer clear names and small functions; keep **JSDoc** on public exports. [.github/CONTRIBUTING.md](.github/CONTRIBUTING.md) has contributor workflow and conventions.
 
 ---
 
docs/README.md

Lines changed: 1 addition & 3 deletions
@@ -13,6 +13,4 @@ Technical docs for **[@stainless-code/codemap](https://github.com/stainless-code
 | [roadmap.md](./roadmap.md) | Forward-looking backlog (not a `src/` inventory) |
 | [why-codemap.md](./why-codemap.md) | Why index + SQL for agents |
 
-**Conventions:** one topic per file; link with relative paths; no hardcoded symbol/file counts (use `codemap query` / `bun run dev query`); no source line numbers. **Contributors:** keep public API JSDoc useful; run `bun run check` — see [CONTRIBUTING](../.github/CONTRIBUTING.md).
-
-**Also:** [.gitignore](../.gitignore) (`.codemap.db`), [.oxfmtrc.json](../.oxfmtrc.json) / [.oxlintrc.json](../.oxlintrc.json), [.agents/](../.agents/) / [.cursor/](../.cursor/), [CONTRIBUTING](../.github/CONTRIBUTING.md).
+**Conventions:** one topic per file; relative links; no symbol/file counts or source line numbers in docs (use `codemap query` / `bun run dev query` to measure). **Contributors:** `bun run check`, JSDoc on public API — [.github/CONTRIBUTING.md](../.github/CONTRIBUTING.md) (tooling, `.agents/` / `.cursor/`, `.gitignore` / format config).

docs/architecture.md

Lines changed: 13 additions & 11 deletions
@@ -141,21 +141,23 @@ The npm package exports **`createCodemap`**, **`Codemap`** (`query`, `index`), *
 
 ## Schema
 
-Current schema version: **2** — see [Schema Versioning](#schema-versioning) for details
+**Fingerprints:** incremental runs compare **`files.content_hash`**: SHA-256 hex of raw file bytes from [`src/hash.ts`](../src/hash.ts) (same on Node and Bun). Details in the **`files`** table below.
 
-All tables use `STRICT` mode. Tables marked with `WITHOUT ROWID` store data directly in the primary key B-tree. See [SQLite Performance Configuration](#sqlite-performance-configuration) for details.
+Current schema version: **2** — see [Schema Versioning](#schema-versioning) for details.
+
+All tables use `STRICT` mode. Tables marked with `WITHOUT ROWID` store data directly in the primary key B-tree. PRAGMAs and index design: [SQLite Performance Configuration](#sqlite-performance-configuration).
 
 ### `files` — Every indexed file (`STRICT`)
 
-| Column | Type | Description |
-| ------------- | ------- | ------------------------------------------------- |
-| path | TEXT PK | Relative path from project root |
-| content_hash | TEXT | SHA-256 hex (`src/hash.ts`, same on Node and Bun) |
-| size | INTEGER | File size in bytes |
-| line_count | INTEGER | Total lines |
-| language | TEXT | `ts`, `tsx`, `css`, `md`, etc. |
-| last_modified | INTEGER | File mtime (epoch ms) |
-| indexed_at | INTEGER | When this row was written |
+| Column | Type | Description |
+| ------------- | ------- | ---------------------------------------------- |
+| path | TEXT PK | Relative path from project root |
+| content_hash | TEXT | SHA-256 hex — see **Fingerprints** at § Schema |
+| size | INTEGER | File size in bytes |
+| line_count | INTEGER | Total lines |
+| language | TEXT | `ts`, `tsx`, `css`, `md`, etc. |
+| last_modified | INTEGER | File mtime (epoch ms) |
+| indexed_at | INTEGER | When this row was written |
 
 ### `symbols` — Functions, variables, classes, interfaces, type aliases, enums (`STRICT`)
 
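Reviewer note on the Fingerprints paragraph in the hunk above: the following is a minimal sketch of how a SHA-256 content fingerprint could drive an incremental run. It is not the project's `src/hash.ts`. The names `contentHash`, `needsReindex`, and `FileRow` are hypothetical, and the rule "re-index when the stored hash differs or is missing" is an assumption about what "compare `files.content_hash`" means in practice.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (not the project's src/hash.ts):
// SHA-256 hex digest of the raw file bytes.
function contentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical subset of a `files` row; only the columns used here.
interface FileRow {
  path: string;
  content_hash: string;
}

// Assumed incremental rule: re-index when no row is stored yet
// or when the stored fingerprint differs from the current bytes.
function needsReindex(stored: FileRow | undefined, bytes: Uint8Array): boolean {
  return stored === undefined || stored.content_hash !== contentHash(bytes);
}

const encoder = new TextEncoder();
const original = encoder.encode("export const answer = 42;\n");
const edited = encoder.encode("export const answer = 43;\n");

// Simulate a row written by an earlier index run.
const row: FileRow = { path: "src/answer.ts", content_hash: contentHash(original) };

console.log(needsReindex(row, original)); // unchanged bytes: false
console.log(needsReindex(row, edited)); // changed bytes: true
console.log(needsReindex(undefined, original)); // never indexed: true
```

Hashing raw bytes rather than decoded text keeps the fingerprint independent of encoding and newline handling, which is consistent with the "same on Node and Bun" claim in the paragraph.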
docs/benchmark.md

Lines changed: 6 additions & 13 deletions
@@ -62,7 +62,7 @@ Each scenario runs both approaches back-to-back on the same machine, same data.
 
 ## Results
 
-Example snapshot from `bun src/benchmark.ts` immediately after `bun src/index.ts --full` on **this repository** (small tree; many scenario result counts are zero — that is expected here). Numbers vary by machine and project shape. Settings: schema v2, SHA-256 content fingerprints (`src/hash.ts`), `db.query()` caching, covering/partial indexes, mmap, worker threads, deferred indexes, `batchInsert` helper.
+Example snapshot from `bun src/benchmark.ts` immediately after `bun src/index.ts --full` on **this repository** (small tree; many scenario counts are zero). Numbers vary by machine and project. Schema, indexes, and content fingerprints: [architecture.md § Schema](./architecture.md#schema).
 
 | Scenario | Index Time | Results | Trad. Time | Results | Files Read | Bytes Read | Speedup |
 | --------------------------------------- | ---------- | ------- | ---------- | ------- | ---------- | ---------- | -------- |
@@ -76,7 +76,7 @@ Example snapshot from `bun src/benchmark.ts` immediately after `bun src/index.ts
 
 **Totals**: Index ~408µs vs Traditional ~26.7ms (**~65× overall** on a sample run). Traditional bytes read total ~393 KB (not megabytes) because the globbed sets are small.
 
-On a **large app** indexed via `--root`, the same queries typically return non-zero rows; the indexed side stays sub-millisecond while the traditional side reads megabytes for broad globs. [Fixtures (planned)](#fixtures-planned) describes the plan for CI-friendly trees.
+On a **large app** indexed via `--root`, the same queries typically return non-zero rows; the indexed side stays sub-millisecond while the traditional side reads megabytes for broad globs. Repeatable numbers: [Fixtures](#fixtures).
 
 ### Run-to-run variance
 
@@ -90,22 +90,15 @@ The indexed CSS scenario uses `ORDER BY name LIMIT 50` — see `benchmark.ts` fo
 
 ### Speed
 
-- **Symbol / component queries** — covering indexes resolve from the index B-tree; indexed time stays sub-millisecond while the traditional path reads every matching file for regex
-- **TODO markers** — pre-extracted markers across indexed file types vs a narrower traditional glob
-- **Imports**: `imports` table vs full-file scan for a given module prefix
-Indexed SQL timings above are sub-millisecond per scenario. See [architecture.md § SQLite Performance Configuration](./architecture.md#sqlite-performance-configuration) for PRAGMAs and indexes.
+Indexed queries use **covering / partial indexes** on the SQLite side; the traditional path scales with **files read** and regex work. PRAGMAs and index design: [architecture.md § SQLite Performance Configuration](./architecture.md#sqlite-performance-configuration).
 
 ### Accuracy
 
-- **React components**: Index uses the same JSX/TSX component heuristic as the rest of the tool; regex “export” scans can over- or under-count vs `components`
-- **CSS tokens**: Indexed rows are structured; raw `--var` regexes often pick up duplicates and non-token matches
-- **TODO markers**: Index scans more configured extensions than a single glob in the benchmark’s traditional path
+Structured parsing vs regex tradeoffs (components, CSS, markers, imports): [why-codemap.md § Accuracy Gains](./why-codemap.md#accuracy-gains).
 
-See [why-codemap.md § Accuracy Gains](./why-codemap.md#accuracy-gains) for the full analysis.
+### Token impact (AI agents)
 
-### Token Impact (AI Agents)
-
-See [why-codemap.md § Token Efficiency](./why-codemap.md#token-efficiency) for the full analysis. On a large tree, the traditional approach can read tens of megabytes across scenarios; indexed queries return only matching rows.
+[why-codemap.md § Token Efficiency](./why-codemap.md#token-efficiency).
 
 ### Reindex Cost
 
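Reviewer note on the **Totals** line kept as context in the Results hunk: the quoted ~65× figure can be checked from the two rounded totals on that line. The sketch below only redoes that arithmetic. The function name `speedup` is illustrative, and the inputs are the approximate values from the doc, not fresh measurements.

```typescript
// Approximate totals copied from the Totals line (rounded in the doc).
const indexedTotalMicros = 408; // ~408 µs
const traditionalTotalMicros = 26.7 * 1000; // ~26.7 ms expressed in µs

// Illustrative helper: how many times slower the baseline is than the indexed path.
function speedup(baselineMicros: number, fastMicros: number): number {
  return baselineMicros / fastMicros;
}

const ratio = speedup(traditionalTotalMicros, indexedTotalMicros);
console.log(`~${Math.round(ratio)}x overall`); // prints "~65x overall"
```

Because both inputs are rounded, the ratio is only good to about two significant figures, which matches the doc's "~65×" wording.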
docs/why-codemap.md

Lines changed: 1 addition & 3 deletions
@@ -13,12 +13,10 @@ This burns context window, wastes tokens, slows response time, and produces less
 
 ## The Solution
 
-A pre-built SQLite index (`.codemap.db`) that extracts and structures code metadata at index time. Agents query it with SQL instead of scanning files. Build and query timings: [benchmark.md](./benchmark.md).
+A pre-built SQLite index (`.codemap.db`) that extracts and structures code metadata at index time. Agents query it with SQL instead of scanning files. Timings, scenarios, and methodology: [benchmark.md](./benchmark.md).
 
 ## Speed Gains
 
-Measured via `bun src/benchmark.ts` — see [benchmark.md](./benchmark.md) for full methodology.
-
 ### Headline pattern
 
 Indexed queries stay **sub-millisecond** per scenario on typical trees; the traditional path scales with **how many files** it must read and scan. On a large application, overall speedups on the order of **tens to hundreds ×** are common for structural questions; exact ratios depend on the project and hardware. Re-run the benchmark after major changes or when pointing `--root` at a different repo.
