docs: dedupe cross-links; canonical fingerprints in architecture
- Single layering pointer in README; CONTRIBUTING line merged with JSDoc note
- docs/README: one conventions + contributors line
- architecture § Schema: Fingerprints paragraph; files.content_hash points to it
- benchmark: Results blurb links to architecture; fix fixtures anchor; trim Key Takeaways
- why-codemap: one benchmark link from Solution
 const rows = cm.query("SELECT name FROM symbols LIMIT 5");
 ```

-`createCodemap` configures a process-global runtime (`initCodemap`); only **one active project per process** is supported. Advanced: `runCodemapIndex` for an open DB handle. Layering (`cli` → `application` → infrastructure): [docs/architecture.md](docs/architecture.md).
+`createCodemap` configures a process-global runtime (`initCodemap`); only **one active project per process** is supported. Advanced: `runCodemapIndex` for an open DB handle. **Module layout:** [docs/architecture.md § Layering](docs/architecture.md#layering).

 ---

@@ -94,7 +94,7 @@ bun run check # build + format:check + lint + test + typecheck
 bun run fix # oxlint --fix, then oxfmt
 ```

-**Readability & DX:** Prefer clear names and small functions over cleverness. **Public API** surface (`createCodemap`, `Codemap`, config types, `runCodemapIndex`, adapter exports) should stay **documented with JSDoc** so consumers get good hovers and published `.d.ts` stay useful. **Layering** (`cli` → `application` → `adapters` / parsers → SQLite): see [docs/architecture.md](docs/architecture.md). More for contributors: [.github/CONTRIBUTING.md](.github/CONTRIBUTING.md).
+**Readability & DX:** Prefer clear names and small functions; keep **JSDoc** on public exports. [.github/CONTRIBUTING.md](.github/CONTRIBUTING.md) has contributor workflow and conventions.
docs/README.md (1 addition & 3 deletions)
@@ -13,6 +13,4 @@ Technical docs for **[@stainless-code/codemap](https://github.com/stainless-code
 | [roadmap.md](./roadmap.md) | Forward-looking backlog (not a `src/` inventory) |
 | [why-codemap.md](./why-codemap.md) | Why index + SQL for agents |

-**Conventions:** one topic per file; link with relative paths; no hardcoded symbol/file counts (use `codemap query` / `bun run dev query`); no source line numbers. **Contributors:** keep public API JSDoc useful; run `bun run check` — see [CONTRIBUTING](../.github/CONTRIBUTING.md).
+**Conventions:** one topic per file; relative links; no symbol/file counts or source line numbers in docs (use `codemap query` / `bun run dev query` to measure). **Contributors:** `bun run check`, JSDoc on public API — [.github/CONTRIBUTING.md](../.github/CONTRIBUTING.md) (tooling, `.agents/` / `.cursor/`, `.gitignore` / format config).
-Current schema version: **2** — see [Schema Versioning](#schema-versioning) for details
+**Fingerprints:** incremental runs compare **`files.content_hash`** — SHA-256 hex of raw file bytes from [`src/hash.ts`](../src/hash.ts) (same on Node and Bun). Details in the **`files`** table below.

-All tables use `STRICT` mode. Tables marked with `WITHOUT ROWID` store data directly in the primary key B-tree. See [SQLite Performance Configuration](#sqlite-performance-configuration) for details.
+Current schema version: **2** — see [Schema Versioning](#schema-versioning) for details.
+
+All tables use `STRICT` mode. Tables marked with `WITHOUT ROWID` store data directly in the primary key B-tree. PRAGMAs and index design: [SQLite Performance Configuration](#sqlite-performance-configuration).
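
For readers of this diff, the fingerprint described in the added **Fingerprints** paragraph can be sketched as follows. This is a minimal illustration that assumes Node's built-in `node:crypto` module; the function names `contentHash` and `needsReindex` are hypothetical and are not taken from `src/hash.ts`, whose actual implementation may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: SHA-256 over the raw file bytes, rendered as lowercase hex.
// The project's real helper lives in src/hash.ts and may be written differently.
export function contentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical incremental check: a file needs re-indexing when there is no
// stored fingerprint yet, or when the stored fingerprint no longer matches.
export function needsReindex(storedHash: string | undefined, bytes: Uint8Array): boolean {
  return storedHash !== contentHash(bytes);
}
```

Hashing the raw bytes rather than decoded text is what makes the fingerprint independent of runtime and encoding handling, which fits the "same on Node and Bun" note in the added line.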
docs/benchmark.md (6 additions & 13 deletions)
@@ -62,7 +62,7 @@ Each scenario runs both approaches back-to-back on the same machine, same data.

 ## Results

-Example snapshot from `bun src/benchmark.ts` immediately after `bun src/index.ts --full` on **this repository** (small tree; many scenario result counts are zero — that is expected here). Numbers vary by machine and project shape. Settings: schema v2, SHA-256 content fingerprints (`src/hash.ts`), `db.query()` caching, covering/partial indexes, mmap, worker threads, deferred indexes, `batchInsert` helper.
+Example snapshot from `bun src/benchmark.ts` immediately after `bun src/index.ts --full` on **this repository** (small tree; many scenario counts are zero). Numbers vary by machine and project. Schema, indexes, and content fingerprints: [architecture.md § Schema](./architecture.md#schema).

 | Scenario | Index Time | Results | Trad. Time | Results | Files Read | Bytes Read | Speedup |
@@ -76,7 +76,7 @@ Example snapshot from `bun src/benchmark.ts` immediately after `bun src/index.ts

 **Totals**: Index ~408µs vs Traditional ~26.7ms (**~65× overall** on a sample run). Traditional bytes read total ~393 KB (not megabytes) because the globbed sets are small.

-On a **large app** indexed via `--root`, the same queries typically return non-zero rows; the indexed side stays sub-millisecond while the traditional side reads megabytes for broad globs. [Fixtures (planned)](#fixtures-planned) describes the plan for CI-friendly trees.
+On a **large app** indexed via `--root`, the same queries typically return non-zero rows; the indexed side stays sub-millisecond while the traditional side reads megabytes for broad globs. Repeatable numbers: [Fixtures](#fixtures).

 ### Run-to-run variance

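
As a quick check on the **Totals** line in this hunk, the quoted "~65× overall" follows from dividing the two totals once both are in the same unit. The sketch below only redoes that arithmetic; the `speedup` helper is a hypothetical name for illustration and is not part of `benchmark.ts`.

```typescript
// Hypothetical helper: ratio of traditional time to indexed time (same unit).
export function speedup(traditionalMicros: number, indexedMicros: number): number {
  if (indexedMicros <= 0) throw new RangeError("indexed time must be positive");
  return traditionalMicros / indexedMicros;
}

// Totals quoted in the diff: Index ~408µs, Traditional ~26.7ms.
const indexedTotalMicros = 408;
const traditionalTotalMicros = 26.7 * 1000; // milliseconds to microseconds

const overall = speedup(traditionalTotalMicros, indexedTotalMicros);
console.log(`~${Math.round(overall)}x overall`); // prints "~65x overall"
```

Because both inputs are approximate sample-run totals, the ratio is only meaningful to about two significant figures.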
@@ -90,22 +90,15 @@ The indexed CSS scenario uses `ORDER BY name LIMIT 50` — see `benchmark.ts` fo

 ### Speed

-- **Symbol / component queries** — covering indexes resolve from the index B-tree; indexed time stays sub-millisecond while the traditional path reads every matching file for regex
-- **TODO markers** — pre-extracted markers across indexed file types vs a narrower traditional glob
-- **Imports** — `imports` table vs full-file scan for a given module prefix
-Indexed SQL timings above are sub-millisecond per scenario. See [architecture.md § SQLite Performance Configuration](./architecture.md#sqlite-performance-configuration) for PRAGMAs and indexes.
+Indexed queries use **covering / partial indexes** on the SQLite side; the traditional path scales with **files read** and regex work. PRAGMAs and index design: [architecture.md § SQLite Performance Configuration](./architecture.md#sqlite-performance-configuration).

 ### Accuracy

-- **React components**: Index uses the same JSX/TSX component heuristic as the rest of the tool; regex “export” scans can over- or under-count vs `components`
-- **CSS tokens**: Indexed rows are structured; raw `--var` regexes often pick up duplicates and non-token matches
-- **TODO markers**: Index scans more configured extensions than a single glob in the benchmark’s traditional path
See [why-codemap.md § Accuracy Gains](./why-codemap.md#accuracy-gains) for the full analysis.
+### Token impact (AI agents)

-### Token Impact (AI Agents)
-
-See [why-codemap.md § Token Efficiency](./why-codemap.md#token-efficiency) for the full analysis. On a large tree, the traditional approach can read tens of megabytes across scenarios; indexed queries return only matching rows.
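
To make the new Speed sentence concrete ("the traditional path scales with files read and regex work"), here is a rough sketch of what such a scan costs. It is an assumption-laden illustration, not the code in `benchmark.ts`: the in-memory `FileSet` stands in for files on disk, and `traditionalScan` is a hypothetical name.

```typescript
// Hypothetical stand-in for a globbed set of files: path -> file contents.
type FileSet = Map<string, string>;

interface ScanResult {
  matches: number;   // regex hits across all files
  filesRead: number; // every file is read, whether or not it matches
  bytesRead: number; // UTF-8 size of everything that was read
}

// Hypothetical "traditional" scan: read every file and run a regex over it.
export function traditionalScan(files: FileSet, pattern: RegExp): ScanResult {
  // String.prototype.match returns all hits only for a global regex.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const globalPattern = new RegExp(pattern.source, flags);
  const encoder = new TextEncoder();

  let matches = 0;
  let bytesRead = 0;
  files.forEach((contents) => {
    bytesRead += encoder.encode(contents).length;
    matches += contents.match(globalPattern)?.length ?? 0;
  });
  return { matches, filesRead: files.size, bytesRead };
}
```

Both `filesRead` and `bytesRead` grow with the size of the tree regardless of how many rows match, which is the cost an indexed query avoids by returning only matching rows.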
docs/why-codemap.md (1 addition & 3 deletions)
@@ -13,12 +13,10 @@ This burns context window, wastes tokens, slows response time, and produces less

 ## The Solution

-A pre-built SQLite index (`.codemap.db`) that extracts and structures code metadata at index time. Agents query it with SQL instead of scanning files. Build and query timings: [benchmark.md](./benchmark.md).
+A pre-built SQLite index (`.codemap.db`) that extracts and structures code metadata at index time. Agents query it with SQL instead of scanning files. Timings, scenarios, and methodology: [benchmark.md](./benchmark.md).

 ## Speed Gains

-Measured via `bun src/benchmark.ts` — see [benchmark.md](./benchmark.md) for full methodology.
-
 ### Headline pattern

 Indexed queries stay **sub-millisecond** per scenario on typical trees; the traditional path scales with **how many files** it must read and scan. On a large application, overall speedups on the order of **tens to hundreds ×** are common for structural questions; exact ratios depend on the project and hardware. Re-run the benchmark after major changes or when pointing `--root` at a different repo.