File: docs/context/SCHEMA_VALIDATION.md (1 addition, 1 deletion)
@@ -133,7 +133,7 @@ The schema completeness test runs as part of the test suite. To ensure schema st

 - **Deprecated fields**: Fields marked `@deprecated` in TypeScript (e.g., `entryPathAbs`, `entryPathRel`, `os`) may not be in the schema if they're no longer generated. This is intentional for backward compatibility - old files with these fields may still exist.

-- **Conditional validation**: The schema doesn't currently validate language-specific field combinations (e.g., `style` shouldn't exist on `node:api` contracts). This is planned for v0.8.x (see ROADMAP.md).
+- **Conditional validation**: The schema doesn't currently validate language-specific field combinations (e.g., `style` shouldn't exist on `node:api` contracts). This is planned for a future release (see [ROADMAP.md](../ROADMAP.md)).

 - **Style mode variants**: The schema supports both `lean` and `full` style modes. Lean mode uses count fields (e.g., `selectorCount`, `componentCount`), while full mode uses arrays (e.g., `selectors`, `components`). Both formats are valid and should be tested separately.
 - **full** - Contracts plus complete embedded source

 ## Output Format

-The output shows three things:
+**1. Token estimation** — Which tokenizers ran, or character fallback.
+
+**2. Comparison vs raw source** — See [baselines](#what-the-two-baselines-mean). Negative % means summed header bundles exceed one copy of all source files, but a **single** root’s bundle is often still far smaller than raw because the row totals **every** root, not the one bundle you attach in chat.

-**1. Token estimation method** - Shows which tokenizers are being used, or if it's falling back to approximations
+**3. Mode breakdown** — Each mode vs **full** (same bundle set): how much smaller than maximum.

-**2. Comparison vs raw source** - Savings compared to including all source files directly:
+Example tables:

 ```
 Comparison:
@@ -34,10 +34,6 @@ The output shows three things:
 Header+style | 170,466 | 184,864 | 38%
 ```

-Header mode saves ~70% by extracting contracts and signatures without implementation code. Header+style saves ~38% but adds visual context. Full mode actually costs more than raw source (~30% overhead) due to contract structure.
-
-**3. Mode breakdown** - All modes compared to the maximum (full):
-
 ```
 Mode breakdown:
 Mode | Tokens GPT-4o | Tokens Claude | Savings vs Full Context
@@ -50,130 +46,73 @@ Header mode saves ~70% by extracting contracts and signatures without implementa

 ## Token Estimation

-By default, the tool uses character-based approximations (~4 chars/token for GPT-4o, ~4.5 for Claude). These are usually within 10-15% of actual counts, which is fine for most cases.
-
-For accurate counts, LogicStamp includes `@dqbd/tiktoken` (GPT-4) and `@anthropic-ai/tokenizer` (Claude) as optional dependencies. npm installs them automatically when you install `logicstamp-context`. If that works, you get exact token counts. If it fails (normal for optional deps), it falls back to approximation.
-
-You only need to install tokenizers manually if:
-- You need exact counts (not approximations)
-- AND automatic installation failed
+Defaults use ~4 chars/token (GPT-4o) and ~4.5 (Claude). Optional deps `@dqbd/tiktoken` and `@anthropic-ai/tokenizer` give exact counts when install succeeds.
Accurate counts matter for production deployments, tight budgets, or comparing tools. For development, approximations are usually fine.
+## What the two baselines mean

-## Mode Selection Guide
+**Raw source** — Every project `.ts` / `.tsx` (tests excluded), each file **once**, joined. No JSON bundles.
+
+**Header / header+style (first table)** — Tokens for **all root bundles**, formatted and concatenated. Anything imported by many roots is repeated; bundle JSON adds overhead. So the table can show header **above** raw on large multi-root graphs, even when **one** feature bundle is still cheap. In chat you usually send one bundle—that is **not** the same as this summed row.

-**none** - Maximum compression (~18% of raw source)
-- Contracts only, no code or style
-- Good for: CI/CD validation, dependency analysis, architecture reviews
-- Skip if: You need implementation details or visual context
+**Tailwind + header+style** — Style extraction expands utilities into structured text, so that row often grows vs raw more than on SCSS-heavy repos (in addition to duplication above).

-**header** - Balanced compression (~30% of raw source) *recommended default*
-- Contracts + JSDoc headers + function signatures
-- Good for: Most AI chat workflows, code review, understanding interfaces
-- This is what most people need 90% of the time
+## Mode Selection Guide

-**header+style** - Visual context (~62% of raw source)
# Creates context_compare_modes.json with structured data for MCP servers
-```
-
-## Understanding the Numbers
-
-**Savings vs Raw Source:** Shows how much you save compared to just concatenating all source files. Higher is better. Header mode typically saves ~70%, header+style saves ~38%. Full mode actually costs more (~30% overhead) due to contract structure.
-
-**Savings vs Full Context:** Shows efficiency compared to the maximum mode. Header saves ~77%, header+style saves ~52%.
-
-**GPT-4o vs Claude:** Token counts differ slightly (usually 5-10%) because each model tokenizes differently. Both estimates are shown so you can plan for either.
-
-**Accuracy:** Approximations are usually within 10-15% and fine for planning. Tokenizers give exact counts but require installation.
-
 ## Common Questions

-**Why are my numbers different from raw file sizes?**
-Token counts ≠ character counts. Tokenizers split text into semantic units—common words are 1 token, rare words are multiple, code symbols vary, whitespace compresses.
+**Why don’t tokens match file size?** Tokens ≠ bytes; tokenizers split code and prose unevenly.

-**Should I always use accurate tokenizers?**
-Use approximations for development/prototyping. Use tokenizers for production, tight budgets, or comparing tools.
+**When are tokenizers worth it?** Tight budgets, production gates, comparing tools. Approximations are fine for day-to-day.

-**How much overhead do contracts add?**
-In `full` mode, contracts add ~30% overhead vs raw source due to JSON structure and metadata. The overhead is worth it for structured dependency graphs and better AI comprehension, but `header` mode avoids most of it while still giving you what you need.
+**`full` overhead?** JSON + metadata on top of embedded source; `header` avoids most of that.

-**Why do the savings percentages seem generous?**
-"Savings vs raw source" compares against simple file concatenation. Header mode saves 70% because it extracts contracts and signatures without implementation code. Full mode actually costs more than raw source (~30% overhead) due to contract structure. The real win: header mode gives you 90% of what you need at 30% of the cost.
+### Why can Header show more tokens than Raw source?

-**Can I compare specific folders?**
-Yes:
-```bash
-stamp context ./src/components --compare-modes
-```
+[What the two baselines mean](#what-the-two-baselines-mean) — raw is one copy of each file; the header row is **every root bundle** summed.

-**Does --compare-modes write files?**
-No, it's analysis-only by default. It generates contracts in memory, computes estimates, and displays tables. Use `stamp context` (without the flag) to actually generate context files.
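The point behind "Why can Header show more tokens than Raw source?" may be easier to see with numbers. The sketch below is illustrative only: the file names, token counts, the `BUNDLE_OVERHEAD` constant, and all helper functions are hypothetical and not part of LogicStamp. It assumes, as the added text describes, that the raw baseline counts each file once while the header row sums every root bundle, so a file imported by many roots is repeated.

```typescript
// Hypothetical token costs: raw source vs. header form inside a bundle.
interface FileCost { raw: number; header: number }

const SHARED = "shared/ui.tsx";
const roots = ["a", "b", "c", "d", "e"].map((n) => `pages/${n}.tsx`);

const files: Record<string, FileCost> = { [SHARED]: { raw: 1000, header: 400 } };
for (const root of roots) files[root] = { raw: 100, header: 40 };

// Assumed per-bundle JSON/metadata overhead, in tokens.
const BUNDLE_OVERHEAD = 50;

// Raw baseline: every file counted exactly once.
function rawBaseline(): number {
  return Object.keys(files).reduce((sum, path) => sum + files[path].raw, 0);
}

// One root bundle: the root, the shared file it imports, plus overhead.
function bundleTokens(root: string): number {
  return files[root].header + files[SHARED].header + BUNDLE_OVERHEAD;
}

// Header row: all root bundles summed, so the shared file repeats per root.
function summedHeaderRow(): number {
  return roots.reduce((sum, root) => sum + bundleTokens(root), 0);
}

// Positive means smaller than raw; negative means larger than raw.
function savingsPercent(modeTokens: number, rawTokens: number): number {
  return Math.round((1 - modeTokens / rawTokens) * 100);
}

const rawTotal = rawBaseline();
console.log(rawTotal);                                         // 1500
console.log(summedHeaderRow());                                // 2450
console.log(savingsPercent(summedHeaderRow(), rawTotal));      // -63
console.log(savingsPercent(bundleTokens(roots[0]), rawTotal)); // 67
```

With these made-up numbers the summed row is 63% larger than raw, while the single bundle you would actually attach in chat is 67% smaller.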
With `--stats`, it writes `context_compare_modes.json` for MCP integration:
-```bash
-stamp context --compare-modes --stats
-```
+**Writes files?** No, unless `--stats` (JSON). Normal `stamp context` writes bundles.

 ## Performance

-Takes 2-3x longer than normal generation because it regenerates contracts with and without style for accurate comparison. Uses in-memory processing (no disk writes). Typical execution: 5-15 seconds for medium projects (50-150 files).
+~2–3× normal run; in-memory, no bundle files unless `--stats`.
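For the character-based defaults quoted under Token Estimation (~4 chars/token for GPT-4o, ~4.5 for Claude), a minimal sketch of such a fallback might look like the following. The `estimateTokens` helper and the choice to round up are assumptions for illustration; the diff does not show how LogicStamp implements its fallback.

```typescript
// Approximate chars-per-token ratios quoted in the docs.
const CHARS_PER_TOKEN = { "gpt-4o": 4, claude: 4.5 } as const;
type Model = keyof typeof CHARS_PER_TOKEN;

// Hypothetical helper: rounding up is an assumption, not confirmed tool behavior.
function estimateTokens(text: string, model: Model): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[model]);
}

const sample = "x".repeat(9000);
console.log(estimateTokens(sample, "gpt-4o")); // 2250
console.log(estimateTokens(sample, "claude")); // 2000
```

An estimate like this is only a planning aid; the removed text put the approximation within roughly 10-15% of real tokenizer counts.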
File: docs/context/cli/context.md (5 additions, 5 deletions)
@@ -13,7 +13,7 @@ stamp context [path] [options]

 **Setup:** `stamp context` respects preferences saved in `.logicstamp/config.json` and never prompts. On first run (no config), it defaults to skipping both `.gitignore` and `LLM_CONTEXT.md` setup for CI-friendly behavior. Use [`stamp init`](init.md) to configure these options (non-interactive by default; use `--no-secure` for interactive mode).

-**File Exclusion:** `stamp context` respects `.stampignore` and excludes those files from context compilation. You'll see how many files were excluded (unless using `--quiet`). Use `stamp ignore <file>` to add files to `.stampignore`. `.stampignore` is completely optional and independent of security scanning. See [stampignore.md](../stampignore.md) for details.
+**File Exclusion:** `stamp context` respects `.stampignore` and excludes those files from context compilation. You'll see how many files were excluded (unless using `--quiet`). Use `stamp ignore <file>` to add files to `.stampignore`. `.stampignore` is completely optional and independent of security scanning. See [stampignore.md](../reference/stampignore.md) for details.

 **Secret Sanitization:** If a security report (`stamp_security_report.json`) exists, `stamp context` automatically replaces detected secrets with `"PRIVATE_DATA"` in the generated JSON files. **Your source code files are never modified** - only the generated context files contain sanitized values. See [security-scan.md](security-scan.md) for details.

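The Secret Sanitization behavior described above (detected secrets replaced with `"PRIVATE_DATA"` in generated output only) could be sketched roughly as follows. Everything here is hypothetical: the `sanitize` helper, the idea of passing secrets as a plain string list, and the sample bundle shape are assumptions, since the diff does not show the format of `stamp_security_report.json` or the tool's actual implementation.

```typescript
const PLACEHOLDER = "PRIVATE_DATA";

// Hypothetical helper: returns a sanitized copy and leaves the input untouched.
function sanitize(value: unknown, secrets: readonly string[]): unknown {
  if (typeof value === "string") {
    // Replace every occurrence of each non-empty secret string.
    return secrets.reduce(
      (text, secret) => (secret ? text.split(secret).join(PLACEHOLDER) : text),
      value,
    );
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitize(item, secrets));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const copy: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      copy[key] = sanitize(source[key], secrets);
    }
    return copy;
  }
  return value; // numbers, booleans, null, undefined pass through
}

// Made-up bundle fragment and secret value for illustration.
const bundle = { code: 'const key = "sk-123";', tags: ["sk-123", 42] };
const cleaned = sanitize(bundle, ["sk-123"]);
console.log(JSON.stringify(cleaned));
// {"code":"const key = \"PRIVATE_DATA\";","tags":["PRIVATE_DATA",42]}
```

Building a new object instead of mutating the input mirrors the documented guarantee that source files are never modified, only the generated context.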
@@ -187,9 +187,9 @@ Example `.stampignore`:
 }
 ```

-`.stampignore` is completely optional and can be created manually. It's independent of security scanning. See [stampignore.md](../stampignore.md) for complete documentation.
+`.stampignore` is completely optional and can be created manually. It's independent of security scanning. See [stampignore.md](../reference/stampignore.md) for complete documentation.

-For complete documentation on `.stampignore` file format, see [stampignore.md](../stampignore.md).
+For complete documentation on `.stampignore` file format, see [stampignore.md](../reference/stampignore.md).

 ## Secret Sanitization

@@ -205,8 +205,8 @@ When generating context files, LogicStamp automatically sanitizes secrets if a s