Commit fd014d9

docs(tok): update README and CHANGELOG with Round 2 porting additions
Document the 8 new features added in the Round 2 rtk + caveman porting session (2026-06-01):

- tok.CompressCaveman (Lite/Full/Ultra)
- tok.IsSensitiveFilename (3-layer detection)
- filter.SmartTruncate (rtk kept+dropped==total invariant)
- tok.ExtractJSON/ExtractJSONArray/ExtractAllJSON
- tok.NewTracker (SQLite gain tracker)
- tok.EstimateTokensFast/WithEncoding/ForModel
- filter.CompressWithRetry (validate-fix-retry loop)
- filter.NewTOMLFilter (8-stage rtk pipeline)
- RetryStats.FinalPipelineStats pointer fix (vet copylocks)

The README's "Recent additions" section is split into two chronological sections (2026-06-01 and 2026-04-20) so the latest work is on top. The CHANGELOG adds a new "Round 2 of rtk + caveman porting" subsection under [Unreleased].
1 parent 8f6c4a0 commit fd014d9

2 files changed

Lines changed: 49 additions & 0 deletions

CHANGELOG.md

Lines changed: 36 additions & 0 deletions
@@ -16,6 +16,42 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Dollar-based cost savings tracking in Stats
 - Model-aware BPE encoding selection

+### Added — Round 2 of rtk + caveman porting (2026-06-01)
+- **`tok.CompressCaveman(text, intensity)`** — public Go API for the
+caveman prompt-compression algorithm. Three intensity levels
+(`CavemanLite`, `CavemanFull`, `CavemanUltra`) with drop-lists for
+articles / filler / pleasantries and ~150 phrase substitutions.
+Auto-clarity: security / destructive segments pass through verbatim.
+Returns a `CavemanStats` struct (OriginalBytes, CompressedBytes,
+BytesSaved, PercentOff, PassThroughSegments, etc.).
+- **`tok.IsSensitiveFilename(path)`** — 3-layer path-based sensitive
+detection (exact basename, sensitive directory, name token).
+Companion to the content-based `SecretDetector`.
+Categories: `CatExactBasename`, `CatSensitiveDirectory`, `CatNameToken`.
+- **`filter.SmartTruncate`** with the rtk `kept + dropped == total`
+invariant. Returns `TruncateStats` so callers can verify the
+accounting from the output text alone.
+- **`tok.ExtractJSON` / `tok.ExtractJSONArray` / `tok.ExtractAllJSON`**
+brace-balanced JSON extractor. Handles LLM output with surrounding
+prose, markdown code fences, and unterminated objects.
+- **`tok.NewTracker(ctx)`** — persistent SQLite gain tracker
+(WAL mode, 90-day retention, pure Go via `modernc.org/sqlite`).
+Records per-event compression stats and supports aggregate
+queries. Default path `~/.tok/tracker.db`.
+- **`tok.EstimateTokensFast/WithEncoding/ForModel`** — model-aware
+token estimation exposed at the top level (was previously internal).
+- **`filter.CompressWithRetry`** — validate-fix-retry loop. Caller
+supplies a `Validator` and `AdjustFunc`; the loop escalates
+mode/intensity and retries up to `MaxRetries` times. Includes
+`MustContainsValidator` and `MaxTokensValidator` helpers.
+- **`filter.NewTOMLFilter` / `LoadTOMLFilterFile`** — full rtk
+8-stage pipeline (strip_ansi, replace regex, match_output
+short-circuit, strip/keep_lines, truncate_lines, head/tail,
+max_lines, on_empty) as a `Filter` implementation. Plugs into
+the main pipeline via the existing `tomlFilterWrapper` slot.
+- Bug fix: `RetryStats.FinalPipelineStats` is now `*PipelineStats`
+to avoid copying `sync.RWMutex` per Go vet `copylocks` warning.
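
Reviewer note on the `copylocks` fix above: a minimal sketch of why a pointer field avoids the warning. The type and method names (`pipelineStats`, `retryStats`, `add`, `total`, `run`) are hypothetical stand-ins and are not the package's real definitions; the only assumption taken from the entry is that the stats type contains a `sync.RWMutex`.

```go
package main

import (
	"fmt"
	"sync"
)

// pipelineStats is a hypothetical stand-in for a stats type guarded by a lock.
type pipelineStats struct {
	mu     sync.RWMutex
	tokens int
}

// add increments the counter under the write lock.
func (p *pipelineStats) add(n int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.tokens += n
}

// total reads the counter under the read lock.
func (p *pipelineStats) total() int {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.tokens
}

// retryStats holds a pointer, so copying a retryStats value copies only the
// pointer and never the sync.RWMutex inside pipelineStats.
type retryStats struct {
	attempts int
	final    *pipelineStats
}

// run returns retryStats by value, which is safe because of the pointer field.
func run() retryStats {
	ps := &pipelineStats{}
	ps.add(10)
	return retryStats{attempts: 1, final: ps}
}

func main() {
	rs := run()
	// Copying rs duplicates the pointer, so both values share one lock and one counter.
	cp := rs
	cp.final.add(5)
	fmt.Println(rs.final.total())
}
```

Had the field been a plain struct value, every copy of the outer struct would also copy the mutex, which is what `go vet` flags.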
+
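
Reviewer note on the validate-fix-retry entry above: the general shape of such a loop can be sketched as follows. This is an illustration under stated assumptions, not the `filter.CompressWithRetry` API. Every name here (`validator`, `adjustFunc`, `compressWithRetry`, `keepWords`, `maxWords`) is hypothetical, and an integer `level` stands in for the mode/intensity that the real loop escalates.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validator reports why an output is unacceptable, or nil if it passes.
type validator func(output string) error

// adjustFunc returns the next, more aggressive level after a failed attempt.
type adjustFunc func(level int) int

// compressWithRetry compresses at the given level, validates the result, and on
// failure escalates the level and tries again, up to maxRetries extra attempts.
// It returns the last output, the number of attempts, and the last validation error.
func compressWithRetry(input string, level, maxRetries int,
	compress func(string, int) string, validate validator, adjust adjustFunc) (string, int, error) {
	var out string
	var err error
	for attempt := 1; attempt <= maxRetries+1; attempt++ {
		out = compress(input, level)
		if err = validate(out); err == nil {
			return out, attempt, nil
		}
		level = adjust(level)
	}
	return out, maxRetries + 1, err
}

// keepWords is a toy compressor: each level drops one more trailing word.
func keepWords(s string, level int) string {
	words := strings.Fields(s)
	if len(words) == 0 {
		return ""
	}
	keep := len(words) - level
	if keep < 1 {
		keep = 1
	}
	return strings.Join(words[:keep], " ")
}

// maxWords builds a validator that rejects outputs longer than n words.
func maxWords(n int) validator {
	return func(out string) error {
		if len(strings.Fields(out)) > n {
			return errors.New("output too long")
		}
		return nil
	}
}

func main() {
	next := func(level int) int { return level + 1 }
	out, attempts, err := compressWithRetry("a b c d e f", 0, 5, keepWords, maxWords(3), next)
	fmt.Println(out, attempts, err)
}
```

Returning the last output together with the error lets a caller decide whether a best-effort result is still usable once the retries are exhausted.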
 ### Added — Production Hardening (top-50 OSS parity)
 - Same-style hardening pass already on this branch:
 strict `golangci-lint` v2 config (`errcheck`, `staticcheck`, `gocritic`

README.md

Lines changed: 13 additions & 0 deletions
@@ -130,6 +130,19 @@ Reproduce: `go build -o tok ./cmd/tok && TOK=./tok evals/bench.sh --no-rtk`

 ## Recent additions

+Session 2026-06-01 closed the second round of gaps vs rtk 0.37.1 and caveman:
+
+- **`tok.CompressCaveman(text, intensity)`** — caveman-style prompt compression (Lite / Full / Ultra) as a public Go API. ~150 phrase substitutions, drop-lists for articles / filler / pleasantries, and auto-clarity (security/destructive segments pass through verbatim). `intensity` is monotonic: `len(ultra) <= len(full) <= len(lite)`.
+- **`tok.IsSensitiveFilename(path)`** — 3-layer filename detection (exact basename, sensitive directory, name token). Companion to the content-based `SecretDetector`. Catches `.env`, `id_rsa`, `~/.ssh/...`, `test_credentials.json`, etc. Returns the category that fired.
+- **`tok.SmartTruncate(content, maxLines, lang)`** — code truncation that preserves function signatures and **always reports the exact drop count** in the marker (`kept + dropped == total`, rtk invariant).
+- **`tok.ExtractJSON(text)` / `tok.ExtractJSONArray(text)` / `tok.ExtractAllJSON(text)`** — brace-balanced JSON extraction that handles LLM output with surrounding prose, markdown code fences, and unterminated objects. Apostrophes in English prose are not confused with string delimiters.
+- **`tok.NewTracker(ctx)`** — persistent gain tracker (SQLite + WAL, 90-day retention, pure-Go via `modernc.org/sqlite`). Records per-event savings and supports `Aggregate`, `Recent`, `Prune` queries. Default path `~/.tok/tracker.db`.
+- **`tok.EstimateTokensFast/WithEncoding/ForModel`** — model-aware token estimation exposed at the top level (was previously internal).
+- **`filter.CompressWithRetry`** — validate-fix-retry loop: caller supplies a `Validator` and `AdjustFunc`; the loop escalates mode/intensity and retries up to N times.
+- **`filter.NewTOMLFilter` / `LoadTOMLFilterFile`** — full rtk 8-stage pipeline (strip_ansi, replace regex, match_output short-circuit, strip/keep_lines, truncate_lines, head/tail, max_lines, on_empty) as a `Filter` implementation, pluggable into the main pipeline.
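
Reviewer note on the brace-balanced JSON extraction entry above: the core idea can be sketched in a few lines. This is an assumption-laden illustration, not the `tok.ExtractJSON` implementation. The function names `extractFirstObject` and `matchBrace` are hypothetical, and the sketch does not attempt to repair unterminated objects, which the entry says the real extractor handles.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractFirstObject returns the first brace-balanced {...} span in text that
// is valid JSON. Only double quotes delimit strings, so apostrophes in prose
// are ignored. The boolean is false when no valid object is found.
func extractFirstObject(text string) (string, bool) {
	for start := 0; start < len(text); start++ {
		if text[start] != '{' {
			continue
		}
		if end, ok := matchBrace(text, start); ok {
			candidate := text[start : end+1]
			if json.Valid([]byte(candidate)) {
				return candidate, true
			}
		}
	}
	return "", false
}

// matchBrace scans from the opening brace at index start and returns the index
// of its matching closing brace, ignoring braces inside double-quoted strings.
func matchBrace(text string, start int) (int, bool) {
	depth := 0
	inString := false
	escaped := false
	for i := start; i < len(text); i++ {
		c := text[i]
		switch {
		case escaped:
			// The previous byte was a backslash inside a string; skip this byte.
			escaped = false
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case inString:
			// Braces inside strings do not change the depth.
		case c == '{':
			depth++
		case c == '}':
			depth--
			if depth == 0 {
				return i, true
			}
		}
	}
	return 0, false
}

func main() {
	reply := "Sure, here's the result:\n```json\n{\"name\": \"a}b\", \"n\": {\"x\": 1}}\n```\nHope that's fine."
	obj, ok := extractFirstObject(reply)
	fmt.Println(obj, ok)
}
```

Validating each candidate with `json.Valid` keeps prose such as `{braces}` from being returned as a match.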
+
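
Reviewer note on the `kept + dropped == total` invariant mentioned above: a minimal sketch of truncation that reports its own accounting. The names `truncateStats` and `truncateLines` and the marker format are hypothetical. The signature-preserving behaviour described for `SmartTruncate` is not modelled here; this only keeps leading lines.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateStats records the line accounting; kept + dropped always equals total.
type truncateStats struct {
	total, kept, dropped int
}

// truncateLines keeps the first maxLines lines of content and, when anything is
// dropped, appends a marker line that states the exact number of dropped lines.
func truncateLines(content string, maxLines int) (string, truncateStats) {
	if maxLines < 0 {
		maxLines = 0
	}
	lines := strings.Split(content, "\n")
	st := truncateStats{total: len(lines)}
	if len(lines) <= maxLines {
		st.kept = len(lines)
		return content, st
	}
	st.kept = maxLines
	st.dropped = len(lines) - maxLines
	out := append([]string{}, lines[:maxLines]...)
	out = append(out, fmt.Sprintf("... [%d lines dropped]", st.dropped))
	return strings.Join(out, "\n"), st
}

func main() {
	out, st := truncateLines("a\nb\nc\nd\ne", 2)
	fmt.Println(out)
	fmt.Println(st.kept+st.dropped == st.total)
}
```

Because the marker carries the drop count, a reader of the output alone can reconstruct the original line total.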
+### Earlier (2026-04-20)
+
 Session 2026-04-20 closed the last gaps vs rtk 0.37.1 and caveman:

 - **`tok commit-msg`** — read staged diff, emit Conventional Commits subject. Rule-based, no LLM.
