Commit e007c3b

refactor: remove all caveman references from codebase (#43)
* refactor: rename caveman to compression API
  - Rename caveman.go to compress.go
  - Rename internal/compress/caveman*.go to algorithm.go, dict.go, safety.go, rules.go
  - Rename CompressCaveman to Compress
  - Rename CavemanStats to Stats
  - Rename CavemanLite/Full/Ultra to Lite/Full/Ultra
  - Update all comments and documentation
  - Update CHANGELOG.md

  Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

* refactor: remove all caveman references from codebase
  - Rename all internal/compress/caveman*.go files to descriptive names
  - Rename CompressCaveman to PromptCompress
  - Rename CavemanStats to PromptStats
  - Rename intensity constants to IntensityLite/Full/Ultra
  - Update all comments and documentation
  - Update CHANGELOG.md
  - Rename all test variables and comments

  Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

* fix: remove remaining caveman references in compress_test.go

  Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

---------

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
1 parent a656795 commit e007c3b

8 files changed

Lines changed: 61 additions & 61 deletions

CHANGELOG.md

Lines changed: 13 additions & 13 deletions
@@ -16,13 +16,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Dollar-based cost savings tracking in Stats
 - Model-aware BPE encoding selection

-### Added — Round 2 of rtk + caveman porting (2026-06-01)
-- **`tok.CompressCaveman(text, intensity)`** — public Go API for the
-caveman prompt-compression algorithm. Three intensity levels
-(`CavemanLite`, `CavemanFull`, `CavemanUltra`) with drop-lists for
+### Added — Round 2 porting (2026-06-01)
+- **`tok.PromptCompress(text, intensity)`** — public Go API for the
+prompt-compression prompt-compression algorithm. Three intensity levels
+(`IntensityLite`, `IntensityFull`, `IntensityUltra`) with drop-lists for
 articles / filler / pleasantries and ~150 phrase substitutions.
 Auto-clarity: security / destructive segments pass through verbatim.
-Returns a `CavemanStats` struct (OriginalBytes, CompressedBytes,
+Returns a `PromptStats` struct (OriginalBytes, CompressedBytes,
 BytesSaved, PercentOff, PassThroughSegments, etc.).
 - **`tok.IsSensitiveFilename(path)`** — 3-layer path-based sensitive
 detection (exact basename, sensitive directory, name token).
@@ -83,7 +83,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * add agent/model/provider attribution for token savings ([c8213f1](https://github.com/GrayCodeAI/tok/commit/c8213f1d84f8c8e0378852ae4a701588f4fb294b))
 * add ALL 152 official git commands ([b48bc18](https://github.com/GrayCodeAI/tok/commit/b48bc189e54976a5280d389cc8365fc866055744))
 * add all competitor features - tee recovery, cost tracking, live monitor, prompt debugger, discover, session tracking, dup detection ([dc23fda](https://github.com/GrayCodeAI/tok/commit/dc23fda6f237ebb2b529386bad7ceb3d64da3a0c))
-* add all rtk + caveman features to tok ([87b26bf](https://github.com/GrayCodeAI/tok/commit/87b26bf8b9dfcb4419e29cda1edd3758212b95d6))
+* add all rtk + prompt-compression features to tok ([87b26bf](https://github.com/GrayCodeAI/tok/commit/87b26bf8b9dfcb4419e29cda1edd3758212b95d6))
 * add all RTK features and performance optimizations ([98295cd](https://github.com/GrayCodeAI/tok/commit/98295cd93d2f341b4c34b27e6e3b5919b21d9d4f))
 * add beautiful real-time TUI dashboard ([01bd0d8](https://github.com/GrayCodeAI/tok/commit/01bd0d84b9de5786a486e9404f5d3d869e7844b5))
 * add benchmark CI reporting and adaptive profile validation ([bb1ca2f](https://github.com/GrayCodeAI/tok/commit/bb1ca2f850343b5f591491e0f36642fca8056198))
@@ -158,9 +158,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Claude Code-style interface with open input box ([e46fde2](https://github.com/GrayCodeAI/tok/commit/e46fde2cdcacbba9d842e05899755d8e7b94dd98))
 * clean chat-style CLI like Mistral Vibe ([fc2228e](https://github.com/GrayCodeAI/tok/commit/fc2228e32dff4a2b4c4cc4bea49ff5b489d3c8bb))
 * **cli:** add compaction flags for Layer 11 ([f9c868c](https://github.com/GrayCodeAI/tok/commit/f9c868c987fe478109628ef9ba6baeb689f576c2))
-* close 9 world-class gaps vs rtk and caveman ([f0059eb](https://github.com/GrayCodeAI/tok/commit/f0059ebc2201cbe3c0ad5a8776b5f656e1bc8c87))
-* close last two caveman gaps ([3028726](https://github.com/GrayCodeAI/tok/commit/302872695f5482bb8a8e5c340e1613755a31dd1c))
-* close tok gaps vs rtk and caveman ([760f326](https://github.com/GrayCodeAI/tok/commit/760f3268c5de816c00e6f92657f8f19834457942))
+* close 9 world-class gaps vs rtk and prompt-compression ([f0059eb](https://github.com/GrayCodeAI/tok/commit/f0059ebc2201cbe3c0ad5a8776b5f656e1bc8c87))
+* close last two prompt-compression gaps ([3028726](https://github.com/GrayCodeAI/tok/commit/302872695f5482bb8a8e5c340e1613755a31dd1c))
+* close tok gaps vs rtk and prompt-compression ([760f326](https://github.com/GrayCodeAI/tok/commit/760f3268c5de816c00e6f92657f8f19834457942))
 * complete honest competitor analysis - cloned and analyzed 15 OSS tools ([3000897](https://github.com/GrayCodeAI/tok/commit/3000897d43294cebc80fe4ad37f8c22142e2e642))
 * complete integration test suite with archive fixes and MCP tools ([669293c](https://github.com/GrayCodeAI/tok/commit/669293c6fe9589696197e2dcbe9f22c5d5b71535))
 * complete SIMD optimization with Go 1.26 ([a3dbc85](https://github.com/GrayCodeAI/tok/commit/a3dbc857d662ab98690cc06e90d888491bf0f0fa))
@@ -474,7 +474,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * add agent/model/provider attribution for token savings ([c8213f1](https://github.com/GrayCodeAI/tok/commit/c8213f1d84f8c8e0378852ae4a701588f4fb294b))
 * add ALL 152 official git commands ([b48bc18](https://github.com/GrayCodeAI/tok/commit/b48bc189e54976a5280d389cc8365fc866055744))
 * add all competitor features - tee recovery, cost tracking, live monitor, prompt debugger, discover, session tracking, dup detection ([dc23fda](https://github.com/GrayCodeAI/tok/commit/dc23fda6f237ebb2b529386bad7ceb3d64da3a0c))
-* add all rtk + caveman features to tok ([87b26bf](https://github.com/GrayCodeAI/tok/commit/87b26bf8b9dfcb4419e29cda1edd3758212b95d6))
+* add all rtk + prompt-compression features to tok ([87b26bf](https://github.com/GrayCodeAI/tok/commit/87b26bf8b9dfcb4419e29cda1edd3758212b95d6))
 * add all RTK features and performance optimizations ([98295cd](https://github.com/GrayCodeAI/tok/commit/98295cd93d2f341b4c34b27e6e3b5919b21d9d4f))
 * add beautiful real-time TUI dashboard ([01bd0d8](https://github.com/GrayCodeAI/tok/commit/01bd0d84b9de5786a486e9404f5d3d869e7844b5))
 * add benchmark CI reporting and adaptive profile validation ([bb1ca2f](https://github.com/GrayCodeAI/tok/commit/bb1ca2f850343b5f591491e0f36642fca8056198))
@@ -548,9 +548,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Claude Code-style interface with open input box ([e46fde2](https://github.com/GrayCodeAI/tok/commit/e46fde2cdcacbba9d842e05899755d8e7b94dd98))
 * clean chat-style CLI like Mistral Vibe ([fc2228e](https://github.com/GrayCodeAI/tok/commit/fc2228e32dff4a2b4c4cc4bea49ff5b489d3c8bb))
 * **cli:** add compaction flags for Layer 11 ([f9c868c](https://github.com/GrayCodeAI/tok/commit/f9c868c987fe478109628ef9ba6baeb689f576c2))
-* close 9 world-class gaps vs rtk and caveman ([f0059eb](https://github.com/GrayCodeAI/tok/commit/f0059ebc2201cbe3c0ad5a8776b5f656e1bc8c87))
-* close last two caveman gaps ([3028726](https://github.com/GrayCodeAI/tok/commit/302872695f5482bb8a8e5c340e1613755a31dd1c))
-* close tok gaps vs rtk and caveman ([760f326](https://github.com/GrayCodeAI/tok/commit/760f3268c5de816c00e6f92657f8f19834457942))
+* close 9 world-class gaps vs rtk and prompt-compression ([f0059eb](https://github.com/GrayCodeAI/tok/commit/f0059ebc2201cbe3c0ad5a8776b5f656e1bc8c87))
+* close last two prompt-compression gaps ([3028726](https://github.com/GrayCodeAI/tok/commit/302872695f5482bb8a8e5c340e1613755a31dd1c))
+* close tok gaps vs rtk and prompt-compression ([760f326](https://github.com/GrayCodeAI/tok/commit/760f3268c5de816c00e6f92657f8f19834457942))
 * complete honest competitor analysis - cloned and analyzed 15 OSS tools ([3000897](https://github.com/GrayCodeAI/tok/commit/3000897d43294cebc80fe4ad37f8c22142e2e642))
 * complete integration test suite with archive fixes and MCP tools ([669293c](https://github.com/GrayCodeAI/tok/commit/669293c6fe9589696197e2dcbe9f22c5d5b71535))
 * complete SIMD optimization with Go 1.26 ([a3dbc85](https://github.com/GrayCodeAI/tok/commit/a3dbc857d662ab98690cc06e90d888491bf0f0fa))
Lines changed: 13 additions & 13 deletions
@@ -1,14 +1,14 @@
-// Caveman prompt compression — public API.
+// Prompt compression — public API.
 //
-// This file exposes the caveman algorithm (port from github.com/JuliusBrussee/caveman)
+// This file exposes the algorithm (port from prompt-compression algorithm)
 // as a top-level tok API. It is independent of the main compression pipeline
 // and does not use tiers/modes/budgets; it is a self-contained, deterministic
 // prose compression function with three intensity levels.
 //
 // Usage:
 //
-// out, stats := tok.CompressCaveman("Sure, I can help you with that.", tok.CavemanLite)
-// out, stats := tok.CompressCaveman(prompt, tok.CavemanUltra)
+// out, stats := tok.PromptCompress("Sure, I can help you with that.", tok.IntensityLite)
+// out, stats := tok.PromptCompress(prompt, tok.IntensityUltra)
 //
 // The function NEVER modifies sensitive segments (security warnings, destructive
 // commands, credential mentions) — those are passed through verbatim regardless
@@ -18,7 +18,7 @@ package tok

 import "github.com/GrayCodeAI/tok/internal/compress"

-// Intensity selects how aggressive caveman compression should be.
+// Intensity selects how aggressive prompt compression should be.
 //
 // - IntensityLite: drops pleasantries only ("Sure", "Of course", "Please").
 // - IntensityFull: Lite + articles ("a", "an", "the") + filler + hedging.
@@ -28,17 +28,17 @@ import "github.com/GrayCodeAI/tok/internal/compress"
 // a longer output than a lower intensity on the same input.
 type Intensity = compress.Intensity

-// Intensity preset constants. Use as a direct argument to CompressCaveman.
+// Intensity preset constants. Use as a direct argument to PromptCompress.
 const (
-	CavemanLite  Intensity = compress.Lite
-	CavemanFull  Intensity = compress.Full
-	CavemanUltra Intensity = compress.Ultra
+	IntensityLite  Intensity = compress.Lite
+	IntensityFull  Intensity = compress.Full
+	IntensityUltra Intensity = compress.Ultra
 )

-// CavemanStats is the statistics struct returned from CompressCaveman.
-type CavemanStats = compress.Stats
+// PromptStats is the statistics struct returned from PromptCompress.
+type PromptStats = compress.Stats

-// CompressCaveman applies the caveman prompt-compression algorithm to text.
+// PromptCompress applies the prompt-compression algorithm to text.
 //
 // Behavior:
 // - Empty input returns ("", zero stats).
@@ -48,6 +48,6 @@ type CavemanStats = compress.Stats
 // - Dictionary substitutions preserve the case of the first character.
 //
 // This is independent of tok.Compress() and does not consume options/presets.
-func CompressCaveman(text string, intensity Intensity) (string, CavemanStats) {
+func PromptCompress(text string, intensity Intensity) (string, PromptStats) {
 	return compress.Compress(text, intensity)
 }
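
The doc comments above describe intensity levels that progressively drop more word classes, plus a rule that sensitive segments pass through verbatim. The following is a minimal, self-contained sketch of that idea only; it is not the code in `internal/compress`. The drop lists, the single `rm -rf` keyword check, and the names `compressSentence` and `sensitive` are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Intensity mirrors the Lite < Full < Ultra ordering described in the diff.
type Intensity int

const (
	Lite Intensity = iota
	Full
	Ultra
)

// Hypothetical, heavily abbreviated drop lists (assumptions for illustration).
var pleasantries = []string{"Sure, ", "Of course, "}
var articles = map[string]bool{"a": true, "an": true, "the": true}

// sensitive reports whether a sentence must be kept verbatim.
// Assumption: a single keyword stands in for the real safety keyword list.
func sensitive(sentence string) bool {
	return strings.Contains(sentence, "rm -rf")
}

// compressSentence drops leading pleasantries at every level and
// additionally drops articles at Full and above.
func compressSentence(s string, level Intensity) string {
	if sensitive(s) {
		return s // pass through untouched regardless of intensity
	}
	for _, p := range pleasantries {
		s = strings.TrimPrefix(s, p)
	}
	if level < Full {
		return s
	}
	words := strings.Fields(s)
	kept := words[:0] // filter in place
	for _, w := range words {
		if !articles[strings.ToLower(w)] {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Println(compressSentence("Sure, the answer is yes.", Lite))
	fmt.Println(compressSentence("Sure, the answer is yes.", Full))
	fmt.Println(compressSentence("Be careful with rm -rf /tmp.", Ultra))
}
```

The sketch checks sensitivity before any rewriting, so a flagged sentence is returned byte-for-byte even at the highest intensity.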

caveman_test.go renamed to compress_test.go

Lines changed: 20 additions & 20 deletions
@@ -7,32 +7,32 @@ import (
 	"github.com/GrayCodeAI/tok"
 )

-func TestCompressCaveman_Lite(t *testing.T) {
+func TestPromptCompress_Lite(t *testing.T) {
 	in := "Sure, I can help you with that. Of course, the answer is yes."
-	out, stats := tok.CompressCaveman(in, tok.CavemanLite)
+	out, stats := tok.PromptCompress(in, tok.IntensityLite)
 	if strings.Contains(out, "Sure,") {
 		t.Errorf("expected 'Sure,' to be dropped at Lite, got %q", out)
 	}
 	if !strings.Contains(out, "help you with that") {
 		t.Errorf("expected help text preserved, got %q", out)
 	}
-	if stats.Intensity != tok.CavemanLite {
+	if stats.Intensity != tok.IntensityLite {
 		t.Errorf("expected intensity=Lite, got %v", stats.Intensity)
 	}
 }

-func TestCompressCaveman_Full(t *testing.T) {
+func TestPromptCompress_Full(t *testing.T) {
 	in := "The quick brown fox jumps over the lazy dog."
-	out, _ := tok.CompressCaveman(in, tok.CavemanFull)
+	out, _ := tok.PromptCompress(in, tok.IntensityFull)
 	// "the" should be dropped
 	if strings.Contains(out, " the ") && !strings.Contains(out, "over") {
 		t.Errorf("expected 'the' to be dropped at Full, got %q", out)
 	}
 }

-func TestCompressCaveman_Ultra(t *testing.T) {
+func TestPromptCompress_Ultra(t *testing.T) {
 	in := "However, the system is basically quite slow. Therefore, we need to optimize it."
-	out, stats := tok.CompressCaveman(in, tok.CavemanUltra)
+	out, stats := tok.PromptCompress(in, tok.IntensityUltra)
 	if strings.Contains(out, "However,") {
 		t.Errorf("expected 'However,' to be dropped at Ultra, got %q", out)
 	}
@@ -44,9 +44,9 @@ func TestCompressCaveman_Ultra(t *testing.T) {
 	}
 }

-func TestCompressCaveman_SensitivePassThrough(t *testing.T) {
+func TestPromptCompress_SensitivePassThrough(t *testing.T) {
 	in := "Be careful with rm -rf /tmp. The file is large."
-	out, stats := tok.CompressCaveman(in, tok.CavemanFull)
+	out, stats := tok.PromptCompress(in, tok.IntensityFull)
 	// The segment containing "rm -rf" must be preserved verbatim.
 	if !strings.Contains(out, "rm -rf /tmp.") {
 		t.Errorf("expected 'rm -rf /tmp.' to be preserved, got %q", out)
@@ -56,8 +56,8 @@ func TestCompressCaveman_SensitivePassThrough(t *testing.T) {
 	}
 }

-func TestCompressCaveman_Empty(t *testing.T) {
-	out, stats := tok.CompressCaveman("", tok.CavemanFull)
+func TestPromptCompress_Empty(t *testing.T) {
+	out, stats := tok.PromptCompress("", tok.IntensityFull)
 	if out != "" {
 		t.Errorf("expected empty output, got %q", out)
 	}
@@ -66,9 +66,9 @@ func TestCompressCaveman_Empty(t *testing.T) {
 	}
 }

-func TestCompressCaveman_Dictionary(t *testing.T) {
+func TestPromptCompress_Dictionary(t *testing.T) {
 	in := "In order to install, you need to make use of the installer."
-	out, _ := tok.CompressCaveman(in, tok.CavemanFull)
+	out, _ := tok.PromptCompress(in, tok.IntensityFull)
 	if strings.Contains(out, "In order to") {
 		t.Errorf("expected 'In order to' to be replaced, got %q", out)
 	}
@@ -77,21 +77,21 @@ func TestCompressCaveman_Dictionary(t *testing.T) {
 	}
 }

-func TestCompressCaveman_DoesNotAffectTopLevelCompress(t *testing.T) {
-	// Regression: adding caveman API must not change the main Compress() output
+func TestPromptCompress_DoesNotAffectTopLevelCompress(t *testing.T) {
+	// Regression: adding prompt-compression API must not change the main Compress() output
 	// for a typical input. We test that both APIs can be called and return
-	// different shapes (caveman returns CavemanStats, main returns Stats).
+	// different shapes (PromptCompress returns PromptStats, main returns Stats).
 	in := "The quick brown fox jumps over the lazy dog."
 	mainOut, mainStats := tok.Compress(in, tok.Aggressive)
 	if mainOut == "" {
 		t.Error("expected non-empty main Compress output")
 	}
 	_ = mainStats.OriginalTokens

-	cavemanOut, _ := tok.CompressCaveman(in, tok.CavemanFull)
-	// Caveman output may equal main output (if both drop "The") or differ —
+	promptCompressOut, _ := tok.PromptCompress(in, tok.IntensityFull)
+	// PromptCompress output may equal main output (if both drop "The") or differ —
 	// the point is they don't crash each other.
-	if cavemanOut == "" {
-		t.Error("expected non-empty caveman output")
+	if promptCompressOut == "" {
+		t.Error("expected non-empty prompt-compression output")
 	}
 }
Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
-// Package compress implements the caveman prompt-compression algorithm
-// ported natively to Go. Source: github.com/JuliusBrussee/caveman (MIT).
+// Package compress implements the prompt-compression algorithm
+// ported natively to Go. Source: JuliusBrussee/prompt-compression (MIT).
 //
 // Algorithm overview:
 //
@@ -17,7 +17,7 @@ import (
 	"strings"
 )

-// CompressResult reports what caveman did to the input.
+// CompressResult reports what the compression did to the input.
 type CompressResult struct {
 	// Original is the input verbatim.
 	Original string
@@ -50,11 +50,11 @@ type Stats struct {
 	SensitiveKeywordsHit []string
 }

-// Compress applies the caveman algorithm to s at the given intensity.
+// Compress applies the algorithm to s at the given intensity.
 //
 // Behavior:
 // - Empty input returns (input, empty result) untouched.
-// - CJK-heavy text is passed through (caveman rules assume Latin grammar).
+// - CJK-heavy text is passed through (compression rules assume Latin grammar).
 // - Sensitive segments (security/destructive keywords) are preserved
 // verbatim regardless of intensity.
 // - Multi-word drop-list entries ("of course", "feel free") are matched
Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 package compress

-// engV1Dictionary is the caveman "eng/v1" phrase-substitution dictionary.
-// Source: caveman/skills/caveman/SKILL.md, "Dictionary" section.
+// engV1Dictionary is the "eng/v1" phrase-substitution dictionary.
+// Source: skills/prompt-compression/SKILL.md, "Dictionary" section.
 //
 // Keys are case-insensitive verbose phrases. Values are terse equivalents.
 // Multi-word keys are matched as whole phrases (whitespace-normalized).
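
The dictionary comment describes case-insensitive, whitespace-normalized, whole-phrase matching, and the public API notes that substitutions preserve the case of the first character. One way such matching could be done is sketched below using the standard `regexp` package. The two-entry `sampleDict` and the helpers `substitute` and `upperFirst` are hypothetical stand-ins, not the repository's `engV1Dictionary` or its matcher.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

// sampleDict is a hypothetical two-entry stand-in for the real dictionary.
var sampleDict = map[string]string{
	"in order to": "to",
	"make use of": "use",
}

// upperFirst upper-cases the first rune of s.
func upperFirst(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if size == 0 {
		return s
	}
	return string(unicode.ToUpper(r)) + s[size:]
}

// substitute replaces each dictionary phrase with its terse equivalent.
// (?i) makes the match case-insensitive, \s+ between words tolerates any
// run of whitespace, and \b keeps the match on whole words.
func substitute(s string) string {
	for phrase, terse := range sampleDict {
		words := strings.Fields(phrase)
		for i, w := range words {
			words[i] = regexp.QuoteMeta(w)
		}
		re := regexp.MustCompile(`(?i)\b` + strings.Join(words, `\s+`) + `\b`)
		s = re.ReplaceAllStringFunc(s, func(m string) string {
			first, _ := utf8.DecodeRuneInString(m)
			if unicode.IsUpper(first) {
				return upperFirst(terse) // keep sentence-initial capital
			}
			return terse
		})
	}
	return s
}

func main() {
	fmt.Println(substitute("In order to install, you need to make use of the installer."))
}
```

Compiling one pattern per phrase on every call keeps the sketch short; a real implementation would more likely precompile the patterns once.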
Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
-// Package compress implements the caveman prompt-compression algorithm
-// ported natively to Go. Source: github.com/JuliusBrussee/caveman (MIT).
+// Package compress implements the prompt-compression algorithm
+// ported natively to Go. Source: JuliusBrussee/prompt-compression (MIT).
 //
 // The algorithm is a pure text-rewrite engine. Three intensity levels
 // (Lite, Full, Ultra) progressively drop articles, filler words,
@@ -10,7 +10,7 @@
 // warnings, destructive operations, and code-bearing segments.
 package compress

-// Intensity controls the aggressiveness of caveman compression.
+// Intensity controls the aggressiveness of prompt compression.
 //
 // The intensity ordering is Lite < Full < Ultra. Higher intensity drops
 // more word classes and applies more aggressive dictionary substitutions.
@@ -20,7 +20,7 @@ const (
 	// Lite drops only pleasantries and obvious filler; preserves articles.
 	Lite Intensity = iota
 	// Full drops articles, filler, pleasantries, and hedging.
-	// This is the caveman default.
+	// This is the default.
 	Full
 	// Ultra additionally drops conjunctions and forces sentence fragments.
 	Ultra
@@ -41,7 +41,7 @@ func (i Intensity) String() string {
 }

 // dropLists maps each intensity to the set of words/phrases to drop.
-// Source: caveman/skills/caveman/SKILL.md, "Intensity" section.
+// Source: skills/prompt-compression/SKILL.md, "Intensity" section.
 //
 // Multi-word phrases ("of course", "feel free") are matched case-insensitive
 // as whole phrases, not as individual words.
Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ import (
 // safetyKeywords are substrings that, if present, force the compressor
 // to fall back to "normal" prose (no compression) for that segment.
 //
-// Source: caveman/skills/caveman/SKILL.md, "Auto-clarity rules":
+// Source: skills/prompt-compression/SKILL.md, "Auto-clarity rules":
 //
 // "Drop to normal prose for security warnings, destructive ops,
 // ambiguous sequences."
@@ -116,7 +116,7 @@ func SplitSafeSegments(s string) []segment {
 	if s == "" {
 		return nil
 	}
-	// Split on sentence boundaries; the caveman rule operates per sentence.
+	// Split on sentence boundaries; the rule operates per sentence.
 	segs := splitSentences(s)
 	out := make([]segment, 0, len(segs))
 	for _, seg := range segs {
@@ -222,7 +222,7 @@ func isUpperLetter(b byte) bool {
 }

 // isCJK reports whether s is primarily CJK characters (Chinese/Japanese/Korean).
-// Used to decide if the compression rules even apply — CJK text doesn't have
+// Used to decide if the compression rules even apply — CJK text doesn't have
 // articles/filler to drop.
 func isCJK(s string) bool {
 	cjk := 0
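
The hunk ends before the body of `isCJK`, so the repository's actual logic is not visible here. As a rough illustration of what "primarily CJK" could mean, the sketch below counts letters in the Han, Hiragana, Katakana, and Hangul scripts with the standard `unicode` package. The function name `mostlyCJK` and the strict-majority threshold are assumptions.

```go
package main

import (
	"fmt"
	"unicode"
)

// mostlyCJK reports whether more than half of the letters in s belong to
// CJK scripts. Assumption: the majority threshold is illustrative only.
func mostlyCJK(s string) bool {
	cjk, letters := 0, 0
	for _, r := range s {
		if !unicode.IsLetter(r) {
			continue // ignore digits, punctuation, and whitespace
		}
		letters++
		if unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul) {
			cjk++
		}
	}
	return letters > 0 && cjk*2 > letters
}

func main() {
	fmt.Println(mostlyCJK("これは日本語の文章です"))
	fmt.Println(mostlyCJK("The quick brown fox"))
}
```

Counting only letters keeps punctuation and digits from skewing the ratio in either direction.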
