Tok is a Go library (no CLI, no binary) that cuts LLM token costs by 60–90%
through prompt compression, output filtering, cost estimation, and secret
detection. It is consumed by hawk, eyrie, yaad, and any other Go program
that needs to keep LLM context windows lean.
┌─────────────────────────────────────────────────────────────────┐
│                      Consumer Application                       │
│             hawk | eyrie | yaad | custom Go service             │
└────────────────────────────┬────────────────────────────────────┘
                             │ import "github.com/GrayCodeAI/tok"
┌────────────────────────────▼────────────────────────────────────┐
│                           tok package                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │   Compress   │  │   Estimate   │  │ Cost / Rate-limit /  │   │
│  │  (31-layer   │  │    Tokens    │  │   Secret detection   │   │
│  │  pipeline)   │  │    (BPE)     │  │    (33 patterns)     │   │
│  └──────┬───────┘  └──────┬───────┘  └──────────┬───────────┘   │
│         └─────────────────┼─────────────────────┘               │
│                           ▼                                     │
│        internal/filter (31 layers) + internal/core (BPE)        │
│     + internal/secrets + internal/cache + internal/extract      │
└─────────────────────────────────────────────────────────────────┘
// Compress runs the 31-layer pipeline (plus any opt-in post-stages) and
// returns the compressed text and per-stage stats. Safe to call with no
// options; sensible defaults apply.
func Compress(text string, opts ...Option) (string, Stats)
// EstimateTokens returns the estimated token count for text. BPE-backed
// when a model is supplied, heuristic otherwise.
func EstimateTokens(text string) int
func EstimateTokensForModel(text, model string) int
func EstimateTokensPrecise(text string) int
func EstimateTokensFast(text string) int
// Cost & pricing.
func GetModelPricing(model string) (ModelPricing, bool)
func RegisterModelPricing(model string, inputPer1K, outputPer1K float64)
func EstimateCostSavings(stats Stats, model string) float64
func ListModels() []string
// Secret detection.
type SecretDetector struct{ ... }
func NewSecretDetector() *SecretDetector
func DefaultSecretDetector() *SecretDetector
func IsSensitiveFilename(path string) (bool, secrets.FilenameMatch)
// Output extraction.
func ExtractJSON(text string) (string, bool)
func ExtractJSONArray(text string) (string, bool)
func ExtractAllJSON(text string) []string
func CompressJSON(text string, maxItems int) string
func CompressLog(text string) string
// Reusable compressor.
type Compressor struct{ ... }
func NewCompressor(opts ...Option) *Compressor
func (c *Compressor) Compress(text string) (string, Stats)
// Context-window optimizer.
type ContextOptimizer struct{ ... }
func NewContextOptimizer(opts ...Option) *ContextOptimizer
// Strategy advisor.
type CompressionAdvisor struct{ ... }
func NewCompressionAdvisor() *CompressionAdvisor
// Rate-limit / usage tracker.
type UsageTracker struct{ ... }
func NewUsageTracker(opts ...UsageOption) *UsageTracker
// Persistent gain tracker (SQLite).
type Tracker struct{ ... }
func NewTracker(ctx context.Context) (*Tracker, error)
func NewTrackerAt(path string) (*Tracker, error)

tok.WithMode(tok.ModeFull) // Compression intensity
tok.WithBudget(10000) // Hard token budget on output
tok.WithTier(tok.TierCore) // Pipeline tier (surface/trim/extract/core/code/log/thread/adaptive)
tok.WithQuery("relevant context") // Query-aware compression
tok.WithModel("gpt-4o") // Enables cost calculation + BPE
tok.WithCodeAware("go") // Symbol-preserving guard for source code
tok.WithCustomFilters(rules) // Append user TOML regex rules
tok.WithPerplexityGuided(scorer, 0.4) // LLMLingua-style selective drop

tok.Minimal // Lightest pass: entropy + AST + budget
tok.Aggressive // Full pipeline, every layer flipped on
tok.Surface // Output filtering only (good for already-compressed text)
tok.Adaptive // Auto-detect content type, choose tier
tok.Code // Symbol-preserving, comment stripping, structure kept
tok.Log // Collapse repeated INFO/DEBUG runs, keep ERROR verbatim

The pipeline is a multi-stage compression engine. Each stage receives the
current text, returns a transformed copy, and updates the shared PipelineContext:
Input Text
    │
    ▼
┌───────────────────────────────────────────────────────────┐
│                    PipelineCoordinator                    │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ Content Type Detection → Adaptive Tier Selection    │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ Pre  (0-0.5) : QuantumLock (KV-cache align),        │  │
│  │                Photon (image handling)              │  │
│  │ Core (1-10)  : Entropy, Perplexity, AST,            │  │
│  │                Goal-Driven, Contrastive, N-gram,    │  │
│  │                Evaluator-Heads, Gist, Hierarchical, │  │
│  │                Budget                               │  │
│  │ Sem. (11-20) : Compaction, Attribution, H2O,        │  │
│  │                AttentionSink, MetaToken,            │  │
│  │                SemanticChunk, SketchStore,          │  │
│  │                LazyPruner, SemanticAnchor,          │  │
│  │                AgentMemory                          │  │
│  │ Adv. (21-40) : MarginalInfoGain, NearDedup,         │  │
│  │                CoTCompress, DiffAdapt, EPiC,        │  │
│  │                GraphCoT, and ~15 more               │  │
│  │ Spec.(41-50) : ContextCrunch, SearchCrunch,         │  │
│  │                AdaptiveLearning (5K+ token input)   │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ Quality Guardrails → Output Validation              │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘
    │
    ▼
Compressed Text + Stats
| Tier | Layers | Purpose | Auto-Enabled |
|---|---|---|---|
| Pre | 0-0.5 | QuantumLock, Photon | Always |
| Core | 1-10 | Entropy, Perplexity, AST, Goal-Driven, Contrastive, N-gram, Evaluator, Gist, Hierarchical, Budget | Always |
| Semantic | 11-20 | Compaction, Attribution, H2O, AttentionSink, MetaToken, SemanticChunk, SketchStore, LazyPruner, SemanticAnchor, AgentMemory | Always |
| Advanced | 21-40 | 20 research-based layers (MarginalInfoGain, NearDedup, CoTCompress, DiffAdapt, EPiC, GraphCoT, etc.) | Auto for large inputs |
| Specialized | 41-50 | Experimental (ContextCrunch, SearchCrunch, AdaptiveLearning) | Auto for 5K+ tokens |
type Filter interface {
Name() string
Apply(input string, ctx *PipelineContext) (string, error)
}
// Optional interfaces for layer behavior control.
type EnableCheck interface {
Enabled(ctx *PipelineContext) bool
}
type ApplicabilityCheck interface {
Applicable(input string, ctx *PipelineContext) bool
}

Layers communicate via PipelineContext:
type PipelineContext struct {
OriginalTokens int
CurrentTokens int
Budget int
Mode CompressionMode
Query string
LayerStats []LayerStat
SharedState map[string]interface{} // Inter-layer data
QualityScore float64 // Running quality metric
}

| Package / File | Purpose | Key Files / Symbols |
|---|---|---|
| tok.go | Public Compress + EstimateTokens entry points | Compress(), EstimateTokens* |
| options.go | Functional options + preset variables | WithMode(), WithBudget(), WithTier(), Minimal/Aggressive/Surface/Adaptive/Code/Log |
| compressor.go | Reusable Compressor (caches pipeline) | Compressor, NewCompressor |
| stream.go | Streaming compression (delta-only) | StreamCompressor |
| optimizer.go | Token-budget context optimizer | ContextOptimizer, Greedy/Balanced/PriorityOptimize |
| chunker.go | Source-code chunking (130+ language map) | ChunkCode, RegisterChunker |
| advisor.go | Strategy recommender + content classifier | CompressionAdvisor, ClassifyContent |
| ratelimit.go | Usage tracker w/ thresholds | UsageTracker, FormatUsageBar |
| secrets.go | Secret detection facade (33 patterns internally) | SecretDetector, IsSensitiveFilename |
| tracker.go | Persistent gain tracker (SQLite/WAL) | Tracker, NewTrackerAt |
| entropy.go | Shannon-entropy helpers | ShannonEntropy, IsHighEntropy |
| extract.go | Brace-balanced JSON extraction | ExtractJSON* |
| jsoncrunch.go | JSON array sampler | CompressJSON |
| logcrunch.go | Log-line level detector + run collapse | CompressLog |
| profile.go | Named/versioned compression profiles (TOML) | LoadProfile, BuiltinProfile* |
| filters.go | Custom regex filter DSL (TOML) | LoadFilterRules, CustomFilter |
| codeaware.go | Symbol-preserving code guard | WithCodeAware, codeProtector |
| perplexity.go | LLMLingua-style selective drop | WithPerplexityGuided |
| mcp/server.go | MCP server with real count_tokens, estimate_cost, compress_text, redact_secrets tools | NewTokServer |
| internal/filter/ | Pipeline engine: 31 layers + tier configs + presets | pipeline_*.go, presets.go, tier_config.go |
| internal/core/ | BPE tokenizer, batch processor, runner | estimator.go, cost.go |
| internal/cache/ | Multi-level cache with git-aware watcher | cache.go, git_watcher.go |
| internal/extract/ | Brace-balanced JSON extraction impl | extract.go |
| internal/fastops/ | SIMD-accelerated primitives | simd_amd64.go, simd_amd64.s |
| internal/secrets/ | 33 secret regex patterns + filename detector | secrets.go, filename.go |
| internal/tracking/ | SQLite-backed gain tracker | tracking.go |
| internal/utils/ | slog adapter, helpers | logger.go |
| filters/ | 80 per-tool TOML filter configs (jest, eslint, go, kubectl, terraform, etc.) | one TOML per tool |
| commands/ | 6 TOML agent-command definitions (pr-review, tok-commit, tok-compress, tok-help, tok-review, tok) | one TOML per command |
| config/ | Example TOML + tokman.yaml | example.toml |
| rules/ | ast-grep no-fmt-println rule + tok agent-activation prompt | no-fmt-println.yaml, tok-activate.md |
| skills/ | 5 Claude-style agent skills (tok, tok-commit, tok-compress, tok-help, tok-review) | SKILL.md per skill |
| benchmarks/ | Benchmark harness (run.sh + results.md template) | run.sh |
| evals/ | Prompt-compression eval | pipeline-bench.sh, prompts/en.txt |
| types/ | Cross-eco exported types (mirrors hawk's shared/types/) | finding.go, severity.go |
1. Consumer calls tok.Compress(text, opts...)
2. Options parsed (mode, budget, tier, query, model, code-aware, custom rules)
3. Content type detected (code, log, markdown, data, etc.)
4. Adaptive tier selection based on input size
5. PipelineCoordinator created (from sync.Pool for reuse)
6. Layers executed sequentially:
   a. Each layer receives input + PipelineContext
   b. Layer transforms text (remove, compress, restructure)
   c. PipelineContext updated (tokens saved, quality score)
   d. Early exit if budget met
7. Optional post-stages: perplexity-guided drop → custom TOML rules
8. Quality guardrails validate output (no accidental whitespace/structure loss)
9. Stats computed: originalTokens, finalTokens, tokensSaved, reductionPct, cost
10. Result returned (compressed text + stats)
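The sequential execution and early exit in step 6 can be sketched with stdlib-only stand-ins. This is not the library's coordinator: the two toy layers, `countTokens` (a whitespace word count) and `runPipeline` are hypothetical names invented for illustration, and `PipelineContext` is trimmed to the three fields the loop needs.

```go
package main

import (
	"fmt"
	"strings"
)

// PipelineContext is a trimmed stand-in for the shared context.
type PipelineContext struct {
	OriginalTokens int
	CurrentTokens  int
	Budget         int
}

// Filter mirrors the two-method layer interface shown earlier.
type Filter interface {
	Name() string
	Apply(input string, ctx *PipelineContext) (string, error)
}

// countTokens is a hypothetical whitespace-based counter, not a BPE tokenizer.
func countTokens(s string) int { return len(strings.Fields(s)) }

// collapseSpaces is a toy layer that squeezes runs of whitespace.
type collapseSpaces struct{}

func (collapseSpaces) Name() string { return "collapse-spaces" }
func (collapseSpaces) Apply(in string, _ *PipelineContext) (string, error) {
	return strings.Join(strings.Fields(in), " "), nil
}

// dropWord is a toy layer that removes every occurrence of one word.
type dropWord struct{ word string }

func (d dropWord) Name() string { return "drop-" + d.word }
func (d dropWord) Apply(in string, _ *PipelineContext) (string, error) {
	kept := []string{}
	for _, w := range strings.Fields(in) {
		if w != d.word {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " "), nil
}

// runPipeline applies layers in order and stops as soon as the budget is met.
// It returns the final text and the names of the layers that actually ran.
func runPipeline(text string, budget int, layers []Filter) (string, []string, error) {
	ctx := &PipelineContext{Budget: budget}
	ctx.OriginalTokens = countTokens(text)
	ctx.CurrentTokens = ctx.OriginalTokens
	var ran []string
	for _, l := range layers {
		if ctx.Budget > 0 && ctx.CurrentTokens <= ctx.Budget {
			break // early exit: budget already satisfied
		}
		out, err := l.Apply(text, ctx)
		if err != nil {
			return text, ran, fmt.Errorf("%s: %w", l.Name(), err)
		}
		text = out
		ctx.CurrentTokens = countTokens(text)
		ran = append(ran, l.Name())
	}
	return text, ran, nil
}

func main() {
	layers := []Filter{collapseSpaces{}, dropWord{"very"}, dropWord{"really"}}
	out, ran, err := runPipeline("a  very   very really long line", 4, layers)
	fmt.Println(out, ran, err)
}
```

With a budget of 4, the third layer never runs: the text is already within budget after the second one.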
1. Consumer calls det := tok.NewSecretDetector()
2. det.DetectSecrets(text) iterates the 33-pattern registry
3. Each pattern: compiled regex; on match, record (type, span, value)
4. det.RedactSecrets(text) replaces matches with [REDACTED:<type>]
5. Optional: DetectAndRedactWithEntropy(text, threshold) adds a Shannon-entropy
pass to catch high-entropy blobs the regex table misses
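The detect-then-redact flow above can be illustrated with the standard `regexp` package. The sketch below is an assumption-laden miniature, not tok's registry: it carries only two illustrative patterns, and `secretPattern`, `match`, `detect` and `redact` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretPattern pairs a type label with a compiled regex.
type secretPattern struct {
	kind string
	re   *regexp.Regexp
}

// Two illustrative patterns only; the real registry is described as having 33.
var patterns = []secretPattern{
	{"aws-access-key", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{"github-pat", regexp.MustCompile(`\bghp_[A-Za-z0-9]{36}\b`)},
}

// match records the (type, span, value) triple for one hit.
type match struct {
	Kind       string
	Start, End int
	Value      string
}

// detect walks every pattern and collects all matches with byte offsets.
func detect(text string) []match {
	var out []match
	for _, p := range patterns {
		for _, loc := range p.re.FindAllStringIndex(text, -1) {
			out = append(out, match{p.kind, loc[0], loc[1], text[loc[0]:loc[1]]})
		}
	}
	return out
}

// redact replaces each match with a [REDACTED:<type>] marker.
func redact(text string) string {
	for _, p := range patterns {
		text = p.re.ReplaceAllLiteralString(text, "[REDACTED:"+p.kind+"]")
	}
	return text
}

func main() {
	in := "key=AKIAIOSFODNN7EXAMPLE ok"
	fmt.Println(detect(in))
	fmt.Println(redact(in))
}
```

`ReplaceAllLiteralString` is used so that a `$` in a label could never be read as a capture-group reference.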
1. Consumer calls tok.GetModelPricing(model) → ModelPricing
(returns zero-value + false for unknown models; consumer may call
tok.RegisterModelPricing to add custom entries)
2. Cost = (inputTokens/1000)*InputPricePer1K + (outputTokens/1000)*OutputPricePer1K
3. For compression savings: tok.EstimateCostSavings(stats, model)
conservatively assumes saved tokens would have been input tokens
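A worked instance of the formula in step 2 and the input-token assumption in step 3 follows. The `ModelPricing` struct here is a local stand-in that reuses the two field names from the formula, and the prices are made-up round numbers rather than real model rates.

```go
package main

import "fmt"

// ModelPricing is a local stand-in holding per-1K-token prices.
type ModelPricing struct {
	InputPricePer1K  float64
	OutputPricePer1K float64
}

// cost applies the formula: input and output tokens are billed separately.
func cost(p ModelPricing, inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1000*p.InputPricePer1K +
		float64(outputTokens)/1000*p.OutputPricePer1K
}

// savings prices saved tokens at the input rate, the conservative choice
// whenever output tokens cost more than input tokens.
func savings(p ModelPricing, tokensSaved int) float64 {
	return float64(tokensSaved) / 1000 * p.InputPricePer1K
}

func main() {
	p := ModelPricing{InputPricePer1K: 0.5, OutputPricePer1K: 1.5} // hypothetical prices
	fmt.Printf("%.2f\n", cost(p, 12000, 2000)) // 12*0.5 + 2*1.5 = 9.00
	fmt.Printf("%.2f\n", savings(p, 8000))     // 8*0.5 = 4.00
}
```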
coordinator_pool.go reuses pipeline coordinators via sync.Pool for a
10–20× speedup over per-call NewCompressor() construction.
var coordinatorPool = sync.Pool{
New: func() interface{} { return filter.NewPipelineCoordinator() },
}

internal/fastops/ provides SIMD-accelerated string operations on amd64:
| Operation | Generic | SIMD (AVX2) | Speedup |
|---|---|---|---|
| ANSI stripping | 100ns | 30ns | 3.3x |
| Whitespace norm | 80ns | 25ns | 3.2x |
| Char counting | 60ns | 20ns | 3.0x |
Build tag: simd_avx2. AVX2 support on the host CPU is auto-detected at runtime.
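For orientation, a scalar version of the first operation in the table (ANSI stripping) might look like the sketch below. It is a plain byte scan written for illustration, not the code in internal/fastops/, and it handles only SGR colour sequences of the form ESC `[` parameters `m`; `stripANSI` is a hypothetical name and the timings in the table are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// stripANSI removes SGR sequences (ESC '[' digits-and-semicolons 'm')
// in a single pass, copying every other byte through unchanged.
func stripANSI(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); {
		if s[i] == 0x1b && i+1 < len(s) && s[i+1] == '[' {
			j := i + 2
			for j < len(s) && (s[j] == ';' || (s[j] >= '0' && s[j] <= '9')) {
				j++
			}
			if j < len(s) && s[j] == 'm' {
				i = j + 1 // skip the whole escape sequence
				continue
			}
		}
		b.WriteByte(s[i]) // ordinary byte, or an incomplete sequence kept as-is
		i++
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", stripANSI("\x1b[31mFAIL\x1b[0m done"))
}
```

A vectorised variant would typically search for the ESC byte many bytes at a time and fall back to a loop like this one around each hit.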
Two modes:
- Heuristic (EstimateTokensFast): ~0.3 ns/op, character-based estimate
- BPE (EstimateTokensPrecise/EstimateTokensForModel): ~2 ns/op, tiktoken-compatible (cl100k, o200k, p50k, r50k encodings)
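To make the heuristic mode concrete, here is one common character-based rule: roughly one token per four characters, rounded up. The 4:1 ratio is an assumption chosen for illustration; the source does not state which ratio EstimateTokensFast uses, and `estimateTokensFast` below is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokensFast approximates a token count as ceil(runes / 4).
// The divisor is an assumed rule of thumb for English-like text.
func estimateTokensFast(text string) int {
	n := utf8.RuneCountInString(text)
	return (n + 3) / 4
}

func main() {
	fmt.Println(estimateTokensFast("hello world")) // 11 runes -> 3
}
```

An estimate like this needs no vocabulary tables, which is why it is far cheaper than a BPE pass, at the price of drifting on code, non-Latin scripts, and long identifiers.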
internal/core/estimator.go keeps BPE counts in a 64-shard LRU token cache
(keyed by FNV-64a hash, with an atomic hit counter).
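The sharding idea can be sketched with `hash/fnv`, `sync` and `sync/atomic`. This is a simplified illustration rather than the estimator's code: LRU eviction is deliberately omitted, entries are keyed by the hash alone (so collisions are possible), and `tokenCache`, `shard` and `hashKey` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
	"sync/atomic"
)

const shardCount = 64

// shard is one independently locked slice of the key space.
type shard struct {
	mu sync.Mutex
	m  map[uint64]int
}

// tokenCache spreads entries over shards so goroutines rarely contend.
type tokenCache struct {
	shards [shardCount]shard
	hits   atomic.Int64
}

func newTokenCache() *tokenCache {
	c := &tokenCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[uint64]int)
	}
	return c
}

// hashKey returns the FNV-64a hash of the text.
func hashKey(text string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(text))
	return h.Sum64()
}

// get looks up a cached count and bumps the hit counter on success.
func (c *tokenCache) get(text string) (int, bool) {
	k := hashKey(text)
	s := &c.shards[k%shardCount]
	s.mu.Lock()
	n, ok := s.m[k]
	s.mu.Unlock()
	if ok {
		c.hits.Add(1)
	}
	return n, ok
}

// put stores a count under the text's hash (no eviction in this sketch).
func (c *tokenCache) put(text string, n int) {
	k := hashKey(text)
	s := &c.shards[k%shardCount]
	s.mu.Lock()
	s.m[k] = n
	s.mu.Unlock()
}

func main() {
	c := newTokenCache()
	c.put("hello world", 2)
	n, ok := c.get("hello world")
	fmt.Println(n, ok, c.hits.Load())
}
```

Because each shard has its own mutex, two lookups block each other only when their hashes land in the same shard.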
internal/filter/bytepool.go provides BytePool + FastStringBuilder to
reduce GC pressure on hot paths.
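The general technique behind a pooled builder is to recycle buffers through `sync.Pool` instead of allocating one per call. The sketch below shows that pattern with `bytes.Buffer`; it does not reproduce BytePool or FastStringBuilder, and `bufPool` and `joinLines` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"sync"
)

// bufPool hands out reusable buffers; New runs only when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// joinLines trims trailing blanks from each line and joins the lines with
// newlines, building the result in a pooled buffer.
func joinLines(lines []string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old contents
	defer bufPool.Put(buf)
	for i, l := range lines {
		if i > 0 {
			buf.WriteByte('\n')
		}
		buf.WriteString(strings.TrimRight(l, " \t"))
	}
	// String copies the bytes, so returning the buffer to the pool is safe.
	return buf.String()
}

func main() {
	fmt.Printf("%q\n", joinLines([]string{"a  ", "b\t"}))
}
```

The same Get, Reset, Put discipline applies to any pooled object, including the pipeline coordinators mentioned earlier.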
80 per-tool TOML filter configs in filters/ (one per CLI tool: jest, eslint,
go, kubectl, terraform, vitest, playwright, aws, swift, etc.). Each declares
which pipeline layers to run and a per-tool token budget. Loaded via
tok.LoadFilterRules and applied via WithCustomFilters.
# filters/jest.toml
[[rule]]
name = "strip-ansi"
pattern = '\x1b\[[0-9;]*m'
replacement = ""
[[rule]]
name = "collapse-blank-lines"
pattern = '\n{3,}'
replacement = "\n\n"

Detection combines 33 regex patterns with an optional Shannon-entropy pass:
type SecretDetector struct {
patterns []*regexp.Regexp // 33 secret formats
entropy EntropyAnalyzer // optional Shannon-entropy pass
}
func (d *SecretDetector) DetectSecrets(text string) []SecretMatch
func (d *SecretDetector) RedactSecrets(text string) string
func (d *SecretDetector) DetectAndRedactWithEntropy(text string, threshold float64) string

Supported patterns include: AWS access keys, GitHub PATs, Slack tokens, Google API keys, Stripe keys, OpenAI/Anthropic keys, JWTs, RSA/EC/OpenSSH private keys, SendGrid, Twilio, Heroku, DigitalOcean, npm, PyPI, Docker registry, generic API keys, passwords, DB connection strings, Bearer tokens.
tok.IsSensitiveFilename complements content scanning with a 3-layer
filename detector (exact basename, sensitive directory, name token) for
.env, id_rsa, /home/*/.ssh/..., etc.
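A three-layer filename check of this kind could be structured as below. Only `.env`, `id_rsa` and the `.ssh` directory come from the text above; the name tokens ("secret", "credential"), the layer labels, and the function name `isSensitiveFilename` are assumptions made for the sketch, not the contents of internal/secrets/filename.go.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Illustrative tables; the real lists are assumed to be longer.
var (
	exactBasenames = map[string]bool{".env": true, "id_rsa": true}
	sensitiveDirs  = map[string]bool{".ssh": true}
	nameTokens     = []string{"secret", "credential"} // hypothetical tokens
)

// isSensitiveFilename reports whether a path looks sensitive and which
// layer matched: exact basename, sensitive directory, or name token.
func isSensitiveFilename(p string) (bool, string) {
	p = strings.ReplaceAll(p, `\`, "/") // normalise Windows separators
	base := path.Base(p)
	if exactBasenames[base] {
		return true, "exact-basename"
	}
	for _, dir := range strings.Split(path.Dir(p), "/") {
		if sensitiveDirs[dir] {
			return true, "sensitive-directory"
		}
	}
	lower := strings.ToLower(base)
	for _, t := range nameTokens {
		if strings.Contains(lower, t) {
			return true, "name-token"
		}
	}
	return false, ""
}

func main() {
	for _, p := range []string{"config/.env", "/home/alice/.ssh/known_hosts", "docs/db_credentials.txt", "README.md"} {
		ok, layer := isSensitiveFilename(p)
		fmt.Println(p, ok, layer)
	}
}
```

Ordering the layers from most to least specific keeps the reported match as precise as possible.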
- Language: Go 1.26+, zero CGO
- Type: Library; no binary, no CLI (.goreleaser.yml ships source archive + SPDX SBOM only)
- Distribution: go get github.com/GrayCodeAI/tok
- Versioning: VERSION file is the single source of truth; embedded via //go:embed in version.go; bumped by release-please from Conventional Commits
- CI: 3 workflows (ci.yml for fmt/vet/lint/test/security, release.yml for GoReleaser, scorecard.yml for OpenSSF)
- Coverage gate: 60% (codecov)
The consumer-facing CLI is hawk tok ..., which embeds this library.