Tokenizer, Compressor & Secrets Scanner for AI Agents

tok is a tokenization, compression, secret-scanning, and rate-limiting library for AI coding agents. It reduces LLM token costs by 60–90% through input compression, output filtering, and transparent command rewriting.
💡 Pure Go library — no network service, no CLI required.

tok/
├── api/openapi.yaml   📜 Library API surface reference
├── tok.go             📤 Public API: Compress(), EstimateTokens()
├── compressor.go      🔄 Reusable Compressor struct
├── options.go         ⚙️ Option, Mode, Tier, With* functions, presets
├── secrets.go         🔒 SecretDetector, DetectSecrets(), RedactSecrets()
├── stats.go           📊 Stats returned from Compress()
├── stream.go          📡 Stream processing
└── internal/
    ├── core/          🧮 BPE tokenizer, token estimation
    ├── filter/        🔧 31-layer filter pipeline, tier configs
    ├── codeaware/     💻 Language-specific compression rules
    ├── secrets/       🔑 Regex patterns, entropy analysis, allowlists
    ├── cache/         💾 Compression result caching
    ├── fastops/       ⚡ Performance-critical operations
    └── config/        ⚙️ Configuration management

// 🗜️ One-shot compression
compressed, stats, err := tok.Compress(text,
	tok.WithTier(tok.TierCode),
	tok.WithBudget(4000),
	tok.WithQuery("implement OAuth flow"),
)

// 🔄 Reusable compressor (caches tokenizer state)
c := tok.NewCompressor(tok.Aggressive)
compressed, stats, err := c.Compress(text)

// 📊 Token estimation
approx := tok.EstimateTokens(text)         // fast, ±5%
precise := tok.EstimateTokensPrecise(text) // BPE-accurate

// 🧮 Warmup (call at startup to avoid first-call latency)
tok.WarmupTokenizer()

// 🔒 Secret detection
matches := tok.DefaultSecretDetector().DetectSecrets(text)
redacted := tok.DefaultSecretDetector().RedactSecrets(text)
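
The `With*` calls above follow Go's functional options idiom. The sketch below shows how that idiom typically works. It is not tok's source: the `config` struct, `newConfig`, the two-value `Tier` enum, and the treatment of `Aggressive` as a preset `Option` are all assumptions made for illustration.

```go
package main

import "fmt"

// Tier is a hypothetical stand-in for the library's tier type.
type Tier int

const (
	TierTrim Tier = iota
	TierCode
)

// config collects every setting an option can change (hypothetical layout).
type config struct {
	tier   Tier
	budget int    // maximum output tokens; 0 means no limit
	query  string // optional hint about what the caller is looking for
}

// Option mutates a config. Each With* constructor returns one.
type Option func(*config)

func WithTier(t Tier) Option    { return func(c *config) { c.tier = t } }
func WithBudget(n int) Option   { return func(c *config) { c.budget = n } }
func WithQuery(q string) Option { return func(c *config) { c.query = q } }

// Aggressive is modeled here as a preset: one Option that sets several fields.
// The values are invented for the example.
var Aggressive Option = func(c *config) {
	c.tier = TierCode
	c.budget = 2000
}

// newConfig starts from defaults and applies options in order,
// so a later option overrides an earlier one.
func newConfig(opts ...Option) config {
	c := config{tier: TierTrim}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(Aggressive, WithBudget(4000), WithQuery("implement OAuth flow"))
	fmt.Printf("%+v\n", c)
}
```

Because options are applied in order, a caller can start from a preset and override a single field, as `WithBudget(4000)` does here after `Aggressive`.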

| Tier | Description | Savings |
|------|-------------|---------|
| 🟢 TierSurface | Light deduplication | ~10% |
| 🟡 TierTrim | Whitespace + comments | ~20% |
| 🟠 TierExtract | Key information extraction | ~35% |
| 🔵 TierCode | Code-aware compression | ~45% |
| 🔴 TierCore | Semantic core extraction | ~55% |
| 🟣 TierLog | Log file optimization | ~70% |
| ⚡ TierAdaptive | Adaptive per content type | varies |
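
To make the savings column concrete, the following sketch turns the approximate percentages into expected output sizes and checks them against a token budget. It is plain arithmetic on the table's figures, not a call into tok. The tier names are ordinary strings, `expectedTokens` and `tiersWithinBudget` are hypothetical helpers, and TierAdaptive is left out because its savings vary.

```go
package main

import "fmt"

// tierSavings pairs a tier name with the approximate savings from the table.
type tierSavings struct {
	name    string
	savings float64 // fraction of tokens removed, e.g. 0.35 for ~35%
}

var tiers = []tierSavings{
	{"TierSurface", 0.10},
	{"TierTrim", 0.20},
	{"TierExtract", 0.35},
	{"TierCode", 0.45},
	{"TierCore", 0.55},
	{"TierLog", 0.70},
}

// expectedTokens estimates the output size after compression,
// rounded to the nearest token.
func expectedTokens(input int, savings float64) int {
	return int(float64(input)*(1-savings) + 0.5)
}

// tiersWithinBudget lists the tiers whose expected output fits the budget.
func tiersWithinBudget(input, budget int) []string {
	var fits []string
	for _, t := range tiers {
		if expectedTokens(input, t.savings) <= budget {
			fits = append(fits, t.name)
		}
	}
	return fits
}

func main() {
	const input, budget = 10000, 4000
	for _, t := range tiers {
		fmt.Printf("%-12s ~%d tokens\n", t.name, expectedTokens(input, t.savings))
	}
	fmt.Println("fits budget:", tiersWithinBudget(input, budget))
}
```

The percentages are rough averages, and TierCode and TierLog target specific content types, so a budget check like this is a planning aid rather than a guarantee about any particular input.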

| Strategy | Description |
|----------|-------------|
| 🔑 Pattern-based | Regex for API keys, JWTs, connection strings, SSH keys |
| 📊 Entropy-based | Shannon entropy analysis (threshold: 4.5) |
| 📋 Allowlists | Prevent false positives on known-safe patterns |
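
The three strategies can be combined in a single check, roughly as sketched below. This is a minimal illustration using only the standard library, not tok's detector: the two regexes, the allowlist entry, the 24-character minimum for the entropy test, and the `isSecret` and `redact` helpers are all assumptions. Only the 4.5 threshold comes from the table.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// Illustrative patterns only; a real detector would carry many more.
var patterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),                                  // shape of an AWS access key ID
	regexp.MustCompile(`eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+`), // JWT-like token
}

// allowlist holds known-safe values that must never be flagged.
var allowlist = map[string]bool{"AKIAIOSFODNN7EXAMPLE": true}

const (
	entropyThreshold = 4.5 // bits per character, from the table
	minEntropyLen    = 24  // assumption: skip the entropy test on short words
)

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// isSecret checks the allowlist first, then patterns, then entropy.
func isSecret(word string) bool {
	if allowlist[word] {
		return false
	}
	for _, re := range patterns {
		if re.MatchString(word) {
			return true
		}
	}
	return len(word) >= minEntropyLen && shannonEntropy(word) > entropyThreshold
}

// redact replaces flagged whitespace-separated words.
// Note that it also collapses runs of whitespace to single spaces.
func redact(text string) string {
	words := strings.Fields(text)
	for i, w := range words {
		if isSecret(w) {
			words[i] = "[REDACTED]"
		}
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(redact("export AWS_KEY AKIAABCDEFGHIJKLMNOP now"))
}
```

Checking the allowlist before anything else is what keeps documented placeholder values from being redacted. A string needs at least 23 distinct characters to exceed 4.5 bits per character, so under this per-character measure the entropy test only ever fires on long, varied tokens.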

| Consumer | Usage |
|----------|-------|
| 🦅 hawk | Context window management |
| 🦅 eyrie | Response compression |
| 🧠 yaad | Token budget enforcement in recall |