✂️ tok Architecture

Tokenizer, Compressor & Secrets Scanner for AI Agents

🎯 Overview

tok is a tokenization, compression, secret-scanning, and rate-limiting library for AI coding agents. It reduces LLM token costs by 60–90% through input compression, output filtering, and transparent command rewriting.

💡 Pure Go library — no network service, no CLI required.

🧱 Components

tok/
├── api/openapi.yaml          📜 Library API surface reference
├── tok.go                    📤 Public API: Compress(), EstimateTokens()
├── compressor.go             🔄 Reusable Compressor struct
├── options.go                ⚙️ Option, Mode, Tier, With* functions, presets
├── secrets.go                🔒 SecretDetector, DetectSecrets(), RedactSecrets()
├── stats.go                  📊 Stats returned from Compress()
├── stream.go                 📡 Stream processing
└── internal/
    ├── core/                 🧮 BPE tokenizer, token estimation
    ├── filter/               🔧 31-layer filter pipeline, tier configs
    ├── codeaware/            💻 Language-specific compression rules
    ├── secrets/              🔑 Regex patterns, entropy analysis, allowlists
    ├── cache/                💾 Compression result caching
    ├── fastops/              ⚡ Performance-critical operations
    └── config/               ⚙️ Configuration management

📤 Public API

// 🗜️ One-shot compression
compressed, stats, err := tok.Compress(text,
    tok.WithTier(tok.TierCode),
    tok.WithBudget(4000),
    tok.WithQuery("implement OAuth flow"),
)

// 🔄 Reusable compressor (caches tokenizer state)
c := tok.NewCompressor(tok.Aggressive)
compressed, stats, err = c.Compress(text) // plain assignment: the variables are already declared above

// 📊 Token estimation
approx  := tok.EstimateTokens(text)         // fast, ±5%
precise := tok.EstimateTokensPrecise(text)  // BPE-accurate

// 🧮 Warmup (call at startup to avoid first-call latency)
tok.WarmupTokenizer()

// 🔒 Secret detection
matches  := tok.DefaultSecretDetector().DetectSecrets(text)
redacted := tok.DefaultSecretDetector().RedactSecrets(text)
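
The `With*` calls above look like Go's functional options pattern. The sketch below shows how such options are commonly wired up; it is an illustration of the pattern, not tok's source. The `config` struct, its fields, and `newConfig` are hypothetical names, and the `Tier` constants here are local stand-ins.

```go
package main

import "fmt"

// Tier is a local stand-in for a compression tier (hypothetical, not tok's type).
type Tier int

const (
	TierSurface Tier = iota
	TierCode
)

// config collects every setting an option can touch (hypothetical layout).
type config struct {
	tier   Tier
	budget int    // maximum tokens; 0 means no budget
	query  string // optional relevance hint
}

// Option mutates a config; each With* constructor returns one.
type Option func(*config)

func WithTier(t Tier) Option {
	return func(c *config) { c.tier = t }
}

func WithBudget(n int) Option {
	return func(c *config) { c.budget = n }
}

func WithQuery(q string) Option {
	return func(c *config) { c.query = q }
}

// newConfig starts from defaults and applies options in order,
// so a later option overrides an earlier one.
func newConfig(opts ...Option) config {
	c := config{tier: TierSurface}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(
		WithTier(TierCode),
		WithBudget(4000),
		WithQuery("implement OAuth flow"),
	)
	fmt.Printf("%+v\n", c)
}
```

Under this pattern a preset can be just another `Option` that sets several fields at once, which would fit how a single value is passed to `NewCompressor` above.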

📊 Compression Tiers

| Tier | Description | Savings |
|------|-------------|---------|
| 🟢 `TierSurface` | Light deduplication | ~10% |
| 🟡 `TierTrim` | Whitespace + comments | ~20% |
| 🟠 `TierExtract` | Key information extraction | ~35% |
| 🔵 `TierCode` | Code-aware compression | ~45% |
| 🔴 `TierCore` | Semantic core extraction | ~55% |
| 🟣 `TierLog` | Log file optimization | ~70% |
| ⚡ `TierAdaptive` | Adaptive per content type | varies |
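
To make the "whitespace + comments" idea concrete, here is a deliberately simplified sketch of such a pass and of how a savings percentage can be computed. `trimPass` and `savings` are hypothetical helpers written for this example; they are not tok's filter pipeline, and byte length is used only as a crude proxy for token count.

```go
package main

import (
	"fmt"
	"strings"
)

// trimPass drops blank lines and full-line // comments and strips
// trailing whitespace. A simplified stand-in, not tok's implementation.
func trimPass(src string) string {
	var out []string
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimRight(line, " \t\r")
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "//") {
			continue
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}

// savings reports the size reduction as a percentage of the original
// length, using bytes as a rough proxy for tokens.
func savings(before, after string) float64 {
	if len(before) == 0 {
		return 0
	}
	return 100 * float64(len(before)-len(after)) / float64(len(before))
}

func main() {
	src := "// add two ints\nfunc add(a, b int) int {   \n\n\treturn a + b\n}\n"
	out := trimPass(src)
	fmt.Printf("%q\nsaved %.0f%%\n", out, savings(src, out))
}
```

A real pass would need to be language-aware (for example, `//` inside a string literal is not a comment), which is presumably why code-aware compression is a separate tier.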

🔒 Secret Detection

| Strategy | Description |
|----------|-------------|
| 🔑 Pattern-based | Regex for API keys, JWTs, connection strings, SSH keys |
| 📊 Entropy-based | Shannon entropy analysis (threshold: 4.5) |
| 📋 Allowlists | Prevent false positives on known-safe patterns |

🔗 Ecosystem Usage

| Consumer | Usage |
|----------|-------|
| 🦅 hawk | Context window management |
| 🦅 eyrie | Response compression |
| 🧠 yaad | Token budget enforcement in recall |
