✂️ tok Architecture

Tokenizer, Compressor & Secrets Scanner for AI Agents


🎯 Overview

tok is a tokenization, compression, secret-scanning, and rate-limiting library for AI coding agents. It reduces LLM token costs by 60–90% through input compression, output filtering, and transparent command rewriting.

💡 Pure Go library — no network service, no CLI required.


🧱 Components

tok/
├── api/openapi.yaml          📜 Library API surface reference
├── tok.go                    📤 Public API: Compress(), EstimateTokens()
├── compressor.go             🔄 Reusable Compressor struct
├── options.go                ⚙️ Option, Mode, Tier, With* functions, presets
├── secrets.go                🔒 SecretDetector, DetectSecrets(), RedactSecrets()
├── stats.go                  📊 Stats returned from Compress()
├── stream.go                 📡 Stream processing
└── internal/
    ├── core/                 🧮 BPE tokenizer, token estimation
    ├── filter/               🔧 31-layer filter pipeline, tier configs
    ├── codeaware/            💻 Language-specific compression rules
    ├── secrets/              🔑 Regex patterns, entropy analysis, allowlists
    ├── cache/                💾 Compression result caching
    ├── fastops/              ⚡ Performance-critical operations
    └── config/               ⚙️ Configuration management

📤 Public API

// 🗜️ One-shot compression
compressed, stats, err := tok.Compress(text,
    tok.WithTier(tok.TierCode),
    tok.WithBudget(4000),
    tok.WithQuery("implement OAuth flow"),
)

// 🔄 Reusable compressor (caches tokenizer state)
c := tok.NewCompressor(tok.Aggressive)
compressed, stats, err := c.Compress(text)

// 📊 Token estimation
approx  := tok.EstimateTokens(text)         // fast, ±5%
precise := tok.EstimateTokensPrecise(text)  // BPE-accurate

// 🧮 Warmup (call at startup to avoid first-call latency)
tok.WarmupTokenizer()

// 🔒 Secret detection
matches  := tok.DefaultSecretDetector().DetectSecrets(text)
redacted := tok.DefaultSecretDetector().RedactSecrets(text)
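
The `With*` calls above follow Go's functional options pattern. The sketch below shows one way such options could be wired up. It is not tok's actual implementation: the `config` struct, its fields, the `newConfig` helper, and the default tier are all assumptions made for illustration, and only the option names are taken from the snippet above.

```go
package main

import "fmt"

// Tier is a hypothetical stand-in for tok's compression tier type.
type Tier int

const (
	TierSurface Tier = iota
	TierTrim
	TierExtract
	TierCode
)

// config collects the settings that options mutate.
// The field names are assumptions, not tok's real internals.
type config struct {
	tier   Tier
	budget int    // maximum tokens to keep; 0 means no limit
	query  string // optional relevance hint
}

// Option mutates a config; each With* function returns one.
type Option func(*config)

func WithTier(t Tier) Option    { return func(c *config) { c.tier = t } }
func WithBudget(n int) Option   { return func(c *config) { c.budget = n } }
func WithQuery(q string) Option { return func(c *config) { c.query = q } }

// newConfig (hypothetical) starts from defaults and applies the options
// in order, so a later option overrides an earlier one.
func newConfig(opts ...Option) config {
	c := config{tier: TierSurface}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(
		WithTier(TierCode),
		WithBudget(4000),
		WithQuery("implement OAuth flow"),
	)
	fmt.Printf("tier=%d budget=%d query=%q\n", c.tier, c.budget, c.query)
}
```

The appeal of this design is that new settings can be added as new `With*` functions without changing the signature of `Compress()`.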

📊 Compression Tiers

| Tier | Description | Savings |
|---|---|---|
| 🟢 TierSurface | Light deduplication | ~10% |
| 🟡 TierTrim | Whitespace + comments | ~20% |
| 🟠 TierExtract | Key information extraction | ~35% |
| 🔵 TierCode | Code-aware compression | ~45% |
| 🔴 TierCore | Semantic core extraction | ~55% |
| 🟣 TierLog | Log file optimization | ~70% |
| TierAdaptive | Adaptive per content type | varies |
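
To make the savings figures concrete, here is a rough sketch that turns the approximate percentages from the table into an estimated output size and picks the lightest tier that fits a token budget. The helper names (`estimateRemaining`, `lightestTierFor`) are hypothetical and not part of tok. Treating the table order as lightest to heaviest is also an assumption, since some tiers (such as TierLog) target a specific content type.

```go
package main

import "fmt"

// tierSaving pairs a tier name with its approximate savings in percent.
type tierSaving struct {
	name    string
	percent int
}

// tiers lists the fixed tiers from the table in table order.
// TierAdaptive is omitted because its savings vary by content.
var tiers = []tierSaving{
	{"TierSurface", 10},
	{"TierTrim", 20},
	{"TierExtract", 35},
	{"TierCode", 45},
	{"TierCore", 55},
	{"TierLog", 70},
}

// estimateRemaining returns the approximate token count left after
// removing percent% of the input tokens (integer arithmetic).
func estimateRemaining(tokens, percent int) int {
	return tokens * (100 - percent) / 100
}

// lightestTierFor (hypothetical helper) returns the first tier whose
// estimated output fits within budget, or false if none does.
func lightestTierFor(tokens, budget int) (string, bool) {
	for _, t := range tiers {
		if estimateRemaining(tokens, t.percent) <= budget {
			return t.name, true
		}
	}
	return "", false
}

func main() {
	name, ok := lightestTierFor(10000, 6000)
	fmt.Println(name, ok) // prints "TierCode true"
}
```

Because the percentages are approximate, an estimate like this is only a starting point; the `Stats` returned from `Compress()` reports what a real run achieved.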

🔒 Secret Detection

| Strategy | Description |
|---|---|
| 🔑 Pattern-based | Regex for API keys, JWTs, connection strings, SSH keys |
| 📊 Entropy-based | Shannon entropy analysis (threshold: 4.5) |
| 📋 Allowlists | Prevent false positives on known-safe patterns |
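
The entropy-based strategy can be illustrated with a minimal sketch. It computes Shannon entropy in bits per byte and flags strings that score above 4.5. This is not tok's implementation: the function names are hypothetical, and the unit (bits per byte) and the strict `>` comparison are assumptions about how the threshold is applied.

```go
package main

import (
	"fmt"
	"math"
)

// entropyThreshold is the value quoted in the table; interpreting it
// as bits per byte is an assumption.
const entropyThreshold = 4.5

// shannonEntropy returns the Shannon entropy of s in bits per byte:
// H = -sum(p_i * log2(p_i)) over the byte values that occur in s.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// looksHighEntropy (hypothetical helper) reports whether s exceeds
// the threshold and is therefore a candidate secret.
func looksHighEntropy(s string) bool {
	return shannonEntropy(s) > entropyThreshold
}

func main() {
	for _, s := range []string{
		"hello world",
		"abcdefghijklmnopqrstuvwxyz012345",
	} {
		fmt.Printf("%q entropy=%.2f flagged=%v\n", s, shannonEntropy(s), looksHighEntropy(s))
	}
}
```

A string of length n can score at most log2(n) bits, so only strings with at least 23 distinct bytes can exceed 4.5. This keeps short identifiers from being flagged, while long strings that are varied but harmless still need the pattern checks and allowlists to avoid false positives.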

🔗 Ecosystem Usage

| Consumer | Usage |
|---|---|
| 🦅 hawk | Context window management |
| 🦅 eyrie | Response compression |
| 🧠 yaad | Token budget enforcement in recall |