Tok Architecture

Tok is a Go library (no CLI, no binary) that cuts LLM token costs by 60–90% through prompt compression, output filtering, cost estimation, and secret detection. It is consumed by hawk, eyrie, yaad, and any other Go program that needs to keep LLM context windows lean.


System Overview

┌─────────────────────────────────────────────────────────────────┐
│                     Consumer Application                          │
│         hawk  |  eyrie  |  yaad  |  custom Go service             │
└────────────────────────────┬────────────────────────────────────┘
                             │ import "github.com/GrayCodeAI/tok"
┌────────────────────────────▼────────────────────────────────────┐
│                         tok package                               │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐       │
│  │ Compress     │ │ Estimate     │ │ Cost / Rate-limit /  │       │
│  │ (31-layer    │ │ Tokens       │ │ Secret detection     │       │
│  │  pipeline)   │ │ (BPE)        │ │ (33 patterns)        │       │
│  └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘       │
│         └────────────────┼─────────────────────┘                  │
│                          ▼                                        │
│   internal/filter (31 layers) + internal/core (BPE)               │
│   + internal/secrets + internal/cache + internal/extract          │
└─────────────────────────────────────────────────────────────────┘

Core Abstractions

Public API

// Compress runs the 31-layer pipeline (plus any opt-in post-stages) and
// returns the compressed text and per-stage stats. Safe to call with no
// options; sensible defaults apply.
func Compress(text string, opts ...Option) (string, Stats)

// EstimateTokens returns the estimated token count for text. BPE-backed
// when a model is supplied, heuristic otherwise.
func EstimateTokens(text string) int
func EstimateTokensForModel(text, model string) int
func EstimateTokensPrecise(text string) int
func EstimateTokensFast(text string) int

// Cost & pricing.
func GetModelPricing(model string) (ModelPricing, bool)
func RegisterModelPricing(model string, inputPer1K, outputPer1K float64)
func EstimateCostSavings(stats Stats, model string) float64
func ListModels() []string

// Secret detection.
type SecretDetector struct{ ... }
func NewSecretDetector() *SecretDetector
func DefaultSecretDetector() *SecretDetector
func IsSensitiveFilename(path string) (bool, secrets.FilenameMatch)

// Output extraction.
func ExtractJSON(text string) (string, bool)
func ExtractJSONArray(text string) (string, bool)
func ExtractAllJSON(text string) []string
func CompressJSON(text string, maxItems int) string
func CompressLog(text string) string

// Reusable compressor.
type Compressor struct{ ... }
func NewCompressor(opts ...Option) *Compressor
func (c *Compressor) Compress(text string) (string, Stats)

// Context-window optimizer.
type ContextOptimizer struct{ ... }
func NewContextOptimizer(opts ...Option) *ContextOptimizer

// Strategy advisor.
type CompressionAdvisor struct{ ... }
func NewCompressionAdvisor() *CompressionAdvisor

// Rate-limit / usage tracker.
type UsageTracker struct{ ... }
func NewUsageTracker(opts ...UsageOption) *UsageTracker

// Persistent gain tracker (SQLite).
type Tracker struct{ ... }
func NewTracker(ctx context.Context) (*Tracker, error)
func NewTrackerAt(path string) (*Tracker, error)

Functional Options

tok.WithMode(tok.ModeFull)           // Compression intensity
tok.WithBudget(10000)                 // Hard token budget on output
tok.WithTier(tok.TierCore)            // Pipeline tier (surface/trim/extract/core/code/log/thread/adaptive)
tok.WithQuery("relevant context")     // Query-aware compression
tok.WithModel("gpt-4o")               // Enables cost calculation + BPE
tok.WithCodeAware("go")               // Symbol-preserving guard for source code
tok.WithCustomFilters(rules)          // Append user TOML regex rules
tok.WithPerplexityGuided(scorer, 0.4) // LLMLingua-style selective drop

Preset Variables

tok.Minimal     // Lightest pass — entropy + AST + budget
tok.Aggressive  // Full pipeline, every layer flipped on
tok.Surface     // Output filtering only (good for already-compressed text)
tok.Adaptive    // Auto-detect content type, choose tier
tok.Code        // Symbol-preserving, comment stripping, structure kept
tok.Log         // Collapse repeated INFO/DEBUG runs, keep ERROR verbatim

Compression Pipeline

Architecture

The pipeline is a multi-stage compression engine. Each stage receives the current text, returns a transformed copy (Go strings are immutable, so nothing is mutated in place), and updates the shared PipelineContext:

Input Text
    │
    ▼
┌─────────────────────────────────────────────────────────┐
│                  PipelineCoordinator                      │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Content Type Detection → Adaptive Tier Selection   │  │
│  └───────────────────────────────────────────────────┘  │
│                                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Pre  (0-0.5)  : QuantumLock (KV-cache align),      │  │
│  │                 Photon (image handling)             │  │
│  │ Core (1-10)   : Entropy, Perplexity, AST,           │  │
│  │                 Goal-Driven, Contrastive, N-gram,   │  │
│  │                 Evaluator-Heads, Gist, Hierarchical,│  │
│  │                 Budget                               │  │
│  │ Sem. (11-20)  : Compaction, Attribution, H2O,        │  │
│  │                 AttentionSink, MetaToken,            │  │
│  │                 SemanticChunk, SketchStore,          │  │
│  │                 LazyPruner, SemanticAnchor,          │  │
│  │                 AgentMemory                          │  │
│  │ Adv. (21-40)  : MarginalInfoGain, NearDedup,         │  │
│  │                 CoTCompress, DiffAdapt, EPiC,        │  │
│  │                 GraphCoT, and ~15 more               │  │
│  │ Spec.(41-50)  : ContextCrunch, SearchCrunch,        │  │
│  │                 AdaptiveLearning (5K+ token input)   │  │
│  └───────────────────────────────────────────────────┘  │
│                                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Quality Guardrails → Output Validation             │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
    │
    ▼
Compressed Text + Stats

Tier System

| Tier | Layers | Purpose | Auto-Enabled |
|------|--------|---------|--------------|
| Pre | 0-0.5 | QuantumLock, Photon | Always |
| Core | 1-10 | Entropy, Perplexity, AST, Goal-Driven, Contrastive, N-gram, Evaluator, Gist, Hierarchical, Budget | Always |
| Semantic | 11-20 | Compaction, Attribution, H2O, AttentionSink, MetaToken, SemanticChunk, SketchStore, LazyPruner, SemanticAnchor, AgentMemory | Always |
| Advanced | 21-40 | 20 research-based layers (MarginalInfoGain, NearDedup, CoTCompress, DiffAdapt, EPiC, GraphCoT, etc.) | Auto for large inputs |
| Specialized | 41-50 | Experimental (ContextCrunch, SearchCrunch, AdaptiveLearning) | Auto for 5K+ tokens |

Layer Interface

type Filter interface {
    Name() string
    Apply(input string, ctx *PipelineContext) (string, error)
}

// Optional interfaces for layer behavior control.
type EnableCheck interface {
    Enabled(ctx *PipelineContext) bool
}

type ApplicabilityCheck interface {
    Applicable(input string, ctx *PipelineContext) bool
}
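A minimal sketch of a layer that satisfies Filter and opts into EnableCheck. The interfaces are re-declared locally so the example compiles on its own, PipelineContext is cut down to one field, and `blankLineCollapser` and `runLayer` are hypothetical names invented for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Local, reduced copies of the types shown above so the sketch is standalone.
type PipelineContext struct {
	Budget int
}

type Filter interface {
	Name() string
	Apply(input string, ctx *PipelineContext) (string, error)
}

type EnableCheck interface {
	Enabled(ctx *PipelineContext) bool
}

// blankLineCollapser is a hypothetical layer: it squeezes runs of three or
// more newlines down to two.
type blankLineCollapser struct{ re *regexp.Regexp }

func newBlankLineCollapser() *blankLineCollapser {
	return &blankLineCollapser{re: regexp.MustCompile(`\n{3,}`)}
}

func (b *blankLineCollapser) Name() string { return "blank-line-collapser" }

func (b *blankLineCollapser) Apply(input string, _ *PipelineContext) (string, error) {
	return b.re.ReplaceAllString(input, "\n\n"), nil
}

// Enabled turns the layer off when no budget is set. The rule is arbitrary
// and exists only to exercise the optional interface.
func (b *blankLineCollapser) Enabled(ctx *PipelineContext) bool { return ctx.Budget > 0 }

// runLayer applies f unless it implements EnableCheck and reports disabled.
func runLayer(f Filter, input string, ctx *PipelineContext) (string, error) {
	if ec, ok := f.(EnableCheck); ok && !ec.Enabled(ctx) {
		return input, nil
	}
	return f.Apply(input, ctx)
}

func main() {
	out, err := runLayer(newBlankLineCollapser(), "a\n\n\n\nb", &PipelineContext{Budget: 100})
	fmt.Printf("%q %v\n", out, err)
}
```

Because the optional interfaces are discovered by type assertion, a layer that does not need them implements only Name and Apply.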

Inter-Layer Communication

Layers communicate via PipelineContext:

type PipelineContext struct {
    OriginalTokens  int
    CurrentTokens   int
    Budget          int
    Mode            CompressionMode
    Query           string
    LayerStats      []LayerStat
    SharedState     map[string]interface{}  // Inter-layer data
    QualityScore    float64                  // Running quality metric
}
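As an illustration of how SharedState could carry data between layers, the sketch below has an early stage record a content-type guess that a later stage reads back with a checked type assertion. The key name `contentType` and both functions are assumptions made for this example; the keys tok's layers actually use are not listed in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// Reduced local copy of PipelineContext with only the shared map.
type PipelineContext struct {
	SharedState map[string]interface{}
}

// detectContentType is a hypothetical early layer that stores a guess about
// the input under an agreed key.
func detectContentType(input string, ctx *PipelineContext) {
	kind := "text"
	if strings.Contains(input, "func ") || strings.Contains(input, "package ") {
		kind = "code"
	}
	ctx.SharedState["contentType"] = kind
}

// dropLineComments is a hypothetical later layer that acts only when an
// earlier layer classified the input as code.
func dropLineComments(input string, ctx *PipelineContext) string {
	kind, ok := ctx.SharedState["contentType"].(string)
	if !ok || kind != "code" {
		return input // missing key or wrong type: leave the text alone
	}
	var kept []string
	for _, line := range strings.Split(input, "\n") {
		if !strings.HasPrefix(strings.TrimSpace(line), "//") {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	ctx := &PipelineContext{SharedState: map[string]interface{}{}}
	src := "package x\n// note\nfunc f() {}"
	detectContentType(src, ctx)
	fmt.Printf("%q\n", dropLineComments(src, ctx))
}
```

The comma-ok assertion keeps a later layer safe when the producing layer was skipped or stored a different type.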

Package Structure

| Package | Purpose | Key Files |
|---------|---------|-----------|
| tok.go | Public Compress + EstimateTokens entry points | Compress(), EstimateTokens* |
| options.go | Functional options + preset variables | WithMode(), WithBudget(), WithTier(), Minimal/Aggressive/Surface/Adaptive/Code/Log |
| compressor.go | Reusable Compressor (caches pipeline) | Compressor, NewCompressor |
| stream.go | Streaming compression (delta-only) | StreamCompressor |
| optimizer.go | Token-budget context optimizer | ContextOptimizer, Greedy/Balanced/PriorityOptimize |
| chunker.go | Source-code chunking (130+ language map) | ChunkCode, RegisterChunker |
| advisor.go | Strategy recommender + content classifier | CompressionAdvisor, ClassifyContent |
| ratelimit.go | Usage tracker w/ thresholds | UsageTracker, FormatUsageBar |
| secrets.go | Secret detection facade (33 patterns internally) | SecretDetector, IsSensitiveFilename |
| tracker.go | Persistent gain tracker (SQLite/WAL) | Tracker, NewTrackerAt |
| entropy.go | Shannon-entropy helpers | ShannonEntropy, IsHighEntropy |
| extract.go | Brace-balanced JSON extraction | ExtractJSON* |
| jsoncrunch.go | JSON array sampler | CompressJSON |
| logcrunch.go | Log-line level detector + run collapse | CompressLog |
| profile.go | Named/versioned compression profiles (TOML) | LoadProfile, BuiltinProfile* |
| filters.go | Custom regex filter DSL (TOML) | LoadFilterRules, CustomFilter |
| codeaware.go | Symbol-preserving code guard | WithCodeAware, codeProtector |
| perplexity.go | LLMLingua-style selective drop | WithPerplexityGuided |
| mcp/server.go | MCP server with real count_tokens, estimate_cost, compress_text, redact_secrets tools | NewTokServer |
| internal/filter/ | Pipeline engine: 31 layers + tier configs + presets | pipeline_*.go, presets.go, tier_config.go |
| internal/core/ | BPE tokenizer, batch processor, runner | estimator.go, cost.go |
| internal/cache/ | Multi-level cache with git-aware watcher | cache.go, git_watcher.go |
| internal/extract/ | Brace-balanced JSON extraction impl | extract.go |
| internal/fastops/ | SIMD-accelerated primitives | simd_amd64.go, simd_amd64.s |
| internal/secrets/ | 33 secret regex patterns + filename detector | secrets.go, filename.go |
| internal/tracking/ | SQLite-backed gain tracker | tracking.go |
| internal/utils/ | slog adapter, helpers | logger.go |
| filters/ | 80 per-tool TOML filter configs (jest, eslint, go, kubectl, terraform, etc.) | one TOML per tool |
| commands/ | 6 TOML agent-command definitions (pr-review, tok-commit, tok-compress, tok-help, tok-review, tok) | one TOML per command |
| config/ | Example TOML + tokman.yaml | example.toml |
| rules/ | ast-grep no-fmt-println rule + tok agent-activation prompt | no-fmt-println.yaml, tok-activate.md |
| skills/ | 5 Claude-style agent skills (tok, tok-commit, tok-compress, tok-help, tok-review) | SKILL.md per skill |
| benchmarks/ | Benchmark harness (run.sh + results.md template) | run.sh |
| evals/ | Prompt-compression eval | pipeline-bench.sh, prompts/en.txt |
| types/ | Cross-eco exported types (mirrors hawk's shared/types/) | finding.go, severity.go |

Data Flow

Compression Request

1. Consumer calls tok.Compress(text, opts...)
2. Options parsed (mode, budget, tier, query, model, code-aware, custom rules)
3. Content type detected (code, log, markdown, data, etc.)
4. Adaptive tier selection based on input size
5. PipelineCoordinator created (from sync.Pool for reuse)
6. Layers executed sequentially:
   a. Each layer receives input + PipelineContext
   b. Layer transforms text (remove, compress, restructure)
   c. PipelineContext updated (tokens saved, quality score)
   d. Early exit if budget met
7. Optional post-stages: perplexity-guided drop → custom TOML rules
8. Quality guardrails validate output (no accidental whitespace/structure loss)
9. Stats computed: originalTokens, finalTokens, tokensSaved, reductionPct, cost
10. Result returned (compressed text + stats)
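Steps 6 and 9 above can be sketched as a loop that runs layers in order, stops once the budget is met, and then derives the stats. Everything here is a simplified assumption: `layer` is a plain function rather than a Filter, `countTokens` is a whitespace-based stand-in for a real tokenizer, and the `Stats` field names are inferred from step 9 rather than taken from tok's source.

```go
package main

import (
	"fmt"
	"strings"
)

// Stats carries the values named in step 9. The layout is an assumption.
type Stats struct {
	OriginalTokens int
	FinalTokens    int
	TokensSaved    int
	ReductionPct   float64
}

// layer is a simplified stand-in for one pipeline stage.
type layer func(string) string

// countTokens is a crude whitespace-based stand-in for a tokenizer.
func countTokens(s string) int { return len(strings.Fields(s)) }

// dropLinesWith builds a layer that removes every line containing marker.
func dropLinesWith(marker string) layer {
	return func(s string) string {
		var kept []string
		for _, line := range strings.Split(s, "\n") {
			if !strings.Contains(line, marker) {
				kept = append(kept, line)
			}
		}
		return strings.Join(kept, "\n")
	}
}

// run executes layers sequentially and exits early once the text fits the
// budget. A budget of 0 means no budget, so every layer runs.
func run(text string, budget int, layers ...layer) (string, Stats) {
	st := Stats{OriginalTokens: countTokens(text)}
	for _, l := range layers {
		if budget > 0 && countTokens(text) <= budget {
			break // budget already met, skip the remaining layers
		}
		text = l(text)
	}
	st.FinalTokens = countTokens(text)
	st.TokensSaved = st.OriginalTokens - st.FinalTokens
	if st.OriginalTokens > 0 {
		st.ReductionPct = 100 * float64(st.TokensSaved) / float64(st.OriginalTokens)
	}
	return text, st
}

func main() {
	in := "DEBUG a b\nINFO c d\nERROR e f"
	out, st := run(in, 6, dropLinesWith("DEBUG"), dropLinesWith("INFO"))
	fmt.Printf("%q %+v\n", out, st)
}
```

With a budget of 6 the second layer never runs, because the first one already brings the nine-token input down to six.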

Secret Detection Request

1. Consumer calls det := tok.NewSecretDetector()
2. det.DetectSecrets(text) iterates the 33-pattern registry
3. Each pattern: compiled regex; on match, record (type, span, value)
4. det.RedactSecrets(text) replaces matches with [REDACTED:<type>]
5. Optional: DetectAndRedactWithEntropy(text, threshold) adds Shannon-entropy
   pass to catch high-entropy blobs the regex table misses
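The flow above combines a regex table with a Shannon-entropy check. A minimal sketch of both pieces follows, under stated assumptions: the two patterns are illustrative examples of common key formats, not entries copied from tok's 33-pattern registry, and `redact` and `shannonEntropy` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// pattern pairs a secret type with the regex that recognizes it.
type pattern struct {
	kind string
	re   *regexp.Regexp
}

// Illustrative registry with two entries only.
var patterns = []pattern{
	{"aws-access-key", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{"github-pat", regexp.MustCompile(`\bghp_[A-Za-z0-9]{36}\b`)},
}

// redact replaces every match with [REDACTED:<type>].
func redact(text string) string {
	for _, p := range patterns {
		text = p.re.ReplaceAllLiteralString(text, "[REDACTED:"+p.kind+"]")
	}
	return text
}

// shannonEntropy returns the entropy of s in bits per byte. Random-looking
// blobs score high; repetitive or natural-language text scores lower.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / n
			h -= p * math.Log2(p)
		}
	}
	return h
}

func main() {
	// AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
	fmt.Println(redact("key=AKIAIOSFODNN7EXAMPLE end"))
	fmt.Printf("%.2f\n", shannonEntropy("aaaaaaaa"))
}
```

An entropy pass would typically score each candidate token and redact those above the caller's threshold, which is how blobs with no known prefix can still be caught.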

Cost Calculation Request

1. Consumer calls tok.GetModelPricing(model) → ModelPricing
   (returns zero-value + false for unknown models; consumer may call
    tok.RegisterModelPricing to add custom entries)
2. Cost = (inputTokens/1000)*InputPricePer1K + (outputTokens/1000)*OutputPricePer1K
3. For compression savings: tok.EstimateCostSavings(stats, model)
   conservatively assumes saved tokens would have been input tokens
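The formula in step 2 and the input-rate assumption in step 3 can be written out as below. The struct reuses the field names from the formula; the prices in `main` are placeholders chosen for easy arithmetic, not real rates for any model, and `cost` and `savings` are hypothetical helper names.

```go
package main

import "fmt"

// ModelPricing holds the two per-1K prices used in the formula above.
// The real struct may carry additional fields.
type ModelPricing struct {
	InputPricePer1K  float64
	OutputPricePer1K float64
}

// cost applies the step 2 formula.
func cost(p ModelPricing, inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1000*p.InputPricePer1K +
		float64(outputTokens)/1000*p.OutputPricePer1K
}

// savings prices saved tokens at the input rate, as step 3 describes.
func savings(p ModelPricing, tokensSaved int) float64 {
	return float64(tokensSaved) / 1000 * p.InputPricePer1K
}

func main() {
	// Placeholder prices for illustration only.
	p := ModelPricing{InputPricePer1K: 0.005, OutputPricePer1K: 0.015}
	fmt.Printf("cost=%.4f savings=%.4f\n", cost(p, 12000, 800), savings(p, 9000))
}
```

With these placeholder prices, 12,000 input tokens and 800 output tokens cost 12 × 0.005 + 0.8 × 0.015 = 0.072, and 9,000 saved tokens are worth 9 × 0.005 = 0.045.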

Performance

Object Pooling

coordinator_pool.go reuses pipeline coordinators via sync.Pool for a 10–20× speedup over per-call NewCompressor() construction.

var coordinatorPool = sync.Pool{
    New: func() interface{} { return filter.NewPipelineCoordinator() },
}
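A sketch of the Get, reset, Put cycle that usually accompanies a pool like this one. The `coordinator` type, its `reset` and `process` methods, and `compress` are hypothetical stand-ins; the real PipelineCoordinator's methods are not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// coordinator is a hypothetical stand-in for the pooled pipeline coordinator.
type coordinator struct {
	buf strings.Builder
}

// reset clears per-call state so a recycled coordinator cannot leak data
// from a previous request.
func (c *coordinator) reset() { c.buf.Reset() }

// process does trivial work so the example has observable output.
func (c *coordinator) process(text string) string {
	c.buf.WriteString(strings.TrimSpace(text))
	return c.buf.String()
}

var pool = sync.Pool{
	New: func() interface{} { return &coordinator{} },
}

// compress borrows a coordinator, resets it, uses it, and returns it.
func compress(text string) string {
	c := pool.Get().(*coordinator)
	defer pool.Put(c)
	c.reset()
	return c.process(text)
}

func main() {
	fmt.Printf("%q %q\n", compress("  first  "), compress("  second  "))
}
```

Resetting on checkout rather than on return means a coordinator abandoned by a panicking caller still starts clean the next time it is used.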

SIMD Optimization

internal/fastops/ provides SIMD-accelerated string operations on amd64:

| Operation | Generic | SIMD (AVX2) | Speedup |
|-----------|---------|-------------|---------|
| ANSI stripping | 100ns | 30ns | 3.3x |
| Whitespace norm | 80ns | 25ns | 3.2x |
| Char counting | 60ns | 20ns | 3.0x |

Build tag: simd_avx2 (CPU support for AVX2 is auto-detected at runtime).

Token Estimation

Two modes:

  • Heuristic (EstimateTokensFast): ~0.3 ns/op, character-based estimate
  • BPE (EstimateTokensPrecise / EstimateTokensForModel): ~2 ns/op, tiktoken-compatible (cl100k, o200k, p50k, r50k encodings)

internal/core/estimator.go caches BPE counts in a 64-shard LRU token cache (FNV-64a keyed, atomic hit counter).
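To show how FNV-64a keying spreads entries across shards, here is a simplified sketch. It deliberately omits LRU eviction, uses plain maps behind per-shard mutexes, and pairs the cache with a four-characters-per-token heuristic that is a common rule of thumb rather than tok's documented ratio. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
	"sync/atomic"
)

const shardCount = 64

// shard is one independently locked slice of the cache.
type shard struct {
	mu sync.Mutex
	m  map[string]int
}

// tokenCache is a sharded map without eviction; a real LRU would also track
// recency and cap each shard's size.
type tokenCache struct {
	shards [shardCount]shard
	hits   atomic.Int64
}

func newTokenCache() *tokenCache {
	c := &tokenCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[string]int)
	}
	return c
}

// shardFor hashes the key with FNV-64a and picks a shard by modulo.
func (c *tokenCache) shardFor(key string) *shard {
	h := fnv.New64a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return &c.shards[h.Sum64()%shardCount]
}

// count returns the cached value for text, calling estimate on a miss.
func (c *tokenCache) count(text string, estimate func(string) int) int {
	s := c.shardFor(text)
	s.mu.Lock()
	defer s.mu.Unlock()
	if n, ok := s.m[text]; ok {
		c.hits.Add(1)
		return n
	}
	n := estimate(text)
	s.m[text] = n
	return n
}

// heuristic assumes roughly four characters per token, rounded up.
func heuristic(text string) int { return (len(text) + 3) / 4 }

func main() {
	c := newTokenCache()
	fmt.Println(c.count("hello world", heuristic), c.count("hello world", heuristic), c.hits.Load())
}
```

Sharding keeps lock contention low under concurrent callers, since two keys only compete when they hash to the same shard.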

Buffer Pooling

internal/filter/bytepool.go provides BytePool + FastStringBuilder to reduce GC pressure on hot paths.


Filter Configuration

80 per-tool TOML filter configs in filters/ (one per CLI tool: jest, eslint, go, kubectl, terraform, vitest, playwright, aws, swift, etc.). Each declares which pipeline layers to run and a per-tool token budget. Loaded via tok.LoadFilterRules and applied via WithCustomFilters.

# filters/jest.toml
[[rule]]
name = "strip-ansi"
pattern = '\x1b\[[0-9;]*m'
replacement = ""

[[rule]]
name = "collapse-blank-lines"
pattern = '\n{3,}'
replacement = "\n\n"  # basic string: \n is a newline; a literal '...' string would insert backslash-n
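A sketch of how rules shaped like the ones above might be applied once parsed. TOML decoding is left out because Go's standard library has no TOML parser, so the rules are built directly in code; the `rule` struct, `compileRules`, and `applyRules` are hypothetical and do not describe what tok.LoadFilterRules returns.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical in-memory form of one [[rule]] table.
type rule struct {
	name        string
	re          *regexp.Regexp
	replacement string
}

// compileRules builds the two example rules with precompiled regexes.
func compileRules() []rule {
	return []rule{
		{"strip-ansi", regexp.MustCompile(`\x1b\[[0-9;]*m`), ""},
		{"collapse-blank-lines", regexp.MustCompile(`\n{3,}`), "\n\n"},
	}
}

// applyRules runs each rule in order; later rules see earlier rules' output.
func applyRules(text string, rules []rule) string {
	for _, r := range rules {
		text = r.re.ReplaceAllString(text, r.replacement)
	}
	return text
}

func main() {
	raw := "\x1b[32mPASS\x1b[0m\n\n\n\nok"
	fmt.Printf("%q\n", applyRules(raw, compileRules()))
}
```

Order matters: stripping ANSI codes first means color sequences sitting on otherwise blank lines cannot hide a run of newlines from the second rule.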

Security

Secret Detection

Detection combines 33 regex patterns with an optional Shannon-entropy pass:

type SecretDetector struct {
    patterns []*regexp.Regexp  // 33 secret formats
    entropy  EntropyAnalyzer   // optional Shannon-entropy pass
}

func (d *SecretDetector) DetectSecrets(text string) []SecretMatch
func (d *SecretDetector) RedactSecrets(text string) string
func (d *SecretDetector) DetectAndRedactWithEntropy(text string, threshold float64) string

Supported patterns include: AWS access keys, GitHub PATs, Slack tokens, Google API keys, Stripe keys, OpenAI/Anthropic keys, JWTs, RSA/EC/OpenSSH private keys, SendGrid, Twilio, Heroku, DigitalOcean, npm, PyPI, Docker registry, generic API keys, passwords, DB connection strings, Bearer tokens.

tok.IsSensitiveFilename complements content scanning with a 3-layer filename detector (exact basename, sensitive directory, name token) for .env, id_rsa, /home/*/.ssh/..., etc.
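The three layers named above can be sketched as three lookups tried in order. The tables below are small illustrative samples, not tok's real lists, and `isSensitive` is a hypothetical function that returns a plain string where the library returns a secrets.FilenameMatch.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Illustrative sample tables for each layer.
var (
	exactBasenames = map[string]bool{".env": true, "id_rsa": true, "credentials.json": true}
	sensitiveDirs  = map[string]bool{".ssh": true, ".aws": true, ".gnupg": true}
	nameTokens     = []string{"secret", "password", "token"}
)

// isSensitive checks a path against the three layers in order and reports
// which layer matched.
func isSensitive(p string) (bool, string) {
	// Normalize separators so Windows-style paths are handled the same way.
	p = path.Clean(strings.ReplaceAll(p, `\`, "/"))
	base := path.Base(p)

	// Layer 1: exact basename.
	if exactBasenames[base] {
		return true, "exact-basename"
	}
	// Layer 2: any parent directory on the sensitive list.
	for _, dir := range strings.Split(path.Dir(p), "/") {
		if sensitiveDirs[dir] {
			return true, "sensitive-directory"
		}
	}
	// Layer 3: a suspicious token anywhere in the file name.
	lower := strings.ToLower(base)
	for _, t := range nameTokens {
		if strings.Contains(lower, t) {
			return true, "name-token"
		}
	}
	return false, ""
}

func main() {
	for _, p := range []string{"app/.env", "/home/alice/.ssh/known_hosts", "config/db_password.txt", "README.md"} {
		ok, layer := isSensitive(p)
		fmt.Println(p, ok, layer)
	}
}
```

Plain substring matching in the third layer is the loosest check and would also flag names such as tokenizer.go, so a production detector would likely match on word boundaries instead.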


Build & Release

  • Language: Go 1.26+, zero CGO
  • Type: Library — no binary, no CLI (.goreleaser.yml ships source archive + SPDX SBOM only)
  • Distribution: go get github.com/GrayCodeAI/tok
  • Versioning: VERSION file is the single source of truth; embedded via //go:embed in version.go; bumped by release-please from Conventional Commits
  • CI: 3 workflows (ci.yml for fmt/vet/lint/test/security, release.yml for GoReleaser, scorecard.yml for OpenSSF)
  • Coverage gate: 60% (codecov)

The consumer-facing CLI is hawk tok ..., which embeds this library.