Tok Architecture

Tok is a Go library (no CLI, no binary) that cuts LLM token costs by 60–90% through prompt compression, output filtering, cost estimation, and secret detection. It is consumed by hawk, eyrie, yaad, and any other Go program that needs to keep LLM context windows lean.

System Overview

┌─────────────────────────────────────────────────────────────────┐
│                     Consumer Application                          │
│         hawk  |  eyrie  |  yaad  |  custom Go service             │
└────────────────────────────┬────────────────────────────────────┘
                             │ import "github.com/GrayCodeAI/tok"
┌────────────────────────────▼────────────────────────────────────┐
│                         tok package                               │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐       │
│  │ Compress     │ │ Estimate     │ │ Cost / Rate-limit /  │       │
│  │ (31-layer    │ │ Tokens       │ │ Secret detection     │       │
│  │  pipeline)   │ │ (BPE)        │ │ (33 patterns)        │       │
│  └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘       │
│         └────────────────┼─────────────────────┘                  │
│                          ▼                                        │
│   internal/filter (31 layers) + internal/core (BPE)               │
│   + internal/secrets + internal/cache + internal/extract          │
└─────────────────────────────────────────────────────────────────┘

Core Abstractions

Public API

// Compress runs the 31-layer pipeline (plus any opt-in post-stages) and
// returns the compressed text and per-stage stats. Safe to call with no
// options; sensible defaults apply.
func Compress(text string, opts ...Option) (string, Stats)

// EstimateTokens returns the estimated token count for text. BPE-backed
// when a model is supplied, heuristic otherwise.
func EstimateTokens(text string) int
func EstimateTokensForModel(text, model string) int
func EstimateTokensPrecise(text string) int
func EstimateTokensFast(text string) int

// Cost & pricing.
func GetModelPricing(model string) (ModelPricing, bool)
func RegisterModelPricing(model string, inputPer1K, outputPer1K float64)
func EstimateCostSavings(stats Stats, model string) float64
func ListModels() []string

// Secret detection.
type SecretDetector struct{ ... }
func NewSecretDetector() *SecretDetector
func DefaultSecretDetector() *SecretDetector
func IsSensitiveFilename(path string) (bool, secrets.FilenameMatch)

// Output extraction.
func ExtractJSON(text string) (string, bool)
func ExtractJSONArray(text string) (string, bool)
func ExtractAllJSON(text string) []string
func CompressJSON(text string, maxItems int) string
func CompressLog(text string) string

// Reusable compressor.
type Compressor struct{ ... }
func NewCompressor(opts ...Option) *Compressor
func (c *Compressor) Compress(text string) (string, Stats)

// Context-window optimizer.
type ContextOptimizer struct{ ... }
func NewContextOptimizer(opts ...Option) *ContextOptimizer

// Strategy advisor.
type CompressionAdvisor struct{ ... }
func NewCompressionAdvisor() *CompressionAdvisor

// Rate-limit / usage tracker.
type UsageTracker struct{ ... }
func NewUsageTracker(opts ...UsageOption) *UsageTracker

// Persistent gain tracker (SQLite).
type Tracker struct{ ... }
func NewTracker(ctx context.Context) (*Tracker, error)
func NewTrackerAt(path string) (*Tracker, error)

Functional Options

tok.WithMode(tok.ModeFull)           // Compression intensity
tok.WithBudget(10000)                 // Hard token budget on output
tok.WithTier(tok.TierCore)            // Pipeline tier (surface/trim/extract/core/code/log/thread/adaptive)
tok.WithQuery("relevant context")     // Query-aware compression
tok.WithModel("gpt-4o")               // Enables cost calculation + BPE
tok.WithCodeAware("go")               // Symbol-preserving guard for source code
tok.WithCustomFilters(rules)          // Append user TOML regex rules
tok.WithPerplexityGuided(scorer, 0.4) // LLMLingua-style selective drop
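
The options above follow Go's functional-options idiom. As a minimal sketch of how such options can be collected, assuming a hypothetical `config` struct and `newConfig` helper (neither is taken from tok's source, and the three `With*` functions here are simplified stand-ins):

```go
package main

import "fmt"

// config is a hypothetical stand-in for the option state a library keeps.
type config struct {
	budget int
	model  string
	query  string
}

// Option mutates a config. Each With* constructor returns one.
type Option func(*config)

func WithBudget(n int) Option { return func(c *config) { c.budget = n } }

func WithModel(m string) Option { return func(c *config) { c.model = m } }

func WithQuery(q string) Option { return func(c *config) { c.query = q } }

// newConfig starts from zero-value defaults and applies options in order,
// so a later option overrides an earlier one that sets the same field.
func newConfig(opts ...Option) config {
	var c config
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(WithBudget(10000), WithModel("gpt-4o"))
	fmt.Printf("%+v\n", c)
}
```

Because options are applied in order, callers can layer a preset first and override individual fields afterwards.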

Preset Variables

tok.Minimal     // Lightest pass — entropy + AST + budget
tok.Aggressive  // Full pipeline, every layer flipped on
tok.Surface     // Output filtering only (good for already-compressed text)
tok.Adaptive    // Auto-detect content type, choose tier
tok.Code        // Symbol-preserving, comment stripping, structure kept
tok.Log         // Collapse repeated INFO/DEBUG runs, keep ERROR verbatim
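
To make the Log preset's description concrete, here is a rough sketch of collapsing consecutive INFO/DEBUG runs while passing ERROR lines through untouched. The `level` and `collapseRuns` helpers and the format of the summary line are assumptions for illustration, not tok's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// level returns a coarse log level for a line. Hypothetical detector:
// it only looks for the level keyword anywhere in the line.
func level(line string) string {
	for _, l := range []string{"ERROR", "WARN", "INFO", "DEBUG"} {
		if strings.Contains(line, l) {
			return l
		}
	}
	return ""
}

// collapseRuns keeps the first line of each consecutive INFO or DEBUG run
// and replaces the remainder of the run with a count. Every other line,
// including ERROR lines, is kept verbatim.
func collapseRuns(text string) string {
	lines := strings.Split(text, "\n")
	var out []string
	for i := 0; i < len(lines); {
		lv := level(lines[i])
		if lv != "INFO" && lv != "DEBUG" {
			out = append(out, lines[i])
			i++
			continue
		}
		j := i + 1
		for j < len(lines) && level(lines[j]) == lv {
			j++
		}
		out = append(out, lines[i])
		if n := j - i - 1; n > 0 {
			out = append(out, fmt.Sprintf("... (%d more %s lines)", n, lv))
		}
		i = j
	}
	return strings.Join(out, "\n")
}

func main() {
	in := "INFO a\nINFO b\nINFO c\nERROR boom\nDEBUG x"
	fmt.Println(collapseRuns(in))
}
```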

Compression Pipeline

Architecture

The pipeline is a multi-stage compression engine. Each stage receives the current text, returns a transformed copy, and updates the shared PipelineContext:

Input Text
    │
    ▼
┌─────────────────────────────────────────────────────────┐
│                  PipelineCoordinator                      │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Content Type Detection → Adaptive Tier Selection   │  │
│  └───────────────────────────────────────────────────┘  │
│                                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Pre  (0-0.5)  : QuantumLock (KV-cache align),      │  │
│  │                 Photon (image handling)             │  │
│  │ Core (1-10)   : Entropy, Perplexity, AST,           │  │
│  │                 Goal-Driven, Contrastive, N-gram,   │  │
│  │                 Evaluator-Heads, Gist, Hierarchical,│  │
│  │                 Budget                               │  │
│  │ Sem. (11-20)  : Compaction, Attribution, H2O,        │  │
│  │                 AttentionSink, MetaToken,            │  │
│  │                 SemanticChunk, SketchStore,          │  │
│  │                 LazyPruner, SemanticAnchor,          │  │
│  │                 AgentMemory                          │  │
│  │ Adv. (21-40)  : MarginalInfoGain, NearDedup,         │  │
│  │                 CoTCompress, DiffAdapt, EPiC,        │  │
│  │                 GraphCoT, and ~15 more               │  │
│  │ Spec.(41-50)  : ContextCrunch, SearchCrunch,        │  │
│  │                 AdaptiveLearning (5K+ token input)   │  │
│  └───────────────────────────────────────────────────┘  │
│                                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Quality Guardrails → Output Validation             │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
    │
    ▼
Compressed Text + Stats

Tier System

Tier	Layers	Purpose	Auto-Enabled
Pre	0-0.5	QuantumLock, Photon	Always
Core	1-10	Entropy, Perplexity, AST, Goal-Driven, Contrastive, N-gram, Evaluator, Gist, Hierarchical, Budget	Always
Semantic	11-20	Compaction, Attribution, H2O, AttentionSink, MetaToken, SemanticChunk, SketchStore, LazyPruner, SemanticAnchor, AgentMemory	Always
Advanced	21-40	20 research-based layers (MarginalInfoGain, NearDedup, CoTCompress, DiffAdapt, EPiC, GraphCoT, etc.)	Auto for large inputs
Specialized	41-50	Experimental (ContextCrunch, SearchCrunch, AdaptiveLearning)	Auto for 5K+ tokens

Layer Interface

type Filter interface {
    Name() string
    Apply(input string, ctx *PipelineContext) (string, error)
}

// Optional interfaces for layer behavior control.
type EnableCheck interface {
    Enabled(ctx *PipelineContext) bool
}

type ApplicabilityCheck interface {
    Applicable(input string, ctx *PipelineContext) bool
}

Inter-Layer Communication

Layers communicate via PipelineContext:

type PipelineContext struct {
    OriginalTokens  int
    CurrentTokens   int
    Budget          int
    Mode            CompressionMode
    Query           string
    LayerStats      []LayerStat
    SharedState     map[string]interface{}  // Inter-layer data
    QualityScore    float64                  // Running quality metric
}
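
The interplay between `Filter`, the optional `EnableCheck`, and `PipelineContext` can be sketched as a simple sequential runner. This is an illustrative reduction under stated assumptions: `run`, `countTokens`, and the two toy filters are hypothetical, the context carries only three of the fields listed above, and a `Budget` of 0 is treated as "no budget":

```go
package main

import (
	"fmt"
	"strings"
)

// PipelineContext is a trimmed copy of the struct described above.
type PipelineContext struct {
	CurrentTokens int
	Budget        int
	SharedState   map[string]interface{}
}

type Filter interface {
	Name() string
	Apply(input string, ctx *PipelineContext) (string, error)
}

type EnableCheck interface {
	Enabled(ctx *PipelineContext) bool
}

// countTokens is a hypothetical whitespace-based stand-in for a tokenizer.
func countTokens(s string) int { return len(strings.Fields(s)) }

// run applies filters in order, skipping those that opt out via
// EnableCheck and stopping early once the budget is met.
func run(input string, ctx *PipelineContext, filters []Filter) (string, error) {
	text := input
	ctx.CurrentTokens = countTokens(text)
	for _, f := range filters {
		if ctx.Budget > 0 && ctx.CurrentTokens <= ctx.Budget {
			break // early exit: budget already met
		}
		if ec, ok := f.(EnableCheck); ok && !ec.Enabled(ctx) {
			continue
		}
		out, err := f.Apply(text, ctx)
		if err != nil {
			return text, fmt.Errorf("%s: %w", f.Name(), err)
		}
		text = out
		ctx.CurrentTokens = countTokens(text)
	}
	return text, nil
}

// dedupeWords is a toy filter that drops immediately repeated words and
// reports how many it dropped through SharedState.
type dedupeWords struct{}

func (dedupeWords) Name() string { return "dedupe-words" }

func (dedupeWords) Apply(input string, ctx *PipelineContext) (string, error) {
	words := strings.Fields(input)
	var out []string
	for i, w := range words {
		if i > 0 && words[i-1] == w {
			continue
		}
		out = append(out, w)
	}
	ctx.SharedState["dedupe.dropped"] = len(words) - len(out)
	return strings.Join(out, " "), nil
}

// disabled is a toy filter that always opts out via EnableCheck.
type disabled struct{}

func (disabled) Name() string { return "disabled" }

func (disabled) Apply(input string, _ *PipelineContext) (string, error) { return input, nil }

func (disabled) Enabled(*PipelineContext) bool { return false }

func main() {
	ctx := &PipelineContext{Budget: 3, SharedState: map[string]interface{}{}}
	out, err := run("the the quick quick fox", ctx, []Filter{disabled{}, dedupeWords{}})
	fmt.Println(out, ctx.CurrentTokens, err)
}
```

Checking the optional interface with a type assertion keeps the core `Filter` contract small: layers that never need to opt out implement only `Name` and `Apply`.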

Package Structure

Path	Purpose	Key Files / Symbols
`tok.go`	Public Compress + EstimateTokens entry points	`Compress()`, `EstimateTokens*`
`options.go`	Functional options + preset variables	`WithMode()`, `WithBudget()`, `WithTier()`, `Minimal/Aggressive/Surface/Adaptive/Code/Log`
`compressor.go`	Reusable `Compressor` (caches pipeline)	`Compressor`, `NewCompressor`
`stream.go`	Streaming compression (delta-only)	`StreamCompressor`
`optimizer.go`	Token-budget context optimizer	`ContextOptimizer`, `Greedy/Balanced/PriorityOptimize`
`chunker.go`	Source-code chunking (130+ language map)	`ChunkCode`, `RegisterChunker`
`advisor.go`	Strategy recommender + content classifier	`CompressionAdvisor`, `ClassifyContent`
`ratelimit.go`	Usage tracker w/ thresholds	`UsageTracker`, `FormatUsageBar`
`secrets.go`	Secret detection facade (33 patterns internally)	`SecretDetector`, `IsSensitiveFilename`
`tracker.go`	Persistent gain tracker (SQLite/WAL)	`Tracker`, `NewTrackerAt`
`entropy.go`	Shannon-entropy helpers	`ShannonEntropy`, `IsHighEntropy`
`extract.go`	Brace-balanced JSON extraction	`ExtractJSON*`
`jsoncrunch.go`	JSON array sampler	`CompressJSON`
`logcrunch.go`	Log-line level detector + run collapse	`CompressLog`
`profile.go`	Named/versioned compression profiles (TOML)	`LoadProfile`, `BuiltinProfile*`
`filters.go`	Custom regex filter DSL (TOML)	`LoadFilterRules`, `CustomFilter`
`codeaware.go`	Symbol-preserving code guard	`WithCodeAware`, `codeProtector`
`perplexity.go`	LLMLingua-style selective drop	`WithPerplexityGuided`
`mcp/server.go`	MCP server with real `count_tokens`, `estimate_cost`, `compress_text`, `redact_secrets` tools	`NewTokServer`
`internal/filter/`	Pipeline engine — 31 layers + tier configs + presets	`pipeline_*.go`, `presets.go`, `tier_config.go`
`internal/core/`	BPE tokenizer, batch processor, runner	`estimator.go`, `cost.go`
`internal/cache/`	Multi-level cache with git-aware watcher	`cache.go`, `git_watcher.go`
`internal/extract/`	Brace-balanced JSON extraction impl	`extract.go`
`internal/fastops/`	SIMD-accelerated primitives	`simd_amd64.go`, `simd_amd64.s`
`internal/secrets/`	33 secret regex patterns + filename detector	`secrets.go`, `filename.go`
`internal/tracking/`	SQLite-backed gain tracker	`tracking.go`
`internal/utils/`	slog adapter, helpers	`logger.go`
`filters/`	80 per-tool TOML filter configs (jest, eslint, go, kubectl, terraform, etc.)	one TOML per tool
`commands/`	6 TOML agent-command definitions (pr-review, tok-commit, tok-compress, tok-help, tok-review, tok)	one TOML per command
`config/`	Example TOML + tokman.yaml	`example.toml`
`rules/`	ast-grep `no-fmt-println` rule + tok agent-activation prompt	`no-fmt-println.yaml`, `tok-activate.md`
`skills/`	5 Claude-style agent skills (`tok`, `tok-commit`, `tok-compress`, `tok-help`, `tok-review`)	`SKILL.md` per skill
`benchmarks/`	Benchmark harness (run.sh + results.md template)	`run.sh`
`evals/`	Prompt-compression eval	`pipeline-bench.sh`, `prompts/en.txt`
`types/`	Cross-eco exported types (mirrors hawk's `shared/types/`)	`finding.go`, `severity.go`

Data Flow

Compression Request

1. Consumer calls tok.Compress(text, opts...)
2. Options parsed (mode, budget, tier, query, model, code-aware, custom rules)
3. Content type detected (code, log, markdown, data, etc.)
4. Adaptive tier selection based on input size
5. PipelineCoordinator created (from sync.Pool for reuse)
6. Layers executed sequentially:
   a. Each layer receives input + PipelineContext
   b. Layer transforms text (remove, compress, restructure)
   c. PipelineContext updated (tokens saved, quality score)
   d. Early exit if budget met
7. Optional post-stages: perplexity-guided drop → custom TOML rules
8. Quality guardrails validate output (no accidental whitespace/structure loss)
9. Stats computed: originalTokens, finalTokens, tokensSaved, reductionPct, cost
10. Result returned (compressed text + stats)

Secret Detection Request

1. Consumer calls det := tok.NewSecretDetector()
2. det.DetectSecrets(text) iterates the 33-pattern registry
3. Each pattern: compiled regex; on match, record (type, span, value)
4. det.RedactSecrets(text) replaces matches with [REDACTED:<type>]
5. Optional: DetectAndRedactWithEntropy(text, threshold) adds Shannon-entropy
   pass to catch high-entropy blobs the regex table misses
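
The detect-then-redact steps above can be sketched with a small pattern registry. The two regexes, the `registry` variable, and the `detect`/`redact` helpers are illustrative assumptions; the library's real registry is described as holding 33 patterns, and its exact expressions are not shown in this document:

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern pairs a secret type label with its compiled regex.
type pattern struct {
	typ string
	re  *regexp.Regexp
}

// registry holds two illustrative patterns (assumed, not tok's own).
var registry = []pattern{
	{typ: "aws-access-key", re: regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{typ: "github-pat", re: regexp.MustCompile(`\bghp_[A-Za-z0-9]{36}\b`)},
}

// SecretMatch records the type, byte span, and value of one match.
type SecretMatch struct {
	Type       string
	Start, End int
	Value      string
}

// detect walks every pattern and records each match.
func detect(text string) []SecretMatch {
	var out []SecretMatch
	for _, p := range registry {
		for _, loc := range p.re.FindAllStringIndex(text, -1) {
			out = append(out, SecretMatch{
				Type:  p.typ,
				Start: loc[0],
				End:   loc[1],
				Value: text[loc[0]:loc[1]],
			})
		}
	}
	return out
}

// redact replaces each match with a [REDACTED:<type>] placeholder.
func redact(text string) string {
	for _, p := range registry {
		text = p.re.ReplaceAllLiteralString(text, "[REDACTED:"+p.typ+"]")
	}
	return text
}

func main() {
	in := "key=AKIAIOSFODNN7EXAMPLE end"
	fmt.Println(len(detect(in)), redact(in))
}
```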

Cost Calculation Request

1. Consumer calls tok.GetModelPricing(model) → ModelPricing
   (returns zero-value + false for unknown models; consumer may call
    tok.RegisterModelPricing to add custom entries)
2. Cost = (inputTokens/1000)*InputPricePer1K + (outputTokens/1000)*OutputPricePer1K
3. For compression savings: tok.EstimateCostSavings(stats, model)
   conservatively assumes saved tokens would have been input tokens
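
The arithmetic in steps 2 and 3 can be written out as follows. The `cost` and `savings` helpers are hypothetical, `ModelPricing` is reduced to the two fields named in the formula, and the prices in `main` are placeholder values rather than real model pricing:

```go
package main

import "fmt"

// ModelPricing is a reduced stand-in carrying only per-1K-token prices.
type ModelPricing struct {
	InputPricePer1K  float64
	OutputPricePer1K float64
}

// cost applies the formula from step 2.
func cost(p ModelPricing, inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1000*p.InputPricePer1K +
		float64(outputTokens)/1000*p.OutputPricePer1K
}

// savings prices saved tokens at the input rate, matching the
// conservative assumption in step 3.
func savings(p ModelPricing, tokensSaved int) float64 {
	return float64(tokensSaved) / 1000 * p.InputPricePer1K
}

func main() {
	p := ModelPricing{InputPricePer1K: 0.5, OutputPricePer1K: 1.5} // placeholder prices
	fmt.Println(cost(p, 2000, 1000), savings(p, 4000))
}
```

Pricing saved tokens at the input rate understates savings whenever compression also shortens the model's output, which is why the estimate is described as conservative.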

Performance

Object Pooling

coordinator_pool.go reuses pipeline coordinators via sync.Pool for a 10–20× speedup over per-call NewCompressor() construction.

var coordinatorPool = sync.Pool{
    New: func() interface{} { return filter.NewPipelineCoordinator() },
}
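
A pool declaration alone does not show the borrow/return discipline, so here is a minimal sketch of it. The `coordinator` type and `compress` function are hypothetical stand-ins for `filter.PipelineCoordinator` and its caller:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// coordinator is a hypothetical stand-in for a pipeline coordinator
// that owns reusable scratch state.
type coordinator struct {
	buf strings.Builder
}

// reset clears state left behind by a previous borrower.
func (c *coordinator) reset() { c.buf.Reset() }

// process is a toy workload: it trims the text via the scratch buffer.
func (c *coordinator) process(text string) string {
	c.buf.WriteString(strings.TrimSpace(text))
	return c.buf.String()
}

var pool = sync.Pool{
	New: func() interface{} { return &coordinator{} },
}

// compress borrows a coordinator, resets it, and returns it when done.
func compress(text string) string {
	c := pool.Get().(*coordinator)
	defer pool.Put(c)
	c.reset() // pooled objects may carry state from an earlier call
	return c.process(text)
}

func main() {
	fmt.Println(compress("  a "), compress(" b"))
}
```

Resetting on `Get` rather than on `Put` means a coordinator that was returned in an unexpected state cannot leak data into the next call.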

SIMD Optimization

internal/fastops/ provides SIMD-accelerated string operations on amd64:

Operation	Generic	SIMD (AVX2)	Speedup
ANSI stripping	100ns	30ns	3.3x
Whitespace norm	80ns	25ns	3.2x
Char counting	60ns	20ns	3.0x

Build tag: simd_avx2. CPU support for AVX2 is detected at runtime.

Token Estimation

Two modes:

Heuristic (EstimateTokensFast): ~0.3 ns/op, character-based estimate
BPE (EstimateTokensPrecise / EstimateTokensForModel): ~2 ns/op, tiktoken-compatible (cl100k, o200k, p50k, r50k encodings)

internal/core/estimator.go caches BPE counts in a 64-shard LRU token cache (FNV-64a keyed, atomic hit counter).
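
The sharding idea can be sketched as below. This is a simplified illustration, not the contents of estimator.go: the `tokenCache` type and its `count` method are hypothetical, LRU eviction is omitted entirely, and hash collisions are ignored for brevity:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
	"sync/atomic"
)

const numShards = 64

// shard guards one slice of the key space with its own mutex, so
// lookups for keys in different shards do not contend.
type shard struct {
	mu sync.Mutex
	m  map[uint64]int
}

type tokenCache struct {
	shards [numShards]shard
	hits   atomic.Int64
}

func newTokenCache() *tokenCache {
	c := &tokenCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[uint64]int)
	}
	return c
}

// key hashes the text with FNV-64a.
func key(text string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(text))
	return h.Sum64()
}

// count returns the cached token count for text, computing and storing
// it on a miss. No eviction is performed in this sketch.
func (c *tokenCache) count(text string, compute func(string) int) int {
	k := key(text)
	s := &c.shards[k%numShards]
	s.mu.Lock()
	defer s.mu.Unlock()
	if n, ok := s.m[k]; ok {
		c.hits.Add(1)
		return n
	}
	n := compute(text)
	s.m[k] = n
	return n
}

func main() {
	c := newTokenCache()
	byLen := func(s string) int { return len(s) } // placeholder for a BPE count
	fmt.Println(c.count("hello", byLen), c.count("hello", byLen), c.hits.Load())
}
```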

Buffer Pooling

internal/filter/bytepool.go provides BytePool + FastStringBuilder to reduce GC pressure on hot paths.

Filter Configuration

80 per-tool TOML filter configs in filters/ (one per CLI tool: jest, eslint, go, kubectl, terraform, vitest, playwright, aws, swift, etc.). Each declares which pipeline layers to run and a per-tool token budget. Loaded via tok.LoadFilterRules and applied via WithCustomFilters.

# filters/jest.toml
[[rule]]
name = "strip-ansi"
pattern = '\x1b\[[0-9;]*m'
replacement = ""

[[rule]]
name = "collapse-blank-lines"
pattern = '\n{3,}'
replacement = "\n\n"
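
As a rough sketch of how rules of this shape could be applied once decoded from TOML, the following compiles each pattern with Go's regexp package and applies the replacements in order. The `Rule` struct and `applyRules` function are hypothetical and do not reflect the signatures of `tok.LoadFilterRules` or `WithCustomFilters`; the collapse rule's replacement is assumed to have been decoded into two real newline characters:

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule mirrors the three keys of a [[rule]] table after decoding.
type Rule struct {
	Name        string
	Pattern     string
	Replacement string
}

// applyRules runs each rule over the text in declaration order.
// Note that ReplaceAllString expands $1-style references in Replacement.
func applyRules(text string, rules []Rule) (string, error) {
	for _, r := range rules {
		re, err := regexp.Compile(r.Pattern)
		if err != nil {
			return "", fmt.Errorf("rule %q: %w", r.Name, err)
		}
		text = re.ReplaceAllString(text, r.Replacement)
	}
	return text, nil
}

func main() {
	rules := []Rule{
		{Name: "strip-ansi", Pattern: `\x1b\[[0-9;]*m`, Replacement: ""},
		{Name: "collapse-blank-lines", Pattern: `\n{3,}`, Replacement: "\n\n"},
	}
	out, err := applyRules("\x1b[31mFAIL\x1b[0m\n\n\n\ndone", rules)
	fmt.Printf("%q %v\n", out, err)
}
```

A production loader would likely compile each pattern once at load time rather than on every call.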

Security

Secret Detection

Detection combines 33 regex patterns with an optional Shannon-entropy pass:

type SecretDetector struct {
    patterns []*regexp.Regexp  // 33 secret formats
    entropy  EntropyAnalyzer   // optional Shannon-entropy pass
}

func (d *SecretDetector) DetectSecrets(text string) []SecretMatch
func (d *SecretDetector) RedactSecrets(text string) string
func (d *SecretDetector) DetectAndRedactWithEntropy(text string, threshold float64) string

Supported patterns include: AWS access keys, GitHub PATs, Slack tokens, Google API keys, Stripe keys, OpenAI/Anthropic keys, JWTs, RSA/EC/OpenSSH private keys, SendGrid, Twilio, Heroku, DigitalOcean, npm, PyPI, Docker registry, generic API keys, passwords, DB connection strings, Bearer tokens.

tok.IsSensitiveFilename complements content scanning with a 3-layer filename detector (exact basename, sensitive directory, name token) for .env, id_rsa, /home/*/.ssh/..., etc.
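
The three filename layers can be sketched as successive checks. Everything here is an assumption for illustration: the `isSensitive` helper returns a plain string label rather than `secrets.FilenameMatch`, and apart from `.env`, `id_rsa`, and `.ssh`, the list entries are hypothetical:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

var (
	// Layer 1: exact basenames.
	exactBasenames = map[string]bool{".env": true, "id_rsa": true}
	// Layer 2: directories whose contents are treated as sensitive.
	sensitiveDirs = map[string]bool{".ssh": true}
	// Layer 3: tokens inside a file name (hypothetical examples).
	nameTokens = []string{"secret", "credential"}
)

// isSensitive reports whether p looks sensitive and which layer matched.
func isSensitive(p string) (bool, string) {
	p = strings.ReplaceAll(p, `\`, "/") // tolerate Windows separators
	base := strings.ToLower(path.Base(p))
	if exactBasenames[base] {
		return true, "basename"
	}
	for _, seg := range strings.Split(path.Dir(p), "/") {
		if sensitiveDirs[strings.ToLower(seg)] {
			return true, "directory"
		}
	}
	for _, tok := range nameTokens {
		if strings.Contains(base, tok) {
			return true, "token"
		}
	}
	return false, ""
}

func main() {
	for _, p := range []string{"/app/.env", "/home/alice/.ssh/config", "docs/db-credentials.txt", "main.go"} {
		ok, layer := isSensitive(p)
		fmt.Println(p, ok, layer)
	}
}
```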

Build & Release

Language: Go 1.26+, zero CGO
Type: Library — no binary, no CLI (.goreleaser.yml ships source archive + SPDX SBOM only)
Distribution: go get github.com/GrayCodeAI/tok
Versioning: VERSION file is the single source of truth; embedded via //go:embed in version.go; bumped by release-please from Conventional Commits
CI: 3 workflows (ci.yml for fmt/vet/lint/test/security, release.yml for GoReleaser, scorecard.yml for OpenSSF)
Coverage gate: 60% (codecov)

The consumer-facing CLI is hawk tok ..., which embeds this library.
