21 changes: 21 additions & 0 deletions .cursor/rules/log-analysis.mdc
@@ -0,0 +1,21 @@
---
description: Use ctrlb-decompose to analyze log files before reading them raw
globs: **/*.log
alwaysApply: false
---

# Log File Analysis

This project includes ctrlb-decompose, a log pattern extraction tool. When you encounter a `.log` file or are asked to analyze logs, run ctrlb-decompose first to get a structural summary:

```bash
./target/release/ctrlb-decompose <file> --llm --context 2 --quiet 2>/dev/null
```

This compresses raw log lines into patterns with typed variables, quantile stats (p50/p99), and anomaly detection. Errors and warnings are listed first.

For JSON output: `./target/release/ctrlb-decompose <file> --json --quiet 2>/dev/null`

For piped input: `kubectl logs pod | ./target/release/ctrlb-decompose --llm --source-label "pod-name"`

Workflow: decompose first, identify patterns of interest, then grep or use `--context N` with a higher N to see more examples.
35 changes: 35 additions & 0 deletions .cursorrules
@@ -0,0 +1,35 @@
This repository contains ctrlb-decompose, a CLI tool that compresses raw log lines into structural patterns with typed variables, quantile statistics, anomaly detection, and temporal correlations.

## Log Analysis

When working with log files (>100 lines), run ctrlb-decompose first to get a structural summary before reading raw logs:

```bash
./target/release/ctrlb-decompose <file> --llm --context 2 --quiet 2>/dev/null
```

This compresses thousands of log lines into a handful of actionable patterns with:
- Error/warning patterns prioritized first
- Typed variables (IPv4, UUID, Duration, HexID, Integer, Enum, etc.)
- Quantile statistics (p50, p99, min, max) for numeric variables
- Example raw lines for each pattern

Key flags:
- `--llm` — Token-efficient markdown output (default context=2, banner suppressed)
- `--json` — Structured JSON output for programmatic use
- `--source-label <name>` — Tag output with a source name (e.g., pod name)
- `--sim-threshold <0.0-1.0>` — Tune clustering granularity (default 0.5, lower = fewer patterns)
- `--top <N>` — Limit to top N patterns (default 20)
- `--context <N>` — Example lines per pattern

Works with stdin: `kubectl logs pod | ./target/release/ctrlb-decompose --llm`

Workflow: decompose first, identify patterns of interest, then drill into the raw file with grep or a higher `--context` value.

## Build

```bash
cargo build --release
cargo test --locked
cargo clippy
```
25 changes: 25 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,25 @@
This repository contains ctrlb-decompose, a CLI tool that compresses raw log lines into structural patterns with typed variables, quantile statistics, and anomaly detection.

## Log Analysis

When working with log files (>100 lines), run ctrlb-decompose first:

```bash
./target/release/ctrlb-decompose <file> --llm --context 2 --quiet 2>/dev/null
```

Key flags:
- `--llm` — Token-efficient markdown (errors first, quantile stats inline)
- `--json` — Structured JSON for programmatic use
- `--source-label <name>` — Tag output with source name
- `--sim-threshold <0.0-1.0>` — Tune clustering granularity (default 0.5)

Works with stdin: `kubectl logs pod | ./target/release/ctrlb-decompose --llm`

## Build & Test

```bash
cargo build --release
cargo test --locked
cargo clippy
```
6 changes: 5 additions & 1 deletion .gitignore
@@ -27,4 +27,8 @@ target
/logs

# WASM build output (regenerated by wasm-pack)
/web/pkg
/web/pkg

# Perf profiling data
perf.data
perf.data.old
53 changes: 53 additions & 0 deletions AGENTS.md
@@ -0,0 +1,53 @@
# AGENTS.md

This file provides guidance to Codex and other AI agents working with code in this repository.

## Project

ctrlb-decompose compresses raw log lines into structural patterns with typed variables, quantile statistics, anomaly detection, and temporal correlations. It runs as a CLI tool, a Rust library, or a WASM module.

## Build & Test

```bash
cargo build --release
cargo test --locked
cargo clippy
```

## Log Analysis

When working with log files (>100 lines), run ctrlb-decompose first to get a structural summary before reading raw logs:

```bash
./target/release/ctrlb-decompose <file> --llm --context 2 --quiet 2>/dev/null
```

Key flags:
- `--llm` — Token-efficient markdown output (banner auto-suppressed, default context=2)
- `--json` — Structured JSON for programmatic use
- `--source-label <name>` — Tag output with source name
- `--sim-threshold <0.0-1.0>` — Tune clustering (default 0.5, lower = fewer patterns)
- `--top <N>` — Top N patterns (default 20)
- `--context <N>` — Example lines per pattern

Works with stdin: `kubectl logs pod | ./target/release/ctrlb-decompose --llm`

Workflow: decompose first, identify patterns, then drill into raw logs with grep or higher `--context`.

## Architecture

Two-stage normalization + clustering pipeline (single-pass, streaming):

1. Timestamp extraction (`src/timestamp.rs`)
2. CLP encoding (`src/extraction/clp/`) — normalizes variables into typed placeholders
3. Drain3 clustering (`src/extraction/drain3.rs`) — tree-based prefix clustering with LRU eviction
4. Variable classification — semantic types: IPv4, UUID, Duration, HexID, Integer, Float, Enum, String
5. Statistics (`src/stats.rs`) — DDSketch quantiles, HyperLogLog cardinality, top-k, reservoir sampling
6. Anomaly detection (`src/anomaly.rs`) — frequency spikes, error cascades, bimodal distributions
7. Scoring & correlation (`src/scoring.rs`, `src/correlation.rs`)
8. Output formatting (`src/format/`) — human, llm, json
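
Step 4 (variable classification) can be illustrated with a minimal sketch. The `VarType` enum and the `classify` and `to_template` functions are hypothetical names chosen for illustration; they are not the crate's API, and the rules are much simpler than a real classifier would need.

```rust
use std::net::Ipv4Addr;

/// Hypothetical subset of the semantic types listed above.
#[derive(Debug, PartialEq)]
enum VarType {
    IPv4,
    Integer,
    Float,
    String,
}

/// Classify one token by trying the most specific parse first.
fn classify(token: &str) -> VarType {
    // Require a digit so that words like "nan" or "inf" are not read as floats.
    let has_digit = token.chars().any(|c| c.is_ascii_digit());
    if token.parse::<Ipv4Addr>().is_ok() {
        VarType::IPv4
    } else if token.parse::<i64>().is_ok() {
        VarType::Integer
    } else if has_digit && token.parse::<f64>().is_ok() {
        VarType::Float
    } else {
        VarType::String
    }
}

/// Replace every non-string token with a typed placeholder such as `<IPv4>`.
fn to_template(line: &str) -> String {
    line.split_whitespace()
        .map(|tok| match classify(tok) {
            VarType::String => tok.to_string(),
            other => format!("<{:?}>", other),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `to_template("conn from 10.0.0.7 took 0.25 s code 42")` yields `conn from <IPv4> took <Float> s code <Integer>`, which is the kind of template that the later statistics steps can attach per-variable data to.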

Entry points:
- CLI: `main.rs` -> `lib.rs::run(args)`
- Library: `lib.rs::process_log_text(input, opts)`
- WASM: `wasm.rs::analyze_logs(input, format, top_n, context_lines)`
93 changes: 93 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,93 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project

ctrlb-decompose compresses raw log lines into structural patterns with typed variables, quantile statistics, anomaly detection, and temporal correlations. It runs as a CLI tool, a Rust library, or a WASM module in the browser.

## Build & Test Commands

```bash
# Build
cargo build
cargo build --release

# Test
cargo test --locked
cargo test <test_name> # Run a single test

# Lint
cargo clippy

# Build without default features (library-only, no CLI)
cargo build --no-default-features

# WASM build
wasm-pack build --target web --out-dir web/pkg -- --no-default-features --features wasm
```

## Architecture

**Two-stage normalization + clustering pipeline** (single-pass, streaming):

1. **Timestamp extraction** (`src/timestamp.rs`) — regex-based, stripped before further processing
2. **CLP encoding** (`src/extraction/clp/`) — normalizes variables (ints, floats, IPs, hex) into typed placeholders
3. **Drain3 clustering** (`src/extraction/drain3.rs`) — tree-based prefix clustering on logtypes with LRU eviction
4. **Variable classification** (`src/extraction/drain3.rs`) — merges CLP-decoded values with Drain3 wildcards, classifies into semantic types (IPv4, UUID, Duration, HexID, Integer, Float, Enum, String, etc.)
5. **Statistics** (`src/stats.rs`) — DDSketch quantiles (~200 bytes/slot), HyperLogLog++ cardinality, top-k, temporal bucketing, reservoir-sampled examples
6. **Anomaly detection** (`src/anomaly.rs`) — frequency spikes, error cascades, bimodal distributions, low cardinality
7. **Scoring & correlation** (`src/scoring.rs`, `src/correlation.rs`) — keyword severity, Pearson temporal co-occurrence, shared variables
8. **Output formatting** (`src/format/`) — human (ANSI terminal), llm (compact markdown), json (structured)
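
The Pearson temporal co-occurrence mentioned in step 7 can be sketched as follows, assuming each pattern keeps a count per time bucket and that two patterns are compared over the same buckets. The `pearson` helper is hypothetical and is not taken from `src/correlation.rs`.

```rust
/// Pearson correlation between two equal-length series of per-bucket counts.
/// Returns `None` when the series differ in length, have fewer than two
/// buckets, or one of them has zero variance.
fn pearson(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.len() < 2 {
        return None;
    }
    let n = a.len() as f64;
    let mean_a = a.iter().sum::<f64>() / n;
    let mean_b = b.iter().sum::<f64>() / n;

    let mut cov = 0.0;
    let mut var_a = 0.0;
    let mut var_b = 0.0;
    for (x, y) in a.iter().zip(b) {
        let dx = x - mean_a;
        let dy = y - mean_b;
        cov += dx * dy;
        var_a += dx * dx;
        var_b += dy * dy;
    }
    if var_a == 0.0 || var_b == 0.0 {
        return None;
    }
    Some(cov / (var_a * var_b).sqrt())
}
```

A value near 1.0 means the two patterns tend to spike in the same buckets; a value near -1.0 means one rises when the other falls. A pattern with a flat count has no defined correlation, which is why the sketch returns `None` in that case.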

**Entry points:**
- CLI: `main.rs` → `lib.rs::run(args)`
- Library: `lib.rs::process_log_text(input, opts) -> AnalysisOutput`
- WASM: `wasm.rs::analyze_logs(input, format, top_n, context_lines) -> String`

## Feature Gates

- `cli` (default) — includes `clap` and `colored` for terminal use
- `wasm` — includes `wasm-bindgen` and `serde-wasm-bindgen` for browser use
- The core library is WASM-safe (no stdin/filesystem deps)
- Crate type is `["cdylib", "rlib"]` for dual WASM + library output

## Key Design Decisions

- **Single-pass streaming**: no second pass over data; all stats accumulated incrementally
- **Memory-bounded**: Drain3 LRU (default 10k clusters), DDSketch fixed-size quantiles, HyperLogLog++ fixed-size cardinality, reservoir sampling for examples
- **Lazy regex compilation**: `once_cell::sync::Lazy` for all regex patterns
- **Minimum Rust version**: 1.94.0
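
To show how reservoir sampling keeps example lines memory-bounded in a single pass, here is a minimal sketch of the classic Algorithm R. The `Reservoir` and `XorShift64` types are hypothetical; the small xorshift generator is only a stand-in to keep the sketch dependency-free, and the crate may well implement sampling differently.

```rust
/// Tiny xorshift64 generator, used only to avoid external dependencies.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Keeps at most `capacity` items sampled from a stream of unknown length.
struct Reservoir<T> {
    capacity: usize,
    seen: u64,
    items: Vec<T>,
    rng: XorShift64,
}

impl<T> Reservoir<T> {
    fn new(capacity: usize, seed: u64) -> Self {
        Reservoir {
            capacity,
            seen: 0,
            items: Vec::with_capacity(capacity),
            // xorshift must not start from a zero state.
            rng: XorShift64(seed.max(1)),
        }
    }

    /// Offer one item; memory use never grows beyond `capacity` items.
    fn offer(&mut self, item: T) {
        self.seen += 1;
        if self.items.len() < self.capacity {
            self.items.push(item);
        } else {
            // Pick a slot in 0..seen; the new item survives only if the slot
            // falls inside the reservoir (probability capacity / seen,
            // ignoring the small modulo bias of this toy generator).
            let j = self.rng.next_u64() % self.seen;
            if j < self.capacity as u64 {
                self.items[j as usize] = item;
            }
        }
    }
}
```

Because each line is offered once and either stored or dropped immediately, this fits the single-pass constraint: no line needs to be revisited to decide which examples to keep.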

## Testing

Tests are in `tests/integration.rs` using the fixture at `tests/fixtures/sample.log`. Snapshot testing uses `insta`. Benchmarks use `criterion` in `benches/pipeline.rs`.

## CI

- `ci.yml` — runs on push to main and PRs: `cargo test`, `clippy`, feature matrix (no-default-features, wasm target)
- `release.yml` — triggered by `v*.*.*` tags: cargo-dist multi-platform binaries + homebrew
- `wasm-deploy.yml` — deploys WASM build to GitHub Pages on push to main

## Using ctrlb-decompose for log analysis

When working with large log files (>100 lines), use ctrlb-decompose to get a structured summary before reading raw logs. This conserves the context window and surfaces errors and anomalies immediately.

```bash
# Analyze a log file (LLM-optimized output, 2 example lines per pattern)
./target/release/ctrlb-decompose <file> --llm --context 2 --quiet 2>/dev/null

# Pipe from any command
kubectl logs <pod> | ./target/release/ctrlb-decompose --llm --context 2 --quiet --source-label "pod-name" 2>/dev/null
journalctl -n 5000 | ./target/release/ctrlb-decompose --llm --context 2 --quiet 2>/dev/null

# JSON output for programmatic use
./target/release/ctrlb-decompose <file> --json --quiet 2>/dev/null

# Tune clustering granularity (lower = more aggressive merging, higher = more patterns)
./target/release/ctrlb-decompose <file> --llm --sim-threshold 0.6
```
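
To make the effect of `--sim-threshold` concrete, the sketch below uses the token-position similarity from the published Drain algorithm. That ctrlb-decompose computes similarity exactly this way is an assumption, and `token_similarity` and `joins_cluster` are hypothetical names.

```rust
/// Fraction of positions at which two token sequences agree.
/// Sequences of different length are treated as non-matching (0.0).
fn token_similarity(a: &[&str], b: &[&str]) -> f64 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// A line joins an existing pattern when its similarity to the pattern's
/// template reaches the threshold.
fn joins_cluster(template: &str, line: &str, sim_threshold: f64) -> bool {
    let t: Vec<&str> = template.split_whitespace().collect();
    let l: Vec<&str> = line.split_whitespace().collect();
    token_similarity(&t, &l) >= sim_threshold
}
```

With this model, `user alice logged in` and `user bob logged out` share 2 of 4 tokens (similarity 0.5). They merge into one pattern at the default threshold of 0.5 but stay separate at 0.6, which matches the rule that lower values merge more aggressively.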

In LLM mode, the banner is suppressed automatically. The `--quiet` flag suppresses the progress line on stderr.

**Workflow**: Run `--llm` first to identify patterns of interest, then use `--context N` with higher N or grep for specific patterns in the raw file.
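
If the first step of this workflow needs to be driven from Rust rather than a shell, a minimal sketch could build the same invocation with `std::process::Command`. The `decompose_command` helper is hypothetical; it uses only flags shown above and leaves stderr handling to the caller.

```rust
use std::process::Command;

/// Build the LLM-mode invocation shown above.
/// `binary` is the path to the built CLI, e.g. "./target/release/ctrlb-decompose".
fn decompose_command(binary: &str, log_file: &str, context: u32) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg(log_file)
        .arg("--llm")
        .arg("--context")
        .arg(context.to_string())
        .arg("--quiet");
    cmd
}
```

Calling `.output()` on the returned command runs it and captures stdout, so a second call with a larger `context` value covers the drill-down step without rebuilding the argument list by hand.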