perf: low-latency optimizations for hot paths and memory allocation by dmitriimaksimovdevelop · Pull Request #21 · dmitriimaksimovdevelop/melisai

dmitriimaksimovdevelop · 2026-04-07T07:32:33Z

Summary

Pre-compile regex patterns at package level instead of per-call recompilation in parsers (~100μs saved per histogram parse × 67 tools)
Pre-allocate slices with capacity hints across all parsers (eliminate ~670K unnecessary copies per collection)
Replace fmt.Sprintf("%v") with type-switch formatKey() in aggregation (20ns vs ~500ns per key, zero allocs for common types)
Singleton defaultThresholds — avoid 37 closure + struct allocations on every DetectAnomalies() call (0.3ns access vs ~100ns)
sync.Pool for bytes.Buffer reuse across 67+ BCC tool executions
Streaming json.NewDecoder in diff.LoadReport() — halves peak memory for large reports
Single-pass category scan in AI prompt generation (3 separate loops → 1)
Manual binary.LittleEndian parsing in eBPF tcpretrans collector (avoid reflection-heavy binary.Read)
--pprof flag for CPU profiling melisai itself (go tool pprof melisai_cpu.prof)
11 benchmark tests for parsers, aggregation, and anomaly detection

Benchmark Results (Apple M2 Pro)

Benchmark	ns/op	allocs/op	B/op
ParseHistogram	6,524	28	4,583
ParseTabularEvents (1000)	296,086	8,008	577K
ParseFoldedStacks (500)	16,736	5	28K
AggregateByField (1000)	35,111	19	8,344
DetectAnomalies (37 rules)	17,005	188	10,710
DefaultThresholds access	0.3	0	0
FormatKey (type switch)	21	0	3

Test plan

All 11 packages pass go test ./...
All 11 benchmarks pass go test -bench=. -benchmem
No API changes — fully backward compatible
Run on target Linux system with melisai collect --pprof melisai_cpu.prof to verify real-world profile

🤖 Generated with Claude Code

Apply HFT-inspired low-latency best practices to reduce observer effect and improve melisai's own performance during system profiling. Hot path optimizations: - Pre-compile regex patterns at package level (parsers.go) - Pre-allocate slices with capacity hints in all parsers - Pre-lowercase headers once instead of per-event in ParseTabularEvents - Replace fmt.Sprintf("%v") with type-switch formatKey() in aggregation - Make DefaultThresholds a package-level singleton (avoid 37 closure allocs) Memory/IO optimizations: - Add sync.Pool for bytes.Buffer reuse across 67+ BCC tool executions - Switch diff.LoadReport to streaming json.NewDecoder (halve peak memory) - Single-pass category scan in AI prompt generation (3 loops → 1) - Manual binary.LittleEndian parsing in eBPF tcpretrans (avoid reflection) Observability: - Add --pprof flag for CPU profiling of melisai itself - Add 11 benchmark tests for parsers, aggregation, and anomaly detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- NATIVE_EBPF_MIGRATION.md — full plan with phases, patterns, validation - PROMPT_NATIVE_EBPF.md — reusable prompt template for AI-assisted porting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-23-plus-perf release: issue #23 tcpretrans fix + low-latency perf (PR #21)

dmitriimaksimovdevelop and others added 2 commits April 7, 2026 10:32

docs: add native eBPF migration plan and AI prompt template (#22)

4df23cb

- NATIVE_EBPF_MIGRATION.md — full plan with phases, patterns, validation - PROMPT_NATIVE_EBPF.md — reusable prompt template for AI-assisted porting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dmitriimaksimovdevelop mentioned this pull request Apr 15, 2026

release: issue #23 tcpretrans fix + low-latency perf (PR #21) #24

Merged

6 tasks

dmitriimaksimovdevelop added a commit that referenced this pull request Apr 15, 2026

Merge pull request #24 from dmitriimaksimovdevelop/release-test/issue…

19aa128

…-23-plus-perf release: issue #23 tcpretrans fix + low-latency perf (PR #21)

dmitriimaksimovdevelop merged commit 9b10ed1 into master Apr 15, 2026
1 check passed

dmitriimaksimovdevelop deleted the perf/low-latency-optimizations branch April 15, 2026 11:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: low-latency optimizations for hot paths and memory allocation#21

perf: low-latency optimizations for hot paths and memory allocation#21
dmitriimaksimovdevelop merged 2 commits into
masterfrom
perf/low-latency-optimizations

dmitriimaksimovdevelop commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dmitriimaksimovdevelop commented Apr 7, 2026

Summary

Benchmark Results (Apple M2 Pro)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant