bench(rust): add criterion.rs benchmarks for evaluation, operators, and state management#66

Merged: aepfli merged 2 commits into main from bench/rust-criterion on Feb 10, 2026

aepfli (Contributor) commented Feb 10, 2026

Summary

  • Add criterion.rs as a dev dependency with HTML report generation
  • Create 3 benchmark suites covering the core Rust evaluation engine
  • All benchmarks compile and run via cargo bench

Benchmark Suites

benches/evaluation.rs — Flag Evaluation (8 benchmarks)

| Benchmark | What it measures |
|---|---|
| `evaluate_flag_simple` | Boolean flag, no targeting |
| `evaluate_flag_targeting_match` | Targeting rule that matches |
| `evaluate_flag_targeting_no_match` | Targeting rule, default path |
| `evaluate_flag_complex_targeting` | Nested and/or with multiple conditions |
| `evaluate_flag_disabled` | Disabled flag (early return) |
| `evaluate_flag_not_found` | Missing flag key (error path) |
| `evaluate_logic_simple` | Direct JSON Logic `{"==": [1, 1]}` |
| `evaluate_logic_complex` | Complex rule with custom operators |
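
For readers unfamiliar with the code paths named in the table, here is a minimal sketch of how they differ. It assumes a much simpler model than the real engine: `Flag`, `Rule`, `Outcome`, `evaluate`, and `beta_rule` are hypothetical names invented for illustration, and targeting is a plain function pointer rather than JSON Logic.

```rust
use std::collections::HashMap;

/// Hypothetical targeting rule: returns a variant name when the context matches.
type Rule = fn(&HashMap<&str, &str>) -> Option<&'static str>;

/// Hypothetical flag definition (not the project's actual type).
struct Flag {
    enabled: bool,
    default_variant: &'static str,
    targeting: Option<Rule>,
}

/// One variant per code path exercised by the evaluation benchmarks.
#[derive(Debug, PartialEq)]
enum Outcome {
    Static(&'static str),         // no targeting configured
    TargetingMatch(&'static str), // rule matched
    Default(&'static str),        // rule did not match, fall back to default
    Disabled,                     // early return
    NotFound,                     // error path
}

fn evaluate(flags: &HashMap<&str, Flag>, key: &str, ctx: &HashMap<&str, &str>) -> Outcome {
    let flag = match flags.get(key) {
        Some(flag) => flag,
        None => return Outcome::NotFound,
    };
    if !flag.enabled {
        return Outcome::Disabled;
    }
    match flag.targeting {
        None => Outcome::Static(flag.default_variant),
        Some(rule) => match rule(ctx) {
            Some(variant) => Outcome::TargetingMatch(variant),
            None => Outcome::Default(flag.default_variant),
        },
    }
}

/// Sample rule for illustration: serve "on" to contexts whose "tier" is "beta".
fn beta_rule(ctx: &HashMap<&str, &str>) -> Option<&'static str> {
    if ctx.get("tier").copied() == Some("beta") {
        Some("on")
    } else {
        None
    }
}
```

The disabled and not-found paths return before any rule runs, which is why they are useful as lower bounds when reading the other numbers.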

benches/operators.rs — Custom Operators (8 benchmarks)

| Benchmark | What it measures |
|---|---|
| `fractional/buckets/2` | A/B test bucketing |
| `fractional/buckets/4` | 4-way bucketing |
| `fractional/buckets/8` | 8-way bucketing |
| `semver_equals` | Version equality |
| `semver_range/caret` | Caret range matching |
| `semver_range/tilde` | Tilde range matching |
| `semver_range/gte_prerelease` | Greater-than-or-equal with prerelease |
| `starts_with` / `ends_with` | String prefix/suffix matching |
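
As a rough illustration of what the `fractional/buckets/N` benchmarks exercise, the sketch below maps a targeting key to one of N weighted buckets. It is an assumption-laden stand-in: `bucket_for` is a hypothetical helper, and it hashes with the standard library's `DefaultHasher`, which is probably not the hash the real operator uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick a bucket index in `0..weights.len()`, proportionally to the weights.
/// Returns `None` when all weights are zero (or the slice is empty).
/// Assumption: DefaultHasher stands in for whatever hash the operator uses.
fn bucket_for(flag_key: &str, targeting_key: &str, weights: &[u32]) -> Option<usize> {
    let total: u64 = weights.iter().map(|&w| u64::from(w)).sum();
    if total == 0 {
        return None;
    }

    // Hash flag key and targeting key together so that the same user can
    // land in different buckets for different flags.
    let mut hasher = DefaultHasher::new();
    flag_key.hash(&mut hasher);
    targeting_key.hash(&mut hasher);
    let point = hasher.finish() % total;

    // Walk the cumulative weights until the hashed point falls inside a bucket.
    let mut upper = 0u64;
    for (index, &weight) in weights.iter().enumerate() {
        upper += u64::from(weight);
        if point < upper {
            return Some(index);
        }
    }
    None // not reached: point < total
}
```

The work per call is one hash plus a linear scan over the bucket weights, so the scan is the only part of this sketch that grows as the bucket count goes from 2 to 8.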

benches/state.rs — State Management (5 benchmarks)

| Benchmark | What it measures |
|---|---|
| `update_state/flags/5` | Small config load |
| `update_state/flags/50` | Medium config load |
| `update_state/flags/200` | Large config load |
| `update_state_no_change` | Re-apply identical config |
| `update_state_incremental` | Change 1 flag in 100-flag config |

How to run

# Run all benchmarks
cargo bench

# Run specific suite
cargo bench --bench evaluation
cargo bench --bench operators
cargo bench --bench state

# Quick run (fewer iterations)
cargo bench -- --quick

# HTML reports generated in target/criterion/

Test plan

  • All 3 benchmark suites compile
  • cargo bench runs successfully
  • cargo fmt and cargo clippy pass
  • Review HTML reports in target/criterion/

Closes #63

🤖 Generated with Claude Code

…nd state management

Add comprehensive criterion.rs benchmarks to measure raw Rust evaluation
performance, providing a baseline for optimization work.

Benchmark suites:
- evaluation: flag resolution (static, targeting match/no-match, complex
  targeting, disabled, not found) and direct JSON Logic evaluation
- operators: fractional bucketing (2/4/8 buckets), semver comparison
  (equals, caret, tilde, gte with prerelease), starts_with, ends_with
- state: update_state at varying sizes (5/50/200 flags), no-change
  detection, and incremental single-flag updates
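
For reference, caret and tilde matching can be sketched as below. This is only an illustration under stated assumptions: it follows the Cargo/npm convention for `^` and `~` ranges, which may differ from what the operator under test implements, it ignores pre-release and build metadata, and `parse`, `matches_caret`, and `matches_tilde` are hypothetical helper names.

```rust
/// Parse "MAJOR.MINOR.PATCH" into a tuple. Pre-release and build metadata
/// are not handled in this sketch and cause a `None`.
fn parse(version: &str) -> Option<(u64, u64, u64)> {
    let mut parts = version.splitn(3, '.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    Some((major, minor, patch))
}

/// Caret range (Cargo/npm convention): changes are allowed as long as the
/// left-most non-zero component stays the same.
fn matches_caret(version: &str, base: &str) -> Option<bool> {
    let (v, b) = (parse(version)?, parse(base)?);
    let upper = match b {
        (0, 0, patch) => (0, 0, patch + 1),
        (0, minor, _) => (0, minor + 1, 0),
        (major, _, _) => (major + 1, 0, 0),
    };
    // Tuples compare lexicographically, which matches version ordering here.
    Some(v >= b && v < upper)
}

/// Tilde range (Cargo/npm convention): only patch-level changes are allowed.
fn matches_tilde(version: &str, base: &str) -> Option<bool> {
    let (v, b) = (parse(version)?, parse(base)?);
    Some(v >= b && v < (b.0, b.1 + 1, 0))
}
```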

Closes #63

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hmarks

Align Rust criterion benchmarks with the standardized BENCHMARKS.md matrix:

- evaluation.rs: Add E2/E3/E5/E7 context size variation benchmarks
  (small 5-attr and large 100+ attr contexts)
- concurrency.rs: Add C1-C6 multi-threaded benchmarks (1/4/8 threads,
  targeting, mixed workload, read/write contention)
- comparison.rs: Add X1-X2 DataLogic vs FlagEvaluator overhead benchmarks
- Cargo.toml: Register new bench targets
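
The read/write contention scenario can be pictured with a small std-only sketch: several reader threads look up a flag while one writer keeps updating it. This is not the code in concurrency.rs. `contended_run` is a hypothetical helper, and guarding the state with an `RwLock` is an assumption about how shared state might be protected.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Spawn `readers` threads that each perform `reads_each` lookups while a
/// single writer applies `writes` updates to the same map.
/// Returns (number of successful reads, final stored value).
fn contended_run(readers: usize, reads_each: usize, writes: u64) -> (usize, u64) {
    let state = Arc::new(RwLock::new(HashMap::from([("flag", 0u64)])));

    // Writer: takes the exclusive lock once per update.
    let writer = {
        let state = Arc::clone(&state);
        thread::spawn(move || {
            for value in 1..=writes {
                state.write().unwrap().insert("flag", value);
            }
        })
    };

    // Readers: take the shared lock once per lookup.
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                (0..reads_each)
                    .filter(|_| state.read().unwrap().contains_key("flag"))
                    .count()
            })
        })
        .collect();

    let total_reads: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    writer.join().unwrap();
    let last = *state.read().unwrap().get("flag").unwrap();
    (total_reads, last)
}
```

Calling it with 1, 4, and 8 readers mirrors the thread counts listed above. Timing those calls would show how much the writer's exclusive lock slows readers down in this simplified model.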

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aepfli added a commit that referenced this pull request Feb 10, 2026
…#67)

## Summary
- Adds `BENCHMARKS.md` defining a consistent benchmark specification
across Rust, Java, and Python
- Covers evaluation scenarios (E1-E11), custom operators (O1-O6), state
management (S1-S5), concurrency (C1-C6), and old-vs-new comparison
(X1-X3)
- Standardizes context shapes and flag definitions for direct
cross-language comparison

## Context
The benchmark PRs (#64, #65, #66) implement subsets of this matrix. This
document serves as the reference spec so all three languages converge on
the same scenarios and can be compared meaningfully.

## Test plan
- [ ] Review benchmark IDs and scenarios for completeness
- [ ] Verify context/flag definitions match what implementations use

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@aepfli aepfli merged commit 8a44c79 into main Feb 10, 2026
13 checks passed