bench(rust): add criterion.rs benchmarks for evaluation, operators, and state management by aepfli · Pull Request #66 · open-feature-forking/flagd-evaluator

aepfli · 2026-02-10T10:54:11Z

Summary

- Add criterion.rs as a dev-dependency, with HTML report generation enabled
- Create 3 benchmark suites covering the core Rust evaluation engine
- All benchmarks compile and run via `cargo bench`

Benchmark Suites

`benches/evaluation.rs` — Flag Evaluation (8 benchmarks)

| Benchmark | What it measures |
|---|---|
| `evaluate_flag_simple` | Boolean flag, no targeting |
| `evaluate_flag_targeting_match` | Targeting rule that matches |
| `evaluate_flag_targeting_no_match` | Targeting rule that does not match (falls back to the default variant) |
| `evaluate_flag_complex_targeting` | Nested `and`/`or` with multiple conditions |
| `evaluate_flag_disabled` | Disabled flag (early return) |
| `evaluate_flag_not_found` | Missing flag key (error path) |
| `evaluate_logic_simple` | Direct JSON Logic `{"==": [1, 1]}` |
| `evaluate_logic_complex` | Complex rule with custom operators |

`benches/operators.rs` — Custom Operators (8 benchmarks)

| Benchmark | What it measures |
|---|---|
| `fractional/buckets/2` | A/B test bucketing |
| `fractional/buckets/4` | 4-way bucketing |
| `fractional/buckets/8` | 8-way bucketing |
| `semver_equals` | Version equality |
| `semver_range/caret` | Caret range matching |
| `semver_range/tilde` | Tilde range matching |
| `semver_range/gte_prerelease` | Greater-than-or-equal with prerelease |
| `starts_with` / `ends_with` | String prefix/suffix matching |
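As a rough illustration of what the `fractional/buckets/N` benchmarks exercise, here is a minimal weighted-bucketing sketch. It assumes the bucketing key is hashed and the hash is mapped onto cumulative variant weights; `fractional` is a hypothetical name, and `DefaultHasher` is a stand-in, not necessarily the hash flagd-evaluator uses, so its bucket assignments will not match the real operator.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical weighted bucketing: hash the bucketing key, map the hash onto
/// [0, total weight), and pick the bucket whose cumulative weight range contains it.
/// `DefaultHasher` is only a stand-in; its output is not guaranteed to be stable
/// across Rust releases, so it is unsuitable for real, persistent bucketing.
fn fractional<'a>(bucketing_key: &str, buckets: &[(&'a str, u32)]) -> Option<&'a str> {
    let total: u32 = buckets.iter().map(|&(_, weight)| weight).sum();
    if total == 0 {
        return None;
    }
    let mut hasher = DefaultHasher::new();
    bucketing_key.hash(&mut hasher);
    let point = (hasher.finish() % u64::from(total)) as u32;

    // Linear walk over cumulative weights: cost grows with the number of buckets.
    let mut cumulative: u32 = 0;
    for &(variant, weight) in buckets {
        cumulative += weight;
        if point < cumulative {
            return Some(variant);
        }
    }
    None // not reached when total > 0
}

fn main() {
    let ab: [(&str, u32); 2] = [("control", 50), ("treatment", 50)];
    let four: [(&str, u32); 4] = [("a", 25), ("b", 25), ("c", 25), ("d", 25)];
    for user in ["user-1", "user-2", "user-3"] {
        // The same key always lands in the same bucket within a given build.
        println!("{}: {:?} {:?}", user, fractional(user, &ab), fractional(user, &four));
    }
}
```

In this sketch the per-call cost is one hash plus a walk that is linear in the bucket count.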

`benches/state.rs` — State Management (5 benchmarks)

| Benchmark | What it measures |
|---|---|
| `update_state/flags/5` | Small config load |
| `update_state/flags/50` | Medium config load |
| `update_state/flags/200` | Large config load |
| `update_state_no_change` | Re-apply an identical config |
| `update_state_incremental` | Change 1 flag in a 100-flag config |
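The state scenarios differ mainly in how much change detection has to report. A minimal sketch, assuming a hypothetical `FlagStore` that keeps each flag as a serialized string and whose `update_state` returns the keys that were added, modified, or removed (the real method's signature and return type may differ; `config` is likewise a hypothetical helper), shows why re-applying an identical config reports nothing while the incremental case reports a single key:

```rust
use std::collections::HashMap;

/// Hypothetical store: flag key -> serialized definition (stand-in for a parsed flag).
#[derive(Default)]
struct FlagStore {
    flags: HashMap<String, String>,
}

impl FlagStore {
    /// Replace the whole config; return the sorted keys that were added, modified, or removed.
    fn update_state(&mut self, new_flags: HashMap<String, String>) -> Vec<String> {
        // Added or modified: key missing before, or definition differs.
        let mut changed: Vec<String> = new_flags
            .iter()
            .filter(|(key, def)| self.flags.get(*key) != Some(*def))
            .map(|(key, _)| key.clone())
            .collect();
        // Removed: present before, absent now.
        changed.extend(
            self.flags
                .keys()
                .filter(|key| !new_flags.contains_key(*key))
                .cloned(),
        );
        changed.sort();
        self.flags = new_flags;
        changed
    }
}

/// Hypothetical helper: `n` flags, optionally with one modified definition.
fn config(n: usize, modified: Option<usize>) -> HashMap<String, String> {
    (0..n)
        .map(|i| {
            let def = if modified == Some(i) { "v2" } else { "v1" };
            (format!("flag-{}", i), def.to_string())
        })
        .collect()
}

fn main() {
    let mut store = FlagStore::default();
    println!("initial load: {}", store.update_state(config(100, None)).len()); // 100
    println!("re-apply: {}", store.update_state(config(100, None)).len()); // 0
    println!("incremental: {:?}", store.update_state(config(100, Some(7)))); // ["flag-7"]
}
```

Even in the no-change case this sketch still compares every definition, so its cost scales with config size rather than with the number of changes.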

How to run

# Run all benchmarks
cargo bench

# Run specific suite
cargo bench --bench evaluation
cargo bench --bench operators
cargo bench --bench state

# Quick run (fewer iterations)
cargo bench -- --quick

# HTML reports generated in target/criterion/
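Criterion handles warm-up, statistical sampling, outlier analysis, and the HTML reports. At its core, though, each measurement is a timed loop around a call whose inputs and result pass through `black_box` so the optimizer cannot elide the work. The std-only sketch below illustrates only that core idea; `mean_time_per_call` is a hypothetical helper, not part of criterion or this repository, and it is no substitute for `cargo bench`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Time `iters` calls of `f` after a short warm-up and return the mean duration per call.
fn mean_time_per_call<T, F: FnMut() -> T>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    // Warm-up: roughly 10% extra calls whose timing is discarded.
    for _ in 0..iters / 10 {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // black_box stops the optimizer from treating the result as unused.
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    // Inputs go through black_box too, so the comparison cannot be constant-folded.
    let per_call = mean_time_per_call(1_000_000, || {
        black_box("checkout-v2").starts_with(black_box("checkout"))
    });
    println!("starts_with: {:?} per call", per_call);
}
```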

Test plan

- All 3 benchmark suites compile
- `cargo bench` runs successfully
- `cargo fmt` and `cargo clippy` pass
- Review HTML reports in `target/criterion/`

Closes #63

🤖 Generated with Claude Code

…nd state management

Add comprehensive criterion.rs benchmarks to measure raw Rust evaluation performance, providing a baseline for optimization work.

Benchmark suites:
- evaluation: flag resolution (static, targeting match/no-match, complex targeting, disabled, not found) and direct JSON Logic evaluation
- operators: fractional bucketing (2/4/8 buckets), semver comparison (equals, caret, tilde, gte with prerelease), starts_with, ends_with
- state: update_state at varying sizes (5/50/200 flags), no-change detection, and incremental single-flag updates

Closes #63

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…hmarks

Align Rust criterion benchmarks with the standardized BENCHMARKS.md matrix:
- evaluation.rs: Add E2/E3/E5/E7 context size variation benchmarks (small 5-attr and large 100+ attr contexts)
- concurrency.rs: Add C1-C6 multi-threaded benchmarks (1/4/8 threads, targeting, mixed workload, read/write contention)
- comparison.rs: Add X1-X2 DataLogic vs FlagEvaluator overhead benchmarks
- Cargo.toml: Register new bench targets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
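The C1-C6 scenarios measure evaluation while threads share flag state, including read/write contention during config updates. A std-only sketch of that shape, assuming the shared state sits behind an `Arc<RwLock<HashMap<..>>>` (the locking strategy in flagd-evaluator may differ, and `run_contention` and `SharedFlags` are hypothetical names), with the actual timing left to criterion:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical shared flag state: readers evaluate, one writer swaps the config.
type SharedFlags = Arc<RwLock<HashMap<String, bool>>>;

/// Spawn `readers` threads that each do `reads` lookups under a read lock, plus one
/// writer that replaces the config `writes` times under a write lock.
/// Returns the total number of successful lookups.
fn run_contention(readers: usize, reads: usize, writes: usize) -> usize {
    let flags: SharedFlags = Arc::new(RwLock::new(HashMap::from([("new-ui".to_string(), true)])));

    let writer = {
        let flags = Arc::clone(&flags);
        thread::spawn(move || {
            for i in 0..writes {
                let mut guard = flags.write().unwrap();
                // Replace the whole map, as a config update would.
                *guard = HashMap::from([("new-ui".to_string(), i % 2 == 0)]);
            }
        })
    };

    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let flags = Arc::clone(&flags);
            thread::spawn(move || {
                let mut hits: usize = 0;
                for _ in 0..reads {
                    if flags.read().unwrap().get("new-ui").is_some() {
                        hits += 1;
                    }
                }
                hits
            })
        })
        .collect();

    writer.join().unwrap();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    for threads in [1, 4, 8] {
        // Every config version keeps the key, so this always equals threads * reads.
        let hits = run_contention(threads, 10_000, 100);
        println!("{} reader thread(s): {} lookups", threads, hits);
    }
}
```

A read-write lock lets concurrent readers proceed together and only serializes them against writers, which is the kind of contention the mixed and read/write scenarios are meant to expose.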

…#67)

## Summary
- Adds `BENCHMARKS.md` defining a consistent benchmark specification across Rust, Java, and Python
- Covers evaluation scenarios (E1-E11), custom operators (O1-O6), state management (S1-S5), concurrency (C1-C6), and old-vs-new comparison (X1-X3)
- Standardizes context shapes and flag definitions for direct cross-language comparison

## Context
The benchmark PRs (#64, #65, #66) implement subsets of this matrix. This document serves as the reference spec so all three languages converge on the same scenarios and can be compared meaningfully.

## Test plan
- [ ] Review benchmark IDs and scenarios for completeness
- [ ] Verify context/flag definitions match what implementations use

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

aepfli mentioned this pull request Feb 10, 2026

docs: add standardized benchmark matrix for cross-language comparison #67

Merged


aepfli merged commit 8a44c79 into main Feb 10, 2026
13 checks passed


aepfli merged 2 commits into `main` from `bench/rust-criterion`
