bench(python): add pytest-benchmark suite with evaluation and operator benchmarks by aepfli · Pull Request #65 · open-feature-forking/flagd-evaluator

aepfli · 2026-02-10T10:53:59Z

Summary

Replace manual time.time() benchmark with proper pytest-benchmark suite
Add 19 benchmark scenarios covering evaluation, custom operators, state management, and concurrency
Add optional comparison against pure-Python JSON Logic library
Add pytest-benchmark>=4.0 to dev dependencies

Benchmark Scenarios

| Category | Benchmarks | What they measure |
| --- | --- | --- |
| Evaluation | bool simple, targeting match/no-match, string, int, float, object, large context | Flag evaluation across types and context sizes |
| Custom Operators | fractional, semver, starts_with, ends_with | Individual operator performance |
| State Management | 5/50/200 flags, no-change re-apply | `update_state()` scaling |
| Concurrent | 4-thread evaluation | Thread safety and contention |
| Comparison | vs pure-Python json-logic (optional) | Native PyO3 vs alternative |

How to run

cd python
uv sync --group dev
maturin develop
pytest benchmarks/ --benchmark-only --benchmark-disable-gc -v

Initial Results

Boolean simple: ~714ns/call (~1.4M ops/sec)
Targeting match: ~1.7us/call (~594K ops/sec)
Custom operators: ~1.3-4.6us/call
State updates: ~24us (5 flags) to ~1ms (200 flags)
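
The figures above relate through simple arithmetic, sketched here. The inputs are the rounded values reported in this section; the helper name is illustrative.

```python
NS = 1e-9  # seconds per nanosecond
US = 1e-6  # seconds per microsecond


def ops_per_second(seconds_per_call):
    """Throughput is the reciprocal of the mean time per call."""
    return 1.0 / seconds_per_call


bool_simple = ops_per_second(714 * NS)
print(f"{bool_simple / 1e6:.1f}M ops/sec")  # 1.4M ops/sec

# Per-flag cost of a state update at the two reported scales.
per_flag_small = 24 * US / 5      # 5 flags in ~24us
per_flag_large = 1000 * US / 200  # 200 flags in ~1ms
print(f"{per_flag_small / US:.1f}us vs {per_flag_large / US:.1f}us per flag")
```

The per-flag cost stays near 5us at both scales, which suggests state updates grow roughly linearly with flag count. The rounded 1.7us targeting figure gives about 588K ops/sec, so the reported 594K was presumably computed from the unrounded timing.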

Test plan

All 18 benchmarks pass (1 skipped - optional comparison library)
maturin develop + pytest benchmarks/ succeeds
Verify results are stable across runs
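
One way to check the last item is to compare mean timings from two runs and flag benchmarks whose relative drift exceeds a tolerance. The sketch below assumes the means have already been collected into plain dictionaries. The benchmark names, the 10% tolerance, and the second run's numbers are made up for illustration.

```python
def relative_drift(baseline, candidate):
    """Return {name: (candidate - baseline) / baseline} for benchmarks in both runs."""
    shared = baseline.keys() & candidate.keys()
    return {name: (candidate[name] - baseline[name]) / baseline[name] for name in shared}


def unstable(baseline, candidate, tolerance=0.10):
    """Names whose mean time moved by more than `tolerance` between runs."""
    drift = relative_drift(baseline, candidate)
    return sorted(name for name, change in drift.items() if abs(change) > tolerance)


# First run: rounded means from "Initial Results". Second run: hypothetical values.
run_a = {"bool_simple": 714e-9, "targeting_match": 1.7e-6}
run_b = {"bool_simple": 730e-9, "targeting_match": 2.1e-6}

print(unstable(run_a, run_b))  # ['targeting_match']
```

pytest-benchmark can also store and compare runs itself through its `--benchmark-autosave` and `--benchmark-compare` options, which may be simpler than a custom script.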

Closes #62

🤖 Generated with Claude Code

…r benchmarks

Replace the manual time.time() benchmark with a proper pytest-benchmark suite covering flag evaluation (all types), custom operators (fractional, semver, starts_with, ends_with), state management at different scales, and concurrent evaluation. Includes optional comparison against pure-Python JSON Logic.

Closes #62

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


…#67)

## Summary

- Adds `BENCHMARKS.md` defining a consistent benchmark specification across Rust, Java, and Python
- Covers evaluation scenarios (E1-E11), custom operators (O1-O6), state management (S1-S5), concurrency (C1-C6), and old-vs-new comparison (X1-X3)
- Standardizes context shapes and flag definitions for direct cross-language comparison

## Context

The benchmark PRs (#64, #65, #66) implement subsets of this matrix. This document serves as the reference spec so all three languages converge on the same scenarios and can be compared meaningfully.

## Test plan

- [ ] Review benchmark IDs and scenarios for completeness
- [ ] Verify context/flag definitions match what implementations use

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>


aepfli mentioned this pull request Feb 10, 2026: docs: add standardized benchmark matrix for cross-language comparison #67 (Merged)

bench(python): add context size variations, error paths, and enhanced…

6028fb3

… comparisons Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(bench): use panzi-json-logic for Python comparison benchmarks

5e478a3

Replace non-existent json_logic_utils with panzi-json-logic, which is the actual JSON Logic library used by the flagd Python provider (openfeature-provider-flagd). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

aepfli merged commit 3e059c9 into main Feb 10, 2026
13 checks passed

aepfli merged 3 commits into main from bench/python-pytest
