bench(python): add pytest-benchmark suite with evaluation and operator benchmarks #65

Merged: aepfli merged 3 commits into main from bench/python-pytest on Feb 10, 2026
@aepfli commented on Feb 10, 2026

Summary

  • Replace manual time.time() benchmark with proper pytest-benchmark suite
  • Add 19 benchmark scenarios covering evaluation, custom operators, state management, and concurrency
  • Add optional comparison against pure-Python JSON Logic library
  • Add pytest-benchmark>=4.0 to dev dependencies

Benchmark Scenarios

| Category | Benchmarks | What they measure |
| --- | --- | --- |
| Evaluation | bool simple, targeting match/no-match, string, int, float, object, large context | Flag evaluation across types and context sizes |
| Custom Operators | fractional, semver, starts_with, ends_with | Individual operator performance |
| State Management | 5/50/200 flags, no-change re-apply | update_state() scaling |
| Concurrent | 4-thread evaluation | Thread safety and contention |
| Comparison | vs pure-Python json-logic (optional) | Native PyO3 vs alternative |

How to run

cd python
uv sync --group dev
maturin develop
pytest benchmarks/ --benchmark-only --benchmark-disable-gc -v

Initial Results

  • Boolean simple: ~714ns/call (~1.4M ops/sec)
  • Targeting match: ~1.7us/call (~594K ops/sec)
  • Custom operators: ~1.3-4.6us/call
  • State updates: ~24us (5 flags) to ~1ms (200 flags)
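
The per-call and ops/sec figures above are reciprocals of each other. A small sketch of the conversion, using only the numbers quoted in the list (the helper names are illustrative):

```python
def ops_per_sec(seconds_per_call):
    """Throughput implied by a mean per-call latency."""
    return 1.0 / seconds_per_call


def seconds_per_call(ops):
    """Mean per-call latency implied by a throughput."""
    return 1.0 / ops


bool_simple = ops_per_sec(714e-9)       # ~714ns/call
targeting = seconds_per_call(594_000)   # ~594K ops/sec

print(f"{bool_simple / 1e6:.1f}M ops/sec, {targeting * 1e6:.2f}us/call")
# prints "1.4M ops/sec, 1.68us/call"
```

The 1.68us result is consistent with the rounded "~1.7us/call" reported for the targeting match.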

Test plan

  • 18 of the 19 benchmarks pass; 1 is skipped because the optional comparison library is not installed
  • maturin develop + pytest benchmarks/ succeeds
  • Verify results are stable across runs
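
One possible way to approach the last item is to compare the mean timing of the same benchmark across several runs and check the relative spread. This is only a sketch: the 5% tolerance and the sample timings are made up for illustration and are not project settings or measured results.

```python
from statistics import mean, stdev


def relative_spread(run_means):
    """Coefficient of variation (stdev / mean) of per-run mean timings."""
    return stdev(run_means) / mean(run_means)


def is_stable(run_means, tolerance=0.05):
    # tolerance is an arbitrary illustrative threshold.
    return relative_spread(run_means) <= tolerance


# Made-up per-run means in nanoseconds, for illustration only.
steady = [710.0, 714.0, 718.0]
noisy = [600.0, 714.0, 900.0]

print(is_stable(steady), is_stable(noisy))
# prints "True False"
```

pytest-benchmark can also persist and compare runs itself through its `--benchmark-autosave` and `--benchmark-compare` options, which may be a simpler route than a hand-rolled check.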

Closes #62

🤖 Generated with Claude Code

…r benchmarks

Replace the manual time.time() benchmark with a proper pytest-benchmark suite
covering flag evaluation (all types), custom operators (fractional, semver,
starts_with, ends_with), state management at different scales, and concurrent
evaluation. Includes optional comparison against pure-Python JSON Logic.

Closes #62

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… comparisons

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aepfli added a commit that referenced this pull request Feb 10, 2026
…#67)

## Summary
- Adds `BENCHMARKS.md` defining a consistent benchmark specification
across Rust, Java, and Python
- Covers evaluation scenarios (E1-E11), custom operators (O1-O6), state
management (S1-S5), concurrency (C1-C6), and old-vs-new comparison
(X1-X3)
- Standardizes context shapes and flag definitions for direct
cross-language comparison

## Context
The benchmark PRs (#64, #65, #66) implement subsets of this matrix. This
document serves as the reference spec so all three languages converge on
the same scenarios and can be compared meaningfully.

## Test plan
- [ ] Review benchmark IDs and scenarios for completeness
- [ ] Verify context/flag definitions match what implementations use

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Replace non-existent json_logic_utils with panzi-json-logic, which is
the actual JSON Logic library used by the flagd Python provider
(openfeature-provider-flagd).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aepfli aepfli merged commit 3e059c9 into main Feb 10, 2026
13 checks passed

Development

Successfully merging this pull request may close these issues.

bench(python): add proper benchmarks with pytest-benchmark and comparison to alternatives
