
perf(python): add host-side optimizations for pre-evaluation, context filtering, and index-based eval #72

Merged: aepfli merged 5 commits into main from feat/python-normalization on Feb 12, 2026

@aepfli aepfli commented Feb 10, 2026

Summary

  • Normalize the Python PyO3 bindings to support the same 3 host-side optimizations as Java and Go
  • Pre-evaluation cache: static/disabled flags are served directly from a Python dict (~0.3 µs) without calling into the Rust evaluator
  • Context key filtering: only serialize keys referenced by targeting rules, plus $flagd enrichment and targetingKey
  • Index-based evaluation: evaluate_flag_by_index() with O(1) Vec lookup when both the flag index and its required keys are available
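The pre-evaluation cache can be pictured with a minimal Python sketch. This is not the code in python/src/lib.rs: the helper names (`build_pre_eval_cache`, `evaluate_with_cache`), the result dict shape, and the choice to return `None` as the value of a disabled flag are assumptions made for illustration.

```python
from typing import Any, Callable

Result = dict[str, Any]


def build_pre_eval_cache(flags: dict[str, dict[str, Any]]) -> dict[str, Result]:
    """Precompute results for flags whose outcome cannot depend on the context.

    Hypothetical helper: assumes flagd-style flag definitions with
    `state`, `variants`, `defaultVariant` and an optional `targeting` rule.
    """
    cache: dict[str, Result] = {}
    for key, flag in flags.items():
        if flag.get("state") == "DISABLED":
            # Assumption: a disabled flag carries no value, so the caller
            # falls back to its own default.
            cache[key] = {"value": None, "variant": None, "reason": "DISABLED"}
        elif not flag.get("targeting"):
            # No targeting rule: the default variant is always the answer.
            variant = flag["defaultVariant"]
            cache[key] = {
                "value": flag["variants"][variant],
                "variant": variant,
                "reason": "STATIC",
            }
    return cache


def evaluate_with_cache(
    key: str,
    context: dict[str, Any],
    cache: dict[str, Result],
    slow_path: Callable[[str, dict[str, Any]], Result],
) -> Result:
    """Serve cached flags from the dict; everything else goes to `slow_path`."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    # `slow_path` stands in for the call into the Rust evaluator.
    return slow_path(key, context)
```

The cache would be rebuilt after each update_state() call, so a cache hit costs one dict lookup and never crosses into Rust.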

Benchmark Highlights

| Scenario | Time (µs) | Ops/sec |
|---|---|---|
| Static flag (pre-eval cache) | 0.3 | 2,918,057 |
| Targeting match (small ctx) | 1.4 | 700,497 |
| Targeting (large 100+ ctx) | 22.6 | 44,213 |
| Complex targeting (small ctx) | 2.8 | 359,245 |
| Disabled flag (cache) | 0.8 | 1,292,441 |

Compared with native json-logic-utils (used by the current flagd Python provider):

| Scenario | PyO3 | native json-logic-utils | Speedup |
|---|---|---|---|
| Simple targeting | 1.4 µs | 2.6 µs | 1.9x |
| Simple + small ctx | 1.2 µs | 3.8 µs | 3.2x |
| Simple + large ctx | 21.0 µs | 47.7 µs | 2.3x |
| Complex targeting | 2.8 µs | 10.4 µs | 3.7x |
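The speedup column is the ratio of the two latencies. As a quick consistency check, the following sketch recomputes it from the figures in the table above; the variable and function names are illustrative only.

```python
# (scenario, PyO3 latency in µs, json-logic-utils latency in µs), from the table above
rows = [
    ("Simple targeting", 1.4, 2.6),
    ("Simple + small ctx", 1.2, 3.8),
    ("Simple + large ctx", 21.0, 47.7),
    ("Complex targeting", 2.8, 10.4),
]


def speedup(pyo3_us: float, baseline_us: float) -> float:
    """Speedup = baseline latency / new latency, rounded to one decimal."""
    return round(baseline_us / pyo3_us, 1)


speedups = {name: speedup(pyo3, base) for name, pyo3, base in rows}
for name, value in speedups.items():
    print(f"{name}: {value}x")
```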

Changes

| File | Description |
|---|---|
| python/src/lib.rs | Add 3 caches to FlagEvaluator struct, optimized evaluation pipeline |
| python/tests/test_optimizations.py | 15 new tests covering all optimization paths |

Test plan

  • All 30 Python tests pass (15 original + 15 new)
  • 30/30 benchmarks pass (4 panzi comparison skipped — optional dep)
  • 134/134 Rust unit tests pass
  • 4/4 Gherkin tests pass
  • cargo fmt and cargo clippy -- -D warnings clean

🤖 Generated with Claude Code

aepfli and others added 5 commits February 10, 2026 19:41
…uation

Reduce targeting flag evaluation latency for large contexts by avoiding
unnecessary data transfer across the WASM boundary.

Rust side:
- Walk compiled targeting trees to extract referenced context keys
  (e.g. {"var": "email"} -> "email") during update_state()
- Return per-flag requiredContextKeys and flagIndices in UpdateStateResponse
- Add evaluate_by_index(u32, ctx_ptr, ctx_len) WASM export for O(1) flag
  lookup by numeric index instead of string key HashMap lookup
- Add evaluate_flag_pre_enriched() that skips context enrichment when
  $flagd is already present (host-side enrichment)
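The key extraction step can be sketched in a few lines. The real walk runs in Rust over compiled targeting trees; this Python version works on raw JSON-logic dicts instead, `referenced_keys` is a hypothetical name, and keeping only the top-level segment of dotted paths is an assumption about how the keys are reported.

```python
from typing import Any


def referenced_keys(rule: Any) -> set[str]:
    """Collect the top-level context keys that a JSON-logic rule reads via `var`.

    Illustrative only. Dynamic variable names (a `var` whose argument is
    itself an expression) are not resolved here; a real implementation
    would have to fall back to sending the full context for such rules.
    """
    keys: set[str] = set()

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for op, args in node.items():
                if op == "var":
                    # `var` accepts "name" or ["name", default].
                    name = args[0] if isinstance(args, list) and args else args
                    if isinstance(name, str) and name:
                        # "user.plan" only needs the top-level "user" entry.
                        keys.add(name.split(".", 1)[0])
                else:
                    walk(args)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(rule)
    return keys


rule = {"if": [{"==": [{"var": "email"}, "a@example.com"]}, "on", "off"]}
email_keys = referenced_keys(rule)  # {"email"}
```

Because the key set is computed once per update_state(), the per-evaluation cost is only the filtering of the context, not the tree walk.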

Java side:
- Cache requiredContextKeys and flagIndices from updateState() response
- EvaluationContextSerializer.serializeFiltered() serializes only the
  context keys a targeting rule references, plus $flagd enrichment
- evaluateFlag(EvaluationContext) uses filtered serialization + index-based
  eval when available, falls back to full serialization gracefully
- evaluateByIndex() calls the new WASM export with O(1) Vec lookup

JMH results (1000+ attribute LayeredEvaluationContext):
- Targeting flags: ~12.8 µs (down from ~167 µs) — 13x improvement
- vs old json-logic-java: 32-34x faster (409 µs -> 12.8 µs)
- Static/disabled flags: ~0.02 µs (pre-evaluated cache, unchanged)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… filtering, and index-based eval

Normalize the Python PyO3 bindings to support the same 3 host-side
optimizations as Java and Go:

1. Pre-evaluated cache: after update_state(), cache results for
   static/disabled flags and return them directly without calling
   into the Rust evaluator.

2. Context key filtering: store requiredContextKeys per flag and
   build a filtered context containing only the keys referenced
   by the targeting rule, plus $flagd enrichment and targetingKey.

3. Index-based evaluation: store flagIndices and call
   evaluate_flag_by_index() when both index and required keys are
   available, using O(1) Vec lookup instead of HashMap.
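Steps 2 and 3 combine into a single dispatch decision, which the following sketch illustrates. The names `filter_context` and `evaluate_filtered`, the `by_index`/`by_key` callables standing in for the Rust evaluator, and the `$flagd` fields `flagKey` and `timestamp` are assumptions for illustration, not the actual binding code.

```python
import time
from typing import Any, Callable, Optional

Context = dict[str, Any]


def filter_context(
    context: Context,
    required_keys: set[str],
    flag_key: str,
    now: Optional[int] = None,
) -> Context:
    """Keep only the keys the targeting rule reads, plus targetingKey and $flagd."""
    filtered = {k: context[k] for k in required_keys if k in context}
    if "targetingKey" in context:
        filtered["targetingKey"] = context["targetingKey"]
    # Host-side enrichment (assumed fields), so the evaluator can skip it.
    filtered["$flagd"] = {
        "flagKey": flag_key,
        "timestamp": int(time.time()) if now is None else now,
    }
    return filtered


def evaluate_filtered(
    flag_key: str,
    context: Context,
    required: dict[str, set[str]],
    indices: dict[str, int],
    by_index: Callable[[int, Context], Any],
    by_key: Callable[[str, Context], Any],
) -> Any:
    """Use the index-based path when both lookups succeed, else fall back."""
    keys = required.get(flag_key)
    index = indices.get(flag_key)
    if keys is not None and index is not None:
        return by_index(index, filter_context(context, keys, flag_key))
    # Missing metadata: send the full context and look the flag up by name.
    return by_key(flag_key, context)
```

Falling back to the key-based path whenever either piece of metadata is missing keeps the optimization safe: an unknown flag or an older state response simply costs a full serialization.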

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The WASM tests use a thread-local singleton evaluator. Parallel test
execution causes race conditions where one test's update_state overwrites
another test's state, producing intermittent failures like
test_wasm_evaluate_by_index getting "Static" instead of "TargetingMatch".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Gherkin tests (cucumber) don't support the --test-threads flag. Run lib and
integration tests with --test-threads=1 for WASM singleton safety, and run
the Gherkin tests separately without the flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aepfli aepfli merged commit 2b7a0b9 into main Feb 12, 2026
12 checks passed