open-feature-forking · aepfli · Feb 12, 2026 · Feb 10, 2026
diff --git a/README.md b/README.md
@@ -566,6 +566,8 @@ Each evaluator instance maintains its own flag configuration state and validatio
 |----------|-----------|-------------|
 | `update_state` | `(config_ptr, config_len) -> u64` | Updates the feature flag configuration state |
 | `evaluate` | `(flag_key_ptr, flag_key_len, context_ptr, context_len) -> u64` | Evaluates a feature flag against context (generic) |
+| `evaluate_reusable` | `(flag_key_ptr, flag_key_len, context_ptr, context_len) -> u64` | Like `evaluate`, but does not deallocate input buffers (the caller manages memory) |
+| `evaluate_by_index` | `(flag_index, context_ptr, context_len) -> u64` | Evaluates a flag by numeric index (avoids flag key string overhead). Does not deallocate input buffers. |
 | `evaluate_boolean` | `(flag_key_ptr, flag_key_len, context_ptr, context_len) -> u64` | Evaluates a boolean flag with type checking |
 | `evaluate_string` | `(flag_key_ptr, flag_key_len, context_ptr, context_len) -> u64` | Evaluates a string flag with type checking |
 | `evaluate_integer` | `(flag_key_ptr, flag_key_len, context_ptr, context_len) -> u64` | Evaluates an integer flag with type checking |
@@ -615,7 +617,18 @@ The configuration should follow the [flagd flag definition schema](https://flagd
 // Success
 {
   "success": true,
-  "error": null
+  "error": null,
+  "changedFlags": ["flag-a", "flag-b"],
+  "preEvaluated": {
+    "static-flag": {"value": true, "variant": "on", "reason": "STATIC"}
+  },
+  "requiredContextKeys": {
+    "targeted-flag": ["email", "targetingKey"]
+  },
+  "flagIndices": {
+    "static-flag": 0,
+    "targeted-flag": 1
+  }
 }
 
 // Error
@@ -787,6 +800,24 @@ flowchart TD
     HostDealloc --> End([Evaluation complete])
 ```
 
+### evaluate_by_index
+
+Evaluates a flag using a numeric index instead of a string key. This avoids serializing the flag key string across the WASM boundary, and on the Rust side the flag is found by indexing directly into a `Vec` rather than hashing the key for a `HashMap` lookup.
+
+**Parameters:**
+- `flag_index` (u32): Numeric index of the flag (from `flagIndices` in the `update_state` response)
+- `context_ptr` (u32): Pointer to the evaluation context JSON string in WASM memory
+- `context_len` (u32): Length of the context JSON string
+
+**Returns:**
+- `u64`: Packed pointer where upper 32 bits = result pointer, lower 32 bits = result length
+
+**Important:**
+- Does NOT deallocate input buffers — the caller must free `context_ptr` after reading the result.
+- Expects the context to already include `$flagd` enrichment and `targetingKey` (the host should add these before calling). If `$flagd` is present in the context, WASM-side enrichment is skipped.
+- Flag indices are assigned in sorted (alphabetical) order during `update_state()` and are stable until the next `update_state()` call.
+- If the index is out of bounds, returns a `FLAG_NOT_FOUND` error.
+
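A minimal sketch of the enrichment a host could perform before calling `evaluate_by_index`, written in Java to match the Java host. It assumes the context is held as a `Map<String, Object>` that is serialized to JSON afterwards, and that `$flagd.timestamp` is a Unix timestamp in seconds as in the flagd provider specification. The class and method names are hypothetical, not part of the evaluator's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the flagd-evaluator API: enriches a context
// on the host so the WASM module can skip its own enrichment step.
public final class HostEnrichment {

    private HostEnrichment() {}

    /**
     * Returns a copy of the context with $flagd.flagKey, $flagd.timestamp and
     * targetingKey set. The input map is not modified.
     */
    public static Map<String, Object> enrich(String flagKey,
                                             String targetingKey,
                                             Map<String, Object> context,
                                             long nowEpochSeconds) {
        Map<String, Object> enriched = new HashMap<>(context);

        // Nested object so that {"var": "$flagd.flagKey"} resolves in targeting rules.
        Map<String, Object> flagd = new HashMap<>();
        flagd.put("flagKey", flagKey);
        flagd.put("timestamp", nowEpochSeconds); // assumed: Unix time in seconds
        enriched.put("$flagd", flagd);

        if (targetingKey != null) {
            enriched.put("targetingKey", targetingKey);
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = Map.of("email", "user@example.com");
        long now = System.currentTimeMillis() / 1000L;
        System.out.println(enrich("targeted-flag", "user-1", ctx, now));
    }
}
```

Passing the clock value in as a parameter keeps the helper deterministic and easy to test; a real host would serialize the returned map and write it into WASM memory before making the call.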
 ### Context Enrichment
 
 The evaluator automatically enriches the evaluation context with standard `$flagd` properties according to the [flagd provider specification](https://flagd.dev/reference/specifications/providers/#in-process-resolver). These properties are available in targeting rules via JSON Logic's `var` operator.
@@ -1237,16 +1268,30 @@ int len = (int) (packedResult & 0xFFFFFFFFL);
 
 | Metric | Target | Notes |
 |--------|--------|-------|
-| WASM Size | ~1.5MB | Full JSON Logic implementation with 50+ operators |
-| Evaluation Time | < 1ms | For simple rules with small data |
+| WASM Size | ~2.4MB | Full JSON Logic implementation with 50+ operators |
+| Static flag evaluation | < 0.1 µs | Pre-evaluated, no WASM call |
+| Targeting flag evaluation | < 15 µs | With context key filtering (1000+ attribute context) |
 | Memory Overhead | Minimal | Only allocates what's needed for inputs and outputs |
 
-### Optimization Tips
+### Host-Side Optimizations
+
+The `update_state` response includes metadata that host implementations can use to optimize evaluation:
+
+1. **Pre-evaluated results** (`preEvaluated`): Static and disabled flags are fully resolved at config-load time. The host caches these and returns them without any WASM call.
+
+2. **Required context keys** (`requiredContextKeys`): Per-flag list of context fields the targeting rule references. The host serializes only those fields instead of the entire context. A 1000-attribute context where the rule uses 2 fields shrinks from ~50KB to ~200 bytes.
+
+3. **Flag indices** (`flagIndices`): Numeric index per flag for `evaluate_by_index`. Avoids serializing the flag key string across the WASM boundary and replaces the Rust-side `HashMap` lookup (which must hash the key) with direct `Vec` indexing.
+
+4. **Host-side enrichment**: When using `evaluate_by_index`, the host adds `$flagd.flagKey`, `$flagd.timestamp`, and `targetingKey` to the context before serialization, so the WASM side skips `enrich_context()` and the context clone it performs.
+
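Items 1 to 3 above can be combined into a single host-side dispatch path. The following Java sketch rests on stated assumptions: the three maps hold the already-parsed `preEvaluated`, `requiredContextKeys`, and `flagIndices` fields from the `update_state` response, and `WasmCall` is a hypothetical stand-in for the code that serializes the context, writes it into WASM memory, and invokes `evaluate_by_index`. Enrichment is omitted for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a host-side dispatch path built on the update_state metadata.
// WasmCall is hypothetical: it stands in for serializing the context, writing
// it into WASM memory and calling the evaluate_by_index export.
public final class HostPipeline {

    @FunctionalInterface
    public interface WasmCall {
        Object evaluateByIndex(int flagIndex, Map<String, Object> context);
    }

    private final Map<String, Object> preEvaluated;              // "preEvaluated"
    private final Map<String, List<String>> requiredContextKeys; // "requiredContextKeys"
    private final Map<String, Integer> flagIndices;              // "flagIndices"
    private final WasmCall wasm;

    public HostPipeline(Map<String, Object> preEvaluated,
                        Map<String, List<String>> requiredContextKeys,
                        Map<String, Integer> flagIndices,
                        WasmCall wasm) {
        this.preEvaluated = preEvaluated;
        this.requiredContextKeys = requiredContextKeys;
        this.flagIndices = flagIndices;
        this.wasm = wasm;
    }

    /** Keeps only the context fields a flag's targeting rule references. */
    public static Map<String, Object> filterContext(Map<String, Object> context, List<String> keys) {
        Map<String, Object> filtered = new HashMap<>();
        for (String key : keys) {
            if (context.containsKey(key)) {
                filtered.put(key, context.get(key));
            }
        }
        return filtered;
    }

    public Object evaluate(String flagKey, Map<String, Object> context) {
        // 1. Static/disabled flags: served from the cache, no WASM call.
        Object cached = preEvaluated.get(flagKey);
        if (cached != null) {
            return cached;
        }
        // 2. Targeting flags: keep only referenced fields; fall back to the
        //    full context when no key list is known for this flag.
        List<String> keys = requiredContextKeys.get(flagKey);
        Map<String, Object> ctx = (keys == null) ? context : filterContext(context, keys);
        // 3. Cross the boundary with the numeric index, not the key string.
        Integer index = flagIndices.get(flagKey);
        if (index == null) {
            throw new IllegalArgumentException("No index for flag: " + flagKey);
        }
        return wasm.evaluateByIndex(index, ctx);
    }
}
```

Falling back to the full context when a flag has no `requiredContextKeys` entry keeps the sketch safe for rules whose field usage is unknown. Throwing on a missing index is a placeholder for whatever fallback a real host would choose, such as the key-based `evaluate` export.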
+### General Tips
 
 1. **Reuse the WASM instance** - Instantiation is expensive; reuse the instance for multiple evaluations
-2. **Batch evaluations** - If evaluating many rules, consider batching
-3. **Keep data small** - Only include necessary data in the context
-4. **Use wasm-opt** - The release workflow uses wasm-opt for additional optimization
+2. **Use `evaluate_by_index`** - Avoids flag key string overhead across the WASM boundary
+3. **Filter context on the host** - Use `requiredContextKeys` to serialize only needed fields
+4. **Cache pre-evaluated flags** - Static/disabled flags never need a WASM call
+5. **Use wasm-opt** - The release workflow uses wasm-opt for additional optimization
 
 ## Building from Source
 

diff --git a/java/README.md b/java/README.md
@@ -15,6 +15,8 @@ This library provides a standalone Java artifact that bundles the flagd-evaluato
 - ✅ **JIT compiled** - Uses Chicory's JIT compiler for performance
 - ✅ **Full feature support** - All flagd evaluation features including targeting rules
 - ✅ **Performance benchmarks** - JMH benchmarks for tracking performance over time
+- ✅ **Context key filtering** - Only serializes context fields referenced by targeting rules
+- ✅ **Index-based evaluation** - Numeric flag indices avoid string key overhead across the WASM boundary
 
 ## Installation
 
@@ -188,8 +190,9 @@ Main class for flag evaluation.
 
 #### Methods
 
-- `UpdateStateResult updateState(String jsonConfig)` - Updates flag configuration
-- `<T> EvaluationResult<T> evaluateFlag(Class<T> type, String flagKey, String contextJson)` - Type-safe flag evaluation with JSON context
+- `UpdateStateResult updateState(String jsonConfig)` - Updates flag configuration. Returns changed flags, pre-evaluated results, required context keys per flag, and flag indices.
+- `<T> EvaluationResult<T> evaluateFlag(Class<T> type, String flagKey, EvaluationContext context)` - Type-safe flag evaluation with OpenFeature context (recommended). Automatically applies context key filtering and index-based evaluation when available.
+- `<T> EvaluationResult<T> evaluateFlag(Class<T> type, String flagKey, String contextJson)` - Type-safe flag evaluation with pre-serialized JSON context
 - `<T> EvaluationResult<T> evaluateFlag(Class<T> type, String flagKey, Map<String, Object> context)` - Type-safe flag evaluation with Map context
 
 **Supported Types:**
@@ -222,6 +225,9 @@ Contains the result of updating flag state.
 - `boolean isSuccess()` - Whether the update succeeded
 - `String getError()` - Error message if update failed
 - `List<String> getChangedFlags()` - List of changed flag keys
+- `Map<String, EvaluationResult<Object>> getPreEvaluated()` - Pre-evaluated results for static/disabled flags (cached on the Java side)
+- `Map<String, List<String>> getRequiredContextKeys()` - Per-flag context keys needed by targeting rules (for context filtering)
+- `Map<String, Integer> getFlagIndices()` - Flag key to numeric index mapping (for `evaluate_by_index`)
 
 ## Building from Source
 
@@ -264,75 +270,62 @@ At runtime:
 - Chicory JIT compiles the WASM to optimized bytecode
 - Custom Jackson serializers handle OpenFeature SDK types (`ImmutableMetadata`, `LayeredEvaluationContext`)
 - Each `FlagEvaluator` instance creates its own WASM instance
+- `updateState()` populates three caches: pre-evaluated results, required context keys per flag, and flag index mappings
+- `evaluateFlag()` checks the pre-evaluated cache first, then applies context key filtering and index-based WASM evaluation for targeting flags
 - Type-safe evaluation returns `EvaluationResult<T>` with compile-time type checking
-- Evaluations are synchronized for thread safety
+- All evaluation and state update operations are synchronized for thread safety
 
 ## Performance
 
 - **Startup**: WASM module compiled once during class loading (~100ms)
 - **Memory**: ~3MB for WASM module + Chicory runtime
-- **Static flags**: Near-zero cost via pre-evaluation cache (see below)
+- **Static flags**: ~0.02 µs via pre-evaluation cache (no WASM call)
+- **Targeting flags**: ~12.8 µs with context key filtering (1000+ attribute context)
 
-### Pre-evaluation Cache (Issue #60)
+### Optimization Pipeline
 
-Static flags (no targeting rules) and disabled flags are pre-evaluated during `updateState()`. Their results are cached on the Java side, so `evaluateFlag()` returns instantly without crossing the WASM boundary. This eliminates the ~4.4µs WASM overhead for the most common flag types.
+The evaluator applies three optimizations automatically. Each is prepared during `updateState()` and used by `evaluateFlag()`:
 
-### WASM vs Native JsonLogic Comparison
+1. **Pre-evaluation cache**: Static flags (no targeting rules) and disabled flags are pre-evaluated during `updateState()` and cached on the Java side. `evaluateFlag()` returns instantly without crossing the WASM boundary.
 
-JMH benchmark comparing this WASM-based evaluator against a native Java JsonLogic implementation (`json-logic-java`):
+2. **Context key filtering**: During `updateState()`, the WASM module walks each flag's compiled targeting tree to extract which context fields the rule references (e.g., `{"var": "email"}` -> `email`). When evaluating with an `EvaluationContext`, only those fields are serialized — a 1000-attribute context where the rule uses 2 fields shrinks from ~50KB JSON to ~200 bytes.
 
-| Scenario | Native JsonLogic | WASM Evaluator | Ratio |
-|---|---|---|---|
-| **Simple flag (no targeting)** | 0.022 µs/op | 4.41 µs/op | ~200x |
-| **Targeting match** | 7.85 µs/op | 26.29 µs/op | ~3.4x |
-| **Targeting no-match** | 3.55 µs/op | 15.21 µs/op | ~4x |
+3. **Index-based evaluation**: Each flag is assigned a numeric index during `updateState()` that stays stable until the next `updateState()`. The WASM `evaluate_by_index(u32, ...)` export avoids serializing the flag key string and lets the Rust side index directly into a `Vec` instead of hashing the key for a `HashMap` lookup.
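To make the context key extraction in step 2 concrete, here is an illustration in Java (not the Rust code inside the WASM module) of collecting referenced fields from a JSON Logic rule. It assumes the rule has already been parsed into nested `Map`, `List`, and scalar values, for example by Jackson, and it reduces each `var` path to its top-level segment on the assumption that filtering operates on top-level context fields. The class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: collects the top-level context fields referenced by
// "var" in a parsed JSON Logic rule (Maps, Lists and scalars).
public final class VarCollector {

    private VarCollector() {}

    /** Returns the sorted set of top-level context fields referenced by a rule. */
    public static Set<String> requiredKeys(Object rule) {
        Set<String> keys = new TreeSet<>();
        collect(rule, keys);
        return keys;
    }

    private static void collect(Object node, Set<String> keys) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                if ("var".equals(entry.getKey())) {
                    addVar(entry.getValue(), keys);
                } else {
                    collect(entry.getValue(), keys);
                }
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                collect(child, keys);
            }
        }
        // Strings, numbers and booleans reference no context fields.
    }

    // Handles {"var": "a.b"} and {"var": ["a.b", <default>]}.
    private static void addVar(Object arg, Set<String> keys) {
        Object path = arg;
        if (arg instanceof List) {
            List<?> args = (List<?>) arg;
            path = args.isEmpty() ? null : args.get(0);
            // The default value may itself contain var references.
            for (int i = 1; i < args.size(); i++) {
                collect(args.get(i), keys);
            }
        }
        if (path instanceof String && !((String) path).isEmpty()) {
            String p = (String) path;
            int dot = p.indexOf('.');
            keys.add(dot < 0 ? p : p.substring(0, dot));
        }
        // Empty or computed paths are ignored here; a real implementation would
        // have to send the full context for such rules.
    }
}
```

For `{"var": "email"}` this yields `email`, matching the example in step 2; rules that compute a `var` path at runtime would need a fallback to the full context.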
 
-> **Note**: Simple flags now bypass WASM entirely via the pre-evaluation cache, effectively matching native performance.
+Context enrichment (`$flagd.flagKey`, `$flagd.timestamp`, `targetingKey`) has also moved to the Java side, which removes an allocation and a context clone inside the WASM module.
 
-**Context size impact** (targeting evaluation with varying context sizes):
+### WASM Evaluator vs Native JsonLogic
 
-| Context Size | Native JsonLogic | WASM Evaluator | Ratio |
-|---|---|---|---|
-| Empty | 3.55 µs/op | 15.21 µs/op | ~4x |
-| Small (5 attributes) | 6.34 µs/op | 27.10 µs/op | ~4x |
-| Large (100+ attributes) | 24.02 µs/op | 166.72 µs/op | ~7x |
+JMH benchmark (`ResolverComparisonBenchmark`) comparing this WASM-based evaluator against a native Java JsonLogic implementation (`json-logic-java`) with a `LayeredEvaluationContext` containing 1000+ attributes:
 
-The WASM overhead comes from JSON serialization across the WASM boundary. For targeting rules with large contexts, serialization dominates the cost.
+| Scenario | Native JsonLogic | WASM Evaluator | Speedup |
+|---|---|---|---|
+| **Simple flag** (no targeting) | 0.023 µs/op | 0.020 µs/op | ~same (both cached) |
+| **Targeting match** (1000+ attrs) | 409.3 µs/op | 12.8 µs/op | **32x faster** |
+| **Targeting no-match** (small ctx) | 4.4 µs/op | 12.0 µs/op | 0.4x (native faster) |
+| **Many evals** (x1000, 1000+ attrs) | 408.5 µs/op | 11.9 µs/op | **34x faster** |
 
-### Benchmarks
+Key observations:
+- **Simple/static flags** are served from the Java-side cache at ~0.02 µs — no WASM call at all.
+- **Targeting flags with large contexts** benefit most from context key filtering. The native JsonLogic resolver must iterate over all 1000+ attributes on every evaluation, while the WASM evaluator only serializes the 2-3 fields the rule actually uses.
+- **Small contexts** (targeting no-match row) show the WASM overhead more clearly — the 12 µs includes the WASM boundary crossing cost. For small contexts, the native resolver is faster since there's little serialization to save.
 
-The library includes JMH (Java Microbenchmark Harness) benchmarks for performance tracking:
+### Running Benchmarks
 
 ```bash
-# Run comparison benchmark (WASM vs native JsonLogic)
-./mvnw exec:java@run-jmh-benchmark -Dbenchmark=ResolverComparisonBenchmark
-
-# Run evaluator benchmarks
-./mvnw exec:java@run-jmh-benchmark
-```
-
-**Evaluator Benchmark Results** (example from development machine):
-```
-Benchmark                                              Mode  Cnt       Score        Error  Units
-FlagEvaluatorJmhBenchmark.evaluateWithLayeredContext  thrpt    5   13035.383 ±   4173.375  ops/s
-FlagEvaluatorJmhBenchmark.evaluateWithSimpleContext   thrpt    5   14748.099 ±   2689.011  ops/s
-FlagEvaluatorJmhBenchmark.serializeLayeredContext     thrpt    5  222863.374 ± 151002.720  ops/s
-```
+# Build the JMH fat JAR
+cd java
+./mvnw clean package
 
-**Benchmark Scenarios:**
-- **evaluateWithLayeredContext**: Full flag evaluation with 4-layer context (API, Transaction, Client, Invocation) and 100+ entries per layer
-- **evaluateWithSimpleContext**: Baseline evaluation with minimal context
-- **serializeLayeredContext**: JSON serialization overhead measurement
+# Run the old-vs-new comparison benchmark
+java -jar target/benchmarks.jar ResolverComparisonBenchmark
 
-To run with GC profiling:
-```bash
-./mvnw exec:java -Dexec.classpathScope=test -Dexec.mainClass=org.openjdk.jmh.Main \
-  -Dexec.args="FlagEvaluatorJmhBenchmark -prof gc -f 0"
+# Run the evaluator benchmarks (layered context, simple context, serialization)
+java -jar target/benchmarks.jar FlagEvaluatorJmhBenchmark
 ```
 
 The JUnit-based benchmark test suite is also available:
 ```bash
-# Run performance benchmark tests
 ./mvnw test -Dtest=FlagEvaluatorBenchmarkTest
 ```
 
@@ -342,7 +335,6 @@ The JUnit-based benchmark test suite is also available:
 
 ## Future Improvements
 
-- **AOT Compilation**: When Chicory supports AOT, compile WASM → Java at build time for better performance
 - **Async API**: Non-blocking evaluation methods
 - **Streaming Updates**: Support for flag configuration streams