|
| 1 | +# Workflow Runtime Internals And Optimization Opportunities |
| 2 | + |
| 3 | +Date: 2026-03-11 |
| 4 | + |
| 5 | +## Goal |
| 6 | + |
| 7 | +Document how the `workflow-runtime` internals currently work, then identify practical optimization projects that can be prototyped and benchmarked. |
| 8 | + |
| 9 | +## Sources Used |
| 10 | + |
| 11 | +- Previous discussions on performance around `ActiveStagingList` identity checks and `forEachStaging` hot paths. |
| 12 | +- Runtime internals in `workflow-runtime`. |
| 13 | +- Event handler + `remember` implementation in `workflow-core`. |
| 14 | +- Existing tracing and benchmark infrastructure in `workflow-tracing` and `benchmarks`. |
| 15 | + |
| 16 | +## Runtime Architecture Today |
| 17 | + |
| 18 | +### 1. Entry Point And Loop Shape |
| 19 | + |
| 20 | +The low-level runtime entry point is `renderWorkflowIn` in `workflow-runtime/src/commonMain/kotlin/com/squareup/workflow1/RenderWorkflow.kt`. |
| 21 | + |
| 22 | +- It optionally wraps the scope dispatcher in `WorkStealingDispatcher` when `WORK_STEALING_DISPATCHER` is enabled (`RenderWorkflow.kt:154-162`). |
| 23 | +- It creates a `WorkflowRunner` and performs the first render pass synchronously before launching the loop coroutine (`RenderWorkflow.kt:163-191`). |
| 24 | +- The runtime loop then repeats: |
| 25 | + - wait for action or props update, |
| 26 | + - optionally drain exclusive actions, |
| 27 | + - render, |
| 28 | + - optionally conflate stale renderings, |
| 29 | + - emit rendering and output (`RenderWorkflow.kt:233-317`). |
| 30 | + |
| 31 | +### 2. WorkflowRunner Responsibilities |
| 32 | + |
| 33 | +`WorkflowRunner` coordinates props updates, tree action waiting, and render/snapshot passes (`workflow-runtime/src/commonMain/kotlin/com/squareup/workflow1/internal/WorkflowRunner.kt`). |
| 34 | + |
| 35 | +- Deduplicates initial props emission via `dropWhile { it == currentProps }` to avoid an immediate second render (`WorkflowRunner.kt:36-49`). |
| 36 | +- `nextRendering()` calls the root node's `render` and then `snapshot`, wrapped by the interceptor hook `onRenderAndSnapshot` (`WorkflowRunner.kt:68-74`). |
| 37 | +- `awaitAndApplyAction()` uses `select` over props channel + root tree selectors (`WorkflowRunner.kt:83-90`). |
| 38 | + |
| 39 | +### 3. WorkflowNode As The Core State Machine Host |
| 40 | + |
| 41 | +`WorkflowNode` manages per-node state, rendering, action channels, dirty flags, side effects, remember cache, and child subtree manager (`workflow-runtime/src/commonMain/kotlin/com/squareup/workflow1/internal/WorkflowNode.kt`). |
| 42 | + |
| 43 | +Important fields and behavior: |
| 44 | + |
| 45 | +- Uses `SubtreeManager` for children (`WorkflowNode.kt:84-93`). |
| 46 | +- Uses `ActiveStagingList` for side effects and remembered values (`WorkflowNode.kt:94-96`). |
| 47 | +- Tracks dirty status with `selfStateDirty` and `subtreeStateDirty` for partial tree rendering (`WorkflowNode.kt:103-111`). |
| 48 | +- Re-renders only when needed if `PARTIAL_TREE_RENDERING` is enabled (`WorkflowNode.kt:312-316`). |
| 49 | +- Commit phase after each render: |
| 50 | + - `subtreeManager.commitRenderedChildren()` |
| 51 | + - start staged side effect jobs |
| 52 | + - cancel obsolete side effects |
| 53 | + - commit remembered entries (`WorkflowNode.kt:323-331`). |
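
The dirty-flag gating described above can be illustrated with a minimal sketch. `DirtyGatedNodeSketch`, `applyAction`, and `renderPasses` are hypothetical names, not the real `WorkflowNode` API, and the sketch assumes a single node with no children, so it models `selfStateDirty` only and omits `subtreeStateDirty`.

```kotlin
// Hypothetical sketch of dirty-flag gating; this is not the real WorkflowNode.
class DirtyGatedNodeSketch(private val renderBody: (Int) -> String) {
  private var state = 0
  private var selfStateDirty = true // dirty until the first render completes
  private var cachedRendering: String? = null

  /** Number of times renderBody actually ran. */
  var renderPasses = 0
    private set

  /** Assumption for illustration: only a real state change marks the node dirty. */
  fun applyAction(newState: Int) {
    if (newState != state) {
      state = newState
      selfStateDirty = true
    }
  }

  /** Returns the cached rendering when nothing is dirty, otherwise re-renders. */
  fun render(): String {
    val cached = cachedRendering
    if (cached != null && !selfStateDirty) return cached
    renderPasses++
    val rendering = renderBody(state)
    cachedRendering = rendering
    selfStateDirty = false
    return rendering
  }
}
```

Calling `render()` twice without an intervening state change runs `renderBody` once; the second call returns the cached value.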
| 54 | + |
| 55 | +### 4. Child Reconciliation Model |
| 56 | + |
| 57 | +`SubtreeManager` implements child rendering/reuse/teardown (`workflow-runtime/src/commonMain/kotlin/com/squareup/workflow1/internal/SubtreeManager.kt`). |
| 58 | + |
| 59 | +- Child nodes are tracked with active and staging collections (`SubtreeManager.kt:32-78`). |
| 60 | +- On each `renderChild` call: |
| 61 | + - validate sibling key uniqueness by scanning staging (`forEachStaging`) (`SubtreeManager.kt:127-135`), |
| 62 | + - `retainOrCreate` child by searching active (`SubtreeManager.kt:138-143`), |
| 63 | + - update handler and render child (`SubtreeManager.kt:145-147`). |
| 64 | +- On commit, children left in old active are cancelled (`SubtreeManager.kt:110-119`). |
| 65 | + |
| 66 | +### 5. ActiveStagingList + InlineLinkedList |
| 67 | + |
| 68 | +`ActiveStagingList` is the dual-list abstraction used by children/side-effects/remembered values (`workflow-runtime/src/commonMain/kotlin/com/squareup/workflow1/internal/ActiveStagingList.kt`). |
| 69 | + |
| 70 | +- `retainOrCreate` does a linear `removeFirst(predicate)` from active, then appends to staging (`ActiveStagingList.kt:42-49`). |
| 71 | +- `commitStaging` calls `onRemove` for remaining active entries, swaps list references, clears new staging (`ActiveStagingList.kt:55-65`). |
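
As a rough illustration of the two operations above, here is a simplified stand-in. `ActiveStagingSketch` and `activeSnapshot` are hypothetical names, and the sketch assumes `ArrayList` storage purely for brevity; the real class uses a different backing list.

```kotlin
// Hypothetical stand-in for the active/staging dual list; ArrayList is used for brevity.
class ActiveStagingSketch<T : Any> {
  private var active = ArrayList<T>()
  private var staging = ArrayList<T>()

  /** Moves the first matching active entry to staging, or creates and stages a new one. */
  fun retainOrCreate(predicate: (T) -> Boolean, create: () -> T): T {
    val index = active.indexOfFirst(predicate) // linear scan over active
    val item = if (index >= 0) active.removeAt(index) else create()
    staging.add(item)
    return item
  }

  /** Reports entries left in active, then swaps the lists and clears the new staging list. */
  fun commitStaging(onRemove: (T) -> Unit) {
    active.forEach(onRemove)
    val oldActive = active
    active = staging
    staging = oldActive
    staging.clear()
  }

  /** Test helper: a copy of the current active entries in order. */
  fun activeSnapshot(): List<T> = active.toList()
}
```

Entries that are not re-staged during a pass are the ones handed to `onRemove` on commit, which is where the lifecycle teardown would hook in.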
| 72 | + |
| 73 | +`InlineLinkedList` is a custom intrusive singly-linked list (`workflow-runtime/src/commonMain/kotlin/com/squareup/workflow1/internal/InlineLinkedList.kt`). |
| 74 | + |
| 75 | +- Nodes carry their own `nextListNode` pointer. |
| 76 | +- Operations are minimal and allocation-light: append, iterate, remove-first-by-predicate, clear. |
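
A minimal sketch of an intrusive singly-linked list with the four operations listed above. Only `nextListNode` comes from the description; `Linked`, `IntrusiveListSketch`, and `Entry` are hypothetical names, and the real `InlineLinkedList` may differ in detail.

```kotlin
// Hypothetical sketch: elements carry their own link, so appending allocates no wrapper node.
interface Linked<T : Linked<T>> {
  var nextListNode: T?
}

class IntrusiveListSketch<T : Linked<T>> {
  private var head: T? = null
  private var tail: T? = null

  fun append(node: T) {
    node.nextListNode = null
    val oldTail = tail
    if (oldTail == null) head = node else oldTail.nextListNode = node
    tail = node
  }

  fun forEach(block: (T) -> Unit) {
    var current = head
    while (current != null) {
      block(current)
      current = current.nextListNode
    }
  }

  /** Unlinks and returns the first element matching [predicate], or null if none matches. */
  fun removeFirst(predicate: (T) -> Boolean): T? {
    var previous: T? = null
    var current = head
    while (current != null) {
      if (predicate(current)) {
        val next = current.nextListNode
        if (previous == null) head = next else previous.nextListNode = next
        if (tail === current) tail = previous
        current.nextListNode = null
        return current
      }
      previous = current
      current = current.nextListNode
    }
    return null
  }

  fun clear() {
    head = null
    tail = null
  }
}

// Example element type for the sketch.
class Entry(val key: String) : Linked<Entry> {
  override var nextListNode: Entry? = null
}
```

Because removal is by predicate, finding an entry costs a walk from the head; that is the cost the optimization projects below try to avoid.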
| 77 | + |
| 78 | +### 6. Where Uniqueness Checks Happen |
| 79 | + |
| 80 | +The sibling/remember/side-effect duplicate checks are all linear scans over staging: |
| 81 | + |
| 82 | +- Child uniqueness in `SubtreeManager.render` (`SubtreeManager.kt:127-135`). |
| 83 | +- Side effect key uniqueness in `WorkflowNode.runningSideEffect` (`WorkflowNode.kt:175-183`). |
| 84 | +- Remember uniqueness (`key + resultType + inputs`) in `WorkflowNode.remember` (`WorkflowNode.kt:192-206`). |
| 85 | + |
| 86 | +This matches previous discussions on performance: repeated `forEachStaging` checks in hot render paths. |
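
To make the cost concrete, the sketch below counts comparisons for a pre-scan uniqueness check. `StagingSketch`, `stageUnique`, and `comparisons` are hypothetical names, and the error message is a placeholder rather than the runtime's actual wording.

```kotlin
// Hypothetical sketch: scan staging for a duplicate key before staging a new entry.
class StagingSketch {
  private val staging = ArrayList<String>()

  /** Total key comparisons performed across all stageUnique calls. */
  var comparisons = 0
    private set

  fun stageUnique(key: String) {
    for (existing in staging) {
      comparisons++
      // Placeholder message; the real runtime's messages are not reproduced here.
      require(existing != key) { "Duplicate key: $key" }
    }
    staging.add(key)
  }
}
```

Staging `n` distinct keys in one pass performs `n * (n - 1) / 2` comparisons, for example 4950 for 100 keys, so the check is cheap for a handful of entries and grows quadratically with the number of entries per render.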
| 87 | + |
| 88 | +### 7. EventHandler + remember Coupling |
| 89 | + |
| 90 | +Stable event handlers are implemented in `HandlerBox.kt` and route through `BaseRenderContext.remember` when `remember = true` (or when the runtime enables stable handlers by default): |
| 91 | + |
| 92 | +- `eventHandler*` uses `remember(name, typeOf<...>())` to retrieve a stable handler box (`workflow-core/src/commonMain/kotlin/com/squareup/workflow1/HandlerBox.kt:10-413`). |
| 93 | +- `STABLE_EVENT_HANDLERS` controls default remember behavior (`workflow-core/src/commonMain/kotlin/com/squareup/workflow1/RuntimeConfig.kt:76-80`, `StatefulWorkflow.kt:167-178`). |
| 94 | + |
| 95 | +Implication: as stable handlers are used more heavily, the remembered staging identity path gets hotter. |
| 96 | + |
| 97 | +### 8. Tracing And Benchmarking Surface |
| 98 | + |
| 99 | +Observability and benchmark infrastructure already exists for the runtime: |
| 100 | + |
| 101 | +- `WorkflowRuntimeMonitor` tracks action causes and render pass behavior (`workflow-tracing/src/main/java/com/squareup/workflow1/tracing/WorkflowRuntimeMonitor.kt`). |
| 102 | +- `WorkflowRenderPassTracker` records render causes + durations (`workflow-tracing/src/main/java/com/squareup/workflow1/tracing/WorkflowRenderPassTracker.kt`). |
| 103 | +- `benchmarks/runtime-microbenchmark` has targeted runtime microbenchmarks for tree updates and state/props churn (`benchmarks/runtime-microbenchmark/src/androidTest/kotlin/com/squareup/benchmark/runtime/benchmark/WorkflowRuntimeMicrobenchmark.kt`). |
| 104 | +- `benchmarks/performance-poetry` includes integration-style render-pass efficiency checks (`RenderPassTest.kt`, `RenderPassCountingInterceptor.kt`). |
| 105 | + |
| 106 | +## Previous Performance Discussions (Summary) |
| 107 | + |
| 108 | +Previous discussions on performance centered on one specific hot section in child rendering: |
| 109 | + |
| 110 | +- Duplicate sibling key checking (`CheckingUniqueMatches`) scans staging children on each `renderChild` call. |
| 111 | +- The concern is that this pattern also applies to `remember`, which is hit by event handler creation. |
| 112 | +- Proposed direction in those discussions: treat identity as a top-level abstraction and add a set-backed lookup (potentially `LinkedHashSet` or a sidecar set) while preserving ordering/reconciliation semantics. |
| 113 | + |
| 114 | +## Optimization Project Candidates |
| 115 | + |
| 116 | +### Project 1: Set-Backed Identity Index For Active/Staging |
| 117 | + |
| 118 | +Scope: |
| 119 | + |
| 120 | +- Keep `InlineLinkedList` for ordering and swap semantics. |
| 121 | +- Add optional sidecar identity indexes for active and/or staging (e.g. `MutableSet` / `MutableMap<Identity, Node>`). |
| 122 | + |
| 123 | +Targeted wins: |
| 124 | + |
| 125 | +- O(1)-ish duplicate detection for sibling keys/remember keys/side-effect keys. |
| 126 | +- O(1)-ish node lookup during `retainOrCreate` for indexed identities. |
| 127 | + |
| 128 | +Key files: |
| 129 | + |
| 130 | +- `workflow-runtime/.../ActiveStagingList.kt` |
| 131 | +- `workflow-runtime/.../SubtreeManager.kt` |
| 132 | +- `workflow-runtime/.../WorkflowNode.kt` |
| 133 | + |
| 134 | +Validation: |
| 135 | + |
| 136 | +- Extend runtime microbenchmarks with key-heavy sibling and remember-heavy scenarios. |
| 137 | +- Confirm no regressions in `ActiveStagingListTest`, `SubtreeManagerTest`, and `WorkflowNodeTest`. |
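
One possible shape for the sidecar index in Project 1, shown as a sketch. `IndexedStagingSketch`, `stage`, `lookup`, and `inOrder` are hypothetical names; the sketch assumes identities have stable `equals`/`hashCode` and uses an `ArrayList` in place of `InlineLinkedList` for ordering.

```kotlin
// Hypothetical sketch: an ordered list for reconciliation order plus a sidecar identity index.
class IndexedStagingSketch<K : Any, V : Any> {
  private val ordered = ArrayList<V>() // preserves render order
  private val index = HashMap<K, V>()  // sidecar: identity -> entry

  /** Stages [value] under [identity]; fails if that identity is already staged. */
  fun stage(identity: K, value: V) {
    // Hash lookup replaces the linear duplicate scan. Placeholder error message.
    require(identity !in index) { "Duplicate identity: $identity" }
    index[identity] = value
    ordered.add(value)
  }

  /** Expected constant-time lookup by identity. */
  fun lookup(identity: K): V? = index[identity]

  /** Entries in the order they were staged. */
  fun inOrder(): List<V> = ordered.toList()
}
```

The tradeoff to benchmark is the extra hashing and map memory per node against the scans it replaces, which may not pay off for very small sibling counts.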
| 138 | + |
| 139 | +### Project 2: Insert-Time Uniqueness API |
| 140 | + |
| 141 | +Scope: |
| 142 | + |
| 143 | +- Move duplicate checking into the insertion path (`retainOrCreate`-like API), removing the separate pre-scan + insert phases. |
| 144 | + |
| 145 | +Targeted wins: |
| 146 | + |
| 147 | +- Remove one full staging traversal in child/remember/side-effect paths. |
| 148 | +- Simplify call-site logic and reduce repeated predicate work. |
| 149 | + |
| 150 | +Rationale from previous performance discussions: |
| 151 | + |
| 152 | +- The check and insertion happen adjacently today and can be unified. |
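
A sketch of what an insert-time uniqueness API could look like if staging were set-backed. `UniqueStagingSketch` and `stageUnique` are hypothetical names; the sketch relies on `LinkedHashSet.add` returning `false` for an element that is already present, so the check and the insert are a single operation.

```kotlin
// Hypothetical sketch: the duplicate check is the insert itself.
class UniqueStagingSketch<K : Any> {
  private val staged = LinkedHashSet<K>() // keeps insertion order

  /** Inserts [identity]; fails if it was already staged. Placeholder error message. */
  fun stageUnique(identity: K) {
    check(staged.add(identity)) { "Duplicate identity: $identity" }
  }

  /** Identities in the order they were staged. */
  fun inOrder(): List<K> = staged.toList()
}
```

Call sites then make one call per entry instead of a scan followed by an append.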
| 153 | + |
| 154 | +### Project 3: Adaptive Hybrid Collection (Small-N List, Larger-N Indexed) |
| 155 | + |
| 156 | +Scope: |
| 157 | + |
| 158 | +- Preserve current linear path for tiny sibling counts. |
| 159 | +- Promote to indexed mode once count exceeds threshold, demote when shrinking. |
| 160 | + |
| 161 | +Targeted wins: |
| 162 | + |
| 163 | +- Preserve low overhead for common single-digit sibling lists. |
| 164 | +- Protect against pathological larger sibling counts or high-frequency remember/eventHandler usage. |
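
The promotion idea can be sketched as follows. `AdaptiveKeySetSketch` is a hypothetical name and the default threshold of 8 is an arbitrary placeholder, not a measured value. For simplicity the sketch demotes only on `clear()`, whereas the scope above also mentions demoting when shrinking.

```kotlin
// Hypothetical sketch: linear membership checks while small, hash-indexed once past a threshold.
class AdaptiveKeySetSketch<K : Any>(private val threshold: Int = 8) {
  private val ordered = ArrayList<K>()
  private var index: HashSet<K>? = null // null while in small-N linear mode

  val isIndexed: Boolean get() = index != null

  /** Adds [key] if absent; returns false when it is a duplicate. */
  fun add(key: K): Boolean {
    val currentIndex = index
    val duplicate = if (currentIndex != null) key in currentIndex else key in ordered
    if (duplicate) return false
    ordered.add(key)
    if (currentIndex != null) {
      currentIndex.add(key)
    } else if (ordered.size > threshold) {
      index = HashSet(ordered) // promote: build the index once from the existing keys
    }
    return true
  }

  /** Clears all keys and demotes back to linear mode. */
  fun clear() {
    ordered.clear()
    index = null
  }
}
```

The threshold is exactly the kind of parameter the Project 5 benchmarks would need to pick.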
| 165 | + |
| 166 | +### Project 4: Remember Identity Key Object Fast Path |
| 167 | + |
| 168 | +Scope: |
| 169 | + |
| 170 | +- Replace repeated tuple comparisons (`key`, `KType`, `inputs.contentEquals`) with a cached identity token or precomputed key object for remembered entries. |
| 171 | + |
| 172 | +Targeted wins: |
| 173 | + |
| 174 | +- Fewer repeated array comparisons in `WorkflowNode.remember`. |
| 175 | +- Better cache locality if identity is represented by a compact key struct. |
| 176 | + |
| 177 | +Key files: |
| 178 | + |
| 179 | +- `workflow-runtime/.../RememberedNode.kt` |
| 180 | +- `workflow-runtime/.../WorkflowNode.kt` |
| 181 | +- `workflow-core/.../HandlerBox.kt` |
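
A sketch of a precomputed key object for Project 4. `RememberKeySketch` is a hypothetical class, not the existing `RememberedNode`; it assumes the `inputs` array is not mutated after the key is built, since the hash is computed once at construction.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical sketch: one key object per remembered entry, with its hash computed once.
class RememberKeySketch(
  val key: String,
  val resultType: KType,
  private val inputs: Array<out Any?>
) {
  private val cachedHash: Int =
    31 * (31 * key.hashCode() + resultType.hashCode()) + inputs.contentHashCode()

  override fun hashCode(): Int = cachedHash

  override fun equals(other: Any?): Boolean =
    other is RememberKeySketch &&
      cachedHash == other.cachedHash && // cheap reject before the array comparison
      key == other.key &&
      resultType == other.resultType &&
      inputs.contentEquals(other.inputs)
}

// Example: two keys built from equal parts compare equal and share a hash.
val first = RememberKeySketch("onClick", typeOf<String>(), arrayOf(1, "x"))
val second = RememberKeySketch("onClick", typeOf<String>(), arrayOf(1, "x"))
```

Most mismatches are then rejected by a single integer comparison, and the same object can serve as the key in a set-backed index.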
| 182 | + |
| 183 | +### Project 5: Dedicated Benchmarks For Uniqueness/Identity Hot Paths |
| 184 | + |
| 185 | +Scope: |
| 186 | + |
| 187 | +- Add microbenchmarks that intentionally stress: |
| 188 | + - high sibling counts with unique keys, |
| 189 | + - many remembered entries per render, |
| 190 | + - stable event handler heavy renders. |
| 191 | + |
| 192 | +Targeted wins: |
| 193 | + |
| 194 | +- Quantify tradeoffs of list vs set/indexed structures. |
| 195 | +- Provide objective before/after gates for runtime changes. |
| 196 | + |
| 197 | +Suggested location: |
| 198 | + |
| 199 | +- `benchmarks/runtime-microbenchmark/.../WorkflowRuntimeMicrobenchmark.kt` |
| 200 | + |
| 201 | +### Project 6: Runtime-Integrated Perf Counters For Internal Collection Ops |
| 202 | + |
| 203 | +Scope: |
| 204 | + |
| 205 | +- Add optional internal counters (debug-only or interceptor-backed) for: |
| 206 | + - active scan length, |
| 207 | + - staging uniqueness check counts, |
| 208 | + - retain hit/miss ratio. |
| 209 | + |
| 210 | +Targeted wins: |
| 211 | + |
| 212 | +- Faster diagnosis of real-world hot spots in production-like scenarios. |
| 213 | +- Better prioritization of which identity paths matter most. |
| 214 | + |
| 215 | +Potential integration: |
| 216 | + |
| 217 | +- `WorkflowRuntimeMonitor`/`WorkflowRuntimeTracer` reporting hooks. |
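
The three counters in the scope above could start as a plain holder object. Everything in this sketch (`CollectionOpCountersSketch` and its methods) is hypothetical; how it would be wired into the runtime or reported through the tracing hooks is left open.

```kotlin
// Hypothetical sketch: counters for internal collection operations.
class CollectionOpCountersSketch {
  var activeScanSteps = 0L
    private set
  var stagingUniquenessChecks = 0L
    private set
  var retainHits = 0L
    private set
  var retainMisses = 0L
    private set

  /** Record how many active entries one retain lookup had to walk. */
  fun onActiveScan(steps: Int) {
    activeScanSteps += steps
  }

  fun onStagingUniquenessCheck() {
    stagingUniquenessChecks++
  }

  fun onRetain(hit: Boolean) {
    if (hit) retainHits++ else retainMisses++
  }

  /** Fraction of retain calls that reused an existing entry; 0.0 before any call. */
  fun retainHitRatio(): Double {
    val total = retainHits + retainMisses
    return if (total == 0L) 0.0 else retainHits.toDouble() / total
  }
}
```

Keeping the counters behind a debug-only switch would avoid adding work to the hot paths they are meant to measure.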
| 218 | + |
| 219 | +### Project 7: Evaluate Multiplatform Set Implementations For Runtime CommonMain |
| 220 | + |
| 221 | +Scope: |
| 222 | + |
| 223 | +- Evaluate candidate set/map implementations for KMP runtime internals. |
| 224 | +- Compare stdlib structures against AndroidX collection options where compatible with current module constraints. |
| 225 | + |
| 226 | +Targeted wins: |
| 227 | + |
| 228 | +- Potentially lower overhead than default stdlib structures for small-object identity sets. |
| 229 | +- Better understanding of dependency and binary-size tradeoffs before deep refactors. |
| 230 | + |
| 231 | +### Project 8: Action Drain/Conflation Heuristics Experiments |
| 232 | + |
| 233 | +Scope: |
| 234 | + |
| 235 | +- Use the existing options (`DRAIN_EXCLUSIVE_ACTIONS`, `CONFLATE_STALE_RENDERINGS`, `WORK_STEALING_DISPATCHER`) to evaluate whether additional heuristics should govern draining depth or render emission timing. |
| 236 | + |
| 237 | +Targeted wins: |
| 238 | + |
| 239 | +- Reduce stale intermediate render work in high-throughput action cascades. |
| 240 | +- Improve throughput without changing workflow semantics. |
| 241 | + |
| 242 | +Key files: |
| 243 | + |
| 244 | +- `workflow-runtime/.../RenderWorkflow.kt` |
| 245 | +- `workflow-runtime/.../WorkflowRunner.kt` |
| 246 | + |
| 247 | +## Recommended Execution Order |
| 248 | + |
| 249 | +1. Start with Project 5 (benchmark scenarios) so every following project has measurable baselines. |
| 250 | +2. Prototype Project 1 (set-backed identity index) behind an internal flag or on a branch. |
| 251 | +3. Fold in Project 2 (insert-time uniqueness) if benchmark data supports the simplification. |
| 252 | +4. Evaluate Projects 3/4 based on measured wins and complexity. |
| 253 | +5. Run broader perf + trace validation with Project 6 and optionally 8. |
| 254 | + |
| 255 | +## Notes On Correctness Constraints |
| 256 | + |
| 257 | +Any collection/index refactor must preserve: |
| 258 | + |
| 259 | +- Child lifecycle behavior (retain existing node when identity matches, cancel dropped nodes on commit). |
| 260 | +- Deterministic active ordering assumptions used by child action selector traversal. |
| 261 | +- Duplicate-key failure behavior and error messages. |
| 262 | +- Remember semantics: stable identity based on `(key, resultType, inputs)`. |
| 263 | +- Side effect semantics: start-after-render, retain-by-key, cancel-when-not-rendered. |