Commit 878eb8a
feat(amber): add loop-bookkeeping columns to materialized State (dormant) (#5900)
### What changes were proposed in this PR?
Extends the cross-region **State materialization** format from a single
`content` column to **3 columns** — `content`, `loop_counter`,
`loop_start_id` — promoting loop bookkeeping to first-class columns
(never inside the content JSON). The transport carries them end to end:
the `OutputManager` state writer + `emit_state`, the Python network
sender/receiver, the materialization reader, and the Scala
`state.toTuple` call sites. In memory the two loop fields ride on the
`StateFrame` envelope; they are materialized/serialized as their own
columns (parallel to `content`), and `from_tuple` / `fromTuple` read
only `content` back into the `State`.
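The split between the in-memory envelope and the materialized row can be sketched as follows. This is a minimal illustration, not the PR's code: `StateFrame` is reduced to a dataclass, and `frame_to_row` / `state_from_row` are hypothetical helper names standing in for the real `to_tuple` / `from_tuple` paths.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StateFrame:
    """Hypothetical envelope: State content plus the two loop fields."""
    state: Dict[str, Any] = field(default_factory=dict)
    loop_counter: int = 0
    loop_start_id: str = ""


def frame_to_row(frame: StateFrame) -> Dict[str, Any]:
    """Materialize the envelope as three parallel columns (assumed helper)."""
    return {
        "content": json.dumps(frame.state),  # loop fields are never written here
        "loop_counter": frame.loop_counter,
        "loop_start_id": frame.loop_start_id,
    }


def state_from_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the State from `content` only; the loop columns are ignored."""
    return json.loads(row["content"])


frame = StateFrame({"threshold": 3}, loop_counter=2, loop_start_id="loop-start-1")
row = frame_to_row(frame)
print(row)
print(state_from_row(row))
```

The loop values survive in the row but do not leak into the reconstructed State, which is what keeps the extra columns inert for readers that only care about `content`.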
The loop-back write address (LoopStart's input-port URI) is
**intentionally not** carried in State. It's constant per-execution
config, not per-iteration data, so it will be delivered to Loop End at
**setup** in the loop PR rather than round-tripping through every State
row. (An earlier revision of this PR carried a `loop_start_state_uri`
column; it was dropped after review — better than shipping a dormant
column and removing it later.)
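To illustrate the reasoning (not the follow-up PR's design, which is not shown here), the sketch below assumes a hypothetical `LoopEndSketch` class with a `setup` hook and an invented `example://` URI. The address is handed over once, and each State row carries only per-iteration values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Row = Tuple[str, int, str]  # (content, loop_counter, loop_start_id)


@dataclass
class LoopEndSketch:
    """Hypothetical stand-in for Loop End; not the operator from the loop PR."""
    loop_back_uri: Optional[str] = None

    def setup(self, loop_back_uri: str) -> None:
        # Delivered once per execution, because the address never changes.
        self.loop_back_uri = loop_back_uri

    def on_state(self, content: str, loop_counter: int, loop_start_id: str) -> Tuple[str, Row]:
        # Each State row carries only per-iteration data; the target comes from setup.
        if self.loop_back_uri is None:
            raise RuntimeError("setup() must run before any State arrives")
        return self.loop_back_uri, (content, loop_counter, loop_start_id)


loop_end = LoopEndSketch()
loop_end.setup("example://loop-start/input-port-0")  # invented URI for illustration
writes = [loop_end.on_state("{}", i, "loop-start-1") for i in range(3)]
```

Every write in `writes` targets the same address without that address appearing in any row.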
On the Python side the column-name → value mapping is defined once in
`State.to_columns` and reused by both `to_tuple` (iceberg) and the
network sender's `StateFrame` branch, so adding a column later is a
single-line change in one place rather than an edit in every serializer.
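A rough sketch of that single-mapping idea is below. The `State` class here is a simplified stand-in, the signature of `to_columns` is an assumption, and `to_network_columns` is a hypothetical name for the sender's `StateFrame` branch; plain dicts and lists stand in for the real Tuple and Arrow types.

```python
import json
from typing import Any, Dict, List, Tuple


class State(dict):
    """Simplified stand-in for the real State class."""

    def to_columns(self, loop_counter: int, loop_start_id: str) -> Dict[str, Any]:
        # The only place that knows the column names and their order.
        return {
            "content": json.dumps(dict(self)),
            "loop_counter": loop_counter,
            "loop_start_id": loop_start_id,
        }

    def to_tuple(self, loop_counter: int, loop_start_id: str) -> Tuple[Any, ...]:
        # Storage path: one row, values in column order.
        return tuple(self.to_columns(loop_counter, loop_start_id).values())


def to_network_columns(state: State, loop_counter: int, loop_start_id: str) -> Dict[str, List[Any]]:
    # Network path: one single-element column per name, built from the same mapping.
    return {
        name: [value]
        for name, value in state.to_columns(loop_counter, loop_start_id).items()
    }


state = State(epoch=7)
row = state.to_tuple(1, "loop-start-1")
batch = to_network_columns(state, 1, "loop-start-1")
```

Because both serializers iterate over whatever `to_columns` returns, adding a column to that one dict changes both outputs together.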
**Dormant on `main`** — nothing observable changes without the loop
operators:
- `to_tuple()` / `toTuple()` and
`OutputManager.save_state_to_storage_if_needed` / `emit_state` default
the two loop columns to `0` / `""`, so every existing non-loop caller is
unchanged.
- `from_tuple` / `fromTuple` read only the `content` column, so
round-trip identity is preserved and the extra columns are inert.
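The two dormancy points above can be sketched with keyword defaults. `save_state`, `read_state`, and the in-memory `stored_rows` list are hypothetical names for illustration; they are not the `OutputManager` API.

```python
import json
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for the state document.
stored_rows: List[Dict[str, Any]] = []


def save_state(state: Dict[str, Any], loop_counter: int = 0, loop_start_id: str = "") -> None:
    """Keyword defaults keep pre-existing call sites valid without edits."""
    stored_rows.append(
        {
            "content": json.dumps(state),
            "loop_counter": loop_counter,
            "loop_start_id": loop_start_id,
        }
    )


def read_state(row: Dict[str, Any]) -> Dict[str, Any]:
    """Reads only `content`, so the loop columns cannot affect the result."""
    return json.loads(row["content"])


save_state({"k": 1})  # existing non-loop caller, unchanged
save_state({"k": 1}, loop_counter=4, loop_start_id="loop-start-1")  # future loop caller
```

Both rows read back as the same State, and the first row holds `0` / `""` in the loop columns even though its caller never mentioned them.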
No backward-compatible read of old 1-column State is needed: State
materialization is **intra-execution only** — the iceberg state-document
URI is execution-scoped (`…/eid/{executionId}/`) and recreated fresh
each run, and State tuples are never replayed across executions or
engine versions, so a 1-column tuple can never reach the 3-column
reader.
This is the state-format prerequisite the loop operators build on; the
columns carry non-default values only once Loop Start/End set them
(follow-up PR).
### Any related issues, documentation, discussions?
Extracted from #5700 (loop operators) per @Xiao-zhen-Liu's split
request; part of #4442 ("Introduce for loop").
### How was this PR tested?
- **Format / round-trip:** `test_state.py` (loop columns are their own
columns, never in content JSON, default to `0` / `""`), Scala
`StateSpec` (both loop columns round-trip through a tuple with
non-default values, not just `content`), `ArrowUtilsSpec` (3-column
Arrow vector round-trip), `IcebergDocumentSpec` (iceberg state-doc
round-trip).
- **Transport:** `test_network_receiver.py`,
`test_input_port_materialization_reader_runnable.py`, and
`test_state_materialization_e2e.py` — the e2e (hermetic sqlite catalog)
writes non-default values for both loop columns end-to-end and asserts
they replay both on the `StateFrame` and on the raw iceberg row,
exercising the real Tuple/Schema/iceberg path.
- **Dormancy:**
`test_output_manager.py::test_defaults_loop_columns_when_omitted` pins
that a no-loop caller (no `loop_counter`) still produces a valid
3-column tuple with the loop columns at `0` / `""`.
- Local: `workflow-core` + `amber` compile; `StateSpec` +
`ArrowUtilsSpec` pass; Python state + transport + e2e tests pass;
scalafmt + scalafix + black clean. (`IcebergDocumentSpec` needs the
iceberg catalog backend, so it runs in CI.)
### Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF.
1 parent 322babf, commit 878eb8a
17 files changed
Lines changed: 510 additions & 317 deletions
File tree
- amber/src
  - main
    - python/core
      - architecture/packaging
      - models
      - runnables
      - storage/runnables
    - scala/org/apache/texera/amber/engine/architecture
      - messaginglayer
      - pythonworker
  - test
    - integration/org/apache/texera/amber/storage/result/iceberg
    - python/core
      - architecture/packaging
      - models
      - runnables
      - storage/runnables
- common/workflow-core/src
  - main/scala/org/apache/texera/amber/core/state
  - test/scala/org/apache/texera/amber
    - core/state
    - util