Skip to content

Commit 347754e

Browse files
committed
Add temporary exec-id implementation plan
1 parent 757651c commit 347754e

1 file changed

Lines changed: 118 additions & 0 deletions

File tree

docs/exec-id-plan.md

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
# Plan: execution_id node state (TEMPORARY — do not commit)
2+
3+
> This plan file is scratch. Delete `docs/exec-id-plan.md` before the feature
4+
> branch is finalized (last step below). It must not land in history.
5+
6+
## Goal
7+
8+
Replace the `node-state-model` PR's stored `status_reason` approach with an
9+
`execution_id`-based model where **physical** statuses are written by the owning
10+
orchestration and **implicit** statuses (`skipped`, `cancelled`) are *derived at
11+
read time* in `df.instance_nodes` from each node's ancestors — with no duroxide
12+
dependency in the read path.
13+
14+
---
15+
16+
## Step 1 — Schema: `status_details JSONB` on `df.nodes`
17+
18+
- Add `status_details JSONB` (nullable) to `df.nodes` in [src/lib.rs](src/lib.rs)
19+
`create_tables` DDL.
20+
- Stores the writer's `execution_id` (the ordered path; see Step 4) plus room
21+
for small read-path payloads we already need (the IF decision; the RACE
22+
winner — see Step 6).
23+
- Incidental cleanup agreed earlier: **drop the dead `error` column** and stop
24+
overloading `result` for non-result data.
25+
- Keep `nodes_status_chk` limited to the **physical** statuses
26+
(`pending`/`running`/`completed`/`failed`). `skipped`/`cancelled` are never
27+
stored, so they must NOT be added to the check constraint.
28+
- Grants: add `status_details` to nothing in the user INSERT column list (worker
29+
writes it); remove `error` from any column lists in `df.grant_usage` /
30+
`df.revoke_usage`.
31+
- **Upgrade script** `sql/pg_durable--0.2.4--0.2.5.sql` (next version): `ADD
32+
COLUMN status_details`, `DROP COLUMN error`, constraint + grant adjustments.
33+
- **Binary backward-compat (B1):** the new `.so` must still run against older
34+
schemas that lack `status_details`/still have `error`. The write path (Step 2)
35+
and read path (Step 3) must degrade gracefully when the column is absent
36+
(runtime column detection, or a guarded query) — verify with
37+
`scripts/test-upgrade.sh`.
38+
39+
## Step 2 — Write path: fence status updates on `execution_id`
40+
41+
- In [src/activities/update_node_status.rs](src/activities/update_node_status.rs):
42+
accept the caller's `execution_id`, write it to `status_details`, and make the
43+
`UPDATE` **conditional**: apply only when the incoming `execution_id` is **not
44+
superseded** by the stored one (`incoming >= stored`, equal allowed for
45+
running→terminal within one execution).
46+
- Ordering: JSONB does not compare the way we need. Store/compare via a
47+
**comparable form** (e.g. an `int[]` of the execution numbers) so Postgres'
48+
native lexicographic array order gives "first differing segment, larger number
49+
wins". The JSONB can carry the human-readable path; the gate compares the
50+
array form (column, generated column, or extracted in the `WHERE`).
51+
- **Fire-and-forget / determinism:** the activity returns `Ok` regardless of
52+
whether the row was updated; the orchestration must never branch on the
53+
outcome (keep `let _ = ctx.schedule_activity(...)`).
54+
55+
## Step 3 — Read path: `df.instance_nodes` infers status from `df.nodes` only
56+
57+
- Rewrite [src/monitoring.rs](src/monitoring.rs) `instance_nodes` to:
58+
1. **Drop the duroxide dependency** — read solely from `df.nodes` (one
59+
`SELECT` of the instance's rows). Remove `list_executions` / the fabricated
60+
`execution_id` cross join.
61+
2. For each node, **override** the returned `status` (and `status_details`) to
62+
the *effective* value at read time, derived from ancestors:
63+
- **Leaf→root climb:** find the first ancestor whose `execution_id` is not a
64+
prefix of the node's. Ancestor `failed``skipped`; `running`/newer
65+
execution ⇒ `pending`; `completed` but branch/iteration not taken ⇒
66+
`skipped`; RACE loser ⇒ `cancelled`.
67+
- Use the IF decision and RACE winner recorded in `status_details`
68+
(Step 6) to know which branch/iteration is "taken".
69+
3. Children are reachable via `left_node`/`right_node`; the tree is immutable
70+
after `df.start()`, so no parent column is required (optional safe
71+
denormalization only).
72+
- Output columns: surface `status_details` alongside `status`; the implicit
73+
states appear only here, never in the base table.
74+
75+
## Step 4 — Document the `execution_id`
76+
77+
- Fold the structure from [docs/execution-id.md](docs/execution-id.md) into the
78+
permanent doc (see Step 7): ordered path `root:n₀ : seg₁:n₁ : … : segₖ:nₖ`;
79+
segment added only at sub-orchestration boundaries (loop iteration, join/race
80+
branch); branch numbers = 1, loop numbers = 1..N; same node across executions
81+
shares segment identities+length, differing only in numbers ⇒ total order.
82+
83+
## Step 5 — Document the state-transition graph + helper
84+
85+
- Add a diagram of node states with the **physical** transitions
86+
(`pending→running→completed|failed`) drawn solid and the **implicit, never
87+
materialized** ones (`→skipped`, `→cancelled`, re-entry `→pending` on a new
88+
execution) drawn dashed, with a clear note that the dashed ones exist only in
89+
`df.instance_nodes` output.
90+
- Document the inference helper used by `instance_nodes` (the ancestor-climb /
91+
derivation function): inputs, the prefix rule, and each case.
92+
93+
## Step 6 — RACE winner recording (small behavioral add)
94+
95+
- The root→leaves / branch-taken inference needs the RACE node to record its
96+
**winning branch** in `status_details`. Add this write in the race
97+
orchestration if not already present.
98+
99+
## Step 7 — Docs cleanup
100+
101+
- Delete [docs/execution-id.md](docs/execution-id.md) and
102+
[docs/node-state-model.md](docs/node-state-model.md); their content is folded
103+
into the permanent doc/diagram (Steps 4–5).
104+
- Delete this plan file `docs/exec-id-plan.md` (must not be committed on the
105+
feature branch).
106+
107+
---
108+
109+
## Open dependency you may be assuming implicitly
110+
111+
**Loop boundary (prerequisite, not in your list).** Clean per-iteration
112+
`execution_id` segments require a nested loop to run as a **sub-orchestration
113+
rooted at the loop node**. Today the loop uses `ctx.continue_as_new` from
114+
`graph.root_node_id` (the only `continue_as_new` caller), which both produces the
115+
existing nested-loop "restarts wrong root" bug and prevents a nested loop from
116+
owning its own segment. This must be addressed (or explicitly deferred with
117+
top-level-loops-only scope) for Steps 2–3 to be correct. Track it as a separate
118+
work item.

0 commit comments

Comments
 (0)