Add execution_id node-state design docs and implementation plan

pinodeca · pinodeca · commit 7ab6b3f588f2 · 2026-06-27T13:30:38.000Z
Squash of the node-state-model proposal, the execution-id proposal, and the temporary exec-id implementation plan. Rebased onto the df.loop sub-orchestration fix (PR #228, copilot/fix-loop-restart-issue).
diff --git a/docs/exec-id-plan.md b/docs/exec-id-plan.md
@@ -0,0 +1,134 @@
+# Plan: execution_id node state (TEMPORARY — do not commit)
+
+> This plan file is scratch. Delete `docs/exec-id-plan.md` before the feature
+> branch is finalized (last step below). It must not land in history.
+
+## Goal
+
+Replace the `node-state-model` PR's stored `status_reason` approach with an
+`execution_id`-based model where **physical** statuses are written by the owning
+orchestration and **implicit** statuses (`skipped`, `cancelled`) are *derived at
+read time* in `df.instance_nodes` from each node's ancestors — with no duroxide
+dependency in the read path.
+
+---
+
+## Step 1 — Schema: `status_details JSONB` on `df.nodes`
+
+- Add `status_details JSONB` (nullable) to `df.nodes` in [src/lib.rs](src/lib.rs)
+  `create_tables` DDL.
+- Stores the writer's `execution_id` (the ordered path; see Step 4) plus room
+  for small read-path payloads we already need (the IF decision; the RACE
+  winner — see Step 6).
+- Incidental cleanup agreed earlier: **drop the dead `error` column** and stop
+  overloading `result` for non-result data.
+- Keep `nodes_status_chk` limited to the **physical** statuses
+  (`pending`/`running`/`completed`/`failed`). `skipped`/`cancelled` are never
+  stored, so they must NOT be added to the check constraint.
+- Grants: add `status_details` to nothing in the user INSERT column list (worker
+  writes it); remove `error` from any column lists in `df.grant_usage` /
+  `df.revoke_usage`.
+- **Upgrade script** `sql/pg_durable--0.2.4--0.2.5.sql` (next version): `ADD
+  COLUMN status_details`, `DROP COLUMN error`, constraint + grant adjustments.
+- **Binary backward-compat (B1):** the new `.so` must still run against older
+  schemas that lack `status_details`/still have `error`. The write path (Step 2)
+  and read path (Step 3) must degrade gracefully when the column is absent
+  (runtime column detection, or a guarded query) — verify with
+  `scripts/test-upgrade.sh`.
+
+## Step 2 — Write path: fence status updates on `execution_id`
+
+- In [src/activities/update_node_status.rs](src/activities/update_node_status.rs):
+  accept the caller's `execution_id`, write it to `status_details`, and make the
+  `UPDATE` **conditional**: apply only when the incoming `execution_id` is **not
+  superseded** by the stored one (`incoming >= stored`, equal allowed for
+  running→terminal within one execution).
+- Ordering: JSONB does not compare the way we need. Store/compare via a
+  **comparable form** (e.g. an `int[]` of the execution numbers) so Postgres'
+  native lexicographic array order gives "first differing segment, larger number
+  wins". The JSONB can carry the human-readable path; the gate compares the
+  array form (column, generated column, or extracted in the `WHERE`).
+- **Fire-and-forget / determinism:** the activity returns `Ok` regardless of
+  whether the row was updated; the orchestration must never branch on the
+  outcome (keep `let _ = ctx.schedule_activity(...)`).
+
+## Step 3 — Read path: `df.instance_nodes` infers status from `df.nodes` only
+
+- Rewrite [src/monitoring.rs](src/monitoring.rs) `instance_nodes` to:
+  1. **Drop the duroxide dependency** — read solely from `df.nodes` (one
+     `SELECT` of the instance's rows). Remove `list_executions` / the fabricated
+     `execution_id` cross join.
+  2. For each node, **leave `status` and the stored `execution_id` untouched**
+     and instead **add inferred fields to the returned `status_details`**:
+     - `inferred_status` — the *effective* state at read time (`skipped`,
+       `cancelled`, `pending`, or the node's own physical status when no
+       ancestor overrides it).
+     - `inferred_status_from_ancestor_id` — the node whose state *determined*
+       `inferred_status` (the first ancestor whose `execution_id` is not a
+       prefix of this node's). Makes the derivation explainable per row.
+     - We deliberately do NOT add `inferred_status_execution_id`: it is just the
+       `execution_id` of `inferred_status_from_ancestor_id`'s row, so a reader
+       can join to that node if needed.
+     - Rationale: the read path **must not compete with the write path**. By
+       augmenting (not overwriting) `status_details`, the helper never races the
+       orchestration's fenced writes. (If we later materialize this at the DB
+       level it would compete — defer that decision.)
+  3. **Derivation (leaf→root climb):** find the first ancestor whose
+     `execution_id` is not a prefix of the node's; that ancestor's state sets
+     `inferred_status`. Ancestor `failed` ⇒ `skipped`; `running`/newer execution
+     ⇒ `pending`; `completed` but branch/iteration not taken ⇒ `skipped`; RACE
+     loser ⇒ `cancelled`. Use the IF decision and RACE winner recorded in
+     `status_details` (Step 6) to know which branch/iteration is "taken".
+  4. Children are reachable via `left_node`/`right_node`; the tree is immutable
+     after `df.start()`, so no parent column is required (optional safe
+     denormalization only).
+- Output columns: `status` and `status_details` keep their stored values; the
+  inferred fields live inside the returned `status_details` JSONB only, never in
+  the base table.
+
+## Step 4 — Document the `execution_id`
+
+- Fold the structure from [docs/execution-id.md](docs/execution-id.md) into the
+  permanent doc (see Step 7): ordered path `root:n₀ : seg₁:n₁ : … : segₖ:nₖ`;
+  segment added only at sub-orchestration boundaries (loop iteration, join/race
+  branch); branch numbers = 1, loop numbers = 1..N; same node across executions
+  shares segment identities+length, differing only in numbers ⇒ total order.
+
+## Step 5 — Document the state-transition graph + helper
+
+- Add a diagram of node states with the **physical** transitions
+  (`pending→running→completed|failed`) drawn solid and the **implicit, never
+  materialized** ones (`→skipped`, `→cancelled`, re-entry `→pending` on a new
+  execution) drawn dashed, with a clear note that the dashed ones exist only in
+  `df.instance_nodes` output.
+- Document the inference helper used by `instance_nodes` (the ancestor-climb /
+  derivation function): inputs, the prefix rule, each case, and that it emits
+  `inferred_status` + `inferred_status_from_ancestor_id` into `status_details`
+  rather than mutating the stored `status`/`execution_id`.
+
+## Step 6 — RACE winner recording (small behavioral add)
+
+- The root→leaves / branch-taken inference needs the RACE node to record its
+  **winning branch** in `status_details`. Add this write in the race
+  orchestration if not already present.
+
+## Step 7 — Docs cleanup
+
+- Delete [docs/execution-id.md](docs/execution-id.md) and
+  [docs/node-state-model.md](docs/node-state-model.md); their content is folded
+  into the permanent doc/diagram (Steps 4–5).
+- Delete this plan file `docs/exec-id-plan.md` (must not be committed on the
+  feature branch).
+
+---
+
+## Open dependency you may be assuming implicitly
+
+**Loop boundary (prerequisite, not in your list).** Clean per-iteration
+`execution_id` segments require a nested loop to run as a **sub-orchestration
+rooted at the loop node**. Today the loop uses `ctx.continue_as_new` from
+`graph.root_node_id` (the only `continue_as_new` caller), which both produces the
+existing nested-loop "restarts wrong root" bug and prevents a nested loop from
+owning its own segment. This must be addressed (or explicitly deferred with
+top-level-loops-only scope) for Steps 2–3 to be correct. Track it as a separate
+work item.
diff --git a/docs/execution-id.md b/docs/execution-id.md
@@ -0,0 +1,42 @@
+# Execution IDs for Node State
+
+## Principles
+
+Each node row is written by a **single writer** (the orchestration that owns it). We keep changes **minimal**, and we **never materialize** implicit states (`skipped`, `cancelled`): they are *derived* at read time. This avoids extra writes, write contention, and conflict resolution.
+
+## 1. `execution_id`
+
+An `execution_id` is the **sub-orchestration path** from the root to the orchestration that last wrote a node, with an **execution number** per segment:
+
+```
+root:3:loopA:4:branchC:1
+```
+
+- A segment is added **only at a sub-orchestration boundary** — a loop iteration, or a join/race branch. Plain nodes (SQL, IF, sequencing, …) inherit their owning orchestration's `execution_id`; the path is therefore *coarser* than the full graph.
+- Execution numbers: **1** for join/race branches; **1..N** for loop iterations (one per `continue_as_new`); the root counts root-level loop iterations.
+- Two `execution_id`s that can ever write the **same** node share identical segment *identities* and length — they differ only in the **numbers**. This makes them **totally ordered**: at the first segment whose number differs, the larger number is more recent (“supersedes”).
+
+## 2. `status_detail` column
+
+Add `status_detail` to `df.nodes`. On every status write, the owning orchestration stamps the node's current `execution_id` there.
+
+*(Incidental cleanup, not core to this proposal: `status_detail` lets us drop the unused `error` column and stop overloading `result` for non-result data.)*
+
+## 3. Most-recent-execution wins
+
+The status update becomes a **fenced conditional write**: apply only if the incoming `execution_id` is **not superseded** by the one already stored (`incoming >= stored`). A stale writer — e.g. a race loser still draining, or a previous loop iteration — cannot clobber a newer execution's state, and the row converges to the most-recent execution regardless of arrival order. The write stays single-writer-*effective* and order-independent; the orchestration never branches on whether the write landed.
+
+## 4. Monitoring: it's enough for both tree walks
+
+With `status` + `status_detail`, plus what we already store (the IF decision in `result`, and the race winner on the race node), a single `SELECT` of an instance's nodes supports both directions:
+
+- **Leaf → root:** climb a node's ancestors; the first ancestor whose `execution_id` is not a prefix of the node's tells you the node's *effective* state (ancestor failed ⇒ skipped; running ⇒ pending in a new execution; completed ⇒ branch/iteration not taken ⇒ skipped).
+- **Root → leaves:** descend carrying the live lineage; decisions at IF/RACE/LOOP paint whole sub-graphs at once (not-taken ⇒ skipped, race loser ⇒ cancelled, under-failure ⇒ skipped, unreached ⇒ pending).
+
+Intuitively: structure tells you *what could run*, and `execution_id` tells you *which execution each node's status belongs to* — together enough to color the current state of **every sub-graph**, with no materialized transitions.
+
+## Out of scope / dependencies
+
+- **Loop boundary**: requires loops to run as a sub-orchestration rooted at the loop node so iterations get a clean `execution_id` segment — see separate note.
+- **Race winner**: the root→leaves walk needs the race node to record its winning branch (small add if not already present).
+- **Representation & upgrade**: `execution_id` ordering can be stored in a comparable form (implementation detail); `status_detail` is a new nullable column and the fenced `UPDATE` is compatible with pre-existing rows.
diff --git a/docs/node-state-model.md b/docs/node-state-model.md