Skip to content

refactor(agent-service): redesign sync-execution result and error model#6009

Open
bobbai00 wants to merge 15 commits into
apache:mainfrom
bobbai00:refactor/sync-execution-result-model
Open

refactor(agent-service): redesign sync-execution result and error model#6009
bobbai00 wants to merge 15 commits into
apache:mainfrom
bobbai00:refactor/sync-execution-result-model

Conversation

@bobbai00

@bobbai00 bobbai00 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Restructures the per-operator summary the sync-execution backend returns and the agent-service / frontend consume, for a leaner, consistent wire contract. This is a focused re-do of #5927 cut directly from main (no foundation stack). Scope: only the type definitions of the execution result/error model — the wire contract across the Scala producer, the agent-service, and the frontend — plus the minimal consumer changes the new types force; no transport or formatting-logic changes.

  • Replace the flat OperatorInfo with OperatorExecutionSummary (orthogonal sub-summaries: state, errorMessages, resultSummary?, consoleLogsSummary?); rename SyncExecutionResultWorkflowExecutionSummary.
  • resultSummary.sampleTuples is now SampleRow[] ({ rowIndex, tuple }) instead of JSON rows with an embedded __row_index__; drop the table-shape types (the agent derives input-port shapes from the DAG).
  • Move WorkflowFatalError into types/execution.ts and reuse it for per-operator errors — the same type the workflow-compiling service returns for compilation errors, so compile and execution errors share one wire shape; api/compile-api.ts re-exports it so its existing importers are unchanged, and the frontend imports its existing declaration instead of adding a copy.
  • errorMessages / errors are non-optional (empty = none); drop compilationErrors; collapse the console-message types and derive warnings from WARNING:-titled messages.
  • Operator results are still pulled on demand via GET /agents/:id/operator-results (transport unchanged); that REST payload now carries the canonical OperatorExecutionSummary, and the frontend maps it to its flat display type (re-flattening sampleTuples so the display components are unchanged).

Touches the Scala producer (SyncExecutionResource, split into pure helpers with a new unit spec), the agent-service consumers (result-formatting, workflow-execution-tools, workflow-result-state, server), and the frontend mapping. Representation/type-level; the only behavior deltas are in the failure paths the new error type forces: one shared "operator failed" derivation (getOperatorErrorText), per-operator errors surfaced for any terminal state, a distinct "Compilation error:" label, and Killed preserved on timeout. Input-port shape lines are now derived rather than explicitly rendered.

Any related issues, documentation, discussions?

Closes #5750
Part of #5747.
Supersedes #5927.

How was this PR tested?

  • agent-service: tsc --noEmit clean, bun test 134/134 pass, prettier --check clean.
  • Scala: sbt WorkflowExecutionService/compile clean; the new SyncExecutionResourceSpec unit suite (no DB) passes 24/24, scalafmt applied. refactor(agent-service): redesign sync-execution result and error model #5927 verified the same wire shape via a full-stack end-to-end run (a Claude Haiku 4.5 agent built and executed a CSV workflow; /operator-results returned the new shape — resultSummary.sampleTuples: [{ rowIndex, tuple }], errorMessages: []).
  • frontend: tsc --noEmit clean; targeted karma specs pass 22/22 (the new TestBed fixture renders the changed template); eslint clean.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context), Claude Fable 5

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @Ma77Ball, @Yicong-Huang
    You can notify them by mentioning @Ma77Ball, @Yicong-Huang in a comment.

@codecov-commenter

codecov-commenter commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 89.54248% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.61%. Comparing base (5c4a963) to head (392f4de).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...he/texera/web/resource/SyncExecutionResource.scala 76.74% 27 Missing and 3 partials ⚠️
...agent-interaction/agent-interaction.component.html 66.66% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6009      +/-   ##
============================================
+ Coverage     59.11%   60.61%   +1.50%     
- Complexity     3201     3232      +31     
============================================
  Files          1132     1133       +1     
  Lines         43681    43494     -187     
  Branches       4734     4717      -17     
============================================
+ Hits          25821    26365     +544     
+ Misses        16430    15682     -748     
- Partials       1430     1447      +17     
Flag Coverage Δ *Carryforward flag
access-control-service 70.00% <ø> (ø) Carriedforward from 5c4a963
agent-service 57.44% <100.00%> (+12.84%) ⬆️
amber 64.01% <76.74%> (+0.89%) ⬆️
computing-unit-managing-service 0.00% <ø> (ø) Carriedforward from 5c4a963
config-service 52.30% <ø> (ø) Carriedforward from 5c4a963
file-service 62.81% <ø> (ø) Carriedforward from 5c4a963
frontend 51.78% <96.15%> (+0.46%) ⬆️
notebook-migration-service 78.57% <ø> (ø) Carriedforward from 5c4a963
pyamber 91.19% <ø> (ø) Carriedforward from 5c4a963
workflow-compiling-service 55.14% <ø> (ø) Carriedforward from 5c4a963

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

⚠️ Benchmark changes need a look

🟢 0 better · 🔴 7 worse · ⚪ 8 noise (<±5%) · 0 without baseline

Compared against main 6de8a48 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🔴 bs=10 sw=10 sl=64 414 0.253 23,639/33,453/33,453 us 🔴 +14.9% / 🔴 +120.9%
🔴 bs=100 sw=10 sl=64 923 0.563 107,304/138,820/138,820 us 🔴 +18.5% / 🔴 +28.6%
bs=1000 sw=10 sl=64 1,117 0.682 892,104/941,056/941,056 us ⚪ within ±5% / 🟢 -11.2%
Baseline details

Latest main 6de8a48 from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 414 tuples/sec 470 tuples/sec 776.36 tuples/sec -11.9% -46.7%
bs=10 sw=10 sl=64 MB/s 0.253 MB/s 0.287 MB/s 0.474 MB/s -11.8% -46.6%
bs=10 sw=10 sl=64 p50 23,639 us 21,484 us 12,636 us +10.0% +87.1%
bs=10 sw=10 sl=64 p95 33,453 us 29,105 us 15,143 us +14.9% +120.9%
bs=10 sw=10 sl=64 p99 33,453 us 29,105 us 18,954 us +14.9% +76.5%
bs=100 sw=10 sl=64 throughput 923 tuples/sec 967 tuples/sec 985.33 tuples/sec -4.6% -6.3%
bs=100 sw=10 sl=64 MB/s 0.563 MB/s 0.59 MB/s 0.601 MB/s -4.6% -6.4%
bs=100 sw=10 sl=64 p50 107,304 us 103,010 us 101,671 us +4.2% +5.5%
bs=100 sw=10 sl=64 p95 138,820 us 117,171 us 107,939 us +18.5% +28.6%
bs=100 sw=10 sl=64 p99 138,820 us 117,171 us 113,798 us +18.5% +22.0%
bs=1000 sw=10 sl=64 throughput 1,117 tuples/sec 1,100 tuples/sec 1,016 tuples/sec +1.5% +9.9%
bs=1000 sw=10 sl=64 MB/s 0.682 MB/s 0.671 MB/s 0.62 MB/s +1.6% +9.9%
bs=1000 sw=10 sl=64 p50 892,104 us 916,304 us 989,693 us -2.6% -9.9%
bs=1000 sw=10 sl=64 p95 941,056 us 980,009 us 1,028,327 us -4.0% -8.5%
bs=1000 sw=10 sl=64 p99 941,056 us 980,009 us 1,059,969 us -4.0% -11.2%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,482.81,200,128000,414,0.253,23639.13,33453.15,33453.15
1,100,10,64,20,2167.76,2000,1280000,923,0.563,107303.81,138820.05,138820.05
2,1000,10,64,20,17904.21,20000,12800000,1117,0.682,892104.14,941056.45,941056.45

@bobbai00 bobbai00 marked this pull request as draft June 29, 2026 01:00
@bobbai00 bobbai00 force-pushed the refactor/sync-execution-result-model branch from 89eb9e9 to 84022c9 Compare June 29, 2026 04:56
@bobbai00 bobbai00 marked this pull request as ready for review June 29, 2026 08:25
@bobbai00 bobbai00 force-pushed the refactor/sync-execution-result-model branch from 43166b9 to 66d9785 Compare July 2, 2026 09:18
@bobbai00 bobbai00 requested a review from Yicong-Huang July 2, 2026 09:25
bobbai00 and others added 15 commits July 4, 2026 15:41
### What changes were proposed in this PR?

Restructures the per-operator summary the sync-execution backend returns and the agent-service / frontend consume, for a leaner, consistent wire contract. This is a focused re-do of apache#5927 cut directly from `main` (no foundation stack): it changes only the execution result/error model and its consumers.

- Replace the flat `OperatorInfo` with `OperatorExecutionSummary` (orthogonal sub-summaries: `state`, `errorMessages`, `resultSummary?`, `consoleLogsSummary?`); rename `SyncExecutionResult` → `WorkflowExecutionSummary`.
- `resultSummary.sampleTuples` is now `SampleRow[]` (`{ rowIndex, tuple }`) instead of JSON rows with an embedded `__row_index__`; drop the table-shape types (the agent derives input-port shapes from the DAG).
- Move `WorkflowFatalError` into `types/execution.ts` and reuse it for per-operator errors — the same type the workflow-compiling service returns for compilation errors, so compile and execution errors share one wire shape; `api/compile-api.ts` re-exports it so its existing importers are unchanged.
- `errorMessages` / `errors` are non-optional (empty = none); drop `compilationErrors`; collapse the console-message types and derive warnings from `WARNING:`-titled messages.
- Operator results are still pulled on demand via `GET /agents/:id/operator-results` (transport unchanged); that REST payload now carries the canonical `OperatorExecutionSummary`, and the frontend maps it to its flat display type (re-flattening `sampleTuples` so the display components are unchanged).

Touches the Scala producer (`SyncExecutionResource`), the agent-service consumers (`result-formatting`, `workflow-execution-tools`, `workflow-result-state`, `server`), and the frontend mapping. Representation/type-level; behavior preserved (input-port shape lines are now derived rather than explicitly rendered).

### Any related issues, documentation, discussions?

Closes apache#5750
Part of apache#5747.
Supersedes apache#5927.

### How was this PR tested?

- agent-service: `tsc --noEmit` clean, `bun test` 110/110 pass, `prettier --check` clean.
- The Scala producer (`SyncExecutionResource`) is unchanged from apache#5927, which verified it via `sbt WorkflowExecutionService/compile` and a full-stack end-to-end run (a Claude Haiku 4.5 agent built and executed a CSV workflow; `/operator-results` returned the new shape — `resultSummary.sampleTuples: [{ rowIndex, tuple }]`, `errorMessages: []`).

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)
Add unit tests for the redesigned sync-execution result/error model to
bring patch coverage to 100%:

- workflow-execution-tools: drive executeOperatorAndFormat and
  createExecuteOperatorTool across pre-flight guards, successful runs
  (shape/warnings/gaps/cell types/truncation), execution failures
  (FAILED/KILLED/ERROR, per-operator and general errors), abort
  propagation, and callback failures.
- texera-agent: exercise getFormattedResultsForDAG over the visible
  result branch.
- workflow-result-state: cover getOperatorInfo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wrap long tuple/list expressions to the 100-column limit and drop a
stray blank line so scalafmtCheckAll passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- agent-interaction: unit-test the visualization html caching, column
  and row derivation (ellipsis on index gaps) via direct construction
  with stubbed services.
- result-table-frame: cover setupResultTable populating currentResult,
  columns, and totalNumTuples for non-empty data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add SyncExecutionResourceSpec covering the unit-testable changed lines:
the new result/error summary case classes and handleExecutionError
(all compilation-error branches plus the unknown-error fallback), via
PrivateMethodTester so no production visibility changes are needed.

The remaining changed lines (executeWorkflowSync orchestration,
collectOperatorInfos, and the collectOperatorResult truncation loop)
drive a live Pekko execution + DB/Iceberg-backed result documents and
are exercised by the integration suite, which is out of unit-test scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rs for testing

Extract two behavior-preserving pure helpers so the result sampling and
per-operator summary logic can be unit-tested without a live engine:

- sampleAndTruncateTuples(tupleIterator, totalCount, ...) — the symmetric
  result truncation / sampling previously inlined in collectOperatorResult.
- buildOperatorExecutionSummary(...) — the per-operator summary + console
  error extraction previously inlined in collectOperatorInfos.

Both are pure code moves (identical expressions); call sites delegate to
them. Expand SyncExecutionResourceSpec to cover every branch of both
(empty/visualization/front-only/oversized/sliding-window truncation, and
result/console-error/no-result summaries).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ting

Factor the final WorkflowExecutionSummary assembly (fatal-error
formatting, operator-console-error detection, state string, and success
determination) out of executeWorkflowSync into a pure
assembleExecutionSummary helper. Behavior-preserving code move; the live
observable-wait / termination handling stays inline.

Add unit tests covering the state/success derivation across terminal,
console-error, target-results-override, operator-error, and fatal-error
cases (SyncExecutionResourceSpec now 29 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… partials

Add two unit cases to close branch gaps in the extracted helpers:
- sampleAndTruncateTuples: empty iterator with a positive count (the
  `!tupleIterator.hasNext` half of the guard).
- buildOperatorExecutionSummary: a console ERROR whose message is
  non-empty but shorter than the title (the length-comparison false
  branch), so the title is kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Shrink the sampling helpers' return to the three consumed values, extract
the duplicated back-window loop into collectBackWindow, and drop the
constructor-echo tests in favor of the behavioral suites.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
getOperatorErrorText pins the one derivation of "operator failed" the new
errorMessages type forces (a fatal error carrying message text), replacing
the divergent .length and joined-truthiness checks. In the failure branch,
surface per-operator errors for any terminal state (success=false can come
with state "Completed"), label CompilationFailed results distinctly, and
preserve the Killed state on timeout instead of collapsing to Failed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tion in tests

Import the existing WorkflowFatalError from workflow-websocket.interface
instead of redeclaring it in agent.service.ts, and rewrite the
agent-interaction spec onto a TestBed fixture with detectChanges() so the
changed template is actually rendered and covered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@bobbai00 bobbai00 force-pushed the refactor/sync-execution-result-model branch from 8e8a9eb to 392f4de Compare July 5, 2026 00:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-service engine frontend Changes related to the frontend GUI refactor Refactor the code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor(agent-service): redesign the sync-execution result and error model

2 participants