feat: post success-path step comments via trusted credentials for change/test/checkup/e2e_fix#1117
Conversation
Add three public helpers so orchestrators can own per-step `## Step N/T:` GitHub issue comments via trusted PDD credentials, instead of asking step prompts to shell out to `gh issue comment` (which is unreliable across providers, especially Gemini's sandboxed shell that cannot read GH_TOKEN). - `extract_step_report(text) -> Optional[str]` — public alias of the existing `_extract_step_report`; returns the LAST `<step_report>...</step_report>` block, DOTALL + case-insensitive, whitespace-stripped. - `post_step_comment_once(*, repo_owner, repo_name, issue_number, step_num, body, posted_steps, cwd) -> bool` — keyword-only, idempotent. Returns True immediately if `step_num` already in `posted_steps`; otherwise sanitizes/truncates the body via the existing `_sanitize_comment_body`, shells out to `gh issue comment`, and on success mutates `posted_steps` in place. Never raises; transient failures are logged and surfaced as False. - `normalize_step_comments_state(raw) -> Set[int]` — coerces any persisted shape (None, list/tuple/set/frozenset, legacy bug-orchestrator dict, malformed) into a `Set[int]` so resume is robust across state-file generations. Rejects `bool` (legacy flag value) and `float` (caller must project composite keys). Bug-orchestrator path (already shipping success-comment posting on upstream main) is untouched — these helpers cover the change/test/checkup/e2e_fix orchestrators, wired in a follow-up commit. architecture.json grows three function entries for the new public helpers under the agentic_common interface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entials Provider-side `gh issue comment` posts from step prompts broke across non-Anthropic providers, leaving change/test/checkup/e2e_fix workflows silent from the GitHub UI even though Cloud Run logs showed progress. Wire the per-step `## Step N/T:` posting into the orchestrator path behind trusted PDD credentials. Step prompts emit `<step_report>...</step_report>` blocks; the orchestrator extracts, sanitizes, truncates, and posts once per step via `gh issue comment`, mutating a `Set[int]` of posted-step keys for resume idempotency. Wiring per orchestrator: - **change** (13 steps + iterated review loop 11/12): linear steps 1-10 keyed by `step_num`; review iterations keyed by `iteration * 100 + step_num` (so a second-iteration step 11 doesn't dedupe against the first); step 13 keyed by 13. All hard-stop / abort / failure early-returns persist `state["step_comments"]` before saving. - **test** (18 steps with fractional 5.5 + iterated manual loop 6-11): composite-keyed via `iteration * 10000 + int(round(step_num * 10))` so 5.5 -> 55, step 8 of iteration 2 -> 20080. `save_state` grows `description` and `iteration` kwargs to thread the description through. - **checkup** (8 steps with fractional 6.1/6.2/6.3 + iterated fix-verify loop): same composite encoding as test. Step comments persist through `_build_state(..., step_comments=...)`. - **e2e_fix** (11 steps × cycle loop): cycle-keyed encoding `current_cycle * 10000 + step_num`. `current_cycle` is normalized to >=1 before the step loop. All three state-save sites (main inner loop, KeyboardInterrupt, fatal Exception) persist `sorted(step_comments_set)`. Tests: - 24 new unit tests in `test_agentic_common.py` cover the three helpers — extract success/missing/empty/multi/case/multiline; normalize across every persisted shape (None, list, tuple, set, frozenset, legacy dict, malformed, garbage); once idempotency, successful post, gh-failure, missing-gh, secret redaction, oversize truncation, exception non-raise. - Three orchestrator-level tests per command: success-path posts, no-`<step_report>` skips, helper-exception is log-and-continue. - `tests/test_post_step_comment_once_real.py` — gated by `PDD_RUN_REAL_LLM_TESTS=1` and `PDD_REAL_GH_ISSUE_URL`. Posts a real comment, asserts idempotency on second call, then deletes the comment via `gh api`. Bug-orchestrator path (already shipping success-comment posting upstream via `_maybe_post_step_comment`) is untouched. `agentic_sync_runner` is out of scope — its single rolling PATCH'd comment is not per-step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Staging1 end-to-end verificationBuilt worker image from this PR's HEAD ( Build / unit
Bug reproduction (pre-PR worker,
|
gltanaka
left a comment
There was a problem hiding this comment.
I think the underlying change is needed for #1109: moving user-visible step progress out of model-side gh issue comment calls and into the orchestrators is the right direction.
I would not merge this PR as-is. Required changes:
-
Resolve the current merge conflicts. GitHub reports the PR as
CONFLICTING, andgit merge-tree origin/main origin/pr/1117shows conflicts inpdd/agentic_checkup_orchestrator.pyandtests/test_agentic_e2e_fix_orchestrator.py. -
Fix the
pdd changereview-loop posting path. Inpdd/agentic_change_orchestrator.py, the newpost_step_comment_once(...)calls for Step 11 and Step 12 run immediately afterrun_agentic_taskwithout checkings11_success/s12_success(pdd/agentic_change_orchestrator.py:2005and:2043in this PR). That means a provider failure or failed review/fix task can still post a visible “Step completed” fallback and mark the composite key as posted instate["step_comments"]. A later resume/retry can then suppress the real successful report for that same iteration key. Please gate these success comments on the step success flag, and add regression tests proving failed Step 11/12 outputs do not post or mark success comments. -
Align
e2e_fixwith the claimed 11-step coverage. The PR introduces## Step N/11comments, but the actual posting hook is only inside the innerfor step_num in range(1, 10)loop (pdd/agentic_e2e_fix_orchestrator.py:2483-2505). The post-success Step 11 cleanup and Step 10 CI validation paths (:2903-2943) do not use the trusted step-comment path, and the new tests only assertcall_count >= 1, so this gap is not covered. Either add trusted visible progress for Steps 10 and 11, or narrow the scope/header/tests so the PR does not claim complete 11-steppdd fixcoverage.
I ran the targeted PR tests locally from the PR head:
pytest tests/test_agentic_common.py::TestPostStepCommentOnce tests/test_agentic_change_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_test_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_checkup_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_e2e_fix_orchestrator.py::TestTrustedStepCommentPosting -q
Result: 19 passed.
Resolves conflicts with main's checkup hardening (round-3/4/5 Step 7 gate, pr_head_sha state, --no-fix gate, final-checkup head-SHA safety). Files touched: - pdd/agentic_checkup_orchestrator.py: keep both branches' import additions (_sanitize_comment_body / post_pr_comment / post_step_comment from main; normalize_step_comments_state / post_step_comment_once from this branch). _save_state, _build_state: keep both new params (pr_head_sha, step_comments). --no-fix step 7 call site: preserve main's nofix_step7_output capture AND this branch's description=desc7 argument. - tests/test_agentic_e2e_fix_orchestrator.py: additive append of both branches' new test classes (TestTrustedStepCommentPosting from here, TestFinalCheckupForwardsCwd / TestFinalCheckupHeadSafetyChecks from main). - tests/test_agentic_checkup_orchestrator.py: trusted-step-comment tests now wrap mock outputs in ALL_ISSUES_FIXED so main's _step7_passed JSON gate doesn't reject them. 945 tests pass across the conflict-touched files (test_agentic_common.py, *_change_orchestrator.py, *_test_orchestrator.py, *_checkup_orchestrator.py, *_e2e_fix_orchestrator.py).
…N gate accepts them Post-merge fix-up: main's Step 7 gate (`_step7_passed`) parses a structured JSON verdict and rejects step 7 outputs that don't include it. The trusted step-comment tests added by this PR were written against the pre-gate behavior and used `"All Issues Fixed"` alone. Wrap them with the `ALL_ISSUES_FIXED` helper defined at the top of the test module so the mocks pass the new gate. All three TestTrustedStepCommentPosting tests now pass against the merged checkup orchestrator.
1. Gate trusted Step 11/12 posts on success flag (agentic_change_orchestrator). Previously the change orchestrator called post_step_comment_once inside the review loop unconditionally after run_agentic_task, even when s11_success / s12_success was False. A failed task would post a "Step N completed; no <step_report> block" fallback comment AND mark the composite key (review_iteration * 100 + step_num) as posted. A later resume/retry of the same iteration would then dedup against that failed key and silently suppress the real successful report. Gate both posts behind their success flag. Add two regression tests (test_step11_failure_does_not_post_or_mark_iter_key, test_step12_failure_does_not_post_or_mark_iter_key) that prove a failed Step 11/12 does not post anything and does not burn keys 111 / 112. 2. Extend trusted step-comment coverage to Steps 10 + 11 (agentic_e2e_fix_orchestrator) so the PR's "## Step N/11:" claim matches reality. The original PR only posted from the linear `for step_num in range(1, 10)` loop (steps 1-9). Step 10 (CI validation) and Step 11 (cleanup) ran after the loop and never invoked post_step_comment_once. Add a trusted post for Step 10 immediately after run_ci_validation_loop returns ci_success=True, keyed by current_cycle * 10000 + 10. Add a trusted post for Step 11 immediately after _run_step11_code_cleanup returns, keyed by current_cycle * 10000 + 11, with a body that reports the actual cleanup file delta (or a "no additional files modified" summary when the cleanup pass found nothing to do or reverted itself). Both posts use the orchestrator-format "## Step N/11:" header so downstream tooling can recognize them as orchestrator-owned. Add two regression tests (test_step10_ci_path_posts_trusted_comment, test_step11_cleanup_path_posts_trusted_comment) that prove the composite keys 10010 / 10011 appear in mock_post_once.call_args_list with the right header on the happy path. 949 tests pass across the affected files (test_agentic_common.py, test_agentic_change_orchestrator.py, test_agentic_test_orchestrator.py, test_agentic_checkup_orchestrator.py, test_agentic_e2e_fix_orchestrator.py).
|
Thanks for the review — addressed all three. 1. Merge conflicts resolved. Brought 2. Gated Step 11/12 trusted posts on
3. Aligned
Strengthened the existing tests with two new ones that assert the specific composite keys appear in
Test results locally (Python 3.12, Trusted-step-comment tests alone: Commits in this push:
CI rerunning on |
…g + step 10/11 trusted posts PDD prompts are the source of truth for the generated orchestrator code. After the code changes that gate Step 11/12 trusted posts on ``s11_success``/``s12_success`` (change orchestrator) and add Step 10 (CI validation) + Step 11 (cleanup) trusted posts via the same cycle-keyed encoding (e2e_fix orchestrator), the matching prompt specs needed to spell out the new contract — otherwise the next ``pdd sync``/regen would re-emit code that drops the gate and the new post sites. agentic_change_orchestrator_python.prompt: - Posting protocol §4 (review loop): add a REQUIRED success-gate clause stating that ``post_step_comment_once`` for Step 11 and Step 12 MUST be guarded by ``s11_success`` / ``s12_success``. Spell out the resume/dedup failure mode so a future regen does not weaken the gate to "log-and-continue". agentic_e2e_fix_orchestrator_python.prompt: - Narrow the in-loop posting protocol to "steps 1-9" so it is no longer ambiguous whether the helper covers the post-loop steps. - Add explicit Step 10 (CI validation) post protocol gated on ``ci_success=True``, keyed ``current_cycle * 10000 + 10``, with the ``## Step 10/11:`` header. - Add explicit Step 11 (cleanup) post protocol that snapshots ``changed_files`` before ``_run_step11_code_cleanup`` and computes the cleanup file delta for the body, keyed ``current_cycle * 10000 + 11``, header ``## Step 11/11:``. - Add a separate REQUIRED success-gate clause for the in-loop posts so failed steps don't burn composite keys (mirror of the change orchestrator gate). No code change in this commit — the code already implements all of the above. This is the prompt-side of the same contract so the ``architecture.json`` interface declaration and the generated code stay consistent.
gltanaka
left a comment
There was a problem hiding this comment.
Thanks for the update. The previous blockers are mostly addressed: the PR is now mergeable, the Step 11/12 review-loop posts are gated on success, and Step 10/11 e2e_fix comment sites were added with tests. CI is green, and I also reran the trusted-step-comment subset locally on the updated head:
PDD_PATH=$PWD pytest tests/test_agentic_common.py::TestPostStepCommentOnce tests/test_agentic_change_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_test_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_checkup_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_e2e_fix_orchestrator.py::TestTrustedStepCommentPosting -q
Result: 23 passed.
One remaining required change before I would merge:
pdd/agentic_e2e_fix_orchestrator.py posts trusted Step 11 cleanup and Step 10 CI comments outside the main inner-loop save path, but those post sites only mutate the in-memory step_comments_set. They do not persist the updated state["step_comments"] / workflow state before planned failure returns.
Concrete failure modes:
- Step 11 cleanup posts key
current_cycle * 10000 + 11, thenrun_ci_validation_loopreturnsci_success=Falseand the function returnsFalseat the CI failure path. The existing workflow state still only has the earlier inner-loop keys, so a resume/retry can repost Step 11. - Step 10 CI posts key
current_cycle * 10000 + 10, then_run_final_checkup_on_pr(...)returnsFalseand the function returns before clearing state. The saved state again lacks the Step 10 key, so a resume/retry can repost Step 10 (and likely Step 11 too).
This violates the PR/issue requirement that visible step comments be idempotent across resume/retry. Please persist the updated step_comments after successful Step 10/11 trusted posts on any path where workflow state may remain for a later retry, and add regression coverage for the CI-failure / final-checkup-failure cases so those keys survive in saved state.
…d posts
Round-3 review surfaced a remaining idempotency hole on the post-loop
trusted-comment sites.
Failure modes addressed:
1. Step 11 cleanup post followed by ``run_ci_validation_loop``
returning ``ci_success=False``. The orchestrator returns at the
CI-failure path without touching workflow state, so saved state
still only has the inner-loop keys. Resume rehydrates
``step_comments_set`` from disk, misses the Step 11 key
``current_cycle * 10000 + 11``, and re-posts the comment.
2. Step 10 CI post followed by ``_run_final_checkup_on_pr`` returning
``checkup_success=False``. Same shape: the failure return bypasses
state save, the Step 10 key ``current_cycle * 10000 + 10`` is not
on disk, and resume re-posts (and likely re-runs Step 11 too).
Both violate the PR/issue contract that visible step comments be
idempotent across resume/retry.
Fix:
* After each post-loop Step 10/11 trusted ``post_step_comment_once``
call, mutate ``state_data["step_comments"] = sorted(step_comments_set)``
(also refresh ``total_cost``, ``changed_files``, ``last_saved_at``)
and call ``save_workflow_state(...)`` BEFORE the next code path that
may fail-and-return. Both saves live inside the same ``try/except``
block that guards the post, so a transient persistence failure
logs-and-continues rather than crashing the workflow.
Tests added (regression coverage for the two failure modes above):
* ``test_step11_post_persists_step_comments_before_ci_failure_return``
- mocks ``run_ci_validation_loop`` to return ``ci_success=False``,
asserts ``success=False`` propagates, AND asserts ``mock_save``
was called with ``state["step_comments"]`` containing key
``10011``.
* ``test_step10_post_persists_step_comments_before_checkup_failure_return``
- mocks the final checkup to return ``checkup_success=False``,
asserts ``success=False`` propagates AND asserts ``mock_save``
was called with ``state["step_comments"]`` containing key
``10010``.
Both tests use a ``post_step_comment_once`` side-effect that
faithfully mirrors the real helper's ``posted_steps.add(step_num)``
behavior — otherwise a bare ``return_value=True`` mock would leave
the in-memory set empty and silently pass the assertion.
Prompt:
* ``agentic_e2e_fix_orchestrator_python.prompt`` gains a REQUIRED
post-loop persistence clause that spells out exactly which save
state fields to refresh, why the save must be inline (CI/
final-checkup failure paths don't touch state), and that the save
must live inside the same try/except as the post.
951 tests pass across the affected files
(test_agentic_common.py, test_agentic_change_orchestrator.py,
test_agentic_test_orchestrator.py, test_agentic_checkup_orchestrator.py,
test_agentic_e2e_fix_orchestrator.py).
|
Addressed in The fix: After each post-loop Step 10 / Step 11 trusted That closes both of the failure modes you described:
Regression tests added (live in
Both tests use a Prompt updated too ( Local test results from PR head |
Catch-up merge after main advanced 6 commits since the previous merge (``a93cecf519``). No content conflicts expected — this preserves the linear feature history while keeping the PR fast-forwardable.
gltanaka
left a comment
There was a problem hiding this comment.
The prior round-3 fresh-run failure cases are now covered, and the updated trusted-step-comment subset passes locally (25 passed). CI is also green. I still would not merge yet because the new persistence code has a resume-path hole.
Remaining required change:
pdd/agentic_e2e_fix_orchestrator.py now persists post-loop Step 10/11 keys by mutating state_data["step_comments"] after the trusted post. That works only when state_data was created by the inner for step_num in range(1, 10) loop. On resumed success paths, the inner loop can be skipped entirely:
- cached Step 9
SUCCESS_FALL_THROUGHsetssuccess = Truebefore the outer loop (_resume_deferred_action == "SUCCESS_FALL_THROUGH"), thenwhile current_cycle <= max_cycles and not successdoes not run; resume_terminal_successalso setssuccess = Truebefore the loop.
In those cases state_data has not been initialized in this invocation (it is first assigned inside the inner loop). The post-loop Step 11 / Step 10 comments can still be posted, but the subsequent state_data["step_comments"] = ... raises UnboundLocalError inside the broad try/except, gets logged as a post_step_comment_once failed warning, and the key is not persisted. A later retry can repost the same visible comment, which reopens the idempotency problem for exactly the resume/retry paths this state is meant to protect.
Please build or refresh a post-loop state object independently of the inner-loop local before these Step 10/11 saves, and add regression coverage for a resumed cached-success path (for example: loaded state at last_completed_step=9 with Step 9 success, independent re-verification succeeds, then CI or final checkup fails) asserting keys 10011 / 10010 are persisted.
…tence saves Round-4 review surfaced a resume-path hole in the round-3 persistence fix. The inner ``for step_num in range(1, 10)`` loop is where the orchestrator's ``state_data`` dict was first assigned, but the SUCCESS_FALL_THROUGH and resume-terminal-success resume paths both set ``success = True`` BEFORE the outer ``while`` loop runs. On those paths the inner loop is skipped entirely, ``state_data`` is never assigned in this invocation, and the round-3 mutation lines (``state_data["step_comments"] = ...``) would raise ``UnboundLocalError`` inside the broad try/except around the trusted post. The exception got swallowed as a ``post_step_comment_once failed`` warning and the composite key ``current_cycle * 10000 + 10`` (or ``+ 11``) was never written to disk — a later retry rehydrated ``step_comments_set`` empty and re-posted the same visible step comment. Fix: * Introduce ``_build_post_loop_state(...)`` at module scope. It takes only function-scope locals that are unconditionally initialized at orchestrator entry (``step_outputs``, ``step_comments_set``, ``changed_files``, ``total_cost``, ``model_used``, ``dev_unit_states``, ``skipped_steps``, ``github_comment_id``, ``initial_file_hashes``, ``initial_sha``) and returns the same shape as the inner-loop save. * Replace both post-loop ``state_data["step_comments"] = ...`` mutations with a call to ``_build_post_loop_state`` whose result is handed directly to ``save_workflow_state``. The save works the same on the linear-execution path AND on the SUCCESS_FALL_THROUGH / resume-terminal-success resume paths. Tests added (regression coverage for the resume hole): * ``test_step11_persists_on_success_fall_through_resume_path`` — ``load_workflow_state`` returns a state with ``last_completed_step=9`` and a cached ALL_TESTS_PASS Step 9 output, forcing the SUCCESS_FALL_THROUGH path. Mocks ``run_ci_validation_loop`` to ``ci_success=False``, asserts the Step 11 composite key ``10011`` is in at least one ``save_workflow_state`` call's ``state["step_comments"]``. * ``test_step10_persists_on_success_fall_through_resume_path`` — same resume setup, mocks the final checkup to fail. Asserts key ``10010`` is persisted before the failure return. Both tests would fail on pre-fix code because the round-3 code mutated the never-assigned inner-loop ``state_data`` local. Prompt updated (``agentic_e2e_fix_orchestrator_python.prompt``) so ``pdd sync``-regenerated code stays resume-safe: a new "Resume-safe state construction (REQUIRED)" clause names the failure mode, explicitly forbids mutating the inner-loop ``state_data`` local, and suggests the ``_build_post_loop_state`` helper shape. 953 tests pass across the affected files (test_agentic_common.py, test_agentic_change_orchestrator.py, test_agentic_test_orchestrator.py, test_agentic_checkup_orchestrator.py, test_agentic_e2e_fix_orchestrator.py).
|
Addressed in The fix: Introduced a module-level helper Regression tests added (live in
Both tests use the same Prompt updated too ( Local test results from PR head |
gltanaka
left a comment
There was a problem hiding this comment.
The latest update addresses the remaining resume-path idempotency blocker.
I verified that the post-loop Step 10/11 saves now use _build_post_loop_state(...) instead of mutating the inner-loop state_data local, so the SUCCESS_FALL_THROUGH resume path can persist keys 10010 / 10011 even when the inner loop is skipped. The prompt contract was updated to preserve that behavior on regeneration, and the new regression tests cover the resume cases.
I reran the trusted-step-comment subset locally on the latest PR head:
PDD_PATH=$PWD pytest tests/test_agentic_common.py::TestPostStepCommentOnce tests/test_agentic_change_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_test_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_checkup_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_e2e_fix_orchestrator.py::TestTrustedStepCommentPosting -q
Result: 27 passed.
GitHub checks are green and the PR is mergeable. Approved.
1. Gate trusted Step 11/12 posts on success flag (agentic_change_orchestrator). Previously the change orchestrator called post_step_comment_once inside the review loop unconditionally after run_agentic_task, even when s11_success / s12_success was False. A failed task would post a "Step N completed; no <step_report> block" fallback comment AND mark the composite key (review_iteration * 100 + step_num) as posted. A later resume/retry of the same iteration would then dedup against that failed key and silently suppress the real successful report. Gate both posts behind their success flag. Add two regression tests (test_step11_failure_does_not_post_or_mark_iter_key, test_step12_failure_does_not_post_or_mark_iter_key) that prove a failed Step 11/12 does not post anything and does not burn keys 111 / 112. 2. Extend trusted step-comment coverage to Steps 10 + 11 (agentic_e2e_fix_orchestrator) so the PR's "## Step N/11:" claim matches reality. The original PR only posted from the linear `for step_num in range(1, 10)` loop (steps 1-9). Step 10 (CI validation) and Step 11 (cleanup) ran after the loop and never invoked post_step_comment_once. Add a trusted post for Step 10 immediately after run_ci_validation_loop returns ci_success=True, keyed by current_cycle * 10000 + 10. Add a trusted post for Step 11 immediately after _run_step11_code_cleanup returns, keyed by current_cycle * 10000 + 11, with a body that reports the actual cleanup file delta (or a "no additional files modified" summary when the cleanup pass found nothing to do or reverted itself). Both posts use the orchestrator-format "## Step N/11:" header so downstream tooling can recognize them as orchestrator-owned. Add two regression tests (test_step10_ci_path_posts_trusted_comment, test_step11_cleanup_path_posts_trusted_comment) that prove the composite keys 10010 / 10011 appear in mock_post_once.call_args_list with the right header on the happy path. 949 tests pass across the affected files (test_agentic_common.py, test_agentic_change_orchestrator.py, test_agentic_test_orchestrator.py, test_agentic_checkup_orchestrator.py, test_agentic_e2e_fix_orchestrator.py).
…r-step-comments-upstream feat: post success-path step comments via trusted credentials for change/test/checkup/e2e_fix
Summary
The GitHub App's worker container installs the CLI from
promptdriven/pdd, but onlypdd bugalready had reliable orchestrator-owned visible per-step comments.pdd change/pdd test/pdd checkup/pdd fixstep prompts could still rely on agent-side shell comments, which fails across providers when the model subprocess cannot access GitHub credentials.This PR moves success-path step comments into the orchestrators and posts them with trusted PDD credentials. If an agent omits a
<step_report>block, the orchestrator now posts a short fallback visible comment and keeps the raw output in workflow state.Acceptance criteria mapping (from #1109)
_maybe_post_step_comment— unchangedagentic_e2e_fix_orchestrator, cycle-keyed across the 11-step loopWhat changed
Helpers:
pdd/agentic_common.pynow exposes:extract_step_report(text) -> Optional[str]post_step_comment_once(...) -> boolnormalize_step_comments_state(raw) -> Set[int]Orchestrators:
change,test,checkup, ande2e_fixhydrate persisted step-comment state on resume, post each successful step once with trusted credentials, persist sorted posted-step keys, and log-and-continue if comment posting fails.Fallback behavior: Missing
<step_report>no longer suppresses visible progress. The orchestrator posts:_Step N completed; no<step_report>block returned by agent. Raw output retained in workflow state._Prompt specs: Updated the matching
*_python.promptfiles so generated implementations preserve the trusted-comment and fallback behavior.Tests
Local:
pytest tests/test_agentic_common.py::TestPostStepCommentOnce tests/test_agentic_change_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_test_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_checkup_orchestrator.py::TestTrustedStepCommentPosting tests/test_agentic_e2e_fix_orchestrator.py::TestTrustedStepCommentPosting -q— 19 passedPDD_PATH=$PWD pytest tests/test_agentic_common.py tests/test_agentic_change_orchestrator.py tests/test_agentic_test_orchestrator.py tests/test_agentic_checkup_orchestrator.py tests/test_agentic_e2e_fix_orchestrator.py -q— 934 passedgit diff --check— cleanGitHub Actions on latest commit
2f1e641ad:Supersedes
Closes #1109