Commit 195014d

update(pipeline-monitoring): enhance Phase 7 documentation
- Rename section to "Monitor Pipeline"
- Clarify team shutdown process and CI pipeline handling
- Update PR status tracking table to include additional statuses
- Refine exit conditions for monitoring and handling rejections
- Improve clarity on CI repair pipeline operations
1 parent 1ca6fc0 commit 195014d

2 files changed: 59 additions & 105 deletions

File tree:
- agentic/commands/update.md
- tests/unit/prompts/test_prompts.py

agentic/commands/update.md (47 additions & 90 deletions)
````diff
@@ -420,15 +420,27 @@ Report all PR URLs to the user.
 
 ---
 
-### Phase 7: Monitor & Resolve
+### Phase 7: Monitor Pipeline
 
-After shipping PRs, the lead monitors the review pipeline and handles any failures. The team stays alive until
-all PRs are merged.
+After shipping PRs, shut down agents and clean up the team immediately — repairs are handled by the CI pipeline
+(`impl-repair.yml`), not locally. The lead monitors progress until all PRs reach a terminal state.
 
-#### 7a. Poll PR Status
+#### 7a. Shut Down Team
 
-Build a tracking table: `{library} → {pr_number, status}` where status is one of: `reviewing`, `approved`,
-`merged`, `rejected`, `failed`.
+Immediately after Phase 6 completes:
+
+1. `SendMessage` with type `shutdown_request` to all agents
+2. Wait for all agents to confirm shutdown
+3. `TeamDelete` to clean up the team
+4. Clean up preview directory:
+   ```bash
+   rm -rf plots/{spec_id}/implementations/.update-preview
+   ```
+
+#### 7b. Poll PR Status
+
+Build a tracking table: `{library} → {pr_number, score, status}` where status is one of: `reviewing`, `approved`,
+`repairing`, `merged`, `rejected`, `failed`, `not-feasible`.
 
 Present the summary table to the user.
 
````
````diff
@@ -438,109 +450,54 @@ Poll every **90 seconds** using `gh pr view` for each PR:
 gh pr view {pr_number} --json state,labels,mergedAt
 ```
 
-Extract status from labels: `ai-approved`, `ai-rejected`, `quality:{score}`, `quality-poor`.
+Extract status from labels: `ai-approved`, `ai-rejected`, `quality:{score}`, `quality-poor`, `not-feasible`,
+`ai-attempt-{N}`.
 
-Update the table and inform the user when status changes.
+Update the table and inform the user when any status changes.
 
-**Exit conditions**: all PRs are `merged` OR user says `abort`.
+**How the CI repair pipeline works:**
+- `impl-review.yml` scores the PR. If score < 90, it adds `ai-rejected` label.
+- `impl-repair.yml` auto-triggers on `ai-rejected`: reads review feedback, runs Claude to fix, pushes, re-triggers review.
+- Up to 3 attempts. After attempt 3: score >= 50 → `ai-approved` and merge; score < 50 → PR closed + `not-feasible`.
+- `impl-merge.yml` auto-triggers on `ai-approved`: squash-merges, creates metadata, promotes GCS images.
 
````
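The score/attempt thresholds that the added bullets describe reduce to a pure function. A sketch (hypothetical helper, not the actual workflow code):

```python
def pipeline_action(score: int, attempt: int) -> str:
    """Model the CI decision described in the doc: score >= 90 approves;
    below 90 the PR is rejected and repaired, up to 3 attempts; after
    attempt 3 a relaxed bar applies (>= 50 merges, < 50 closes as not-feasible)."""
    if score >= 90:
        return "ai-approved"      # impl-merge.yml then squash-merges
    if attempt < 3:
        return "ai-rejected"      # impl-repair.yml re-triggers a fix
    return "ai-approved" if score >= 50 else "not-feasible"
```

So `pipeline_action(70, 1)` yields `"ai-rejected"` (another repair round), while the same score on attempt 3 would merge under the relaxed bar.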
````diff
-#### 7b. Handle Rejections
+**Exit conditions**: all PRs are `merged`, `not-feasible`, or closed — OR user says `abort`.
 
-**When a PR gets `ai-rejected`:**
+#### 7c. Handle Pipeline Failures
 
-1. **Cancel CI repair** — `impl-repair.yml` auto-triggers on `ai-rejected`. Cancel it since we'll fix locally
-   (agents have context):
-   ```bash
-   gh run list --workflow=impl-repair.yml --branch=implementation/{spec_id}/{library} --status=in_progress --json databaseId -q '.[0].databaseId'
-   # then: gh run cancel {run_id}
-   ```
-
-2. **Read review feedback** from the PR:
-   ```bash
-   gh pr view {pr_number} --json comments -q '.comments[-1].body'
-   ```
-   Also read the updated metadata on the PR branch for structured review data:
-   ```bash
-   gh api repos/{owner}/{repo}/contents/plots/{spec_id}/metadata/{library}.yaml?ref=implementation/{spec_id}/{library} -q '.content' | base64 -d
-   ```
+Only intervene if the CI pipeline itself fails (not for normal rejections — those are handled by `impl-repair.yml`).
 
-3. **Wake the agent** via `SendMessage` with the review feedback. Agent repeats Steps 2-8 (conflict check →
-   modify → generate → lint → process → self-check → report).
+**Stalled PRs** — if a PR shows no label changes for ~10 minutes:
 
-4. **Push repair to PR branch** — after agent reports back:
+1. Check workflow run status:
    ```bash
-   # Save current main state
-   git stash
-
-   # Checkout PR branch, pull latest (review may have pushed metadata updates)
-   git checkout implementation/{spec_id}/{library}
-   git pull
-
-   # Stage agent's changes
-   git add plots/{spec_id}/implementations/{library}.py
-   git add plots/{spec_id}/metadata/{library}.yaml
-   git commit -m "repair({spec_id}): {library} — address review feedback"
-   git push
-
-   # Return to main
-   git checkout main
-   git stash pop
+   gh run list --workflow=impl-review.yml --branch=implementation/{spec_id}/{library} --limit 1 --json status,conclusion
+   gh run list --workflow=impl-repair.yml --branch=implementation/{spec_id}/{library} --limit 1 --json status,conclusion
    ```
 
````
````diff
-5. **Re-upload images to GCS staging** (agent regenerated in `.update-preview/`):
+2. If a workflow run failed, read logs:
    ```bash
-   # Process images
-   uv run python -m core.images process \
-     plots/{spec_id}/implementations/.update-preview/{library}/plot.png \
-     plots/{spec_id}/implementations/.update-preview/{library}/plot.png \
-     plots/{spec_id}/implementations/.update-preview/{library}/plot_thumb.png
-
-   STAGING_PATH="gs://pyplots-images/staging/{spec_id}/{library}"
-   gsutil cp plots/{spec_id}/implementations/.update-preview/{library}/plot.png "${STAGING_PATH}/plot.png"
-   gsutil cp plots/{spec_id}/implementations/.update-preview/{library}/plot_thumb.png "${STAGING_PATH}/plot_thumb.png"
+   gh run view {run_id} --log-failed
    ```
 
-6. **Re-trigger review**:
-   ```bash
-   gh api repos/{owner}/{repo}/dispatches \
-     -f event_type=review-pr \
-     -f 'client_payload[pr_number]='"$PR_NUMBER"
-   ```
-
-7. Continue polling. If rejected again, repeat (up to 2 repair rounds by lead — 3rd attempt handled by CI if
-   needed).
-
-**Workflow failures:**
+3. Report the failure reason to the user and ask how to proceed:
+   - **Re-trigger**: `gh api repos/{owner}/{repo}/dispatches -f event_type=review-pr -f 'client_payload[pr_number]='"$PR_NUM"`
+   - **Skip**: move on, leave PR open for manual handling
+   - **Abort**: stop monitoring entirely
 
````
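Step 1's `gh run list --limit 1 --json status,conclusion` output can be triaged mechanically before bothering the user. A sketch (hypothetical helper; the run dicts mimic the shape of gh's JSON output):

```python
def needs_intervention(runs: list[dict]) -> bool:
    """True if the latest run for a workflow finished unsuccessfully.
    `runs` is the parsed output of `gh run list --limit 1 --json status,conclusion`."""
    if not runs:
        return False  # workflow never triggered; worth flagging separately
    run = runs[0]
    return run["status"] == "completed" and run["conclusion"] != "success"
```

An in-progress run is not a failure, so the lead keeps polling rather than prompting the user.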
````diff
-If a review or merge workflow fails (no labels appear after ~10 minutes):
+#### 7d. Final Report
 
-- Check workflow run status:
-  ```bash
-  gh run list --workflow=impl-review.yml --branch=implementation/{spec_id}/{library} --limit 1 --json status,conclusion
-  ```
-- If failed, read logs:
-  ```bash
-  gh run view {run_id} --log-failed
-  ```
-- Report failure reason to user, ask how to proceed (re-trigger, fix manually, skip).
-
-#### 7c. Final Report & Cleanup
-
-Once all PRs are merged:
+Once all PRs have reached a terminal state:
 
 1. Present final summary table:
 
-   | Library | PR | Quality Score | Status |
-   |---------|-----|--------------|--------|
-   | matplotlib | #1234 | 92 | merged |
-   | seaborn | #1235 | 87 (repair) | merged |
+   | Library | PR | Quality Score | Attempts | Status |
+   |---------|-----|--------------|----------|--------|
+   | matplotlib | #1234 | 92 | 2 | merged |
+   | seaborn | #1235 | 94 | 1 | merged |
+   | pygal | #1236 | 45 | 3 | not-feasible |
 
-2. `SendMessage` with type `shutdown_request` to all agents
-3. `TeamDelete` to clean up the team
-4. Clean up preview directory if still present:
-   ```bash
-   rm -rf plots/{spec_id}/implementations/.update-preview
-   ```
+2. Report any `not-feasible` libraries to the user — these may need manual intervention or a different approach.
 
 ---
````

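The exit condition from 7b and the final summary from 7d can be sketched together (illustrative Python, assuming tracking-table rows carry `pr_number`, `score`, `attempts`, and `status`):

```python
# Terminal states per the updated doc: merged, not-feasible, or closed.
TERMINAL = {"merged", "not-feasible", "closed"}

def all_terminal(table: dict) -> bool:
    """Exit condition: every tracked PR has reached a terminal state."""
    return all(entry["status"] in TERMINAL for entry in table.values())

def render_summary(table: dict) -> str:
    """Build the 7d markdown summary table from the tracking data."""
    lines = [
        "| Library | PR | Quality Score | Attempts | Status |",
        "|---------|-----|--------------|----------|--------|",
    ]
    for lib, e in table.items():
        lines.append(
            f"| {lib} | #{e['pr_number']} | {e['score']} | {e['attempts']} | {e['status']} |"
        )
    return "\n".join(lines)
```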
tests/unit/prompts/test_prompts.py (12 additions & 15 deletions)
````diff
@@ -216,9 +216,7 @@ def test_library_prompt_has_save_section(self, filename: str) -> None:
     def test_static_library_has_interactive_handling(self, filename: str) -> None:
         """Static library prompts should have Interactive Spec Handling section."""
         content = (LIBRARY_PROMPTS_DIR / filename).read_text()
-        assert "## Interactive Spec Handling" in content, (
-            f"{filename} missing Interactive Spec Handling section"
-        )
+        assert "## Interactive Spec Handling" in content, f"{filename} missing Interactive Spec Handling section"
         assert "NOT_FEASIBLE" in content, f"{filename} missing NOT_FEASIBLE guidance"
         assert "AR-08" in content, f"{filename} missing AR-08 reference"
 
@@ -238,9 +236,7 @@ def test_library_prompt_no_hardcoded_yellow(self, filename: str) -> None:
         colors_section_match = re.search(r"## Colors\n(.*?)(?=\n## |\Z)", content, re.DOTALL)
         if colors_section_match:
             colors_section = colors_section_match.group(1)
-            assert "#FFD43B" not in colors_section, (
-                f"{filename} still has hardcoded Python Yellow in Colors section"
-            )
+            assert "#FFD43B" not in colors_section, f"{filename} still has hardcoded Python Yellow in Colors section"
 
 
 class TestNoPlaceholders:
@@ -349,25 +345,26 @@ def test_all_libraries_have_same_structure(self) -> None:
 
     def test_quality_score_threshold_consistent(self) -> None:
         """Quality threshold (90) should be consistent across scoring prompts."""
-        files_to_check = [
-            PROMPTS_DIR / "quality-criteria.md",
-            PROMPTS_DIR / "quality-evaluator.md",
-        ]
+        files_to_check = [PROMPTS_DIR / "quality-criteria.md", PROMPTS_DIR / "quality-evaluator.md"]
 
         for filepath in files_to_check:
             if filepath.exists():
                 content = filepath.read_text()
-                assert ">= 90" in content or "≥ 90" in content, (
-                    f"Approval threshold (90) not found in {filepath.name}"
-                )
+                assert ">= 90" in content or "≥ 90" in content, f"Approval threshold (90) not found in {filepath.name}"
 
     def test_scoring_categories_consistent(self) -> None:
         """Quality criteria and evaluator should have the same 6 categories."""
         criteria = (PROMPTS_DIR / "quality-criteria.md").read_text()
         evaluator = (PROMPTS_DIR / "quality-evaluator.md").read_text()
 
-        categories = ["Visual Quality", "Design Excellence", "Spec Compliance",
-                      "Data Quality", "Code Quality", "Library Mastery"]
+        categories = [
+            "Visual Quality",
+            "Design Excellence",
+            "Spec Compliance",
+            "Data Quality",
+            "Code Quality",
+            "Library Mastery",
+        ]
 
         for category in categories:
             assert category in criteria, f"Missing {category} in quality-criteria.md"
````