|
1 | 1 | # Night Evolution Map — Cloud Dev Pipeline Hardening |
2 | 2 | # 2026-03-11 night → 2026-03-12 morning |
3 | 3 |
|
4 | | -## Final Status |
| 4 | +## FINAL STATUS — Night Complete |
5 | 5 |
|
6 | | -### PRs Merged (5) + Closed (2 quality) |
| 6 | +### PRs Merged (5) |
7 | 7 | | PR | Title | Source | |
8 | 8 | |----|-------|--------| |
9 | 9 | | #129 | JSONL event persistence + deduplication | agent-124 | |
|
12 | 12 | | #141 | Golden Chain pipeline — tri cloud pipeline/verify/merge | agent-140 | |
13 | 13 | | #142 | Telegram log streaming — batch every 5s + output classifier | agent-131 | |
14 | 14 |
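PR #142 batches container output into Telegram messages every 5s and tags lines with an output classifier. The PR's code isn't reproduced in this log, so the following is only a minimal Python sketch of windowed batching under stated assumptions: lines arrive with a timestamp in seconds, `classify_line` and its categories (`error`, `git`, `info`) are hypothetical stand-ins for the real classifier, and `sent` stands in for actual Telegram API calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BATCH_WINDOW_S = 5.0  # flush interval taken from the PR #142 title


def classify_line(line: str) -> str:
    """Hypothetical classifier: tag each output line with a coarse category."""
    lowered = line.lower()
    if "error" in lowered or "fatal" in lowered:
        return "error"
    if lowered.startswith("git ") or "commit" in lowered:
        return "git"
    return "info"


@dataclass
class LogBatcher:
    window_s: float = BATCH_WINDOW_S
    pending: List[Tuple[str, str]] = field(default_factory=list)
    window_start: Optional[float] = None
    # Each entry is one batch that would become one Telegram message.
    sent: List[List[Tuple[str, str]]] = field(default_factory=list)

    def add(self, line: str, now: float) -> None:
        """Queue a line; flush first if the current window has expired."""
        if self.window_start is not None and now - self.window_start >= self.window_s:
            self.flush()
        if self.window_start is None:
            self.window_start = now
        self.pending.append((classify_line(line), line))

    def flush(self) -> None:
        """Emit everything queued so far as a single batch and close the window."""
        if self.pending:
            self.sent.append(list(self.pending))
            self.pending.clear()
        self.window_start = None
```

This sketch only flushes when a new line arrives or `flush()` is called explicitly (for example at container exit); a real streamer would also need a periodic timer so a quiet container still gets its last lines delivered.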
|
15 | | -### PRs Closed (3, superseded by direct fixes) |
| 15 | +### PRs Closed (5, quality gate or conflicts) |
16 | 16 | | PR | Reason | |
17 | 17 | |----|--------| |
18 | 18 | | #132 | Merge conflict after #129/#130 merged | |
19 | 19 | | #133 | Merge conflict after #129/#130 merged | |
20 | 20 | | #139 | Merge conflict, fixes applied directly | |
| 21 | +| #143 | Review: `grep -oP` not portable, destructive `git checkout`, worktree cleanup order |
| 22 | +| #144 | Modified generated files (`trinity-nexus/output/`) — forbidden per `CLAUDE.md` |
21 | 23 |
|
22 | 24 | ### Issues Closed (5) |
23 | 25 | | Issue | Resolution | |
|
28 | 30 | | #137 | Fixed: pipefail, bash shebang, Telegram ordering | |
29 | 31 | | #140 | Fixed via PR #141 merge | |
30 | 32 |
|
31 | | -### Direct Commits to Main (3) |
| 33 | +### Direct Commits to Main (4) |
32 | 34 | 1. `b470c5ae7` — heartbeat subshell + pipefail + Telegram ordering + HTML escape |
33 | 35 | 2. `fe6dc534e` — u32 overflow, entry_idx duplicates, VOLUME shadow, worktree conflict |
34 | | -3. Merge commits for PRs #129, #130, #138, #141 |
| 36 | +3. `9362cec04` — reuse Railway services instead of delete+create |
| 37 | +4. `f803a5fbd` — `gh auth setup-git` + `--repo` flag + push failure tracking
35 | 38 |
|
36 | | -### Docker Image Rebuilt (2x) |
37 | | -- First: heartbeat + pipefail + Telegram fixes |
38 | | -- Second: VOLUME shadow removal + worktree branch fix |
| 39 | +### Docker Image Rebuilt (3x) |
| 40 | +1. heartbeat + pipefail + Telegram fixes |
| 41 | +2. VOLUME shadow removal + worktree branch fix |
| 42 | +3. `gh auth setup-git` + `--repo` flag + PUSH_OK tracking (sha256:b1c73cbc) |
39 | 43 |
|
40 | 44 | ## Phase Completion |
41 | 45 |
|
42 | 46 | | Phase | Status | Detail | |
43 | 47 | |-------|--------|--------| |
44 | | -| 1. Merge PRs | DONE | 4 merged, 3 closed | |
45 | | -| 2. Entrypoint Hardening | DONE | 6 fixes applied | |
| 48 | +| 1. Merge PRs | ✅ DONE | 5 merged, 5 closed | |
| 49 | +| 2. Entrypoint Hardening | ✅ DONE | 17 fixes applied | |
46 | 50 | | 3. Orchestrator CLI | 80% | pipeline/verify/merge added, logs TBD | |
47 | | -| 4. Auto-Pipeline | 70% | In PR #141, needs testing | |
48 | | -| 5. Monitoring | 30% | JSONL working, dashboard TBD | |
49 | | -| 6. Agent Intelligence | 20% | SOUL.md works, branch reuse TBD | |
50 | | - |
51 | | -## Remaining Open Issues |
52 | | -- #131 feat(cloud): Stream all container logs to Telegram in realtime |
53 | | -- #126 Cloud Dev: Structured ACI protocol |
54 | | -- #128, #127 FPGA/pipeline TODOs (lower priority) |
55 | | - |
56 | | -## Key Fixes Applied |
57 | | -1. Heartbeat reads from temp file (subshell isolation solved) |
58 | | -2. Telegram gets notifications on every status change (ordering fix) |
| 51 | +| 4. Auto-Pipeline | 80% | PR #141 merged, Telegram streaming in #142 | |
| 52 | +| 5. Monitoring | 40% | JSONL + Telegram live, dashboard TBD | |
| 53 | +| 6. Agent Intelligence | 30% | `SOUL.md` works, `--repo` fix, git push auth fixed |
| 54 | + |
| 55 | +## Agent Spawns (10 total runs, 2 services) |
| 56 | + |
| 57 | +| Run | Service | Issue | Result | Duration | Notes | |
| 58 | +|-----|---------|-------|--------|----------|-------| |
| 59 | +| 1 | ubuntu | #126 | 🔴 FAILED | 619s | Too abstract, 0 commits | |
| 60 | +| 2 | Agents Anywhere | #131 | 🔵 DONE | ~300s | PR #142 merged ✅ | |
| 61 | +| 3 | Agents Anywhere | #115 | 🔴 FAILED | 303s | Push failed 3x (no `gh auth setup-git`) |
| 62 | +| 4 | ubuntu | #114 | 🔴 FAILED | 519s | Push failed (same auth bug) |
| 63 | +| 5 | Agents Anywhere | #116 | 🔴 FAILED | 81s | Can't read issue (no `--repo` flag) |
| 64 | +| 6 | ubuntu | #126 (prev) | 🔴 CLOSED | — | PR #143 closed: quality issues | |
| 65 | +| 7 | ubuntu | #114 (retry) | 🔴 FAILED | 253s | 0 commits: generated files forbidden | |
| 66 | +| 8 | Agents Anywhere | #116 (retry) | 🔴 FAILED | 586s | 0 commits: generated files forbidden | |
| 67 | +| 9 | ubuntu | #114 (prev) | 🔴 CLOSED | — | PR #144 closed: edited output/ | |
| 68 | +| — | — | — | **1/8 success** | — | 12.5% solve rate | |
| 69 | + |
| 70 | +## All Bugs Fixed (17) |
| 71 | +1. Heartbeat reads from temp file (subshell isolation) |
| 72 | +2. Telegram notification ordering (`LAST_STATUS` moved after send)
59 | 73 | 3. HTML escaping + safe JSON via temp files |
60 | 74 | 4. `#!/bin/bash` + `set -eo pipefail` |
61 | | -5. `i64` timestamps (no more u32 overflow) |
62 | | -6. No duplicate JSONL entries |
| 75 | +5. `i64` timestamps (u32 overflow) |
| 76 | +6. No duplicate JSONL entries (entry_idx fix) |
63 | 77 | 7. No VOLUME shadowing bare repo |
64 | | -8. Concurrent agents get unique branches |
| 78 | +8. Concurrent agents get unique worktree branches |
65 | 79 | 9. Golden Chain: `tri cloud pipeline <N>` automates full cycle |
66 | 80 | 10. Telegram `editMessageText` — 1 dashboard message updated in place |
67 | 81 | 11. `NO_COLOR=1` in containers for clean output |
68 | 82 | 12. Worktree lock/unlock prevents accidental pruning |
69 | | -13. Workflow reuses services instead of delete+create (avoids 25/day limit) |
70 | | - |
71 | | -## Active Agents (latest cycle — 16:33 UTC) |
72 | | -- **ubuntu** service → #126 — 🔴 FAILED (0 commits, 619s — issue too abstract for autonomous agent) |
73 | | -- **Agents Anywhere** service → #131 — 🔵 DONE → PR #142 merged |
74 | | -- **Agents Anywhere** service → #115 (VIBEE eqlPrimitive fix) — 🔴 DONE but push failed 3x, no PR created |
75 | | -- **ubuntu** service → #114 (VIBEE undefined Field type) — 🔴 DONE but push failed (git auth bug) |
76 | | -- **Agents Anywhere** service → #116 (Re-verify stale ast-check) — 🔴 FAILED (gh can't read issue — missing --repo) |
77 | | -- PR #143 from agent-126 — 🔴 CLOSED (review: grep -oP not portable, worktree cleanup order) |
78 | | -- **Docker rebuild #3** — fixes: `gh auth setup-git`, `--repo` on all gh commands, PUSH_OK tracking |
79 | | -- **ubuntu** service → #114 (RETRY) — 🚀 REDEPLOYED 16:55 UTC with fixed image |
80 | | -- **Agents Anywhere** service → #116 (RETRY) — 🚀 REDEPLOYED 16:55 UTC with fixed image |
81 | | - |
82 | | -## Bug Found & Fixed This Cycle |
83 | | -14. `sleepApplication: true` on "Agents Anywhere" service — Railway was sleeping container before entrypoint ran. Fixed via `serviceInstanceUpdate` + redeploy. |
| 83 | +13. Workflow reuses services instead of delete+create (25/day limit) |
| 84 | +14. `sleepApplication: true` on Agents Anywhere — disabled |
| 85 | +15. Push failure silently swallowed — `PUSH_OK` tracking added
| 86 | +16. **CRITICAL**: `gh auth setup-git` — configures git to use `gh` as its credential helper
| 87 | +17. **CRITICAL**: `--repo` flag on all gh commands — bare-repo worktrees lack context |
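Fixes 2, 3 and 10 together describe how status reaches Telegram: one dashboard message is edited in place via the Bot API's `editMessageText`, agent-controlled text is HTML-escaped, the JSON body is built safely rather than by string concatenation, and the last status is recorded only after the send. A minimal Python sketch of that flow, assuming a `send` callable that performs the HTTP POST and returns whether it succeeded; recording the status only after a *successful* send is one reading of the `LAST_STATUS` fix. `build_edit_payload` and `StatusNotifier` are hypothetical names, not the entrypoint's functions (the entrypoint itself is bash).

```python
import html
import json
from typing import Callable, Optional


def build_edit_payload(chat_id: int, message_id: int, status: str, detail: str) -> str:
    """JSON body for Telegram's editMessageText, editing one dashboard message in place."""
    # HTML parse mode: escape &, < and > in anything that came from agent output.
    text = f"<b>{html.escape(status, quote=False)}</b>\n{html.escape(detail, quote=False)}"
    # json.dumps handles quotes and newlines correctly, unlike hand-built JSON strings.
    return json.dumps({
        "chat_id": chat_id,
        "message_id": message_id,
        "text": text,
        "parse_mode": "HTML",
    })


class StatusNotifier:
    def __init__(self, send: Callable[[str], bool]) -> None:
        self.send = send  # performs the HTTP POST; returns True on success
        self.last_status: Optional[str] = None

    def update(self, chat_id: int, message_id: int, status: str, detail: str) -> bool:
        if status == self.last_status:
            return False  # unchanged: no edit needed
        ok = self.send(build_edit_payload(chat_id, message_id, status, detail))
        if ok:
            # Recorded only after the send succeeds, so a failed edit is retried next time.
            self.last_status = status
        return ok
```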
84 | 88 |
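Fix 17 (and lesson 10) hinges on deriving `owner/repo` from the clone URL, because `gh` run inside a bare-repo worktree has no remote to infer it from. The log names `GH_REPO` and `REPO_URL` but not the parsing code; the following is a minimal Python sketch assuming GitHub HTTPS URLs (optionally with embedded credentials) or SSH URLs. `gh_repo_from_url` and `gh_args` are hypothetical helpers; `--repo OWNER/REPO` itself is a real `gh issue`/`gh pr` flag.

```python
import re
from typing import List

# HTTPS (optionally with embedded credentials), scp-style SSH, and ssh:// forms.
_GITHUB_URL = re.compile(
    r"^(?:https://(?:[^@/]+@)?github\.com/|git@github\.com:|ssh://git@github\.com/)"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def gh_repo_from_url(repo_url: str) -> str:
    """Derive OWNER/REPO (the GH_REPO value) from a clone URL."""
    match = _GITHUB_URL.match(repo_url.strip())
    if match is None:
        raise ValueError(f"not a GitHub repository URL: {repo_url!r}")
    return f"{match.group('owner')}/{match.group('repo')}"


def gh_args(gh_repo: str, *args: str) -> List[str]:
    """Build a gh argv with an explicit --repo, independent of the worktree's remotes."""
    return ["gh", *args, "--repo", gh_repo]
```

The resulting list can be passed straight to `subprocess.run(..., check=True)`, so a failing `gh` call raises instead of being silently ignored.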
|
85 | 89 | ## Lessons Learned |
86 | 90 | 1. Railway MCP `deploy` uploads source, NOT Docker image — use GraphQL API |
87 | 91 | 2. `startCommand` overrides Docker ENTRYPOINT — must set via serviceInstanceUpdate |
88 | 92 | 3. 25 service/day creation limit — never delete+create, always reuse |
89 | 93 | 4. `variableCollectionUpsert` needs actual values, not empty shell vars |
90 | | -5. Service names with spaces break Railway CLI — avoid spaces in service names |
91 | | -6. `sleepApplication: true` silently kills agent containers — always set to false for batch jobs |
92 | | -7. Abstract/design issues (#126 "Structured ACI protocol") produce 0 commits — agents need concrete, code-level tasks with specific files/functions to modify |
93 | | -8. `retry "git push ... 2>/dev/null" || true` silently swallows push failures — agent reports DONE with no PR. Fixed: track PUSH_OK, skip PR creation if push fails, report FAILED explicitly |
94 | | -9. **CRITICAL**: `gh auth login` only configures `gh` CLI, NOT `git push`. Fixed: `gh auth setup-git` |
95 | | -10. **CRITICAL**: All `gh issue/pr` commands lack `--repo` flag — bare-repo worktrees have no git remote context. Fixed: extract `GH_REPO` from `REPO_URL`, add `--repo` to all gh calls |
96 | | -11. Docker rebuild #3 deployed with fixes #8-10. Both services redeployed 16:55 UTC |
| 94 | +5. Service names with spaces break Railway CLI — avoid spaces |
| 95 | +6. `sleepApplication: true` silently kills batch containers |
| 96 | +7. Abstract issues produce 0 commits — agents need concrete file/function targets |
| 97 | +8. `2>/dev/null || true` on push hides critical auth failures |
| 98 | +9. `gh auth login` authenticates `gh` only, not `git push` — need `gh auth setup-git`
| 99 | +10. Bare-repo worktrees have no git remote — all gh commands need `--repo` |
| 100 | +11. Codegen issues (#114–#116) require editing generated files, which `CLAUDE.md` forbids — agents can't solve them
| 101 | +12. Agent solve rate: ~12.5% — need better issue selection + more specific SOUL.md |
| 102 | + |
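Lesson 8 (fix 15) replaced a `retry "git push ..." 2>/dev/null || true` call, whose failure was discarded, with explicit tracking: record `PUSH_OK`, skip PR creation when the push fails, and report FAILED. A minimal Python sketch of that control flow, with the push and PR steps injected as hypothetical callables and an assumed retry limit of 3 (the log mentions "push failed 3x" but not the configured value):

```python
from typing import Callable, Dict, Optional

PUSH_ATTEMPTS = 3  # assumed retry limit


def retry(action: Callable[[], bool], attempts: int = PUSH_ATTEMPTS) -> bool:
    """Run action until it succeeds or attempts are exhausted; report the outcome."""
    for _ in range(attempts):
        if action():
            return True
    return False


def finish_agent_run(
    push: Callable[[], bool], create_pr: Callable[[], str]
) -> Dict[str, Optional[str]]:
    push_ok = retry(push)  # the outcome is kept, not discarded as `|| true` did
    if not push_ok:
        # No branch on the remote, so no PR can be opened: report failure explicitly.
        return {"status": "FAILED", "reason": "git push failed", "pr": None}
    return {"status": "DONE", "reason": None, "pr": create_pr()}
```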
| 103 | +## Night 2 (2026-03-12) — Model Fix + CLI Tools |
| 104 | + |
| 105 | +### Root Cause Found: z.ai proxy returns GLM-4.7 instead of Claude |
| 106 | +- **Bug #18 (CRITICAL)**: z.ai proxy routes `claude-sonnet-4-20250514` → `glm-4.7` (wrong model!) |
| 107 | +- GLM-4.7 cannot handle Claude Code's tool-use protocol → 0 commits on ALL agents |
| 108 | +- **Fix**: `--model glm-5` flag in entrypoint + `CLAUDE_MODEL=glm-5` env var |
| 109 | +- z.ai's top model is `glm-5` — confirmed working via API test |
| 110 | + |
| 111 | +### Changes Applied |
| 112 | +1. `deploy/agent-entrypoint.sh`: Added `--model "${CLAUDE_MODEL:-glm-5}"` to claude invocation |
| 113 | +2. `.github/workflows/agent-spawn.yml`: Added `CLAUDE_MODEL=glm-5` to Railway env vars |
| 114 | +3. Railway ubuntu service: `CLAUDE_MODEL=glm-5` set via MCP |
| 115 | +4. Docker image: rebuilt and pushed to GHCR (new sha256 digest)
| 116 | +5. **Bug #19**: `railway deploy` overwrote the Docker image source with the `railway.toml` build (`Dockerfile.px-bridge`)
| 117 | + - Fixed via `serviceInstanceUpdate` GraphQL — restored image source + startCommand |
| 118 | + - Lesson: NEVER use `railway deploy`/`redeploy` on Docker image services — it uploads source code |
| 119 | + |
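Bug #19 (like bug 14 before it) was repaired with Railway's `serviceInstanceUpdate` GraphQL mutation rather than `railway deploy`. A minimal Python sketch that only builds the request body; the mutation shape and the input field names (`source.image`, `startCommand`, `sleepApplication`) reflect a reading of Railway's public GraphQL schema and should be checked against it before use. The function name is hypothetical, and the IDs, image reference and start command in any call are placeholders.

```python
import json
from typing import Optional

# Assumed mutation shape; verify against Railway's public GraphQL schema.
SERVICE_INSTANCE_UPDATE = (
    "mutation serviceInstanceUpdate($serviceId: String!, $environmentId: String, "
    "$input: ServiceInstanceUpdateInput!) { "
    "serviceInstanceUpdate(serviceId: $serviceId, environmentId: $environmentId, input: $input) }"
)


def restore_image_service_body(
    service_id: str,
    environment_id: str,
    image: str,
    start_command: Optional[str],
) -> str:
    """JSON body that points a service back at a prebuilt image instead of uploaded source."""
    service_input = {
        "source": {"image": image},  # undo the source upload that `railway deploy` performed
        "sleepApplication": False,   # batch agents must not be put to sleep (bug 14)
    }
    if start_command is not None:
        # startCommand overrides the image ENTRYPOINT (lesson 2), so set it deliberately
        service_input["startCommand"] = start_command
    return json.dumps({
        "query": SERVICE_INSTANCE_UPDATE,
        "variables": {
            "serviceId": service_id,
            "environmentId": environment_id,
            "input": service_input,
        },
    })
```

The body would then be POSTed to Railway's GraphQL endpoint with an API token; that transport is left out of the sketch.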
| 120 | +### New CLI Commands (4) + MCP Tools (4) |
| 121 | +| Command | Purpose | |
| 122 | +|---------|---------| |
| 123 | +| `tri cloud api-check` | Test API key + model routing (catches proxy mismatch) | |
| 124 | +| `tri cloud redeploy <svc> <N>` | Reuse Railway service for new issue | |
| 125 | +| `tri cloud diagnose <N>` | Why did agent fail? (comments + events + PR) | |
| 126 | +| `tri cloud issue-create <title>` | Create issue with `agent:spawn` label | |
| 127 | + |
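`tri cloud api-check` is described as catching proxy model mismatches like bug #18. Its implementation isn't shown here; one way to do the check, sketched in Python under the assumption that the proxy reports the model it actually served in the response's `model` field (as Anthropic Messages API responses do), is to compare that field with the requested model. `requested_model` mirrors the entrypoint's `${CLAUDE_MODEL:-glm-5}` fallback; both function names are hypothetical.

```python
import json
import os
from typing import Tuple


def requested_model() -> str:
    """Same fallback as ${CLAUDE_MODEL:-glm-5}: unset or empty means glm-5."""
    return os.environ.get("CLAUDE_MODEL") or "glm-5"


def check_model_routing(requested: str, response_body: str) -> Tuple[bool, str]:
    """Return (matches, served_model) for a raw JSON response body."""
    served = json.loads(response_body).get("model", "<missing>")
    return served == requested, served
```

Exact string equality is deliberately strict: a proxy that silently swaps `claude-sonnet-4-20250514` for `glm-4.7` fails the check immediately instead of surfacing later as 0-commit agent runs.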
| 128 | +### Agent Spawn #145 (glm-5 validation) |
| 129 | +- **RESULT: SUCCESS** — Full E2E cycle in ~5 minutes |
| 130 | +- Auth OK, clone OK, read issue OK, code OK, self-review OK, push OK, PR #146 created |
| 131 | +- PR closed (local branch has a richer implementation), issue closed
| 132 | +- **Agent solve rate: 2/9 ≈ 22%** (up from 12.5%)
| 133 | +- Bug #19 also found: `railway deploy` overwrites Docker image source |
| 134 | + |
| 135 | +### Docker Image Rebuilt (4th time) |
| 136 | +- glm-5 model fix + 4 new CLI commands |
| 137 | +- New sha256 digest, pushed to GHCR
| 138 | + |
| 139 | +### Remaining Work |
| 140 | +- [ ] Create new agent-friendly issues and spawn more agents |
| 141 | +- [ ] Push local changes (4 new CLI commands + glm-5 fix) |
| 142 | +- [ ] Dashboard UI (Phase 5) |
| 143 | +- [ ] Agent self-metrics tracking |
| 144 | +- [ ] Investigate why `railway deploy` via MCP overrides Docker image source |