refactor: simplify session lifecycle and zellij runtime#62
Conversation
1940c69 to
7e18aae
Compare
Greptile SummaryThis PR is a substantial simplification of the session lifecycle model: the canonical
Confidence Score: 5/5Safe to merge; the simplification is well-executed and the migration correctly handles both fresh and existing databases with IF EXISTS guards throughout. The lifecycle model refactor correctly replaces the multi-field state machine with two persisted facts (activity_state + is_terminated) and derives display status at read time. The migration Up/Down paths are symmetric and cover both upgrade and fresh-DB cases. The anti-flap mechanism (ProbeDead+ProbeDead required, recent-activity window) is simpler than the old detecting quarantine but still protects against transient false positives. No logic errors or data-corruption paths were found in the changed code. reactions.go — the sendOnce dedup state is not session-scoped and never pruned; low practical risk given PR URL uniqueness, but worth addressing before this path sees heavy use. Important Files Changed
Sequence DiagramsequenceDiagram
participant GH as GitHub Tracker
participant PRM as pr.Manager
participant LCM as lifecycle.Manager
participant Store as SQLite Store
participant Reaper as Reaper
Note over GH,Reaper: PR observation flow (new model)
GH->>PRM: ApplyObservation(id, obs)
PRM->>Store: WritePR(row, checks, comments)
alt "obs.Merged == true"
PRM->>LCM: MarkTerminated(id)
LCM->>Store: "UpdateSession is_terminated=true activity=exited"
else obs open or CI failing or review
PRM->>LCM: ApplyPRObservation(id, obs)
LCM->>Store: GetSession(id)
LCM-->>LCM: sendOnce(key, sig, msg)
LCM->>LCM: messenger.Send(id, msg)
end
Note over Reaper,LCM: Runtime liveness probe (new model)
Reaper->>Store: ListAllSessions()
loop each non-terminated session
Reaper->>Reaper: zellij.IsAlive(handle)
alt ProbeDead and no recent activity
Reaper->>LCM: ApplyRuntimeObservation(id, facts)
LCM->>Store: "UpdateSession is_terminated=true activity=exited"
end
end
Note over LCM,Store: Status derivation on read
Store-->>LCM: SessionRecord plus PRFacts list
LCM-->>LCM: displayPR selects best PRFacts
LCM-->>LCM: DeriveStatus returns SessionStatus
Reviews (6): Last reviewed commit: "refactor: align storage and runtime obse..." | Re-trigger Greptile |
7e18aae to
08c8adb
Compare
08c8adb to
77456b0
Compare
) * feat(scm): GitHub provider adapter — Observe(prURL) → PRObservation A fresh GitHub SCM provider adapter under backend/internal/adapters/scm/github/ exposing one method: (*Provider).Observe(ctx, prURL) (ports.PRObservation, error) It performs a REST GET on /repos/{o}/{r}/pulls/{n} for the authoritative draft/merged/closed/head-SHA, one GraphQL query for the reviewDecision + mergeStateStatus + statusCheckRollup + unresolved review threads, and (only for failure-class CheckRuns) a REST GET on /actions/jobs/{job_id}/logs to splice the last 20 lines of the failed job into the observation. The package is the observation primitive; the polling loop, cadence selection, daemon wiring, persistence and webhook receiver are all intentionally out of scope (separate PRs / lanes). Closes #27 — this supersedes PR #28's attempt, which targeted types (domain.SCMProvider / SCMSnapshot / ports.SCMObserveRequest) that the PR #62 simplification refactor has since removed. The GraphQL queries and mergeability composition logic are credited to @whoisasx from PR #28's provider.go; the package was re-implemented against the current ports.PRObservation seam (post-#62) rather than rebased. Bot-author detection uses ONLY GitHub's typed signal (__typename "Bot" / User.Type "Bot"). The strings.Contains(login, "bot") fallback from PR #28 was intentionally dropped — aa-18's review flagged it as a false-positive magnet for logins like "robothon" / "lambot123". 46 table-driven tests against httptest.NewServer cover happy path, draft, merged, closed (not merged), CI passing/failing/pending, StatusContext legacy, log-tail extraction (and the best-effort log-fetch failure case), mergeability mergeable/conflicting/blocked (including ci-failing → blocked even when GitHub still says CLEAN — the load-bearing aa-18 contract)/unstable/unknown, review approved/changes-requested/required/none, bot-author filtering (including the robothon false-positive guard), unresolved-only threads, all-bots → empty Comments, ETag-304 cache hit, primary + secondary rate-limit (with errors.As → *RateLimitError), 401 → ErrAuthFailed, malformed JSON → Fetched:false, network error → Fetched:false, Authorization Bearer header injection, StaticTokenSource blank/whitespace rejection, GHTokenSource memoize + invalidate. Verification: - go build ./... clean - go vet ./... clean - gofmt -l backend/internal/adapters/scm/ clean - golangci-lint run ./... (v2.12, repo .golangci.yml) 0 issues - go test -race ./internal/adapters/scm/github/... 46/46 PASS References: - aa-18 review of PR #28: ~/.ao/agent-reports/aa-18.md - aa-26 tracker adapter (sibling Go-adapter pattern): #36 / agent-reports/aa-26.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(scm): address greptile review on #69 Four fixes from the greptile review of PR #69: 1. CI rollup pagination (P1) — when GraphQL reports pageInfo.hasNextPage=true for the statusCheckRollup contexts, a visible "all passing" set could be hiding a failing context on the next page. ciSummaryFromGraphQL now degrades Passing / Pending / Unknown to CIUnknown in that case; a known CIFailing on the visible page is still safe and is NOT degraded. Also bumped the per-page limit from 50 to 100 (GraphQL's documented max for the contexts connection). Two new tests pin both branches. 2. Empty GraphQL inline fragment (P2) — dropped `... on User { }` from the reviewThreads author selection. The empty selection set was technically invalid GraphQL and a future API tightening could reject the query. __typename already tells us whether the actor is a Bot, so the fragment carried no information. 3. rest.MergeStateStatus dead-code (P2) — the field decoded from the non-existent REST `merge_state_status` was always empty, making the firstNonEmpty fallback dead code. Removed the field and switched the tiebreaker to rest.MergeableState (the actual REST field, upper- cased so the same switch covers both GraphQL and REST shapes). 4. Wrong Accept header on /actions/jobs/{id}/logs (P2) — GitHub's REST API validates the Accept header before issuing the 302 to the log blob; sending text/plain risks a 406. Switched to the canonical application/vnd.github+json; the redirected blob serves text/plain regardless. Verification: - go build ./... clean - go vet ./... clean - golangci-lint run ./... 0 issues - go test -race ./internal/adapters/scm/github/... 48 / 48 PASS Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…pec API, regen) Rebased onto main, which landed several changes that intersect this branch: - writeProjectError is now 3-arg (status derived from project.Error.Kind); the get handler's degenerate-GetResult guard now calls the 3-arg form. - apispec.Spec dropped its YAML() method; add apispec.Embedded() returning the raw embedded bytes, used by the generator's drift + route-parity tests. - #62 removed Runtime/Reactions/ReactionConfig from the project types, so the regenerated openapi.yaml (and frontend schema.d.ts) drop those, and the dead ReactionConfig entry is removed from the schema-name map. - CI go.yml: keep main's new golangci-lint `lint` job alongside the gen-verify job, both reading the Go version from go.mod. go build/vet/test all pass; spec + TS regenerate with no drift; gofmt clean.
Re-integrate the agent-adapter layer onto main's rewritten session lane. main (#62/#66/#67/#68/#69) moved the session manager into session_manager plus a service layer, reworked the domain (Activity, IsTerminated), and removed the tmux runtime. This merge: - Keeps the rich, hooks-capable ports.Agent (in internal/ports) as the canonical agent contract; session_manager.Manager now resolves a real adapter per session via ports.AgentResolver (from cfg.Harness on Spawn, the stored harness on Restore) and builds RuntimeConfig.Argv. - Drops the tmux runtime; standardizes on zellij. - Re-adds the daemon's per-session agent resolver wiring (buildAgentResolver over the claude-code + codex registry, AO_AGENT default), ready to plug into the httpd session-route slot. go build, go vet, and go test -race are green; the zellij/tmux real integration tests are environment-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
This PR simplifies the agent session lifecycle model and removes a large amount of now-unneeded runtime, notification, and storage plumbing.
Lifecycle/session model
activity_state,activity_source,activity_last_at, and booleanis_terminated.Runtime simplification
runtime_name/runtime configurability from storage and wiring.ao doctorusing the runtime's minimum supported version.Storage/sqlite cleanup
backend/internal/storage/sqlite/store.npm run sqlcas the explicit sqlc generation workflow.Ports/domain/API cleanup
domainand keepsportsfocused on boundary contracts/DTOs./muxroute.slog.Default().CLI and docs
ao doctorfrom opening/migrating SQLite; daemon remains the only DB migrator/writer.Testing
npm run sqlcnpm run lintcd backend && go test -race ./...