Tracker intake hardening slice: rate-limit budget, spawn cap, sandbox boundary, and write-back port #2344

## Summary

PR #2325 (`feat(tracker): GitHub issue intake`) closes #2324 and lands the per-project poll-and-spawn path. This issue proposes the **hardening slice** that needs to land alongside #2325 (or as a fast-follow to it) before issue intake is safe to enable on a real repo with more than a handful of eligible open issues.

This is **not** a duplicate of #2282 / #2324 / #2325 / #2288; those define the intake product. This issue focuses on the **durable-boundary, sandbox, and rate-limit budget** mechanics that PR #2325's body says it does *not* cover: the per-project 5-min backoff on failure does not bound parallelism on a healthy first cycle, and the read path goes through `gh`, which is the current critical bottleneck (#1885).

If the maintainer prefers, this can be merged as a hardening checklist on PR #2325 itself rather than a separate issue; the goal is to surface the gaps, not to require a separate PR.

## Gaps not currently addressed by #2325 (verified 2026-07-01 against PR #2325 body)

| # | Gap | Source | Why it matters for intake |
|---|-----|--------|----------------------------|
| 1 | No `max_concurrent_intake_spawns` global cap on workers started by `trackerintake.Observer`. Per-project 5-min backoff (#2325) bounds failure, not success-path parallelism. | Reviewer C failure mode #1 (worker amplification storm). Related: #918 (`maxConcurrentSessions` enforcement, codex-scoped today). | A noisy label or a long backlog + a clean first poll can spawn N agents in one 30-second cycle → SQLite WAL contention, loopback HTTP queue back-pressure, host OOM, GitHub 429s within minutes. |
| 2 | No backfill / first-poll flood guard. The poll loop fires every tick once intake is enabled; there is no `WHERE remote_updated_at > cursor` initial-bounds clause visible from #2325's description. | Reviewer A `tracker_sync_cursor` recommendation; Reviewer C failure modes #1 + #7 (CDC durability gap on crash). | After a daemon restart, or when intake is first enabled on a project with N open issues, the loop can dispatch a spawn for every eligible issue on each poll cycle, compounding gap 1 even at modest backlogs. |
| 3 | `ao spawn` does not reject closed/cancelled issues (#2063 is open). | #2063 + #2064 open. | Auto-intake amplifies #2063 from a one-off CLI footgun into a permanent rate-limit + worker-budget leak: every label flip on a `closed` issue triggers a fresh session. |
| 4 | Tracker reads go through `gh` CLI subprocesses; the dashboard read path is benchmarked as the dominant 20–40s bottleneck, not tmux/ps (#1885, `priority: critical`). | #1885 critical. | A 30-second poll cadence plus a 20–40s `gh` round-trip means the poller effectively *cannot* sustain 30s cadence once it has more than a few eligible issues per project. The intake loop will visibly fall behind the moment it is enabled. |
| 5 | No sandbox boundary at worker spawn. Issue body is rendered into the worker prompt and the worker inherits the daemon environment (which holds `AO_GITHUB_TOKEN` / `gh auth token`). | Reviewer C top risk (prompt injection via issue body → secret exfiltration). | A malicious issue body → prompt-injected worker → exfil. Read-only-ness on the tracker side does not protect you from the worker side. |
| 6 | No write-back path / no GitHub App. #2325 is explicitly read-only toward GitHub. The state-transition gap (issue #40, "in-progress" / "in-review" reverse-map label is *read* but never *written*) means status propagation from worker → issue is unsupported. | Reviewer B GitHub-App-vs-OAuth note; existing tracker.go comment re #40. | When an intake-spawned worker transitions its PR through `ci_failed → review_pending → mergeable`, the originating issue cannot be transitioned to `in_progress` (or `done` on merge) without a write-back path. For org rollouts the recommended path is a GitHub App, not user tokens. |

## Proposed v1 hardening (composes into PR #2325 or a fast-follow PR)

P1 — **must have before any project enables intake in production:**

1. **Durable `tracker_sync_cursor` row per project.** Poller emits issues with `updated_at > cursor` only; the cursor advances atomically in the same SQLite transaction as the `issue_observed_at` write. (Closes C failure mode #7.)
2. **`max_concurrent_intake_spawns` knob, default 2, env-driven.** Backing semaphore in Session Manager, decoupled from poll cadence. (Closes C failure modes #1 + #5.)
3. **First-poll throttle.** On the first N cycles after intake is enabled (or after a tracked downtime), cap `dispatch_per_cycle = min(issues_per_cycle, ramp_limit)`; ramp limit configurable, default 1. (Closes C failure mode #1.)
4. **Closed/cancelled short-circuit at intake time** — even before the observer dispatches. (#2063 follow-up becomes load-bearing.)
5. **Tracker reads via the direct REST adapter, not `gh` subprocess.** Fall back to `gh` only when no token env is set. (Unblocks #1885 critical; required for any cadence below ~60s.)

P2 — **must have before intake is a default-on recommendation:**

6. **Sandbox boundary on spawned worker runtime.** No `GITHUB_TOKEN`, no `~/.ssh`, no daemon env inherited. Issue body pinned as untrusted data in the prompt template (quoted / delimited), never rendered as instructions. (Closes C top risk.)
7. **Optional write-back port `TrackerWriter`** behind a separate interface, opt-in per project. v1 implementation: comment the PR link back to the issue on session start; on PR merge, optionally transition `open → done`. (Closes A's "separate write-only port" verdict; complements #40.)
8. **`tracker_intake_enabled = false` by default.** Document the recommendation that projects opt in one at a time so the rate-limit budget per token stays bounded.

P3 — **nice to have, document the deferred scope explicitly:**

9. GitHub App migration plan (per-installation tokens, webhook ingest of `issues.opened` / `issues.labeled`) so org rollouts avoid the user-token RPS ceiling. Webhook is the de-facto pattern for org-wide write-back; the polling path remains the fallback health-check loop.
10. Multi-provider (Linear, Jira, GitLab) parallel intake behind the same `TrackerResolver` seam. (Paired with #2288.)

## Evidence bar for "done" (proposed merge checklist)

- Unit test: `issues.state_transition_at` rejects every illegal flip, including machine-issued `in_progress → done`.
- Integration test: an injected fixture of 50 eligible issues + a single cycle dispatches ≤ `max_concurrent_intake_spawns` workers; restart-of-daemon dispatches 0 (cursor + durable fact prevent dup spawn).
- Load test: 10k issues in SQLite, 1-hour soak, sustained RSS ≤ baseline + 20%, zero GitHub 429s, and the token's remaining rate-limit budget ≥ 10% at all times.
- Security test: spawned worker process env contains no `GITHUB_TOKEN`, `~/.ssh` unreadable inside worker mount, egress denied by default. An injection-laden issue body never reaches shell / agent context as code.
- Crash test: kill daemon mid-poll, restart, assert no duplicate worker for issues already in `in_progress`.
- Schema migration is additive (`issue_observations` table), no rewrite of `pull_requests` / `pr_observations`.
- Sign-off: PR-side SCM Observer owner confirms the intake observer does not touch the PR poller's write path.
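The restart and crash-test criteria hinge on writing a durable claim before spawning. A minimal Go sketch, assuming an in-memory map behind a mutex stands in for the SQLite row (in SQLite the claim could be, for example, an `INSERT ... ON CONFLICT DO NOTHING` inside the cursor transaction); `store`, `claim`, and `daemon` are hypothetical names. Claiming first means a crash can at worst leave a claimed issue without a worker, which a reconciler would need to pick up, but never a second worker for the same issue.

```go
package main

import (
	"fmt"
	"sync"
)

// store stands in for the SQLite tables; it survives "restarts" because the
// same instance is reused. Hypothetical names throughout.
type store struct {
	mu       sync.Mutex
	observed map[int]string // issue number -> "in_progress"
}

func newStore() *store { return &store{observed: map[int]string{}} }

// claim records the in_progress fact atomically and reports whether this
// caller won it. The fact is written before spawning, so a crash can leave
// a claimed issue without a worker, but never two workers for one issue.
func (s *store) claim(issue int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.observed[issue]; ok {
		return false
	}
	s.observed[issue] = "in_progress"
	return true
}

// daemon is rebuilt from the store on every restart.
type daemon struct {
	st     *store
	spawns []int
}

// poll dispatches eligible issues; crashAfter < 0 means no simulated crash.
func (d *daemon) poll(eligible []int, crashAfter int) {
	for i, n := range eligible {
		if i == crashAfter {
			return // simulated kill mid-poll
		}
		if d.st.claim(n) {
			d.spawns = append(d.spawns, n)
		}
	}
}

func main() {
	st := newStore()
	eligible := []int{101, 102, 103, 104}
	first := &daemon{st: st}
	first.poll(eligible, 2) // killed after two issues
	second := &daemon{st: st}
	second.poll(eligible, -1) // restart
	fmt.Println(first.spawns, second.spawns)
}
```

Running it prints `[101 102] [103 104]`: the restarted daemon spawns only the issues the killed one never claimed, and a third restart would spawn nothing.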

## Context

- Architecture verified 2026-07-01 from `AgentWrapper/agent-orchestrator` `main`.
- PR #2325 body inspected via `gh api repos/AgentWrapper/agent-orchestrator/pulls/2325`.
- All linked issue numbers verified live via `gh api` on the same date.
- Drafted from a `/advice` review that surfaced both the in-flight work and the gaps above; happy to fold this into PR #2325 directly if that's the maintainer's preference.

/cc the GitHub issue templates request (#2210) — this issue would arguably have been filed into that template slot had it existed.

