
Commit 021bb7e

test(ci): deploy-failure auto-debug path + anon-stack gap + with_failed_deploy factory (#70) (#269)
Failure-diagnosis CI integration tests for the deploy-failure AUTO-DEBUG PATH (docs/ci/02-FAILURE-DIAGNOSIS-AND-AUTODEBUG.md §5), api side.

1. Auto-debug PATH integration test (deploy_autodebug_path_test.go): seeds a status=failed deployment, an older lifecycle row, and a failure_autopsy deployment_events row (reason/exit_code/last_lines/hint) against a real test DB, then asserts the full agent debug loop as ONE coherent contract:
   - GET /api/v1/deployments/:id → status=failed + non-empty error_message (the one-line cause)
   - GET /api/v1/deployments/:id/events → autopsy with reason + non-empty last_lines + hint, newest-first, count correct
   - auth-negative: no/invalid bearer → 401
   - cross-team: another team's token → 404 (no existence leak)

2. Anonymous-stack failure-diagnosis contract (stack_anon_failure_diag_test.go): drives an anon stack (NULL team_id) to failed and asserts that GET /stacks/:slug (slug-bearer, no auth) returns status=failed and that the raw build error is persisted on the service row. It then PINS the documented gap by enumerating the LIVE router (router.New + GetRoutes) and asserting that NO /stacks/:slug/events route exists, so adding a stack-autopsy endpoint later turns this test red deliberately. Anon failure-diagnosis is status + logs only (no classified autopsy).

3. with_failed_deploy factory flag (internal_e2e_account.go): a cohort-only, inert-by-default pre-seed of ONE failed deployment + ONE failure_autopsy event via the production deploy models (CreateDeployment → UpdateDeploymentStatus → UpsertDeploymentAutopsy), surfaced as failed_deploy_id and reaped with the team. This lets the web wave load /app/deployments/:id and render the FailureAutopsyPanel against a real backend. Tests: seeds exactly one failed deploy + one autopsy with the factory payload; omitting the flag seeds none; seam-driven seed_failed 503; whitebox sqlmock coverage of all three seed error branches.
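The route-table pin in item 2 can be sketched roughly as follows. This is an illustration, not the commit's code: `Route`, `routeExists`, and `pinAbsentRoute` are hypothetical names, and the slice of routes stands in for whatever the live router enumeration (router.New + GetRoutes) returns. The check ignores the HTTP method because the documented gap is that no /stacks/:slug/events route exists at all.

```go
package main

import "strings"

// Route is a hypothetical, minimal stand-in for one entry of a router's
// route table (method + path pattern).
type Route struct {
	Method string
	Path   string
}

// routeExists reports whether any route, under any HTTP method, is registered
// at exactly the given path pattern. Trailing slashes are ignored.
func routeExists(routes []Route, path string) bool {
	want := strings.TrimSuffix(path, "/")
	for _, r := range routes {
		if strings.TrimSuffix(r.Path, "/") == want {
			return true
		}
	}
	return false
}

// pinAbsentRoute returns "" while a documented gap is still open, and a
// failure message once someone registers the route, so the pin goes red on
// purpose and the gap documentation gets revisited.
func pinAbsentRoute(routes []Route, path string) string {
	if routeExists(routes, path) {
		return "documented gap closed: " + path + " is now registered; update the pin and the docs"
	}
	return ""
}
```

Returning a message instead of failing directly keeps the helper usable from any test framework; a caller would pass the live route list and fail the test whenever the message is non-empty.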
The producer↔consumer schema parity (worker autopsy write ↔ api /events read) is asserted in the worker PR's deploy_failure_autopsy_schema_parity_test.go (cross-referenced).

make gate: green except for pre-existing, local-only flakes outside this diff (internal/models/TestLinkGitHubID DB pollution; handlers TestQueue_CredIssueError NATS flake). CI (fresh DB, Go 1.25) is authoritative. The new tests and the donebar/manner-matrix/error-envelope guards all pass.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
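One way a producer↔consumer parity check like the cross-referenced worker test could work, assuming both sides describe the autopsy row with `json`-tagged Go structs, is to compare their key sets via reflection. The helpers `jsonFields` and `missingFields` are hypothetical; the worker's actual parity test may compare database columns instead.

```go
package main

import (
	"reflect"
	"sort"
	"strings"
)

// jsonFields returns the set of JSON keys a struct (or pointer to struct)
// serializes, read from its `json` struct tags. Unexported fields and fields
// tagged `json:"-"` are skipped; untagged exported fields use their Go name.
func jsonFields(v any) map[string]bool {
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	out := make(map[string]bool)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name := f.Name
		if tag, ok := f.Tag.Lookup("json"); ok {
			key, _, _ := strings.Cut(tag, ",") // drop options like omitempty
			if key == "-" {
				continue
			}
			if key != "" {
				name = key
			}
		}
		out[name] = true
	}
	return out
}

// missingFields lists, sorted, the keys the consumer reads that the producer
// never writes. An empty result means the consumer's shape is covered.
func missingFields(producer, consumer any) []string {
	written := jsonFields(producer)
	var missing []string
	for key := range jsonFields(consumer) {
		if !written[key] {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing
}
```

A parity test would then assert that `missingFields(workerRow{}, apiRow{})` is empty, where both struct types are whatever the two services actually declare.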
1 parent 3ab0350 commit 021bb7e

7 files changed

Lines changed: 859 additions & 17 deletions
deploy_autodebug_path_test.go (new file): 243 additions & 0 deletions
@@ -0,0 +1,243 @@
package handlers_test

// deploy_autodebug_path_test.go: the end-to-end AUTO-DEBUG PATH integration
// test for a FAILED deployment (task #70,
// docs/ci/02-FAILURE-DIAGNOSIS-AND-AUTODEBUG.md §5.1).
//
// The pieces of the failure-diagnosis surface are each unit/integration-tested
// elsewhere (deploy_events_endpoint_test.go: ordering/empty/clamp/cross-team;
// deploy_buildfailed_autopsy_test.go: the autopsy "failure" field on GET
// /deploy/:id). This file asserts them as ONE coherent contract: the exact
// loop an MCP agent or the dashboard FailureAutopsyPanel runs to diagnose a
// failed deploy WITHOUT cluster access:
//
//  1. GET /api/v1/deployments/:id        → status="failed" + non-empty
//                                          error_message (the one-line cause
//                                          the worker autopsy stamped).
//  2. GET /api/v1/deployments/:id/events → events[] carrying the
//                                          failure_autopsy with reason +
//                                          non-empty last_lines + hint, newest
//                                          first, count correct.
//  3. auth-negative: no / invalid bearer → 401 (the surface is gated).
//  4. cross-team: another team's token   → 404 (you can NOT read another
//                                          team's failure; never 403, no
//                                          existence leak).
//
// This mirrors the seeding pattern in deploy_events_endpoint_test.go and
// deploy_lifecycle_block_integration_test.go (real Postgres test DB via
// testhelpers.SetupTestDB, the production RequireAuth chain via
// NewTestAppWithServices), so the HTTP envelope, route resolution, JWT
// middleware, and model SQL path are exercised end-to-end against the same SQL
// the production handler issues. The producer side (worker autopsy) and this
// consumer side (the /events + /:id read) are proven schema-compatible by the
// worker's deploy_failure_autopsy_schema_parity_test.go.

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"

	"github.com/google/uuid"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"instant.dev/internal/testhelpers"
)

// adbDeploymentEnvelope is the GET /api/v1/deployments/:id response shape (the
// item.error one-liner is the agent's first debug read).
type adbDeploymentEnvelope struct {
	OK   bool `json:"ok"`
	Item struct {
		AppID  string `json:"app_id"`
		Status string `json:"status"`
		Error  string `json:"error"`
	} `json:"item"`
}

// adbEventsEnvelope is the GET /api/v1/deployments/:id/events response shape.
type adbEventsEnvelope struct {
	OK           bool   `json:"ok"`
	DeploymentID string `json:"deployment_id"`
	Events       []struct {
		Kind      string   `json:"kind"`
		Reason    string   `json:"reason"`
		ExitCode  *int     `json:"exit_code"`
		Event     string   `json:"event"`
		LastLines []string `json:"last_lines"`
		Hint      string   `json:"hint"`
		CreatedAt string   `json:"created_at"`
	} `json:"events"`
	Count int `json:"count"`
}

// TestDeployAutodebugPath_FailedDeploy_FullAgentLoop is the §5.1 contract:
// status+error_message AND the events autopsy AND auth-negative AND cross-team,
// asserted as one coherent debug-path test against a real test DB.
func TestDeployAutodebugPath_FailedDeploy_FullAgentLoop(t *testing.T) {
	db, cleanDB := testhelpers.SetupTestDB(t)
	defer cleanDB()
	rdb, cleanRedis := testhelpers.SetupTestRedis(t)
	defer cleanRedis()

	teamID := testhelpers.MustCreateTeamDB(t, db, "pro")
	otherTeamID := testhelpers.MustCreateTeamDB(t, db, "pro")
	ownerJWT := testhelpers.MustSignSessionJWT(t,
		"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa", teamID, "adb-owner@example.com")
	otherJWT := testhelpers.MustSignSessionJWT(t,
		"bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb", otherTeamID, "adb-other@example.com")

	// Seed a FAILED deployment with the one-line error_message the worker
	// autopsy stamps ("<reason>: <hint snippet>").
	depID := uuid.New()
	appID := "adb" + uuid.NewString()[:8]
	const wantErrorMessage = "OOMKilled: Your app exceeded its memory limit and was killed by the kernel."
	_, err := db.Exec(`
		INSERT INTO deployments (id, team_id, app_id, port, tier, status, error_message)
		VALUES ($1, $2, $3, 8080, 'pro', 'failed', $4)
	`, depID, teamID, appID, wantErrorMessage)
	require.NoError(t, err)

	// Older lifecycle row + newer failure_autopsy row (the real autopsy shape).
	_, err = db.Exec(`
		INSERT INTO deployment_events
			(deployment_id, kind, reason, exit_code, event, last_lines, hint, created_at)
		VALUES ($1, 'lifecycle', 'image_pull_failed', NULL, 'ErrImagePull',
			'["pulling image","ErrImagePull"]', 'check the image reference',
			now() - interval '10 minutes')
	`, depID)
	require.NoError(t, err)

	autopsyLastLines := []string{
		"npm ERR! code ELIFECYCLE",
		"FATAL: out of memory: Killed process 1 (node)",
	}
	_, err = db.Exec(`
		INSERT INTO deployment_events
			(deployment_id, kind, reason, exit_code, event, last_lines, hint, created_at)
		VALUES ($1, 'failure_autopsy', 'OOMKilled', 137, 'OOMKilling: Memory cgroup out of memory',
			'["npm ERR! code ELIFECYCLE","FATAL: out of memory: Killed process 1 (node)"]',
			'Your app exceeded its memory limit and was killed by the kernel.',
			now() - interval '1 minute')
	`, depID)
	require.NoError(t, err)

	app, cleanApp := testhelpers.NewTestAppWithServices(t, db, rdb,
		"postgres,redis,mongodb,queue,webhook,storage,deploy")
	defer cleanApp()

	// ── Step 1: GET /api/v1/deployments/:id → status=failed + error_message ──
	t.Run("status_and_error_message", func(t *testing.T) {
		req := httptest.NewRequest(http.MethodGet, "/api/v1/deployments/"+appID, nil)
		req.Header.Set("Authorization", "Bearer "+ownerJWT)
		req.Header.Set("X-Forwarded-For", "10.70.0.1")
		resp, err := app.Test(req, 5000)
		require.NoError(t, err)
		defer resp.Body.Close()
		require.Equal(t, http.StatusOK, resp.StatusCode)

		var env adbDeploymentEnvelope
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&env))
		assert.True(t, env.OK)
		assert.Equal(t, appID, env.Item.AppID)
		assert.Equal(t, "failed", env.Item.Status,
			"the agent's first read must show the deploy is failed")
		assert.NotEmpty(t, env.Item.Error,
			"error_message must be non-empty: it is the one-line cause the agent acts on")
		assert.Equal(t, wantErrorMessage, env.Item.Error)
	})

	// ── Step 2: GET /api/v1/deployments/:id/events → autopsy timeline ────────
	t.Run("events_autopsy_timeline", func(t *testing.T) {
		req := httptest.NewRequest(http.MethodGet, "/api/v1/deployments/"+appID+"/events", nil)
		req.Header.Set("Authorization", "Bearer "+ownerJWT)
		req.Header.Set("X-Forwarded-For", "10.70.0.2")
		resp, err := app.Test(req, 5000)
		require.NoError(t, err)
		defer resp.Body.Close()
		require.Equal(t, http.StatusOK, resp.StatusCode)

		var env adbEventsEnvelope
		require.NoError(t, json.NewDecoder(resp.Body).Decode(&env))
		assert.True(t, env.OK)
		assert.Equal(t, depID.String(), env.DeploymentID,
			"deployment_id must echo the canonical UUID the agent can re-query")
		assert.Equal(t, 2, env.Count)
		require.Len(t, env.Events, 2)

		// Newest first (DESC by created_at): the autopsy row leads.
		autopsy := env.Events[0]
		assert.Equal(t, "failure_autopsy", autopsy.Kind,
			"the dedicated classified row is kind=failure_autopsy")
		assert.Equal(t, "OOMKilled", autopsy.Reason,
			"reason is the machine-readable classification the agent branches on")
		require.NotNil(t, autopsy.ExitCode)
		assert.Equal(t, 137, *autopsy.ExitCode)
		assert.NotEmpty(t, autopsy.LastLines,
			"last_lines (the real build/pod error tail) MUST be non-empty; "+
				"it is the surface the agent reads to fix the Dockerfile/config")
		assert.Equal(t, autopsyLastLines, autopsy.LastLines)
		assert.NotEmpty(t, autopsy.Hint,
			"hint is the plain-language remedy the agent acts on")
		assert.Contains(t, autopsy.Hint, "memory")
		assert.NotEmpty(t, autopsy.CreatedAt)

		// Older row trails.
		assert.Equal(t, "image_pull_failed", env.Events[1].Reason, "older row trails (DESC)")
		assert.Equal(t, "lifecycle", env.Events[1].Kind)
	})

	// ── Step 3: auth-negative (the debug surface is gated) ───────────────────
	t.Run("auth_negative_401", func(t *testing.T) {
		// No bearer.
		reqNoAuth := httptest.NewRequest(http.MethodGet, "/api/v1/deployments/"+appID+"/events", nil)
		reqNoAuth.Header.Set("X-Forwarded-For", "10.70.0.3")
		respNoAuth, err := app.Test(reqNoAuth, 5000)
		require.NoError(t, err)
		defer respNoAuth.Body.Close()
		assert.Equal(t, http.StatusUnauthorized, respNoAuth.StatusCode,
			"no bearer → 401 (events surface is RequireAuth)")

		// Garbage bearer.
		reqBad := httptest.NewRequest(http.MethodGet, "/api/v1/deployments/"+appID+"/events", nil)
		reqBad.Header.Set("Authorization", "Bearer not-a-valid-jwt")
		reqBad.Header.Set("X-Forwarded-For", "10.70.0.4")
		respBad, err := app.Test(reqBad, 5000)
		require.NoError(t, err)
		defer respBad.Body.Close()
		assert.Equal(t, http.StatusUnauthorized, respBad.StatusCode,
			"invalid bearer → 401")
	})

	// ── Step 4: cross-team (you can NOT read another team's failure) ─────────
	t.Run("cross_team_404", func(t *testing.T) {
		// /:id (status) read.
		reqGet := httptest.NewRequest(http.MethodGet, "/api/v1/deployments/"+appID, nil)
		reqGet.Header.Set("Authorization", "Bearer "+otherJWT)
		reqGet.Header.Set("X-Forwarded-For", "10.70.0.5")
		respGet, err := app.Test(reqGet, 5000)
		require.NoError(t, err)
		defer respGet.Body.Close()
		require.Equal(t, http.StatusNotFound, respGet.StatusCode,
			"cross-team GET /:id must be 404, never 403 (no existence leak)")

		// /events read.
		reqEv := httptest.NewRequest(http.MethodGet, "/api/v1/deployments/"+appID+"/events", nil)
		reqEv.Header.Set("Authorization", "Bearer "+otherJWT)
		reqEv.Header.Set("X-Forwarded-For", "10.70.0.6")
		respEv, err := app.Test(reqEv, 5000)
		require.NoError(t, err)
		defer respEv.Body.Close()
		require.Equal(t, http.StatusNotFound, respEv.StatusCode,
			"cross-team /events must be 404, never 403 (no existence leak)")

		var envelope struct {
			OK    bool   `json:"ok"`
			Error string `json:"error"`
		}
		require.NoError(t, json.NewDecoder(respEv.Body).Decode(&envelope))
		assert.False(t, envelope.OK)
		assert.Equal(t, "not_found", envelope.Error)
	})
}
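Steps 3 and 4 pin a gate order: authenticate first (401), then resolve ownership so that another team's deployment and a nonexistent one are indistinguishable (404 `not_found`, never 403). A minimal stdlib sketch of a handler honoring that order follows, assuming an in-memory token→team and deployment→team map in place of the real JWT middleware and database; `authz`, `writeJSON`, `eventsHandler`, and `status` are hypothetical names, not the production handler.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
)

// authz is a hypothetical in-memory stand-in for the JWT middleware and the
// deployments table: bearer token → team, and deployment ID → owning team.
type authz struct {
	tokens map[string]string
	owners map[string]string
}

// writeJSON sends a status code and a JSON body.
func writeJSON(w http.ResponseWriter, code int, body map[string]any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(body)
}

// eventsHandler serves GET /api/v1/deployments/{id}/events with the gate order
// the test pins: 401 before any lookup, then ONE shared 404 branch for
// "missing" and "owned by another team".
func (a authz) eventsHandler(w http.ResponseWriter, r *http.Request) {
	token, hasBearer := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
	team, known := a.tokens[token]
	if !hasBearer || !known {
		writeJSON(w, http.StatusUnauthorized, map[string]any{"ok": false, "error": "unauthorized"})
		return
	}
	id := strings.TrimSuffix(strings.TrimPrefix(r.URL.Path, "/api/v1/deployments/"), "/events")
	if owner, exists := a.owners[id]; !exists || owner != team {
		writeJSON(w, http.StatusNotFound, map[string]any{"ok": false, "error": "not_found"})
		return
	}
	writeJSON(w, http.StatusOK, map[string]any{"ok": true, "deployment_id": id})
}

// status sends one GET through the handler and returns the HTTP status code
// and the envelope's "error" field ("" on success).
func (a authz) status(path, bearer string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if bearer != "" {
		req.Header.Set("Authorization", "Bearer "+bearer)
	}
	rec := httptest.NewRecorder()
	a.eventsHandler(rec, req)
	var body struct {
		Error string `json:"error"`
	}
	_ = json.NewDecoder(rec.Body).Decode(&body)
	return rec.Code, body.Error
}
```

Because the missing and the foreign cases go through the same branch, the response cannot drift between them, which is what the cross-team assertion on `not_found` relies on.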
