Commit ec4cbfe

feat(deploy): scale-to-zero — Scale() compute method + wake endpoint (Task #54)
API half of scale-to-zero (idle descheduling). Flag-gated behind DEPLOY_SCALE_TO_ZERO_ENABLED (default OFF); fully inert when off.

- migration 068: deployments.last_activity_at / scaled_to_zero / always_on (+ partial idle-candidate index; backfill last_activity_at from updated_at).
- compute.Provider.Scale(appID, replicas): k8s sets the Deployment's replicas in place (NotFound = no-op so a stale row can't wedge the scaler; idempotent when already at target); the noop provider logs and no-ops.
- POST /deploy/:id/wake: explicit fast wake; scales back to 1 and clears sleep state. Returns 501 when the flag is off (no scale, no DB write; proven by the flag-off test). Documented cold-start contract: the api is not in the request path, so transparent wake-on-request needs an activator (out of scope for v1).
- model helpers: MarkDeploymentScaledToZero (CAS: healthy + not zeroed + not always-on), WakeDeployment, SetDeploymentAlwaysOn; redeploy (MarkDeploymentBuilding) clears scaled_to_zero and bumps last_activity_at.
- deploymentToMap surfaces scaled_to_zero/always_on; OpenAPI documents /wake.

Tests: k8s Scale (down/wake/idempotent/notfound/get+update errors), noop Scale, wake flag-off 501-inert (a panicking provider proves compute is never reached), model CAS/wake/pin/redeploy-clears (DB-gated, run in CI).

Awaiting operator enablement of DEPLOY_SCALE_TO_ZERO_ENABLED to verify real scale-down in prod.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 5a62f44 commit ec4cbfe

18 files changed

Lines changed: 828 additions & 4 deletions
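The commit message names two guards that are easy to get wrong: `Scale` treating a missing Deployment as a no-op and skipping the write when the Deployment is already at the target, and the compare-and-swap on `MarkDeploymentScaledToZero`. The sketch below models both against an in-memory fake. `fakeCluster`, `scale`, `row` and `markScaledToZero` are hypothetical names for illustration, not the repository's k8s provider or model code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a k8s NotFound API error (hypothetical; real code
// would inspect the API error instead).
var errNotFound = errors.New("not found")

// fakeCluster is a hypothetical in-memory stand-in for the k8s API: it maps an
// app ID to the replica count of its Deployment.
type fakeCluster struct {
	replicas map[string]int32
	updates  int // counts writes so idempotency is observable
}

func (c *fakeCluster) get(appID string) (int32, error) {
	r, ok := c.replicas[appID]
	if !ok {
		return 0, errNotFound
	}
	return r, nil
}

// scale mirrors the semantics described for compute.Provider.Scale: a missing
// Deployment is a no-op, and one already at the target is left untouched.
func scale(c *fakeCluster, appID string, target int32) error {
	cur, err := c.get(appID)
	if errors.Is(err, errNotFound) {
		return nil // stale row: nothing to scale, don't wedge the caller
	}
	if err != nil {
		return fmt.Errorf("get deployment %s: %w", appID, err)
	}
	if cur == target {
		return nil // idempotent: already at target, no write
	}
	c.replicas[appID] = target
	c.updates++
	return nil
}

// row is a hypothetical subset of the deployments table.
type row struct {
	Status       string
	ScaledToZero bool
	AlwaysOn     bool
}

// markScaledToZero sketches the CAS guard: only a healthy, not-yet-zeroed,
// not-pinned row flips. It reports whether the swap happened, like checking
// RowsAffected == 1 on an UPDATE ... WHERE with the same conditions.
func markScaledToZero(r *row) bool {
	if r.Status != "healthy" || r.ScaledToZero || r.AlwaysOn {
		return false
	}
	r.ScaledToZero = true
	return true
}

func main() {
	c := &fakeCluster{replicas: map[string]int32{"app-1": 1}}
	_ = scale(c, "app-1", 0)                    // scale down: one write
	_ = scale(c, "app-1", 0)                    // already at 0: no write
	_ = scale(c, "gone", 1)                     // NotFound: no-op
	fmt.Println(c.replicas["app-1"], c.updates) // 0 1
}
```

Reporting whether the swap happened lets the idle-scaler skip the k8s call when another worker, a redeploy, or an `always_on` pin got there first.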

internal/config/config.go

Lines changed: 19 additions & 0 deletions
@@ -197,6 +197,15 @@ type Config struct {
 	// Off → /deploy/new rejects source=git with 501; tarball/image unaffected.
 	DeploySourceGitEnabled bool
 
+	// DeployScaleToZeroEnabled gates scale-to-zero (idle descheduling, Task #54).
+	// Default FALSE. When on, the worker idle-scaler patches idle Deployments to
+	// replicas=0 and the api wake path (POST /deploy/:id/wake) brings them back.
+	// Off → the wake endpoint returns 501 and nothing in the api scales an app.
+	// The worker idle-scaler is gated independently by the worker's own
+	// DEPLOY_SCALE_TO_ZERO_ENABLED env var; the two services share the flag name.
+	// Enabling it is an operator action (see infra runbook) after a canary.
+	DeployScaleToZeroEnabled bool
+
 	// GitHub App (P4) — install-once push-to-deploy + short-lived installation
 	// tokens for private-repo clones. Distinct from the GitHub OAuth *login* app
 	// above (GitHubClientID/Secret). GitHubAppEnabled gates the whole feature:
@@ -501,6 +510,16 @@ func Load() *Config {
 		cfg.DeploySourceGitEnabled = false
 	}
 
+	// DEPLOY_SCALE_TO_ZERO_ENABLED: default FALSE (off until operator canary).
+	// Shared flag name with the worker idle-scaler; the api half gates the wake
+	// endpoint + any api-initiated scale, the worker half gates the idle sweep.
+	switch strings.ToLower(strings.TrimSpace(os.Getenv("DEPLOY_SCALE_TO_ZERO_ENABLED"))) {
+	case "true", "1", "yes":
+		cfg.DeployScaleToZeroEnabled = true
+	default:
+		cfg.DeployScaleToZeroEnabled = false
+	}
+
 	// GITHUB_APP_ENABLED: default FALSE (off until the operator registers the
 	// App and provisions GITHUB_APP_* secrets — see infra/GITHUB-APP-RUNBOOK.md).
 	switch strings.ToLower(strings.TrimSpace(os.Getenv("GITHUB_APP_ENABLED"))) {
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+-- 068_deploy_scale_to_zero.sql — scale-to-zero (idle descheduling) state columns.
+--
+-- WHY: a deployed-but-idle app costs a full pod's worth of compute even when it
+-- serves zero requests. Scale-to-zero (Task #54) lets the worker patch an idle
+-- Deployment to replicas=0 (~$0 compute) and wake it back to replicas=1 on
+-- demand. This migration adds the per-deployment state the idle-scaler and the
+-- wake path read/write. The whole feature is gated behind the
+-- DEPLOY_SCALE_TO_ZERO_ENABLED flag (api + worker, default OFF), so these columns
+-- are inert — populated at create-time but acted upon only when an operator
+-- enables the flag.
+--
+-- Columns:
+--   last_activity_at TIMESTAMPTZ — floor "last known activity" marker. Set to
+--       now() at create-time, bumped on every wake
+--       and on redeploy. The idle-scaler deschedules
+--       a Deployment only when
+--       now() - last_activity_at > idle_threshold.
+--
+--       v1 NOTE: the api is NOT in the request path
+--       (apps are served by k8s Ingress straight to
+--       the per-app Service), and no nginx-ingress
+--       request-total scrape is wired yet, so the
+--       honest "activity" signal v1 captures is
+--       deploy / redeploy / explicit-wake events —
+--       NOT per-HTTP-request traffic. A follow-up
+--       (documented in the worker job header) will
+--       wire an ingress request-counter to bump this
+--       column on real traffic for true
+--       traffic-based idle detection.
+--
+--   scaled_to_zero BOOLEAN — true while the app is currently descheduled
+--       (replicas=0). The wake path reads this to
+--       decide whether a scale-up is needed; the
+--       dashboard/agent reads it to show "sleeping".
+--       The idle-scaler sets it true on scale-down,
+--       the wake path sets it false on scale-up.
+--
+--   always_on BOOLEAN — per-app opt-out. A pinned app (an operator
+--       or Pro+ user who wants zero cold-starts) is
+--       never descheduled by the idle-scaler. Default
+--       false → eligible for scale-to-zero.
+--
+-- Idempotent + forward-only. Existing rows get last_activity_at backfilled from
+-- updated_at (their most recent known activity) so the idle-scaler does not
+-- immediately deschedule every pre-existing deploy the first time the flag is
+-- turned on; scaled_to_zero / always_on default to false.
+
+ALTER TABLE deployments
+    ADD COLUMN IF NOT EXISTS last_activity_at TIMESTAMPTZ,
+    ADD COLUMN IF NOT EXISTS scaled_to_zero   BOOLEAN NOT NULL DEFAULT false,
+    ADD COLUMN IF NOT EXISTS always_on        BOOLEAN NOT NULL DEFAULT false;
+
+-- Backfill: seed last_activity_at from updated_at for every pre-existing row so
+-- the very first idle-scaler tick after the flag is enabled judges existing
+-- deploys by their last known activity rather than treating them all as idle.
+-- New rows set last_activity_at = now() at INSERT time (see CreateDeployment).
+UPDATE deployments
+   SET last_activity_at = COALESCE(updated_at, created_at, now())
+ WHERE last_activity_at IS NULL;
+
+-- Partial index: the idle-scaler scans for healthy, eligible, not-yet-zeroed
+-- deployments ordered by activity. Excluding always_on + already-zeroed +
+-- terminal rows keeps the index narrow and the scan cheap.
+CREATE INDEX IF NOT EXISTS idx_deployments_idle_candidates
+    ON deployments (last_activity_at)
+    WHERE status = 'healthy'
+      AND scaled_to_zero = false
+      AND always_on = false;

internal/handlers/deploy.go

Lines changed: 5 additions & 0 deletions
@@ -572,6 +572,11 @@ func deploymentToMapWithDB(d *models.Deployment, db *sql.DB) fiber.Map {
 		// image_ref is echoed (caller-supplied, no secret); registry_creds is
 		// NEVER returned — only registry_creds_set lifecycle metadata.
 		"source": deploymentSourceOrDefault(d.Source),
+		// Scale-to-zero state (migration 068). scaled_to_zero=true → the app is
+		// asleep (replicas=0); the dashboard/agent surfaces "sleeping — wake"
+		// and POSTs /deploy/:id/wake. always_on=true → pinned (never descheduled).
+		"scaled_to_zero": d.ScaledToZero,
+		"always_on":      d.AlwaysOn,
 	}
 	if d.Source == "image" {
 		m["image_ref"] = d.ImageRef
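On the client side, the two new keys are plain booleans in the deployment map. A sketch of how a dashboard or agent might decode them and pick a label follows; `deploymentView`, `sleepLabel` and the label strings are illustrative assumptions, not code from this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deploymentView is a hypothetical client-side struct holding only the two
// scale-to-zero keys that deploymentToMapWithDB adds; other keys are ignored.
type deploymentView struct {
	ScaledToZero bool `json:"scaled_to_zero"`
	AlwaysOn     bool `json:"always_on"`
}

// sleepLabel sketches how a client might render the state: a sleeping app
// gets a wake affordance (POST /deploy/:id/wake), a pinned app is shown as
// never descheduled. The label strings are illustrative.
func sleepLabel(raw []byte) (string, error) {
	var v deploymentView
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	switch {
	case v.ScaledToZero:
		return "sleeping: wake", nil
	case v.AlwaysOn:
		return "always on", nil
	default:
		return "awake", nil
	}
}

func main() {
	label, err := sleepLabel([]byte(`{"source":"image","scaled_to_zero":true,"always_on":false}`))
	fmt.Println(label, err) // sleeping: wake <nil>
}
```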

internal/handlers/deploy_buildfailed_autopsy_test.go

Lines changed: 3 additions & 0 deletions
@@ -56,6 +56,9 @@ func (m *mockProvider) Redeploy(_ context.Context, _ string, _ []byte, _ map[str
 func (m *mockProvider) UpdateAccessControl(_ context.Context, _ string, _ bool, _ []string) error {
 	panic("mockProvider.UpdateAccessControl: not expected in this test")
 }
+func (m *mockProvider) Scale(_ context.Context, _ string, _ int32) error {
+	panic("mockProvider.Scale: not expected in this test")
+}
 
 // mockBuildLogFetcher wraps mockProvider and adds FetchBuildLogs so the handler
 // code can type-assert to compute.BuildLogFetcher.

internal/handlers/deploy_stack_internal_coverage_test.go

Lines changed: 3 additions & 0 deletions
@@ -64,6 +64,9 @@ func (covPanicProvider) Redeploy(context.Context, string, []byte, map[string]str
 func (covPanicProvider) UpdateAccessControl(context.Context, string, bool, []string) error {
 	panic("covPanicProvider.UpdateAccessControl: not expected")
 }
+func (covPanicProvider) Scale(context.Context, string, int32) error {
+	panic("covPanicProvider.Scale: not expected")
+}
 
 // covFailProvider's Deploy/Redeploy return a configurable error. It does NOT
 // implement BuildLogFetcher, so fetchBuildLogsForAutopsy returns nil

internal/handlers/deploy_teardown_reconciler_test.go

Lines changed: 3 additions & 0 deletions
@@ -75,6 +75,9 @@ func (f *fakeTeardownProvider) Redeploy(context.Context, string, []byte, map[str
 func (f *fakeTeardownProvider) UpdateAccessControl(context.Context, string, bool, []string) error {
 	return nil
 }
+func (f *fakeTeardownProvider) Scale(context.Context, string, int32) error {
+	return nil
+}
 
 func reconcilerRequireDB(t *testing.T) {
 	t.Helper()

internal/handlers/deploy_wake.go

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+package handlers
+
+// deploy_wake.go — explicit wake path for scale-to-zero (Task #54).
+//
+// WHY AN EXPLICIT WAKE (v1 design decision)
+//
+// instanode.dev serves a deployed app via a k8s Ingress on
+// *.deployment.instanode.dev that routes straight to the per-app Service in
+// the instant-deploy-<appID> namespace. The api process is NOT in the request
+// path. Transparent wake-on-request (a request to a sleeping app
+// auto-scales it and holds the connection until ready) therefore requires an
+// ACTIVATOR proxy in front of every app — KEDA http-add-on or a Knative-style
+// activator. That is a significant new dependency and is explicitly out of
+// scope for the scale-to-zero v1.
+//
+// v1 ships scale-DOWN (worker idle-scaler) + this fast EXPLICIT wake:
+//
+//	POST /deploy/:id/wake → scales the app back to replicas=1 and returns once
+//	the scale patch is accepted by k8s. The pod still needs its normal startup
+//	time before it serves traffic, so a request that races the wake gets the
+//	app's own cold-start latency (a brief 502/503 from the ingress until the
+//	pod is Ready), exactly as a fresh rollout would. Callers/dashboard/agents
+//	surface "sleeping — wake" and retry the app URL after waking.
+//
+// COLD-START CONTRACT (documented v1 limitation)
+//
+//   - While scaled_to_zero, the app URL returns the ingress's upstream-down
+//     response (502/503) because there is no pod. This is the documented v1
+//     trade-off of explicit wake vs a transparent activator.
+//   - POST /deploy/:id/wake is idempotent: waking an already-awake app just
+//     refreshes last_activity_at (so it won't be re-descheduled immediately).
+//   - The endpoint is gated by DEPLOY_SCALE_TO_ZERO_ENABLED. With the flag OFF
+//     it returns 501 and performs NO scaling and NO DB writes (flag-off inert).
+
+import (
+	"errors"
+	"log/slog"
+
+	"github.com/gofiber/fiber/v2"
+
+	"instant.dev/internal/middleware"
+	"instant.dev/internal/models"
+)
+
+// Wake handles POST /deploy/:id/wake. It scales a (possibly scaled-to-zero)
+// deployment back to replicas=1 and clears the scaled_to_zero flag, returning
+// the refreshed deployment. See the file header for the cold-start contract.
+func (h *DeployHandler) Wake(c *fiber.Ctx) error {
+	if !h.cfg.DeployScaleToZeroEnabled {
+		// Flag OFF → fully inert: no scale call, no DB write.
+		return respondError(c, fiber.StatusNotImplemented, "scale_to_zero_disabled",
+			"Scale-to-zero is not enabled on this platform")
+	}
+
+	team, err := h.requireTeam(c)
+	if err != nil {
+		return err
+	}
+
+	appID := c.Params("id")
+	d, err := models.GetDeploymentByAppID(c.Context(), h.db, appID)
+	if err != nil {
+		var notFound *models.ErrDeploymentNotFound
+		if errors.As(err, &notFound) {
+			return respondError(c, fiber.StatusNotFound, "not_found", "Deployment not found")
+		}
+		return respondError(c, fiber.StatusServiceUnavailable, "fetch_failed", "Failed to fetch deployment")
+	}
+
+	if d.TeamID != team.ID {
+		// 404 not 403: never confirm the existence of another team's deployment.
+		return respondError(c, fiber.StatusNotFound, "not_found", "Deployment not found")
+	}
+
+	// Scale the k8s Deployment back to 1 replica. A NotFound Deployment is a
+	// no-op inside compute.Scale (the row may have been torn down), so this only
+	// errors on a real k8s transport failure — surface it so the caller retries.
+	if d.ProviderID != "" {
+		if scaleErr := h.compute.Scale(c.Context(), appID, 1); scaleErr != nil {
+			slog.Warn("deploy.wake.scale_failed",
+				"app_id", appID, "provider_id", d.ProviderID, "error", scaleErr,
+				"request_id", middleware.GetRequestID(c))
+			return respondError(c, fiber.StatusServiceUnavailable, "wake_failed",
+				"Failed to wake deployment; please retry")
+		}
+	}
+
+	// DB half: clear scaled_to_zero + bump last_activity_at so the idle-scaler
+	// doesn't immediately re-deschedule the just-woken app.
+	if _, dbErr := models.WakeDeployment(c.Context(), h.db, d.ID); dbErr != nil {
+		slog.Error("deploy.wake.db_failed",
+			"app_id", appID, "error", dbErr,
+			"request_id", middleware.GetRequestID(c))
+		return respondError(c, fiber.StatusServiceUnavailable, "wake_failed",
+			"Failed to record wake; please retry")
+	}
+
+	// Re-read so the response reflects the cleared flag + new activity stamp.
+	fresh, err := models.GetDeploymentByID(c.Context(), h.db, d.ID)
+	if err != nil {
+		// The scale + DB write already succeeded; a re-read failure shouldn't
+		// fail the wake. Fall back to the pre-read row, clearing scaled_to_zero.
+		d.ScaledToZero = false
+		fresh = d
+	}
+
+	slog.Info("deploy.woke",
+		"app_id", appID, "team_id", team.ID,
+		"request_id", middleware.GetRequestID(c))
+
+	return c.JSON(fiber.Map{
+		"ok":         true,
+		"message":    "Deployment woken — the app will be reachable once its pod is Ready (cold start).",
+		"deployment": deploymentToMapWithDB(fresh, h.db),
+	})
+}
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+package handlers
+
+// deploy_wake_test.go — scale-to-zero wake endpoint coverage (Task #54).
+//
+// The flag-off path is the load-bearing safety property (rule: default OFF,
+// inert when off). It must short-circuit with 501 BEFORE any auth lookup, scale
+// call, or DB write — so this test constructs the handler with the flag off and
+// asserts a 501 with no compute interaction. A panicking compute provider proves
+// the handler never reaches the scale layer when the flag is off.
+
+import (
+	"context"
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+
+	"github.com/gofiber/fiber/v2"
+
+	"instant.dev/internal/config"
+	"instant.dev/internal/providers/compute"
+)
+
+// wakePanicProvider satisfies compute.Provider; every method (Scale included)
+// panics, so a flag-off wake that incorrectly reaches compute fails loudly.
+type wakePanicProvider struct{}
+
+func (wakePanicProvider) Deploy(context.Context, compute.DeployOptions) (*compute.AppDeployment, error) {
+	panic("Deploy: not expected")
+}
+func (wakePanicProvider) Status(context.Context, string) (*compute.AppDeployment, error) {
+	panic("Status: not expected")
+}
+func (wakePanicProvider) Logs(context.Context, string, bool) (io.ReadCloser, error) {
+	panic("Logs: not expected")
+}
+func (wakePanicProvider) Teardown(context.Context, string) error { panic("Teardown: not expected") }
+func (wakePanicProvider) Redeploy(context.Context, string, []byte, map[string]string) (*compute.AppDeployment, error) {
+	panic("Redeploy: not expected")
+}
+func (wakePanicProvider) UpdateAccessControl(context.Context, string, bool, []string) error {
+	panic("UpdateAccessControl: not expected")
+}
+func (wakePanicProvider) Scale(context.Context, string, int32) error {
+	panic("Scale: not expected when scale-to-zero flag is OFF")
+}
+
+// TestWake_FlagOff_Returns501Inert proves the wake endpoint is fully inert when
+// DEPLOY_SCALE_TO_ZERO_ENABLED is off: 501 response, and the (panicking)
+// compute provider is never touched.
+func TestWake_FlagOff_Returns501Inert(t *testing.T) {
+	h := &DeployHandler{
+		cfg:     &config.Config{DeployScaleToZeroEnabled: false},
+		compute: wakePanicProvider{},
+	}
+	// Mirror the production fiber ErrorHandler so respondError's
+	// ErrResponseWritten sentinel isn't turned into a 500 by the default handler.
+	app := fiber.New(fiber.Config{
+		ErrorHandler: func(_ *fiber.Ctx, err error) error {
+			if err == ErrResponseWritten {
+				return nil
+			}
+			return err
+		},
+	})
+	app.Post("/deploy/:id/wake", h.Wake)
+
+	req := httptest.NewRequest(http.MethodPost, "/deploy/app-123/wake", nil)
+	resp, err := app.Test(req, 1000)
+	if err != nil {
+		t.Fatalf("app.Test: %v", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusNotImplemented {
+		t.Fatalf("flag-off wake status = %d, want 501", resp.StatusCode)
+	}
+	body, _ := io.ReadAll(resp.Body)
+	if !strings.Contains(string(body), "scale_to_zero_disabled") {
+		t.Errorf("flag-off body = %q; want scale_to_zero_disabled error code", string(body))
+	}
+}

internal/handlers/openapi.go

Lines changed: 15 additions & 0 deletions
@@ -586,6 +586,21 @@ const openAPISpec = `{
         }
       }
     },
+    "/deploy/{id}/wake": {
+      "post": {
+        "summary": "Wake a scaled-to-zero (sleeping) deployment",
+        "description": "Scale-to-zero (Task #54). Scales an idle, descheduled app back to one replica and clears its sleeping state. The app becomes reachable once its pod is Ready (a one-time cold start — a request that races the wake gets the ingress's upstream-down response until the pod is up). Idempotent: waking an already-awake app just refreshes its last-activity marker so the idle-scaler won't immediately re-deschedule it. Returns 501 when scale-to-zero is not enabled on the platform (the default). Cross-tenant requests return 404.",
+        "security": [{ "bearerAuth": [] }],
+        "parameters": [{ "name": "id", "in": "path", "required": true, "schema": { "type": "string" }, "description": "Deployment id (UUID or short app_id slug)." }],
+        "responses": {
+          "200": { "description": "Deployment woken", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/DeployResponse" } } } },
+          "401": { "description": "Unauthorized" },
+          "404": { "description": "Not found (or owned by another team)" },
+          "501": { "description": "scale_to_zero_disabled: scale-to-zero is not enabled on this platform (default)." },
+          "503": { "description": "wake_failed or fetch_failed: transient failure fetching, scaling, or recording the wake; retry." }
+        }
+      }
+    },
     "/api/v1/deployments/{id}/make-permanent": {
       "post": {
         "summary": "Opt a deployment out of the auto-24h TTL",
