Skip to content

Commit 2226a30

Browse files
committed
fix(k8s): delete auto-created PVCs on sandbox deletion when opted in
Follow-up to #661, addressing the post-merge review by @yinsyim (#661 (comment)). The original PR assumed StorageClass.reclaimPolicy would clean up auto-created PVCs, but reclaimPolicy only governs what happens to the PV *after* the PVC is deleted — it does not delete the PVC when the sandbox/workload terminates. Auto-created PVCs were leaking on every sandbox deletion. This PR fixes that by: - Gating cleanup on the existing deleteOnSandboxTermination PVC field (default false, opt-in), uniform with Docker semantics. Opt-in PVCs are labeled with `opensandbox.io/volume-managed-by=server` and `opensandbox.io/id=<sandbox_id>`. Pre-existing PVCs and opt-out PVCs are never deleted by the server. - Attaching ownerReferences from each opt-in PVC to the just-created workload CR after create_workload succeeds. Kubernetes garbage collection then deletes the PVC whenever the CR goes away — by the delete_sandbox API, by TTL expiry handled in the controller, or by any cascade. The label-based sweep in delete_sandbox remains as a fallback. - Sweeping auto-created PVCs when sandbox creation fails before returning. Previously create_workload / _wait_for_sandbox_ready failures left labeled PVCs orphaned because the caller has no sandbox id to call delete_sandbox with. - Rejecting 400 when the same claim_name appears in multiple Volume entries with inconsistent createIfNotExists / deleteOnSandboxTermination flags (first-wins previously caused either silent leaks or silent deletions). Server changes: - services/k8s/client.py: add list_pvcs, delete_pvc, patch_pvc. - services/k8s/kubernetes_service.py: - _ensure_pvc_volumes: label only on opt-in; reject conflicting flags; return the list of newly auto-created managed PVCs. - new _attach_pvc_owner_references: best-effort patch after create_workload, gated on workload_info exposing apiVersion/kind/uid. - new _cleanup_managed_pvcs: list-by-selector + delete; best-effort. - delete_sandbox: call cleanup after workload deletion succeeds or 404s, never on transient errors. - create_sandbox: finally-sweep on any failure before return. - services/k8s/batchsandbox_provider.py and agent_sandbox_provider.py: return apiVersion and kind alongside name and uid so the ownerReference can be constructed. Spec, docs, SDKs, RBAC: - specs/sandbox-lifecycle.yml + server/api/schema.py: replace the stale "Docker only / no effect on K8s / StorageClass reclaim policy handles it" description with correct cross-backend semantics. The reclaim-policy reference remains but is reframed (the PVC delete triggers it). - docs/public/api/spec-inline.js: regenerated via `pnpm docs:spec` so the published API docs reflect the new description. - SDKs: regenerated Python lifecycle and JS lifecycle from the spec; hand-edited Python sandbox model, C# model, and Kotlin model docstrings. - kubernetes/charts/opensandbox-server: grant `delete` and `list` on persistentvolumeclaims so the cleanup path runs under the stock chart. Tests: - Label stamping for opt-in and opt-out paths. - Cleanup on success, on 404 sweep, no-cleanup on transient error, best-effort tolerance of list_pvcs failure, no-touch for pre-existing PVCs. - Conflict validation rejects mismatched flags and allows matching ones. - create_sandbox finally-sweep covers create_workload and wait_for_ready failures. - _attach_pvc_owner_references patches each managed PVC with the right body, is best-effort on failure, and skips when owner info is incomplete. End-to-end verified on a real cluster: - opt-in PVC carries labels and ownerReferences; delete_sandbox API cleans it via the label sweep; direct CR delete (simulating TTL) is cleaned via K8s GC. - opt-out PVC has no labels or ownerReferences and survives any CR delete. - Pre-existing PVCs are untouched.
1 parent deffb33 commit 2226a30

16 files changed

Lines changed: 631 additions & 38 deletions

File tree

docs/public/api/spec-inline.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

kubernetes/charts/opensandbox-server/templates/server.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ rules:
2828
verbs: ["create", "delete", "get"]
2929
- apiGroups: [""]
3030
resources: ["persistentvolumeclaims"]
31-
verbs: ["create", "get"]
31+
verbs: ["create", "delete", "get", "list", "patch"]
3232
- apiGroups: ["node.k8s.io"]
3333
resources: ["runtimeclasses"]
3434
verbs: ["get", "list"]

sdks/sandbox/csharp/src/OpenSandbox/Models/Sandboxes.cs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -163,7 +163,8 @@ public class PVC
163163
public bool? CreateIfNotExists { get; set; }
164164

165165
/// <summary>
166-
/// Gets or sets whether auto-created Docker volumes should be removed on sandbox deletion.
166+
/// Gets or sets whether the auto-created volume (Docker named volume or Kubernetes PVC)
167+
/// should be removed on sandbox deletion. Pre-existing volumes are never removed.
167168
/// </summary>
168169
[JsonPropertyName("deleteOnSandboxTermination")]
169170
public bool? DeleteOnSandboxTermination { get; set; }

sdks/sandbox/javascript/src/api/lifecycle.ts

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1302,9 +1302,9 @@ export interface components {
13021302
/**
13031303
* @description When true, the volume is automatically removed when the sandbox
13041304
* is deleted. Only applies to volumes that were auto-created by the
1305-
* server (Docker only). Pre-existing volumes are never removed.
1306-
* Has no effect on Kubernetes PVCs, whose lifecycle is managed by
1307-
* the StorageClass reclaim policy.
1305+
* server on this request; pre-existing volumes are never removed.
1306+
* For Kubernetes, the resulting PVC delete triggers the bound PV's
1307+
* StorageClass reclaim policy (`Retain`/`Delete`).
13081308
* @default false
13091309
*/
13101310
deleteOnSandboxTermination: boolean;

sdks/sandbox/kotlin/sandbox/src/main/kotlin/com/alibaba/opensandbox/sandbox/domain/models/sandboxes/SandboxModels.kt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -406,7 +406,7 @@ class Host private constructor(
406406
* @property claimName Name of the platform volume. In Kubernetes this is the PVC name;
407407
* in Docker this is the named volume name.
408408
* @property createIfNotExists When true (default), auto-create volume if absent.
409-
* @property deleteOnSandboxTermination When true, delete auto-created Docker volume on sandbox deletion.
409+
* @property deleteOnSandboxTermination When true, delete the auto-created volume (Docker named volume or Kubernetes PVC) on sandbox deletion. Pre-existing volumes are never removed.
410410
* @property storageClass Kubernetes StorageClass for auto-created PVCs. Null means default class.
411411
* @property storage PVC storage request for auto-created PVCs (e.g. "1Gi").
412412
* @property accessModes Access modes for auto-created PVCs (e.g. ["ReadWriteOnce"]).

sdks/sandbox/python/src/opensandbox/api/lifecycle/models/pvc.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -46,9 +46,9 @@ class PVC:
4646
Default: True.
4747
delete_on_sandbox_termination (bool | Unset): When true, the volume is automatically removed when the sandbox
4848
is deleted. Only applies to volumes that were auto-created by the
49-
server (Docker only). Pre-existing volumes are never removed.
50-
Has no effect on Kubernetes PVCs, whose lifecycle is managed by
51-
the StorageClass reclaim policy.
49+
server on this request; pre-existing volumes are never removed.
50+
For Kubernetes, the resulting PVC delete triggers the bound PV's
51+
StorageClass reclaim policy (`Retain`/`Delete`).
5252
Default: False.
5353
storage_class (None | str | Unset): Kubernetes StorageClass name for auto-created PVCs. Null means
5454
use the cluster default. Ignored for Docker volumes.

sdks/sandbox/python/src/opensandbox/models/sandboxes.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -200,8 +200,9 @@ class PVC(BaseModel):
200200
default=False,
201201
alias="deleteOnSandboxTermination",
202202
description=(
203-
"When true, auto-created Docker volume is removed on sandbox deletion. "
204-
"Ignored for Kubernetes PVCs."
203+
"When true, the auto-created volume (Docker named volume or "
204+
"Kubernetes PVC) is removed on sandbox deletion. Pre-existing "
205+
"volumes are never removed."
205206
),
206207
)
207208
storage_class: str | None = Field(

server/opensandbox_server/api/schema.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -180,9 +180,9 @@ class PVC(BaseModel):
180180
description=(
181181
"When true, the volume is automatically removed when the sandbox is "
182182
"deleted. Only applies to volumes that were auto-created by the server "
183-
"(Docker only). Pre-existing volumes are never removed. Has no effect "
184-
"on Kubernetes PVCs, whose lifecycle is managed by the StorageClass "
185-
"reclaim policy."
183+
"on this request; pre-existing volumes are never removed. For "
184+
"Kubernetes, the resulting PVC delete then triggers the bound PV's "
185+
"StorageClass reclaim policy (Retain/Delete)."
186186
),
187187
)
188188

server/opensandbox_server/services/k8s/agent_sandbox_provider.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -210,6 +210,8 @@ def create_workload(
210210
return {
211211
"name": created["metadata"]["name"],
212212
"uid": created["metadata"]["uid"],
213+
"apiVersion": f"{self.group}/{self.version}",
214+
"kind": "Sandbox",
213215
}
214216

215217
def _apply_platform_node_selector(

server/opensandbox_server/services/k8s/batchsandbox_provider.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -296,6 +296,8 @@ def create_workload(
296296
return {
297297
"name": created["metadata"]["name"],
298298
"uid": created["metadata"]["uid"],
299+
"apiVersion": f"{self.group}/{self.version}",
300+
"kind": "BatchSandbox",
299301
}
300302

301303
def _apply_platform_node_selector(
@@ -361,6 +363,8 @@ def _create_workload_from_pool(
361363
return {
362364
"name": created["metadata"]["name"],
363365
"uid": created["metadata"]["uid"],
366+
"apiVersion": f"{self.group}/{self.version}",
367+
"kind": "BatchSandbox",
364368
}
365369

366370
def _extract_template_pod_extras(self) -> tuple[list[Dict[str, Any]], list[Dict[str, Any]]]:

0 commit comments

Comments
 (0)