| 1 | +# Kubernetes AGENTS |
| 2 | + |
| 3 | +You are working on the OpenSandbox Kubernetes operator and task-executor. Treat CRD types and annotation contracts as public interfaces, and prefer additive, backward-compatible changes. |
| 4 | + |
| 5 | +For detailed development setup, architecture deep-dive, coding standards, testing guide, and deployment workflows, see [DEVELOPMENT.md](./DEVELOPMENT.md). |
| 6 | + |
| 7 | +## Scope |
| 8 | + |
| 9 | +- `apis/`: CRD type definitions (BatchSandbox, Pool) |
| 10 | +- `cmd/controller/`: controller manager entry point |
| 11 | +- `cmd/task-executor/`: task-executor entry point |
| 12 | +- `internal/controller/`: BatchSandbox and Pool reconcilers, allocator, eviction, update, and strategy logic |
| 13 | +- `internal/scheduler/`: in-process task scheduler (assigns tasks to sandbox pods) |
| 14 | +- `internal/task-executor/`: task execution runtime (process/container), manager, and HTTP server |
| 15 | +- `internal/utils/`: shared helpers (pod, finalizer, field index, expectations, logging) |
| 16 | +- `pkg/client/`: generated clientset, informer, and lister |
| 17 | +- `pkg/task-executor/`: task-executor public types and config |
| 18 | +- `config/`: Kustomize overlays, RBAC, CRD bases, samples |
| 19 | +- `charts/opensandbox-controller/`: Helm chart for deployment |
| 20 | +- `test/e2e/`: end-to-end tests (Kind-based) |
| 21 | +- `test/e2e_task/`: task-executor e2e tests |
| 22 | +- `test/e2e_runtime/`: runtime-class e2e tests (gVisor) |
| 23 | +- `docs/`: design documents and troubleshooting guides |
| 24 | + |
| 25 | +If the task changes CRD schemas in `apis/`, also run `make manifests` and `make generate` to keep CRD YAML and DeepCopy methods in sync. |
| 26 | + |
| 27 | +For E2E test failure diagnosis, see [docs/E2E-TROUBLESHOOTING.md](./docs/E2E-TROUBLESHOOTING.md). |
| 28 | + |
| 29 | +## Key Paths |
| 30 | + |
| 31 | +- `apis/sandbox/v1alpha1/`: CRD Go types and source of truth for API shapes |
| 32 | +- `internal/controller/batchsandbox_controller.go`: BatchSandbox reconciler (scale, pool alloc parsing, task scheduling, status) |
| 33 | +- `internal/controller/pool_controller.go`: Pool reconciler (sandbox scheduling, scale, update, eviction, status) |
| 34 | +- `internal/controller/allocator.go`: in-memory allocation store, annotation syncer, and default allocator |
| 35 | +- `internal/controller/strategy/`: strategy interfaces and defaults (PoolStrategy, TaskSchedulingStrategy) |
| 36 | +- `internal/controller/eviction/`: pod eviction handler interface and default |
| 37 | +- `internal/controller/pool_update.go`: rolling update logic for pool pods |
| 38 | +- `internal/scheduler/`: TaskScheduler interface and default implementation (task-to-pod assignment, recovery) |
| 39 | +- `internal/task-executor/`: task-executor manager, runtime (process/container), HTTP handler |
| 40 | + |
| 41 | +## Annotation Contracts |
| 42 | + |
| 43 | +The controller communicates allocation state through annotations on BatchSandbox objects. These are treated as internal but stability-sensitive: |
| 44 | + |
| 45 | +- `sandbox.opensandbox.io/alloc-status`: JSON `{"pods":["pod-1","pod-2"]}` — current pod allocation |
| 46 | +- `sandbox.opensandbox.io/alloc-release`: JSON `{"pods":["pod-3"]}` — pods released back to pool |
| 47 | + |
| 48 | +Do not change annotation keys or JSON shapes without updating the writers (`allocator.go`, `apis.go`) and all readers (`batchsandbox_controller.go`, `allocation_store_test.go`) in the same change. |
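To make the contract above concrete, here is a minimal reader sketch that uses only the standard library. The annotation keys and the `{"pods":[...]}` shape come from the list above; the `podList` type, the `readPods` helper, and the extra `reason` field are hypothetical illustrations, not the repository's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation keys copied from the contract above.
const (
	AnnoAllocStatus  = "sandbox.opensandbox.io/alloc-status"
	AnnoAllocRelease = "sandbox.opensandbox.io/alloc-release"
)

// podList is a hypothetical type mirroring the documented {"pods":[...]} shape.
type podList struct {
	Pods []string `json:"pods"`
}

// readPods decodes one allocation annotation. A missing or empty annotation
// yields a nil slice and no error. encoding/json ignores unknown fields by
// default, so an additive field does not break this reader.
func readPods(annotations map[string]string, key string) ([]string, error) {
	raw, ok := annotations[key]
	if !ok || raw == "" {
		return nil, nil
	}
	var pl podList
	if err := json.Unmarshal([]byte(raw), &pl); err != nil {
		return nil, fmt.Errorf("decode %s: %w", key, err)
	}
	return pl.Pods, nil
}

func main() {
	annotations := map[string]string{
		AnnoAllocStatus: `{"pods":["pod-1","pod-2"]}`,
		// "reason" is a made-up additive field used only to show tolerance.
		AnnoAllocRelease: `{"pods":["pod-3"],"reason":"hypothetical"}`,
	}
	for _, key := range []string{AnnoAllocStatus, AnnoAllocRelease} {
		pods, err := readPods(annotations, key)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(key, pods)
	}
}
```

A reader written this way keeps working when a field is added, but a renamed `pods` key would silently decode to an empty list. That is one reason renames need every reader and writer updated together.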
| 49 | + |
| 50 | +## Label Contracts |
| 51 | + |
| 52 | +- `sandbox.opensandbox.io/pool-name`: labels pool-owned pods |
| 53 | +- `sandbox.opensandbox.io/pool-revision`: revision hash for rolling updates |
| 54 | +- `batch-sandbox.sandbox.opensandbox.io/pod-index`: pod index within a BatchSandbox |
| 55 | + |
| 56 | +## Commands |
| 57 | + |
| 58 | +Unit tests (envtest-based, uses Ginkgo/Gomega): |
| 59 | + |
| 60 | +```bash |
| 61 | +cd kubernetes |
| 62 | +make setup-envtest |
| 63 | +make test |
| 64 | +``` |
| 65 | + |
| 66 | +Focused unit test (standard `testing` functions): |
| 67 | + |
| 68 | +```bash |
| 69 | +cd kubernetes |
| 70 | +go test ./internal/controller/ -run TestAllocatorSchedule -v |
| 71 | +go test ./internal/controller/eviction/ -run TestDefaultEvictionHandler -v |
| 72 | +``` |
| 73 | + |
| 74 | +Focused unit test (Ginkgo suite in `internal/controller/` — entrypoint is `TestControllers`): |
| 75 | + |
| 76 | +```bash |
| 77 | +cd kubernetes |
| 78 | +go test ./internal/controller/ -run TestControllers -v -ginkgo.focus='Pool allocate' |
| 79 | +``` |
| 80 | + |
| 81 | +Build: |
| 82 | + |
| 83 | +```bash |
| 84 | +cd kubernetes |
| 85 | +make build |
| 86 | +``` |
| 87 | + |
| 88 | +Lint: |
| 89 | + |
| 90 | +```bash |
| 91 | +cd kubernetes |
| 92 | +make lint |
| 93 | +``` |
| 94 | + |
| 95 | +End-to-end tests (requires Kind and Docker): |
| 96 | + |
| 97 | +```bash |
| 98 | +cd kubernetes |
| 99 | +make test-e2e # full suite: core + task-executor + gVisor |
| 100 | +make test-e2e-main # core e2e only (test/e2e/) |
| 101 | +``` |
| 102 | + |
| 103 | +Run controller locally: |
| 104 | + |
| 105 | +```bash |
| 106 | +cd kubernetes |
| 107 | +make run |
| 108 | +``` |
| 109 | + |
| 110 | +Deploy via Kustomize: |
| 111 | + |
| 112 | +```bash |
| 113 | +cd kubernetes |
| 114 | +make deploy |
| 115 | +``` |
| 116 | + |
| 117 | +Deploy via Helm: |
| 118 | + |
| 119 | +```bash |
| 120 | +cd kubernetes |
| 121 | +make helm-install |
| 122 | +``` |
| 123 | + |
| 124 | +Regenerate CRD manifests and DeepCopy: |
| 125 | + |
| 126 | +```bash |
| 127 | +cd kubernetes |
| 128 | +make manifests generate |
| 129 | +``` |
| 130 | + |
| 131 | +## Architecture Overview |
| 132 | + |
| 133 | +Two controllers run inside the controller manager: |
| 134 | + |
| 135 | +1. **BatchSandboxReconciler**: Owns Pod objects. Handles pod scaling (non-pooled mode), pool allocation parsing, task scheduling, status updates, and expiry cleanup. |
| 136 | +2. **PoolReconciler**: Owns Pod objects and watches BatchSandbox objects. Handles pod allocation to sandboxes, pool scaling (buffer/pool min/max), rolling updates, eviction, and status. |
| 137 | + |
| 138 | +Allocation flow: PoolReconciler.Schedule → Allocator.Schedule → allocate/deallocate → PersistPoolAllocation → SyncSandboxAllocation (writes annotation to BatchSandbox). |
| 139 | + |
| 140 | +The task-executor is a separate binary that runs as an HTTP server inside sandbox pods. The BatchSandboxReconciler drives task scheduling through the in-process TaskScheduler, which dispatches task execution requests to the task-executor running inside those pods. |
| 141 | + |
| 142 | +## Guardrails |
| 143 | + |
| 144 | +Always: |
| 145 | + |
| 146 | +- Run `make manifests generate` after changing `apis/` types. |
| 147 | +- Run `make test` after controller or allocator changes. |
| 148 | +- Add focused regression tests for bug fixes in controller or allocator logic. |
| 149 | +- Keep reconciler logic idempotent: the same object may be reconciled repeatedly (retries, resyncs, controller restarts), so every pass must be safe to re-run. |
| 150 | +- Preserve annotation backward compatibility; add new fields rather than renaming existing ones. |
| 151 | +- Use envtest for unit tests; reserve Kind-based e2e for integration validation. |
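As an illustration of the idempotency guardrail above, the sketch below shows a helper that converges to the same result however many times it runs and reports whether a write is needed. `ensureFinalizer` and the finalizer name are hypothetical; they are not the helpers in `internal/utils/`.

```go
package main

import "fmt"

// ensureFinalizer is a hypothetical helper: it returns the finalizer list with
// name present exactly once, plus whether anything changed.
func ensureFinalizer(finalizers []string, name string) ([]string, bool) {
	for _, f := range finalizers {
		if f == name {
			return finalizers, false // already converged; no write needed
		}
	}
	// Copy before appending so the caller's slice is never mutated.
	out := make([]string, 0, len(finalizers)+1)
	out = append(out, finalizers...)
	return append(out, name), true
}

func main() {
	const name = "example.opensandbox.io/cleanup" // hypothetical finalizer name

	// Running the helper twice models two reconcile passes over one object.
	first, changed1 := ensureFinalizer(nil, name)
	second, changed2 := ensureFinalizer(first, name)

	fmt.Println(first, changed1)
	fmt.Println(second, changed2)
}
```

Returning a `changed` flag lets the caller skip the API update on the second pass, which keeps repeated reconciles cheap and avoids needless resource-version churn.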
| 152 | + |
| 153 | +Ask first: |
| 154 | + |
| 155 | +- Changing CRD spec fields (additive changes are fine; removal or renaming is breaking) |
| 156 | +- Changing annotation keys or JSON shapes |
| 157 | +- Changing pool allocation or scheduling semantics |
| 158 | +- Large reorganizations across `controller/`, `scheduler/`, and `task-executor/` |
| 159 | + |
| 160 | +Never: |
| 161 | + |
| 162 | +- Change annotation keys or JSON shapes without updating all readers and writers |
| 163 | +- Change CRD types without running `make manifests generate` |
| 164 | +- Put business logic directly in reconciler Reconcile() — delegate to helpers, strategies, or allocators |
| 165 | +- Mix unrelated controller changes into the same PR |