Commit 3170b08

Spground authored and Pangjiping committed
docs(k8s): Add docs to provide harness
1 parent 773ea85 commit 3170b08

6 files changed

Lines changed: 1264 additions & 275 deletions

kubernetes/AGENTS.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
# Kubernetes AGENTS

You are working on the OpenSandbox Kubernetes operator and task-executor. Treat CRD types and annotation contracts as public interfaces, and prefer additive, backward-compatible changes.

For detailed development setup, architecture deep-dive, coding standards, testing guide, and deployment workflows, see [DEVELOPMENT.md](./DEVELOPMENT.md).

## Scope

- `apis/`: CRD type definitions (BatchSandbox, Pool)
- `cmd/controller/`: controller manager entry point
- `cmd/task-executor/`: task-executor entry point
- `internal/controller/`: BatchSandbox and Pool reconcilers, allocator, eviction, update, and strategy logic
- `internal/scheduler/`: in-process task scheduler (assigns tasks to sandbox pods)
- `internal/task-executor/`: task execution runtime (process/container), manager, and HTTP server
- `internal/utils/`: shared helpers (pod, finalizer, field index, expectations, logging)
- `pkg/client/`: generated clientset, informer, and lister
- `pkg/task-executor/`: task-executor public types and config
- `config/`: Kustomize overlays, RBAC, CRD bases, samples
- `charts/opensandbox-controller/`: Helm chart for deployment
- `test/e2e/`: end-to-end tests (Kind-based)
- `test/e2e_task/`: task-executor e2e tests
- `test/e2e_runtime/`: runtime-class e2e tests (gVisor)
- `docs/`: design documents and troubleshooting guides

If the task changes CRD schemas in `apis/`, also run `make manifests` and `make generate` to keep CRD YAML and DeepCopy methods in sync.

For E2E test failure diagnosis, see [docs/E2E-TROUBLESHOOTING.md](./docs/E2E-TROUBLESHOOTING.md).

## Key Paths

- `apis/sandbox/v1alpha1/`: CRD Go types and source of truth for API shapes
- `internal/controller/batchsandbox_controller.go`: BatchSandbox reconciler (scale, pool alloc parsing, task scheduling, status)
- `internal/controller/pool_controller.go`: Pool reconciler (sandbox scheduling, scale, update, eviction, status)
- `internal/controller/allocator.go`: in-memory allocation store, annotation syncer, and default allocator
- `internal/controller/strategy/`: strategy interfaces and defaults (PoolStrategy, TaskSchedulingStrategy)
- `internal/controller/eviction/`: pod eviction handler interface and default
- `internal/controller/pool_update.go`: rolling update logic for pool pods
- `internal/scheduler/`: TaskScheduler interface and default implementation (task-to-pod assignment, recovery)
- `internal/task-executor/`: task-executor manager, runtime (process/container), HTTP handler

## Annotation Contracts

The controller communicates allocation state through annotations on BatchSandbox objects. These are treated as internal but stability-sensitive:

- `sandbox.opensandbox.io/alloc-status`: JSON `{"pods":["pod-1","pod-2"]}`, the current pod allocation
- `sandbox.opensandbox.io/alloc-release`: JSON `{"pods":["pod-3"]}`, the pods released back to the pool

Do not change annotation keys or JSON shapes without updating both the writer (`allocator.go`, `apis.go`) and all readers (`batchsandbox_controller.go`, `allocation_store_test.go`).
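
As a rough illustration of the contract above, the following Go sketch decodes both annotation values. The annotation keys and the `{"pods":[...]}` shape come from this document; the type name `PodList`, the helper `ParsePodList`, and the choice to treat a missing annotation as an empty allocation are assumptions made for the example and may not match the code in `allocator.go` or `apis.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation keys as listed in this document.
const (
	AnnoAllocStatus  = "sandbox.opensandbox.io/alloc-status"
	AnnoAllocRelease = "sandbox.opensandbox.io/alloc-release"
)

// PodList is a hypothetical name for the documented {"pods":[...]} shape;
// the real type in the repository may be named differently.
type PodList struct {
	Pods []string `json:"pods"`
}

// ParsePodList decodes one annotation value. A missing or empty annotation
// is treated as an empty allocation (an assumption of this sketch).
func ParsePodList(annotations map[string]string, key string) (PodList, error) {
	var out PodList
	raw, ok := annotations[key]
	if !ok || raw == "" {
		return out, nil
	}
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return out, fmt.Errorf("decode %s: %w", key, err)
	}
	return out, nil
}

func main() {
	annotations := map[string]string{
		// "futureField" stands in for a later additive change; encoding/json
		// ignores unknown fields by default, so this reader keeps working.
		AnnoAllocStatus:  `{"pods":["pod-1","pod-2"],"futureField":true}`,
		AnnoAllocRelease: `{"pods":["pod-3"]}`,
	}
	status, err := ParsePodList(annotations, AnnoAllocStatus)
	if err != nil {
		panic(err)
	}
	released, err := ParsePodList(annotations, AnnoAllocRelease)
	if err != nil {
		panic(err)
	}
	fmt.Println(status.Pods, released.Pods) // prints: [pod-1 pod-2] [pod-3]
}
```

The extra field in the first payload shows why additive changes are the safer kind: a reader built against the old shape still decodes the value, whereas renaming `pods` would silently produce an empty list.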

## Label Contracts

- `sandbox.opensandbox.io/pool-name`: labels pool-owned pods
- `sandbox.opensandbox.io/pool-revision`: revision hash for rolling updates
- `batch-sandbox.sandbox.opensandbox.io/pod-index`: pod index within a BatchSandbox
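
The labels above are typically consumed through label selectors. The sketch below, which uses only the standard library, builds an equality-based selector string from the documented label keys. The constant names, the `Selector` helper, and the sample values `demo-pool` and `abc123` are hypothetical; the repository may define its own constants and use the Kubernetes client libraries instead.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Label keys as listed in this document; the constant names are hypothetical.
const (
	LabelPoolName     = "sandbox.opensandbox.io/pool-name"
	LabelPoolRevision = "sandbox.opensandbox.io/pool-revision"
	LabelPodIndex     = "batch-sandbox.sandbox.opensandbox.io/pod-index"
)

// Selector renders an equality-based label selector ("k1=v1,k2=v2").
// Keys are sorted so the output is deterministic.
func Selector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Select pods of one pool at one revision (sample values only).
	sel := Selector(map[string]string{
		LabelPoolName:     "demo-pool",
		LabelPoolRevision: "abc123",
	})
	fmt.Println(sel)
}
```

The resulting string can be passed to `kubectl get pods -l` when inspecting which pods a pool owns during a rolling update.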

## Commands

Unit tests (envtest-based, uses Ginkgo/Gomega):

```bash
cd kubernetes
make setup-envtest
make test
```

Focused unit test (standard `testing` functions):

```bash
cd kubernetes
go test ./internal/controller/ -run TestAllocatorSchedule -v
go test ./internal/controller/eviction/ -run TestDefaultEvictionHandler -v
```

Focused unit test (Ginkgo suite in `internal/controller/`; the entrypoint is `TestControllers`):

```bash
cd kubernetes
go test ./internal/controller/ -run TestControllers -v -ginkgo.focus='Pool allocate'
```

Build:

```bash
cd kubernetes
make build
```

Lint:

```bash
cd kubernetes
make lint
```

End-to-end tests (requires Kind and Docker):

```bash
cd kubernetes
make test-e2e       # full suite: core + task-executor + gVisor
make test-e2e-main  # core e2e only (test/e2e/)
```

Run controller locally:

```bash
cd kubernetes
make run
```

Deploy via Kustomize:

```bash
cd kubernetes
make deploy
```

Deploy via Helm:

```bash
cd kubernetes
make helm-install
```

Regenerate CRD manifests and DeepCopy:

```bash
cd kubernetes
make manifests generate
```

## Architecture Overview

Two controllers run inside the controller manager:

1. **BatchSandboxReconciler**: Owns Pod objects. Handles pod scaling (non-pooled mode), pool allocation parsing, task scheduling, status updates, and expiry cleanup.
2. **PoolReconciler**: Owns Pod objects and watches BatchSandbox objects. Handles pod allocation to sandboxes, pool scaling (buffer/pool min/max), rolling updates, eviction, and status.

Allocation flow: PoolReconciler.Schedule → Allocator.Schedule → allocate/deallocate → PersistPoolAllocation → SyncSandboxAllocation (writes annotation to BatchSandbox).

The task-executor runs as a separate binary and in-pod HTTP server. The BatchSandboxReconciler drives task scheduling through the in-process TaskScheduler, which dispatches task execution requests to the task-executor running inside sandbox pods.
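
To make the allocation flow in this section more concrete, here is a deliberately simplified in-memory model of two of its steps: scheduling idle pool pods onto a sandbox, then writing the result into the `alloc-status` annotation. Everything except the annotation key and JSON shape is hypothetical: the `Sandbox` and `Allocator` types, their fields, and the first-come assignment policy are stand-ins, and the deallocate and `PersistPoolAllocation` steps are omitted. The real signatures in `allocator.go` are likely different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sandbox is a hypothetical stand-in for a BatchSandbox object.
type Sandbox struct {
	Name        string
	Replicas    int
	Annotations map[string]string
}

// Allocator is a hypothetical in-memory allocation store.
type Allocator struct {
	idle      []string            // pool pods not yet assigned
	allocated map[string][]string // sandbox name -> assigned pod names
}

func NewAllocator(idle []string) *Allocator {
	return &Allocator{idle: idle, allocated: map[string][]string{}}
}

// Schedule assigns idle pods, in order, until each sandbox holds Replicas
// pods or the pool runs out. Calling it again changes nothing once satisfied.
func (a *Allocator) Schedule(sandboxes []*Sandbox) {
	for _, sb := range sandboxes {
		for len(a.allocated[sb.Name]) < sb.Replicas && len(a.idle) > 0 {
			a.allocated[sb.Name] = append(a.allocated[sb.Name], a.idle[0])
			a.idle = a.idle[1:]
		}
	}
}

// SyncSandboxAllocation writes the current allocation into the sandbox's
// alloc-status annotation using the documented {"pods":[...]} shape.
func (a *Allocator) SyncSandboxAllocation(sb *Sandbox) error {
	pods := a.allocated[sb.Name]
	if pods == nil {
		pods = []string{} // encode as [] rather than null
	}
	raw, err := json.Marshal(map[string][]string{"pods": pods})
	if err != nil {
		return err
	}
	if sb.Annotations == nil {
		sb.Annotations = map[string]string{}
	}
	sb.Annotations["sandbox.opensandbox.io/alloc-status"] = string(raw)
	return nil
}

func main() {
	alloc := NewAllocator([]string{"pod-1", "pod-2", "pod-3"})
	sb := &Sandbox{Name: "demo", Replicas: 2}
	alloc.Schedule([]*Sandbox{sb})
	if err := alloc.SyncSandboxAllocation(sb); err != nil {
		panic(err)
	}
	fmt.Println(sb.Annotations["sandbox.opensandbox.io/alloc-status"])
	// prints: {"pods":["pod-1","pod-2"]}
}
```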

## Guardrails

Always:

- Run `make manifests generate` after changing `apis/` types.
- Run `make test` after controller or allocator changes.
- Add focused regression tests for bug fixes in controller or allocator logic.
- Keep reconciler logic idempotent: the same object can be reconciled repeatedly (requeues, resyncs, restarts), and both controllers may act on related objects at the same time.
- Preserve annotation backward compatibility; add new fields rather than renaming existing ones.
- Use envtest for unit tests; reserve Kind-based e2e for integration validation.
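
One common way to satisfy the idempotency guardrail above is to compute a plan from desired and observed state on every pass, so that a repeated pass over converged state yields an empty plan. The sketch below shows that pattern in plain Go; the `Plan`, `Diff`, and `Apply` names are hypothetical and are not taken from the controllers in this repository.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists the changes needed to move from actual to desired state.
type Plan struct {
	Create []string
	Delete []string
}

// Diff compares desired and actual pod names. It never mutates its inputs,
// so calling it repeatedly on converged state returns an empty plan.
func Diff(desired, actual []string) Plan {
	want := make(map[string]bool, len(desired))
	for _, n := range desired {
		want[n] = true
	}
	have := make(map[string]bool, len(actual))
	for _, n := range actual {
		have[n] = true
	}
	var p Plan
	for n := range want {
		if !have[n] {
			p.Create = append(p.Create, n)
		}
	}
	for n := range have {
		if !want[n] {
			p.Delete = append(p.Delete, n)
		}
	}
	sort.Strings(p.Create)
	sort.Strings(p.Delete)
	return p
}

// Apply returns the state that results from executing the plan.
func Apply(actual []string, p Plan) []string {
	drop := make(map[string]bool, len(p.Delete))
	for _, n := range p.Delete {
		drop[n] = true
	}
	var next []string
	for _, n := range actual {
		if !drop[n] {
			next = append(next, n)
		}
	}
	next = append(next, p.Create...)
	sort.Strings(next)
	return next
}

func main() {
	desired := []string{"pod-0", "pod-1", "pod-2"}
	actual := []string{"pod-1", "pod-9"}

	first := Diff(desired, actual)
	actual = Apply(actual, first)
	second := Diff(desired, actual) // second pass over converged state

	fmt.Println(first.Create, first.Delete)             // prints: [pod-0 pod-2] [pod-9]
	fmt.Println(len(second.Create), len(second.Delete)) // prints: 0 0
}
```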

Ask first:

- Changing CRD spec fields (additive changes are fine; removal or renaming is breaking)
- Changing annotation keys or JSON shapes
- Changing pool allocation or scheduling semantics
- Large reorganizations across `controller/`, `scheduler/`, and `task-executor/`

Never:

- Change annotation keys or JSON shapes without updating all readers and writers
- Change CRD types without running `make manifests generate`
- Put business logic directly in a reconciler's `Reconcile()` method; delegate to helpers, strategies, or allocators instead
- Mix unrelated controller changes into the same PR
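
The rule about keeping `Reconcile()` free of business logic can be pictured as a thin orchestration method that hands each decision to an injected strategy. The following sketch is illustrative only: `ScaleStrategy`, `bufferStrategy`, `PoolState`, and the buffer-plus-clamp sizing rule are assumptions invented for the example and do not describe the actual `PoolStrategy` interface in `internal/controller/strategy/`.

```go
package main

import "fmt"

// ScaleStrategy is a hypothetical interface; the real strategy interfaces
// in this repository may look quite different.
type ScaleStrategy interface {
	DesiredReplicas(inUse, buffer, minReplicas, maxReplicas int) int
}

// bufferStrategy keeps `buffer` spare pods beyond those in use and clamps
// the result to [minReplicas, maxReplicas]. This sizing rule is assumed.
type bufferStrategy struct{}

func (bufferStrategy) DesiredReplicas(inUse, buffer, minReplicas, maxReplicas int) int {
	want := inUse + buffer
	if want < minReplicas {
		want = minReplicas
	}
	if want > maxReplicas {
		want = maxReplicas
	}
	return want
}

// PoolState is a hypothetical snapshot of what the reconciler observed.
type PoolState struct {
	InUse, Buffer, Min, Max int
}

// Reconciler only orchestrates; the sizing decision lives in the strategy,
// which can be unit-tested without a cluster or envtest.
type Reconciler struct {
	Strategy ScaleStrategy
}

func (r Reconciler) Reconcile(s PoolState) int {
	return r.Strategy.DesiredReplicas(s.InUse, s.Buffer, s.Min, s.Max)
}

func main() {
	r := Reconciler{Strategy: bufferStrategy{}}
	// 3 in use + 2 buffer = 5, clamped to the maximum of 4.
	fmt.Println(r.Reconcile(PoolState{InUse: 3, Buffer: 2, Min: 1, Max: 4})) // prints: 4
}
```

Keeping the decision behind an interface also makes it straightforward to add the focused regression tests that the guardrails ask for.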
