Commit 544815a

Merge branch 'main' into fix/1530-mcp-tool-call-cpu-spin
2 parents fb5ecbd + 8d6340c commit 544815a

File tree

220 files changed

+28889
-2797
lines changed

.agents/skills/kagent

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
../../.claude/skills/kagent

.agents/skills/kagent-dev

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
../../.claude/skills/kagent-dev

.claude/skills/kagent-dev/SKILL.md

Lines changed: 6 additions & 2 deletions
@@ -18,6 +18,9 @@ make helm-install # Builds images and deploys to Kind
1818
make controller-manifests # generate + copy CRDs to helm (recommended)
1919
make -C go generate # DeepCopy methods only
2020

21+
# sqlc (after editing go/core/internal/database/queries/*.sql)
22+
cd go/core/internal/database && sqlc generate # regenerate gen/ — commit both
23+
2124
# Build & test
2225
make -C go test # Unit tests (includes golden file checks)
2326
make -C go e2e # E2E tests (needs KAGENT_URL)
@@ -43,7 +46,7 @@ kagent/
4346
│ ├── api/ # Shared types module
4447
│ │ ├── v1alpha2/ # Current CRD types (agent_types.go, etc.)
4548
│ │ ├── adk/ # ADK config types (types.go) — flows to Python runtime
46-
│ │ ├── database/ # GORM models
49+
│ │ ├── database/ # database models
4750
│ │ ├── httpapi/ # HTTP API types
4851
│ │ └── config/crd/bases/ # Generated CRD YAML
4952
│ ├── core/ # Infrastructure module
@@ -231,7 +234,7 @@ curl -v $KAGENT_URL/healthz # Controller reach
231234

232235
**Reproducing locally (without cluster):** Follow `go/core/test/e2e/README.md` — extract agent config, start mock LLM server, run agent with `kagent-adk test`. Much faster iteration than full cluster.
233236

234-
**CI-specific:** E2E runs in matrix (`sqlite` + `postgres`). If only one database variant fails, it's likely database-related. If both fail, it's infrastructure. Most common CI-only failure: mock LLM unreachability because `KAGENT_LOCAL_HOST` detection fails on Linux.
237+
**CI-specific:** Most common CI-only failure: mock LLM unreachability because `KAGENT_LOCAL_HOST` detection fails on Linux.
235238

236239
See `references/e2e-debugging.md` for comprehensive debugging techniques.
237240

@@ -349,3 +352,4 @@ Don't use Go template syntax (`{{ }}`) in doc comments — Helm will try to pars
349352
- `references/translator-guide.md` - Translator patterns, `deployments.go` and `adk_api_translator.go`
350353
- `references/e2e-debugging.md` - Comprehensive E2E debugging, local reproduction
351354
- `references/ci-failures.md` - CI failure patterns and fixes
355+
- `references/database-migrations.md` - Migration authoring rules, sqlc workflow, multi-instance safety, expand/contract pattern

.claude/skills/kagent-dev/references/ci-failures.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ Common GitHub Actions CI failures and how to fix them.
77
| Failure | Likely Cause | Quick Fix |
88
|---------|--------------|-----------|
99
| manifests-check | CRD manifests out of date | `make -C go generate && cp go/api/config/crd/bases/*.yaml helm/kagent-crds/templates/` |
10+
| sqlc-generate-check | `gen/` out of sync with queries | `cd go/core/internal/database && sqlc generate`, commit `gen/` |
1011
| go-lint depguard | Forbidden package used | Replace with allowed alternative (e.g., `slices.Sort` not `sort.Strings`) |
1112
| test-e2e timeout | Agent not starting or KAGENT_URL wrong | Check pod status, verify KAGENT_URL setup in CI |
1213
| golden files mismatch | Translator output changed | `UPDATE_GOLDEN=true make -C go test` and commit |
@@ -520,6 +521,7 @@ make init-git-hooks
520521
Before submitting PR:
521522

522523
- [ ] Ran `make -C go generate` after CRD changes
524+
- [ ] Ran `cd go/core/internal/database && sqlc generate` after query changes, committed `gen/`
523525
- [ ] Ran `make lint` and fixed issues
524526
- [ ] Ran `make -C go test` and all pass
525527
- [ ] Regenerated golden files if translator changed
Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
1+
# Database Migrations Guide
2+
3+
kagent uses [golang-migrate](https://github.com/golang-migrate/migrate) with embedded SQL files and [sqlc](https://sqlc.dev/) for type-safe query generation. Migrations run **in-app at startup** — the controller applies them before accepting traffic.
4+
5+
## Structure
6+
7+
```
8+
go/core/pkg/migrations/
9+
├── migrations.go # Embeds the FS (go:embed); exports FS for downstream consumers
10+
├── runner.go # RunUp (applies pending migrations at startup)
11+
├── core/ # Core schema (tracked in schema_migrations table)
12+
│ ├── 000001_initial.up.sql / .down.sql
13+
│ ├── 000002_add_session_source.up.sql / .down.sql
14+
│ └── ...
15+
└── vector/ # pgvector schema (tracked in vector_schema_migrations table)
16+
├── 000001_vector_support.up.sql / .down.sql
17+
└── ...
18+
19+
go/core/internal/database/
20+
├── queries/ # Hand-written SQL queries (source of truth)
21+
│ ├── sessions.sql
22+
│ ├── memory.sql
23+
│ └── ...
24+
├── gen/ # sqlc-generated Go code — DO NOT edit manually
25+
│ ├── db.go
26+
│ ├── models.go
27+
│ └── *.sql.go
28+
└── sqlc.yaml # sqlc configuration
29+
```
30+
31+
Migrations manage two independent tracks — `core` and `vector` — and roll back both if either fails. The `--database-vector-enabled` flag (default `true`) controls whether the vector track runs.
32+
33+
## sqlc Workflow
34+
35+
When you add or change a SQL query:
36+
37+
1. Edit (or add) a `.sql` file under `go/core/internal/database/queries/`
38+
2. Regenerate:
39+
```bash
40+
cd go/core/internal/database && sqlc generate
41+
```
42+
3. Commit both the query file and the updated `gen/` files together.
43+
44+
A CI check (`.github/workflows/sqlc-generate-check.yaml`) fails the PR if `gen/` is out of sync with the queries. Never edit `gen/` by hand.
45+
46+
**sqlc annotations used:**
47+
- `:one` — returns a single row
48+
- `:many` — returns a slice
49+
- `:exec` — returns only error (use for INSERT/UPDATE/DELETE that don't need the result)
50+
51+
## Writing Migrations
52+
53+
### Backward-compatible schema changes
54+
55+
During a rolling deploy, old pods will be reading and writing a schema that has already been upgraded. **Every migration must be backward-compatible with the previous version's code.**
56+
57+
| Change | Old code behavior | Safe? |
58+
|--------|------------------|-------|
59+
| Add nullable column | SELECT ignores it; INSERT omits it (goes NULL) | ✅ |
60+
| Add column with `DEFAULT x` | INSERT omits it; DB fills default | ✅ |
61+
| Add NOT NULL column **without** default | Old INSERT missing the column → error | ❌ |
62+
| Add index | Invisible to application code | ✅ |
63+
| Add foreign key | Old INSERT may fail constraint | ❌ |
64+
| Drop/rename column old code references | Old SELECT/INSERT errors | ❌ |
65+
| Change compatible type (e.g. `int` → `bigint`) | Usually fine | ⚠️ |
66+
67+
**Expand-then-contract pattern for schema changes:**
68+
1. **Version N (Expand)**: add the new column/table (nullable or with default); old code still works
69+
2. **Version N (Deploy)**: ship new code that uses the new structure
70+
3. **Version N+1 (Contract)**: drop the old column/table once version N is fully deployed and no pods run version N-1
71+
72+
### Idempotency and cross-track safety
73+
74+
All DDL statements must use `IF EXISTS` / `IF NOT EXISTS` guards:
75+
76+
```sql
77+
-- Up
78+
CREATE TABLE IF NOT EXISTS foo (...);
79+
ALTER TABLE foo ADD COLUMN IF NOT EXISTS bar TEXT;
80+
81+
-- Down
82+
ALTER TABLE IF EXISTS foo DROP COLUMN IF EXISTS bar;
83+
DROP TABLE IF EXISTS foo;
84+
```
85+
86+
Guards provide defense-in-depth for crash recovery and dirty-state cleanup, where a partially-applied migration may be re-run or rolled back.
87+
88+
### Naming
89+
90+
Files must follow `NNNNNN_description.up.sql` / `NNNNNN_description.down.sql` with zero-padded 6-digit sequence numbers.
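
The naming rule above can be checked locally with a small shell sketch. The helper name `check_migration_name` is hypothetical and not part of the repo, and the allowed description characters (lowercase letters, digits, underscores) are an assumption based on the file names shown in this guide.

```bash
# Hypothetical helper (not part of the repo): succeeds only when the file
# name matches NNNNNN_description.up.sql or NNNNNN_description.down.sql.
# Assumption: descriptions use lowercase letters, digits, and underscores.
check_migration_name() {
    # Strip any directory prefix such as core/ or vector/.
    name=$(basename "$1")
    printf '%s\n' "$name" | grep -Eq '^[0-9]{6}_[a-z0-9_]+\.(up|down)\.sql$'
}

for f in core/000002_add_session_source.up.sql core/2_add_session_source.up.sql; do
    if check_migration_name "$f"; then
        echo "ok:  $f"
    else
        echo "bad: $f"
    fi
done
```

The second name in the loop is rejected because its sequence number is not zero-padded to six digits.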
91+
92+
### Down migrations
93+
94+
Every `.up.sql` must have a corresponding `.down.sql` that exactly reverses it. Down migrations serve both manual rollbacks and the automatic rollback that runs when a migration fails. They must be **idempotent**, because the two-track rollback logic (roll back core if vector fails) may call them more than once in failure scenarios.
95+
96+
## Multi-Instance Safety
97+
98+
### How the advisory lock works
99+
100+
The migration runner acquires a PostgreSQL **session-level** advisory lock (`pg_advisory_lock`) before running.
101+
102+
### Rolling deploy concurrency
103+
104+
If multiple pods start simultaneously (e.g., rolling deploy with replicas > 1):
105+
1. One controller acquires the advisory lock and runs migrations.
106+
2. Others block on `pg_advisory_lock`.
107+
3. When the winner finishes and its connection closes, the next waiter acquires the lock, calls `Up()`, gets `ErrNoChange`, and exits immediately.
108+
109+
This is safe. The only risk is if the winning controller crashes mid-migration (see Dirty state recovery below).
110+
111+
### Dirty state recovery
112+
113+
If the controller crashes mid-migration, the migration runner records the version as `dirty = true` in the tracking table. The next startup detects dirty state and calls `rollbackToVersion`, which:
114+
1. Calls `mg.Force(version - 1)` to clear the dirty flag.
115+
2. Runs the down migration to restore the previous clean state.
116+
3. Re-runs the failed up migration.
117+
118+
**Requirement**: down migrations must be idempotent and correctly reverse their up migration. A missing or broken down migration requires manual recovery.
119+
120+
### Rollout strategy
121+
122+
For backward-compatible migrations a rolling update is safe:
123+
124+
1. New pod starts → migration runner applies pending migrations (advisory lock serializes concurrent runs)
125+
2. New pod passes readiness probe → old pod terminates
126+
3. Backward-compatible schema means old pods continue operating during the window
127+
128+
For a migration that is **not** backward-compatible, restructure it using the expand-then-contract pattern (add new column/table in version N, ship code that uses it, drop the old column in version N+1).
129+
130+
## Static Analysis Enforcement
131+
132+
The policies above are enforced by static analysis tests in `go/core/pkg/migrations/cross_track_test.go`. These run against the embedded SQL files — no database required.
133+
134+
| Test | What it enforces |
135+
|------|-----------------|
136+
| `TestNoCrossTrackDDL` | No track may `ALTER TABLE` or `CREATE INDEX ON` a table owned by another track |
137+
| `TestMigrationGuards` | Up migrations must use `IF NOT EXISTS` on all `CREATE`/`ADD COLUMN`; down migrations must use `IF EXISTS` on all `DROP` statements |
138+
139+
**Adding a new track**: add the track directory name to the `tracks` slice in each test so the new track is covered by the same checks.
140+
141+
These tests catch policy violations at PR time without needing a running database. They complement the integration tests in `runner_test.go`, which verify the runner's rollback and concurrency behavior against a real Postgres instance.
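
For a quick check before pushing, the guard policy for up migrations can be roughly approximated with `grep`. This is only a sketch under stated assumptions: it is not the logic of `TestMigrationGuards`, the helper name `check_up_guards` is hypothetical, and it only handles one statement per line.

```bash
# Rough grep-based approximation (an assumption, not the repo's Go test):
# report CREATE TABLE / CREATE INDEX / ADD COLUMN lines in an up migration
# that lack an IF NOT EXISTS guard. Only handles one statement per line.
check_up_guards() {
    # The pipeline exits 0 only when grep -v printed an unguarded line.
    if grep -Ei 'CREATE +(UNIQUE +)?(TABLE|INDEX)|ADD +COLUMN' "$1" \
        | grep -Eiv 'IF +NOT +EXISTS'; then
        return 1
    fi
    return 0
}

tmp=$(mktemp)
printf '%s\n' \
    'CREATE TABLE IF NOT EXISTS foo (id INT);' \
    'ALTER TABLE foo ADD COLUMN bar TEXT;' > "$tmp"
check_up_guards "$tmp" || echo "unguarded DDL found"
rm -f "$tmp"
```

The helper prints each offending line and returns non-zero, so it can be chained in a local script. The Go tests remain the authoritative check.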
142+
143+
## Downstream Extension Model
144+
145+
The migration layer is designed for downstream consumers to extend with their own migrations alongside OSS. The extension points are:
146+
147+
1. **SQL files as the contract.** The migration files in `go/core/pkg/migrations/core/` and `vector/` are the stable interface. Downstream consumers sync these files into their own repos and build their own migration runners. Don't move or reorganize migration file paths without considering downstream impact.
148+
149+
2. **`MigrationRunner` DI callback.** Downstream consumers pass a custom `MigrationRunner` to `app.Start` to take full ownership of the migration process — running OSS migrations alongside their own in whatever order they need. The signature `func(ctx context.Context, url string, vectorEnabled bool) error` is stable.
150+
151+
3. **Vector track stays separate.** The vector track is conditionally applied and has its own tracking table. Downstream extensions should not modify vector-owned tables (enforced by `TestNoCrossTrackDDL`).
152+
153+
### What this means for OSS development
154+
155+
- **Migration immutability is cross-repo.** Once a migration file is merged and tagged, downstream consumers may have synced it. Modifying it breaks their tracking table state.
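
To catch an accidental edit before opening a PR, the same idea as the Migration Immutability workflow can be run locally. The helper below is a hypothetical sketch, not part of the repo; it assumes it is run from the repository root and that the caller supplies a suitable base commit.

```bash
# Hypothetical local pre-push helper (not part of the repo): print .sql
# migration files modified (M) or renamed (R) between a base ref and HEAD.
# Newly added files are not listed, matching the CI rule.
modified_migrations() {
    git diff --name-only --diff-filter=MR "$1" HEAD \
        -- 'go/core/pkg/migrations/**/*.sql'
}

# Usage from the repo root; origin/main as the base is an assumption.
# changed=$(modified_migrations "$(git merge-base HEAD origin/main)")
# [ -z "$changed" ] || { echo "immutable files changed:"; echo "$changed"; }
```

An empty result means only new migration files were added, which is the allowed case.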

.github/workflows/ci.yaml

Lines changed: 22 additions & 4 deletions
@@ -15,6 +15,7 @@ env:
1515
# Cache key components for better organization
1616
CACHE_KEY_PREFIX: kagent-v2
1717
BRANCH_CACHE_KEY: ${{ github.head_ref || github.ref_name }}
18+
AGENT_SANDBOX_VERSION: v0.3.10
1819
# Consistent builder configuration
1920
BUILDX_BUILDER_NAME: kagent-builder-v0.23.0
2021
BUILDX_VERSION: v0.23.0
@@ -66,6 +67,17 @@ jobs:
6667
with:
6768
install_only: true
6869

70+
- name: Create Kind cluster
71+
run: |
72+
make create-kind-cluster
73+
74+
- name: Install agent-sandbox
75+
run: |
76+
kubectl apply -f "https://github.com/kubernetes-sigs/agent-sandbox/releases/download/${AGENT_SANDBOX_VERSION}/manifest.yaml"
77+
kubectl wait --for=condition=Established crd/sandboxes.agents.x-k8s.io --timeout=90s
78+
kubectl rollout status deployment/agent-sandbox-controller -n agent-sandbox-system --timeout=120s
79+
kubectl wait --for=condition=Ready pod -l app=agent-sandbox-controller -n agent-sandbox-system --timeout=120s
80+
6981
- name: Install Kagent
7082
id: install-kagent
7183
env:
@@ -79,10 +91,11 @@ jobs:
7991
--platform=linux/amd64
8092
--push
8193
run: |
82-
make create-kind-cluster
8394
echo "Cache key: ${{ needs.setup.outputs.cache-key }}"
8495
make helm-install
8596
make push-test-agent push-test-skill
97+
kubectl rollout status deployment/kagent-controller -n kagent --timeout=120s
98+
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/component=controller -n kagent --timeout=120s
8699
kubectl wait --for=condition=Ready agents.kagent.dev -n kagent --all --timeout=60s || kubectl get po -n kagent -o wide ||:
87100
kubectl wait --for=condition=Ready agents.kagent.dev -n kagent --all --timeout=60s
88101
@@ -113,15 +126,15 @@ jobs:
113126
run: |
114127
# Upgrade helm to use namespace-scoped RBAC
115128
make helm-install-provider
116-
129+
117130
# Wait for controller to be ready after upgrade
118131
kubectl rollout status deployment/kagent-controller -n kagent --timeout=90s
119-
132+
120133
# Setup environment variables (reusing logic from previous step)
121134
HOST_IP=$(docker network inspect kind -f '{{range .IPAM.Config}}{{if .Gateway}}{{.Gateway}}{{"\n"}}{{end}}{{end}}' | grep -E '^[0-9]+\.' | head -1)
122135
export KAGENT_LOCAL_HOST=$HOST_IP
123136
export KAGENT_URL="http://$(kubectl get svc -n kagent kagent-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}'):8083"
124-
137+
125138
# Run critical tests with namespace-scoped RBAC to verify the controller didn't lose needed permissions
126139
cd go
127140
go test -v github.com/kagent-dev/kagent/go/core/test/e2e -run '^TestE2EInvokeInlineAgent$|^TestE2EInvokeDeclarativeAgentWithMcpServerTool$' -failfast
@@ -131,6 +144,10 @@ jobs:
131144
echo "::error::Failed to run e2e tests"
132145
echo "::error::Kubectl get pods -n kagent"
133146
kubectl describe pods -n kagent
147+
echo "::error::Kubectl get pods -n agent-sandbox-system"
148+
kubectl get pods -n agent-sandbox-system -o wide || true
149+
echo "::error::Kubectl logs -n agent-sandbox-system deployment/agent-sandbox-controller"
150+
kubectl logs -n agent-sandbox-system deployment/agent-sandbox-controller || true
134151
echo "::error::Kubectl get events -n kagent"
135152
kubectl get events -n kagent
136153
echo "::error::Kubectl get agents -n kagent"
@@ -248,6 +265,7 @@ jobs:
248265
- app
249266
- cli
250267
- golang-adk
268+
- golang-adk-full
251269
- skills-init
252270
runs-on: ubuntu-latest
253271
services:

.github/workflows/image-scan.yaml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ jobs:
3131
- app
3232
- skills-init
3333
- golang-adk
34+
- golang-adk-full
3435
runs-on: ubuntu-latest
3536
services:
3637
registry:
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
name: Migration Immutability
2+
3+
on:
4+
pull_request:
5+
branches: [main]
6+
paths:
7+
- "go/core/pkg/migrations/**"
8+
9+
jobs:
10+
check:
11+
runs-on: ubuntu-latest
12+
steps:
13+
- uses: actions/checkout@v4
14+
with:
15+
fetch-depth: 0
16+
17+
- name: Fail if any existing migration file was modified
18+
run: |
19+
# List files under go/core/pkg/migrations/ that were changed relative
20+
# to the merge base of this PR. We only care about modifications (M)
21+
# and renames (R); additions (A) are fine.
22+
BASE=$(git merge-base HEAD origin/${{ github.base_ref }})
23+
MODIFIED=$(git diff --name-only --diff-filter=MR "$BASE" HEAD \
24+
-- 'go/core/pkg/migrations/**/*.sql')
25+
26+
if [ -n "$MODIFIED" ]; then
27+
echo "ERROR: The following migration files were modified."
28+
echo "Migration files are immutable once merged."
29+
echo "Fix bugs with a new migration instead."
30+
echo ""
31+
echo "$MODIFIED"
32+
exit 1
33+
fi
34+
35+
echo "OK: no existing migration files were modified."
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
name: sqlc Generate Check
2+
3+
on:
4+
pull_request:
5+
branches: [main]
6+
paths:
7+
- "go/core/internal/database/queries/**"
8+
- "go/core/internal/database/sqlc.yaml"
9+
- "go/core/pkg/migrations/**"
10+
11+
jobs:
12+
check:
13+
runs-on: ubuntu-latest
14+
steps:
15+
- uses: actions/checkout@v4
16+
17+
- uses: actions/setup-go@v6
18+
with:
19+
go-version: "1.26"
20+
cache: true
21+
cache-dependency-path: go/go.sum
22+
23+
- name: Run sqlc generate
24+
working-directory: go
25+
run: make sqlc-generate
26+
27+
- name: Fail if generated files differ
28+
run: |
29+
if ! git diff --quiet go/core/internal/database/gen/; then
30+
echo "ERROR: sqlc generate produced changes. Run sqlc generate locally and commit the result."
31+
echo ""
32+
git diff go/core/internal/database/gen/
33+
exit 1
34+
fi
35+
echo "OK: generated files are up to date."

.github/workflows/tag.yaml

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ jobs:
2121
- ui
2222
- app
2323
- golang-adk
24+
- golang-adk-full
2425
- skills-init
2526
runs-on: ubuntu-latest
2627
permissions:
