Commit 7a61618

Merge branch 'main' into support-sap-gen-ai-hub
Signed-off-by: Lis <1205913055@qq.com>
2 parents 1cfa50a + 256d11a commit 7a61618

107 files changed

Lines changed: 7485 additions & 1856 deletions

.claude/skills/kagent-dev/SKILL.md

Lines changed: 6 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -18,6 +18,9 @@ make helm-install # Builds images and deploys to Kind
1818
make controller-manifests # generate + copy CRDs to helm (recommended)
1919
make -C go generate # DeepCopy methods only
2020

21+
# sqlc (after editing go/core/internal/database/queries/*.sql)
22+
cd go/core/internal/database && sqlc generate # regenerate gen/ — commit both
23+
2124
# Build & test
2225
make -C go test # Unit tests (includes golden file checks)
2326
make -C go e2e # E2E tests (needs KAGENT_URL)
@@ -43,7 +46,7 @@ kagent/
4346
│ ├── api/ # Shared types module
4447
│ │ ├── v1alpha2/ # Current CRD types (agent_types.go, etc.)
4548
│ │ ├── adk/ # ADK config types (types.go) — flows to Python runtime
46-
│ │ ├── database/ # GORM models
49+
│ │ ├── database/ # database models
4750
│ │ ├── httpapi/ # HTTP API types
4851
│ │ └── config/crd/bases/ # Generated CRD YAML
4952
│ ├── core/ # Infrastructure module
@@ -231,7 +234,7 @@ curl -v $KAGENT_URL/healthz # Controller reach
231234

232235
**Reproducing locally (without cluster):** Follow `go/core/test/e2e/README.md` — extract agent config, start mock LLM server, run agent with `kagent-adk test`. Much faster iteration than full cluster.
233236

234-
**CI-specific:** E2E runs in matrix (`sqlite` + `postgres`). If only one database variant fails, it's likely database-related. If both fail, it's infrastructure. Most common CI-only failure: mock LLM unreachability because `KAGENT_LOCAL_HOST` detection fails on Linux.
237+
**CI-specific:** Most common CI-only failure: mock LLM unreachability because `KAGENT_LOCAL_HOST` detection fails on Linux.
235238

236239
See `references/e2e-debugging.md` for comprehensive debugging techniques.
237240

@@ -349,3 +352,4 @@ Don't use Go template syntax (`{{ }}`) in doc comments — Helm will try to pars
349352
- `references/translator-guide.md` - Translator patterns, `deployments.go` and `adk_api_translator.go`
350353
- `references/e2e-debugging.md` - Comprehensive E2E debugging, local reproduction
351354
- `references/ci-failures.md` - CI failure patterns and fixes
355+
- `references/database-migrations.md` - Migration authoring rules, sqlc workflow, multi-instance safety, expand/contract pattern

.claude/skills/kagent-dev/references/ci-failures.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ Common GitHub Actions CI failures and how to fix them.
77
| Failure | Likely Cause | Quick Fix |
88
|---------|--------------|-----------|
99
| manifests-check | CRD manifests out of date | `make -C go generate && cp go/api/config/crd/bases/*.yaml helm/kagent-crds/templates/` |
10+
| sqlc-generate-check | `gen/` out of sync with queries | `cd go/core/internal/database && sqlc generate`, commit `gen/` |
1011
| go-lint depguard | Forbidden package used | Replace with allowed alternative (e.g., `slices.Sort` not `sort.Strings`) |
1112
| test-e2e timeout | Agent not starting or KAGENT_URL wrong | Check pod status, verify KAGENT_URL setup in CI |
1213
| golden files mismatch | Translator output changed | `UPDATE_GOLDEN=true make -C go test` and commit |
@@ -520,6 +521,7 @@ make init-git-hooks
520521
Before submitting PR:
521522

522523
- [ ] Ran `make -C go generate` after CRD changes
524+
- [ ] Ran `cd go/core/internal/database && sqlc generate` after query changes, committed `gen/`
523525
- [ ] Ran `make lint` and fixed issues
524526
- [ ] Ran `make -C go test` and all pass
525527
- [ ] Regenerated golden files if translator changed
Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
1+
# Database Migrations Guide
2+
3+
kagent uses [golang-migrate](https://github.com/golang-migrate/migrate) with embedded SQL files and [sqlc](https://sqlc.dev/) for type-safe query generation. Migrations run **in-app at startup** — the controller applies them before accepting traffic.
4+
5+
## Structure
6+
7+
```
8+
go/core/pkg/migrations/
9+
├── migrations.go # Embeds the FS (go:embed); exports FS for downstream consumers
10+
├── runner.go # RunUp (applies pending migrations at startup)
11+
├── core/ # Core schema (tracked in schema_migrations table)
12+
│ ├── 000001_initial.up.sql / .down.sql
13+
│ ├── 000002_add_session_source.up.sql / .down.sql
14+
│ └── ...
15+
└── vector/ # pgvector schema (tracked in vector_schema_migrations table)
16+
├── 000001_vector_support.up.sql / .down.sql
17+
└── ...
18+
19+
go/core/internal/database/
20+
├── queries/ # Hand-written SQL queries (source of truth)
21+
│ ├── sessions.sql
22+
│ ├── memory.sql
23+
│ └── ...
24+
├── gen/ # sqlc-generated Go code — DO NOT edit manually
25+
│ ├── db.go
26+
│ ├── models.go
27+
│ └── *.sql.go
28+
└── sqlc.yaml # sqlc configuration
29+
```
30+
31+
Migrations manage two independent tracks — `core` and `vector` — and roll back both if either fails. The `--database-vector-enabled` flag (default `true`) controls whether the vector track runs.
32+
33+
## sqlc Workflow
34+
35+
When you add or change a SQL query:
36+
37+
1. Edit (or add) a `.sql` file under `go/core/internal/database/queries/`
38+
2. Regenerate:
39+
```bash
40+
cd go/core/internal/database && sqlc generate
41+
```
42+
3. Commit both the query file and the updated `gen/` files together.
43+
44+
A CI check (`.github/workflows/sqlc-generate-check.yaml`) fails the PR if `gen/` is out of sync with the queries. Never edit `gen/` by hand.
45+
46+
**sqlc annotations used:**
47+
- `:one` — returns a single row
48+
- `:many` — returns a slice
49+
- `:exec` — returns only error (use for INSERT/UPDATE/DELETE that don't need the result)
50+
51+
## Writing Migrations
52+
53+
### Backward-compatible schema changes
54+
55+
During a rolling deploy, old pods will be reading and writing a schema that has already been upgraded. **Every migration must be backward-compatible with the previous version's code.**
56+
57+
| Change | Old code behavior | Safe? |
58+
|--------|------------------|-------|
59+
| Add nullable column | SELECT ignores it; INSERT omits it (goes NULL) | ✅ |
60+
| Add column with `DEFAULT x` | INSERT omits it; DB fills default | ✅ |
61+
| Add NOT NULL column **without** default | Old INSERT missing the column → error | ❌ |
62+
| Add index | Invisible to application code | ✅ |
63+
| Add foreign key | Old INSERT may fail constraint | ❌ |
64+
| Drop/rename column old code references | Old SELECT/INSERT errors | ❌ |
65+
| Change compatible type (e.g. `int` → `bigint`) | Usually fine | ⚠️ |
66+
67+
**Expand-then-contract pattern for schema changes:**
68+
1. **Version N (Expand)**: add the new column/table (nullable or with default); old code still works
69+
2. **Version N (Deploy)**: ship new code that uses the new structure
70+
3. **Version N+1 (Contract)**: drop the old column/table once version N is fully deployed and no pods run version N-1
71+
72+
### Idempotency and cross-track safety
73+
74+
All DDL statements must use `IF EXISTS` / `IF NOT EXISTS` guards:
75+
76+
```sql
77+
-- Up
78+
CREATE TABLE IF NOT EXISTS foo (...);
79+
ALTER TABLE foo ADD COLUMN IF NOT EXISTS bar TEXT;
80+
81+
-- Down
82+
DROP TABLE IF EXISTS foo;
83+
ALTER TABLE foo DROP COLUMN IF EXISTS bar;
84+
```
85+
86+
Guards provide defense-in-depth for crash recovery and dirty-state cleanup, where a partially-applied migration may be re-run or rolled back.
87+
88+
### Naming
89+
90+
Files must follow `NNNNNN_description.up.sql` / `NNNNNN_description.down.sql` with zero-padded 6-digit sequence numbers.
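
The naming rule can be checked mechanically. The following is a minimal sketch, not code from this commit: `validMigrationName` is a hypothetical helper, and the set of characters allowed in the description part (lowercase letters, digits, underscores) is an assumption, since the guide only fixes the overall shape.

```go
package main

import (
	"fmt"
	"regexp"
)

// migrationName matches NNNNNN_description.up.sql and NNNNNN_description.down.sql.
// The description character set is an assumption made for this sketch.
var migrationName = regexp.MustCompile(`^\d{6}_[a-z0-9_]+\.(up|down)\.sql$`)

// validMigrationName reports whether name follows the naming rule.
// It is a hypothetical helper, not part of kagent.
func validMigrationName(name string) bool {
	return migrationName.MatchString(name)
}

func main() {
	for _, name := range []string{
		"000002_add_session_source.up.sql", // zero-padded, has a direction
		"2_add_session_source.up.sql",      // sequence number is not zero-padded
	} {
		fmt.Printf("%s valid=%v\n", name, validMigrationName(name))
	}
}
```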
91+
92+
### Down migrations
93+
94+
Every `.up.sql` must have a corresponding `.down.sql` that exactly reverses it. Down migrations serve both manual rollbacks and the automatic rollback that runs when a migration fails. They must be **idempotent**, because the two-track rollback logic (roll back core if vector fails) may call them more than once in failure scenarios.
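
The up/down pairing requirement can also be verified from file names alone. This is a sketch under the assumption that the caller already has the list of file names in one track directory; `missingDowns` is a hypothetical helper and does not exist in the repository.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// missingDowns returns every *.up.sql in names that has no matching
// *.down.sql with the same prefix. Hypothetical helper for illustration.
func missingDowns(names []string) []string {
	have := make(map[string]bool, len(names))
	for _, n := range names {
		have[n] = true
	}
	var missing []string
	for _, n := range names {
		base, isUp := strings.CutSuffix(n, ".up.sql")
		if isUp && !have[base+".down.sql"] {
			missing = append(missing, n)
		}
	}
	slices.Sort(missing)
	return missing
}

func main() {
	files := []string{
		"000001_initial.up.sql",
		"000001_initial.down.sql",
		"000002_add_session_source.up.sql", // no down file in this list
	}
	fmt.Println(missingDowns(files))
}
```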
95+
96+
## Multi-Instance Safety
97+
98+
### How the advisory lock works
99+
100+
The migration runner acquires a PostgreSQL **session-level** advisory lock (`pg_advisory_lock`) before running.
101+
102+
### Rolling deploy concurrency
103+
104+
If multiple pods start simultaneously (e.g., rolling deploy with replicas > 1):
105+
1. One controller acquires the advisory lock and runs migrations.
106+
2. Others block on `pg_advisory_lock`.
107+
3. When the winner finishes and its connection closes, the next waiter acquires the lock, calls `Up()`, gets `ErrNoChange`, and exits immediately.
108+
109+
This is safe. The only risk is if the winning controller crashes mid-migration (see "Dirty state recovery" below).
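
The serialization described above can be sketched as follows. This is an illustration only, not the runner's actual code: `withAdvisoryLock`, the `execer` interface, the `recorder` stand-in, and the lock key `42` are all hypothetical. One deliberate difference is that the sketch releases the lock explicitly with `pg_advisory_unlock`, whereas the guide describes release when the winner's connection closes. Because a session-level lock belongs to a single connection, real code would pass a pinned `*sql.Conn`, which satisfies `execer`.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// execer is the subset of *sql.Conn this sketch needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// withAdvisoryLock blocks until the lock identified by key is held, runs fn,
// then releases the lock. key is a hypothetical application-chosen constant.
func withAdvisoryLock(ctx context.Context, conn execer, key int64, fn func() error) (err error) {
	if _, err = conn.ExecContext(ctx, "SELECT pg_advisory_lock($1)", key); err != nil {
		return fmt.Errorf("acquire advisory lock: %w", err)
	}
	defer func() {
		// Use a fresh context so the unlock still runs if ctx was cancelled.
		_, unlockErr := conn.ExecContext(context.Background(), "SELECT pg_advisory_unlock($1)", key)
		err = errors.Join(err, unlockErr)
	}()
	return fn()
}

// recorder is a stand-in connection that records statements, so the sketch
// runs without a database.
type recorder struct{ queries []string }

func (r *recorder) ExecContext(_ context.Context, q string, _ ...any) (sql.Result, error) {
	r.queries = append(r.queries, q)
	return nil, nil
}

// demo runs a fake migration step under the lock and returns the statement order.
func demo() ([]string, error) {
	rec := &recorder{}
	err := withAdvisoryLock(context.Background(), rec, 42, func() error {
		rec.queries = append(rec.queries, "-- migrations run here")
		return nil
	})
	return rec.queries, err
}

func main() {
	queries, err := demo()
	fmt.Println(queries, err)
}
```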
110+
111+
### Dirty state recovery
112+
113+
If the controller crashes mid-migration, the version stays marked `dirty = true` in the tracking table (golang-migrate sets the flag before applying a migration and clears it only on success). The next startup detects the dirty state and calls `rollbackToVersion`, which:
114+
1. Calls `mg.Force(version - 1)` to clear the dirty flag.
115+
2. Runs the down migration to restore the previous clean state.
116+
3. Re-runs the failed up migration.
117+
118+
**Requirement**: down migrations must be idempotent and correctly reverse their up migration. A missing or broken down migration requires manual recovery.
119+
120+
### Rollout strategy
121+
122+
For backward-compatible migrations a rolling update is safe:
123+
124+
1. New pod starts → migration runner applies pending migrations (advisory lock serializes concurrent runs)
125+
2. New pod passes readiness probe → old pod terminates
126+
3. Backward-compatible schema means old pods continue operating during the window
127+
128+
For a migration that is **not** backward-compatible, restructure it using the expand-then-contract pattern (add new column/table in version N, ship code that uses it, drop the old column in version N+1).
129+
130+
## Static Analysis Enforcement
131+
132+
The policies above are enforced by static analysis tests in `go/core/pkg/migrations/cross_track_test.go`. These run against the embedded SQL files — no database required.
133+
134+
| Test | What it enforces |
135+
|------|-----------------|
136+
| `TestNoCrossTrackDDL` | No track may `ALTER TABLE` or `CREATE INDEX ON` a table owned by another track |
137+
| `TestMigrationGuards` | Up migrations must use `IF NOT EXISTS` on all `CREATE`/`ADD COLUMN`; down migrations must use `IF EXISTS` on all `DROP` statements |
138+
139+
**Adding a new track**: add the track directory name to the `tracks` slice in each test so the new track is covered by the same checks.
140+
141+
These tests catch policy violations at PR time without needing a running database. They complement the integration tests in `runner_test.go`, which verify the runner's rollback and concurrency behavior against a real Postgres instance.
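
To give a feel for what a guard check on up migrations involves, here is a simplified sketch. It is not the implementation in `cross_track_test.go`: `unguardedUp` is a hypothetical function, it only looks at `CREATE TABLE`, `CREATE [UNIQUE] INDEX`, and `ADD COLUMN`, and it does not strip SQL comments or handle forms such as `CREATE INDEX CONCURRENTLY`.

```go
package main

import (
	"fmt"
	"regexp"
)

// guardable captures a guardable DDL keyword (group 1) and, when present,
// the IF NOT EXISTS guard that follows it (group 2).
var guardable = regexp.MustCompile(
	`(?i)\b(CREATE\s+(?:UNIQUE\s+)?(?:TABLE|INDEX)|ADD\s+COLUMN)(\s+IF\s+NOT\s+EXISTS\b)?`)

// unguardedUp returns the DDL keywords in an up migration that lack an
// IF NOT EXISTS guard. Hypothetical helper for illustration only.
func unguardedUp(sqlText string) []string {
	var out []string
	for _, m := range guardable.FindAllStringSubmatch(sqlText, -1) {
		if m[2] == "" {
			out = append(out, m[1])
		}
	}
	return out
}

func main() {
	up := `CREATE TABLE IF NOT EXISTS foo (id INT);
ALTER TABLE foo ADD COLUMN bar TEXT;`
	fmt.Println(unguardedUp(up))
}
```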
142+
143+
## Downstream Extension Model
144+
145+
The migration layer is designed for downstream consumers to extend with their own migrations alongside OSS. The extension points are:
146+
147+
1. **SQL files as the contract.** The migration files in `go/core/pkg/migrations/core/` and `vector/` are the stable interface. Downstream consumers sync these files into their own repos and build their own migration runners. Don't move or reorganize migration file paths without considering downstream impact.
148+
149+
2. **`MigrationRunner` DI callback.** Downstream consumers pass a custom `MigrationRunner` to `app.Start` to take full ownership of the migration process — running OSS migrations alongside their own in whatever order they need. The signature `func(ctx context.Context, url string, vectorEnabled bool) error` is stable.
150+
151+
3. **Vector track stays separate.** The vector track is conditionally applied and has its own tracking table. Downstream extensions should not modify vector-owned tables (enforced by `TestNoCrossTrackDDL`).
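
As an illustration of the callback in point 2, a downstream consumer could compose several runners behind the one signature. This is a sketch under assumptions: the `MigrationRunner` type is redeclared locally so the example compiles on its own, `chain` is a hypothetical helper, and how the result is handed to `app.Start` is not shown because the source does not specify it.

```go
package main

import (
	"context"
	"fmt"
)

// MigrationRunner mirrors the callback signature quoted above. It is declared
// locally for this sketch; real code would use the type kagent exports.
type MigrationRunner func(ctx context.Context, url string, vectorEnabled bool) error

// chain returns a runner that invokes each runner in order and stops at the
// first error. chain is a hypothetical helper, not part of kagent.
func chain(runners ...MigrationRunner) MigrationRunner {
	return func(ctx context.Context, url string, vectorEnabled bool) error {
		for i, run := range runners {
			if err := run(ctx, url, vectorEnabled); err != nil {
				return fmt.Errorf("migration runner %d: %w", i, err)
			}
		}
		return nil
	}
}

// demo composes two fake runners and returns the order in which they ran.
func demo() ([]string, error) {
	var order []string
	oss := func(context.Context, string, bool) error {
		order = append(order, "oss")
		return nil
	}
	downstream := func(context.Context, string, bool) error {
		order = append(order, "downstream")
		return nil
	}
	err := chain(oss, downstream)(context.Background(), "postgres://example", true)
	return order, err
}

func main() {
	order, err := demo()
	fmt.Println(order, err)
}
```

Running the OSS migrations first is only one possible ordering; the guide leaves the order to the downstream consumer.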
152+
153+
### What this means for OSS development
154+
155+
- **Migration immutability is cross-repo.** Once a migration file is merged and tagged, downstream consumers may have synced it. Modifying it breaks their tracking table state.

.github/workflows/ci.yaml

Lines changed: 5 additions & 5 deletions
@@ -56,7 +56,7 @@ jobs:
5656
driver-opts: network=host
5757

5858
- name: Set up Helm
59-
uses: azure/setup-helm@v4.2.0
59+
uses: azure/setup-helm@v5.0.0
6060
with:
6161
version: v3.18.0
6262

@@ -164,7 +164,7 @@ jobs:
164164
uses: actions/checkout@v6
165165

166166
- name: Set up Helm
167-
uses: azure/setup-helm@v4.2.0
167+
uses: azure/setup-helm@v5.0.0
168168
with:
169169
version: v3.18.0
170170
# Install unittest plugin
@@ -229,7 +229,7 @@ jobs:
229229
- name: Publish to Chromatic
230230
# Requires repo secret CHROMATIC_PROJECT_TOKEN. Skipped when unset, or on fork PRs (no token access).
231231
if: ${{ env.CHROMATIC_PROJECT_TOKEN != '' && (github.event.pull_request == null || github.event.pull_request.head.repo.full_name == github.repository) }}
232-
uses: chromaui/action@v16.0.0
232+
uses: chromaui/action@v16.1.0
233233
with:
234234
projectToken: ${{ env.CHROMATIC_PROJECT_TOKEN }}
235235
workingDir: ui
@@ -314,7 +314,7 @@ jobs:
314314
uses: actions/checkout@v6
315315

316316
- name: Install uv
317-
uses: astral-sh/setup-uv@v5
317+
uses: astral-sh/setup-uv@v7
318318

319319
- name: Install python
320320
run: uv python install ${{ matrix.python-version }}
@@ -338,7 +338,7 @@ jobs:
338338
uses: actions/checkout@v6
339339

340340
- name: Install uv
341-
uses: astral-sh/setup-uv@v5
341+
uses: astral-sh/setup-uv@v7
342342

343343
- name: Install python
344344
run: uv python install 3.10
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
name: Migration Immutability
2+
3+
on:
4+
pull_request:
5+
branches: [main]
6+
paths:
7+
- "go/core/pkg/migrations/**"
8+
9+
jobs:
10+
check:
11+
runs-on: ubuntu-latest
12+
steps:
13+
- uses: actions/checkout@v4
14+
with:
15+
fetch-depth: 0
16+
17+
- name: Fail if any existing migration file was modified
18+
run: |
19+
# List files under go/core/pkg/migrations/ that were changed relative
20+
# to the merge base of this PR. We only care about modifications (M)
21+
# and renames (R); additions (A) are fine.
22+
BASE=$(git merge-base HEAD origin/${{ github.base_ref }})
23+
MODIFIED=$(git diff --name-only --diff-filter=MR "$BASE" HEAD \
24+
-- 'go/core/pkg/migrations/**/*.sql')
25+
26+
if [ -n "$MODIFIED" ]; then
27+
echo "ERROR: The following migration files were modified."
28+
echo "Migration files are immutable once merged."
29+
echo "Fix bugs with a new migration instead."
30+
echo ""
31+
echo "$MODIFIED"
32+
exit 1
33+
fi
34+
35+
echo "OK: no existing migration files were modified."
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
name: sqlc Generate Check
2+
3+
on:
4+
pull_request:
5+
branches: [main]
6+
paths:
7+
- "go/core/internal/database/queries/**"
8+
- "go/core/internal/database/sqlc.yaml"
9+
- "go/core/pkg/migrations/**"
10+
11+
jobs:
12+
check:
13+
runs-on: ubuntu-latest
14+
steps:
15+
- uses: actions/checkout@v4
16+
17+
- uses: actions/setup-go@v6
18+
with:
19+
go-version: "1.26"
20+
cache: true
21+
cache-dependency-path: go/go.sum
22+
23+
- name: Run sqlc generate
24+
working-directory: go
25+
run: make sqlc-generate
26+
27+
- name: Fail if generated files differ
28+
run: |
29+
if ! git diff --quiet go/core/internal/database/gen/; then
30+
echo "ERROR: sqlc generate produced changes. Run sqlc generate locally and commit the result."
31+
echo ""
32+
git diff go/core/internal/database/gen/
33+
exit 1
34+
fi
35+
echo "OK: generated files are up to date."

.github/workflows/tag.yaml

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ jobs:
9595
- name: 'Checkout GitHub Action'
9696
uses: actions/checkout@main
9797
- name: Install uv
98-
uses: astral-sh/setup-uv@v6
98+
uses: astral-sh/setup-uv@v7
9999
- name: 'Release Python Packages'
100100
working-directory: python
101101
run: |

Makefile

Lines changed: 0 additions & 1 deletion
@@ -47,7 +47,6 @@ APP_IMAGE_TAG ?= $(VERSION)
4747
KAGENT_ADK_IMAGE_TAG ?= $(VERSION)
4848
GOLANG_ADK_IMAGE_TAG ?= $(VERSION)
4949
SKILLS_INIT_IMAGE_TAG ?= $(VERSION)
50-
5150
CONTROLLER_IMG ?= $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(CONTROLLER_IMAGE_NAME):$(CONTROLLER_IMAGE_TAG)
5251
UI_IMG ?= $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(UI_IMAGE_NAME):$(UI_IMAGE_TAG)
5352
APP_IMG ?= $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(APP_IMAGE_NAME):$(APP_IMAGE_TAG)

docker/skills-init/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -14,5 +14,5 @@ RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /build/krane .
1414

1515
FROM alpine:3.23
1616

17-
RUN apk add --no-cache git
17+
RUN apk upgrade --no-cache && apk add --no-cache git
1818
COPY --from=krane-builder /build/krane /usr/local/bin/krane

docs/architecture/README.md

Lines changed: 2 additions & 2 deletions
@@ -149,7 +149,7 @@ The controller uses SQLite (default) or PostgreSQL for persistent state that sup
149149
**Why a separate DB?** The Kubernetes API is not designed for high-frequency read patterns like listing conversations or searching tools. The DB provides fast lookups for the HTTP API and UI, while the CRDs remain the source of truth for agent configuration.
150150

151151
**Key files:**
152-
- `go/api/database/models.go` — GORM models
152+
- `go/api/database/models.go` — database models
153153
- `go/core/internal/database/client.go` — Database client implementation
154154
- `go/core/internal/database/service.go` — Business logic with atomic upserts
155155

@@ -398,7 +398,7 @@ go/
398398
├── go.work
399399
├── api/ # github.com/kagent-dev/kagent/go/api
400400
│ ├── v1alpha2/ # CRD type definitions
401-
│ ├── database/ # GORM database models
401+
│ ├── database/ # database models
402402
│ ├── httpapi/ # HTTP API request/response types
403403
│ ├── client/ # REST client SDK for the HTTP API
404404
│ └── config/crd/ # Generated CRD manifests
