Commit 3d57781

theosanderson and claude committed
feat(deployment): dev-only Job to seed SeqSet-citation test data
Add a Playwright "seed" project and an in-cluster Job that populates a fresh dev/E2E deployment with SeqSet-citation data, reusing the existing integration-test page objects.

tests/seed.setup.ts logs in as the dev super user and, in one flow:
- creates a submitting group,
- bulk-submits a few dummy-organism sequences and releases them,
- builds a SeqSet from the released accessions,
- adds a curated citation via the new superuser-only POST /create-curated-citation endpoint (authenticated with the logged-in access_token cookie).

It is idempotent (skips if the seed SeqSet already exists). The seed project is only registered when RUN_SEED=true, so normal test runs never trigger it.

A new integration-tests image (Dockerfile) runs `npm run seed`, and templates/seed-test-data-job.yaml runs it as a Helm post-install/post-upgrade hook gated on seedTestData.enabled (off by default, on in values_e2e_and_dev.yaml). An init container waits for the website and backend before seeding.

Schema and chart validated with helm lint/template and prettier; the TS is type-checked and linted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
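The cookie-authenticated citation call described in the commit message could be assembled roughly as follows. This is a minimal sketch and not the code in `tests/seed.setup.ts`: `CookieSketch`, `CitationRequestSketch`, and `buildCitationRequest` are hypothetical names, and the payload shape is borrowed from the earlier draft outline in `SPEC.md`, so the endpoint's real request body may differ.

```typescript
// Hypothetical, simplified cookie shape (name/value only).
interface CookieSketch {
    name: string;
    value: string;
}

// Hypothetical description of the request the seeder would send.
interface CitationRequestSketch {
    url: string;
    headers: { authorization: string };
    data: {
        seqSetId: string;
        seqSetVersion: number;
        source: { sourceDOI: string; title: string; year: number };
    };
}

// Builds the POST /create-curated-citation request from the logged-in
// session's cookies. Assumes the access token lives in a cookie named
// `access_token` and is accepted by the backend as a Bearer token.
function buildCitationRequest(
    backendUrl: string,
    cookies: CookieSketch[],
    seqSetId: string,
    seqSetVersion: number,
): CitationRequestSketch {
    const token = cookies.filter((c) => c.name === 'access_token')[0];
    if (token === undefined) {
        throw new Error('No access_token cookie found; is the user logged in?');
    }
    return {
        // Strip trailing slashes so the path is joined with exactly one "/".
        url: `${backendUrl.replace(/\/+$/, '')}/create-curated-citation`,
        headers: { authorization: `Bearer ${token.value}` },
        data: {
            seqSetId,
            seqSetVersion,
            // Placeholder source values taken from the draft outline.
            source: {
                sourceDOI: '10.0000/seed-citation-1',
                title: 'Seed reference publication',
                year: 2024,
            },
        },
    };
}
```

Keeping the request construction separate from the HTTP call makes the cookie assumption easy to check in isolation before running against a cluster.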
1 parent 38c62fa commit 3d57781

9 files changed

Lines changed: 397 additions & 105 deletions

integration-tests/Dockerfile

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Image for running the integration-tests Playwright suite (and the dev data seeder)
+# inside the cluster. Based on the official Playwright image so browsers are preinstalled;
+# keep the version in sync with the @playwright/test version in package.json.
+FROM mcr.microsoft.com/playwright:v1.60.0-noble
+
+WORKDIR /app
+
+# Install dependencies first for better layer caching.
+COPY package.json package-lock.json* ./
+RUN npm ci || npm install
+
+# Copy the test sources and configuration.
+COPY . .
+
+# By default, seed the deployment with SeqSet-citation test data. The Helm Job sets
+# PLAYWRIGHT_TEST_BASE_URL / PLAYWRIGHT_TEST_BACKEND_URL to the in-cluster services.
+CMD ["npm", "run", "seed"]

integration-tests/package.json

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,8 @@
     "scripts": {
         "format": "eslint --cache --fix && prettier --write \"**/*.{ts,js,mjs,json}\"",
         "format:check": "eslint --cache && prettier --check \"**/*.{ts,js,mjs,json}\"",
-        "test": "npx playwright test"
+        "test": "npx playwright test",
+        "seed": "RUN_SEED=true npx playwright test --project=seed --reporter=list"
     },
     "devDependencies": {
         "@eslint/js": "^9.39.2",

integration-tests/playwright.config.ts

Lines changed: 10 additions & 1 deletion
@@ -88,7 +88,16 @@ const config = {
 
 const testSuite = process.env.TEST_SUITE || 'all';
 
-if (testSuite === 'cli') {
+if (process.env.RUN_SEED === 'true') {
+    // Dev-only data seeding: run nothing but the seed setup (see tests/seed.setup.ts).
+    config.projects = [
+        {
+            name: 'seed',
+            use: { ...devices['Desktop Chrome'] },
+            testMatch: /seed\.setup\.ts/,
+        },
+    ];
+} else if (testSuite === 'cli') {
     // Run only CLI tests
     config.projects = config.projects.filter((p) => p.name === 'cli-tests');
 } else if (browser) {
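The branch order in this hunk matters: `RUN_SEED` is checked first, so it overrides `TEST_SUITE`. A minimal sketch of that precedence as a pure function follows. `ProjectSketch`, `defaultProjects`, and `selectProjects` are hypothetical names used for illustration only, and the `browser` branch of the real config is omitted.

```typescript
// Hypothetical, simplified stand-in for a Playwright project entry.
interface ProjectSketch {
    name: string;
    testMatch?: RegExp;
}

// Assumed default projects; the real config defines its own list.
const defaultProjects: ProjectSketch[] = [
    { name: 'chromium-tests' },
    { name: 'cli-tests' },
];

// Mirrors the if / else-if chain: RUN_SEED wins over TEST_SUITE,
// and without either variable the full project list is kept.
function selectProjects(
    env: { RUN_SEED?: string; TEST_SUITE?: string },
    projects: ProjectSketch[] = defaultProjects,
): ProjectSketch[] {
    if (env.RUN_SEED === 'true') {
        // Only the seed setup file is matched; nothing else runs.
        return [{ name: 'seed', testMatch: /seed\.setup\.ts/ }];
    }
    if ((env.TEST_SUITE || 'all') === 'cli') {
        return projects.filter((p) => p.name === 'cli-tests');
    }
    return projects;
}
```

Only the exact string `'true'` enables seeding in this sketch, which matches the strict comparison in the diff; values such as `'1'` leave the normal projects in place.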

integration-tests/seed/SPEC.md

Lines changed: 74 additions & 102 deletions
@@ -69,120 +69,92 @@ the SeqSet version. The link is by `(seqset_id, seqset_version)`, so **no minted
 > realm role. Recommend logging in as the existing dev superuser (`superuser`/`superuser`, created
 > when `createTestAccounts: true`) for that one call, while the submit/seqset steps use the seed user.
 
-## Component shape
+## Component shape (as implemented)
 
-A Kubernetes **Job** (not a long-running Deployment) gated on a new dev-only value. Built from the
-`integration-tests/` image (Playwright + node_modules already present) with a non-test entrypoint.
+The seeder is a **Playwright project** (`seed`) running a single setup file, packaged in a new
+`integration-tests` image and run in-cluster by a Helm-hook **Job**. No bespoke Node entrypoint and
+no DB access — it drives the existing page objects and calls the new citation endpoint over HTTP.
 
 ```
 integration-tests/
-  seed/
-    SPEC.md <- this file
-    seed.ts <- standalone entrypoint (launches chromium, composes page objects, then pg insert)
-  Dockerfile <- (new or extended) builds an image usable as both test-runner and seeder
+  seed/SPEC.md <- this file
+  tests/seed.setup.ts <- the seed setup (reuses page objects; the whole flow)
+  playwright.config.ts <- adds the RUN_SEED-gated `seed` project
+  package.json <- `npm run seed` => RUN_SEED=true playwright test --project=seed
+  Dockerfile <- mcr.microsoft.com/playwright image; CMD ["npm","run","seed"]
+kubernetes/loculus/templates/seed-test-data-job.yaml <- Helm-hook Job, gated on seedTestData.enabled
 ```
 
-`seed.ts` outline (all calls are existing page-object methods unless noted):
-
-```ts
-const browser = await chromium.launch({ headless: true });
-const page = await browser.newPage({ baseURL: process.env.PLAYWRIGHT_TEST_BASE_URL });
-
-// idempotency: bail if the seed user already exists (login succeeds)
-if (await new AuthPage(page).login(SEED_USER, SEED_PW)) { log('already seeded'); process.exit(0); }
-
-await new AuthPage(page).createAccount(seedAccount);
-const groupId = await new GroupPage(page).createGroup(buildTestGroup('seed-group'));
-
-const accessions: string[] = [];
-for (const s of SEED_SEQUENCES) { // ~3 sequences
-  const review = await submissionPage.completeSubmission(
-    { ...s, groupId: String(groupId) }, s.sequenceData); // dummy-organism form
-  await review.waitForAllProcessed(); // dummy pipeline runs in-cluster
-  await review.releaseAndGoToReleasedSequences();
-  accessions.push(await readAccession(page)); // small helper (parse released table/URL)
-}
-
-const { seqSetId, seqSetVersion } = // createSeqSet returns id+version (parse from URL)
-  await new SeqSetPage(page).createSeqSet({
-    name: 'Seed SeqSet', description: 'Auto-seeded for dev',
-    focalAccessions: [accessions[0]], backgroundAccessions: accessions.slice(1),
-  });
-
-// citation: call the superuser-only endpoint with a super-user token
-const superUserToken = await getToken('superuser', 'superuser'); // keycloak password grant
-await page.request.post(`${BACKEND_URL}/create-curated-citation`, {
-  headers: { authorization: `Bearer ${superUserToken}` },
-  data: {
-    seqSetId, seqSetVersion,
-    source: {
-      sourceDOI: '10.0000/seed-citation-1', title: 'Seed reference publication',
-      year: 2024, contributors: [{ givenName: 'Ada', surname: 'Lovelace' }],
-    },
-  },
-});
-await browser.close();
-```
-
-Two small additions to the page-object layer are needed (both trivial, reusable by future tests):
-- `SeqSetPage.createSeqSet` should return `{ seqSetId, seqSetVersion }` (parse from the post-create URL).
-- a `readAccession(page)` helper to pull the accession of a just-released sequence.
-
-## Kubernetes wiring
-
-New template `kubernetes/loculus/templates/seed-test-data-job.yaml`:
-
-- `kind: Job`, gated: `{{- if .Values.seedTestData.enabled }}` (whole file).
-- Image: `ghcr.io/loculus-project/integration-tests:{{ $dockerTag }}` (new image built in CI from
-  `integration-tests/Dockerfile`), `command: ["node", "seed/seed.js"]`.
-- Env:
-  - `PLAYWRIGHT_TEST_BASE_URL: http://loculus-website-service:3000` (verified service name,
-    `templates/website-service.yaml`).
-  - `DB_URL` / `DB_USERNAME` / `DB_PASSWORD` from the `database` secret (same refs as backend).
-- **Ordering / readiness:** website + backend + dummy-preprocessing must be up before it runs.
-  Two viable mechanisms (pick one):
-  1. **ArgoCD PostSync hook** (mirror `templates/ingest.yaml:127` `loculus-ingest-trigger`):
-     `argocd.argoproj.io/hook: PostSync`, `backoffLimit`, `ttlSecondsAfterFinished: 600`.
-     Cleanest fit with how this repo already bootstraps post-deploy work.
-  2. Plain Job + an init-container that curls `…/website` and `…/backend` health until ready.
-  > **Recommendation:** PostSync hook (option 1) — consistent with `ingest-trigger`.
-- `backoffLimit: 1`, `ttlSecondsAfterFinished: 600`, `restartPolicy: Never`.
+`tests/seed.setup.ts` runs everything **as the dev super user** (`superuser`/`superuser`), which can
+submit, create SeqSets, and add curated citations — so a single login covers all four steps:
+
+1. `AuthPage.login('superuser', …)`; `GroupPage.getOrCreateGroup(seedGroup)`.
+2. Idempotency: `SeqSetPage.gotoList()`; if a `Seed SeqSet` cell exists, `setup.skip()`.
+3. `BulkSubmissionPage` → `uploadMetadataFile` (submissionId/date/country/pangoLineage) +
+   `uploadSequencesFile` → `submitAndWaitForProcessingDone` → `releaseAndGoToReleasedSequences`.
+4. Collect released `LOC_…` accessions from the group's released page (poll-with-reload).
+5. `SeqSetPage.createSeqSet({focal, background})`; read `seqSetId`/`version` from the
+   `/seqsets/<id>.<version>` URL.
+6. Read the `access_token` cookie from the logged-in context and `POST /create-curated-citation`
+   (super-user token) via a backend `APIRequestContext`.
+
+No page-object changes were required — `createSeqSet`'s result is recovered from the detail URL, and
+accessions are read with the same `LOC_` regex the seqset test uses.
+
+### Gating it so normal runs never seed
+
+The `seed` project is only added to `playwright.config.ts` when `RUN_SEED=true`. `seed.setup.ts` ends
+in `.setup.ts`, so no other project's `testMatch` picks it up. Default `npm test` therefore never runs it.
+
+## Kubernetes wiring (as implemented)
+
+`kubernetes/loculus/templates/seed-test-data-job.yaml`:
+
+- `kind: Job`, whole file gated on `{{- if .Values.seedTestData.enabled }}`.
+- **Helm hooks** `post-install,post-upgrade` with `hook-delete-policy: before-hook-creation`, so it
+  re-runs on each deploy and is recreated cleanly (avoids the immutable-Job problem on `helm upgrade`).
+  Plain Helm (deploy.py/k3d) runs it as a post-deploy hook; Argo CD honours Helm hooks too — so this
+  one mechanism covers both, no separate Argo annotations needed.
+- **Readiness:** an init container (`curlimages/curl`) loops until the website and backend respond,
+  so the seeder doesn't start before services are up.
+- Image `{{ .Values.images.integrationTests.repository }}:{{ tag|default dockerTag }}`, built in CI
+  from `integration-tests/Dockerfile`; `command: ["npm","run","seed"]`.
+- Env: `PLAYWRIGHT_TEST_BASE_URL=http://loculus-website-service:3000`,
+  `PLAYWRIGHT_TEST_BACKEND_URL=http://loculus-backend-service:8079`,
+  `SEED_SUPER_USER` / `SEED_SUPER_USER_PASSWORD` from `seedTestData.superUser`. No DB secret.
+- `activeDeadlineSeconds: 900`, `backoffLimit: 1`, `ttlSecondsAfterFinished: 86400`, `restartPolicy: Never`.
 
 ### Values
 
-`kubernetes/loculus/values.yaml` (default OFF, production-safe):
-```yaml
-seedTestData:
-  enabled: false
-  user: { username: seed_user, password: seed_user }
-  organism: dummy-organism
-  sequenceCount: 3
-```
-`kubernetes/loculus/values_e2e_and_dev.yaml` (turn ON for dev/E2E):
-```yaml
-seedTestData:
-  enabled: true
-```
-Add the `seedTestData` object to `values.schema.json`, then:
-`npx prettier@3.6.2 --write kubernetes/loculus/values.schema.json` and
-`helm lint kubernetes/loculus -f kubernetes/loculus/values.yaml` (per `kubernetes/AGENTS.md`).
+- `values.yaml` — adds `images.integrationTests` and a default-OFF `seedTestData` block
+  (`enabled: false`, `superUser: {username: superuser, password: superuser}`).
+- `values_e2e_and_dev.yaml`: `seedTestData.enabled: true`.
+- `values.schema.json` — registers `images.integrationTests` (required: `images` has
+  `additionalProperties: false`) and the `seedTestData` object.
+
+Validated with `helm lint` (prod + dev values), `helm template` (Job renders only when enabled),
+`prettier` on the schema, and `tsc`/`prettier`/`eslint` on the new TS.
 
 ## Idempotency & safety
 
-- Re-running on an already-seeded cluster is a no-op (seed user login check up front).
-- `enabled: false` by default → never runs in production. The CURATED-citation SQL and the
-  `database` secret mount only exist on dev because the whole template is gated.
-- Uses the dummy organism only, so no real pathogen data or real DOIs/CrossRef calls.
+- Re-running on an already-seeded cluster is a no-op (the `Seed SeqSet` existence check `setup.skip()`s).
+- `enabled: false` by default → the whole template is gated, so it never renders in production.
+- Uses the dummy organism only — no real pathogen data, no real DOIs/CrossRef calls.
 
 ## Decisions
 
-1. **Citation mechanism — DECIDED: new superuser-only `POST /create-curated-citation` endpoint**
-   (implemented in this branch). Seed job calls it with a super-user token; no DB secret needed.
-2. **Submission driver — DECIDED: Playwright UI**, reusing the integration-test page objects.
-
-## Open questions for reviewer
-
-1. **Trigger:** ArgoCD PostSync hook (recommended) vs. readiness-gated plain Job.
-2. **Image:** extend the existing `integration-tests` image with a `seed/` entrypoint
-   (recommended) vs. a separate slimmer image.
-```
+1. **Citation mechanism — new superuser-only `POST /create-curated-citation` endpoint** (implemented
+   in this branch). Seeder calls it with the super-user token; no DB secret.
+2. **Submission driver — Playwright UI**, reusing the integration-test page objects.
+3. **Run identity — the dev super user** for all steps (one login; can submit + seqset + cite).
+4. **Trigger — Helm hooks** (post-install/upgrade), which work under both plain Helm and Argo CD.
+5. **Image — extend the integration-tests image** with a `seed` Playwright project (RUN_SEED-gated).
+
+## Validating on a live cluster (not yet done)
+
+The flow is type-checked and the chart renders, but it has not been run end-to-end against a cluster.
+Two assumptions to confirm there (both have a clear fallback):
+- The dummy-organism **bulk** submission accepts `submissionId/date/country/pangoLineage`. If a field
+  is rejected, adjust `METADATA_HEADERS`/`SUBMISSIONS` in `seed.setup.ts`.
+- The website stores the Keycloak access token in an **`access_token` cookie** usable as a backend
+  Bearer token. If not, swap the citation step to a Keycloak password-grant (needs a keycloak URL env).
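
Steps 4 and 5 of the SPEC rely on two small pieces of string parsing: pulling `LOC_` accessions out of page text and reading the SeqSet id and version from the `/seqsets/<id>.<version>` URL. A minimal sketch of both follows. `parseSeqSetUrl` and `extractAccessions` are hypothetical helper names, and the accession pattern is an assumption, since the regex used by the existing seqset test is not shown in this commit.

```typescript
interface SeqSetRef {
    seqSetId: string;
    seqSetVersion: number;
}

// Reads id and version from a URL or path ending in /seqsets/<id>.<version>.
// The version is taken as the digits after the last dot of that path segment.
function parseSeqSetUrl(url: string): SeqSetRef | null {
    const match = /\/seqsets\/([^/?#]+)\.(\d+)(?:[/?#]|$)/.exec(url);
    if (match === null) {
        return null;
    }
    const [, id, version] = match;
    if (id === undefined || version === undefined) {
        return null;
    }
    return { seqSetId: id, seqSetVersion: Number(version) };
}

// Collects unique accessions from page text, in first-seen order.
// Assumed pattern: "LOC_" + uppercase letters/digits + optional ".<version>".
function extractAccessions(text: string): string[] {
    const found: string[] = text.match(/LOC_[A-Z0-9]+(?:\.\d+)?/g) || [];
    return found.filter((accession, index) => found.indexOf(accession) === index);
}
```

Returning `null` for an unexpected URL lets the caller fail with a clear message instead of posting a citation against an undefined SeqSet.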
