Commit 38c62fa

theosanderson and claude committed
docs(integration-tests): spec for dev-only test-data seeding component
Spec a Kubernetes Job (dev/E2E deployments only) that reuses the integration-test Playwright page objects to submit dummy-organism sequences, release them, build a SeqSet, and add a curated citation via the new /create-curated-citation endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 8eb7517 commit 38c62fa

1 file changed

Lines changed: 188 additions & 0 deletions

integration-tests/seed/SPEC.md

@@ -0,0 +1,188 @@
# Spec: `seed-test-data`, a dev-only test-data seeding component

## Goal

A new component that runs **inside the Kubernetes cluster on dev/E2E deployments only** and
seeds a realistic slice of SeqSet-citation test data, so that the SeqSet + citation features
(PR #6304) are visibly populated on a fresh dev deployment without a human clicking through the UI.

On each fresh dev deploy it will:

1. Register a seed user and create a group.
2. Submit a handful of sequences to the **dummy organism**, drive them through the dummy
   preprocessing pipeline, and **release** them (so they get accessions).
3. Create a **SeqSet** referencing those released accessions (focal + background).
4. Add a **manual ("CURATED") citation** of that SeqSet through the backend's
   `POST /create-curated-citation` endpoint.

## Why reuse the integration tests

The whole submit→preprocess→release→seqset flow is already implemented as Playwright page
objects in `integration-tests/`. Rather than re-deriving the backend REST choreography, the
seed component is a thin entrypoint that drives those same page objects against the in-cluster
website. This keeps the seed path and the test path exercising identical code.

Reused page objects (all under `integration-tests/tests/pages/`):

| Page object | Method(s) used | Source |
|---|---|---|
| `AuthPage` | `createAccount` / `tryLoginOrRegister` | `auth.page.ts:12` |
| `GroupPage` | `createGroup` | `group.page.ts:25` |
| `SubmissionPage` | `fillSubmissionFormDummyOrganism`, `fillSequenceData`, `acceptTerms`, `completeSubmission` | `submission.page.ts:101,116,136` |
| `ReviewPage` | `waitForAllProcessed`, `releaseAndGoToReleasedSequences` | `review.page.ts:121,160` |
| `SeqSetPage` | `gotoList`, `createSeqSet` | `seqset.page.ts:15,31` |

Reused helpers: `buildTestGroup` (`utils/testGroup.ts:23`), sequence constants in
`test-helpers/test-data.ts`, and the dummy-organism display name **"Test Dummy Organism"**
(`kubernetes/loculus/values.yaml:1776`).

## The citation mechanism

Citations land in the DB two ways (`backend/.../db/migration/V1.29__add_seqset_citations_table.sql`):

- `origin = 'CROSSREF'`: written by the scheduled `SeqSetCrossRefCitationsTask` that polls the
  CrossRef cited-by API every 6h. Not reproducible on a dev cluster (no real CrossRef, no real DOIs).
- `origin = 'CURATED'`: now written by the **`POST /create-curated-citation`** backend endpoint
  added in this branch (superuser-only). See `SeqSetCitationsController.createCuratedCitation`.

The seed job creates the manual citation by calling that endpoint with a **super-user** token:

```
POST /create-curated-citation   (Authorization: Bearer <super-user JWT>)
{
  "seqSetId": "<from createSeqSet>",
  "seqSetVersion": 1,
  "source": {
    "sourceDOI": "10.0000/seed-citation-1",
    "title": "Seed reference publication",
    "year": 2024,
    "contributors": [{ "givenName": "Ada", "surname": "Lovelace" }]
  }
}
```

The endpoint enforces `authenticatedUser.isSuperUser` (else 403), validates the SeqSet exists
(else 404), upserts the citation source (reusing an existing DOI row if present), then links it to
the SeqSet version. The link is by `(seqset_id, seqset_version)`, so **no minted DOI is required**.

> **Implications for the seed job:** no DB secret is needed. The citation is a plain authenticated
> HTTP call via Playwright's `page.request` (or `fetch`). The citation step must use a token with
> the `super_user` realm role. Recommend logging in as the existing dev superuser
> (`superuser`/`superuser`, created when `createTestAccounts: true`) for that one call, while the
> submit/seqset steps use the seed user.
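
The super-user token can be obtained with a Keycloak password grant. The following is a minimal sketch of such a helper, not the branch's actual code: the realm name `loculus`, the client id `backend-client`, and the `keycloakUrl` parameter are assumptions that would need to match the dev deployment, and the function takes one more argument than the two-argument `getToken` used in the outline.

```typescript
// Sketch only. Realm, client id and the Keycloak base URL are assumptions.
interface TokenRequest {
  url: string;
  body: URLSearchParams;
}

// Builds the standard OpenID Connect token request for a password grant.
function buildTokenRequest(
  keycloakUrl: string,
  username: string,
  password: string,
  realm = 'loculus', // assumption: realm name
  clientId = 'backend-client', // assumption: public client with direct access grants
): TokenRequest {
  const base = keycloakUrl.replace(/\/+$/, ''); // drop trailing slashes
  const url = `${base}/realms/${realm}/protocol/openid-connect/token`;
  const body = new URLSearchParams({
    grant_type: 'password',
    client_id: clientId,
    username,
    password,
  });
  return { url, body };
}

// Sends the request and returns the access token, failing loudly otherwise.
async function getToken(keycloakUrl: string, username: string, password: string): Promise<string> {
  const { url, body } = buildTokenRequest(keycloakUrl, username, password);
  const res = await fetch(url, { method: 'POST', body }); // form-encoded by default
  if (!res.ok) throw new Error(`token request failed: HTTP ${res.status}`);
  const json = (await res.json()) as { access_token?: string };
  if (!json.access_token) throw new Error('token response had no access_token');
  return json.access_token;
}
```

Keeping the request construction separate from the network call makes the URL and form fields easy to check without a running Keycloak.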

## Component shape

A Kubernetes **Job** (not a long-running Deployment) gated on a new dev-only value. Built from the
`integration-tests/` image (Playwright + node_modules already present) with a non-test entrypoint.

```
integration-tests/
  seed/
    SPEC.md      <- this file
    seed.ts      <- standalone entrypoint (launches chromium, composes page objects, then calls the curated-citation endpoint)
  Dockerfile     <- (new or extended) builds an image usable as both test-runner and seeder
```

`seed.ts` outline (all calls are existing page-object methods unless noted):

```ts
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage({ baseURL: process.env.PLAYWRIGHT_TEST_BASE_URL });

// idempotency: bail if the seed user already exists (login succeeds)
if (await new AuthPage(page).login(SEED_USER, SEED_PW)) { log('already seeded'); process.exit(0); }

await new AuthPage(page).createAccount(seedAccount);
const groupId = await new GroupPage(page).createGroup(buildTestGroup('seed-group'));

const submissionPage = new SubmissionPage(page);       // page object for the submission form
const accessions: string[] = [];
for (const s of SEED_SEQUENCES) {                       // ~3 sequences
  const review = await submissionPage.completeSubmission(
    { ...s, groupId: String(groupId) }, s.sequenceData); // dummy-organism form
  await review.waitForAllProcessed();                   // dummy pipeline runs in-cluster
  await review.releaseAndGoToReleasedSequences();
  accessions.push(await readAccession(page));           // small helper (parse released table/URL)
}

const { seqSetId, seqSetVersion } =                     // createSeqSet returns id+version (parse from URL)
  await new SeqSetPage(page).createSeqSet({
    name: 'Seed SeqSet', description: 'Auto-seeded for dev',
    focalAccessions: [accessions[0]], backgroundAccessions: accessions.slice(1),
  });

// citation: call the superuser-only endpoint with a super-user token
const superUserToken = await getToken('superuser', 'superuser'); // keycloak password grant
const response = await page.request.post(`${BACKEND_URL}/create-curated-citation`, {
  headers: { authorization: `Bearer ${superUserToken}` },
  data: {
    seqSetId, seqSetVersion,
    source: {
      sourceDOI: '10.0000/seed-citation-1', title: 'Seed reference publication',
      year: 2024, contributors: [{ givenName: 'Ada', surname: 'Lovelace' }],
    },
  },
});
// fail the Job if the backend rejected the citation (e.g. 403 or 404)
if (!response.ok()) throw new Error(`create-curated-citation failed: HTTP ${response.status()}`);
await browser.close();
```

Two small additions to the page-object layer are needed (both trivial, reusable by future tests):
- `SeqSetPage.createSeqSet` should return `{ seqSetId, seqSetVersion }` (parse from the post-create URL).
- a `readAccession(page)` helper to pull the accession of a just-released sequence.
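
As an illustration of the first addition, parsing the id and version out of the post-create URL could look like the sketch below. The URL shape `/seqsets/<seqSetId>.<version>` is an assumption made for this example and should be checked against the website's real route; `parseSeqSetUrl` is a hypothetical name.

```typescript
interface SeqSetRef {
  seqSetId: string;
  seqSetVersion: number;
}

// Hypothetical helper. Assumes the detail page URL ends in
// `/seqsets/<seqSetId>.<version>`; adjust the pattern to the real route.
function parseSeqSetUrl(url: string): SeqSetRef {
  const { pathname } = new URL(url);
  const match = pathname.match(/\/seqsets\/([^/.]+)\.(\d+)\/?$/);
  if (!match) {
    throw new Error(`not a SeqSet detail URL: ${url}`);
  }
  return { seqSetId: match[1], seqSetVersion: Number(match[2]) };
}
```

Inside `createSeqSet`, the page object would call this with `page.url()` once the post-create navigation has finished. Throwing on an unexpected URL makes a route change surface as a failed Job instead of a citation linked to the wrong SeqSet.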

## Kubernetes wiring

New template `kubernetes/loculus/templates/seed-test-data-job.yaml`:

- `kind: Job`, gated: `{{- if .Values.seedTestData.enabled }}` (whole file).
- Image: `ghcr.io/loculus-project/integration-tests:{{ $dockerTag }}` (new image built in CI from
  `integration-tests/Dockerfile`), `command: ["node", "seed/seed.js"]`.
- Env:
  - `PLAYWRIGHT_TEST_BASE_URL: http://loculus-website-service:3000` (verified service name,
    `templates/website-service.yaml`).
  - `BACKEND_URL` for the citation call (the name used in the `seed.ts` outline). No `database`
    secret is mounted, since the citation goes through the backend endpoint rather than direct SQL.
- **Ordering / readiness:** website + backend + dummy-preprocessing must be up before it runs.
  Two viable mechanisms (pick one):
  1. **ArgoCD PostSync hook** (mirror `templates/ingest.yaml:127` `loculus-ingest-trigger`):
     `argocd.argoproj.io/hook: PostSync`, `backoffLimit`, `ttlSecondsAfterFinished: 600`.
     Cleanest fit with how this repo already bootstraps post-deploy work.
  2. Plain Job + an init-container that curls `…/website` and `…/backend` health until ready.

  > **Recommendation:** PostSync hook (option 1), consistent with `ingest-trigger`.
- `backoffLimit: 1`, `ttlSecondsAfterFinished: 600`, `restartPolicy: Never`.
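
If the plain-Job route were chosen, the readiness wait could also live in `seed.ts` instead of a curl-based init-container. This is a different technique from option 2 as written, sketched here only to show the polling logic; `waitUntilReady`, `Probe`, and the default retry values are hypothetical, and the health URLs are left to the caller because the spec does not state them.

```typescript
// A probe reports whether one URL is currently healthy.
type Probe = (url: string) => Promise<boolean>;

// Polls each URL in turn until its probe succeeds, or throws after `attempts` tries.
async function waitUntilReady(
  urls: string[],
  probe: Probe,
  { attempts = 60, delayMs = 5000 }: { attempts?: number; delayMs?: number } = {},
): Promise<void> {
  for (const url of urls) {
    let ready = false;
    for (let i = 0; i < attempts && !ready; i++) {
      // A rejected probe (e.g. connection refused) counts as "not ready yet".
      ready = await probe(url).catch(() => false);
      if (!ready && i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
    if (!ready) {
      throw new Error(`${url} not ready after ${attempts} attempts`);
    }
  }
}

// Default probe: any 2xx response counts as healthy.
const httpProbe: Probe = async (url) => (await fetch(url)).ok;
```

`seed.ts` would call `await waitUntilReady([websiteHealthUrl, backendHealthUrl], httpProbe)` before launching the browser, with both URLs supplied through the Job's environment. Injecting the probe keeps the retry logic testable without a cluster.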

### Values

`kubernetes/loculus/values.yaml` (default OFF, production-safe):
```yaml
seedTestData:
  enabled: false
  user: { username: seed_user, password: seed_user }
  organism: dummy-organism
  sequenceCount: 3
```
`kubernetes/loculus/values_e2e_and_dev.yaml` (turn ON for dev/E2E):
```yaml
seedTestData:
  enabled: true
```
Add the `seedTestData` object to `values.schema.json`, then:
`npx prettier@3.6.2 --write kubernetes/loculus/values.schema.json` and
`helm lint kubernetes/loculus -f kubernetes/loculus/values.yaml` (per `kubernetes/AGENTS.md`).

## Idempotency & safety

- Re-running on an already-seeded cluster is a no-op (seed user login check up front).
- `enabled: false` by default → never runs in production. No `database` secret is mounted, and
  the Job with its super-user citation call only exists on dev because the whole template is gated.
- Uses the dummy organism only, so no real pathogen data or real DOIs/CrossRef calls.

## Decisions

1. **Citation mechanism (DECIDED): new superuser-only `POST /create-curated-citation` endpoint**
   (implemented in this branch). Seed job calls it with a super-user token; no DB secret needed.
2. **Submission driver (DECIDED): Playwright UI**, reusing the integration-test page objects.

## Open questions for reviewer

1. **Trigger:** ArgoCD PostSync hook (recommended) vs. readiness-gated plain Job.
2. **Image:** extend the existing `integration-tests` image with a `seed/` entrypoint
   (recommended) vs. a separate slimmer image.