
WIP: feat(backend, deployment): seed dev SeqSet-citation test data + curated-citation endpoint #6635

Draft
theosanderson wants to merge 6 commits into seqset-citations from seqset-citations-theo



theosanderson commented Jun 9, 2026

Member

WIP / do not merge. Stacked on top of #6304 (seqset-citations); review and merge that first. This PR targets the seqset-citations branch, not main.

What this adds

Two related pieces on top of the SeqSet citation-tracking work:

1. Superuser-only endpoint to add curated citations (backend)

PR #6304 defines a CURATED citation origin in the DB, but nothing writes it: only the scheduled CrossRef task inserts citations, and that is not reproducible on a dev cluster. This adds a real way to record a manual citation:

  • POST /create-curated-citation (super-user only), body { seqSetId, seqSetVersion, source: { sourceDOI, title, year, contributors } }.
  • Enforces isSuperUser (403 otherwise), validates that the SeqSet version exists (404 otherwise), upserts the citation source by DOI (reusing an existing source row rather than overwriting it, e.g. one already discovered via CrossRef), and links it to the SeqSet version. The link is keyed on (seqset_id, seqset_version), so the SeqSet needs no minted DOI.
  • Adds the SubmittedCuratedCitation request type, a test-client helper, and tests covering success, forbidden (non-super-user), not-found, and unauthorized cases. Backend tests pass locally.
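The request handling described above can be modelled in memory to make the order of checks and the no-clobber upsert concrete. This is a minimal sketch, not the Kotlin backend: the store shape, the `Caller` type, the numeric `year`, `contributors` as a list of names, and returning 200 on success are all assumptions for illustration.

```typescript
// Hypothetical in-memory model of POST /create-curated-citation.
// The real endpoint lives in the Kotlin backend and is database-backed.

type CitationSource = {
  sourceDOI: string;
  title: string;
  year: number; // assumption: numeric year
  contributors: string[]; // assumption: list of contributor names
};

type SubmittedCuratedCitation = {
  seqSetId: string;
  seqSetVersion: number;
  source: CitationSource;
};

type Caller = { authenticated: boolean; isSuperUser: boolean };

type CitationLink = {
  seqSetId: string;
  seqSetVersion: number;
  sourceDOI: string;
  origin: string; // this endpoint only ever writes "CURATED"
};

type CitationStore = {
  seqSetVersions: Set<string>; // keys of the form `${seqSetId}.${seqSetVersion}`
  sourcesByDoi: Map<string, CitationSource>;
  links: CitationLink[];
};

function createCuratedCitation(
  store: CitationStore,
  caller: Caller,
  body: SubmittedCuratedCitation,
): number {
  if (!caller.authenticated) return 401; // no valid token
  if (!caller.isSuperUser) return 403; // logged in, but not a super user
  if (!store.seqSetVersions.has(`${body.seqSetId}.${body.seqSetVersion}`)) return 404;

  const doi = body.source.sourceDOI;
  // Upsert by DOI without clobbering: an existing row (e.g. found via CrossRef) is kept as-is.
  if (!store.sourcesByDoi.has(doi)) {
    store.sourcesByDoi.set(doi, body.source);
  }

  // The link is keyed on (seqSetId, seqSetVersion), so the SeqSet needs no minted DOI.
  store.links.push({
    seqSetId: body.seqSetId,
    seqSetVersion: body.seqSetVersion,
    sourceDOI: doi,
    origin: "CURATED",
  });
  return 200; // assumption: the real success status may differ
}
```

Checking authentication, then authorization, then existence means a non-super user cannot probe which SeqSet versions exist; whether the backend orders its checks the same way is not stated in the PR.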

2. Dev-only test-data seeder (deployment / integration-tests)

A Kubernetes Job that populates a fresh dev/E2E deployment so that the SeqSet + citation feature is visibly exercised. It reuses the existing integration-test Playwright page objects.

tests/seed.setup.ts logs in as the dev super user and, in one flow:

  1. creates a submitting group,
  2. bulk-submits a few dummy-organism sequences and releases them,
  3. builds a SeqSet from the released accessions,
  4. adds a curated citation via the new endpoint (authenticated with the logged-in access_token cookie).
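Step 4 turns the browser session into a backend call. A minimal sketch of building that request from the session cookies follows; it relies on the PR's own unconfirmed assumption that the `access_token` cookie holds a Keycloak token the backend accepts as a bearer token. The helper name `buildCuratedCitationRequest` and the `CookieLike` type are hypothetical.

```typescript
// Minimal cookie shape. Playwright's BrowserContext.cookies() returns objects
// that include at least `name` and `value`, so they fit this type structurally.
type CookieLike = { name: string; value: string };

type CuratedCitationBody = {
  seqSetId: string;
  seqSetVersion: number;
  source: { sourceDOI: string; title: string; year: number; contributors: string[] };
};

type PostRequest = {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
};

// Hypothetical helper: builds the POST for the super-user-only endpoint.
function buildCuratedCitationRequest(
  backendUrl: string,
  cookies: CookieLike[],
  body: CuratedCitationBody,
): PostRequest {
  // Assumption from the PR: the website stores the Keycloak token in `access_token`.
  const token = cookies.find((c) => c.name === "access_token")?.value;
  if (!token) {
    throw new Error("No access_token cookie found; is the browser session logged in?");
  }
  return {
    // Strip trailing slashes so "https://host/" and "https://host" behave the same.
    url: `${backendUrl.replace(/\/+$/, "")}/create-curated-citation`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

In a Playwright test this might be used as `const req = buildCuratedCitationRequest(backendUrl, await page.context().cookies(), citation)` followed by `await fetch(req.url, req.init)`. Failing fast on a missing cookie surfaces a broken login directly rather than as a later 401.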

It is idempotent (skips if the seed SeqSet already exists).
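The four-step flow and its idempotency guard can be sketched as one orchestration function over injected steps. The `SeedSteps` interface, its method names, and the count of three sequences are hypothetical stand-ins for the Playwright page objects and API calls the real `tests/seed.setup.ts` uses.

```typescript
// Hypothetical abstraction over the page objects / API calls the seeder drives.
interface SeedSteps {
  findSeqSetByName(name: string): Promise<boolean>;
  createGroup(name: string): Promise<string>; // returns a group id
  submitAndRelease(groupId: string, count: number): Promise<string[]>; // released accessions
  createSeqSet(name: string, accessions: string[]): Promise<{ seqSetId: string; seqSetVersion: number }>;
  addCuratedCitation(seqSetId: string, seqSetVersion: number): Promise<void>;
}

async function seedOnce(steps: SeedSteps, seqSetName: string): Promise<"skipped" | "seeded"> {
  // Idempotency guard: a previous run already created the seed SeqSet.
  if (await steps.findSeqSetByName(seqSetName)) return "skipped";

  const groupId = await steps.createGroup(`${seqSetName} group`); // 1. submitting group
  const accessions = await steps.submitAndRelease(groupId, 3); // 2. submit + release (count is arbitrary)
  const { seqSetId, seqSetVersion } = await steps.createSeqSet(seqSetName, accessions); // 3. SeqSet
  await steps.addCuratedCitation(seqSetId, seqSetVersion); // 4. curated citation
  return "seeded";
}
```

One consequence of keying the guard on the SeqSet alone, at least in this sketch: a run that fails after step 3 but before step 4 is skipped on retry, leaving the SeqSet without a citation. Whether the real seeder has the same property is not stated in the PR.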

  • The seed Playwright project is only registered when RUN_SEED=true, so normal test runs never trigger it.
  • New integration-tests/Dockerfile runs npm run seed.
  • templates/seed-test-data-job.yaml runs it as a Helm post-install/post-upgrade hook gated on seedTestData.enabled (off by default; on in values_e2e_and_dev.yaml), with an init container that waits for the website + backend. Helm hooks are honoured by both plain Helm and Argo CD.
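The RUN_SEED gating can be sketched as a function that builds the Playwright project list from the environment. The project name "seed" comes from the commit message; the other project, the `testMatch` patterns, and the exact-string comparison against "true" are assumptions.

```typescript
// Simplified project descriptor; real Playwright projects accept many more options,
// but `name` and `testMatch` are genuine project fields.
type ProjectSketch = { name: string; testMatch: RegExp };

function buildProjects(env: Record<string, string | undefined>): ProjectSketch[] {
  // Hypothetical default project; its pattern must not match seed.setup.ts.
  const projects: ProjectSketch[] = [{ name: "chromium", testMatch: /.*\.spec\.ts/ }];
  // Only an exact RUN_SEED=true registers the seed project (assumed comparison).
  if (env.RUN_SEED === "true") {
    projects.push({ name: "seed", testMatch: /seed\.setup\.ts/ });
  }
  return projects;
}
```

In `playwright.config.ts` this would be passed as `projects: buildProjects(process.env)` inside `defineConfig`. Registering the project conditionally, rather than filtering at run time, means a plain `npx playwright test` cannot pick the seeder up by accident.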

Design notes and decisions are written up in integration-tests/seed/SPEC.md.

Validation

  • Backend: ./gradlew test --tests CitationEndpointsTest passes; compiles + ktlint clean.
  • Chart: helm lint (prod + dev values) and helm template pass; the Job renders only when enabled. Schema formatted with prettier.
  • Integration-tests TS: tsc, prettier, eslint clean.

Not yet done (the WIP part)

The seeder is type-checked and the chart renders, but the seeder has not yet been run end-to-end against a live cluster. Two assumptions still need to be confirmed there (each has a documented fallback in the SPEC):

  • the dummy-organism bulk form accepts submissionId/date/country/pangoLineage;
  • the website stores the Keycloak access token in an access_token cookie usable as a backend bearer token.
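To make the first assumption concrete: if the dummy-organism bulk form is driven by a tab-separated metadata upload (itself an assumption; the PR only names the fields), the seeder's rows could be serialized as below. The column names come from the PR; the helper name, row type, and example values are hypothetical.

```typescript
// Hypothetical row type; column names are the ones the PR expects the form to accept.
type DummyMetadataRow = { submissionId: string; date: string; country: string; pangoLineage: string };

const METADATA_COLUMNS = ["submissionId", "date", "country", "pangoLineage"] as const;

function toMetadataTsv(rows: DummyMetadataRow[]): string {
  for (const row of rows) {
    for (const col of METADATA_COLUMNS) {
      // A stray tab or newline would silently shift columns or split rows.
      if (/[\t\r\n]/.test(row[col])) {
        throw new Error(`Field ${col} of ${row.submissionId} contains a tab or newline`);
      }
    }
  }
  const header = METADATA_COLUMNS.join("\t");
  const body = rows.map((r) => METADATA_COLUMNS.map((c) => r[c]).join("\t"));
  return [header, ...body].join("\n") + "\n";
}
```

If the live form rejects one of these columns, only `METADATA_COLUMNS` needs to change, which keeps the fallback documented in the SPEC cheap to apply.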

🤖 Generated with Claude Code

🚀 Preview: Add preview label to enable

theosanderson and others added 3 commits June 9, 2026 14:06
…tions

Add POST /create-curated-citation, restricted to super users, which records
a manually curated citation (origin = CURATED) against a SeqSet version.

The endpoint validates that the caller is a super user (403 otherwise) and
that the target SeqSet version exists (404 otherwise), upserts the citation
source by DOI (reusing an existing source row rather than overwriting, e.g.
one already discovered via CrossRef), and links it to the SeqSet version.
The link is by (seqset_id, seqset_version), so the SeqSet needs no minted DOI.

Adds the SubmittedCuratedCitation request type, a test client helper, and
tests covering the success, forbidden (non-super-user), not-found, and
unauthorized cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Spec a Kubernetes Job (dev/E2E deployments only) that reuses the
integration-test Playwright page objects to submit dummy-organism
sequences, release them, build a SeqSet, and add a curated citation via
the new /create-curated-citation endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a Playwright "seed" project and an in-cluster Job that populates a fresh
dev/E2E deployment with SeqSet-citation data, reusing the existing
integration-test page objects.

tests/seed.setup.ts logs in as the dev super user and, in one flow:
  - creates a submitting group,
  - bulk-submits a few dummy-organism sequences and releases them,
  - builds a SeqSet from the released accessions,
  - adds a curated citation via the new superuser-only
    POST /create-curated-citation endpoint (authenticated with the logged-in
    access_token cookie).
It is idempotent (skips if the seed SeqSet already exists).

The seed project is only registered when RUN_SEED=true, so normal test runs
never trigger it. A new integration-tests image (Dockerfile) runs `npm run
seed`, and templates/seed-test-data-job.yaml runs it as a Helm
post-install/post-upgrade hook gated on seedTestData.enabled (off by default,
on in values_e2e_and_dev.yaml). An init container waits for the website and
backend before seeding. Schema and chart validated with helm lint/template and
prettier; the TS is type-checked and linted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude bot added the backend (related to the loculus backend component) and deployment (code changes targeting the deployment infrastructure) labels Jun 9, 2026

claude Bot commented Jun 9, 2026

Contributor

This PR may be related to: #6497 (Inconsistent use of the term 'Citation' in backend). The new /create-curated-citation endpoint and its associated types touch the same citation-naming area flagged in that issue.

theosanderson added the preview label (triggers a deployment to argocd) Jun 9, 2026
theosanderson and others added 3 commits June 9, 2026 14:35
…ments

The seed Job pulls ghcr.io/loculus-project/integration-tests, but no workflow
built that image, so it would ImagePullBackOff. Add integration-tests-image.yml
(modelled on the other per-image workflows) to build and push it with the same
commit-/branch tags the preview deployments resolve to.

Also enable seedTestData in values_preview_server.yaml so preview deployments
actually run the seeder (preview renders values.yaml + values_preview_server.yaml,
not values_e2e_and_dev.yaml). createTestAccounts is already true there, so the
super user the seeder logs in as exists.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

seed.setup.ts used the console-warnings fixture, which hard-fails on any
non-allowlisted browser console error. On a freshly-syncing preview the
still-stabilizing services emit a transient 400, which aborted the seeder
mid-login. Seeding shouldn't be gated on console cleanliness, so use the base
Playwright test instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The seeder drives the real login flow, whose OIDC redirect goes to the public
Keycloak host. Pointing PLAYWRIGHT_TEST_BASE_URL at the in-cluster website
service meant the login page (served from Keycloak) never rendered, so the
seeder timed out waiting for the username field.

Use the chart's websiteUrl/backendUrl helpers (public hosts on server
deployments, localhost on k3d) for both the seeder and the readiness init
container, and also wait for the keycloak realm endpoint before starting. Raise
the per-test timeout to 10m and backoffLimit to 2 for the full submit/seqset/
citation flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
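The readiness gate above (website, backend, then the Keycloak realm endpoint) is done by an init container in the chart. The polling logic it needs can be sketched in TypeScript under a shared deadline; the `Probe` type, the `waitForAll` name, and the option names are hypothetical, and the probe is injected so it can be tested without a network.

```typescript
// A probe resolves to true once the endpoint is ready. In practice it might be
// `async (url) => (await fetch(url)).ok`.
type Probe = (url: string) => Promise<boolean>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function safeProbe(probe: Probe, url: string): Promise<boolean> {
  try {
    return await probe(url);
  } catch {
    return false; // connection errors count as "not ready yet"
  }
}

// Waits for each URL in order; all of them share one overall deadline.
async function waitForAll(
  urls: string[],
  probe: Probe,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  for (const url of urls) {
    while (!(await safeProbe(probe, url))) {
      if (Date.now() >= deadline) {
        throw new Error(`Timed out waiting for ${url}`);
      }
      await sleep(opts.intervalMs);
    }
  }
}
```

A caller would pass the public website URL, the backend URL, and a Keycloak realm URL of the form `<keycloak>/realms/<realm>` (placeholders, not values from the chart). Checking Keycloak last matters because, as the commit notes, the login page is served from the public Keycloak host.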
theosanderson removed the preview label (triggers a deployment to argocd) Jun 9, 2026