WIP: feat(backend, deployment): seed dev SeqSet-citation test data + curated-citation endpoint #6635
Draft
theosanderson wants to merge 6 commits into
…tions
Add POST /create-curated-citation, restricted to super users, which records a manually curated citation (origin = CURATED) against a SeqSet version.
The endpoint validates that the caller is a super user (403 otherwise) and that the target SeqSet version exists (404 otherwise), upserts the citation source by DOI (reusing an existing source row rather than overwriting, e.g. one already discovered via CrossRef), and links it to the SeqSet version. The link is by (seqset_id, seqset_version), so the SeqSet needs no minted DOI.
Adds the SubmittedCuratedCitation request type, a test client helper, and tests covering the success, forbidden (non-super-user), not-found, and unauthorized cases.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
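The order of checks described in this commit can be modelled in a few lines. The sketch below is an in-memory TypeScript model, not the backend's Kotlin implementation: the field names follow the request body given in the PR description, while the field types, the `createStore` helper, the `SS_1.1` key format, and the string result names are assumptions made for illustration.

```typescript
// Hypothetical in-memory model of the endpoint's checks (not the backend's Kotlin code).
interface CitationSource {
    sourceDOI: string;
    title: string;
    year: number; // type assumed
    contributors: string[]; // type assumed
}

interface SubmittedCuratedCitation {
    seqSetId: string; // type assumed
    seqSetVersion: number; // type assumed
    source: CitationSource;
}

interface Caller {
    isSuperUser: boolean;
}

// Illustrative result names standing for success, 401, 403 and 404.
type Result = "created" | "unauthorized" | "forbidden" | "not_found";

function createStore(existingSeqSetVersions: string[]) {
    const versions = new Set(existingSeqSetVersions); // hypothetical keys like "SS_1.1"
    const sources = new Map<string, CitationSource>(); // citation sources keyed by DOI
    const links = new Set<string>(); // one entry per (SeqSet version, DOI) pair

    function createCuratedCitation(caller: Caller | null, req: SubmittedCuratedCitation): Result {
        if (caller === null) return "unauthorized";
        if (!caller.isSuperUser) return "forbidden";
        const versionKey = `${req.seqSetId}.${req.seqSetVersion}`;
        if (!versions.has(versionKey)) return "not_found";
        // Upsert by DOI: keep an existing row (e.g. one from CrossRef) instead of overwriting it.
        if (!sources.has(req.source.sourceDOI)) {
            sources.set(req.source.sourceDOI, req.source);
        }
        // Link by (seqset_id, seqset_version); the SeqSet itself needs no minted DOI.
        links.add(`${versionKey}|${req.source.sourceDOI}`);
        return "created";
    }

    return { createCuratedCitation, sources, links };
}
```

Calling `createCuratedCitation` twice with the same DOI leaves the first stored source untouched and keeps a single link, which is the "reuse rather than overwrite" behaviour the commit describes.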
Spec a Kubernetes Job (dev/E2E deployments only) that reuses the integration-test Playwright page objects to submit dummy-organism sequences, release them, build a SeqSet, and add a curated citation via the new /create-curated-citation endpoint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a Playwright "seed" project and an in-cluster Job that populates a fresh
dev/E2E deployment with SeqSet-citation data, reusing the existing
integration-test page objects.
tests/seed.setup.ts logs in as the dev super user and, in one flow:
- creates a submitting group,
- bulk-submits a few dummy-organism sequences and releases them,
- builds a SeqSet from the released accessions,
- adds a curated citation via the new superuser-only
POST /create-curated-citation endpoint (authenticated with the logged-in
access_token cookie).
It is idempotent (skips if the seed SeqSet already exists).
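One way the cookie-based authentication could look is sketched below. It assumes the `access_token` cookie value is accepted by the backend as a bearer token, which this PR lists as still to be confirmed; `bearerHeaders`, `curatedCitationRequest`, and the `CookieLike` type are hypothetical names, not code from `seed.setup.ts`.

```typescript
// Hypothetical helpers: turn the logged-in access_token cookie into a bearer-authenticated request.
interface CookieLike {
    name: string;
    value: string;
}

function bearerHeaders(cookies: CookieLike[]): Record<string, string> {
    const token = cookies.find((cookie) => cookie.name === "access_token")?.value;
    if (!token) {
        throw new Error("access_token cookie not found; is the user logged in?");
    }
    // Assumption: the backend accepts the cookie value unchanged as a bearer token.
    return { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };
}

function curatedCitationRequest(backendUrl: string, cookies: CookieLike[], body: unknown) {
    return {
        // Strip trailing slashes so the path is joined with exactly one "/".
        url: `${backendUrl.replace(/\/+$/, "")}/create-curated-citation`,
        init: { method: "POST", headers: bearerHeaders(cookies), body: JSON.stringify(body) },
    };
}
```

In a Playwright test the cookie list would come from `page.context().cookies()`, and the returned `url` and `init` can be passed straight to `fetch`.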
The seed project is only registered when RUN_SEED=true, so normal test runs
never trigger it. A new integration-tests image (Dockerfile) runs `npm run
seed`, and templates/seed-test-data-job.yaml runs it as a Helm
post-install/post-upgrade hook gated on seedTestData.enabled (off by default,
on in values_e2e_and_dev.yaml). An init container waits for the website and
backend before seeding. Schema and chart validated with helm lint/template and
prettier; the TS is type-checked and linted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
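The `RUN_SEED` gating can be sketched as a small function that builds the project list. This is an illustration under assumptions, not the PR's `playwright.config.ts`: `buildProjects`, the `ProjectSketch` type, the placeholder `chromium` project, and the `testMatch` patterns are all hypothetical.

```typescript
// Hypothetical sketch: only register the "seed" project when RUN_SEED=true.
interface ProjectSketch {
    name: string;
    testMatch: RegExp;
}

function buildProjects(env: Record<string, string | undefined>): ProjectSketch[] {
    const projects: ProjectSketch[] = [
        // Placeholder for the regular test projects, which are not shown in this PR text.
        { name: "chromium", testMatch: /.*\.spec\.ts/ },
    ];
    if (env.RUN_SEED === "true") {
        // Registered only on request, so normal test runs never include the seeder.
        projects.push({ name: "seed", testMatch: /seed\.setup\.ts/ });
    }
    return projects;
}
```

In a config file the result would be used as `projects: buildProjects(process.env)`, and `npm run seed` would set `RUN_SEED=true` before invoking Playwright.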
This PR may be related to: #6497 (Inconsistent use of the term 'Citation' in backend) — the new
…ments
The seed Job pulls ghcr.io/loculus-project/integration-tests, but no workflow built that image, so it would ImagePullBackOff. Add integration-tests-image.yml (modelled on the other per-image workflows) to build and push it with the same commit-/branch tags the preview deployments resolve to.
Also enable seedTestData in values_preview_server.yaml so preview deployments actually run the seeder (preview renders values.yaml + values_preview_server.yaml, not values_e2e_and_dev.yaml). createTestAccounts is already true there, so the super user the seeder logs in as exists.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
seed.setup.ts used the console-warnings fixture, which hard-fails on any non-allowlisted browser console error. On a freshly-syncing preview the still-stabilizing services emit a transient 400, which aborted the seeder mid-login. Seeding shouldn't be gated on console cleanliness, so use the base Playwright test instead.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The seeder drives the real login flow, whose OIDC redirect goes to the public Keycloak host. Pointing PLAYWRIGHT_TEST_BASE_URL at the in-cluster website service meant the login page (served from Keycloak) never rendered, so the seeder timed out waiting for the username field.
Use the chart's websiteUrl/backendUrl helpers (public hosts on server deployments, localhost on k3d) for both the seeder and the readiness init container, and also wait for the keycloak realm endpoint before starting. Raise the per-test timeout to 10m and backoffLimit to 2 for the full submit/seqset/citation flow.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
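The readiness wait amounts to polling each endpoint until it answers. The chart's init container is not shown here, so the following only illustrates that polling idea in TypeScript; `waitUntilReady`, the `Probe` type, and the attempt and delay parameters are hypothetical.

```typescript
// Hypothetical readiness poller; the probe is injected so the logic can be exercised offline.
type Probe = (url: string) => Promise<boolean>;

async function waitUntilReady(
    urls: string[],
    probe: Probe,
    maxAttempts: number,
    delayMs: number,
): Promise<void> {
    for (const url of urls) {
        let ready = false;
        for (let attempt = 1; attempt <= maxAttempts; attempt++) {
            // A probe that rejects (connection refused, DNS failure) counts as "not ready".
            ready = await probe(url).catch(() => false);
            if (ready) break;
            if (attempt < maxAttempts) {
                await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
            }
        }
        if (!ready) {
            throw new Error(`${url} not ready after ${maxAttempts} attempts`);
        }
    }
}
```

A real probe could be `async (url) => (await fetch(url)).ok`, called with the website, backend, and Keycloak realm URLs before the seeder starts.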
What this adds
Two related pieces on top of the SeqSet citation-tracking work:
1. Superuser-only endpoint to add curated citations (backend)

PR #6304 defines a `CURATED` citation origin in the DB but nothing writes it: only the scheduled CrossRef task inserts citations, which isn't reproducible on a dev cluster. This adds a real way to record a manual citation:

- `POST /create-curated-citation` (super-user only), body `{ seqSetId, seqSetVersion, source: { sourceDOI, title, year, contributors } }`.
- Checks `isSuperUser` (403 otherwise), validates that the SeqSet version exists (404 otherwise), upserts the citation source by DOI (reusing an existing source row rather than clobbering it, e.g. one already discovered via CrossRef), and links it to the SeqSet version. The link is by `(seqset_id, seqset_version)`, so no minted DOI is required.
- Adds the `SubmittedCuratedCitation` request type, a test-client helper, and tests covering the success, forbidden (non-super-user), not-found, and unauthorized cases. Backend tests pass locally.

2. Dev-only test-data seeder (deployment/integration-tests)

A Job that populates a fresh dev/E2E deployment so the SeqSet + citation feature is visibly exercised, reusing the existing integration-test Playwright page objects.

- `tests/seed.setup.ts` logs in as the dev super user and, in one flow:
  - creates a submitting group,
  - bulk-submits a few dummy-organism sequences and releases them,
  - builds a SeqSet from the released accessions,
  - adds a curated citation via the new superuser-only `POST /create-curated-citation` endpoint (authenticated with the logged-in `access_token` cookie).

  It is idempotent (skips if the seed SeqSet already exists).
- The `seed` Playwright project is only registered when `RUN_SEED=true`, so normal test runs never trigger it.
- `integration-tests/Dockerfile` runs `npm run seed`.
- `templates/seed-test-data-job.yaml` runs it as a Helm `post-install`/`post-upgrade` hook gated on `seedTestData.enabled` (off by default; on in `values_e2e_and_dev.yaml`), with an init container that waits for the website + backend. Helm hooks are honoured by both plain Helm and Argo CD.

Design notes and decisions are written up in `integration-tests/seed/SPEC.md`.

Validation

- `./gradlew test --tests CitationEndpointsTest` passes; compiles + ktlint clean.
- `helm lint` (prod + dev values) and `helm template` pass; the Job renders only when enabled. Schema formatted with prettier.
- `tsc`, `prettier`, `eslint` clean.

Not yet done (the WIP part)

The seeder is type-checked and the chart renders, but it has not been run end-to-end against a live cluster. Two assumptions to confirm there (both have a documented fallback in the SPEC):

- `submissionId`/`date`/`country`/`pangoLineage`;
- `access_token` cookie usable as a backend bearer token.

🤖 Generated with Claude Code
🚀 Preview: Add `preview` label to enable