Skip to content

feat(ingest, backend): improve ingest curation warning - add diff#6633

Merged
anna-parker merged 15 commits into
mainfrom
curation_warning_add_diff
Jun 9, 2026
Merged

feat(ingest, backend): improve ingest curation warning - add diff#6633
anna-parker merged 15 commits into
mainfrom
curation_warning_add_diff

Conversation

@anna-parker

@anna-parker anna-parker commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

resolves #3084

Adds the option to filter by AccessionVersion on the get-unprocessed-metadata endpoint. Have ingest call this endpoint to get all previous metadata (includes sequence hashes, so it will indirectly tell us if a sequence has been changed) when we get into the case where ingest would like to revise a curated sequence.

Ingest will then add a diff of the curated metadata and the metadata it would like to revise to the notification so that curators can decide better to to handle the situation.

PR Checklist

  • All necessary documentation has been adapted for backend endpoints
  • The backend endpoint updates are covered by unit tests
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)
  1. Ingest sequences from the INSDC with initial hash
  2. Curate a sequence from the INSDC as a superuser (add a revision, do not change hash)
  3. Ingest ignores revision as the hash has not changed
  4. Log into database and manually modify the hash of the curated entry (unprocessed data not original data)
  5. Run the ingest cronjob and verify that an informative message is sent to the loculus notifications slack:
image

🚀 Preview: https://curation-warning-add-diff.loculus.org

@claude claude Bot added backend related to the loculus backend component ingest Ingest pipeline preprocessing Issues related to the preprocessing component website Tasks related to the web application labels Jun 8, 2026
@anna-parker anna-parker force-pushed the curation_warning_add_diff branch from 649aca5 to 712cd29 Compare June 8, 2026 15:49
@anna-parker anna-parker added preview Triggers a deployment to argocd and removed website Tasks related to the web application preprocessing Issues related to the preprocessing component labels Jun 8, 2026
@anna-parker anna-parker changed the title Curation warning add diff feat(ingest, backend): improve ingest curation warning - add diff Jun 8, 2026
Comment thread ingest/scripts/loculus_client.py Outdated
@anna-parker anna-parker marked this pull request as ready for review June 9, 2026 07:20
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @anna-parker's task in 1m 56s —— View job


Code Review

  • Gather context and read changed files
  • Review backend changes (Kotlin)
  • Review ingest changes (Python)
  • Post inline comments
  • Final summary

Good feature overall — adding the metadata diff to the curation warning makes the Slack notification much more actionable. The backend filter implementation is clean and well-tested. A couple of issues worth addressing before merge:

One functional bug: get_sequence_status() is called before the early-return in get_submitted when output is None, so every call from calculate_metadata_diff fetches statuses for all sequences in the group unnecessarily. See inline comment on loculus_client.py:498-501. Fix this →

Code style: Two print() calls in calculate_metadata_diff (lines 127, 131) should be logger.debug() or logger.info(). See inline comment.

Slack message length: The full metadata diff is inlined directly into the notification string. For sequences with many metadata fields, this could easily exceed Slack's ~3000 char limit, causing the notification to be silently dropped. Worth truncating or summarising. See inline comment on compare_hashes.py:181.

Other observations:

  • The backend accession version parsing using lastIndexOf('.') is the right approach — it correctly handles accessions that themselves contain dots.
  • The two new backend tests cover the happy path and the invalid-format 400 case well.
  • No automated tests for calculate_metadata_diff — acceptable given it's a notification path and was manually verified, but worth a note.
  • The # type: ignore is acknowledged in an inline comment; the structural match is correct at runtime.

Comment thread ingest/scripts/loculus_client.py Outdated
Comment thread ingest/scripts/compare_hashes.py Outdated
Comment thread ingest/scripts/compare_hashes.py
Comment thread ingest/scripts/compare_hashes.py Outdated
Comment thread ingest/scripts/compare_hashes.py Outdated
Comment thread ingest/scripts/compare_hashes.py Outdated
Comment thread ingest/scripts/compare_hashes.py Outdated
Comment thread ingest/scripts/compare_hashes.py Outdated
anna-parker and others added 5 commits June 9, 2026 16:43
Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>
## Summary

This introduces a single strict parser for Loculus accession-version
strings and uses it in both backend locations that previously parsed
`accession.version` values independently.

`AccessionVersion.fromString()` now validates that the input contains
exactly one separator, that both accession and version are present, and
that the version is numeric. Accessions containing dots are rejected as
invalid input. Parsing failures surface as
`UnprocessableEntityException` with a message that distinguishes
malformed structure from a non-numeric version.

The submission metadata filter and sequence-set validation now delegate
to this centralized parser. This removes duplicated parsing logic and
ensures both endpoints apply the same validation rules. Invalid
accession-version filters consequently return HTTP 422 rather than HTTP
400.

## Validation

- `./gradlew ktlintFormat --console=plain`
- `./gradlew test --console=plain -x checkDocker --tests
org.loculus.backend.api.SubmissionTypesTest`
- Kotlin main and test compilation completed successfully as part of the
focused test run.

The endpoint test suites could not run in this environment: Docker is
unavailable, while `USE_NONDOCKER_INFRA=true` requires the missing
`runuser` binary.

🚀 Preview: Add `preview` label to enable
@anna-parker anna-parker merged commit 0caf4ef into main Jun 9, 2026
45 checks passed
@anna-parker anna-parker deleted the curation_warning_add_diff branch June 9, 2026 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend related to the loculus backend component ingest Ingest pipeline preview Triggers a deployment to argocd

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ingest should not treat curator revisions as latest and revise

2 participants