Skip to content

refactor(backend): centralize accession version parsing#6638

Merged
anna-parker merged 1 commit into
curation_warning_add_difffrom
refactor/backend-accession-version-parser
Jun 9, 2026
Merged

refactor(backend): centralize accession version parsing#6638
anna-parker merged 1 commit into
curation_warning_add_difffrom
refactor/backend-accession-version-parser

Conversation

@corneliusroemer

@corneliusroemer corneliusroemer commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

This introduces a single strict parser for Loculus accession-version strings and uses it in both backend locations that previously parsed accession.version values independently.

AccessionVersion.fromString() now validates that the input contains exactly one separator, that both accession and version are present, and that the version is numeric. Accessions containing dots are rejected as invalid input. Parsing failures surface as UnprocessableEntityException with a message that distinguishes malformed structure from a non-numeric version.

The submission metadata filter and sequence-set validation now delegate to this centralized parser. This removes duplicated parsing logic and ensures both endpoints apply the same validation rules. Invalid accession-version filters consequently return HTTP 422 rather than HTTP 400.

Validation

  • ./gradlew ktlintFormat --console=plain
  • ./gradlew test --console=plain -x checkDocker --tests org.loculus.backend.api.SubmissionTypesTest
  • Kotlin main and test compilation completed successfully as part of the focused test run.

The endpoint test suites could not run in this environment: Docker is unavailable, while USE_NONDOCKER_INFRA=true requires the missing runuser binary.

🚀 Preview: Add preview label to enable

@claude claude Bot added the backend related to the loculus backend component label Jun 9, 2026
@corneliusroemer corneliusroemer marked this pull request as ready for review June 9, 2026 14:50
Copilot AI review requested due to automatic review settings June 9, 2026 14:50
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @corneliusroemer's task in 2m 38s —— View job


Code Review

  • Gather context and read changed files
  • Review code quality and logic
  • Review testing coverage
  • Post findings

Clean refactor. Consolidating the two independent parsing implementations into a single AccessionVersion.fromString() is the right call — the old SeqSetCitationsDatabaseService and SubmissionController paths had different error messages and subtly different validation (the controller allowed multi-dot accessions; the service didn't). The new implementation is more strict and consistent. The substringBeforeLast/substringAfterLast approach is idiomatic Kotlin and the test suite covers the main cases.

A few minor notes left as inline comments:

  1. getUnprocessedMetadata missing @ApiResponse(responseCode = "422") — the endpoint now returns 422 on invalid filter input, but there's no OpenAPI annotation documenting it. (See inline on SubmissionController.kt:440.)
  2. Negative version numbers silently acceptedtoLongOrNull() returns -1L for "LOC_123.-1". Probably not a real concern but worth a conscious check. (See inline on SubmissionTypes.kt:42.)
  3. Test edge cases — suggest adding "" and "." to the invalid-format list in SubmissionTypesTest. (See inline on SubmissionTypesTest.kt:17.)
  4. '.' in accession readability — the condition is correct but non-obvious because it relies on substringBeforeLast semantics; a short comment would help. (See inline on SubmissionTypes.kt:36.)

The 400→422 status change is intentional and documented in the PR description; the test updates correctly reflect it.

Comment thread backend/src/main/kotlin/org/loculus/backend/api/SubmissionTypes.kt
)
}

val version = versionString.toLongOrNull()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative versions (e.g. "LOC_123.-1") pass this check since toLongOrNull() accepts -1L. If negative versions are semantically invalid, consider adding a version < 1 guard here. This may be pre-existing behaviour and outside this PR's scope, but worth a conscious decision.

val version = it.substring(lastDot + 1).toLongOrNull()
?: throw BadRequestException("Invalid version in accession version '$it', expected a number")
AccessionVersion(it.substring(0, lastDot), version)
AccessionVersion.fromString(it)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getUnprocessedMetadata endpoint now returns 422 on an invalid filter but has no @ApiResponse(responseCode = "422", ...) annotation. The old 400 case was also undocumented, but since the status changed it's worth adding one to keep the OpenAPI spec accurate.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR centralizes parsing and validation of accession.version strings into a single strict parser (AccessionVersion.fromString) and reuses it in backend endpoints/services that previously implemented their own parsing logic, making validation rules consistent and improving error reporting.

Changes:

  • Added AccessionVersion.fromString() to strictly parse accession.version (exactly one dot, both parts present, numeric version, no dots in accession).
  • Replaced duplicated parsing logic in SubmissionController and SeqSetCitationsDatabaseService with the centralized parser.
  • Updated/added tests to reflect the new parsing behavior and HTTP 422 responses for invalid accession-version inputs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
backend/src/main/kotlin/org/loculus/backend/api/SubmissionTypes.kt Adds centralized strict AccessionVersion.fromString() parsing/validation logic.
backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt Delegates query filter parsing to AccessionVersion.fromString() (invalid filters now surface as 422).
backend/src/main/kotlin/org/loculus/backend/service/seqsetcitations/SeqSetCitationsDatabaseService.kt Replaces manual split/parse with AccessionVersion.fromString() for accession-version handling.
backend/src/test/kotlin/org/loculus/backend/api/SubmissionTypesTest.kt Adds unit tests for the centralized parser (valid, invalid-format, non-numeric version).
backend/src/test/kotlin/org/loculus/backend/controller/submission/GetUnprocessedMetadataEndpointTest.kt Updates expectation from HTTP 400 to 422 for invalid accessionVersions filter.
backend/src/test/kotlin/org/loculus/backend/controller/seqsetcitations/SeqSetValidationEndpointsTest.kt Updates expected validation error message for non-numeric version to match centralized parser output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

val accession = value.substringBeforeLast('.', missingDelimiterValue = "")
val versionString = value.substringAfterLast('.', missingDelimiterValue = "")

if (accession.isEmpty() || versionString.isEmpty() || '.' in accession) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The '.' in accession condition rejects inputs like "a.b.1" by checking the substring before the last dot. This is correct, but the logic may be slightly non-obvious at a glance. Since accession = value.substringBeforeLast('.'), the only way '.' in accession is true is if there are 2+ dots in value. A comment here could help future readers understand the invariant, or alternatively you could rewrite it as an explicit dot-count check.

)
}

val version = versionString.toLongOrNull()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative versions (e.g. "LOC_123.-1") pass this check since toLongOrNull() accepts -1L. If negative versions are semantically invalid, consider adding if (version < 1) (or < 0) here. This may be pre-existing behaviour and outside this PR's scope, but worth a conscious decision.


@Test
fun `GIVEN accession version with invalid format THEN throws format exception`() {
listOf("LOC_123", ".1", "LOC_123.", "LOC.123.1").forEach {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding "" (empty string) and "." (single dot only) to the invalid format list. Both are rejected by the current logic, and explicit tests make the boundary conditions clear.

Suggested change
listOf("LOC_123", ".1", "LOC_123.", "LOC.123.1").forEach {
listOf("LOC_123", ".1", "LOC_123.", "LOC.123.1", "", ".").forEach {

val version = it.substring(lastDot + 1).toLongOrNull()
?: throw BadRequestException("Invalid version in accession version '$it', expected a number")
AccessionVersion(it.substring(0, lastDot), version)
AccessionVersion.fromString(it)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint no longer has a @ApiResponse(responseCode = "422", ...) annotation to document the new 422 behaviour. The old 400 case was also undocumented on this method, but since the status is now different from 400 and the method already exposes 422 responses elsewhere in the controller, it's worth adding one here to keep the OpenAPI spec accurate. Fix this →

@anna-parker anna-parker merged commit cf3bea6 into curation_warning_add_diff Jun 9, 2026
69 checks passed
@anna-parker anna-parker deleted the refactor/backend-accession-version-parser branch June 9, 2026 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend related to the loculus backend component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants