feat!(website, backend): SeqSet citation tracking #6304

Open
tombch wants to merge 95 commits into main from seqset-citations

Conversation

@tombch tombch commented Apr 27, 2026

Collaborator

Summary

Breaking Changes

  • This PR is marked as breaking because it removes the get-seqset-cited-by-publication endpoint and replaces it with get-seqset-citations. The former endpoint returned the CitedBy type; the new one returns List<SeqSetCitation>.

Schema Changes

  • Adds a seqset_citation_source table. A 'citation source' is any publication (or possibly other item) that references one or more SeqSets. A citation source must have a DOI, title, year and contributors, and its origin is either Crossref or manual curation (manual curation endpoints will be added in a follow-up PR).
  • Adds a seqset_to_citation_source table for database joins between citation sources and the SeqSets they cite.
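
For readers who want a feel for how the two tables relate, here is a minimal in-memory sketch of the join in TypeScript. It is not the actual Kotlin/SQL implementation. Only `citation_source_id` and `source_doi` appear in the visible migration; every other field name, the `origin` values, the version column on the join rows, and the DOIs in the sample data are assumptions made for illustration.

```typescript
// Hypothetical row shapes mirroring the two new tables.
interface CitationSourceRow {
    citationSourceId: number;
    sourceDoi: string;
    title: string;
    year: number;
    contributors: string[];
    origin: "crossref" | "curated"; // assumed labels for the two origins
}

// Assumed shape of a join row: one row per (SeqSet version, citation source) pair.
interface SeqSetToCitationSourceRow {
    seqSetId: string;
    seqSetVersion: number;
    citationSourceId: number;
}

// Returns the citation sources linked to one SeqSet version, in table order.
function citationSourcesForSeqSet(
    sources: CitationSourceRow[],
    links: SeqSetToCitationSourceRow[],
    seqSetId: string,
    seqSetVersion: number,
): CitationSourceRow[] {
    const wantedIds: Record<number, true> = {};
    links.forEach((link) => {
        if (link.seqSetId === seqSetId && link.seqSetVersion === seqSetVersion) {
            wantedIds[link.citationSourceId] = true;
        }
    });
    return sources.filter((source) => wantedIds[source.citationSourceId] === true);
}

// Made-up sample data: one source can cite several SeqSets and vice versa.
const exampleSources: CitationSourceRow[] = [
    { citationSourceId: 1, sourceDoi: "10.5555/paper-1", title: "Paper one", year: 2024, contributors: ["A. Author"], origin: "crossref" },
    { citationSourceId: 2, sourceDoi: "10.5555/paper-2", title: "Paper two", year: 2025, contributors: ["B. Author"], origin: "curated" },
    { citationSourceId: 3, sourceDoi: "10.5555/paper-3", title: "Paper three", year: 2025, contributors: ["C. Author"], origin: "crossref" },
];

const exampleLinks: SeqSetToCitationSourceRow[] = [
    { seqSetId: "seqset-a", seqSetVersion: 1, citationSourceId: 1 },
    { seqSetId: "seqset-a", seqSetVersion: 1, citationSourceId: 2 },
    { seqSetId: "seqset-b", seqSetVersion: 1, citationSourceId: 2 },
    { seqSetId: "seqset-a", seqSetVersion: 2, citationSourceId: 3 },
];
```

The many-to-many shape is the reason for a separate join table: "Paper two" above cites both `seqset-a` and `seqset-b` without duplicating its metadata.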

Backend Changes

  • Adds a scheduled task that updates SeqSet citation sources with results from Crossref. The task has an initial delay of one minute, then runs every 6 hours. It first checks whether the Crossref backend service is active, then queries the Crossref Cited-by API to retrieve all citations for the DOI prefix configured in Loculus. If the returned XML cannot be parsed, or doesn't have a crossref_result tag, an IllegalStateException is thrown. If individual forward_links fail validation (missing DOIs, titles or years), they are recorded as CrossRefValidationError objects and logged in the task output. Validated citations are then merged into individual citation sources, conflicts are logged, and the results are upserted into the database. If a curated citation source exists for a DOI found on Crossref, it is updated with the results from Crossref.
  • Adds the backend endpoint /get-seqset-citations: for a SeqSet ID and version, it returns the source DOI, title, year and contributors of each citation of the SeqSet.
  • Removes the get-seqset-cited-by-publication endpoint, which is now redundant.
  • There aren't any endpoints yet for manual curation of citations, or for aggregating citations for groups, sequences etc., but I think these would be better in a follow-up PR.
  • Also, there should probably be a cleanup task that removes citation sources once all their SeqSets have been deleted, if that is still a supported feature.
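
The validate-then-merge part of the scheduled task can be sketched roughly as follows. This is a TypeScript illustration, not the Kotlin code in the PR: all type and function names are hypothetical, the DOIs are made up, defaulting missing contributors to an empty list is an assumption, and so is the "first seen metadata wins" conflict policy, since the description only says that conflicts are logged.

```typescript
// Hypothetical shape of one parsed forward link; fields may be missing.
interface ForwardLink {
    citedDoi: string; // DOI of the SeqSet being cited
    sourceDoi?: string;
    title?: string;
    year?: number;
    contributors?: string[];
}

interface ValidCitation {
    citedDoi: string;
    sourceDoi: string;
    title: string;
    year: number;
    contributors: string[];
}

interface LinkValidationError {
    citedDoi: string;
    missingFields: string[];
}

// Splits links into valid citations and validation errors (missing DOI, title or year).
function validateLinks(links: ForwardLink[]): { valid: ValidCitation[]; errors: LinkValidationError[] } {
    const valid: ValidCitation[] = [];
    const errors: LinkValidationError[] = [];
    links.forEach((link) => {
        const { sourceDoi, title, year } = link;
        if (!sourceDoi || !title || year === undefined) {
            const missingFields: string[] = [];
            if (!sourceDoi) missingFields.push("doi");
            if (!title) missingFields.push("title");
            if (year === undefined) missingFields.push("year");
            errors.push({ citedDoi: link.citedDoi, missingFields });
            return;
        }
        // Assumption: contributors default to an empty list when absent.
        valid.push({ citedDoi: link.citedDoi, sourceDoi, title, year, contributors: link.contributors ?? [] });
    });
    return { valid, errors };
}

interface MergedSource {
    sourceDoi: string;
    title: string;
    year: number;
    contributors: string[];
    citedDois: string[];
}

// Merges citations that share a source DOI into one citation source.
// Assumed policy: the first seen title/year is kept and any mismatch is reported.
function mergeBySourceDoi(citations: ValidCitation[]): { sources: MergedSource[]; conflicts: string[] } {
    const byDoi: Record<string, MergedSource> = {};
    const sources: MergedSource[] = [];
    const conflicts: string[] = [];
    citations.forEach((citation) => {
        const existing = byDoi[citation.sourceDoi];
        if (existing === undefined) {
            const created: MergedSource = {
                sourceDoi: citation.sourceDoi,
                title: citation.title,
                year: citation.year,
                contributors: citation.contributors,
                citedDois: [citation.citedDoi],
            };
            byDoi[citation.sourceDoi] = created;
            sources.push(created);
            return;
        }
        if (existing.title !== citation.title || existing.year !== citation.year) {
            conflicts.push(citation.sourceDoi + ": conflicting metadata, keeping first seen");
        }
        if (existing.citedDois.indexOf(citation.citedDoi) === -1) {
            existing.citedDois.push(citation.citedDoi);
        }
    });
    return { sources, conflicts };
}
```

Keeping validation errors and conflicts as returned values, rather than throwing, matches the described behaviour where one bad forward link is logged without aborting the whole run.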

Website Changes

  • Adds a View Citations modal to the SeqSet details page, which lists citations for the SeqSet.
  • Updates the total count of citations and the citations over time graph on the SeqSet details page to use the get-seqset-citations endpoint.
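
As a rough illustration of how the count and the citations-over-time graph could be derived from the new response, here is a small TypeScript sketch. The `SeqSetCitation` field names and the helper names are assumptions based on the description above, not the actual website code.

```typescript
// Assumed client-side shape of one item returned by get-seqset-citations.
interface SeqSetCitation {
    sourceDoi: string;
    title: string;
    year: number;
    contributors: string[];
}

interface YearCount {
    year: number;
    count: number;
}

// Total number of citations is simply the length of the list.
function totalCitations(citations: SeqSetCitation[]): number {
    return citations.length;
}

// Groups citations by publication year, sorted ascending, for a citations-over-time chart.
function citationsPerYear(citations: SeqSetCitation[]): YearCount[] {
    const counts: Record<number, number> = {};
    citations.forEach((citation) => {
        counts[citation.year] = (counts[citation.year] ?? 0) + 1;
    });
    return Object.keys(counts)
        .map((key) => Number(key))
        .sort((a, b) => a - b)
        .map((year) => ({ year, count: counts[year] ?? 0 }));
}
```

Years with no citations are omitted from the result; a chart that needs a continuous axis would have to fill those gaps itself.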

Additional website changes

  • Updates the other modals on the SeqSet details page to use the BaseDialog component for consistency.

Screenshot

image

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: https://seqset-citations.loculus.org

@claude claude Bot added the website, backend and seqset labels Apr 27, 2026

claude Bot commented Apr 27, 2026

Contributor

This PR may be related to: #1501 (Check Crossref via API to see what papers citing DOIs)

@tombch tombch changed the title from "feat(website, backend): SeqSet citation details" to "feat(website, backend): SeqSet citation tracking" May 7, 2026
Comment thread website/src/components/SeqSetCitations/CitationTable.tsx
@@ -0,0 +1,26 @@
create table seqset_citation_source (
citation_source_id bigserial,
source_doi text not null unique,

Contributor

I might have asked this before, sorry for forgetting. I thought that not all citations might have a DOI, and that we still wanted to be able to add them here, so I think we would need to allow this to be null?

Comment thread website/src/components/SeqSetCitations/CitationTable.tsx Outdated
Comment thread backend/src/main/kotlin/org/loculus/backend/api/SeqSetCitations.kt
Comment thread website/src/components/common/BaseDialog.tsx Outdated

@maverbiest maverbiest left a comment

Contributor

I was finally able to go through the whole PR, apologies that it took so long!

Looks great overall, no major blockers on my side. I think we could make CrossRefServiceTest.kt a bit cleaner with a parametrized test. I would also like to interact with the front-end changes on a preview or locally. Do you have any tips for setting this up so there are some test SeqSets to play around with?

@anna-parker anna-parker left a comment

Contributor

Hey Tom! Sorry, I only finished looking through this today. It's a bit hard to test without CrossRef examples. I added a few comments on the backend, but it would be great to put this on staging on Monday and then take a closer look. Let me know if you need any help :-)

@tombch tombch changed the title from "feat(website, backend): SeqSet citation tracking" to "feat!(website, backend): SeqSet citation tracking" Jun 9, 2026
@theosanderson

Member

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2fb4a4bda

Comment thread backend/src/main/kotlin/org/loculus/backend/service/crossref/CrossRefService.kt Outdated

Labels

backend, preview, seqset, update_db_schema, website

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants