Commit ff87dee

fhennig authored and chaoran-chen committed
fix: address various review comments for file sharing (#3978)
Document Remove dead code Document more things in the schema Add another doc string rename column requested_at -> upload_requested_at ... Update kubernetes/loculus/values.schema.json Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com> Update preprocessing/specification.md Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com> Update backend/src/main/kotlin/org/loculus/backend/model/SubmitModel.kt Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com> Rename function publish -> release truncate new table fix test Refactor huge 'useEffect' in FolderUploadComponent Update schema documentation based on migration changes Replace * with concrete imports Replace * with concrete imports change 600 to 300 Fix formatting issue rename fileField -> fileCategory in the backend Update backend/src/main/kotlin/org/loculus/backend/controller/FilesController.kt Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com> Update kubernetes/loculus/templates/_s3-endpoint.tpl Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com> remove multipleFiles ... rename in frontend some changes format Add more docs to the endpoints undo * try catch mapping parsing Update error clarify naming of properties Add admin docs Allow region to be null Test if we can remove the region and everything is still fine format test fixes Add us-east-1 back Add some arch docs improve docs Document required bucket policy format Add more docs API docs formatting fix escape Add help text Remove '.' 
Add 'underline' to file links update sequence diagram with better description improve docs Update backend/docs/file_sharing.md Co-authored-by: Anna (Anya) Parker <50943381+anna-parker@users.noreply.github.com> Update backend/src/main/kotlin/org/loculus/backend/config/BackendSpringConfig.kt Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> eager initialize clients for thread safety Update backend/src/main/kotlin/org/loculus/backend/api/SubmissionTypes.kt Co-authored-by: Anna (Anya) Parker <50943381+anna-parker@users.noreply.github.com> Adress Annas comments file field -> file category tail rename Update preprocessing/specification.md Co-authored-by: Anna (Anya) Parker <50943381+anna-parker@users.noreply.github.com> rename getFileName to getFileIdPath Update schema documentation based on migration changes remove 'silo' property from values.yaml try out a container lifeycle hook ... Add checking of 'enabled' in backend test fix configure resources for minio
1 parent 498dc81 commit ff87dee

45 files changed

Lines changed: 522 additions & 230 deletions

.github/workflows/e2e-k3d.yml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ jobs:
     env:
       ALL_BROWSERS: ${{ github.ref == 'refs/heads/main' || github.event.inputs.all_browsers && 'true' || 'false' }}
       sha: ${{ github.event.pull_request.head.sha || github.sha }}
-      wait_timeout: ${{ github.ref == 'refs/heads/main' && 900 || 600 }}
+      wait_timeout: ${{ github.ref == 'refs/heads/main' && 900 || 300 }}
     steps:
       - name: Shorten sha
         run: echo "sha=${sha::7}" >> $GITHUB_ENV
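
Note on the `wait_timeout` expression: GitHub Actions expressions have no ternary operator, so the workflow relies on the `cond && a || b` idiom. Below is a minimal Python sketch of how that idiom evaluates, assuming Python's short-circuiting `and`/`or` is a close enough model of the expression operators. The function name `wait_timeout` is hypothetical and not part of the workflow.

```python
def wait_timeout(ref: str) -> int:
    """Model of `github.ref == 'refs/heads/main' && 900 || 300`."""
    # `and` yields 900 when the comparison is true, otherwise False;
    # `or` then falls through to 300 only in the False case.
    return (ref == "refs/heads/main") and 900 or 300


main_timeout = wait_timeout("refs/heads/main")
branch_timeout = wait_timeout("refs/heads/some-feature")
```

The idiom only behaves like a ternary because the middle operand (900) is truthy. With a falsy middle operand such as `0` or an empty string, the expression would always fall through to the last operand.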

backend/docs/db/schema.sql

Lines changed: 4 additions & 4 deletions
@@ -2,8 +2,8 @@
 -- PostgreSQL database dump
 --
 
--- Dumped from database version 15.12 (Debian 15.12-1.pgdg120+1)
--- Dumped by pg_dump version 16.8 (Debian 16.8-1.pgdg120+1)
+-- Dumped from database version 15.13 (Debian 15.13-1.pgdg120+1)
+-- Dumped by pg_dump version 16.9 (Debian 16.9-1.pgdg120+1)
 
 SET statement_timeout = 0;
 SET lock_timeout = 0;
@@ -272,10 +272,10 @@ ALTER VIEW public.external_metadata_view OWNER TO postgres;
 
 CREATE TABLE public.files (
     id uuid NOT NULL,
-    requested_at timestamp without time zone NOT NULL,
+    upload_requested_at timestamp without time zone NOT NULL,
     uploader text NOT NULL,
     group_id integer NOT NULL,
-    published_at timestamp without time zone
+    released_at timestamp without time zone
 );
 
 
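
Note on the renamed columns: `upload_requested_at` is set when an upload is requested and `released_at` when the file is released. Below is a minimal sketch of that lifecycle, assuming an in-memory SQLite table as a stand-in for the PostgreSQL one (with `uuid` and timestamps stored as text). The helpers `request_upload`, `release` and `find_orphans` are hypothetical names, and the set of referenced file IDs is passed in by hand because this diff does not show how submissions reference files.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        id TEXT NOT NULL,
        upload_requested_at TEXT NOT NULL,
        uploader TEXT NOT NULL,
        group_id INTEGER NOT NULL,
        released_at TEXT
    )
    """
)


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


def request_upload(uploader: str, group_id: int) -> str:
    """Record who requested an upload, for which group, and when."""
    file_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO files (id, upload_requested_at, uploader, group_id) "
        "VALUES (?, ?, ?, ?)",
        (file_id, now(), uploader, group_id),
    )
    return file_id


def release(file_id: str) -> None:
    """Mark the release time; released_at stays NULL until this is called."""
    conn.execute("UPDATE files SET released_at = ? WHERE id = ?", (now(), file_id))


def find_orphans(referenced_ids: set) -> list:
    """File IDs that were requested but are referenced by no submission."""
    rows = conn.execute("SELECT id FROM files").fetchall()
    return [row[0] for row in rows if row[0] not in referenced_ids]


used = request_upload("alice", 1)
unused = request_upload("alice", 1)
release(used)
orphans = find_orphans({used})
```

The table alone cannot tell whether a requested file was ever uploaded, so a real orphan check would also have to look at the object store.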

backend/docs/file_sharing.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# File sharing
+
+Loculus supports a "file sharing" feature, where arbitrary files can be submitted alongside regular sequence entry data.
+The feature uses S3 blob storage to store the actual raw files, and a table in the backend to keep track of the files.
+
+Just like the rest of the sequence data, files are not publicly accessible until they are released.
+
+## Submission
+
+![submission](./plantuml/sequenceFileSharingSubmission.svg)
+
+## Preprocessing
+
+![preprocessing](./plantuml/sequenceFileSharingPrepro.svg)
+
+## Releasing
+
+When a sequence entry is released, the associated files are made public as well.
+This is done by setting a `released_at` timestamp in the `files` table, and making the file object in S3 public.
+The bucket is configured to allow public access for objects that are tagged with `public=true`,
+so to make a file public, the backend sets this tag on the object in S3.
+
+## The files table
+
+The database has a table to keep track of the files:
+
+```
+CREATE TABLE public.files (
+    id uuid NOT NULL,
+    upload_requested_at timestamp without time zone NOT NULL,
+    uploader text NOT NULL,
+    group_id integer NOT NULL,
+    released_at timestamp without time zone
+);
+```
+
+Entries in this table are created by the backend when the `request-uploads` endpoint is called.
+The request records who made the request, for which group and when.
+Note that at this point the file does not yet exist in S3; the user still needs to upload it.
+
+On release, the release time is also marked in the database.
+
+This table can also be used to find "orphaned" files, i.e. files that have been requested and uploaded,
+but haven't been referenced in any sequence submission.
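
Note on the "Releasing" section: the tag-based public access it describes can be sketched as an S3 bucket policy that allows anonymous `s3:GetObject` only for objects carrying the `public=true` tag. This is an illustration, not the policy Loculus actually ships. The bucket name and the helper names `public_read_policy` and `public_tag_set` are assumptions; the `s3:ExistingObjectTag/<key>` condition key and the `TagSet` shape follow the AWS S3 conventions.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Policy allowing anonymous reads of objects tagged public=true."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {"s3:ExistingObjectTag/public": "true"}
                },
            }
        ],
    }


def public_tag_set() -> dict:
    """Tagging payload (PutObjectTagging shape) that marks an object public."""
    return {"TagSet": [{"Key": "public", "Value": "true"}]}


# "example-bucket" is a placeholder name.
policy_json = json.dumps(public_read_policy("example-bucket"), indent=2)
```

With a policy of this shape, releasing a file needs no per-object ACL change: setting the tag is enough, and untagged objects stay private.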
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+@startuml
+participant "Processing Pipeline" as pipeline #LightGreen
+participant "Backend" as backend #Orange
+participant "S3 / Object Storage" as s3 #LightGray
+database "Database" as DB
+
+pipeline -> backend: request entries needing processing
+backend -> DB: fetch entries (including file IDs and file names)
+backend -> pipeline: return entry data + pre-signed read URLs
+
+loop for each file
+pipeline -> s3: download file using pre-signed URL
+end loop
+
+pipeline -> backend: request new pre-signed URLs for processed files
+backend -> DB: store new file IDs
+backend -> pipeline: return new pre-signed write URLs + file IDs
+
+loop for each processed file
+pipeline -> s3: upload file using pre-signed write URL
+end loop
+
+pipeline -> backend: submit processed file IDs + other data
+backend -> DB: store processed file data
+
+@enduml
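
Note on the preprocessing diagram: the exchange it describes can be walked through with an in-memory stand-in. This is a rough sketch only. The dictionaries model S3 and the issued pre-signed URLs, and every function name, field name (`entry_id`, `name`, `file_id`) and URL here is hypothetical rather than taken from the Loculus API.

```python
import uuid

store = {}       # stand-in for S3: file ID -> bytes
read_urls = {}   # issued pre-signed read URLs  -> file ID
write_urls = {}  # issued pre-signed write URLs -> file ID


def presign(issued, file_id):
    """Issue a made-up URL for `file_id` and remember what it grants."""
    url = f"https://s3.invalid/{file_id}?sig={uuid.uuid4().hex}"
    issued[url] = file_id
    return url


def backend_get_unprocessed(entries):
    """Backend: return entry data plus one pre-signed read URL per file."""
    return [
        {
            "entry_id": entry["entry_id"],
            "files": [
                {"name": f["name"], "url": presign(read_urls, f["file_id"])}
                for f in entry["files"]
            ],
        }
        for entry in entries
    ]


def backend_request_uploads(count):
    """Backend: create new file IDs and pre-signed write URLs for them."""
    file_ids = [str(uuid.uuid4()) for _ in range(count)]
    return [(file_id, presign(write_urls, file_id)) for file_id in file_ids]


def pipeline_process(entry):
    """Pipeline: download, process, upload, then report the new file IDs."""
    downloaded = [(f["name"], store[read_urls[f["url"]]]) for f in entry["files"]]
    grants = backend_request_uploads(len(downloaded))
    processed = []
    for (name, data), (file_id, url) in zip(downloaded, grants):
        store[write_urls[url]] = data.upper()  # placeholder "processing" step
        processed.append({"name": name, "file_id": file_id})
    return {"entry_id": entry["entry_id"], "files": processed}
```

The point the sketch tries to show is that processed files get new file IDs, so the raw upload stays untouched in the object store.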
Lines changed: 1 addition & 0 deletions
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+@startuml
+participant "Frontend / User" as frontend #LightCyan
+participant "Backend" as backend #Orange
+participant "S3 / Object Storage" as s3 #LightGray
+database "Database" as DB
+
+frontend -> backend: request pre-signed URLs and file IDs
+backend -> DB: insert file records with file IDs and requesting user
+backend -> frontend: return pre-signed URLs and file IDs
+
+loop for each file
+frontend -> s3: upload file using pre-signed URL
+end loop
+
+frontend -> backend: submit metadata, sequence data and file IDs per submission ID
+@enduml
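
Note on the submission diagram: the three steps (request URLs, upload, submit) can be sketched with two small stand-in classes. The class and method names, the URL format and the check for unknown file IDs are all assumptions made for illustration; the real backend endpoints and validation may differ.

```python
import uuid


class ObjectStore:
    """Stand-in for S3: accepts uploads only via URLs the backend issued."""

    def __init__(self):
        self.objects = {}     # file ID -> bytes
        self.write_urls = {}  # pre-signed URL -> file ID

    def put(self, url, data):
        file_id = self.write_urls[url]  # KeyError for URLs never issued
        self.objects[file_id] = data


class Backend:
    def __init__(self, store):
        self.store = store
        self.files = {}        # file ID -> requesting user (the files table)
        self.submissions = {}  # submission ID -> submitted entry

    def request_uploads(self, user, count):
        """Insert file records and return (file ID, pre-signed URL) pairs."""
        grants = []
        for _ in range(count):
            file_id = str(uuid.uuid4())
            url = f"https://s3.invalid/{file_id}?sig={uuid.uuid4().hex}"
            self.files[file_id] = user
            self.store.write_urls[url] = file_id
            grants.append((file_id, url))
        return grants

    def submit(self, entries):
        """Store metadata and file IDs per submission ID."""
        for submission_id, entry in entries.items():
            for file_id in entry["file_ids"]:
                if file_id not in self.files:
                    raise ValueError(f"unknown file ID: {file_id}")
            self.submissions[submission_id] = entry


store = ObjectStore()
backend = Backend(store)
grants = backend.request_uploads("alice", 2)
for file_id, url in grants:
    store.put(url, b"file contents")
backend.submit({"sub1": {"metadata": {}, "file_ids": [grants[0][0]]}})
```

In this run the second requested file is uploaded but never referenced by a submission, which is the "orphaned" case that `file_sharing.md` describes.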
