Commit e7c2d67

pyphilia and kim authored
docs: add specifications for page assets (#1985)
* docs: add specifications for page assets
* refactor: improve page asset doc

---------

Co-authored-by: kim <kim.phanhoang@epfl.ch>
1 parent 19a75cd commit e7c2d67

1 file changed

Lines changed: 79 additions & 0 deletions


src/services/item/plugins/page/README.md

@@ -44,6 +44,85 @@ sequenceDiagram
```

## Assets

A page can contain files (e.g. images, videos, documents), which we call assets. The challenges are storing these assets and referencing them from the page content.

**page_assets**

- asset_id: uuid, not unique
- page_id: uuid, foreign key on item id (page), on delete cascade
- size: number
- created_at: date (default: now())
- referenced_at: date (default: null)
- deleted_at: date (default: null, null if not deleted)

An asset must not be larger than 1GB.

```mermaid
erDiagram
    page_assets {
        asset_id uuid
        page_id uuid FK
        size number
        created_at date
        referenced_at date
        deleted_at date
    }
```

For a row in this table, the related asset is stored under `pages/<page-id>`. This makes deletion easier, since removing a page only requires removing its folder. For copy purposes, we allow `asset_id` to be non-unique. However, [page_id, asset_id] must be unique (and indexed).
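
The table constraints and storage layout described above can be sketched as follows. This is only an illustration: `PageAssetRow`, `PageAssetTable`, `assetStorageKey` and `compositeKey` are hypothetical names, the in-memory map stands in for the real database, and the sketch assumes `size` is expressed in bytes and that 1GB means 10^9 bytes.

```typescript
// Hypothetical row shape mirroring the page_assets columns.
interface PageAssetRow {
  assetId: string; // uuid, may repeat across pages after a copy
  pageId: string; // uuid of the owning page item
  size: number; // assumption: bytes
  createdAt: Date;
  referencedAt: Date | null;
  deletedAt: Date | null;
}

// Assumption: 1GB is interpreted as 10^9 bytes.
const MAX_ASSET_SIZE_BYTES = 1e9;

// Storage key of an asset, following the `pages/<page-id>/<asset-id>` layout.
function assetStorageKey(pageId: string, assetId: string): string {
  return `pages/${pageId}/${assetId}`;
}

// Composite key used to enforce uniqueness of [page_id, asset_id].
// Uuids contain no ":" so the separator cannot cause collisions.
function compositeKey(row: Pick<PageAssetRow, "pageId" | "assetId">): string {
  return `${row.pageId}:${row.assetId}`;
}

// In-memory stand-in for the table: rejects oversized assets and
// duplicates on the composite key, but allows a repeated assetId.
class PageAssetTable {
  private rows = new Map<string, PageAssetRow>();

  insert(row: PageAssetRow): void {
    if (row.size > MAX_ASSET_SIZE_BYTES) {
      throw new Error("asset exceeds 1GB");
    }
    const key = compositeKey(row);
    if (this.rows.has(key)) {
      throw new Error("duplicate [page_id, asset_id]");
    }
    this.rows.set(key, row);
  }

  count(): number {
    return this.rows.size;
  }
}
```

The same `assetId` can be inserted for two different pages, which is what a page copy relies on, while inserting the same pair twice is rejected.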

On access, an asset's `deleted_at` is cleared and its `referenced_at` is set to the current time.

On upload, a row is created with a generated `<asset-id>` and the asset is stored at `pages/<page-id>/<asset-id>`. The client is expected to wait for the response before referencing the asset by the returned `<asset-id>`.
A possible edge case: an upload succeeds but the asset is never used (i.e. the uploaded asset is not integrated in the corresponding page) and ends in state (`deleted_at = null` && `referenced_at != null`). We rely on cleaning solutions (see below).

A deleted asset is marked with `deleted_at` rather than removed immediately, which allows the history mechanism to keep working. If the asset is accessed again, `deleted_at` is cleared. An edge case is a race condition where someone accesses the asset after it has been deleted: the asset is then no longer marked as deleted and remains in storage, in state (`deleted_at = null` && `referenced_at != null`). We rely on cleaning solutions (see below).

On page copy, the related rows (`deleted_at = null` && `referenced_at != null`) are copied, and the folder `pages/<page-id>` is duplicated and renamed `pages/<copy-id>`.
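
As a rough sketch of the row selection on page copy, assuming rows are available as plain objects: `PageAssetRow` and `rowsForPageCopy` are hypothetical names, and the sketch keeps every column unchanged except the page id, which is an assumption since the text does not say whether timestamps are reset. Duplicating the storage folder is not shown.

```typescript
// Hypothetical row shape mirroring the page_assets columns.
interface PageAssetRow {
  assetId: string;
  pageId: string;
  size: number;
  createdAt: Date;
  referencedAt: Date | null;
  deletedAt: Date | null;
}

// Build the rows for the copied page: only assets of the source page that
// are referenced and not marked as deleted are carried over.
function rowsForPageCopy(
  rows: PageAssetRow[],
  sourcePageId: string,
  copyPageId: string,
): PageAssetRow[] {
  return rows
    .filter(
      (row) =>
        row.pageId === sourcePageId &&
        row.deletedAt === null &&
        row.referencedAt !== null,
    )
    // The asset id is kept, which is why asset_id alone cannot be unique.
    .map((row) => ({ ...row, pageId: copyPageId }));
}
```

Because the asset id is preserved, references inside the copied page content keep working once the folder has been duplicated under the new page id.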

```mermaid
flowchart TD
    *-->|Created|A
    A[referenced=null, deleted=null]-->|scheduled job|DELETED
    A-->|Get|B[referenced=date, deleted=null]
    B-->|Delete|C[referenced=date, deleted=date]
    C-->|Get|B
    C-->|scheduled job|DELETED
    A-->|Delete|E[referenced=null, deleted=date]
    E-->|scheduled job|DELETED
    linkStyle 4 stroke:#ff3,stroke-width:4px,color:red;
    style B stroke:#f66,stroke-width:2px,stroke-dasharray: 5 5
```
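
The Get and Delete transitions of the flowchart above can be expressed as two small pure functions. This is a sketch under assumptions: `AssetState`, `onGet`, `onDelete` and `stateName` are hypothetical names, and the state letters are those of the diagram. A Get on state E is not drawn in the diagram; here it is assumed to behave like any other access.

```typescript
// Only the two timestamps that drive the lifecycle.
interface AssetState {
  referencedAt: Date | null;
  deletedAt: Date | null;
}

// Get: the asset becomes referenced and any deletion mark is cleared.
function onGet(_state: AssetState, now: Date): AssetState {
  return { referencedAt: now, deletedAt: null };
}

// Delete: the asset is only marked, the reference date is left untouched.
function onDelete(state: AssetState, now: Date): AssetState {
  return { ...state, deletedAt: now };
}

// Map a state to the node letter used in the flowchart.
function stateName(state: AssetState): "A" | "B" | "C" | "E" {
  if (state.referencedAt === null) {
    return state.deletedAt === null ? "A" : "E";
  }
  return state.deletedAt === null ? "B" : "C";
}
```

The highlighted C to B edge corresponds to calling `onGet` on a state that was previously passed through `onDelete`, which is the race condition mentioned earlier.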

```mermaid
sequenceDiagram
    Client->>Server: POST /pages/page-id/upload
    Server->>Client: 201 CREATED: asset-id
    Client->>Server: GET /pages/page-id/assets/asset-id
    Server->>Client: 200 OK: signed url to asset
```

### Cleaning solutions

- A scheduled job deletes assets where `created_at` is older than 7 days and `referenced_at = null`.
- A scheduled job deletes assets where `deleted_at` is older than 7 days.
- If a page is deleted, we delete its corresponding assets folder and the related `page_assets` rows (this is the only way to delete assets with `referenced_at` not null and `deleted_at` null).
- Possible solution (endpoint optimization): an endpoint that receives all asset ids still used by the page and removes the assets that are not in that list.
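
The two scheduled-job rules can be captured in a single predicate. This is a minimal sketch, not the actual job: `CleanableRow`, `RETENTION_MS` and `isCleanable` are hypothetical names, and "older than 7 days" is assumed to mean strictly more than 7 × 24 hours before the time the job runs.

```typescript
// Only the columns the cleaning rules look at.
interface CleanableRow {
  createdAt: Date;
  referencedAt: Date | null;
  deletedAt: Date | null;
}

// Assumption: 7 days expressed as a fixed number of milliseconds.
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000;

// True when a scheduled job running at `now` may remove the asset.
function isCleanable(row: CleanableRow, now: Date): boolean {
  const cutoff = now.getTime() - RETENTION_MS;
  // Rule 1: uploaded more than 7 days ago and never referenced.
  const neverReferenced =
    row.referencedAt === null && row.createdAt.getTime() < cutoff;
  // Rule 2: marked as deleted more than 7 days ago.
  const deletedLongAgo =
    row.deletedAt !== null && row.deletedAt.getTime() < cutoff;
  return neverReferenced || deletedLongAgo;
}
```

A row that is referenced and not marked as deleted never satisfies the predicate, whatever its age, which matches the remark that only deleting the page removes such assets.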

### User Storage

A hard limit is set to 10GB. Uploading assets beyond this limit is not possible. We expect the majority of pages to be less than 5GB, so there is some margin for inconsistent data.
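
A possible way to enforce the hard limit at upload time is sketched below. The names `STORAGE_LIMIT_BYTES`, `usedStorage` and `canUpload` are hypothetical, and several points are assumptions rather than part of the specification: sizes are in bytes, 10GB means 10 × 10^9 bytes, and every row still present in the table counts towards the usage, including rows marked as deleted, since their files remain in storage until cleaned. The scope of the limit is left to the caller, who decides which rows to pass in.

```typescript
// Assumption: 10GB interpreted as 10 * 10^9 bytes.
const STORAGE_LIMIT_BYTES = 10 * 1e9;

// Sum of the sizes of the given rows.
function usedStorage(rows: ReadonlyArray<{ size: number }>): number {
  return rows.reduce((total, row) => total + row.size, 0);
}

// An upload is accepted only if it does not push the usage above the limit.
function canUpload(
  rows: ReadonlyArray<{ size: number }>,
  incomingSize: number,
): boolean {
  return usedStorage(rows) + incomingSize <= STORAGE_LIMIT_BYTES;
}
```

Counting rows that are no longer used in the page is what makes the reported usage drift from the real one, hence the margin mentioned above.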

Inconsistencies might happen for assets where `deleted_at = null` and `referenced_at != null` that are not actually used anymore, because users cannot delete them themselves (unless they delete the page). An alternative solution would be to provide a gallery to manage the related assets.

### Endpoints

- POST `/pages/<page-id>/upload`: store asset (form-data), if the user has write access to page-id
- GET `/pages/<page-id>/<asset-id>`: get signed url to asset by id, if the asset belongs to page-id and the user has read access to page-id, and clear `deleted_at`
- DELETE `/pages/<page-id>/<asset-id>`: mark asset as deleted

## Tests

Controller tests use [`y-websocket`](https://github.com/yjs/y-websocket) to connect to the websocket endpoint. This allows simulating a change in a yjs document that is then reflected on the server.
