
Commit daf1fb6

pyphilia, kim and spaenleh
authored
docs: add database documentation (#2058)
* docs: add database documentation
* docs: generate db doc to deploy
* fix: output db doc in correct folder, remove test data
* refactor: ignore db doc output dir
* docs: add github pages mention in readme
* refactor: update README.md

Co-authored-by: Basile Spaenlehauer <spaenleh@gmail.com>

---------

Co-authored-by: kim <kim.phanhoang@epfl.ch>
Co-authored-by: Basile Spaenlehauer <spaenleh@gmail.com>
1 parent d079383 commit daf1fb6

7 files changed

Lines changed: 1396 additions & 47 deletions


.github/workflows/deploy-db-doc.yml

Lines changed: 4 additions & 4 deletions
@@ -38,10 +38,10 @@ jobs:
 restore-keys: |
   ${{ runner.os }}-yarn-

-# - name: yarn install and generate docs
-#   run: |
-#     yarn
-#     yarn db-doc:generate
+- name: yarn install and generate docs
+  run: |
+    yarn
+    yarn db-doc:generate

 - name: Upload artifact
   uses: actions/upload-pages-artifact@v3

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -60,3 +60,5 @@ secret-key
 # Generated
 openapi.json
 vacuum-report.html
+schema.sql
+db-documentation

DATABASE.md

Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
# Database Structure Guide

## Core Concept: Items & Hierarchies

The heart of this system is **Items**: containers for content. Each item has a `type` that describes what kind of item it is. Items can be:

- **Folders** (containers that hold other items)
- **Documents** (text-based resources)
- **Links** (references to external resources)
- **Files** (images, PDFs, etc.)
- **H5P** (interactive H5P content packages)
- **App** (references to an application)
- **Etherpad** (references to a collaborative, real-time text editor)

Items are organized in a **tree structure**, similar to folders on your computer:

```
My Workspace
├── Project A (#id-1)
│   ├── Document 1 (#id-2)
│   └── Document 2 (#id-3)
└── Project B (#id-4)
    └── Presentation (#id-5)
```

Each item has a `path` that encodes its location in the hierarchy as the sequence of ids from the top-level item down to the item itself, separated by `.`. For example, the path of Document 1 is `#id_1.#id_2`. Note that every `-` in an id is replaced by `_` in a `path`. For simplicity, this example uses readable ids, but the database uses UUIDv4 identifiers (e.g. `8a12ce1e-c58c-47a6-8ba2-742e0813d4ef`).
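
To make the convention concrete, here is a minimal TypeScript sketch that derives a `path` from a list of ancestor ids, assuming only the two rules described above (ids joined with `.` from the top-level item down to the item, and each `-` replaced by `_`). The helpers `toPathSegment`, `buildPath` and `isAncestorOrSelf` are hypothetical illustrations, not functions from the Graasp codebase.

```typescript
// Hypothetical helpers illustrating the `path` convention; not Graasp code.

/** Convert an item id (e.g. a UUIDv4) into a path segment: every `-` becomes `_`. */
function toPathSegment(id: string): string {
  return id.replace(/-/g, "_");
}

/** Build an item's path from its ids, ordered from the top-level item down to the item itself. */
function buildPath(idsFromTop: string[]): string {
  return idsFromTop.map(toPathSegment).join(".");
}

/** True if `ancestorPath` is `path` itself or one of its ancestors. */
function isAncestorOrSelf(ancestorPath: string, path: string): boolean {
  return path === ancestorPath || path.startsWith(ancestorPath + ".");
}

// Usage with the readable ids of the example tree
const projectA = buildPath(["#id-1"]); // "#id_1"
const document1 = buildPath(["#id-1", "#id-2"]); // "#id_1.#id_2"
console.log(document1, isAncestorOrSelf(projectA, document1));
```

Appending `.` before the prefix comparison in `isAncestorOrSelf` avoids a false match between ids that merely share a prefix, such as `#id_1` and `#id_10`.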

Core information about each item:

- `id`: Unique identifier
- `name`: What it's called
- `type`: Document, folder, link, etc.
- `path`: Location in the hierarchy
- `created_at` / `updated_at`: Timestamps
- `created_by`: The member who created it
- `deleted_at`: When it was deleted (`null` if not deleted) ← _Important for tracking deletions_

---

## Apps & Integrations

### Apps

Third-party or built-in applications that users can add to their folders. Each app is described by a name, description, URL, icon, thumbnail, publisher information, and configuration settings.

These curated apps integrate with the Graasp API using a configured API `key`, and can store data in three forms: `data` for information an app wants to remember, `setting` for per-item configuration, and `action` for tracking usage. Each is described in the sections below.

### App Data

App Data stores custom information that an application wants to remember for a user. Each record is always tied to both a specific item and an account, and visibility rules control whether it is accessible to the account owner, the creator, or other users.

### App Settings

App Settings provide per-item configuration that creators can customize and store as JSON data. These settings are available to all users accessing that item, but only administrators can edit them.

### App Actions

App Actions track when apps are used, recording which app was involved, what the user did with it, which item it happened on, and any custom data specific to that app interaction.

---

## Content Management

### Recycled Items

When items are deleted, they aren't actually removed from the database. Instead, a `deleted_at` timestamp is set on the deleted item and its entire sub-tree, and one record in `recycled_item_data` is created for each deleted root item, allowing users to recover deleted content from the recycle bin.

Recycled items older than 3 months are scheduled for automatic permanent deletion.
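
A minimal in-memory sketch of the soft-delete flow above, assuming a simplified item shape with `path` and `deletedAt` fields. The `RecycledEntry` type, the `RETENTION_MONTHS` constant and the function names are hypothetical; they only illustrate the three rules (stamp the whole sub-tree, keep one recycle record per deleted root, purge after 3 months).

```typescript
// Hypothetical, simplified model of soft deletion; not the Graasp schema.
interface Item {
  id: string;
  path: string;
  deletedAt: Date | null;
}

// Stands in for one row of `recycled_item_data` (fields are assumptions).
interface RecycledEntry {
  itemId: string;
  createdAt: Date;
}

const RETENTION_MONTHS = 3;

/** Soft-delete `root` and its whole sub-tree, returning the single recycle entry for the root. */
function recycle(items: Item[], root: Item, now: Date): RecycledEntry {
  for (const item of items) {
    if (item.path === root.path || item.path.startsWith(root.path + ".")) {
      item.deletedAt = now; // rows stay in the table; only the timestamp is set
    }
  }
  return { itemId: root.id, createdAt: now };
}

/** True once the entry is older than the retention period and may be permanently deleted. */
function isDueForPurge(entry: RecycledEntry, now: Date): boolean {
  const limit = new Date(entry.createdAt.getTime());
  limit.setUTCMonth(limit.getUTCMonth() + RETENTION_MONTHS);
  return now.getTime() > limit.getTime();
}
```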

### Item Visibility

An item's visibility controls who can access it. There are three states:

- **Public**: Visible to anyone with the link
- **Hidden**: Not shown to readers
- **Private** (= no record): Only visible to people with explicit memberships

You can read more about memberships and access control in the "Access & Permissions" section.

### Likes & Bookmarks

The `item_like` table records when someone "likes" an item.

Bookmarks (legacy name `favorite`, still stored in the `item_favorite` table) track when someone saves an item to their bookmarks for quick access.

### Published Items

Published items are listed in the library. An item is published if it or its parent has a record in `published_items`; its visibility should also be set to `public`. Admins of the item can unpublish it at any time.
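
As a sketch of the rule above, assuming that "parent" covers any ancestor in the item's `path` and that `published_items` can be read as a list of paths, the following hypothetical helpers find the published item an item inherits its publication from. Neither the functions nor this data shape come from the Graasp codebase, and the `public` visibility requirement is not checked here.

```typescript
// Hypothetical: `publishedPaths` stands in for the paths referenced by `published_items`.

/** Return the published path that covers `itemPath` (the item itself or an ancestor), if any. */
function findPublicationRoot(itemPath: string, publishedPaths: string[]): string | undefined {
  // An ancestor's path is a `.`-separated prefix of the item's path
  return publishedPaths.find((p) => itemPath === p || itemPath.startsWith(p + "."));
}

function isPublished(itemPath: string, publishedPaths: string[]): boolean {
  return findPublicationRoot(itemPath, publishedPaths) !== undefined;
}

// Usage: publishing folder `a` also covers its descendant `a.b.c`
console.log(findPublicationRoot("a.b.c", ["x", "a"]));
```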

### Publication Removal Notices

When administrators unpublish content from the admin panel, the system creates removal notices that record the reason for unpublishing and the date it occurred.

---

## Account Management

### Accounts (Members & Guests)

The system has two types of accounts: "members" and "guests".

#### Individual Members

Individual members are real people using the platform with an email address, a profile (including bio and avatar), and the ability to create items and collaborate with others. Their account `type` is set to `individual`.

#### Guests

Guest accounts are temporary accounts created for specific items, typically used to grant access without requiring full platform membership. Guests have limited permissions and may require passwords to access specific items. Their account `type` is set to `guest`.

**Related tables:**

- `guest_password`: Passwords for guest accounts to access a specific item
- `member_password`: Stores hashed passwords for members
- `member_profile`: Additional info like bio, avatar, preferences

---

## Access & Permissions

### Item Memberships

Item memberships control who has access to an item. There are three permission levels:

1. **Admin**: Can manage content and share access
1. **Write**: Can create and edit content
1. **Read**: Can view only

A person can hold different permission levels at different depths of an item hierarchy. Going down the hierarchy, permissions can only increase (from less permissive to more permissive), and the membership closest to the item takes precedence.

For example, given the path `A.B.C`, user Alice can be granted a membership at each level as follows:

- A: read
- B: write
- C: admin

So Alice can only read `A`, can write in `B`, and is an admin of `C`.
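
The precedence rule can be read as "the deepest membership on the item's path wins". Here is a minimal TypeScript version, assuming memberships are available as (account, item path, permission) records; the `Membership` type and `effectivePermission` function are hypothetical, and the sketch does not enforce the rule that deeper memberships must be more permissive.

```typescript
// Hypothetical sketch of membership resolution along an item path; not Graasp code.
type Permission = "read" | "write" | "admin";

interface Membership {
  accountId: string;
  itemPath: string; // path of the item the membership is attached to
  permission: Permission;
}

/** Effective permission of `accountId` on the item at `itemPath`, or null if none applies. */
function effectivePermission(
  accountId: string,
  itemPath: string,
  memberships: Membership[],
): Permission | null {
  let closest: Membership | null = null;
  for (const m of memberships) {
    const appliesToItem = itemPath === m.itemPath || itemPath.startsWith(m.itemPath + ".");
    // Every applicable path is a prefix of `itemPath`, so the longest one is the closest.
    if (
      m.accountId === accountId &&
      appliesToItem &&
      (closest === null || m.itemPath.length > closest.itemPath.length)
    ) {
      closest = m;
    }
  }
  return closest === null ? null : closest.permission;
}

// Alice's memberships from the example above
const aliceMemberships: Membership[] = [
  { accountId: "alice", itemPath: "A", permission: "read" },
  { accountId: "alice", itemPath: "A.B", permission: "write" },
  { accountId: "alice", itemPath: "A.B.C", permission: "admin" },
];
console.log(effectivePermission("alice", "A", aliceMemberships)); // prints: read
console.log(effectivePermission("alice", "A.B.C", aliceMemberships)); // prints: admin
```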

### Membership Requests

When someone wants to join an item but doesn't have access yet, they can create a membership request. Admins can approve or deny these requests.

### Invitations

Admins can invite people by email to access an item. The invitation includes:

- Email address of the person being invited
- Permission level they'll receive

### Short Links

Short Links are easy-to-share shortened URLs that point to specific items, redirecting users to the appropriate platform (builder, player, or library) and tracking who created the link and when.

### Item Login Schemas

A private item requires users to log in before accessing it. An item can additionally allow "pseudonymized" login: if enabled, users can access the item with a username, with or without a password.

When someone logs in with these credentials, the system automatically creates a guest account linked to that login method. If the same person logs in again with the same credentials, their guest account is reused rather than creating a new one. This allows controlled access to specific items without requiring a full platform account.

Deleting an item login schema deletes all of its related guest accounts.
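
The guest-reuse and cascade behaviour can be sketched as a lookup keyed by login schema and username. This in-memory TypeScript version is a hypothetical illustration: the `LoginSchemaGuests` class, its fields and the generated ids are assumptions, and password checks are left out.

```typescript
// Hypothetical in-memory model; the real tables and ids differ.
interface Guest {
  id: string;
  name: string;
  loginSchemaId: string;
}

class LoginSchemaGuests {
  private guests = new Map<string, Guest>();
  private nextId = 1;

  /** Return the existing guest for this schema and username, or create it on first login. */
  loginAsGuest(loginSchemaId: string, username: string): Guest {
    const key = `${loginSchemaId}:${username}`;
    const existing = this.guests.get(key);
    if (existing !== undefined) {
      return existing; // same credentials: the guest account is reused
    }
    const guest: Guest = { id: `guest-${this.nextId++}`, name: username, loginSchemaId };
    this.guests.set(key, guest);
    return guest;
  }

  /** Delete every guest created through a login schema; returns how many were removed. */
  deleteSchema(loginSchemaId: string): number {
    const keysToDelete: string[] = [];
    this.guests.forEach((guest, key) => {
      if (guest.loginSchemaId === loginSchemaId) {
        keysToDelete.push(key);
      }
    });
    keysToDelete.forEach((key) => this.guests.delete(key));
    return keysToDelete.length;
  }
}
```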

---

## Collaboration & Interaction

### Chat Messages

Each item has its own chatbox where users can have discussions. Each message records who wrote it (`creator_id`), when it was written (`created_at`), the message content (`body`), and which item it belongs to.

### Chat Mentions

Chat Mentions track when someone is mentioned in a chat message (@username), recording who was mentioned, whether they've read the mention, and when it was created.

---

## Item Publication

### Categories & Tags

Items can be organized with metadata that is especially useful for published content. Tags are user-defined and manually added to items, organized in categories such as discipline, resource type, or level. Multiple tags can be assigned to each item to help with searching and filtering.

### Validation & Quality Control

Before publication, items are validated through automated checks, including image validation. Each validation has a status of `pending`, `success`, or `failed`.

For each publication request, a validation group is created that references multiple validation records. If any validation fails, the item cannot be published.
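
A sketch of how a validation group could be reduced to a single verdict. Only the "any failure blocks publication" rule comes from the text above; treating pending validations and empty groups as not yet publishable is an assumption, and `groupStatus` and `canPublish` are hypothetical names.

```typescript
type ValidationStatus = "pending" | "success" | "failed";

/** Combine the statuses of one validation group into a single status. */
function groupStatus(statuses: ValidationStatus[]): ValidationStatus {
  // Stated rule: any failed validation blocks publication
  if (statuses.some((s) => s === "failed")) return "failed";
  // Assumption: an empty group or unfinished checks are still pending
  if (statuses.length === 0 || statuses.some((s) => s === "pending")) return "pending";
  return "success";
}

function canPublish(statuses: ValidationStatus[]): boolean {
  return groupStatus(statuses) === "success";
}
```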

The review table exists for future use but isn't currently active.

---

## Exports & Data Downloads

### Item Export Requests

When someone wants to download an item's content, the system records who made the request, which item or sub-tree of items they're exporting, the desired format (JSON, CSV, etc.), and when the request was made.

### Action Request Exports

Action Request Exports allow downloads of activity and action logs for user analytics and behavior tracking, capturing all interactions within a specified date range.

---

## Additional Item Features

### Item Flags

Item Flags allow users to report problems with items, recording who reported it, the reason (spam, inappropriate content, etc.), and the status of the report (reviewed, resolved, etc.).

This feature is available but currently not heavily used.

### Geolocation

Items can optionally store location information, allowing them to be browsed and discovered on a map interface.

### Page Updates (Beta)

Items with type "page" use an alternative collaborative, real-time content editor. Updates are recorded in a dedicated table, and page content is built from these incremental updates. Retrieving the complete content of a page therefore requires reconstructing it from the individual update records with a special library.

---

## Maintenance Notices

When important updates or critical maintenance are needed, the team schedules maintenance windows to inform users in advance. The system records the scheduled downtime period with start and end times, and can display messages to users about the maintenance.
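
A small sketch of how a recorded maintenance window could drive the messages shown to users. The `MaintenanceWindow` fields and the helper names are hypothetical; the only behaviour taken from the text is that each window has a start and an end time.

```typescript
// Hypothetical shape of a maintenance notice; field names are assumptions.
interface MaintenanceWindow {
  startAt: Date;
  endAt: Date;
  message?: string;
}

type MaintenanceState = "upcoming" | "ongoing" | "past";

/** Classify a window relative to `now`, e.g. to choose between an advance notice and an "ongoing" banner. */
function maintenanceState(maintenance: MaintenanceWindow, now: Date): MaintenanceState {
  if (now.getTime() < maintenance.startAt.getTime()) return "upcoming";
  if (now.getTime() < maintenance.endAt.getTime()) return "ongoing";
  return "past";
}

/** Windows that still warrant a message to users (upcoming or ongoing). */
function noticesToDisplay(windows: MaintenanceWindow[], now: Date): MaintenanceWindow[] {
  return windows.filter((w) => maintenanceState(w, now) !== "past");
}
```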

---

## Actions (Analytics Traces)

Actions are what users do, captured for analytics and displayed in dashboard graphics. Each action record includes the type of action, geographic location (if the user has allowed location tracking), when it occurred, which item was involved (if applicable), and which account performed it.

The system currently records a set of standard actions, but this list may be extended in the future.

---

## Need More Help?

Each table has detailed column information available. For specific analysis needs, refer to the technical schema documentation or contact the data team.

README.md

Lines changed: 16 additions & 12 deletions
@@ -31,7 +31,6 @@ In order to run the Graasp backend, it requires:
 - [Postman](https://www.postman.com) : Application to explore and test your APIs.
 - [Starship](https://starship.rs/): A shell prompt enhancer that shows you the current git branch, nvm version and package version, very useful for a quick look at your environment (works on all shells and is super fast); requires you to use a [NerdFont](https://www.nerdfonts.com/)
 - [VS Code](https://code.visualstudio.com) : IDE to manage the database and make changes to the source code.
-
 - [Remote-Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) : An extension for VS Code. It allows you to easily set up the dev environment.

 - [SQLTools](https://marketplace.visualstudio.com/items?itemName=mtxr.sqltools) : An extension for VS Code. It allows easy access to the database.
@@ -66,7 +65,7 @@ This will create 11 containers :
 > To use garage with the Docker installation, it is necessary to edit your `/etc/hosts` with the following line `127.0.0.1 .s3.garage.localhost`. This is necessary because the backend creates signed urls pointing to this subdomain. Without changing the hosts, the development machine cannot resolve urls like `http://s3.garage.localhost:3900`.

 > **Troubleshoot**
-> If during setup of the devcontainer you get an error like `nudenet Error pull access denied for public.ecr.aws/g...`
+> If during setup of the devcontainer you get an error like `nudenet Error pull access denied for public.ecr.aws/g...`
 > This can occur if you previously logged in to the public ECR. When you want to pull from the public ECR, you should be unauthenticated. Simply run the following on your host: `docker logout public.ecr.aws`. It will log you out of the public ECR and you should be able to rebuild the containers without issue. If it persists, please [open an issue](https://github.com/graasp/graasp/issues/new?title=NudeNet%20DevContainer%20Docker%20Install%20Issue)

 Then install the required npm packages with `yarn install`. You should run this command in the docker's terminal, because some packages are built depending on the operating system (e.g. `bcrypt`).
@@ -235,6 +234,7 @@ You will need to configure the garage instance so you can use the s3 buckets wit
 To simplify the commands you can create an alias to the docker exec command:

 Run this on the host machine
+
 ```sh
 # get the container name for the garage service
 docker ps
@@ -243,17 +243,17 @@ docker ps
 alias garage="docker exec -it <container-name> /garage"
 ```

-You should now be able to run commands against the garage executable running inside the container. Check that it works by running:
+You should now be able to run commands against the garage executable running inside the container. Check that it works by running:

 ```sh
 garage status
 ```

-You should see an output similar to:
+You should see an output similar to:

 ```
-2025-09-11T05:42:45.393828Z INFO garage_net::netapp: Connected to 127.0.0.1:3901, negotiating handshake...
-2025-09-11T05:42:45.436392Z INFO garage_net::netapp: Connection established to fca7df6b0fe8115c
+2025-09-11T05:42:45.393828Z INFO garage_net::netapp: Connected to 127.0.0.1:3901, negotiating handshake...
+2025-09-11T05:42:45.436392Z INFO garage_net::netapp: Connection established to fca7df6b0fe8115c
 ==== HEALTHY NODES ====
 ID Hostname Address Tags Zone Capacity DataAvail Version
 fca7df6b0fe8115c garage 127.0.0.1:3901 [] dc1 1000.0 MB 365.8 GB (36.8%) v2.0.0
@@ -264,12 +264,14 @@ fca7df6b0fe8115c garage 127.0.0.1:3901 [] dc1 1000.0 MB 365.8 GB (36.8%
 Now for the real configuration part.

 We will:
+
 - set up the layout for the storage (garage requires this to know how to allocate the capacity)
 - create the file-items bucket (the h5p bucket can be configured too, but this guide does not currently cover it)
 - create an access key for the bucket
 - make the correct configurations to be able to access the bucket

 Layout setup
+
 ```sh
 # get the node id
 garage status
@@ -282,6 +284,7 @@ garage layout apply --version 1
 ```

 Create a bucket
+
 ```sh
 garage bucket create file-items

@@ -291,16 +294,14 @@ garage bucket info file-items
 ```

 Create an access key. Make note of the secret key as it will not be shown again!
+
 ```sh
 garage key create core-s3-key

 # allow the key to access the bucket
 garage bucket allow --read --write --owner file-items --key core-s3-key
 ```

-
-
-
 ### Umami

 To log into umami in your local instance: [Umami login documentation](https://umami.is/docs/login)
@@ -317,7 +318,7 @@ You can also run `yarn seed` to feed the database with predefined mock data.

 The development [docker-compose.yml](.devcontainer/docker-compose.yml) provides an instance of [mailcatcher](https://mailcatcher.me/), which emulates an SMTP server for sending e-mails. When using the email authentication flow, the mailbox web UI is accessible at [http://localhost:1080](http://localhost:1080).

-The development [docker-compose.yml](.devcontainer/docker-compose.yml) provides an [S3-compatible service](https://garagehq.deuxfleurs.fr/) for serving files. Ensure you have set up your `/etc/hosts` so that it works.
+The development [docker-compose.yml](.devcontainer/docker-compose.yml) provides an [S3-compatible service](https://garagehq.deuxfleurs.fr/) for serving files. Ensure you have set up your `/etc/hosts` so that it works.

 ## Testing

@@ -332,7 +333,9 @@ This will ensure your tests run on the second database container. As they will c

 ## Database and Migrations

-The application will run migrations on start.
+By default, the application will run migrations on start.
+
+For more information about the structure of the database, see [our database structure documentation](./DATABASE.md) as well as [the interactive database schema explorer](https://graasp.github.io/graasp).

 ### Create a migration

@@ -365,13 +368,14 @@ Up tests start from the previous migration state, insert mock data and apply the

 ### Nudenet Container can not be pulled

-It is possible that the nudenet container pull fails with a 403 status code. This is likely because you are authenticated to the public AWS ECR and trying to pull a public image. Log out of the public ECR with `docker logout public.ecr.aws` and try building the devContainer again.
+It is possible that the nudenet container pull fails with a 403 status code. This is likely because you are authenticated to the public AWS ECR and trying to pull a public image. Log out of the public ECR with `docker logout public.ecr.aws` and try building the devContainer again.

 ### Uploading files results in "AuthorizationHeaderMalformed: Authorization header malformed, unexpected scope"

 This upload error occurs when we try to upload a file to s3 (mocked by garage on local dev setup).

 You need to check that you:
+
 - have access and secret keys in your env
 - have set the region to the same value as the ".devcontainer/garage/garage.toml" file (look under the `s3.api` section for the `s3_region` value.) By default it should be `garage` and not `us-east-1`. Update the value in your `.env.development` file.

db-documentation/index.html

Lines changed: 0 additions & 11 deletions
This file was deleted.
