Commit 9382b0f

chore: update seed data and add instructions (#541)
Signed-off-by: tdruez <tdruez@aboutcode.org>
1 parent c338408 commit 9382b0f

2 files changed

Lines changed: 142 additions & 0 deletions

data/postgresql/README.md

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# Generate `initdb.sql.gz` (shareable initial seed)

This guide produces a compressed SQL dump (`initdb.sql.gz`) that Postgres loads
automatically on first startup (via `/docker-entrypoint-initdb.d/`). It uses plain
`docker` commands only, with no extra compose file, and never touches your local stack.

## Prerequisite: dump the reference data from the server

Run against the reference instance:

```sh
docker compose -f /opt/dejacode/docker-compose.yml exec web ./manage.py dumpinitdata nexB > initdb_dataset.json
```

## Steps

Run the following on your local DejaCode checkout.

### 1. Fetch the `initdb_dataset.json` file from the server

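How the file is copied depends on how you reach the server. A minimal sketch, assuming SSH access and that the dump was written to the remote user's home directory; the helper name `fetch_dataset` and the host passed to it are placeholders, not part of DejaCode:

```sh
# Hypothetical helper: copy the dump from the reference server over SSH.
# $1 is the SSH host (placeholder). The remote path assumes the dump was
# written to the home directory of the remote user.
fetch_dataset() {
    host=$1
    scp "${host}:initdb_dataset.json" ./initdb_dataset.json || return 1
    # An empty file usually means the dump command failed on the server.
    [ -s ./initdb_dataset.json ]
}
```

Called as `fetch_dataset reference.example.org`, it returns non-zero when the copy fails or the resulting file is empty, which is cheaper to find out now than ten minutes into the load step.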
### 2. Build the web image from the project root

```sh
cd dejacode
docker build -t dejacode-web .
```

### 3. Create a network and start the empty database

The container uses no named volume, so its data is ephemeral, and it is reachable as
`db` on the network.

```sh
docker network create dejacode-seed-net

docker run -d \
  --name dejacode-seed-db \
  --network dejacode-seed-net \
  --network-alias db \
  --env-file docker.env \
  --shm-size=1g \
  docker.io/library/postgres:16.13
```

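`docker run -d` returns before Postgres accepts connections, so a scripted run may reach the migration step too early. A minimal sketch of a retry loop, assuming `pg_isready` is available in the container (it ships with the official image); the helper name `wait_for` is hypothetical:

```sh
# Retry a command once per second until it succeeds or attempts run out.
# $1 is the maximum number of attempts; the rest is the command to run.
wait_for() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for 30 docker exec dejacode-seed-db pg_isready -h 127.0.0.1` waits up to roughly 30 seconds. Probing over TCP rather than the Unix socket is a cautious choice: as far as I know, the image's temporary initialization server listens on the socket only, so a socket probe could succeed before the final server is up.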
### 4. Apply migrations on the fresh database

```sh
docker run --rm \
  --network dejacode-seed-net \
  --env-file docker.env \
  -v "$(pwd)/.env:/opt/dejacode/.env" \
  -v /etc/dejacode/:/etc/dejacode/ \
  dejacode-web ./manage.py migrate
```

### 5. Load the data from stdin

`loaddata` reads the JSON from standard input, so there is no file to mount.
The `-i` flag keeps stdin open for the redirection.

```sh
docker run --rm -i \
  --network dejacode-seed-net \
  --env-file docker.env \
  -v "$(pwd)/.env:/opt/dejacode/.env" \
  -v /etc/dejacode/:/etc/dejacode/ \
  dejacode-web ./manage.py loaddata --format=json - < initdb_dataset.json
```

This will take over 10 minutes to run.

### 6. Inspect and tweak

Open a Django shell, then adjust the reference Dataspace as needed, for example:

```sh
docker run --rm -it \
  --network dejacode-seed-net \
  --env-file docker.env \
  -v "$(pwd)/.env:/opt/dejacode/.env" \
  -v /etc/dejacode/:/etc/dejacode/ \
  dejacode-web ./manage.py shell
```

```python
# Assumes Dataspace lives in dje.models; skip this import if the shell
# already auto-imports models.
from dje.models import Dataspace

dataspace = Dataspace.objects.get_reference()
values = {
    # Blank out instance-specific contact and branding details.
    "homepage_url": "",
    "contact_info": "",
    "notes": "",
    "logo_url": "",
    "address": "",
    "open_source_information_url": "",
    "open_source_download_url": "",
    # "home_page_announcements": "",
    # Display and feature flags for the shared seed.
    "show_license_profile_in_license_list_view": True,
    "show_spdx_short_identifier_in_license_list_view": True,
    "show_usage_policy_in_user_views": True,
    "show_type_in_component_list_view": False,
    "hide_empty_fields_in_component_details_view": True,
    "set_usage_policy_on_new_component_from_licenses": True,
    "enable_package_scanning": True,
    "update_packages_from_scan": True,
    "enable_purldb_access": True,
    "enable_vulnerablecodedb_access": True,
    "vulnerabilities_updated_at": None,
}
# Write the values straight to the database row.
Dataspace.objects.filter(id=dataspace.id).update(**values)
# Clear the homepage layout and point integrations at the public services.
dataspace.set_configuration("homepage_layout", None)
dataspace.set_configuration("vulnerablecode_url", "https://public.vulnerablecode.io/")
dataspace.set_configuration("purldb_url", "https://public.purldb.io/")
```

### 7. Extract the compressed SQL dump

`pg_dump` runs inside the container (Postgres 16.13). The `--no-owner --no-privileges`
options make the dump replayable even if another deployment uses a different
`POSTGRES_USER`.

```sh
docker exec dejacode-seed-db \
  sh -c 'pg_dump -U "$POSTGRES_USER" -d "$POSTGRES_DB" --no-owner --no-privileges' \
  | gzip > data/postgresql/initdb.sql.gz
```

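Because the dump is piped into `gzip`, a plain `sh` pipeline reports only `gzip`'s exit status, so a failed `pg_dump` can still leave a small but valid `.gz` file behind. A minimal sanity check, assuming the plain-text format used above (which starts with a `PostgreSQL database dump` header comment); the helper name `seed_looks_valid` is hypothetical:

```sh
# Hypothetical helper: check that a seed file is intact gzip data and
# starts with the header comment that pg_dump writes to plain SQL dumps.
seed_looks_valid() {
    seed=$1
    # Reject truncated or non-gzip files.
    gzip -t "$seed" 2>/dev/null || return 1
    # Look for the pg_dump header within the first few lines.
    gzip -dc "$seed" | head -n 5 | grep -q 'PostgreSQL database dump'
}
```

For example: `seed_looks_valid data/postgresql/initdb.sql.gz && echo "seed looks fine"`. This only confirms that the file is a readable dump, not that its content is complete.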
### 8. Verify the seed

Scripts in `/docker-entrypoint-initdb.d/` run **only on a fresh database volume**. To
test the artifact without breaking your local setup, start the main stack with a
throwaway project name, which gives it a new volume:

```sh
docker compose -p dejacode-check up -d db
# check the data is present, then:
docker compose -p dejacode-check down -v
```

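One way to check that the data is present, before running `down -v`, is to list the tables the seed created. A minimal sketch, assuming `POSTGRES_USER` and `POSTGRES_DB` are set inside the `db` service container as they are in the seed container; the helper name `list_seed_tables` is hypothetical:

```sh
# Hypothetical helper: list the tables in the throwaway stack's database.
# $1 is the compose project name (defaults to the one used above).
# Assumes POSTGRES_USER and POSTGRES_DB are set inside the db container.
list_seed_tables() {
    project=${1:-dejacode-check}
    # -T disables pseudo-TTY allocation so the output can be piped.
    docker compose -p "$project" exec -T db \
        sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "\dt"'
}
```

An empty table list right after startup may simply mean the seed is still loading, so retry after a short wait before concluding that it failed.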
### 9. Tear everything down

```sh
docker rm -fv dejacode-seed-db
docker network rm dejacode-seed-net
```

`-v` removes the anonymous volume created by the Postgres image. Your local `dejacode`
stack was never touched.

data/postgresql/initdb.sql.gz

1.25 MB
Binary file not shown.
