Commit 0223c60

Fix FAQ formatting: remove extra blank lines, replace images with text, split entries
- Remove extra blank lines in code blocks (8c9b0fdaa5, 05aad03ef3, b459de9135, 89733da275, a32ed35da6, 7c97a61529)
- Add proper indentation in SQL/YAML/bash code blocks (7df3102580, 30622af850, 30dcc71db8, 7f7aa5f5e6)
- Replace images with text descriptions (af2b85f346, 5d66421473, 4ae927f8a0)
- Remove missing image references (52858dfd98)
- Split inline Q&As from aeaede4fd1 into separate FAQ files
1 parent 2256a09 commit 0223c60

18 files changed

Lines changed: 279 additions & 362 deletions

_questions/data-engineering-zoomcamp/module-1/025_30dcc71db8_postgres-docker-volume-backup-restore.md

Lines changed: 47 additions & 27 deletions
@@ -4,30 +4,50 @@ question: How can I back up and restore PostgreSQL data stored in a Docker volum
 sort_order: 25
 ---

-- Method 1: Docker volume backup
-```bash
-# List Docker volumes
-docker volume ls
-# Backup while the container is running
-docker run --rm \
--v ny_taxi_postgres_data:/data \
--v $(pwd):/backup \
-ubuntu tar czf /backup/postgres_backup.tar.gz /data
-# Restore
-docker run --rm \
--v ny_taxi_postgres_data:/data \
--v $(pwd):/backup \
-ubuntu tar xzf /backup/postgres_backup.tar.gz -C /
-```
-- Method 2: Using pg_dump
-```bash
-# Backup
-docker exec -t postgres_container pg_dump -U root -d ny_taxi > ny_taxi_backup.sql
-# Restore
-docker exec -i postgres_container psql -U root -d ny_taxi < ny_taxi_backup.sql
-```
-- Method 3: Copying the host directory
-```bash
-# When using a host-mounted directory in docker-compose.yaml
-cp -r ./ny_taxi_postgres_data ./ny_taxi_postgres_data_backup
-```
+### Method 1: Docker volume backup
+
+List Docker volumes:
+
+```bash
+docker volume ls
+```
+
+Backup while the container is running:
+
+```bash
+docker run --rm \
+-v ny_taxi_postgres_data:/data \
+-v $(pwd):/backup \
+ubuntu tar czf /backup/postgres_backup.tar.gz /data
+```
+
+Restore:
+
+```bash
+docker run --rm \
+-v ny_taxi_postgres_data:/data \
+-v $(pwd):/backup \
+ubuntu tar xzf /backup/postgres_backup.tar.gz -C /
+```
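
The restore command above extracts with `-C /` because tar drops the leading `/` from member names when it archives an absolute path: `/data` is stored as `data/...` and only lands back in `/data` when extraction starts at the root. The following is a minimal sketch that reproduces this behaviour without Docker. The temporary directory and the `PG_VERSION` stand-in file are assumptions made for illustration, not part of the FAQ entry.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory standing in for the /data mount (hypothetical contents).
workdir="$(mktemp -d)"
mkdir -p "$workdir/data"
echo "16" > "$workdir/data/PG_VERSION"

# Archive an absolute path, as Method 1 does with /data.
# tar may print a notice about removing the leading slash; silence it here.
archive="$workdir/postgres_backup.tar.gz"
tar czf "$archive" "$workdir/data" 2>/dev/null

# Member names are stored without the leading slash.
listing="$(tar tzf "$archive")"
first_member="${listing%%$'\n'*}"
echo "first member: $first_member"

# Extracting with -C <root> recreates the original tree under that root.
restore_root="$workdir/restore"
mkdir -p "$restore_root"
tar xzf "$archive" -C "$restore_root"
restored_version="$(cat "$restore_root$workdir/data/PG_VERSION")"
echo "restored PG_VERSION: $restored_version"

rm -rf "$workdir"
```

Listing the archive with `tar tzf` before restoring is also a cheap way to confirm that the backup contains the expected files.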
+
+### Method 2: Using `pg_dump`
+
+Backup:
+
+```bash
+docker exec -t postgres_container pg_dump -U root -d ny_taxi > ny_taxi_backup.sql
+```
+
+Restore:
+
+```bash
+docker exec -i postgres_container psql -U root -d ny_taxi < ny_taxi_backup.sql
+```
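
For repeated backups, the Method 2 commands could be wrapped so that each dump gets its own timestamped file. This is only a sketch: the helper names are hypothetical, and the container, user, and database names are the ones used in the commands above. It leaves out `-t` on the assumption that a pseudo-TTY is not needed when output is redirected to a file and may alter line endings in the dump.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Build a file name such as ny_taxi_backup_20240131_235959.sql.
backup_file_name() {
  local db="$1" stamp="$2"
  printf '%s_backup_%s.sql' "$db" "$stamp"
}

# Dump one database from a running container into a timestamped file
# in the current directory, then print the file name.
dump_database() {
  local container="$1" user="$2" db="$3"
  local out
  out="$(backup_file_name "$db" "$(date +%Y%m%d_%H%M%S)")"
  # No -t here (assumption: a TTY is unnecessary for redirected output).
  docker exec "$container" pg_dump -U "$user" -d "$db" > "$out"
  echo "$out"
}

# Example, using the names from the commands above:
# dump_database postgres_container root ny_taxi
```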
+
+### Method 3: Copying the host directory
+
+When using a host-mounted directory in `docker-compose.yaml`:
+
+```bash
+cp -r ./ny_taxi_postgres_data ./ny_taxi_postgres_data_backup
+```

_questions/data-engineering-zoomcamp/module-1/060_30622af850_ingest-pipeline-setup.md

Lines changed: 53 additions & 40 deletions
@@ -5,65 +5,78 @@ question: How do I ensure that the ingestion pipeline runs successfully and in w
 sort_order: 60
 ---

-Step 1: Create a common network
-Ensure that you have created a common network (`pg-network`). This is to ensure that you run several containers in the same network so that they can communicate with each other.
-Pg-network is the broader network layer on top of which you will run -
-1. Postgres container,
-2. the Dockerized script container (the container which you will have your ingestion script)
-3. pgadmin container
-Command:
+### Step 1: Create a common network
+
+Ensure that you have created a common network (`pg-network`). This allows several containers to communicate with each other. On top of this network you will run:
+
+1. Postgres container
+2. The Dockerized ingestion script container
+3. pgAdmin container
+
 ```bash
 docker network create pg-network
 ```

-Step 2: Run the Postgres container
-Once you’ve created the network, start running each container one-by-one. First, run the Postgres container
+### Step 2: Run the Postgres container
+
+Once you’ve created the network, start running each container one by one. First, run the Postgres container:
+
 ```bash
 docker run -it \
--e POSTGRES_USER="root" \
--e POSTGRES_PASSWORD="root" \
--e POSTGRES_DB="ny_taxi" \
--v ny_taxi_postgres_data:/var/lib/postgresql \
--p 5432:5432 \
---network=pg-network \
---name pgdatabase \
-postgres:16
+-e POSTGRES_USER="root" \
+-e POSTGRES_PASSWORD="root" \
+-e POSTGRES_DB="ny_taxi" \
+-v ny_taxi_postgres_data:/var/lib/postgresql \
+-p 5432:5432 \
+--network=pg-network \
+--name pgdatabase \
+postgres:16
 ```
-Remember, if `postgres:18` causes issues, use `postgres:16` as mentioned above

-Step 3: Build the docker container for the pipeline
-Ensure your current working directory is /pipeline. Then, run this command to build your container -
+If `postgres:18` causes issues, use `postgres:16` as shown above.
+
+### Step 3: Build the Docker container for the pipeline
+
+Ensure your current working directory is `/pipeline`, then build:
+
 ```bash
 docker build -t taxi_ingest:v001 .
 ```

-Step 4: Run the ingestion container
-Ensure your current working directory is /pipeline. Then, run this command to build your container -
+### Step 4: Run the ingestion container
+
 ```bash
 docker run -it \
---network=pg-network \
-taxi_ingest:v001 \
---pg_user=root \
---pg_pass=root \
---pg_host=pgdatabase \
---pg_port=5432 \
---pg_db=ny_taxi \
---year=2021 \
---month=1 \
---target_table=yellow_taxi_trips
+--network=pg-network \
+taxi_ingest:v001 \
+--pg_user=root \
+--pg_pass=root \
+--pg_host=pgdatabase \
+--pg_port=5432 \
+--pg_db=ny_taxi \
+--year=2021 \
+--month=1 \
+--target_table=yellow_taxi_trips
 ```
-Make sure that you use the parameters in the command exactly same as what you have in your script. For example, if your script as the click parameter `–pg_user` then use `pg_user`, else if it is something like `--user` then change the above command to include `--user` instead of `--pg_user`

-Step 5 (Optional): To validate if your records reached the table
-To validate if your records really reached the Postgres table, run PGCLI using the following command -
+Make sure that you use the parameters in the command exactly as defined in your script. For example, if your script uses `--pg_user` then use `--pg_user`; if it uses `--user` then change the command accordingly.
+
+### Step 5 (Optional): Validate the ingested records
+
+To check if your records reached the Postgres table, run pgcli:
+
 ```bash
 uv run pgcli -h localhost -p 5432 -u root -d ny_taxi
 ```
-Once inside pgcli, run the following to get table names -
-```postgres
+
+List tables:
+
+```sql
 \dt
 ```
-Then, to validate how many rows have been ingested, run the following -
-```postgres
-SELECT COUNT(*) FROM yellow_taxi_trips
+
+Check row count:
+
+```sql
+SELECT COUNT(*) FROM yellow_taxi_trips;
 ```
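
The steps above can be sketched as a few small shell helpers, which may be useful when the pipeline is run repeatedly. The helper names are hypothetical. The container, user, database, and table names are the ones from the commands above. The sketch assumes `pg_isready` and `psql` are available inside the Postgres container, as they are in the official image.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Step 1: create the network only if it does not exist yet,
# so re-running the setup does not fail on a duplicate name.
ensure_network() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name"
  fi
}

# Between steps 2 and 4: poll pg_isready inside the Postgres container
# until it accepts connections, so ingestion is not started too early.
wait_for_postgres() {
  local container="$1" user="$2" db="$3" attempts="${4:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    if docker exec "$container" pg_isready -U "$user" -d "$db" >/dev/null 2>&1; then
      echo "postgres ready after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "postgres not ready after $attempts attempt(s)" >&2
  return 1
}

# Step 5: print the row count without opening an interactive pgcli session.
# -t prints tuples only and -A disables alignment, leaving a bare number.
row_count() {
  local container="$1" user="$2" db="$3" table="$4"
  docker exec "$container" psql -U "$user" -d "$db" -t -A \
    -c "SELECT COUNT(*) FROM ${table};"
}

# Example, using the names from the steps above:
# ensure_network pg-network
# wait_for_postgres pgdatabase root ny_taxi
# row_count pgdatabase root ny_taxi yellow_taxi_trips
```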

_questions/data-engineering-zoomcamp/module-1/071_7f7aa5f5e6_pgcli-after-installing-pgcli-and-checking-with-pgc.md

Lines changed: 35 additions & 37 deletions
@@ -7,67 +7,65 @@ sort_order: 71

 This error occurs because psycopg cannot find the PostgreSQL client library (libpq). The simplest solution with uv is to install the binary version of psycopg, which bundles the required library.

-Solution 1: Add psycopg-binary (Recommended)
-```
-uv add psycopg-binary
+**Solution 1:** Add psycopg-binary (Recommended)

+```bash
+uv add psycopg-binary
 uv run pgcli -h localhost -p 5432 -u root -d ny_taxi
 ```

-Solution 2: Manually edit pyproject.toml
-```
+**Solution 2:** Manually edit `pyproject.toml`
+
+```toml
 [project]
 dependencies = [
-"pgcli>=4.2.0",
-"psycopg-binary>=3.0.0",
+"pgcli>=4.2.0",
+"psycopg-binary>=3.0.0",
 ]
 ```
+
 Then sync your environment:
-```
+
+```bash
 uv sync
 ```

-Additional troubleshooting steps (not uv-specific) if the issue persists:
-
-1. Check Python Version:
+**Additional troubleshooting steps** (not uv-specific) if the issue persists:

-```
-$ python -V
-```
+1. Check Python version (ensure Python is at least 3.9):

-Ensure Python is at least 3.9. The 'psycopg2-binary' installation may fail on older versions.
+```bash
+python -V
+```

-2. Environment Setup (if using a non-uv workflow):
+2. Environment setup (if using a non-uv workflow):

-```
-$ conda create --name de-zoomcamp python=3.9
-$ conda activate de-zoomcamp
-```
+```bash
+conda create --name de-zoomcamp python=3.9
+conda activate de-zoomcamp
+```

-3. Install Required Libraries:
+3. Install required libraries:

-```
-pip install psycopg2-binary
-pip install psycopg_binary
-```
+```bash
+pip install psycopg2-binary
+pip install psycopg_binary
+```

 4. Upgrade pgcli:

-```
-pip install --upgrade pgcli
-```
+```bash
+pip install --upgrade pgcli
+```

 5. Install pgcli via Conda:

-```
-conda install -c conda-forge pgcli
-```
+```bash
+conda install -c conda-forge pgcli
+```

-If you still encounter an error like
-```
-ModuleNotFoundError: No module named 'psycopg2'
-```
-then try:
-```
+If you still encounter `ModuleNotFoundError: No module named 'psycopg2'`, try:
+
+```bash
 pip install psycopg2-binary
 ```
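
To see which of the fixes above actually took effect, it can help to test the imports directly with the interpreter that runs pgcli. This is a minimal sketch: `can_import` is a hypothetical helper, and it assumes `python3` is on the `PATH` (a different interpreter can be passed as the second argument).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the module imports cleanly,
# which also catches a psycopg install that cannot load libpq.
can_import() {
  local module="$1" python="${2:-python3}"
  "$python" -c "import ${module}" >/dev/null 2>&1
}

# Report the state of both driver packages mentioned above.
for module in psycopg psycopg2; do
  if can_import "$module"; then
    echo "$module: import OK"
  else
    echo "$module: import failed"
  fi
done
```

Inside a uv project, the same check can be run against the project environment with `uv run python -c "import psycopg"`.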

_questions/data-engineering-zoomcamp/module-1/076_52858dfd98_postgres-modulenotfounderror-no-module-named-psyco.md

Lines changed: 0 additions & 11 deletions
@@ -1,12 +1,5 @@
 ---
 id: 52858dfd98
-images:
-- description: 'image #1'
-id: image_1
-path: images/data-engineering-zoomcamp/image_b7e005cb.png
-- description: 'image #2'
-id: image_2
-path: images/data-engineering-zoomcamp/image_c56a8539.png
 question: 'Postgres - ModuleNotFoundError: No module named ''psycopg2'''
 sort_order: 76
 ---
@@ -17,10 +10,6 @@ Issue:
 ModuleNotFoundError: No module named 'psycopg2'
 ```

-<IMAGE:image_1>
-
-<IMAGE:image_2>
-
 Solution:

 1. Install psycopg2-binary:
Lines changed: 20 additions & 30 deletions
@@ -1,44 +1,34 @@
 ---
 id: af2b85f346
-images:
-- description: 'image #1'
-id: image_1
-path: images/data-engineering-zoomcamp/image_df9492cb.png
-- description: 'image #2'
-id: image_2
-path: images/data-engineering-zoomcamp/image_6b01ae01.png
-- description: 'image #3'
-id: image_3
-path: images/data-engineering-zoomcamp/image_bc858c4b.png
-- description: 'image #4'
-id: image_4
-path: images/data-engineering-zoomcamp/image_a231d54c.png
-- description: 'image #5'
-id: image_5
-path: images/data-engineering-zoomcamp/image_2f5bf08c.png
 question: GCP gcloud + MS VS Code - gcloud auth hangs
 sort_order: 113
 ---

-If you are using MS VS Code and running `gcloud` in WSL2, when you first try to login to GCP via the `gcloud` CLI with `gcloud auth application-default login`, you may encounter an issue where a message appears and nothing happens:
+If you are using MS VS Code and running `gcloud` in WSL2, when you first try to log in to GCP via the `gcloud` CLI with `gcloud auth application-default login`, you may encounter an issue where the terminal prints a long OAuth URL and then shows a series of "not found" errors for browsers:

-<{IMAGE:image_1}>
+```
+Your browser has been opened to visit:
+https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=...

-There might be a prompt asking if you want to open it via a browser. If you click on it, it will open a page with an error message:
+/usr/bin/xdg-open: 882: x-www-browser: not found
+/usr/bin/xdg-open: 882: firefox: not found
+/usr/bin/xdg-open: 882: chromium: not found
+...
+xdg-open: no method available for opening '...'
+```

-<{IMAGE:image_2}>
+VS Code may show a notification: "Your application running on port 8085 is available" with an "Open in Browser" button. Clicking it may lead to an error page.

 **Solution:**

-- Hover over the long link.
-- `Ctrl + Click` the long link.
-- Click "Configure Trusted Domains here."
-- A popup will appear; pick the first or second entry.
-
-<{IMAGE:image_3}>
-
-<{IMAGE:image_4}>
-
-<{IMAGE:image_5}>
+1. Hover over the long OAuth URL in the terminal output.
+2. `Ctrl + Click` the link. VS Code will show a dialog: "Do you want Code to open the external website?" with buttons **Open**, **Copy**, **Configure Trusted Domains**, and **Cancel**.
+3. Click **Configure Trusted Domains**.
+4. A dropdown will appear with options like:
+- "Trust https://accounts.google.com"
+- "Trust google.com and all its subdomains"
+- "Trust all domains (disables link protection)"
+- "Manage Trusted Domains"
+5. Pick the first or second entry (e.g., "Trust https://accounts.google.com").

 Next time you run `gcloud auth`, the login page should pop up via the default browser without issues.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+---
+id: 0b468d3cc5
+question: Why use secrets instead of embedding the JSON key in the Kestra task?
+sort_order: 11
+---
+
+Secrets prevent credential exposure and make workflows easier to manage.
