Commit 89c39ab

Merge pull request #4 from DataTalksClub/first-steps
First steps
2 parents f7e5f4b + 3cd4d98 commit 89c39ab

43 files changed

Lines changed: 4883 additions & 869 deletions

_config.yml

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -146,7 +146,7 @@ liquid:
 back_to_top: true
 back_to_top_text: "Back to top"
 
-footer_content: 'Copyright &copy; 2017-2020 Patrick Marsceill. Distributed by an <a href="https://github.com/just-the-docs/just-the-docs/tree/main/LICENSE.txt">MIT license.</a> <a href="https://www.netlify.com/">This site is powered by Netlify.</a>'
+footer_content: ''
 
 # Footer last edited timestamp
 last_edit_timestamp: true # show or hide edit time - page must have `last_modified_date` defined in the frontmatter
@@ -155,7 +155,7 @@ last_edit_time_format: "%b %e %Y at %I:%M %p" # uses ruby's time format: https:/
 # Footer "Edit this page on GitHub" link text
 gh_edit_link: true # show or hide edit this page link
 gh_edit_link_text: "Edit this page on GitHub"
-gh_edit_repository: "https://github.com/just-the-docs/just-the-docs" # the github URL for your repo
+gh_edit_repository: "https://github.com/DataTalksClub/zoomcamps-notes-faq" # the github URL for your repo
 gh_edit_branch: "main" # the branch that your docs is served from
 # gh_edit_source: docs # the source that your files originate from
 gh_edit_view_mode: "tree" # "tree" or "edit" if you want the user to jump into the editor immediately
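
Note on the `gh_edit_repository` change: the theme appears to assemble each page's "Edit this page on GitHub" link from `gh_edit_repository`, `gh_edit_view_mode`, `gh_edit_branch`, the optional `gh_edit_source`, and the page path, so pointing the repository at `DataTalksClub/zoomcamps-notes-faq` should redirect every edit link. A minimal Python sketch of that composition follows. The function name `edit_link`, the segment order, and the page path `docs/faq.md` are assumptions for illustration and are not taken from this commit.

```python
from typing import Optional


def edit_link(repository: str, view_mode: str, branch: str,
              page_path: str, source: Optional[str] = None) -> str:
    """Join the gh_edit_* settings into an edit URL (assumed segment order)."""
    parts = [repository.rstrip("/"), view_mode, branch]
    if source:
        # gh_edit_source is commented out in this config, so it is optional here.
        parts.append(source.strip("/"))
    parts.append(page_path.lstrip("/"))
    return "/".join(parts)


if __name__ == "__main__":
    # Values from the updated _config.yml; the page path is hypothetical.
    print(edit_link(
        "https://github.com/DataTalksClub/zoomcamps-notes-faq",
        "tree",
        "main",
        "docs/faq.md",
    ))
```

With `gh_edit_view_mode` left at `"tree"`, the link would open the file view instead of the editor.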

_includes/components/sidebar.html

Lines changed: 0 additions & 3 deletions
@@ -25,8 +25,5 @@
 {% if nav_footer_custom != "" %}
 {{ nav_footer_custom }}
 {% else %}
-<footer class="site-footer">
-This site uses <a href="https://github.com/just-the-docs/just-the-docs">Just the Docs</a>, a documentation theme for Jekyll.
-</footer>
 {% endif %}
 </div>
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
1+
---
2+
title: "General FAQ"
3+
layout: default
4+
nav_order: 100
5+
parent: Data Engineering Zoomcamp
6+
has_children: false
7+
---
8+
9+
# Frequently Asked Questions
10+
11+
## Table of contents
12+
{: .no_toc .text-delta }
13+
14+
1. TOC
15+
{:toc}
16+
17+
## Course Basics
18+
19+
**Q: When does the course start?**
20+
A: The next cohort starts on January 13th, 2025. Register using [this link](https://airtable.com/shr6oVXeQvSI5HuWD). Join the [Telegram channel](https://t.me/dezoomcamp) and DataTalks.Club's Slack for announcements.
21+
22+
**Q: What are the prerequisites?**
23+
A: You should have:
24+
- Basic coding experience
25+
- Familiarity with SQL
26+
- Experience with Python (helpful but not required)
27+
No prior data engineering experience needed. See [prerequisites](https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/README.md#prerequisites).
28+
29+
**Q: How many hours per week should I expect to spend?**
30+
A: Typically 5-15 hours per week, depending on your background and experience.
31+
32+
**Q: Can I join after the start date?**
33+
A: Yes, you can still join and submit homework, but be mindful of assignment deadlines.
34+
35+
**Q: Do I need a confirmation email after registering?**
36+
A: No, you're automatically accepted. Registration is just to gauge interest.
37+
38+
## Technical Setup
39+
40+
**Q: Which Python version should I use?**
41+
A: Python 3.9 is recommended for compatibility with course materials. Python 3.10 and 3.11 should also work.
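
A quick way to see where a local interpreter falls within this guidance is a small check like the one below. It is a minimal sketch: the function name `classify_python` and the three labels are made up for illustration, and only the version numbers come from the answer above.

```python
import sys

RECOMMENDED = (3, 9)              # version the FAQ recommends
SHOULD_WORK = {(3, 10), (3, 11)}  # versions the FAQ says should also work


def classify_python(version=None):
    """Classify a (major, minor, ...) version tuple against the FAQ's guidance."""
    info = version if version is not None else sys.version_info
    major_minor = tuple(info[:2])
    if major_minor == RECOMMENDED:
        return "recommended"
    if major_minor in SHOULD_WORK:
        return "should work"
    return "not covered by the FAQ"


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current}: {classify_python()}")
```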
42+
43+
**Q: Should I use a local machine, GCP, or GitHub Codespaces?**
44+
A: You have three options:
45+
1. Local machine (may have challenges on Windows)
46+
2. GitHub Codespaces (pre-installed tools)
47+
3. Cloud VM (GCP recommended)
48+
49+
**Q: Why use GCP instead of other cloud providers?**
50+
A: GCP is recommended because:
51+
- Everyone has a Google account
52+
- $300 free credits for new users
53+
- BigQuery integration
54+
- Consistent with course materials
55+
56+
**Q: What should I set up before starting?**
57+
A: Install and configure:
58+
- Google Cloud account and SDK
59+
- Python 3 (with Anaconda)
60+
- Terraform
61+
- Git
62+
63+
## Homework and Projects
64+
65+
**Q: What are the homework deadlines?**
66+
A: Check [courses.datatalks.club/de-zoomcamp-2025](https://courses.datatalks.club/de-zoomcamp-2025/) and the [Google Calendar](https://calendar.google.com/calendar/?cid=ZXIxcjA1M3ZlYjJpcXU0dTFmaG02MzVxMG9AZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ).
67+
68+
**Q: Are late submissions allowed?**
69+
A: No, but you can submit while the form remains open. Check submission timestamp on the Course page.
70+
71+
**Q: What should I submit as the homework URL?**
72+
A: Your GitHub/GitLab/Bitbucket repository containing your work for that week.
73+
74+
**Q: How does the points system work?**
75+
A: Points are awarded for:
76+
- Homework completion
77+
- FAQ contributions (1 point/week max)
78+
- Learning in public (1 point/link, 7 points max)
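
The two caps in this list can be sketched as a small calculation. This is a rough illustration only: the function name `bonus_points` is hypothetical, homework points are left out because the FAQ does not say how they are scored, and treating the 7-point cap as applying per submission is an assumption.

```python
MAX_FAQ_POINTS = 1       # "1 point/week max" for FAQ contributions
MAX_LEARNING_POINTS = 7  # "7 points max" for learning-in-public links


def bonus_points(faq_contributions: int, learning_links: int) -> int:
    """Return capped bonus points for one submission (assumed scope)."""
    if faq_contributions < 0 or learning_links < 0:
        raise ValueError("counts must be non-negative")
    faq = min(faq_contributions, MAX_FAQ_POINTS)
    learning = min(learning_links, MAX_LEARNING_POINTS)  # 1 point per link
    return faq + learning


if __name__ == "__main__":
    # Ten links and three FAQ entries still earn only 7 + 1 points.
    print(bonus_points(faq_contributions=3, learning_links=10))
```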
79+
80+
## Certificates
81+
82+
**Q: Do I need to complete all homework for the certificate?**
83+
A: No, only the peer-reviewed capstone projects are required.
84+
85+
**Q: Can I get a certificate in self-paced mode?**
86+
A: No, certificates require completing a "live" cohort due to peer review requirements.
87+
88+
**Q: How do I get my certificate?**
89+
A: After project grading:
90+
1. Verify your name in your profile
91+
2. Wait for grading completion announcement
92+
3. Follow [certificate generation instructions](https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/certificates.md)
93+
94+
## Course Materials
95+
96+
**Q: Which YouTube playlist should I follow?**
97+
A: The main playlist is [Data Engineering Zoomcamp](https://www.youtube.com/playlist?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb). Additional playlists for specific years are available.
98+
99+
**Q: Can I follow the course after it finishes?**
100+
A: Yes, all materials remain available. You can:
101+
- Study at your own pace
102+
- Review past homework
103+
- Prepare for next cohort
104+
- Work on projects
105+
106+
**Q: What's different in the current cohort?**
107+
A: The 2025 edition uses Kestra instead of MageAI. See the [demo](https://www.youtube.com/watch?v=R0JAFvDCmSY) and [updated materials](https://www.youtube.com/playlist?list=PLEK3H8YwZn1oPPShk2p5k3E9vO-gPnUCf).
108+
109+
## Support and Help
110+
111+
**Q: How do I get help during the course?**
112+
A: Multiple support channels:
113+
- Slack channel (search before asking)
114+
- FAQ documentation
115+
- @ZoomcampQABot for searches
116+
- Office hours via YouTube Live
117+
118+
**Q: How should I ask questions in Slack?**
119+
A: Include:
120+
- Your OS/environment
121+
- Commands you ran
122+
- Error messages (no screenshots)
123+
- What you've tried
124+
- Use code formatting (```)
125+
- Keep discussion in threads
126+
127+
**Q: Will office hours be recorded?**
128+
A: Yes, all sessions are recorded and available shortly after.
129+
130+
## Common Issues
131+
132+
**Q: What if I have trouble with Windows?**
133+
A: Consider using WSL for better compatibility, especially for shell scripts.
134+
135+
**Q: How do I fix VSCode connection issues to GCP VM?**
136+
A: Try managing SSH fingerprints or deleting the known_hosts file.
137+
138+
**Q: How do I open HTML files from WSL?**
139+
A: Install wslu and use `wslview filename.html`
140+
141+
## Additional Resources
142+
143+
**Q: Where can I find more learning materials?**
144+
A: Check [Awesome Data Engineering Resources](https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/awesome-data-engineering.md)
145+
146+
**Q: How can I contribute to the course?**
147+
A: You can:
148+
- Star the repository
149+
- Share with others
150+
- Create PRs for improvements
151+
- Update this FAQ
152+
- Help fellow students
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
---
2+
title: FAQ
3+
parent: Final Project
4+
nav_order: 2
5+
---
6+
7+
# Module 6 FAQ
8+
9+
Coming soon
10+
{: .label .label-yellow }
11+
12+
## Contribute
13+
14+
Contribute to this FAQ by converting the [existing FAQ from Google Docs](https://docs.google.com/document/d/19bnYs80DwuUimHM65UV3sylsCn2j1vziPOwzBwQrebw/edit?tab=t.0) to the required format.
15+
16+
### Contributing FAQ Best Practices
17+
18+
Please include your FAQ in the following format:
19+
20+
```
21+
**Q: Your Question**
22+
23+
`Exact terminal error` if it exists
24+
25+
A: answer
26+
```
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
---
2+
title: Final Project
3+
parent: Data Engineering Zoomcamp
4+
nav_order: 7
5+
---
6+
7+
# Final Project
8+
{: .fs-9 }
9+
10+
The goal of this project is to apply everything we have learned in this course to build an end-to-end data pipeline.
11+
{: .fs-6 .fw-300 }
12+
13+
## Module Materials
14+
15+
[Read more](https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/projects){: .btn .btn-purple }
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
---
2+
title: Resources
3+
parent: Final Project
4+
nav_order: 1
5+
---
6+
7+
# Resources
8+
9+
## Datasets
10+
11+
See the provided [datasets](https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/dataset.md) for options you can choose from.
12+
13+
## Helpful Links
14+
15+
* [Unit Tests + CI for Airflow](https://www.astronomer.io/events/recaps/testing-airflow-to-bulletproof-your-code/)
16+
* [CI/CD for Airflow (with Gitlab & GCP state file)](https://engineering.ripple.com/building-ci-cd-with-airflow-gitlab-and-terraform-in-gcp)
17+
* [CI/CD for Airflow (with GitHub and S3 state file)](https://programmaticponderings.com/2021/12/14/devops-for-dataops-building-a-ci-cd-pipeline-for-apache-airflow-dags/)
18+
* [CD for Terraform](https://medium.com/towards-data-science/git-actions-terraform-for-data-engineers-scientists-gcp-aws-azure-448dc7c60fcc)
19+
* [Spark + Airflow](https://medium.com/doubtnut/github-actions-airflow-for-automating-your-spark-pipeline-c9dff32686b)
20+
21+
22+
## Projects Gallery
23+
24+
Explore a collection of projects completed by members of our community. The projects cover a wide range of topics and utilize different tools and techniques. Feel free to delve into any project and see how others have tackled real-world problems with data, structured their code, and presented their findings. It's a great resource to learn and get ideas for your own projects.
25+
26+
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://datatalksclub-projects.streamlit.app/)
27+
28+
## DE Zoomcamp 2023
29+
30+
* [2023 Projects](https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2023/project.md)
31+
32+
## DE Zoomcamp 2022
33+
34+
* [2022 Projects](https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2022/project.md)
