Commit cdc2714

Remove outdated module 1 introduction files and update configuration to exclude private directories from the build.
1 parent c04ecd2 commit cdc2714

122 files changed

Lines changed: 3504 additions & 37 deletions

.vscode/settings.json

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+{
+    "python.analysis.typeCheckingMode": "basic"
+}

_config.yml

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@ exclude:
   - Dockerfile
   # theme test code
   - fixtures/
+  - _private
+  - _private/**
 
 # Set a path/url to a logo that will be displayed instead of the title
 logo: "assets/images/dtc-logo.png"

_private/vocab-ml-module-01

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+
+
+## Glossary lesson 1
+- Features: All the known characteristics or pieces of information about an object that are used as input to a machine learning model to make a prediction. (e.g., car's age, manufacturer, mileage).
+- Target: The specific variable or value that a machine learning model aims to predict. (e.g., the price of a car).
+- Model: The output of the machine learning training process; it's an algorithm or system that has learned patterns from data and can be used to make predictions on new, unseen data.
+- Prediction: The output generated by a trained machine learning model when it is given new input features, representing the model's estimate for the target variable.
+- Machine Learning (ML): A process of extracting patterns and insights from data, allowing a system to learn and make predictions or decisions without being explicitly programmed for every specific outcome.
+- Data: Raw facts, figures, or information collected for analysis. In machine learning, it refers to the collection of features and targets used for training and testing models.
+- Patterns: Regular and discernible relationships or structures within data that machine learning algorithms identify and learn from.
+- Rule-based Systems: Traditional programming approaches where a system makes decisions based on a predefined set of explicit rules, often contrasting with the pattern-learning approach of machine learning.
+- Training: The process of feeding data (features and corresponding targets) to a machine learning algorithm so that it can learn the underlying patterns and create a model.
+
+## Glossary lesson 2
+
+- Classifier: A system or algorithm that categorizes input into one of several classes (e.g., spam/not spam).
+- Features: Measurable characteristics or attributes of the data used by the model (e.g., email length, sender domain).
+- Target Variable: The variable that the model is trying to predict or classify (e.g., whether an email is spam or not).
+- Training/Fitting: The process of feeding data to a machine learning algorithm so it can learn patterns and create a model.
+- Model: The output of the training process; a learned representation that can make predictions on new data.
+- Probability: The likelihood of an event occurring, often output by ML models before a final classification decision.
+- Decision Threshold: A predefined value used to convert a model's probability output into a definitive classification (e.g., >0.5 means spam).
+- Supervised Learning: A type of machine learning where the model learns from labeled data (inputs with known correct outputs). Spam detection and car price prediction are examples.
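
Note on the "Decision Threshold" and "Probability" entries: the conversion they describe can be sketched in a few lines of Python. This is only an illustration and not part of the commit; the function name `classify` and the sample probabilities are made up for the example, and the 0.5 threshold is taken from the glossary entry.

```python
def classify(probability: float, threshold: float = 0.5) -> str:
    """Turn a model's spam probability into a label using a decision threshold."""
    # Strictly greater than the threshold counts as spam, matching ">0.5 means spam".
    return "spam" if probability > threshold else "not spam"


# Hypothetical probabilities, standing in for a trained classifier's output.
probabilities = [0.91, 0.50, 0.12]
labels = [classify(p) for p in probabilities]
print(labels)  # ['spam', 'not spam', 'not spam']
```

Lowering the threshold (for example `classify(0.3, threshold=0.2)`) marks more emails as spam; the model's probabilities stay the same and only the final decision changes.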

create_md_files.py

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+import os
+import csv
+import re
+from urllib.parse import urlparse, parse_qs
+
+# Function to extract YouTube video ID from URL
+def extract_video_id(url):
+    parsed_url = urlparse(url)
+    if parsed_url.netloc == 'www.youtube.com' or parsed_url.netloc == 'youtube.com':
+        if parsed_url.path == '/watch':
+            return parse_qs(parsed_url.query)['v'][0]
+        elif 'embed' in parsed_url.path:
+            return parsed_url.path.split('/')[-1]
+    elif parsed_url.netloc == 'youtu.be':
+        return parsed_url.path[1:]
+    return None
+
+# Function to create folder if it doesn't exist
+def create_folder_if_not_exists(folder_path):
+    if not os.path.exists(folder_path):
+        os.makedirs(folder_path)
+        print(f"Created folder: {folder_path}")
+
+# Function to format the title from filename
+def format_title(filename):
+    # Remove file extension
+    name = os.path.splitext(filename)[0]
+    # Extract the number prefix if it exists
+    num_prefix = re.search(r'^(\d+)-', name)
+    prefix = ""
+    if num_prefix:
+        prefix = f"{num_prefix.group(1)}. "
+    # Remove leading numbers and hyphens (like 01-)
+    name = re.sub(r'^\d+-', '', name)
+    # Replace hyphens with spaces
+    name = name.replace('-', ' ')
+    # Capitalize words
+    return prefix + ' '.join(word.capitalize() for word in name.split())
+
+# Function to format module name
+def format_module_name(folder):
+    # Remove leading numbers (like 01)
+    module_num = re.search(r'^(\d+)', folder)
+    if module_num:
+        module_number = module_num.group(1)
+        # Remove the hyphen if it exists
+        module_name = re.sub(r'^\d+-', '', folder)
+        # Replace hyphens with spaces
+        module_name = module_name.replace('-', ' ')
+        # Capitalize words
+        module_name = ' '.join(word.capitalize() for word in module_name.split())
+        return f"Module {module_number}: {module_name}"
+    else:
+        # If no leading number, just capitalize
+        return ' '.join(word.capitalize() for word in folder.replace('-', ' ').split())
+
+# Function to create index.md file for a module
+def create_index_file(module_folder, module_name):
+    index_path = os.path.join(module_folder, 'index.md')
+    # Extract module number for nav_order
+    module_num = re.search(r'Module (\d+)', module_name)
+    nav_order = int(module_num.group(1)) if module_num else 1
+
+    with open(index_path, 'w') as f:
+        f.write(f"""---
+title: "{module_name}"
+nav_order: {nav_order}
+has_children: true
+---
+
+# {module_name}
+
+This module covers the following topics:
+
+""")
+    print(f"Created index file: {index_path}")
+
+# Function to create markdown file for a video
+def create_md_file(module_folder, filename, video_url, module_name, order):
+    md_path = os.path.join(module_folder, filename)
+    title = format_title(filename)
+    video_id = extract_video_id(video_url)
+
+    # Extract file number for nav_order if it exists
+    file_num = re.search(r'^(\d+)', os.path.splitext(filename)[0])
+    nav_order = int(file_num.group(1)) if file_num else order
+
+    with open(md_path, 'w') as f:
+        f.write(f"""---
+title: "{title}"
+parent: "{module_name}"
+nav_order: {nav_order}
+---
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/{video_id}" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+""")
+    print(f"Created markdown file: {md_path}")
+
+# Main function
+def main():
+    base_dir = 'docs/machine-learning-zoomcamp'
+    csv_file = 'youtube_links.csv'
+
+    # Read CSV file
+    with open(csv_file, 'r') as f:
+        reader = csv.reader(f)
+        next(reader)  # Skip header row
+
+        # Group by folder
+        folder_files = {}
+        for row in reader:
+            folder, filename, video_url = row
+            if folder not in folder_files:
+                folder_files[folder] = []
+            folder_files[folder].append((filename, video_url))
+
+    # Create folders and files
+    for folder, files in folder_files.items():
+        module_name = format_module_name(folder)
+        module_folder = os.path.join(base_dir, folder)
+
+        # Create module folder
+        create_folder_if_not_exists(module_folder)
+
+        # Create index.md for the module
+        create_index_file(module_folder, module_name)
+
+        # Create markdown files for each video
+        for i, (filename, video_url) in enumerate(files):
+            create_md_file(module_folder, filename, video_url, module_name, i + 1)
+
+if __name__ == "__main__":
+    main()
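
Note on `extract_video_id`: as committed, the function indexes `['v']` directly, so a `/watch` URL without a `v` parameter raises `KeyError`, and for any unrecognized host it returns `None`, which `create_md_file` then interpolates into the embed URL. The sketch below is one possible defensive variant, not code from this commit; the name `extract_video_id_safe` and the `YOUTUBE_HOSTS` constant are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hosts the committed script recognizes for /watch and /embed URLs.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com"}


def extract_video_id_safe(url):
    """Return the YouTube video ID, or None when the URL has no usable ID."""
    parsed = urlparse(url)
    if parsed.netloc in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # .get avoids a KeyError when the "v" parameter is missing.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if parsed.path.startswith("/embed/"):
            return parsed.path.split("/")[-1] or None
    elif parsed.netloc == "youtu.be":
        # The query string (e.g. ?si=...) is not part of parsed.path.
        return parsed.path[1:] or None
    return None


print(extract_video_id_safe("https://youtu.be/dCa3JvmJbr0?si=QixEZxWzDeCnSvCq"))
print(extract_video_id_safe("https://www.youtube.com/watch"))
```

With a variant like this, a caller could skip or report rows whose ID is `None` instead of writing an `embed/None` iframe.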

docs/machine-learning-zoomcamp/module-01-introduction/what-is-machine-learning.md renamed to docs/machine-learning-zoomcamp/01-intro/01-what-is-ml.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 ---
-title: What is Machine Learning?
-parent: "Module 1: Introduction to Machine Learning"
+title: "01. What Is Ml"
+parent: "Module 01: Intro"
 nav_order: 1
 ---
 
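
Note on the new titles: "01. What Is Ml" is what `format_title` in `create_md_files.py` produces for `01-what-is-ml.md`, because `str.capitalize()` lowercases everything after the first letter of each word. The sketch below reproduces that logic and adds an optional acronym map to show one way the casing could be preserved. The `acronyms` parameter and the `ACRONYMS` table are hypothetical and not part of the commit.

```python
import os
import re

# Hypothetical lookup of words that should keep a fixed casing.
ACRONYMS = {"ml": "ML", "dm": "DM"}


def format_title(filename, acronyms=None):
    """Build a page title from a filename such as '01-what-is-ml.md'."""
    acronyms = acronyms or {}
    name = os.path.splitext(filename)[0]
    # Keep the numeric prefix as "NN. ", as the committed script does.
    match = re.search(r"^(\d+)-", name)
    prefix = f"{match.group(1)}. " if match else ""
    name = re.sub(r"^\d+-", "", name)
    words = name.replace("-", " ").split()
    # Use the acronym table when a word is listed, else capitalize as before.
    return prefix + " ".join(acronyms.get(w.lower(), w.capitalize()) for w in words)


print(format_title("01-what-is-ml.md"))            # 01. What Is Ml
print(format_title("01-what-is-ml.md", ACRONYMS))  # 01. What Is ML
```

Without the acronym map the function behaves like the committed version for these filenames, which is why the renamed pages below show "Ml" and "Dm".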

docs/machine-learning-zoomcamp/module-01-introduction/machine-learning-vs-rule-based-systems.md renamed to docs/machine-learning-zoomcamp/01-intro/02-ml-vs-rules.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 ---
-title: "Machine Learning vs Rule-Based Systems: What is the Difference?"
-parent: "Module 1: Introduction to Machine Learning"
+title: "02. Ml Vs Rules"
+parent: "Module 01: Intro"
 nav_order: 2
 ---
 

docs/machine-learning-zoomcamp/module-01-introduction/supervised-machine-learning.md renamed to docs/machine-learning-zoomcamp/01-intro/03-supervised-ml.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 ---
-title: "What is Supervised Machine Learning?"
-parent: "Module 1: Introduction to Machine Learning"
+title: "03. Supervised Ml"
+parent: "Module 01: Intro"
 nav_order: 3
 ---
 
@@ -106,4 +106,4 @@ Examples include:
 
 ## Next Steps
 
-In the next lesson, we'll explore the bigger picture of organizing machine learning projects and discuss a methodology called CRISP-DM (Cross-Industry Standard Process for Data Mining), which provides a structured approach to planning and executing machine learning projects.
+In the next lesson, we'll explore the bigger picture of organizing machine learning projects and discuss a methodology called CRISP-DM (Cross-Industry Standard Process for Data Mining), which provides a structured approach to planning and executing machine learning projects.

docs/machine-learning-zoomcamp/module-01-introduction/crisp-dm.md renamed to docs/machine-learning-zoomcamp/01-intro/04-crisp-dm.md

Lines changed: 2 additions & 5 deletions
@@ -1,13 +1,10 @@
 ---
-title: "What is CRISP-DM?"
-parent: "Module 1: Introduction to Machine Learning"
+title: "04. Crisp Dm"
+parent: "Module 01: Intro"
 nav_order: 4
 ---
-
 # What is CRISP-DM?
 
-> These notes are based on the video [ML Zoomcamp 1.4 - CRISP-DM](https://youtu.be/dCa3JvmJbr0?si=QixEZxWzDeCnSvCq)
-
 <iframe width="560" height="315" src="https://www.youtube.com/embed/dCa3JvmJbr0?si=QixEZxWzDeCnSvCq" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
 
 CRISP-DM stands for **Cross-Industry Standard Process for Data Mining**, a methodology for organizing machine learning projects. Despite being developed in the 1990s by IBM, it has stood the test of time and remains relevant for modern ML projects with minimal modifications.

docs/machine-learning-zoomcamp/module-01-introduction/model-selection-process.md renamed to docs/machine-learning-zoomcamp/01-intro/05-model-selection.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 ---
-title: "Model Selection Process"
-parent: "Module 1: Introduction to Machine Learning"
+title: "05. Model Selection"
+parent: "Module 01: Intro"
 nav_order: 5
 ---
 

docs/machine-learning-zoomcamp/module-01-introduction/github-codespaces.md renamed to docs/machine-learning-zoomcamp/01-intro/06-github-codespaces.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: "GitHub Codespaces"
+title: "1.6. GitHub Codespaces"
 parent: "Module 1: Introduction to Machine Learning"
 nav_order: 6
 ---
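
Note on `parent` values: in this commit, `06-github-codespaces.md` keeps `parent: "Module 1: Introduction to Machine Learning"` while the other pages in `01-intro` move to `parent: "Module 01: Intro"`. The sketch below shows one way to flag such a mismatch. It assumes the site's theme links a child page to its module by matching the child's `parent` string against the index page's `title`, which this diff does not confirm. The helper names and the in-memory `index_md` text (modeled on the template in `create_index_file`) are hypothetical.

```python
import re


def front_matter_value(text, key):
    """Return the value of `key` from a simple front matter block, or None."""
    match = re.search(r"\A---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        # Split on the first colon only, so titles containing ':' stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip().strip('"')
    return None


def mismatched_parents(index_text, pages):
    """List page names whose `parent` differs from the index page's `title`."""
    expected = front_matter_value(index_text, "title")
    return [name for name, text in pages.items()
            if front_matter_value(text, "parent") != expected]


# Hypothetical index page, following the create_index_file template.
index_md = '---\ntitle: "Module 01: Intro"\nnav_order: 1\nhas_children: true\n---\n'
pages = {
    "05-model-selection.md":
        '---\ntitle: "05. Model Selection"\nparent: "Module 01: Intro"\nnav_order: 5\n---\n',
    "06-github-codespaces.md":
        '---\ntitle: "1.6. GitHub Codespaces"\n'
        'parent: "Module 1: Introduction to Machine Learning"\nnav_order: 6\n---\n',
}
print(mismatched_parents(index_md, pages))  # ['06-github-codespaces.md']
```

The parser handles only flat `key: value` lines; a real check would likely use a YAML library.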
