DataTalksClub
diff --git a/.vscode/settings.json b/.vscode/settings.json
Lines changed: 3 additions & 0 deletions
diff --git a/_config.yml b/_config.yml
Lines changed: 2 additions & 0 deletions
diff --git a/_private/vocab-ml-module-01 b/_private/vocab-ml-module-01
Lines changed: 23 additions & 0 deletions
diff --git a/create_md_files.py b/create_md_files.py
Lines changed: 133 additions & 0 deletions
diff --git a/docs/machine-learning-zoomcamp/module-01-introduction/what-is-machine-learning.md b/docs/machine-learning-zoomcamp/01-intro/01-what-is-ml.md
Lines changed: 2 additions & 2 deletions (file renamed)
diff --git a/docs/machine-learning-zoomcamp/module-01-introduction/machine-learning-vs-rule-based-systems.md b/docs/machine-learning-zoomcamp/01-intro/02-ml-vs-rules.md
Lines changed: 2 additions & 2 deletions (file renamed)
diff --git a/docs/machine-learning-zoomcamp/module-01-introduction/supervised-machine-learning.md b/docs/machine-learning-zoomcamp/01-intro/03-supervised-ml.md
Lines changed: 3 additions & 3 deletions (file renamed)
diff --git a/docs/machine-learning-zoomcamp/module-01-introduction/crisp-dm.md b/docs/machine-learning-zoomcamp/01-intro/04-crisp-dm.md
Lines changed: 2 additions & 5 deletions (file renamed)
diff --git a/docs/machine-learning-zoomcamp/module-01-introduction/model-selection-process.md b/docs/machine-learning-zoomcamp/01-intro/05-model-selection.md
Lines changed: 2 additions & 2 deletions (file renamed)
diff --git a/docs/machine-learning-zoomcamp/module-01-introduction/github-codespaces.md b/docs/machine-learning-zoomcamp/01-intro/06-github-codespaces.md
Lines changed: 1 addition & 1 deletion (file renamed)
@@ -0,0 +1,3 @@
+{
+    "python.analysis.typeCheckingMode": "basic"
+}
@@ -58,6 +58,8 @@ exclude:
   - Dockerfile
   # theme test code
   - fixtures/
+  - _private
+  - _private/**
 
 # Set a path/url to a logo that will be displayed instead of the title
 logo: "assets/images/dtc-logo.png"
 
@@ -0,0 +1,23 @@
+
+
+## Glossary lesson 1
+- Features: The known characteristics of an object that are used as input to a machine learning model to make a prediction (e.g., a car's age, manufacturer, and mileage).
+- Target: The specific variable or value that a machine learning model aims to predict (e.g., the price of a car).
+- Model: The output of the machine learning training process; it's an algorithm or system that has learned patterns from data and can be used to make predictions on new, unseen data.
+- Prediction: The output generated by a trained machine learning model when it is given new input features, representing the model's estimate for the target variable.
+- Machine Learning (ML): A process of extracting patterns and insights from data, allowing a system to learn and make predictions or decisions without being explicitly programmed for every specific outcome.
+- Data: Raw facts, figures, or information collected for analysis. In machine learning, it refers to the collection of features and targets used for training and testing models.
+- Patterns: Regular and discernible relationships or structures within data that machine learning algorithms identify and learn from.
+- Rule-based Systems: Traditional programming approaches where a system makes decisions based on a predefined set of explicit rules, often contrasting with the pattern-learning approach of machine learning.
+- Training: The process of feeding data (features and corresponding targets) to a machine learning algorithm so that it can learn the underlying patterns and create a model.
+
+## Glossary lesson 2
+
+- Classifier: A system or algorithm that categorizes input into one of several classes (e.g., spam/not spam).
+- Features: Measurable characteristics or attributes of the data used by the model (e.g., email length, sender domain).
+- Target Variable: The variable that the model is trying to predict or classify (e.g., whether an email is spam or not).
+- Training/Fitting: The process of feeding data to a machine learning algorithm so it can learn patterns and create a model.
+- Model: The output of the training process; a learned representation that can make predictions on new data.
+- Probability: The likelihood of an event occurring, often output by ML models before a final classification decision.
+- Decision Threshold: A predefined value used to convert a model's probability output into a definitive classification (e.g., >0.5 means spam).
+- Supervised Learning: A type of machine learning where the model learns from labeled data (inputs with known correct outputs). Spam detection and car price prediction are examples.
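
The "Decision Threshold" entry above can be illustrated with a minimal sketch. The function name `classify` and the sample probabilities are hypothetical and are not part of this PR; the strict `>` comparison follows the glossary's "e.g., >0.5 means spam".

```python
def classify(probability: float, threshold: float = 0.5) -> bool:
    """Return True (e.g. "spam") when the probability is strictly above the threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    return probability > threshold


# Hypothetical model outputs for three emails (illustrative values only).
probabilities = [0.92, 0.50, 0.13]
labels = [classify(p) for p in probabilities]
print(labels)  # [True, False, False]
```

Lowering the threshold flags more emails as spam at the cost of more false positives, so the value is usually chosen per use case rather than fixed at 0.5.
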
@@ -0,0 +1,133 @@
+import os
+import csv
+import re
+from urllib.parse import urlparse, parse_qs
+
+# Function to extract YouTube video ID from URL
+def extract_video_id(url):
+    parsed_url = urlparse(url)
+    if parsed_url.netloc == 'www.youtube.com' or parsed_url.netloc == 'youtube.com':
+        if parsed_url.path == '/watch':
+            return parse_qs(parsed_url.query).get('v', [None])[0]
+        elif 'embed' in parsed_url.path:
+            return parsed_url.path.split('/')[-1]
+    elif parsed_url.netloc == 'youtu.be':
+        return parsed_url.path[1:]
+    return None
+
+# Function to create folder if it doesn't exist
+def create_folder_if_not_exists(folder_path):
+    if not os.path.exists(folder_path):
+        os.makedirs(folder_path)
+        print(f"Created folder: {folder_path}")
+
+# Function to format the title from filename
+def format_title(filename):
+    # Remove file extension
+    name = os.path.splitext(filename)[0]
+    # Extract the number prefix if it exists
+    num_prefix = re.search(r'^(\d+)-', name)
+    prefix = ""
+    if num_prefix:
+        prefix = f"{num_prefix.group(1)}. "
+        # Remove leading numbers and hyphens (like 01-)
+        name = re.sub(r'^\d+-', '', name)
+    # Replace hyphens with spaces
+    name = name.replace('-', ' ')
+    # Capitalize words
+    return prefix + ' '.join(word.capitalize() for word in name.split())
+
+# Function to format module name
+def format_module_name(folder):
+    # Remove leading numbers (like 01)
+    module_num = re.search(r'^(\d+)', folder)
+    if module_num:
+        module_number = module_num.group(1)
+        # Remove the hyphen if it exists
+        module_name = re.sub(r'^\d+-', '', folder)
+        # Replace hyphens with spaces
+        module_name = module_name.replace('-', ' ')
+        # Capitalize words
+        module_name = ' '.join(word.capitalize() for word in module_name.split())
+        return f"Module {module_number}: {module_name}"
+    else:
+        # If no leading number, just capitalize
+        return ' '.join(word.capitalize() for word in folder.replace('-', ' ').split())
+
+# Function to create index.md file for a module
+def create_index_file(module_folder, module_name):
+    index_path = os.path.join(module_folder, 'index.md')
+    # Extract module number for nav_order
+    module_num = re.search(r'Module (\d+)', module_name)
+    nav_order = int(module_num.group(1)) if module_num else 1
+    
+    with open(index_path, 'w') as f:
+        f.write(f"""---
+title: "{module_name}"
+nav_order: {nav_order}
+has_children: true
+---
+
+# {module_name}
+
+This module covers the following topics:
+
+""")
+    print(f"Created index file: {index_path}")
+
+# Function to create markdown file for a video
+def create_md_file(module_folder, filename, video_url, module_name, order):
+    md_path = os.path.join(module_folder, filename)
+    title = format_title(filename)
+    video_id = extract_video_id(video_url)
+    
+    # Extract file number for nav_order if it exists
+    file_num = re.search(r'^(\d+)', os.path.splitext(filename)[0])
+    nav_order = int(file_num.group(1)) if file_num else order
+    
+    with open(md_path, 'w') as f:
+        f.write(f"""---
+title: "{title}"
+parent: "{module_name}"
+nav_order: {nav_order}
+---
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/{video_id}" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+""")
+    print(f"Created markdown file: {md_path}")
+
+# Main function
+def main():
+    base_dir = 'docs/machine-learning-zoomcamp'
+    csv_file = 'youtube_links.csv'
+    
+    # Read CSV file
+    with open(csv_file, 'r') as f:
+        reader = csv.reader(f)
+        next(reader)  # Skip header row
+        
+        # Group by folder
+        folder_files = {}
+        for row in reader:
+            folder, filename, video_url = row
+            if folder not in folder_files:
+                folder_files[folder] = []
+            folder_files[folder].append((filename, video_url))
+    
+    # Create folders and files
+    for folder, files in folder_files.items():
+        module_name = format_module_name(folder)
+        module_folder = os.path.join(base_dir, folder)
+        
+        # Create module folder
+        create_folder_if_not_exists(module_folder)
+        
+        # Create index.md for the module
+        create_index_file(module_folder, module_name)
+        
+        # Create markdown files for each video
+        for i, (filename, video_url) in enumerate(files):
+            create_md_file(module_folder, filename, video_url, module_name, i + 1)
+
+if __name__ == "__main__":
+    main()
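
For reference, `main()` reads `youtube_links.csv`, skips one header row, and expects three columns per row in this order: folder, filename, video URL. The sketch below is a more defensive take on the video-ID extraction, not the code in this PR. The name `extract_video_id_safe`, the `YOUTUBE_HOSTS` set, and the inclusion of `m.youtube.com` are assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Assumed set of hosts to accept; "m.youtube.com" is an addition for illustration.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com"}


def extract_video_id_safe(url):
    """Return the video ID, or None when the URL is not a recognised YouTube link."""
    parsed = urlparse(url)
    if parsed.netloc in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # .get() returns None instead of raising when "v" is absent.
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/embed/"):
            # "/embed/<id>" splits into ["", "embed", "<id>"].
            return parsed.path.split("/")[2] or None
    elif parsed.netloc == "youtu.be":
        # The query string (e.g. "?si=...") is not part of parsed.path.
        return parsed.path.lstrip("/") or None
    return None


print(extract_video_id_safe("https://youtu.be/dCa3JvmJbr0?si=QixEZxWzDeCnSvCq"))
```

A caller could then skip rows where the result is `None` rather than writing an embed URL that ends in the literal text `None`.
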
@@ -1,6 +1,6 @@
 ---
-title: What is Machine Learning?
-parent: "Module 1: Introduction to Machine Learning"
+title: "01. What Is Ml"
+parent: "Module 01: Intro"
 nav_order: 1
 ---
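
The new title "01. What Is Ml" follows from `format_title` in `create_md_files.py`: `str.capitalize()` upper-cases the first letter of each word and lower-cases the rest, so acronyms such as "ML" are flattened. One possible adjustment is sketched below. The `ACRONYMS` table and the function name `format_title_keep_acronyms` are hypothetical and not part of this PR.

```python
import os
import re

# Hypothetical lookup of tokens that should keep a fixed spelling.
ACRONYMS = {"ml": "ML", "dm": "DM", "crisp": "CRISP", "vs": "vs"}


def format_title_keep_acronyms(filename):
    """Format '01-what-is-ml.md' as '01. What Is ML', preserving known acronyms."""
    name = os.path.splitext(filename)[0]
    prefix = ""
    match = re.match(r"(\d+)-(.*)", name)
    if match:
        # Keep the numeric prefix and continue with the rest of the name.
        prefix = f"{match.group(1)}. "
        name = match.group(2)
    words = name.replace("-", " ").split()
    # Use the fixed spelling when the word is a known acronym, else capitalize it.
    return prefix + " ".join(ACRONYMS.get(w.lower(), w.capitalize()) for w in words)


print(format_title_keep_acronyms("01-what-is-ml.md"))   # 01. What Is ML
print(format_title_keep_acronyms("02-ml-vs-rules.md"))  # 02. ML vs Rules
```

An alternative that avoids a lookup table would be to carry the human-written titles in an extra CSV column, at the cost of changing the input format.
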
 
 
@@ -1,6 +1,6 @@
 ---
-title: "Machine Learning vs Rule-Based Systems: What is the Difference?"
-parent: "Module 1: Introduction to Machine Learning"
+title: "02. Ml Vs Rules"
+parent: "Module 01: Intro"
 nav_order: 2
 ---
 
 
@@ -1,6 +1,6 @@
 ---
-title: "What is Supervised Machine Learning?"
-parent: "Module 1: Introduction to Machine Learning"
+title: "03. Supervised Ml"
+parent: "Module 01: Intro"
 nav_order: 3
 ---
 
@@ -106,4 +106,4 @@ Examples include:
 
 ## Next Steps
 
-In the next lesson, we'll explore the bigger picture of organizing machine learning projects and discuss a methodology called CRISP-DM (Cross-Industry Standard Process for Data Mining), which provides a structured approach to planning and executing machine learning projects.
+In the next lesson, we'll explore the bigger picture of organizing machine learning projects and discuss a methodology called CRISP-DM (Cross-Industry Standard Process for Data Mining), which provides a structured approach to planning and executing machine learning projects.
@@ -1,13 +1,10 @@
 ---
-title: "What is CRISP-DM?"
-parent: "Module 1: Introduction to Machine Learning"
+title: "04. Crisp Dm"
+parent: "Module 01: Intro"
 nav_order: 4
 ---
-
 # What is CRISP-DM?
 
-> These notes are based on the video [ML Zoomcamp 1.4 - CRISP-DM](https://youtu.be/dCa3JvmJbr0?si=QixEZxWzDeCnSvCq)
-
 <iframe width="560" height="315" src="https://www.youtube.com/embed/dCa3JvmJbr0?si=QixEZxWzDeCnSvCq" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
 
 CRISP-DM stands for **Cross-Industry Standard Process for Data Mining**, a methodology for organizing machine learning projects. Despite being developed in the 1990s by IBM, it has stood the test of time and remains relevant for modern ML projects with minimal modifications.
 
@@ -1,6 +1,6 @@
 ---
-title: "Model Selection Process"
-parent: "Module 1: Introduction to Machine Learning"
+title: "05. Model Selection"
+parent: "Module 01: Intro"
 nav_order: 5
 ---
 
 
@@ -1,5 +1,5 @@
 ---
-title: "GitHub Codespaces"
+title: "1.6. GitHub Codespaces"
 parent: "Module 1: Introduction to Machine Learning"
 nav_order: 6
 ---