Merged
5 changes: 5 additions & 0 deletions .gitignore
@@ -353,3 +353,8 @@ MigrationBackup/
.ionide/
.vscode/settings.json

# Example output files (generated by running example scripts)
examples/*.png
examples/*.jpg
examples/*.jpeg

16 changes: 16 additions & 0 deletions README.md
@@ -51,6 +51,8 @@ Get started with the following resources:

# Getting Started

> **Complete Beginners**: New to data science? Start with our [beginner-friendly examples](examples/README.md)! These simple, well-commented examples will help you understand the basics before diving into the full curriculum.

> **Teachers**: we have [included some suggestions](for-teachers.md) on how to use this curriculum. We'd love your feedback [in our discussion forum](https://github.com/microsoft/Data-Science-For-Beginners/discussions)!

> **[Students](https://aka.ms/student-page)**: to use this curriculum on your own, fork the entire repo and complete the exercises on your own, starting with a pre-lecture quiz. Then read the lecture and complete the rest of the activities. Try to create the projects by comprehending the lessons rather than copying the solution code; however, that code is available in the /solutions folders in each project-oriented lesson. Another idea would be to form a study group with friends and go through the content together. For further study, we recommend [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/qprpajyoy3x0g7?WT.mc_id=academic-77958-bethanycheum).
@@ -86,6 +88,20 @@ In addition, a low-stakes quiz before a class sets the intention of the student

> **A note about quizzes**: All quizzes are contained in the Quiz-App folder, for 40 total quizzes of three questions each. They are linked from within the lessons, but the quiz app can be run locally or deployed to Azure; follow the instruction in the `quiz-app` folder. They are gradually being localized.

## 🎓 Beginner-Friendly Examples

**New to Data Science?** We've created a special [examples directory](examples/README.md) with simple, well-commented code to help you get started:

- 🌟 **Hello World** - Your first data science program
- 📂 **Loading Data** - Learn to read and explore datasets
- 📊 **Simple Analysis** - Calculate statistics and find patterns
- 📈 **Basic Visualization** - Create charts and graphs
- 🔬 **Real-World Project** - Complete workflow from start to finish

Each example includes detailed comments explaining every step, making it perfect for absolute beginners!

👉 **[Start with the examples](examples/README.md)** 👈

## Lessons


87 changes: 87 additions & 0 deletions examples/01_hello_world_data_science.py
@@ -0,0 +1,87 @@
"""
Hello World - Data Science Style!

This is your very first data science program. It introduces you to the basic
concepts of working with data in Python.

What you'll learn:
- How to create a simple dataset
- How to display data
- How to work with Python lists and dictionaries
- Basic data manipulation

Prerequisites: Just Python installed on your computer!
"""

# Let's start with the classic "Hello, World!" but with a data science twist
print("=" * 50)
print("Hello, World of Data Science!")
print("=" * 50)
print()

# In data science, we work with data. Let's create our first simple dataset.
# We'll use a list to store information about students and their test scores.

# A list is a collection of items in Python, written with square brackets []
students = ["Alice", "Bob", "Charlie", "Diana", "Eve"]
scores = [85, 92, 78, 95, 88]

print("Our Dataset:")
print("-" * 50)
print("Students:", students)
print("Scores:", scores)
print()

# Now let's do something useful with this data!
# We can find basic statistics about the scores

# Find the highest score
highest_score = max(scores)
print(f"📊 Highest score: {highest_score}")

# Find the lowest score
lowest_score = min(scores)
print(f"📊 Lowest score: {lowest_score}")

# Calculate the average score
# sum() adds all numbers together, len() tells us how many items we have
average_score = sum(scores) / len(scores)
print(f"📊 Average score: {average_score:.2f}")  # .2f means show 2 decimal places
print()

# Let's find who got the highest score
# We use index() to find where the highest_score is in our list
top_student_index = scores.index(highest_score)
top_student = students[top_student_index]
print(f"πŸ† Top student: {top_student} with a score of {highest_score}")
print()

# Now let's organize this data in a more structured way
# We'll use a dictionary - it pairs keys (student names) with values (scores)
print("Student Scores (organized as key-value pairs):")
print("-" * 50)

# Create a dictionary by pairing students with their scores
student_scores = {}
for i in range(len(students)):
    student_scores[students[i]] = scores[i]

# Display each student and their score
for student, score in student_scores.items():
    # Add a special marker for the top student
    marker = "⭐" if student == top_student else " "
    print(f"{marker} {student}: {score} points")

print()
print("=" * 50)
print("Congratulations! You've completed your first data science program!")
print("=" * 50)

# What did we just do?
# 1. Created a simple dataset (student names and scores)
# 2. Performed basic analysis (max, min, average)
# 3. Found insights (who is the top student)
# 4. Organized the data in a useful structure (dictionary)
#
# These are the fundamental building blocks of data science!
# Next, you'll learn to work with real datasets using powerful libraries.
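
A possible variation on the script above, offered as a sketch rather than as part of the PR: the index-based loop that builds `student_scores` can be written with `zip()`, the average can come from the standard library's `statistics.mean`, and because `list.index()` returns only the first match, a list comprehension can collect every student who shares the top score. The name `top_students` is introduced here for illustration and does not appear in the original file.

```python
from statistics import mean

students = ["Alice", "Bob", "Charlie", "Diana", "Eve"]
scores = [85, 92, 78, 95, 88]

# zip() pairs items by position; dict() turns those pairs into key-value entries.
student_scores = dict(zip(students, scores))

# statistics.mean() does the same job as sum(scores) / len(scores).
average_score = mean(scores)

highest_score = max(student_scores.values())

# list.index() would stop at the first match, so gather every student
# whose score equals the highest one (this handles ties).
top_students = [
    name for name, score in student_scores.items() if score == highest_score
]

print(f"Average score: {average_score:.2f}")
print(f"Top student(s): {', '.join(top_students)} with {highest_score} points")
```

With the five scores used in the example there is a single top student, so the tie handling only matters once two students share the maximum.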
128 changes: 128 additions & 0 deletions examples/02_loading_data.py
@@ -0,0 +1,128 @@
"""
Loading and Exploring Data

In real data science projects, you'll work with data stored in files.
This example shows you how to load data from a CSV file and explore it.

What you'll learn:
- How to load data from a CSV file
- How to view basic information about your dataset
- How to display the first/last rows
- How to get summary statistics

Prerequisites: pandas library (install with: pip install pandas)
"""

# Import the pandas library - it's the most popular tool for working with data in Python
# We give it the short name 'pd' so we can type less
import pandas as pd

print("=" * 70)
print("Welcome to Data Loading and Exploration!")
print("=" * 70)
print()

# Step 1: Load data from a CSV file
# CSV stands for "Comma-Separated Values" - a common format for storing data
# We'll use the birds dataset that comes with this repository
print("📂 Loading data from birds.csv...")
print()

# Load the data into a DataFrame (think of it as a smart spreadsheet)
# A DataFrame is pandas' main data structure - it organizes data in rows and columns
data = pd.read_csv('../data/birds.csv')

print("✅ Data loaded successfully!")
print()

# Step 2: Get basic information about the dataset
print("-" * 70)
print("BASIC DATASET INFORMATION")
print("-" * 70)

# How many rows and columns do we have?
num_rows, num_columns = data.shape
print(f"📊 Dataset size: {num_rows} rows × {num_columns} columns")
print()

# What are the column names?
print("📋 Column names:")
for i, column in enumerate(data.columns, 1):
    print(f" {i}. {column}")
print()

# Step 3: Look at the first few rows of data
# This gives us a quick preview of what the data looks like
print("-" * 70)
print("FIRST 5 ROWS OF DATA (Preview)")
print("-" * 70)
print(data.head()) # head() shows the first 5 rows by default
print()

# Step 4: Look at the last few rows
print("-" * 70)
print("LAST 3 ROWS OF DATA")
print("-" * 70)
print(data.tail(3)) # tail(3) shows the last 3 rows
print()

# Step 5: Get information about data types
print("-" * 70)
print("DATA TYPES AND NON-NULL COUNTS")
print("-" * 70)
data.info()  # Prints column names, data types, and non-null counts itself; it returns None, so don't wrap it in print()
print()

# Step 6: Get statistical summary
print("-" * 70)
print("STATISTICAL SUMMARY (for numerical columns)")
print("-" * 70)
# describe() gives us statistics like mean, std, min, max, etc.
print(data.describe())
print()

# Step 7: Check for missing values
print("-" * 70)
print("MISSING VALUES CHECK")
print("-" * 70)
missing_values = data.isnull().sum()
print("Number of missing values per column:")
print(missing_values)
print()

if missing_values.sum() == 0:
    print("✅ Great! No missing values found.")
else:
    print("⚠️ Some columns have missing values. You may need to handle them.")
print()

# Step 8: Get unique values in a column
print("-" * 70)
print("SAMPLE: UNIQUE VALUES")
print("-" * 70)
# Let's see what unique values exist in the first column
first_column = data.columns[0]
unique_count = data[first_column].nunique()
print(f"The column '{first_column}' has {unique_count} unique value(s)")
print()

# Summary
print("=" * 70)
print("SUMMARY")
print("=" * 70)
print("You've learned how to:")
print(" ✓ Load data from a CSV file using pandas")
print(" ✓ Check the size and shape of your dataset")
print(" ✓ View the first and last rows")
print(" ✓ Understand data types")
print(" ✓ Get statistical summaries")
print(" ✓ Check for missing values")
print()
print("Next step: Try loading other CSV files from the data/ folder!")
print("=" * 70)

# Pro Tips:
# - Always explore your data before analyzing it
# - Check for missing values and understand why they might be missing
# - Look at the data types to ensure they make sense
# - Use head() and tail() to spot any obvious issues with your data
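
One thing worth noting about the `pd.read_csv('../data/birds.csv')` call above: a relative path is resolved against the current working directory, so the script finds the file only when it is run from inside `examples/`. A minimal sketch of one way around this, assuming `data/` sits next to `examples/` in the repository as the relative path implies, is to build the path from the script's own location. The helper name `repo_data_path` is hypothetical and not part of the PR.

```python
from pathlib import Path


def repo_data_path(script_file, filename="birds.csv"):
    """Return <repo>/data/<filename> for a script located in <repo>/examples/.

    Assumes the data/ folder is a sibling of examples/ (hypothetical helper).
    """
    script_dir = Path(script_file).resolve().parent  # .../examples
    return script_dir.parent / "data" / filename     # .../data/<filename>


# Inside examples/02_loading_data.py this could be used as:
#     data = pd.read_csv(repo_data_path(__file__))
```

Because the path no longer depends on the working directory, the same call would work whether the script is started from the repository root or from `examples/`.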