BridgingAISocietySummerSchools/Data-Science-AI-Python-Course

Learn Python: A Course Designed Specifically for Data Science and AI

A polished, 10-notebook university-quality introduction to Python for Data Science and Machine Learning. From "What's a variable?" to building, evaluating, and interpreting your own ML models in scikit-learn.


🚀 Quick start — open any notebook in Google Colab

Click any badge below to launch the notebook in Colab with zero setup — no install, no Python, no terminal. Just a Google account.

🧱 Module 1 — Python Fundamentals

  • Open in Colab Notebook 1 — Python Basics (25–30 min) Variables, data types, arithmetic, strings, f-strings, a first applied calculation.
  • Open in Colab Notebook 2 — Control Structures (30–35 min) if / elif / else, for, while, break / continue, try / except.
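
As a rough taste of what Module 1 covers, the sketch below combines type conversion, an f-string, a loop with `continue`, and `try` / `except`. It is not taken from the notebooks; the prices and the tax rate are made-up values chosen for illustration.

```python
# Illustrative only: the prices and the tax rate are made-up values.
raw_prices = ["19.99", "5.50", "n/a", "12.00"]
TAX_RATE = 0.2

total = 0.0
skipped = 0
for raw in raw_prices:
    try:
        price = float(raw)      # str -> float; raises ValueError for "n/a"
    except ValueError:
        skipped += 1
        continue                # skip this entry and move on to the next one
    if price >= 10:
        label = "standard"
    else:
        label = "budget"
    total += price * (1 + TAX_RATE)
    print(f"{price:>6.2f} ({label})")   # f-string with width and precision

print(f"Total incl. tax: {total:.2f}, skipped entries: {skipped}")
```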

📦 Module 2 — Data Structures

  • Open in Colab Notebook 3 — Lists and Sequences (30–35 min) Indexing, slicing, list comprehensions, tuples, strings as sequences, nested lists.
  • Open in Colab Notebook 4 — Dictionaries (30–35 min) Key-value lookup, nested dicts, list of dicts, counting / grouping, JSON.
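
The Module 2 topics fit together roughly as in this sketch: JSON text becomes a list of dicts, a list comprehension pulls out one field, and a plain dict groups values by key. The records are hypothetical and only stand in for the kind of data an API might return.

```python
import json

# Hypothetical records; in practice these might come from an API response.
payload = (
    '[{"city": "Oslo", "temp_c": 4.5},'
    ' {"city": "Rome", "temp_c": 18.0},'
    ' {"city": "Oslo", "temp_c": 6.5}]'
)
records = json.loads(payload)            # JSON text -> list of dicts

temps = [r["temp_c"] for r in records]   # list comprehension
first_two = temps[:2]                    # slicing: items at index 0 and 1

# Group temperatures by city, then average each group.
by_city = {}
for r in records:
    by_city.setdefault(r["city"], []).append(r["temp_c"])
averages = {city: sum(vals) / len(vals) for city, vals in by_city.items()}

print(first_two)
print(averages)
```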

🧰 Module 3 — Data Science Libraries

  • Open in Colab Notebook 5 — Pandas Preview (25–30 min) Series, DataFrames, indexing with loc / iloc, filtering, groupby.
  • Open in Colab Notebook 6 — Functions and Modules (30–35 min) Parameters, defaults, *args / **kwargs, scope, docstrings, type hints, imports.
  • Open in Colab Notebook 7 — NumPy Fundamentals (30–40 min) Arrays, vectorisation, broadcasting, axes, reproducible randomness.
  • Open in Colab Notebook 8 — Matplotlib Basics (35–45 min) Figure / Axes model, line / bar / scatter / hist / box / heatmap, subplots, annotations.
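
To make the NumPy terms in Notebook 7 concrete, here is a minimal sketch of reproducible randomness, an axis-wise reduction, broadcasting, and a vectorised calculation. The sensor readings are synthetic, and the variable names are illustrative rather than taken from the course.

```python
import numpy as np

# A seeded generator produces the same numbers on every run.
rng = np.random.default_rng(seed=42)

# Synthetic data: 4 days (rows) x 3 sensors (columns), in degrees Celsius.
readings = rng.normal(loc=20.0, scale=5.0, size=(4, 3))

col_means = readings.mean(axis=0)    # reduce over rows -> shape (3,)
centred = readings - col_means       # broadcasting: (4, 3) minus (3,)
fahrenheit = readings * 9 / 5 + 32   # vectorised: no explicit Python loop

print(col_means.shape, centred.shape, fahrenheit.shape)
```

Reducing with `axis=0` collapses the rows, so the result has one value per column; broadcasting then stretches that row of means across all four days.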

🤖 Module 4 — Machine Learning

  • Open in Colab Notebook 9 — Scikit-Learn Basics (60–75 min) Train/test split, classification + regression, pipelines, metrics, GridSearchCV, feature importance.

🏆 Capstone

  • Open in Colab Notebook 10 — Capstone: Weather Data Analysis (60–90 min) Full end-to-end project: data, EDA, dashboard, regression, executive summary.

💡 Pro tip. Google Colab provides a free Python environment with all course libraries (NumPy, pandas, matplotlib, scikit-learn) pre-installed.

Total time: roughly 6 to 7.5 hours of focused learning (the sum of the notebook estimates above), plus 2–4 hours of practice.


🎯 Who this course is for

This course is designed for complete beginners who want to use Python specifically for data science, machine learning, and analytical work — not generic application development.

You will benefit if you are:

  • A business professional who wants to move from spreadsheets to code.
  • A student in a quantitative field (statistics, economics, biology, physics, social science).
  • A researcher who wants to script analyses instead of clicking through dropdowns.
  • A career-switcher targeting data analyst / data scientist / ML engineer roles.
  • A developer in another stack adding "data" to your skillset.

Prerequisites: none. A laptop, a browser, and curiosity are enough.

🌟 What makes this course different?

🎯 Data-science focused from day one. Every concept connects to real workflows. List slicing is taught as X[0:3] — the same syntax you'll see in scikit-learn. Dictionaries are taught as JSON-shaped records you'll meet in every API.
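
A small sketch of both ideas, using plain Python objects and made-up values. The same `X[0:3]` expression selects the first three rows when `X` is a NumPy array of the kind typically passed to scikit-learn.

```python
# Made-up feature matrix: each inner list is one sample (row).
X = [
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 2.9],
    [5.9, 3.2],
]
first_three = X[0:3]   # rows 0, 1 and 2; the stop index is excluded
last_row = X[-1]       # negative indices count from the end

# A JSON-shaped record: dicts and lists nested inside a dict.
record = {"id": 7, "location": {"city": "Bergen"}, "readings": [4.5, 6.0]}
city = record["location"]["city"]   # chained key lookup

print(len(first_three), last_row, city)
```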

🧠 Intuition before syntax. Each section opens with why a concept matters before showing how it works. Analogies, diagrams, mental models — not just code.

🛠️ Real-world, not toy. Financial calculations, weather analysis, customer data, machine-learning pipelines — examples that mirror what data scientists actually do.

🧩 Modular & progressive. Notebooks build on each other. By Notebook 7 you're vectorising in NumPy; by Notebook 9 you're training random forests with cross-validation; by Notebook 10 you're shipping a small end-to-end project.

💡 Exercises with full solutions. Every notebook has 5+ exercises, including a "Debug me 🐞" — and every exercise has a detailed solution that explains the reasoning, not just the code.

📊 Polished visuals. Charts are clean, professionally styled, and chosen for didactic value.

📚 Learning objectives

By the end of this course you will be able to:

  • Write clean Python code with appropriate data structures, control flow, and functions.
  • Manipulate tabular data with pandas and numerical data with NumPy.
  • Build clear, publication-quality visualisations with matplotlib.
  • Train, evaluate, and interpret classification and regression models in scikit-learn.
  • Apply the full ML workflow — split, fit, evaluate, tune — without making the classic beginner mistakes.
  • Communicate findings via a short executive summary and a 2×2 dashboard.
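
The split, fit, evaluate, tune workflow can be sketched as follows. This is a minimal example under stated assumptions, not the course's own code: the data is synthetic, the model and parameter grid are arbitrary choices, and the "beginner mistake" it guards against is assumed here to be data leakage (fitting preprocessing or tuning on test data).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data so the example needs no download.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# 1. Split first, so the test set never influences preprocessing or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 2. Keep scaling inside the pipeline so it is fitted on training folds only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 3. Tune with cross-validation on the training set.
search = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# 4. Evaluate once on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

The `clf__C` key addresses the `C` parameter of the pipeline step named `clf`, which is how `GridSearchCV` reaches inside a pipeline.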

▶️ Getting started

Option A — Google Colab (recommended, 0 setup)

Click any of the Open in Colab badges above. Sign in with a Google account. That's it.

Option B — Run locally

If you'd rather have a local environment:

```bash
# clone the repo
git clone https://github.com/BridgingAISocietySummerSchools/Data-Science-AI-Python-Course.git
cd Data-Science-AI-Python-Course

# either: one-shot setup script
./setup.sh

# or: manual venv
python3 -m venv venv
source venv/bin/activate          # Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
```

Start with 01_python_basics.ipynb and work through in order.

🗂️ Repository layout

📁 Data-Science-AI-Python-Course/
├── 📓 01_python_basics.ipynb          # Variables, types, arithmetic, f-strings
├── 📓 02_control_structures.ipynb     # if/elif/else, loops, try/except
├── 📓 03_lists_data_structures.ipynb  # Lists, indexing, slicing, comprehensions
├── 📓 04_dictionaries_advanced.ipynb  # Dictionaries, nested data, JSON
├── 📓 05_pandas_preview.ipynb         # DataFrames, groupby, plotting
├── 📓 06_functions_modules.ipynb      # Functions, defaults, scope, imports
├── 📓 07_numpy_fundamentals.ipynb     # Arrays, vectorisation, broadcasting
├── 📓 08_matplotlib_basics.ipynb      # Professional plotting
├── 📓 09_scikit_learn_basics.ipynb    # Classification + regression
├── 📓 10_capstone_project.ipynb       # End-to-end weather analysis
├── 📄 README.md                       # ← you are here
├── 📄 Python Data Science Cheat Sheet.md  # Quick syntax reference
├── 📄 CHANGELOG.md                    # Version history
├── 📄 CONTRIBUTING.md                 # How to contribute
├── 📄 requirements.txt                # Python dependencies
├── 📄 requirements-dev.txt            # Dev-only dependencies
└── 🛠️ setup.sh                        # One-shot local setup

🧭 Suggested learning path

The notebooks are designed to be done in order. Each notebook assumes you've internalised the previous ones.

                                Recommended order
   1 ──► 2 ──► 3 ──► 4 ──► 5 ──► 6 ──► 7 ──► 8 ──► 9 ──► 10
   │     │     │     │     │     │     │     │     │     │
   │     │     │     │     │     │     │     │     │     └─ 🏆 Capstone
   │     │     │     │     │     │     │     │     └─────── ML models
   │     │     │     │     │     │     │     └─────────── Visualisation
   │     │     │     │     │     │     └─────────────── NumPy arrays
   │     │     │     │     │     └─────────────────── Functions / modules
   │     │     │     │     └─────────────────────── First pandas
   │     │     │     └─────────────────────────── Dictionaries / JSON
   │     │     └─────────────────────────────── Lists & slicing
   │     └─────────────────────────────────── Decisions & loops
   └──────────────────────────────────── Python fundamentals

A typical schedule:

| Pace | Plan |
|---|---|
| 1 hour / day | 1 notebook per day → done in ~2 weeks |
| 3 hours / weekend | 3 notebooks per weekend → done in ~4 weekends |
| Bootcamp weekend | All 10 in 2 days (6 to 7.5 hours of notebook time, plus breaks) |

🧪 What's inside each notebook?

Every notebook follows the same modern structure:

  1. Header — module, time estimate, learning objectives, prerequisites.
  2. Sections with intuition first, then code, then a brief reflection on the output.
  3. Small examples → larger applied examples → exercises.
  4. 5+ practice exercises, including at least one "Debug me 🐞".
  5. Complete solutions with explanations (collapsed in <details>).
  6. Key takeaways + self-assessment checklist + next-step pointer.

🧯 Troubleshooting

Imports fail in Colab. This is rare, because Colab ships with the course libraries pre-installed. If it does happen, run !pip install <package> in a fresh cell.

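Matplotlib plots don't show locally. Make sure you're running the code in a Jupyter notebook rather than as a .py file; in a plain script, figures generally appear only after an explicit plt.show() call (assuming matplotlib.pyplot is imported as plt). In some setups you may also need %matplotlib inline in the first cell.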

Jupyter won't start locally. Running pip install --upgrade jupyter usually fixes it. If it doesn't, recreate the virtual environment from scratch: rm -rf venv && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt.

scikit-learn load_boston errors. It was removed in scikit-learn 1.2 — Notebook 9 uses the California Housing dataset instead.
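
If you have older code that still calls `load_boston`, one possible replacement is sketched below. It assumes the dataset is loaded with scikit-learn's `fetch_california_housing`; the README does not say how Notebook 9 loads it, and `load_housing` is a hypothetical helper name.

```python
from sklearn.datasets import fetch_california_housing


def load_housing():
    """Return (features, target) as pandas objects.

    The data is downloaded on first use and cached locally,
    so the first call needs an internet connection.
    """
    data = fetch_california_housing(as_frame=True)
    return data.data, data.target
```

Calling `X, y = load_housing()` should give a DataFrame of features and a Series holding the target column.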

📦 Dependencies

The full list lives in requirements.txt. The core packages and their minimum versions:

  • numpy ≥ 1.24
  • pandas ≥ 2.0
  • matplotlib ≥ 3.7
  • scikit-learn ≥ 1.3
  • scipy ≥ 1.10
  • seaborn ≥ 0.12 (optional, used briefly)
  • jupyter ≥ 1.0
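
To check a local environment against these minimums, something like the following sketch may help. The helper names `as_tuple` and `meets_minimum` are hypothetical, and the version parser is deliberately naive; it avoids comparing version strings directly, since "1.9" would sort after "1.10" as text.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the list above (optional packages omitted).
MINIMUMS = {
    "numpy": "1.24",
    "pandas": "2.0",
    "matplotlib": "3.7",
    "scikit-learn": "1.3",
    "scipy": "1.10",
}


def as_tuple(ver: str) -> tuple:
    """Naive parser: keep the leading numeric parts, e.g. '1.10.0rc1' -> (1, 10, 0)."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, component by component."""
    return as_tuple(installed) >= as_tuple(minimum)


for name, minimum in MINIMUMS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed (need >= {minimum})")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{name} {installed}: {status}")
```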

Python 3.10 or newer is recommended: the notebooks use PEP 604 union types (the X | Y annotation syntax), which were introduced in Python 3.10.

📚 Further reading

📰 Related reading: Learn Python for Data Science


🤝 Contributing

We welcome contributions! Check CONTRIBUTING.md for guidelines. Found a typo, a bug, or a clearer way to explain something? Open an issue or a PR.

📄 License

MIT — see LICENSE.


Every expert was once a beginner. The only difference is they started.

Welcome — and have fun. What will you build with your data-science skills? 🚀

Made with ❤️ for the Data Science community
