A polished, 10-notebook university-quality introduction to Python for Data Science and Machine Learning. From "What's a variable?" to building, evaluating, and interpreting your own ML models in scikit-learn.
Click any badge below to launch the notebook in Colab with zero setup — no install, no Python, no terminal. Just a Google account.
Notebook 1 — Python Basics (25–30 min) Variables, data types, arithmetic, strings, f-strings, a first applied calculation.
Notebook 2 — Control Structures (30–35 min) if / elif / else, for, while, break / continue, try / except.
Notebook 3 — Lists and Sequences (30–35 min) Indexing, slicing, list comprehensions, tuples, strings as sequences, nested lists.
Notebook 4 — Dictionaries (30–35 min) Key-value lookup, nested dicts, list of dicts, counting / grouping, JSON.
Notebook 5 — Pandas Preview (25–30 min) Series, DataFrames, indexing with loc / iloc, filtering, groupby.
Notebook 6 — Functions and Modules (30–35 min) Parameters, defaults, *args / **kwargs, scope, docstrings, type hints, imports.
Notebook 7 — NumPy Fundamentals (30–40 min) Arrays, vectorisation, broadcasting, axes, reproducible randomness.
Notebook 8 — Matplotlib Basics (35–45 min) Figure / Axes model, line / bar / scatter / hist / box / heatmap, subplots, annotations.
Notebook 9 — Scikit-Learn Basics (60–75 min) Train/test split, classification + regression, pipelines, metrics, GridSearchCV, feature importance.
Notebook 10 — Capstone: Weather Data Analysis (60–90 min) Full end-to-end project: data, EDA, dashboard, regression, executive summary.
💡 Pro tip. Google Colab provides a free Python environment with all course libraries (NumPy, pandas, matplotlib, scikit-learn) pre-installed.
Total time: roughly 6 to 7.5 hours of focused learning (the sum of the notebook estimates above), plus 2–4 hours of practice.
This course is designed for complete beginners who want to use Python specifically for data science, machine learning, and analytical work — not generic application development.
You will benefit if you are:
- A business professional who wants to move from spreadsheets to code.
- A student in a quantitative field (statistics, economics, biology, physics, social science).
- A researcher who wants to script analyses instead of clicking through dropdowns.
- A career-switcher targeting data analyst / data scientist / ML engineer roles.
- A developer in another stack adding "data" to your skillset.
Prerequisites: none. A laptop, a browser, and curiosity are enough.
🎯 Data-science focused from day one. Every concept connects to real workflows. List slicing is taught as X[0:3] — the same syntax you'll see in scikit-learn. Dictionaries are taught as JSON-shaped records you'll meet in every API.
🧠 Intuition before syntax. Each section opens with why a concept matters before showing how it works. Analogies, diagrams, mental models — not just code.
🛠️ Real-world, not toy. Financial calculations, weather analysis, customer data, machine-learning pipelines — examples that mirror what data scientists actually do.
🧩 Modular & progressive. Notebooks build on each other. By Notebook 7 you're vectorising in NumPy; by Notebook 9 you're training random forests with cross-validation; by Notebook 10 you're shipping a small end-to-end project.
💡 Exercises with full solutions. Every notebook has 5+ exercises, including a "Debug me 🐞" — and every exercise has a detailed solution that explains the reasoning, not just the code.
📊 Polished visuals. Charts are clean, professionally styled, and chosen for didactic value.
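To make the slicing and JSON points above concrete, here is a minimal sketch. It is illustrative only and is not taken from the notebooks; the names `X`, `raw`, and `record` and all sample values are assumptions made up for this example.

```python
import json

# X is a hypothetical feature matrix: one inner list per sample (row).
X = [
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 2.9],
    [5.9, 3.2],
]

# X[0:3] selects rows 0, 1 and 2 (the stop index is excluded).
first_three = X[0:3]
print(first_three)

# A JSON-shaped record, as an API might return it (made-up values).
raw = '{"city": "Hamburg", "readings": [{"day": 1, "temp_c": 12.5}, {"day": 2, "temp_c": 14.0}]}'
record = json.loads(raw)  # JSON object -> dict, JSON array -> list

# Nested access chains dictionary keys and list indices.
temps = [reading["temp_c"] for reading in record["readings"]]
print(record["city"], temps)
```

The same `X[0:3]` expression also selects the first three rows when `X` is a NumPy array, which is why the list syntax carries over directly to later notebooks.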
By the end of this course you will be able to:
- Write clean Python code with appropriate data structures, control flow, and functions.
- Manipulate tabular data with pandas and numerical data with NumPy.
- Build clear, publication-quality visualisations with matplotlib.
- Train, evaluate, and interpret classification and regression models in scikit-learn.
- Apply the full ML workflow — split, fit, evaluate, tune — without making the classic beginner mistakes.
- Communicate findings via a short executive summary and a 2×2 dashboard.
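As a rough illustration of the split, fit, evaluate, tune workflow named in the list above, the sketch below uses synthetic data so that it runs offline. It is not the course's own code: the dataset, the parameter grid, and the choice of a random forest are assumptions, and reading "classic beginner mistakes" as data leakage and tuning on the test set is our interpretation, since the list does not enumerate them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data so the sketch runs without downloads (not the course dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# 1. Split first, so the test set never influences preprocessing or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 2. Keep preprocessing and model in one pipeline, so the scaler is fitted
#    only on the training folds during cross-validation. Tree models do not
#    need scaling; the step is here only to show the pattern.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42)),
])

# 3. Tune with cross-validation on the training data only.
param_grid = {
    "model__n_estimators": [25, 50],
    "model__max_depth": [3, None],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# 4. Evaluate once on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_accuracy, 3))
```

The design choice worth noticing is that `GridSearchCV` wraps the whole pipeline rather than the model alone, so every preprocessing step is re-fitted inside each fold.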
Click any of the Open in Colab badges above. Sign in with a Google account. That's it.
If you'd rather have a local environment:
# clone the repo
git clone https://github.com/BridgingAISocietySummerSchools/Data-Science-AI-Python-Course.git
cd Data-Science-AI-Python-Course
# either: one-shot setup script
./setup.sh
# or: manual venv
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
Start with 01_python_basics.ipynb and work through in order.
📁 Data-Science-AI-Python-Course/
├── 📓 01_python_basics.ipynb # Variables, types, arithmetic, f-strings
├── 📓 02_control_structures.ipynb # if/elif/else, loops, try/except
├── 📓 03_lists_data_structures.ipynb # Lists, indexing, slicing, comprehensions
├── 📓 04_dictionaries_advanced.ipynb # Dictionaries, nested data, JSON
├── 📓 05_pandas_preview.ipynb # DataFrames, groupby, plotting
├── 📓 06_functions_modules.ipynb # Functions, defaults, scope, imports
├── 📓 07_numpy_fundamentals.ipynb # Arrays, vectorisation, broadcasting
├── 📓 08_matplotlib_basics.ipynb # Professional plotting
├── 📓 09_scikit_learn_basics.ipynb # Classification + regression
├── 📓 10_capstone_project.ipynb # End-to-end weather analysis
├── 📄 README.md # ← you are here
├── 📄 Python Data Science Cheat Sheet.md # Quick syntax reference
├── 📄 CHANGELOG.md # Version history
├── 📄 CONTRIBUTING.md # How to contribute
├── 📄 requirements.txt # Python dependencies
├── 📄 requirements-dev.txt # Dev-only dependencies
└── 🛠️ setup.sh # One-shot local setup
The notebooks are designed to be done in order. Each notebook assumes you've internalised the previous ones.
Recommended order
1 ──► 2 ──► 3 ──► 4 ──► 5 ──► 6 ──► 7 ──► 8 ──► 9 ──► 10
│     │     │     │     │     │     │     │     │     │
│     │     │     │     │     │     │     │     │     └─ 🏆 Capstone
│     │     │     │     │     │     │     │     └─────── ML models
│     │     │     │     │     │     │     └───────────── Visualisation
│     │     │     │     │     │     └─────────────────── NumPy arrays
│     │     │     │     │     └───────────────────────── Functions / modules
│     │     │     │     └─────────────────────────────── First pandas
│     │     │     └───────────────────────────────────── Dictionaries / JSON
│     │     └─────────────────────────────────────────── Lists & slicing
│     └───────────────────────────────────────────────── Decisions & loops
└─────────────────────────────────────────────────────── Python fundamentals
A typical schedule:
| Pace | Plan |
|---|---|
| 1 hour / day | 1 notebook per day → done in ~2 weeks |
| 3 hours / weekend | 3 notebooks per weekend → done in ~4 weekends |
| Bootcamp weekend | All 10 in 2 days (~6 hours pure + breaks) |
Every notebook follows the same modern structure:
- Header — module, time estimate, learning objectives, prerequisites.
- Sections with intuition first, then code, then a brief reflection on the output.
- Small examples → larger applied examples → exercises.
- 5+ practice exercises, including at least one "Debug me 🐞".
- Complete solutions with explanations (collapsed in <details>).
- Key takeaways + self-assessment checklist + next-step pointer.
Imports fail in Colab. Almost never happens — Colab has the full stack. If it does, run !pip install <package> in a fresh cell.
Matplotlib plots don't show locally. Make sure you're running a Jupyter notebook (not a .py file). In some setups you may need %matplotlib inline in the first cell.
Jupyter won't start locally. pip install --upgrade jupyter usually fixes it. Fresh venv: rm -rf venv && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt.
scikit-learn load_boston errors. It was removed in scikit-learn 1.2 — Notebook 9 uses the California Housing dataset instead.
The full list lives in requirements.txt. The core dependencies, with their minimum versions:
- numpy ≥ 1.24
- pandas ≥ 2.0
- matplotlib ≥ 3.7
- scikit-learn ≥ 1.3
- scipy ≥ 1.10
- seaborn ≥ 0.12 (optional, used briefly)
- jupyter ≥ 1.0
Python 3.10 or newer is recommended (we use PEP 604 union types).
- 📘 Official Python tutorial
- 🔢 NumPy user guide
- 🐼 Pandas getting-started
- 📊 Matplotlib tutorials
- 🤖 Scikit-learn user guide
- 🏆 Kaggle Learn — free, project-based DS courses
- 📖 Hands-On Machine Learning — A. Géron, the gold-standard book
📰 Related reading: Learn Python for Data Science
We welcome contributions! Check CONTRIBUTING.md for guidelines. Found a typo, a bug, or a clearer way to explain something? Open an issue or a PR.
MIT — see LICENSE.
Every expert was once a beginner. The only difference is they started.
Welcome — and have fun. What will you build with your data-science skills? 🚀
Made with ❤️ for the Data Science community