Welcome to Module 3 resources. This README compiles all lecture materials, references, and external learning links related to data-centric AI, dataset handling, preprocessing, and augmentation.
- Learning from Data
- Data Organization & Leakage Prevention
- Oxford Flowers 102 Dataset
- Preprocessing & Transform Pipelines
- Data Augmentation & Robustness
Learning from Data:
- MIT CSAIL - Data-Centric AI (overview of model-centric vs data-centric workflows)
  https://datacentricai.org/
- CACM - The Principles of Data-Centric AI
  https://cacm.acm.org/
- ArXiv Survey - Data-Centric Artificial Intelligence
  https://arxiv.org/
Data Organization & Leakage Prevention:
- Scikit-learn - Cross-validation guide
  https://scikit-learn.org/stable/modules/cross_validation.html
- MachineLearningMastery - Data Leakage in Machine Learning
  https://machinelearningmastery.com/data-leakage-machine-learning/
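As a small illustration of the leakage-prevention idea these links cover, the sketch below splits the data first and only then fits normalization statistics on the training portion. It uses only the Python standard library; the helper names (`train_test_split_indices`, `fit_standardizer`, `standardize`) and the toy data are hypothetical and are not taken from the lecture materials.

```python
import random
import statistics


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and split them into disjoint train/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def fit_standardizer(values):
    """Compute mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        std = 1.0  # avoid division by zero for constant features
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted statistics to any split."""
    return [(v - mean) / std for v in values]


# Toy feature column (hypothetical data).
feature = [float(x) for x in range(100)]

# 1. Split first, so the test samples never influence any fitted statistic.
train_idx, test_idx = train_test_split_indices(len(feature))
train = [feature[i] for i in train_idx]
test = [feature[i] for i in test_idx]

# 2. Fit on the training split only, then reuse the same statistics for test.
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
```

Fitting the statistics on the full dataset before splitting would let information from the test samples reach the training pipeline, which is the kind of leakage the resources above describe.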
Oxford Flowers 102 Dataset:
- Official VGG Dataset Page
  https://www.robots.ox.ac.uk/~vgg/data/flowers/102/
- Hugging Face Dataset Card
  https://huggingface.co/datasets/oxford_flowers102
- VGG Research Group Overview
  https://www.robots.ox.ac.uk/~vgg/
Preprocessing & Transform Pipelines:
- CS231n Stanford - Data Preprocessing Notes
  https://cs231n.stanford.edu/
  https://cs231n.stanford.edu/slides/2023/lecture_7.pdf
- Sebastian Raschka - Feature Scaling & Normalization
  https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
- Voxel51 Blog - Image Preprocessing Best Practices
  https://voxel51.com/blog/image-preprocessing-best-practices-to-optimize-your-ai-workflows
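To make the idea of a transform pipeline concrete, here is a minimal sketch in plain Python that chains two common preprocessing steps: scaling 8-bit pixels to the unit range and standardizing each channel. The function names (`compose`, `to_unit_range`, `normalize`) and the channel statistics are illustrative assumptions, not the API of any particular library.

```python
def compose(*steps):
    """Chain transforms so each step receives the previous step's output."""
    def pipeline(image):
        for step in steps:
            image = step(image)
        return image
    return pipeline


def to_unit_range(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[[v / 255.0 for v in pixel] for pixel in row] for row in image]


def normalize(mean, std):
    """Build a transform that standardizes each channel with fixed statistics."""
    def apply(image):
        return [
            [[(v - m) / s for v, m, s in zip(pixel, mean, std)] for pixel in row]
            for row in image
        ]
    return apply


# Placeholder statistics (assumed values); in practice these would be
# computed on the training split only.
CHANNEL_MEAN = (0.5, 0.5, 0.5)
CHANNEL_STD = (0.5, 0.5, 0.5)

preprocess = compose(to_unit_range, normalize(CHANNEL_MEAN, CHANNEL_STD))

# A tiny 1x2 RGB image stored as rows of (R, G, B) pixels.
image = [[(0, 128, 255), (255, 255, 255)]]
result = preprocess(image)
```

Keeping each step as a separate function makes it easy to reuse the same deterministic pipeline for training, validation, and test data.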
Data Augmentation & Robustness:
- Roboflow Blog - Data Augmentation Guide
  https://blog.roboflow.com/data-augmentation/
- Stanford AI Lab blog - Automating Data Augmentation
  https://ai.stanford.edu/blog/data-augmentation/
- Albumentations Tutorial
  https://www.youtube.com/watch?v=rAdLwKJBvPM
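For a rough feel of what augmentation does, the sketch below applies a random horizontal flip followed by a random crop to a tiny grid of values standing in for an image. It is a standard-library illustration only; the function names are hypothetical, and it does not use or mirror the Albumentations API covered in the tutorial.

```python
import random


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


def random_crop(image, rng, size):
    """Cut a size x size window at a random position inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)    # randint includes both endpoints
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, rng, crop_size):
    """Apply the flip, then the crop, drawing fresh randomness each call."""
    return random_crop(random_horizontal_flip(image, rng), rng, crop_size)


rng = random.Random(42)  # seeded so a run can be reproduced

# A 4x4 single-channel "image" whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Each call yields a different 3x3 view of the same source image.
samples = [augment(image, rng, crop_size=3) for _ in range(5)]
```

Augmentations like these are normally applied to training data only, so that validation and test images keep reflecting the original distribution.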
This module focuses on:
- Data-centric AI principles
- Proper dataset handling and validation
- Image preprocessing pipelines
- Robust augmentation strategies