This repository serves as a comprehensive portfolio for Machine Learning (ML) and Deep Learning (DL) projects, completed as part of the Codecademy AI/ML Engineering certification. It encapsulates a wide array of fundamental and advanced concepts, from traditional supervised and unsupervised learning to modern neural network architectures.
The projects are implemented primarily in Python using Jupyter Notebooks, leveraging popular libraries such as scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch. Each folder contains project code and analysis relevant to the topic.
Codecademy Certification
The projects were completed for the Codecademy AI/ML Engineering + Data Science: Machine Learning Specialist Certification. The work in these folders demonstrates proficiency in:
- Machine Learning Fundamentals: Supervised (Regression and Classification), Unsupervised (Clustering), and Ensemble methods.
- Deep Learning: Implementing neural networks using TensorFlow/Keras and PyTorch.
- Data Science Workflow: Exploratory Data Analysis (EDA), Data Visualization, Feature Engineering, and Model Selection/Tuning.
- Core Libraries: Extensive use of scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch.
The structured organization reflects the curriculum's progression, moving from foundational statistics and data visualization through traditional machine learning algorithms to advanced deep learning architectures and deployment concepts.
Repository Structure & Project Details
Boosting_Ensemble
| Project | Description |
| --- | --- |
| Boosting | Predict whether or not a person makes more than $50,000 using census data, demonstrating the power of sequential ensemble methods. |
Classification_Project
A collection of diverse classification tasks:
| Project | Algorithm | Description |
| --- | --- | --- |
| Classifying Tweets | Naive Bayes Classifier | Uses a Naive Bayes Classifier to find patterns in real tweets and predict their origin (New York, London, or Paris). |
| Classifying Viral Tweets | K-Nearest Neighbor (KNN) | Employs the K-Nearest Neighbor algorithm to predict whether a tweet will go viral based on its features. |
Data_Science
End-to-end data analysis and model building projects:
| Project | Description |
| --- | --- |
| Bio Diversity | Analyze biodiversity data from the National Park Service, focusing on the species observed across different national park locations. |
| OkCupid | End-to-end project covering scoping, data preparation, analysis, and building a machine learning model using data from the OkCupid online dating application. |
Data_Visualization
Projects focused on creating informative and clear visualizations for data exploration:
| Project | Type | Focus |
| --- | --- | --- |
| Categorical Data | EDA | Visual exploration of Mushroom datasets. |
| EDA | EDA | Airline Analysis data investigation. |
| Line Graph | Time Series | Tracking Online Lime Sales over time. |
| Portfolio Project | Multiple Visuals | Analyzing the relationship between Life Expectancy and GDP. |
Decision_Trees
| Project | Description |
| --- | --- |
| Find the flag! | Use Decision Trees to predict the continent of flags based on various features (colors, shapes, etc.), and explore feature importance. |
Deep_Learning_TensorFlow
Advanced projects using TensorFlow/Keras for Deep Learning tasks:
| Type | Project | Description |
| --- | --- | --- |
| Classification | Galaxies | Classifying different types of Galaxies using Convolutional Neural Networks (CNNs). |
| Classification | Heart Failure | Predict the survival of patients with heart failure. |
| Classification | X-Rays | Analyzing Lung Scans (X-Rays) to predict pneumonia, Covid-19, or no illness. |
| Regression | Chances of Admission | Predicting a student's chances of admission to a university. |
Exploratory_Data_Analysis
Detailed projects on initial data investigation and cleaning:
| Project | Focus |
| --- | --- |
| Diabetes | Analyzing health and risk factors associated with diabetes. |
| NBA Trends | Investigating trends and statistics within the National Basketball Association. |
| Stackoverflow | Exploring developer survey data from Stack Overflow. |
| Students | Analyzing student performance and demographic data. |
Feature_Engineering
Projects focused on transforming raw data into features that best represent the underlying problem:
| Method | Project | Description |
| --- | --- | --- |
| Filter Method | Customer Reviews | Applying filter methods (e.g., statistical tests) to select relevant features from a dataset of customer reviews on a clothing brand. |
| Wrapper Method | Obesity on lifestyle | Implementing wrapper methods (e.g., Recursive Feature Elimination) to determine the best subset of lifestyle factors for predicting obesity. |
Hyperparameter_Tuning
| Project | Description |
| --- | --- |
| Classify Raisins | Classifying different types of raisins (Kecimen and Besni) by implementing and comparing two tuning techniques: Grid Search for a Decision Tree Classifier and Random Search for a Logistic Regression Classifier. |
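The two tuning techniques named above can be compared in a few lines of scikit-learn. The sketch below is illustrative only: it uses a synthetic dataset from `make_classification` as a stand-in for the raisin data, and the parameter ranges are assumptions rather than the values used in the project.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (assumption: stands in for the raisin features).
X, y = make_classification(n_samples=300, n_features=7, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search: exhaustively tries every combination in the grid.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6], "min_samples_split": [2, 4, 8]},
    cv=5,
)
grid.fit(X_train, y_train)

# Random search: samples a fixed number of candidates from a distribution.
random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=8,
    cv=5,
    random_state=0,
)
random_search.fit(X_train, y_train)

print("Decision tree:", grid.best_params_, grid.score(X_test, y_test))
print("Logistic regression:", random_search.best_params_, random_search.score(X_test, y_test))
```

Grid search cost grows with the product of the grid sizes, while random search cost is fixed by `n_iter`, which is one reason the latter is often preferred for continuous hyperparameters such as `C`.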
K_Means_Clustering
| Project | Algorithm | Description |
| --- | --- | --- |
| Handwriting Recognition | K-Means Clustering | Using the unsupervised K-Means algorithm to cluster and recognize patterns in handwriting data. |
K_Nearest_Neighbors
| Project | Algorithm | Description |
| --- | --- | --- |
| Breast Cancer Classifier | K-Nearest Neighbor (KNN) | Building a model to classify and predict the diagnosis of breast cancer based on medical features. |
Linear_Regression
Projects demonstrating the fundamental Linear Regression model:
| Implementation | Description |
| --- | --- |
| Scratch | Implementation of traditional Linear Regression from scratch, to expose the underlying mathematics. |
| Sklearn | Using the scikit-learn library for an efficient implementation of Linear Regression. |
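As a rough idea of what a from-scratch implementation involves, the sketch below fits a line by gradient descent on the mean squared error. It is not the repository's code: the function name `fit_line`, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.01, iterations=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors for the current parameters.
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2 / n) * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

With scikit-learn, the same fit is a call to `LinearRegression().fit(...)`, which solves the least-squares problem directly instead of iterating.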
Logistic_Regression
Projects on binary and multi-class classification using Logistic Regression:
| Project | Description |
| --- | --- |
| Credit Card Fraud | Building a Logistic Regression model to detect and classify instances of credit card fraud. |
| Income Classification | Classifying individuals based on demographic data to predict their income bracket (e.g., $50K+). |
ML_Pipeline
| Project | Description |
| --- | --- |
| Classification Model | Creating a complete Machine Learning Pipeline to build a classification model for diagnosing hematologic diseases in pediatric patients. |
Multiple_Linear_Regression
Projects extending Linear Regression to multiple predictor variables:
| Project | Description |
| --- | --- |
| Tennis Ace | Predicting the outcome (e.g., score, ranking) for a tennis player based on multiple playing habits and statistics. |
| Yelp Regression | Investigating factors that most affect a restaurant's Yelp rating and building a model to predict the rating. |
Naive_Bayes_Classifier
| Project | Algorithm | Description |
| --- | --- | --- |
| Email Similarity | Naive Bayes Classifier | Implementing the Naive Bayes Classifier to measure and classify email similarity based on content. |
Neural_Networks
| Project | Description |
| --- | --- |
| Life Expectancy | Using TensorFlow/Keras to build a Neural Network model to predict the life expectancy of countries based on socio-economic and health factors. |
Perceptrons
| Project | Description |
| --- | --- |
| Logic Gates | Modeling the fundamental building blocks of computers, logic gates (AND, OR, and XOR), using simple Perceptrons. |
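A single perceptron draws one linear decision boundary, so it can learn AND and OR but cannot represent XOR, which is not linearly separable. The sketch below illustrates this with the classic perceptron learning rule in plain Python. The helper names (`train_perceptron`, `gate_accuracy`) are hypothetical and not taken from the project.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train one perceptron with the classic perceptron learning rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(samples, labels):
            prediction = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            error = target - prediction  # -1, 0, or +1
            weights[0] += lr * error * x1
            weights[1] += lr * error * x2
            bias += lr * error
    return weights, bias


def predict(weights, bias, sample):
    x1, x2 = sample
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
GATES = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}


def gate_accuracy(name, epochs=20):
    """Fraction of the four truth-table rows the trained perceptron gets right."""
    labels = GATES[name]
    weights, bias = train_perceptron(INPUTS, labels, epochs)
    correct = sum(predict(weights, bias, x) == y for x, y in zip(INPUTS, labels))
    return correct / len(INPUTS)


for gate in GATES:
    print(gate, gate_accuracy(gate))
```

Representing XOR requires combining more than one unit, for example a small multi-layer network, which is the usual motivation for moving from perceptrons to neural networks.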
Principal_Component_Analysis
| File | Description |
| --- | --- |
| script_1.py | Classification task using PCA on the Telescope dataset to classify particles into gamma (signal) or hadrons (background). |
| script_2.py | Standalone implementation of the PCA algorithm for dimensionality reduction. |
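For orientation, a standalone PCA can be written in a few lines of NumPy: center the data, eigendecompose the covariance matrix, and project onto the leading eigenvectors. The sketch below is an assumption about the general approach, not a copy of `script_2.py`, and it runs on synthetic data rather than the Telescope dataset.

```python
import numpy as np


def pca(data, n_components):
    """Project data onto its top principal components via the covariance matrix."""
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigh is suited to symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]
    explained_ratio = eigenvalues[order[:n_components]] / eigenvalues.sum()
    return centered @ components, components, explained_ratio


# Synthetic data: two strongly correlated columns plus one low-variance noise column.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.1, size=200),
])

projected, components, explained_ratio = pca(data, n_components=2)
print(projected.shape, explained_ratio)
```

Because the first two columns are nearly collinear, almost all of the variance falls on the first component, which is the situation where dimensionality reduction loses little information.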
PyTorch
Projects leveraging the PyTorch deep learning framework:
| Project | Description |
| --- | --- |
| EV_Charging | Using Neural Networks built in PyTorch for predicting Residential EV Charging Loads. |
| Hotel_Cancellation | Building a PyTorch model for predicting Hotel Booking Cancellations. |
Random_Forests
| Project | Description |
| --- | --- |
| Census Data | Using the Random Forest ensemble method to predict whether or not a person makes more than $50,000 using census data. |
Recommender_System
| Project | Description |
| --- | --- |
| Book Recommender System | Building a system that suggests books to users based on collaborative filtering or content-based methods. |
Regularization
| Project | Description |
| --- | --- |
| Predict Wine Quality | Applying Regularization techniques (L1/L2) to a regression model to improve generalization and predict Wine Quality. |
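To make the effect of an L2 penalty concrete, the sketch below solves ridge regression in closed form with NumPy and compares the coefficient norm with and without the penalty. It covers only the L2 case, uses synthetic data instead of the wine dataset, and the names `ridge_fit` and `alpha` are illustrative assumptions.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic regression problem (hypothetical weights, Gaussian noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_weights = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_weights + rng.normal(scale=0.5, size=50)

weights_unregularized = ridge_fit(X, y, alpha=0.0)
weights_ridge = ridge_fit(X, y, alpha=10.0)

print("No penalty :", np.round(weights_unregularized, 3))
print("alpha = 10 :", np.round(weights_ridge, 3))
```

The L2 penalty shrinks all coefficients toward zero without eliminating any. An L1 penalty (lasso) has no closed form but can set some coefficients exactly to zero, which is why the two are often compared side by side.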
Statistics
Foundational projects covering core statistical concepts for data science:
In total, the repository holds 46 projects spanning Data Science, AI, Machine Learning, and Deep Learning skills, including EDA, Data Visualization, and traditional ML fundamentals (Regression, Classification, Clustering, Ensemble methods), built with TensorFlow/Keras, PyTorch, scikit-learn, pandas, NumPy, and more, and implemented in Python scripts and Jupyter Notebooks.