🧠 Unsupervised Machine Learning Portfolio

A collection of unsupervised machine learning projects built with Python and Scikit-learn, exploring clustering, dimensionality reduction, probabilistic modelling, and recommendation systems on real-world datasets.

📁 Projects Overview

Folder	Technique	Description
`kmean-hierarchical_clustering`	K-Means + Hierarchical	Segment data into meaningful groups
`gaussian_mixture`	Gaussian Mixture Models	Probabilistic soft-assignment clustering
`principal_component_analysis`	PCA	Reduce high-dimensional data while preserving variance
`dimensional_reduction`	t-SNE / UMAP	Non-linear dimensionality reduction for visualisation
`recommendation_system`	Collaborative Filtering	Build a personalised item recommendation engine
`clustering_geospatial`	K-Means + GeoPandas	Cluster world countries by GDP and population on a map

🗂️ Project Details

1. 📍 Clustering & Geospatial Analysis

Applied K-Means clustering to country-level data using GDP and population as features, then visualised the resulting clusters as choropleth maps and multidimensional scatter overlays.

Key techniques:

K-Means clustering on economic indicators
Geospatial visualisation with GeoPandas
Multidimensional plotting — colour (GDP), size (population), shape (continent)
Coordinate Reference System (CRS) handling

Dataset: Natural Earth Low Resolution — country geometries, GDP, population, continent
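
A minimal sketch of the clustering step, not the notebook's exact code. The six population and GDP values below are made up for illustration (they are not Natural Earth figures), and the log transform and `n_clusters=3` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative values only (NOT real statistics): population and GDP in millions of USD.
population = np.array([1.2e6, 3.5e6, 5.0e7, 8.0e7, 1.3e9, 1.4e9])
gdp_musd = np.array([2.0e3, 9.0e3, 1.5e6, 3.0e6, 1.2e7, 2.0e7])

# Both features span several orders of magnitude, so take logs first,
# then standardise so neither feature dominates the Euclidean distance.
features = np.column_stack([np.log10(population), np.log10(gdp_musd)])
scaled = StandardScaler().fit_transform(features)

# n_clusters=3 is an assumption for this toy data, not a tuned value.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)
print(labels)
```

With real data, the labels could be stored in a new column of the GeoDataFrame and drawn with `GeoDataFrame.plot(column=..., categorical=True, legend=True)` to get a cluster map.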

2. 📊 K-Means & Hierarchical Clustering

Compared flat (K-Means) and hierarchical (agglomerative) clustering approaches on structured datasets. Used the elbow method and silhouette scores to choose the number of clusters, and dendrograms to visualise merge distances.

Key techniques:

K-Means with elbow method for optimal k
Agglomerative hierarchical clustering
Dendrogram visualisation
Silhouette score evaluation
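
The techniques listed above can be sketched roughly as follows. The data is synthetic (`make_blobs`), and the range of k, the Ward linkage, and picking k by the highest silhouette score are assumptions for the example rather than the notebook's settings.

```python
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

inertias = {}     # within-cluster sum of squares, plotted for the elbow method
silhouettes = {}  # mean silhouette score per k
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# Choose the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

# Agglomerative clustering with the same number of clusters, for comparison.
agglo_labels = AgglomerativeClustering(n_clusters=best_k, linkage="ward").fit_predict(X)

# Merge history for a dendrogram: one row per merge, merge distance in column 2.
merge_matrix = linkage(X, method="ward")
```

Plotting `inertias` against k gives the elbow curve, and passing `merge_matrix` to `scipy.cluster.hierarchy.dendrogram` draws the tree.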

3. 🌊 Gaussian Mixture Models

Modelled data distributions using probabilistic Gaussian Mixture Models (GMM), allowing soft cluster assignments where each data point has a probability of belonging to each cluster.

Key techniques:

Expectation-Maximisation (EM) algorithm
Soft vs hard cluster assignment
BIC/AIC model selection
Comparison with K-Means hard clustering
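
A small sketch of BIC-based model selection and soft assignment with scikit-learn's `GaussianMixture`. The synthetic data, the candidate range of 1 to 6 components, and the `full` covariance type are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data: three blobs with different spreads.
X, _ = make_blobs(n_samples=400, centers=3, cluster_std=[0.5, 1.0, 1.5], random_state=0)

# Fit one model per candidate component count and record its BIC (lower is better).
bic = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
    bic[k] = gmm.bic(X)
best_k = min(bic, key=bic.get)

best = GaussianMixture(n_components=best_k, covariance_type="full", random_state=0).fit(X)
soft = best.predict_proba(X)  # soft assignment: one probability per component
hard = best.predict(X)        # hard assignment: the most probable component

print(best_k, soft[0].round(3), hard[0])
```

Each row of `soft` sums to 1, and the hard label is simply the column with the largest probability. This is the main contrast with K-Means, which only returns the hard label.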

4. 📉 Principal Component Analysis (PCA)

Applied PCA to reduce high-dimensional datasets to 2–3 components while retaining as much variance as possible, enabling faster modelling and cleaner visualisations.

Key techniques:

Explained variance ratio analysis
Scree plots for component selection
2D and 3D PCA projections
PCA as preprocessing for downstream ML
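
As a rough illustration of component selection by explained variance, here is a sketch on scikit-learn's bundled Iris data. The dataset and the 95% variance threshold are assumptions for the example, not necessarily what the notebook uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# PCA is scale-sensitive, so standardise the features first.
X_scaled = StandardScaler().fit_transform(X)

# Fit all components and accumulate the explained variance ratio.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components that reaches the (assumed) 95% threshold.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

# 2D projection for plotting.
X_2d = PCA(n_components=2).fit_transform(X_scaled)
print(n_components, cumulative.round(3))
```

Plotting `pca.explained_variance_ratio_` against the component index gives the scree plot.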

5. 🔍 Dimensionality Reduction (t-SNE / UMAP)

Used non-linear dimensionality reduction techniques to reveal hidden structure in high-dimensional data that a linear method such as PCA cannot capture.

Key techniques:

t-SNE for cluster visualisation
UMAP for faster non-linear projection
Comparison of linear vs non-linear reduction
Hyperparameter tuning (perplexity, n_neighbors)
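
A sketch of perplexity tuning with scikit-learn's `TSNE`, using the first 300 samples of the bundled digits dataset to keep the run short. The dataset, the subset size, and the two perplexity values are assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small subset of the 64-dimensional digits data so the example runs quickly.
X = load_digits().data[:300]

# Perplexity roughly controls how many neighbours each point attends to;
# it must be smaller than the number of samples.
embeddings = {}
for perplexity in (5, 30):
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)

print({p: e.shape for p, e in embeddings.items()})
```

UMAP is not part of scikit-learn. Assuming the separate `umap-learn` package is installed, `umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(X)` plays the same role, with `n_neighbors` taking the place of perplexity.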

6. 🎯 Recommendation System

Built a collaborative filtering recommendation engine that suggests items to users based on patterns in historical interactions, the same family of techniques popularised by services such as Netflix and Spotify.

Key techniques:

User-based and item-based collaborative filtering
Cosine similarity for user/item matching
Matrix factorisation concepts
Evaluation with precision and recall
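
A minimal item-based collaborative filtering sketch in plain NumPy. The rating matrix is a toy example, and `cosine_similarity_matrix` and `recommend` are hypothetical helper names, not functions from the notebook.

```python
import numpy as np

# Toy user-item rating matrix: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
    [5, 0, 0, 0],
], dtype=float)


def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = m / norms
    return unit @ unit.T


# Item-item similarity: compare the columns (items) of the rating matrix.
item_sim = cosine_similarity_matrix(ratings.T)


def recommend(user, top_n=1):
    """Rank unrated items by the similarity-weighted sum of the user's ratings."""
    user_ratings = ratings[user]
    rated = user_ratings > 0
    scores = item_sim[:, rated] @ user_ratings[rated]
    scores[rated] = -np.inf  # never recommend something already rated
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]


print(recommend(4))  # user 4 has only rated item 0
```

User 4 has rated only item 0, so the recommendation is the item whose rating column is most similar to item 0's. User-based filtering works the same way with `cosine_similarity_matrix(ratings)` instead of its transpose.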

🛠️ Tech Stack

Library	Purpose
`scikit-learn`	Clustering, PCA, GMM, model evaluation
`pandas`	Data loading, manipulation
`numpy`	Numerical computation
`matplotlib`	All visualisations and plots
`seaborn`	Statistical visualisation
`geopandas`	Geospatial data handling and mapping
`shapely`	Geometric operations

🚀 Getting Started

1. Clone the repository

git clone https://github.com/Lami14/alx_data_science.git
cd alx_data_science

2. Install dependencies

pip install scikit-learn pandas numpy matplotlib seaborn geopandas shapely umap-learn notebook

3. Open any project

cd kmean-hierarchical_clustering
jupyter notebook

Each folder contains its own notebook with inline explanations and outputs.

💡 What I Learned

How unsupervised learning finds hidden structure in unlabelled data
When to choose K-Means vs GMM vs hierarchical clustering for a given problem
How PCA and t-SNE complement each other: PCA for fast linear compression and preprocessing, t-SNE for visualising local cluster structure in 2D
How real-world recommendation systems work under the hood using similarity metrics
How to represent economic and demographic data visually on geographic maps

🔮 Future Improvements

Add evaluation metrics (silhouette score, Davies-Bouldin index) across all projects
Deploy the recommendation system as a Flask API
Add an interactive map dashboard using Folium or Plotly
Extend geospatial analysis with South African provincial data
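
For the first item above, a shared helper might look something like this. `clustering_report` is a hypothetical name, and the synthetic data and K-Means settings are placeholders for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score


def clustering_report(X, labels):
    """Return internal clustering metrics for one labelling of X."""
    return {
        "silhouette": float(silhouette_score(X, labels)),          # higher is better, range [-1, 1]
        "davies_bouldin": float(davies_bouldin_score(X, labels)),  # lower is better, >= 0
    }


# Placeholder data and model, only to show the helper in use.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
report = clustering_report(X, labels)
print(report)
```

Both metrics need only the features and the predicted labels, so the same helper could be called on K-Means, agglomerative, and GMM outputs alike.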

📄 License

MIT License — feel free to fork and build on this work.

Built by Lamla · Unsupervised ML Portfolio · ALX Africa Data Science Programme 🇿🇦
