Oferneum/ml-logistic-regression

Logistic & Softmax Regression from Scratch

A complete implementation of binary and multi-class classification using logistic regression and softmax regression, built from scratch with NumPy. Features hyperparameter tuning, regularization, and comprehensive visualization.

🎯 Project Overview

This project demonstrates fundamental machine learning concepts by implementing classification algorithms without high-level ML libraries. Built for MNIST digit classification, it showcases:

  • Binary Classification: Logistic regression with L2 regularization
  • Multi-class Classification: Softmax regression with mini-batch gradient descent
  • Hyperparameter Tuning: Grid search over learning rates and regularization strengths
  • Numerical Stability: Custom implementations of stable sigmoid and softmax functions
  • Comprehensive Visualization: Loss curves, accuracy plots, and weight visualizations

🏗️ Technical Implementation

Core Algorithms

Binary Logistic Regression

  • Optimization: Batch gradient descent with L2 weight regularization
  • Loss Function: Binary cross-entropy with regularization term
  • Features:
    • Numerically stable sigmoid computation
    • Analytic gradients computed in closed form (see Gradient Computation), with no automatic differentiation library
    • Configurable learning rate and regularization strength

Softmax (Multinomial) Regression

  • Optimization: Mini-batch stochastic gradient descent
  • Loss Function: Categorical cross-entropy with L2 regularization
  • Features:
    • Numerically stable softmax with max subtraction trick
    • Efficient batch processing (512-sample batches)
    • Feature standardization (z-score normalization)

Key Technical Features

Numerical Stability

# Sigmoid: Avoids overflow for large negative values
sigmoid(z) = 1/(1+exp(-z))       for z ≥ 0
           = exp(z)/(1+exp(z))   for z < 0

# Softmax: Max subtraction prevents overflow
z' = z - max(z)
softmax(z) = exp(z') / sum(exp(z'))
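
The two formulas above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's `_sigmoid_stable()` or `_softmax_stable()`; the function names and the choice to treat the last axis as the class axis are assumptions made for the example.

```python
import numpy as np

def sigmoid_stable(z: np.ndarray) -> np.ndarray:
    """Piecewise sigmoid: never calls exp() on a large positive argument."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, exp(z) is below 1, so it cannot overflow either.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def softmax_stable(z: np.ndarray) -> np.ndarray:
    """Softmax over the last axis with the max-subtraction trick."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # largest entry becomes 0
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)
```

Subtracting the row maximum leaves the softmax output unchanged, because the common factor exp(-max(z)) cancels between numerator and denominator.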

Regularization

  • L2 penalty applied only to weights (not bias)
  • Prevents overfitting on training data
  • Grid search over λ ∈ [10⁻⁵, 10²]

Hyperparameter Optimization

  • Learning rates: 8 values logarithmically spaced from 10⁻⁵ to 10²
  • Regularization strengths: 8 values logarithmically spaced from 10⁻⁵ to 10²
  • Total: 64 model configurations evaluated
  • Selection criterion: Maximum validation accuracy
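
A grid of this shape could be built and searched roughly as shown here. The sketch assumes a caller-supplied `train_and_score(lr, lam)` callback that trains one model and returns its validation accuracy; that callback, `grid_search`, and the `toy_score` stand-in are hypothetical names for illustration and are not the repository's `find_best_hyperparameters()`.

```python
import itertools
import numpy as np

# 8 log-spaced values from 1e-5 to 1e2 (exponents -5, -4, ..., 2).
learning_rates = np.logspace(-5, 2, 8)
reg_strengths = np.logspace(-5, 2, 8)

def grid_search(train_and_score, lrs, lams):
    """Return (best_accuracy, best_lr, best_lam) over all lr/lam pairs."""
    best = (-np.inf, None, None)
    for lr, lam in itertools.product(lrs, lams):
        acc = train_and_score(lr, lam)  # validation accuracy for this pair
        if acc > best[0]:
            best = (acc, lr, lam)
    return best

def toy_score(lr, lam):
    """Stand-in scorer (NOT a real model): peaks at lr=1e-1, lam=1e-3."""
    return 1.0 / (1.0 + abs(np.log10(lr) + 1) + abs(np.log10(lam) + 3))

best_acc, best_lr, best_lam = grid_search(toy_score, learning_rates, reg_strengths)
```

With 8 values on each axis, `itertools.product` visits 8 × 8 = 64 configurations, matching the total above.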

📊 Project Components

Part A: Data Visualization

Visualizes sample MNIST images to understand the dataset structure and labeling.

Part B: Binary Logistic Regression

Trains a binary classifier (e.g., 0 vs 1) with:

  • 1000 epochs of batch gradient descent
  • Learning rate: 0.1
  • No regularization (λ=0)
  • Plots: Loss and accuracy curves over training
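
A training loop with these settings might look like the sketch below. It assumes zero-initialized parameters and a feature matrix `X` of shape (n, d) with labels `y` in {0, 1}; `train_logistic` is a hypothetical name and the repository's `LogisticRegression` class may be organized differently.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, lam=0.0, epochs=1000):
    """Full-batch gradient descent for binary logistic regression."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    losses = []
    for _ in range(epochs):
        z = X @ w + b
        # sigmoid(z) written as 0.5 * (1 + tanh(z / 2)), which does not overflow
        p = 0.5 * (1.0 + np.tanh(0.5 * z))
        # cross-entropy in the form log(1 + exp(z)) - y*z, plus the L2 penalty
        losses.append(np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w))
        err = p - y
        w -= lr * (X.T @ err / n + lam * w)  # penalty applies to weights only
        b -= lr * err.mean()                 # bias is not regularized
    return w, b, losses
```

The returned `losses` list holds one value per epoch, which is the kind of series a loss curve would plot.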

Part C: Hyperparameter Tuning

Comprehensive grid search to find optimal hyperparameters:

  • Tests 64 combinations of learning rate and regularization
  • Visualizes performance across hyperparameter space
  • Analyzes weight patterns under different regularization strengths
  • Reports final test set accuracy

Bonus A: Robustness Analysis

Tests model behavior on random noise inputs:

  • Generates 10,000 random images from U[0,1]
  • Analyzes output probability distribution
  • Evaluates model's confidence on out-of-distribution data
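
The noise experiment can be outlined as below. The weights here are random placeholders standing in for a trained model, so the resulting histogram says nothing about the actual results; the flattened 784-pixel input shape is an assumption based on 28×28 MNIST images.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 flattened 28x28 "images" with pixels drawn from U[0, 1).
noise = rng.uniform(0.0, 1.0, size=(10_000, 784))

# Placeholder parameters (assumption): substitute the trained w and b here.
w = rng.normal(0.0, 0.05, size=784)
b = 0.0

z = noise @ w + b
probs = 0.5 * (1.0 + np.tanh(0.5 * z))       # sigmoid(z), overflow-free form
confidence = np.maximum(probs, 1.0 - probs)  # probability of the predicted class

# Distribution of predicted probabilities across ten equal-width bins.
hist, edges = np.histogram(probs, bins=10, range=(0.0, 1.0))
```

A histogram concentrated near 0 or 1 would indicate that the model is confident on inputs that look nothing like digits.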

Bonus B: Multi-class Classification

Extends to full 10-class MNIST:

  • Softmax regression for digits 0-9
  • Mini-batch gradient descent (batch size: 512)
  • 20 epochs with carefully tuned hyperparameters
  • Feature standardization for improved convergence
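
Standardization and mini-batching of this kind could be sketched as follows. The helper names, the `eps` guard, and the per-epoch shuffle are assumptions for illustration; the repository's `SoftmaxRegression` class may handle these steps differently.

```python
import numpy as np

def standardize(X_train, X_other, eps=1e-8):
    """Z-score both sets using statistics from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # eps keeps the division finite for features that never vary,
    # such as pixels that are constant across the training set.
    return (X_train - mean) / (std + eps), (X_other - mean) / (std + eps)

def iterate_minibatches(X, y, batch_size=512, rng=None):
    """Yield shuffled (X_batch, y_batch) pairs covering the data once."""
    if rng is None:
        rng = np.random.default_rng()
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]
```

Computing the mean and standard deviation on the training set alone keeps validation and test data from influencing the preprocessing.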

🚀 Getting Started

Prerequisites

pip install numpy matplotlib

Data Files Required

Place these files in the same directory as the script:

  • binary_class.npz - Binary classification data (Parts A-C, Bonus A)
  • bonus.npz - Full 10-class MNIST data (Bonus B) [optional]

Running the Project

python Logistic_softmax_regression.py

The script will automatically:

  1. Visualize sample training images
  2. Train binary logistic regression
  3. Perform hyperparameter tuning
  4. Test on random noise (Bonus A)
  5. Train softmax classifier if bonus.npz exists (Bonus B)

📈 Results & Visualizations

Model Convergence

The implementation records loss and accuracy at each of the 1000 epochs. As seen below, cross-entropy loss falls over training while accuracy on the validation set remains high.

[Figure: Model Training Results]

🧠 Weight Analysis & Regularization

By visualizing the weights as 28×28 images, we can observe exactly what the model has learned. This gallery demonstrates the effect of the L2 penalty (λ) on the model's internal representations.

| No Regularization ($\lambda=0$) | Optimal Weights ($\lambda=\text{best}$) | Strong Regularization ($\lambda=1$) |
| --- | --- | --- |
| [Image: No Reg] | [Image: Optimal] | [Image: Strong Reg] |

Technical Insights:

  • Under-regularized ($\lambda=0$): Weights are noisy and capture high-frequency variance from the training data, leading to potential overfitting.
  • Optimal $\lambda$: The "ghost" of the digit is clearly visible, showing that the model is focusing on the core spatial features.
  • Over-regularized ($\lambda=1$): The penalty is too high, dampening the weights and washing out the structural details of the digit.

🔍 Hyperparameter Optimization

A grid search was performed over 64 combinations of learning rates and regularization strengths. The model automatically selects the configuration with the highest validation accuracy for final testing.

🔬 Key Learnings

Machine Learning Concepts

  • Gradient Descent: Implemented batch and mini-batch variants
  • Regularization: Practical experience with L2 penalty for preventing overfitting
  • Cross-Entropy Loss: Understanding and implementation for classification
  • Softmax Function: Multi-class probability distributions

Numerical Computing

  • Stability: Techniques to avoid overflow/underflow in exponentials
  • Vectorization: Efficient NumPy operations for matrix computations
  • Batch Processing: Memory-efficient training on large datasets

Model Evaluation

  • Train/Validation/Test Split: Proper evaluation methodology
  • Hyperparameter Search: Systematic grid search approach
  • Overfitting Detection: Monitoring train vs validation performance

💡 Implementation Highlights

Custom Components

  • LogisticRegression class: Full binary classification pipeline
  • SoftmaxRegression class: Multi-class classification with mini-batching
  • Stable activation functions: _sigmoid_stable(), _softmax_stable()
  • Hyperparameter search: find_best_hyperparameters()

Code Quality

  • Type hints throughout (Python 3.10+)
  • Dataclass for configuration (SoftmaxConfig)
  • Modular design with clear separation of concerns
  • Comprehensive documentation and comments

📝 Technical Details

Gradient Computation

Binary logistic regression gradients:

∂L/∂w = (1/n) Xᵀ(σ(Xw+b) - y) + λw
∂L/∂b = (1/n) Σ(σ(Xw+b) - y)
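
These two expressions translate almost directly into NumPy. The sketch below assumes `X` has shape (n, d), `y` holds 0/1 labels, and `w` is a length-d vector; `binary_gradients` is a hypothetical helper name.

```python
import numpy as np

def binary_gradients(X, y, w, b, lam):
    """Gradients of the L2-regularized binary cross-entropy."""
    n = X.shape[0]
    z = X @ w + b
    p = 0.5 * (1.0 + np.tanh(0.5 * z))  # sigmoid(z), overflow-free form
    err = p - y                         # sigma(Xw + b) - y
    grad_w = X.T @ err / n + lam * w    # L2 term applies to weights only
    grad_b = err.mean()                 # bias gradient has no penalty term
    return grad_w, grad_b
```

For example, with `w = 0`, `b = 0`, `X` equal to the 2×2 identity and `y = [1, 0]`, every prediction is 0.5, so the error vector is `[-0.5, 0.5]`, `grad_w` is `[-0.25, 0.25]`, and `grad_b` is 0.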

Softmax regression gradients:

∂L/∂W = (1/n) Xᵀ(P - Y_onehot) + λW
∂L/∂b = (1/n) Σ(P - Y_onehot)
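
The softmax gradients can be written in the same vectorized style. This sketch assumes integer class labels `y`, a weight matrix `W` of shape (d, K), and a bias vector `b` of length K; `softmax_gradients` is a hypothetical helper name.

```python
import numpy as np

def softmax_gradients(X, y, W, b, lam, num_classes):
    """Gradients of the L2-regularized categorical cross-entropy."""
    n = X.shape[0]
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # max-subtraction trick
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)            # row-wise probabilities
    Y_onehot = np.eye(num_classes)[y]            # shape (n, K)
    diff = P - Y_onehot
    grad_W = X.T @ diff / n + lam * W            # penalty on weights only
    grad_b = diff.mean(axis=0)                   # (1/n) * sum over samples
    return grad_W, grad_b
```

Each row of `P - Y_onehot` sums to zero, so the entries of `grad_b` always sum to zero as well, which makes a quick sanity check.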

Loss Functions

Binary cross-entropy with L2 regularization:

L = -(1/n) Σ[y log(σ(z)) + (1-y) log(1-σ(z))] + (λ/2)||w||²

Categorical cross-entropy with L2 regularization:

L = -(1/n) Σ log(P[i, y[i]]) + (λ/2)||W||²_F
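
Both losses can be evaluated without ever forming log(0). The sketch below uses two standard rewrites: the binary term becomes log(1 + exp(z)) - y·z, and log P[i, y[i]] becomes the shifted logit minus a log-sum-exp. The function names are hypothetical, and whether the repository uses these exact forms is not stated in this README.

```python
import numpy as np

def binary_cross_entropy(X, y, w, b, lam):
    """Binary cross-entropy plus (lam/2)*||w||^2, via logaddexp."""
    z = X @ w + b
    # -[y log s + (1-y) log(1-s)] with s = sigmoid(z) equals log(1+e^z) - y*z
    data_loss = np.mean(np.logaddexp(0.0, z) - y * z)
    return data_loss + 0.5 * lam * np.sum(w ** 2)

def categorical_cross_entropy(X, y, W, b, lam):
    """Categorical cross-entropy plus (lam/2)*||W||_F^2, via log-sum-exp."""
    logits = X @ W + b
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    data_loss = -log_probs[np.arange(len(y)), y].mean()
    return data_loss + 0.5 * lam * np.sum(W ** 2)
```

With all parameters at zero, the binary loss equals log 2 and the K-class loss equals log K, which gives a convenient check on the starting value of a loss curve.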

🎓 Educational Value

This project demonstrates:

  • Building ML algorithms from mathematical foundations
  • Understanding the internals of popular libraries (scikit-learn, TensorFlow)
  • Debugging numerical issues in optimization
  • Practical hyperparameter tuning strategies
  • Proper experimental methodology (train/val/test splits)

🔧 Technologies Used

  • Python 3.10+: Modern Python with type hints
  • NumPy: Efficient numerical computing
  • Matplotlib: Comprehensive visualization
  • Dataclasses: Clean configuration management

📚 Future Enhancements

Potential extensions:

  • Implement momentum and Adam optimizers
  • Add learning rate scheduling
  • Multi-layer neural networks
  • Data augmentation techniques
  • K-fold cross-validation
  • Confusion matrix analysis
  • ROC curves and AUC metrics

Note: This implementation was developed for educational purposes to demonstrate understanding of fundamental machine learning algorithms and numerical optimization techniques.
