A complete implementation of binary and multi-class classification using logistic regression and softmax regression, built from scratch with NumPy. Features hyperparameter tuning, regularization, and comprehensive visualization.
This project demonstrates fundamental machine learning concepts through implementation of classification algorithms without using high-level ML libraries. Built for MNIST digit classification, it showcases:
- Binary Classification: Logistic regression with L2 regularization
- Multi-class Classification: Softmax regression with mini-batch gradient descent
- Hyperparameter Tuning: Grid search over learning rates and regularization strengths
- Numerical Stability: Custom implementations of stable sigmoid and softmax functions
- Comprehensive Visualization: Loss curves, accuracy plots, and weight visualizations
Binary Logistic Regression
- Optimization: Batch gradient descent with L2 weight regularization
- Loss Function: Binary cross-entropy with regularization term
- Features:
  - Numerically stable sigmoid computation
  - Gradient calculation with automatic differentiation-style updates
  - Configurable learning rate and regularization strength
Softmax Regression
- Optimization: Mini-batch stochastic gradient descent
- Loss Function: Categorical cross-entropy with L2 regularization
- Features:
  - Numerically stable softmax with max subtraction trick
  - Efficient batch processing (512-sample batches)
  - Feature standardization (z-score normalization)
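The two stable activations named in the feature lists above might look like the following minimal sketch. The function names here are illustrative and are not claimed to match the project's own `_sigmoid_stable()` and `_softmax_stable()`; the softmax is assumed to operate row-wise on a 2-D array of logits.

```python
import numpy as np


def sigmoid_stable(z: np.ndarray) -> np.ndarray:
    """Piecewise sigmoid that only ever exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, exp(z) is below 1, so it cannot overflow either.
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out


def softmax_stable(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the per-row maximum subtracted first."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=1, keepdims=True)  # largest entry becomes 0
    exp_shifted = np.exp(shifted)
    return exp_shifted / exp_shifted.sum(axis=1, keepdims=True)


probs = sigmoid_stable(np.array([-1000.0, 0.0, 1000.0]))
rows = softmax_stable(np.array([[1000.0, 1000.0], [0.0, np.log(3.0)]]))
```

Subtracting the row maximum leaves the softmax output unchanged because the common factor cancels between numerator and denominator.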
Numerical Stability
# Sigmoid: Avoids overflow for large negative values
sigmoid(z) = 1/(1+exp(-z)) for z ≥ 0
= exp(z)/(1+exp(z)) for z < 0
# Softmax: Max subtraction prevents overflow
z' = z - max(z)
softmax(z) = exp(z') / sum(exp(z'))
Regularization
- L2 penalty applied only to weights (not bias)
- Prevents overfitting on training data
- Grid search over λ ∈ [10⁻⁵, 10²]
Hyperparameter Optimization
- Learning rates: 8 values logarithmically spaced from 10⁻⁵ to 10²
- Regularization strengths: 8 values logarithmically spaced from 10⁻⁵ to 10²
- Total: 64 model configurations evaluated
- Selection criterion: Maximum validation accuracy
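The search described above can be sketched as follows. This is an illustration under assumptions, not the project's `find_best_hyperparameters()`: the `evaluate` callable and the toy scoring function are hypothetical stand-ins for "train a model with these settings and return its validation accuracy".

```python
import itertools
from typing import Callable

import numpy as np


def grid_search(
    evaluate: Callable[[float, float], float],
    learning_rates: np.ndarray,
    reg_strengths: np.ndarray,
) -> tuple[float, float, float]:
    """Return (best_lr, best_lambda, best_val_accuracy) over the full grid."""
    best = (float("nan"), float("nan"), -np.inf)
    for lr, lam in itertools.product(learning_rates, reg_strengths):
        val_acc = evaluate(float(lr), float(lam))
        if val_acc > best[2]:  # keep the configuration with the highest accuracy
            best = (float(lr), float(lam), float(val_acc))
    return best


# 8 values each, logarithmically spaced from 1e-5 to 1e2 (64 pairs in total).
learning_rates = np.logspace(-5, 2, 8)
reg_strengths = np.logspace(-5, 2, 8)


def toy_validation_accuracy(lr: float, lam: float) -> float:
    """Hypothetical score that peaks at lr=0.1, lambda=0.01 (illustration only)."""
    return 1.0 - 0.01 * (abs(np.log10(lr) + 1) + abs(np.log10(lam) + 2))


best_lr, best_lam, best_acc = grid_search(
    toy_validation_accuracy, learning_rates, reg_strengths
)
```

In a real run, `evaluate` would train on the training split and score on the validation split only, so the test set stays untouched until the final report.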
Visualizes sample MNIST images to understand the dataset structure and labeling.
Trains a binary classifier (e.g., 0 vs 1) with:
- 1000 epochs of batch gradient descent
- Learning rate: 0.1
- No regularization (λ=0)
- Plots: Loss and accuracy curves over training
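A minimal sketch of such a training run is shown below, using the settings listed above (1000 epochs, learning rate 0.1, λ=0). The function name, the synthetic two-blob dataset, and the use of `tanh` to express the sigmoid are assumptions made for a short, self-contained example; the project itself trains on MNIST digits.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, lam=0.0, epochs=1000):
    """Full-batch gradient descent; returns parameters plus per-epoch curves."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    losses, accuracies = [], []
    for _ in range(epochs):
        z = X @ w + b
        # Sigmoid written via tanh, which does not overflow for large |z|.
        p = 0.5 * (1.0 + np.tanh(0.5 * z))
        # Binary cross-entropy in the form log(1 + exp(z)) - y*z, plus L2 on w.
        loss = np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)
        losses.append(loss)
        accuracies.append(np.mean((p >= 0.5) == (y == 1)))
        error = p - y
        w -= lr * (X.T @ error / n + lam * w)  # penalty applies to weights only
        b -= lr * error.mean()                 # bias is not regularized
    return w, b, losses, accuracies


# Synthetic stand-in data: two Gaussian blobs labelled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b, losses, accuracies = train_logistic(X, y)
```

The `losses` and `accuracies` lists are what would be passed to a plotting call to produce the training curves.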
Comprehensive grid search to find optimal hyperparameters:
- Tests 64 combinations of learning rate and regularization
- Visualizes performance across hyperparameter space
- Analyzes weight patterns under different regularization strengths
- Reports final test set accuracy
Tests model behavior on random noise inputs:
- Generates 10,000 random images from U[0,1]
- Analyzes output probability distribution
- Evaluates model's confidence on out-of-distribution data
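The noise experiment above might be set up along these lines. The weight vector and bias are random placeholders standing in for a trained model, and the 0.9 confidence threshold is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 flattened 28x28 "images" with pixels drawn from U[0, 1).
noise_images = rng.uniform(0.0, 1.0, size=(10_000, 784))

# Placeholder parameters (assumption): a trained model would supply these.
w = rng.normal(0.0, 0.05, size=784)
b = 0.0

z = noise_images @ w + b
probs = 0.5 * (1.0 + np.tanh(0.5 * z))  # sigmoid expressed via tanh

# Distribution of predicted probabilities over ten equal-width bins.
counts, bin_edges = np.histogram(probs, bins=10, range=(0.0, 1.0))

# Confidence is the probability assigned to whichever class was predicted.
confidence = np.maximum(probs, 1.0 - probs)
fraction_confident = np.mean(confidence > 0.9)
```

A large `fraction_confident` on pure noise would indicate that the classifier is overconfident on inputs unlike anything it saw during training.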
Extends to full 10-class MNIST:
- Softmax regression for digits 0-9
- Mini-batch gradient descent (batch size: 512)
- 20 epochs with carefully tuned hyperparameters
- Feature standardization for improved convergence
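The standardization and mini-batching steps listed above can be sketched as below. The helper names, the `eps` guard for constant pixels, and the random stand-in data are assumptions; only the batch size of 512 comes from the description.

```python
import numpy as np


def standardize(X_train, X_other, eps=1e-8):
    """Z-score both arrays using statistics computed on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # eps keeps always-constant pixels (std == 0) from causing division by zero.
    return (X_train - mean) / (std + eps), (X_other - mean) / (std + eps)


def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that cover a fresh random permutation once."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(1200, 784))
X_train[:, 0] = 0.0  # a constant pixel, as found at the borders of MNIST images
X_val = rng.uniform(0.0, 1.0, size=(300, 784))

X_train_std, X_val_std = standardize(X_train, X_val)
batches = list(iterate_minibatches(len(X_train_std), 512, rng))
batch_sizes = [len(idx) for idx in batches]
```

Reusing the training mean and standard deviation for the validation and test sets avoids leaking information from held-out data into preprocessing.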
pip install numpy matplotlib
Place these files in the same directory as the script:
- `binary_class.npz` - Binary classification data (Parts A-C, Bonus A)
- `bonus.npz` - Full 10-class MNIST data (Bonus B) [optional]
python Logistic_softmax_regression.py
The script will automatically:
- Visualize sample training images
- Train binary logistic regression
- Perform hyperparameter tuning
- Test on random noise (Bonus A)
- Train softmax classifier if bonus.npz exists (Bonus B)
The implementation tracks loss and accuracy over 1000 epochs. As seen below, the model successfully minimizes cross-entropy loss while maintaining high generalization accuracy on the validation set.
By visualizing the weights as 28×28 images, we can observe exactly what the model has learned. This gallery demonstrates the effect of the L2 penalty (λ) on the model's internal representations.
| No Regularization ($\lambda=0$) | Optimal Weights (optimal $\lambda$) | Strong Regularization ($\lambda=1$) |
|---|---|---|
| [weight image] | [weight image] | [weight image] |
Technical Insights:
- Under-regularized ($\lambda=0$): Weights are noisy and capture high-frequency variance from the training data, leading to potential overfitting.
- Optimal $\lambda$: The "ghost" of the digit is clearly visible, showing that the model is focusing on the core spatial features.
- Over-regularized ($\lambda=1$): The penalty is too high, dampening the weights and washing out the structural details of the digit.
A grid search was performed over 64 combinations of learning rates and regularization strengths. The model automatically selects the configuration with the highest validation accuracy for final testing.
- Gradient Descent: Implemented batch and mini-batch variants
- Regularization: Practical experience with L2 penalty for preventing overfitting
- Cross-Entropy Loss: Understanding and implementation for classification
- Softmax Function: Multi-class probability distributions
- Stability: Techniques to avoid overflow/underflow in exponentials
- Vectorization: Efficient NumPy operations for matrix computations
- Batch Processing: Memory-efficient training on large datasets
- Train/Validation/Test Split: Proper evaluation methodology
- Hyperparameter Search: Systematic grid search approach
- Overfitting Detection: Monitoring train vs validation performance
- `LogisticRegression` class: Full binary classification pipeline
- `SoftmaxRegression` class: Multi-class classification with mini-batching
- Stable activation functions: `_sigmoid_stable()`, `_softmax_stable()`
- Hyperparameter search: `find_best_hyperparameters()`
- Type hints throughout (Python 3.10+)
- Dataclass for configuration (`SoftmaxConfig`)
- Modular design with clear separation of concerns
- Comprehensive documentation and comments
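As an illustration of dataclass-based configuration, a sketch might look like this. The class name `SoftmaxConfigSketch` and every field name are hypothetical and not taken from the project's `SoftmaxConfig`; only the batch size (512), epoch count (20), and class count (10) come from the description, and the learning rate and regularization values in the usage line are arbitrary examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftmaxConfigSketch:
    """Hypothetical training configuration; field names are illustrative."""

    learning_rate: float      # no default: the tuned value is not stated
    reg_strength: float       # L2 coefficient (lambda), also not stated
    batch_size: int = 512
    epochs: int = 20
    num_classes: int = 10


# Example values for the two required fields are arbitrary placeholders.
config = SoftmaxConfigSketch(learning_rate=0.1, reg_strength=1e-4)
```

Making the dataclass frozen prevents a training run from silently mutating its own settings.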
Binary logistic regression gradients:
∂L/∂w = (1/n) Xᵀ(σ(Xw+b) - y) + λw
∂L/∂b = (1/n) Σ(σ(Xw+b) - y)
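In NumPy, these two gradients could be computed as in the sketch below. The function name is illustrative, and the sigmoid is written via `tanh` here purely for brevity.

```python
import numpy as np


def logistic_gradients(X, y, w, b, lam):
    """Gradients of the L2-regularized binary cross-entropy w.r.t. w and b."""
    n = X.shape[0]
    z = X @ w + b
    p = 0.5 * (1.0 + np.tanh(0.5 * z))  # sigma(Xw + b)
    error = p - y
    grad_w = X.T @ error / n + lam * w  # L2 term acts on the weights only
    grad_b = error.mean()               # bias gradient has no penalty term
    return grad_w, grad_b


X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, 1.0])

# With zero parameters every prediction is 0.5, so the error is [0.5, -0.5].
grad_w0, grad_b0 = logistic_gradients(X, y, np.zeros(2), 0.0, lam=0.5)

# The regularizer contributes exactly lam * w to the weight gradient.
w = np.array([1.0, -1.0])
grad_w_reg, _ = logistic_gradients(X, y, w, 0.0, lam=0.5)
grad_w_noreg, _ = logistic_gradients(X, y, w, 0.0, lam=0.0)
```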
Softmax regression gradients:
∂L/∂W = (1/n) Xᵀ(P - Y_onehot) + λW
∂L/∂b = (1/n) Σ(P - Y_onehot)
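A corresponding NumPy sketch for the softmax case follows. It assumes `W` has shape (features, classes), `b` has shape (classes,), and `y` holds integer class labels; the function name is illustrative.

```python
import numpy as np


def softmax_gradients(X, y, W, b, lam):
    """Gradients of the L2-regularized categorical cross-entropy."""
    n = X.shape[0]
    num_classes = W.shape[1]
    logits = X @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # max subtraction trick
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    Y_onehot = np.eye(num_classes)[y]   # one row per sample, 1 at the true class
    diff = P - Y_onehot
    grad_W = X.T @ diff / n + lam * W
    grad_b = diff.mean(axis=0)          # (1/n) * sum over samples
    return grad_W, grad_b


X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 2])
grad_W, grad_b = softmax_gradients(X, y, np.zeros((2, 3)), np.zeros(3), lam=0.0)
```

Because every row of `P - Y_onehot` sums to zero, the entries of `grad_b` also sum to zero, which is a quick sanity check.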
Binary cross-entropy with L2 regularization:
L = -(1/n) Σ[y log(σ(z)) + (1-y) log(1-σ(z))] + (λ/2)||w||²
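One way to evaluate this loss without taking the logarithm of a probability that has rounded to 0 is to use the identity -[y log σ(z) + (1-y) log(1-σ(z))] = log(1 + exp(z)) - y·z. The sketch below relies on `np.logaddexp` for the first term; the function name is illustrative, and this rewriting is an assumption about a convenient implementation rather than a description of the project's code.

```python
import numpy as np


def binary_cross_entropy_l2(X, y, w, b, lam):
    """Binary cross-entropy plus (lam/2) * ||w||^2, computed from logits."""
    z = X @ w + b
    # np.logaddexp(0, z) evaluates log(exp(0) + exp(z)) without overflow.
    data_loss = np.mean(np.logaddexp(0.0, z) - y * z)
    return data_loss + 0.5 * lam * np.dot(w, w)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, 1.0])
w = np.array([1.0, -1.0])

loss = binary_cross_entropy_l2(X, y, w, 0.0, lam=0.5)

# An extreme logit stays finite: log(1 + exp(1000)) - 1000 evaluates to 0.
extreme = binary_cross_entropy_l2(
    np.array([[1000.0]]), np.array([1.0]), np.array([1.0]), 0.0, lam=0.0
)
```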
Categorical cross-entropy with L2 regularization:
L = -(1/n) Σ log(P[i, y[i]]) + (λ/2)||W||²_F
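The categorical loss can be sketched by computing log-probabilities directly from shifted logits, which avoids `log(0)` when a probability underflows. Shapes are assumed to be `W`: (features, classes), `b`: (classes,), `y`: integer labels; the function name is illustrative.

```python
import numpy as np


def categorical_cross_entropy_l2(X, y, W, b, lam):
    """Mean negative log-likelihood plus (lam/2) * squared Frobenius norm of W."""
    n = X.shape[0]
    logits = X @ W + b
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log softmax = shifted - log(sum(exp(shifted))), row by row
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    data_loss = -log_probs[np.arange(n), y].mean()  # picks log P[i, y[i]]
    return data_loss + 0.5 * lam * np.sum(W * W)


X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 2])

# Zero weights give uniform predictions over 3 classes, so the loss is log(3).
loss_uniform = categorical_cross_entropy_l2(X, y, np.zeros((2, 3)), np.zeros(3), lam=0.0)

# All-ones weights still give uniform predictions; the penalty adds 0.5 * 0.1 * 6.
loss_penalized = categorical_cross_entropy_l2(X, y, np.ones((2, 3)), np.zeros(3), lam=0.1)
```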
This project demonstrates:
- Building ML algorithms from mathematical foundations
- Understanding the internals of popular libraries (scikit-learn, TensorFlow)
- Debugging numerical issues in optimization
- Practical hyperparameter tuning strategies
- Proper experimental methodology (train/val/test splits)
- Python 3.10+: Modern Python with type hints
- NumPy: Efficient numerical computing
- Matplotlib: Comprehensive visualization
- Dataclasses: Clean configuration management
Potential extensions:
- Implement momentum and Adam optimizers
- Add learning rate scheduling
- Multi-layer neural networks
- Data augmentation techniques
- K-fold cross-validation
- Confusion matrix analysis
- ROC curves and AUC metrics
Note: This implementation was developed for educational purposes to demonstrate understanding of fundamental machine learning algorithms and numerical optimization techniques.



