A collection of four pattern-recognition mini-projects converted from an original MATLAB course into Python. Each project implements a core algorithm from scratch (least-squares discriminants, the batch perceptron, a soft-margin SVM dual, a Bayes classifier) and, where useful, compares the hand-written implementation against an established library to validate it.
The emphasis throughout is on understanding the mechanics — solving the SVM
dual by hand, deriving the maximum-likelihood estimates, building the
perceptron update — rather than calling a black-box .fit(). Library
implementations are used as benchmarks, not as the solution.
python-pr-class-projects/
├── data/                                  # datasets (committed; see "Data" below)
│   ├── Proj1DataSet.xlsx
│   ├── Proj2DataSet.xlsx
│   ├── Proj3Train100.xlsx
│   ├── Proj3Train1000.xlsx
│   ├── Proj3Test.xlsx
│   └── celegans/
├── proj1-linear-discriminant-functions/
├── proj2-soft-margin-svm/
├── proj3-bayes-naive-bayes/
├── proj4-svm-celegans/
├── pyproject.toml                         # dependencies (managed by uv)
├── uv.lock                                # locked, reproducible versions
└── README.md
Each project folder contains its source scripts, a conftest.py (see
Running the tests), and a tests/ subfolder with its
pytest suite.
The data/ folder lives outside the project folders and is shared:
keeping data in one place avoids duplicating files across projects, and every
loader resolves the path by walking up to the repository root, so the projects
run no matter where the repo is cloned or launched from.
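The root-walking lookup described above could look roughly like the sketch below. It is an illustration only: the helper names `find_repo_root` and `data_path`, and the choice of `pyproject.toml` as the marker file, are assumptions and not necessarily what the project loaders do.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a folder containing `marker` is found.

    Hypothetical helper: the marker file is an assumption for this sketch.
    """
    start = Path(start).resolve()
    for folder in (start, *start.parents):
        if (folder / marker).is_file():
            return folder
    raise FileNotFoundError(f"no {marker} found at or above {start}")


def data_path(name: str, start: Path) -> Path:
    """Build the path to a file in the shared data/ folder at the repo root."""
    return find_repo_root(start) / "data" / name
```

A loader would typically pass `Path(__file__)` as `start`, so the result does not depend on the current working directory.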
Implements linear discriminants on the Iris dataset across three parts:
exploratory data analysis (feature statistics, within/between-class variance,
a correlation heatmap), then binary classification comparing Least Squares
against the Batch Perceptron, and finally multi-class Least Squares. Both
classifiers are written from scratch — the closed-form least-squares solution
and the iterative perceptron update with its own convergence loop.
Data: data/Proj1DataSet.xlsx
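As a rough illustration of the two binary classifiers, here is a minimal NumPy sketch. The function names, the ±1 target encoding, the learning rate, and the epoch cap are assumptions made for this example; the project's own `classifiers` module may be organised differently.

```python
import numpy as np


def augment(X):
    """Prepend a column of ones so the bias is part of the weight vector."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_least_squares(X, t):
    """Closed-form minimiser of ||A w - t||^2 for targets t in {-1, +1}."""
    w, *_ = np.linalg.lstsq(augment(X), t, rcond=None)
    return w


def fit_batch_perceptron(X, t, lr=1.0, max_epochs=1000):
    """Batch perceptron: each epoch sums the updates of all misclassified points."""
    A = augment(X)
    w = np.zeros(A.shape[1])
    for epoch in range(max_epochs):
        wrong = t * (A @ w) <= 0          # misclassified or on the boundary
        if not wrong.any():
            return w, epoch               # converged: nothing left to fix
        w = w + lr * (t[wrong, None] * A[wrong]).sum(axis=0)
    return w, max_epochs


def predict(w, X):
    """Label each row +1 or -1 by the sign of the discriminant."""
    return np.where(augment(X) @ w >= 0, 1, -1)
```

The perceptron stops as soon as every point is on the correct side, whereas the least-squares weights depend on all points, which is why the two boundaries generally differ even on separable data.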
Binary case (Setosa vs Versicolor + Virginica) on the petal features. The Least Squares and Batch Perceptron boundaries both separate Setosa cleanly, but land in different places — LS minimises squared error over all points, while the perceptron only moves to fix misclassifications.
Multi-class Least Squares over all three species, showing the three pairwise decision boundaries. The Setosa block is linearly separable; the Versicolor/Virginica overlap is where the misclassifications occur.
Implements a soft-margin Support Vector Machine from scratch by solving the
dual quadratic program directly (via cvxopt), for both a linear and a
Gaussian RBF kernel. The hand-written solver is validated against
scikit-learn's SVC, and a timing experiment compares the from-scratch QP
solver against the library's SMO algorithm as the dataset grows.
Data: data/Proj2DataSet.xlsx
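For reference, the soft-margin dual in its standard textbook form is shown below, together with one common way to map it onto the `cvxopt.solvers.qp` standard form (minimise ½ xᵀPx + qᵀx subject to Gx ≤ h and Ax = b). The symbols N, α, y, K and the Gaussian kernel parametrisation with σ are assumptions of this sketch; the project code may scale the kernel differently.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \\
\text{s.t.}\quad & 0 \le \alpha_i \le C \quad (i = 1, \dots, N),
  \qquad \sum_{i=1}^{N} \alpha_i y_i = 0 \\[6pt]
\text{QP form:}\quad & P = (y y^{\top}) \circ K, \quad q = -\mathbf{1}, \quad
  G = \begin{bmatrix} -I \\ I \end{bmatrix}, \quad
  h = \begin{bmatrix} \mathbf{0} \\ C\,\mathbf{1} \end{bmatrix}, \quad
  A = y^{\top}, \quad b = 0 \\[6pt]
\text{Kernels:}\quad & K_{\text{lin}}(x, x') = x^{\top} x', \qquad
  K_{\text{rbf}}(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}} \right)
\end{aligned}
```

The sign flip on q turns the maximisation into the minimisation that the QP solver expects, and the stacked G and h encode the box constraint 0 ≤ αᵢ ≤ C.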
Linear soft-margin SVM for C = 0.1 (wide margin, many support vectors) vs C = 100 (narrow margin, fewer support vectors). Support vectors are split into three categories — on the margin, inside the margin, and misclassified — each marked distinctly.
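One plausible way to split support vectors into the three categories is by the usual KKT reasoning, sketched below. The function name, the tolerance, and the exact criteria are assumptions; the project may draw the lines slightly differently.

```python
def categorise_support_vectors(alphas, labels, scores, C, tol=1e-6):
    """Group support-vector indices by their position relative to the margin.

    alphas: dual coefficients; labels: +1 or -1; scores: decision values f(x_i).
    Hypothetical helper with assumed criteria, for illustration only.
    """
    groups = {"on_margin": [], "inside_margin": [], "misclassified": []}
    for i, (a, y, f) in enumerate(zip(alphas, labels, scores)):
        if a <= tol:
            continue                            # alpha = 0: not a support vector
        if a < C - tol:
            groups["on_margin"].append(i)       # 0 < alpha < C: exactly on the margin
        elif y * f > 0:
            groups["inside_margin"].append(i)   # alpha = C, still on the correct side
        else:
            groups["misclassified"].append(i)   # alpha = C, wrong side of the boundary
    return groups
```

With a small C many points hit the upper bound α = C, which is consistent with the larger number of support vectors in the wide-margin panel.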
The Gaussian-kernel SVM (extra credit) carving a curved decision boundary around the classes — the nonlinear extension of the linear case, again shown for two C values.
Training time vs sample size at C = 100. The from-scratch QP solver (red) scales far worse than the library's SMO solver (green), which was purpose-built for SVM training — exactly the gap the experiment was meant to expose.
Implements a Bayes classifier and a Naive Bayes classifier from scratch,
using maximum-likelihood estimation of the Gaussian class parameters. The
full Bayes model estimates a complete covariance matrix per class; Naive Bayes
assumes feature independence (a diagonal covariance). The experiment compares
both against the theoretical optimum (Bayes with the true parameters) and
shows how each behaves as the amount of training data grows.
Data: data/Proj3Train100.xlsx, data/Proj3Train1000.xlsx,
data/Proj3Test.xlsx
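The difference between the two models comes down to how the covariance is estimated. The NumPy sketch below shows one way to write it; the function names and the dictionary layout are assumptions for this example and are not taken from the project source.

```python
import numpy as np


def fit_gaussian_classes(X, y, naive=False):
    """Maximum-likelihood prior, mean and covariance per class.

    With naive=True the off-diagonal covariance terms are zeroed, which is
    the feature-independence assumption of Naive Bayes.
    """
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False, bias=True)   # bias=True divides by N (the ML estimate)
        if naive:
            cov = np.diag(np.diag(cov))
        params[label] = (len(Xc) / len(X), mean, cov)
    return params


def predict(params, X):
    """Pick the class with the largest Gaussian log-posterior (up to a constant)."""
    labels = list(params)
    scores = []
    for label in labels:
        prior, mean, cov = params[label]
        diff = X - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        scores.append(np.log(prior) - 0.5 * (logdet + maha))
    return np.array(labels)[np.argmax(scores, axis=0)]
```

For d features the full model estimates d(d+1)/2 covariance entries per class while the naive model estimates only d, which is the parameter-count gap behind the small-sample behaviour described above.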
Test accuracy and error for each classifier. With only 100 training samples, Naive Bayes actually beats full Bayes — the full covariance has many more parameters to estimate, so it overfits when data is scarce. With 1000 samples the full model catches up and both approach the true-parameter ceiling.
Applies a Gaussian-kernel SVM (scikit-learn, validated by the from-scratch work in proj2) to a real image-classification task: distinguishing worm images (class 1) from plate defects (class 0) in C. elegans microscopy data. The pipeline reads the raw images, reduces dimensionality with PCA to keep training tractable, grid-searches the kernel scale and box constraint, and saves the trained model. A separate inference script reloads the model and steps through the test images interactively.
C. elegans is a roundworm widely used in biological research; the worms are
imaged on plates, but plates sometimes have defects that make the animals hard
to track, and this classifier separates clean worm images from defective ones.
Data: data/celegans/
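The PCA, RBF SVM, and grid-search stages can be chained in scikit-learn roughly as follows. This is a sketch under assumptions: the number of components, the grid values, and the fold count are placeholders, and the project's script may wire the steps differently. In scikit-learn terms the box constraint is `C` and the kernel scale is controlled through `gamma`.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_search(n_components=20, cv=3):
    """PCA followed by an RBF SVM, tuned over C and gamma.

    All numeric values here are illustrative placeholders.
    """
    pipeline = Pipeline([
        ("pca", PCA(n_components=n_components)),   # shrink the flattened images
        ("svc", SVC(kernel="rbf")),
    ])
    grid = {
        "svc__C": [0.1, 1.0, 10.0, 100.0],         # box constraint
        "svc__gamma": [0.001, 0.01, 0.1],          # inverse kernel width
    }
    return GridSearchCV(pipeline, grid, cv=cv)
```

Calling `build_search().fit(X_train, y_train)` on flattened image arrays would expose the selected settings as `best_params_` and the refitted pipeline as `best_estimator_`.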
Confusion matrix on the held-out test set, ~92.8% accuracy. The errors are roughly balanced between the two classes — no strong bias toward predicting worm or defect.
The inference viewer steps through test images in defect/worm pairs from the terminal, showing each image's true label and the model's prediction.
The data/ folder is committed to this repository — the datasets
(including the C. elegans image set) total well under 100 MB, so they live in
git alongside the code and every project works immediately after cloning, with
nothing to download or arrange by hand.
data/
├── Proj1DataSet.xlsx       # proj1 (Iris)
├── Proj2DataSet.xlsx       # proj2
├── Proj3Train100.xlsx      # proj3 (100-sample training set)
├── Proj3Train1000.xlsx     # proj3 (1000-sample training set)
├── Proj3Test.xlsx          # proj3 (test set)
└── celegans/               # proj4
    ├── 0/                  # defect (class 0)
    │   ├── training/
    │   └── test/
    └── 1/                  # worm (class 1)
        ├── training/
        └── test/
- Proj1DataSet.xlsx — the Iris dataset (the spreadsheet provided with the course materials), arranged as three class-ordered blocks of 50.
- Proj2DataSet.xlsx — the two-class point set for the SVM experiments.
- Proj3Train100 / Train1000 / Test — the Gaussian-generated training and test sets for the Bayes experiments, each with 5 features and a class column.
- C. elegans — extracted from microscopy source images. The extraction
scripts are intentionally not included: extraction required manual data
preparation afterward, and shipping the scripts without that manual context
would cause more confusion than it resolves. The images are already split
into training/ and test/ under each class folder.
Because the data is committed, there is nothing to set up here — cloning the repository gives you everything each project needs.
This project uses uv for dependency
management, so there is no requirements.txt. Dependencies are declared in
pyproject.toml and pinned in uv.lock for fully reproducible installs.
1. Install uv (if you don't already have it).
- Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- macOS / Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
2. Install the dependencies. From the repository root:
uv sync
This reads pyproject.toml and uv.lock, creates a virtual environment
(.venv/), and installs the exact locked versions. You do not create or manage
the venv by hand.
What the files do:
- pyproject.toml: declares the project and its dependencies.
- uv.lock: the resolved, locked versions that uv sync installs from (commit this; it's what makes installs reproducible).
- .venv/: the environment uv creates. Not committed.
3. Run things with uv run, which uses the project environment
automatically:
uv run python proj1-linear-discriminant-functions/main.py
Shared environment: all four projects share one virtual environment at the repository root; there is no separate venv per project. Run uv sync once and every project is ready.
These projects were written in VS Code. You don't need it, but if you hit
import or interpreter issues, VS Code makes them easy to avoid: open the repo
root as the workspace folder and select the .venv interpreter
(Ctrl/Cmd+Shift+P → "Python: Select Interpreter" → the .venv in the repo
root). Several scripts also open matplotlib windows, so run them in an
environment with a display rather than a headless terminal.
Each project has a real pytest suite under its tests/ folder. These are
genuine assertion-based tests — they verify behavior and fail when something
breaks, not demonstrations that merely print output. They exist to catch
regressions: the algorithms here are hand-written and interdependent, so a test
that pins down, say, "the SVM dual coefficients satisfy the box constraint" or
"the support-vector counts match the report" protects you the day an edit
silently breaks one of them.
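To make the idea concrete, a constraint-style test might look like the sketch below. The helper `check_dual_solution` and the hand-made coefficient values are hypothetical stand-ins; the real suites exercise the project's own solver output.

```python
def check_dual_solution(alphas, labels, C, tol=1e-6):
    """True when 0 <= alpha_i <= C for all i and sum(alpha_i * y_i) is zero."""
    in_box = all(-tol <= a <= C + tol for a in alphas)
    balanced = abs(sum(a * y for a, y in zip(alphas, labels))) <= tol
    return in_box and balanced


def test_dual_coefficients_satisfy_constraints():
    # Hand-made values standing in for a solver's output (assumption).
    alphas = [0.0, 0.1, 0.1, 0.0]
    labels = [1, 1, -1, -1]
    assert check_dual_solution(alphas, labels, C=0.1)
    # Coefficients above C must be rejected.
    assert not check_dual_solution([0.2, 0.2], [1, -1], C=0.1)
```

A test of this kind fails loudly if a later edit changes the constraint matrices passed to the QP solver, even when the resulting plots still look plausible.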
From the repository root:
uv run python -m pytest
This discovers and runs every project's tests in one pass. Add detail or brevity as needed:
uv run python -m pytest -v # verbose: one line per test
uv run python -m pytest -q # quiet: compact summary
Not using uv? Every command below also works without the uv run prefix; just call python -m pytest directly. The only requirement is that the dependencies are installed in the active Python environment (e.g. you ran uv sync and activated .venv, or installed the packages another way). With uv, uv run handles the environment for you; without it, activate your environment first and drop the prefix:
python -m pytest # run everything
python -m pytest -v # verbose
python -m pytest -q # quiet
To run a single project's suite, pass its folder:
uv run python -m pytest "proj2-soft-margin-svm"
# without uv:
python -m pytest "proj2-soft-margin-svm"
To run a single test file, or a single test inside it:
uv run python -m pytest "proj3-bayes-naive-bayes/tests/test_bayes_classifiers.py"
uv run python -m pytest "proj3-bayes-naive-bayes/tests/test_bayes_classifiers.py::test_accuracy_error_sum_to_100"
# without uv:
python -m pytest "proj3-bayes-naive-bayes/tests/test_bayes_classifiers.py"
Each project has a conftest.py at its root (next to the source files, not
inside tests/). It is required: the test files live in tests/, but the
modules they import live one level up in the project root. pytest adds the
folder containing the nearest conftest.py to the import path, so this file is
what lets a test do from classifiers import least_squares without any package
setup or sys.path hacks.
Some of these conftest.py files are more than just a marker — where a project
has source in more than one folder (for example proj2, whose extra-credit RBF
code lives in a subfolder), the conftest.py adds those folders to the path
too, and project-specific module name prefixes keep the suites from colliding when
every project is collected together in a single pytest run. Don't delete
these files — without them, the tests in tests/ fail with
ModuleNotFoundError.
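For a project with source in more than one folder, the path handling in a conftest.py could be sketched as below. The function name `extend_import_path` and the subfolder name `extra_credit` are hypothetical; the actual files in this repository may differ.

```python
import sys
from pathlib import Path


def extend_import_path(conftest_file, extra_subfolders=()):
    """Put the project root and any extra source folders on sys.path.

    Returns the entries that were newly added. Hypothetical helper.
    """
    root = Path(conftest_file).resolve().parent
    added = []
    for folder in (root, *(root / name for name in extra_subfolders)):
        entry = str(folder)
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    return added


# In a real conftest.py this would run at import time, for example:
# extend_import_path(__file__, extra_subfolders=["extra_credit"])
```

Because pytest imports conftest.py before collecting the tests beside it, the extra folders are importable by the time the test modules load.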







