Python Pattern Recognition Class Projects

A collection of four pattern-recognition mini-projects converted from an original MATLAB course into Python. Each project implements a core algorithm from scratch (least-squares discriminants, the batch perceptron, a soft-margin SVM dual, a Bayes classifier) and, where useful, compares the hand-written implementation against an established library to validate it.

The emphasis throughout is on understanding the mechanics — solving the SVM dual by hand, deriving the maximum-likelihood estimates, building the perceptron update — rather than calling a black-box .fit(). Library implementations are used as benchmarks, not as the solution.

Repository layout

python-pr-class-projects/
├── data/                          # datasets (committed — see "Data" below)
│   ├── Proj1DataSet.xlsx
│   ├── Proj2DataSet.xlsx
│   ├── Proj3Train100.xlsx
│   ├── Proj3Train1000.xlsx
│   ├── Proj3Test.xlsx
│   └── celegans/
├── proj1-linear-discriminant-functions/
├── proj2-soft-margin-svm/
├── proj3-bayes-naive-bayes/
├── proj4-svm-celegans/
├── pyproject.toml                 # dependencies (managed by uv)
├── uv.lock                        # locked, reproducible versions
└── README.md

Each project folder contains its source scripts, a conftest.py (see Running the tests), and a tests/ subfolder with its pytest suite.

The data/ folder lives outside the project folders and is shared: keeping data in one place avoids duplicating files across projects, and every loader resolves the path by walking up to the repository root, so the projects run no matter where the repo is cloned or launched from.
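The root-walking lookup described above could look roughly like the following. This is a minimal sketch rather than the repository's actual loader: the helper names find_repo_root and data_path are hypothetical, and treating pyproject.toml as the marker for the repository root is an assumption.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a folder containing `marker` is found."""
    start = start.resolve()
    # Check the starting location itself, then each ancestor in turn.
    for folder in (start, *start.parents):
        if (folder / marker).is_file():
            return folder
    raise FileNotFoundError(f"no {marker} found above {start}")


def data_path(start: Path, *parts: str) -> Path:
    """Build a path inside the shared data/ folder at the repository root."""
    return find_repo_root(start).joinpath("data", *parts)
```

A loader inside a project folder might then call data_path(Path(__file__).parent, "Proj1DataSet.xlsx"), which gives the same result whatever the current working directory is.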

The projects

proj1 — Linear Discriminant Functions

Implements linear discriminants on the Iris dataset across three parts: exploratory data analysis (feature statistics, within/between-class variance, a correlation heatmap), then binary classification comparing Least Squares against the Batch Perceptron, and finally multi-class Least Squares. Both classifiers are written from scratch — the closed-form least-squares solution and the iterative perceptron update with its own convergence loop. Data: data/Proj1DataSet.xlsx

Binary case (Setosa vs Versicolor + Virginica) on the petal features. The Least Squares and Batch Perceptron boundaries both separate Setosa cleanly, but land in different places — LS minimises squared error over all points, while the perceptron only moves to fix misclassifications.
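The difference between the two rules can be sketched in a few lines of NumPy. This is an illustrative version under assumptions, not the project's code: labels are taken to be -1/+1, the bias is folded into the weight vector, and the function names, learning rate, and epoch limit are all hypothetical.

```python
import numpy as np


def augment(X):
    """Prepend a constant 1 to every sample so the bias is part of w."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def least_squares(X, y):
    """Closed-form least squares: minimise ||Xa w - y||^2 with labels in {-1, +1}."""
    Xa = augment(X)
    # lstsq is the numerically safer equivalent of the pseudo-inverse solution.
    w, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return w


def batch_perceptron(X, y, eta=1.0, max_epochs=1000):
    """Batch perceptron: each epoch, step along the sum of the misclassified samples."""
    Xa = augment(X)
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        wrong = y * (Xa @ w) <= 0      # samples on the wrong side or on the boundary
        if not wrong.any():
            break                      # converged: nothing left to fix
        w += eta * (y[wrong, None] * Xa[wrong]).sum(axis=0)
    return w


def predict(w, X):
    """Sign of the discriminant w . [1, x] for every row of X."""
    return np.sign(augment(X) @ w)
```

Least squares uses every sample in a single solve, while the perceptron stops as soon as no sample is misclassified, which is why the two boundaries can both be correct and still differ.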

Multi-class Least Squares over all three species, showing the three pairwise decision boundaries. The Setosa block is linearly separable; the Versicolor/Virginica overlap is where the misclassifications occur.

proj2 — Soft-Margin SVM

Implements a soft-margin Support Vector Machine from scratch by solving the dual quadratic program directly (via cvxopt), for both a linear and a Gaussian RBF kernel. The hand-written solver is validated against scikit-learn's SVC, and a timing experiment compares the from-scratch QP solver against the library's SMO algorithm as the dataset grows. Data: data/Proj2DataSet.xlsx
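A generic QP solver such as cvxopt's expects the problem in the form: minimise 1/2 a^T P a + q^T a subject to G a <= h and A a = b. The sketch below shows one way the soft-margin dual maps onto those matrices. It uses NumPy only, the function name dual_qp_matrices is hypothetical, and the project's own construction may differ.

```python
import numpy as np


def dual_qp_matrices(K, y, C):
    """Soft-margin SVM dual as a QP: min 1/2 a^T P a + q^T a, G a <= h, A a = b.

    K is the (n, n) kernel matrix, y holds labels in {-1, +1}, C is the box constraint.
    """
    n = y.shape[0]
    P = np.outer(y, y) * K                   # P_ij = y_i y_j K(x_i, x_j)
    q = -np.ones(n)                          # maximising sum(a) == minimising -sum(a)
    G = np.vstack([-np.eye(n), np.eye(n)])   # -a <= 0  and  a <= C
    h = np.concatenate([np.zeros(n), np.full(n, C)])
    A = y.reshape(1, n).astype(float)        # equality constraint: sum_i a_i y_i = 0
    b = np.zeros(1)
    return P, q, G, h, A, b
```

For the linear kernel, K is simply X @ X.T. Each array would be wrapped in cvxopt.matrix before being passed to cvxopt.solvers.qp(P, q, G, h, A, b).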

Linear soft-margin SVM for C = 0.1 (wide margin, many support vectors) vs C = 100 (narrow margin, fewer support vectors). Support vectors are split into three categories — on the margin, inside the margin, and misclassified — each marked distinctly.
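The three categories follow from the dual coefficients and the decision values. The sketch below is one plausible way to compute them, assuming labels in -1/+1; the function name and the tolerance used to decide when a coefficient counts as 0 or C are assumptions, not values taken from the project.

```python
import numpy as np


def categorize_support_vectors(alpha, y, scores, C, tol=1e-6):
    """Split support vectors into on-margin, inside-margin, and misclassified.

    alpha  : dual coefficients from the QP solver
    y      : labels in {-1, +1}
    scores : decision values f(x_i) = w . x_i + b for the training points
    """
    is_sv = alpha > tol                        # non-support vectors have alpha ~ 0
    at_bound = alpha > C - tol                 # alpha == C: the slack variable is active
    margin = y * scores                        # functional margin y_i f(x_i)

    on_margin = is_sv & ~at_bound              # 0 < alpha < C, so y f(x) = 1
    misclassified = at_bound & (margin < 0)    # wrong side of the decision boundary
    inside_margin = at_bound & (margin >= 0)   # right side, but within the margin
    return on_margin, inside_margin, misclassified
```

Each returned array is a boolean mask over the training points, so it can index the data directly when plotting each group with its own marker.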

The Gaussian-kernel SVM (extra credit) carving a curved decision boundary around the classes — the nonlinear extension of the linear case, again shown for two C values.
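Moving from the linear to the Gaussian case only changes the kernel matrix and the way new points are scored. A minimal sketch follows; the exp(-||x - z||^2 / (2 sigma^2)) parameterisation and the function names are assumptions, and the project may scale the kernel differently.

```python
import numpy as np


def rbf_kernel_matrix(X1, X2, sigma=1.0):
    """Gaussian kernel K_ij = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    # Squared distances via ||x||^2 + ||z||^2 - 2 x.z, clipped at 0 to absorb round-off.
    sq = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def decision_values(alpha, y, b, X_train, X_new, sigma=1.0):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b, evaluated for every row of X_new."""
    return rbf_kernel_matrix(X_new, X_train, sigma) @ (alpha * y) + b
```

Because the kernel depends on distance rather than on a dot product, the zero level set of f is curved, which is what produces the nonlinear boundary.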

Training time vs sample size at C = 100. The from-scratch QP solver (red) scales far worse than the library's SMO solver (green), which was purpose-built for SVM training — exactly the gap the experiment was meant to expose.

proj3 — Bayes vs Naive Bayes

Implements a Bayes classifier and a Naive Bayes classifier from scratch, using maximum-likelihood estimation of the Gaussian class parameters. The full Bayes model estimates a complete covariance matrix per class; Naive Bayes assumes feature independence (a diagonal covariance). The experiment compares both against the theoretical optimum (Bayes with the true parameters) and shows how each behaves as the amount of training data grows. Data: data/Proj3Train100.xlsx, data/Proj3Train1000.xlsx, data/Proj3Test.xlsx
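The two models differ only in how much of the estimated covariance they keep. The following sketch illustrates that under stated assumptions: the function names are hypothetical, class priors are passed in explicitly, and the project's own code may be organised differently.

```python
import numpy as np


def fit_gaussian(X, naive=False):
    """Maximum-likelihood mean and covariance for one class.

    naive=True keeps only the per-feature variances (diagonal covariance).
    """
    mu = X.mean(axis=0)
    centered = X - mu
    cov = centered.T @ centered / X.shape[0]   # ML estimate divides by N, not N - 1
    if naive:
        cov = np.diag(np.diag(cov))
    return mu, cov


def log_discriminant(X, mu, cov, prior):
    """g(x) = log p(x | class) + log P(class) for a Gaussian class model."""
    d = X.shape[1]
    diff = X - mu
    maha = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi)) + np.log(prior)


def classify(X, params, priors):
    """Pick the class with the largest discriminant for each row of X."""
    scores = np.column_stack(
        [log_discriminant(X, mu, cov, p) for (mu, cov), p in zip(params, priors)]
    )
    return scores.argmax(axis=1)
```

Passing the true generating parameters to classify instead of the fitted ones gives the theoretical-optimum baseline the experiment compares against.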

Test accuracy and error for each classifier. With only 100 training samples, Naive Bayes actually beats full Bayes — the full covariance has many more parameters to estimate, so it overfits when data is scarce. With 1000 samples the full model catches up and both approach the true-parameter ceiling.
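The parameter gap is easy to quantify for the 5-feature data used here. The snippet below only counts free parameters per class; it is a worked illustration, and the helper name is hypothetical.

```python
def gaussian_param_counts(d):
    """Free parameters per class for a d-dimensional Gaussian."""
    mean = d
    full_cov = d * (d + 1) // 2   # symmetric matrix: diagonal plus upper triangle
    diag_cov = d                  # one variance per feature
    return {"full": mean + full_cov, "naive": mean + diag_cov}


print(gaussian_param_counts(5))   # {'full': 20, 'naive': 10}
```

The full model therefore has twice as many parameters per class to estimate from the same samples, and the gap widens quadratically as the number of features grows.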

proj4 — SVM on C. elegans Images

Applies a Gaussian-kernel SVM (scikit-learn, validated by the from-scratch work in proj2) to a real image-classification task: distinguishing worm images (class 1) from plate defects (class 0) in C. elegans microscopy data. The pipeline reads the raw images, reduces dimensionality with PCA to keep training tractable, grid-searches the kernel scale and box constraint, and saves the trained model. A separate inference script reloads the model and steps through the test images interactively.
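In scikit-learn terms, the kernel scale corresponds roughly to the gamma parameter of SVC and the box constraint to C. A pipeline along the lines described might be sketched as follows. The standardisation step, the number of PCA components, the grid values, and the function name build_search are all assumptions chosen for illustration, not the project's settings.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_search(n_components=20, cv=5):
    """PCA followed by an RBF SVM, with a grid search over gamma and C."""
    pipeline = Pipeline([
        ("scale", StandardScaler()),          # assumption: standardise pixels first
        ("pca", PCA(n_components=n_components)),
        ("svm", SVC(kernel="rbf")),
    ])
    param_grid = {
        "svm__gamma": [1e-3, 1e-2, 1e-1],     # kernel scale (illustrative values)
        "svm__C": [0.1, 1.0, 10.0, 100.0],    # box constraint (illustrative values)
    }
    return GridSearchCV(pipeline, param_grid, cv=cv)
```

Calling build_search().fit(X_train, y_train), where each row of X_train is one flattened image, runs the search; the winning pipeline is then available as best_estimator_ and could be saved with joblib.dump for the inference script to reload.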

C. elegans is a roundworm widely used in biological research; the worms are imaged on plates, but plates sometimes have defects that make the animals hard to track, and this classifier separates clean worm images from defective ones. Data: data/celegans/

Confusion matrix on the held-out test set, ~92.8% accuracy. The errors are roughly balanced between the two classes — no strong bias toward predicting worm or defect.

The inference viewer steps through test images in defect/worm pairs from the terminal, showing each image's true label and the model's prediction.

Data

The data/ folder is committed to this repository — the datasets (including the C. elegans image set) total well under 100 MB, so they live in git alongside the code and every project works immediately after cloning, with nothing to download or arrange by hand.

Expected layout

data/
├── Proj1DataSet.xlsx                 # proj1 (Iris)
├── Proj2DataSet.xlsx                 # proj2
├── Proj3Train100.xlsx                # proj3 (100-sample training set)
├── Proj3Train1000.xlsx               # proj3 (1000-sample training set)
├── Proj3Test.xlsx                    # proj3 (test set)
└── celegans/                         # proj4
    ├── 0/                            # defect  (class 0)
    │   ├── training/
    │   └── test/
    └── 1/                            # worm    (class 1)
        ├── training/
        └── test/
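Given this layout, the class label of every image can be read straight from its folder name. The sketch below shows one way to enumerate a split; the function name list_images is hypothetical, and no file extension filter is applied because the image format is not stated here.

```python
from pathlib import Path


def list_images(celegans_dir, split):
    """Return (path, label) pairs for one split: 'training' or 'test'.

    The label is taken from the class folder name: 0 = defect, 1 = worm.
    """
    pairs = []
    for label in (0, 1):
        folder = Path(celegans_dir) / str(label) / split
        # sorted() makes the order deterministic across operating systems.
        for path in sorted(folder.iterdir()):
            if path.is_file():
                pairs.append((path, label))
    return pairs
```

For example, list_images("data/celegans", "training") would return the defect images followed by the worm images, each paired with its class.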

Where each dataset comes from

Proj1DataSet.xlsx — the Iris dataset (the spreadsheet provided with the course materials), arranged as three class-ordered blocks of 50.
Proj2DataSet.xlsx — the two-class point set for the SVM experiments.
Proj3Train100 / Train1000 / Test — the Gaussian-generated training and test sets for the Bayes experiments, each with 5 features and a class column.
C. elegans — extracted from microscopy source images. The extraction scripts are intentionally not included: extraction required manual data preparation afterward, and shipping the scripts without that manual context would cause more confusion than it resolves. The images are already split into training/ and test/ under each class folder.

Because the data is committed, there is nothing to set up here — cloning the repository gives you everything each project needs.

Setup (uv)

This project uses uv for dependency management, so there is no requirements.txt. Dependencies are declared in pyproject.toml and pinned in uv.lock for fully reproducible installs.

1. Install uv (if you don't already have it).

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Install the dependencies. From the repository root:

uv sync

This reads pyproject.toml and uv.lock, creates a virtual environment (.venv/), and installs the exact locked versions. You do not create or manage the venv by hand.

What the files do:

pyproject.toml — declares the project and its dependencies.
uv.lock — the resolved, locked versions uv sync installs from (commit this; it's what makes installs reproducible).
.venv/ — the environment uv creates. Not committed.

3. Run things with uv run, which uses the project environment automatically:

uv run python proj1-linear-discriminant-functions/main.py

Shared environment: all four projects share one virtual environment at the repository root — there is not a separate venv per project. Run uv sync once and every project is ready.

Editor

These projects were written in VS Code. You don't need it, but if you hit import or interpreter issues, VS Code makes them easy to avoid: open the repo root as the workspace folder and select the .venv interpreter (Ctrl/Cmd+Shift+P → "Python: Select Interpreter" → the .venv in the repo root). Several scripts also open matplotlib windows, so run them in an environment with a display rather than a headless terminal.

Running the tests

Each project has a real pytest suite under its tests/ folder. These are genuine assertion-based tests — they verify behavior and fail when something breaks, not demonstrations that merely print output. They exist to catch regressions: the algorithms here are hand-written and interdependent, so a test that pins down, say, "the SVM dual coefficients satisfy the box constraint" or "the support-vector counts match the report" protects you the day an edit silently breaks one of them.
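As an illustration of the kind of check meant here, a box-constraint test might look something like the sketch below. The helper name, the tolerance, and the stand-in coefficient values are hypothetical; a real test in the suite would obtain alpha from the project's own solver.

```python
import numpy as np


def check_dual_solution(alpha, y, C, tol=1e-6):
    """Raise AssertionError if alpha violates the soft-margin dual constraints."""
    assert np.all(alpha >= -tol), "alpha must be non-negative"
    assert np.all(alpha <= C + tol), "alpha must not exceed the box constraint C"
    assert abs(np.dot(alpha, y)) <= tol, "sum_i alpha_i y_i must be zero"


def test_box_constraint_on_known_solution():
    # Stand-in values; a real test would obtain alpha from the project's solver.
    alpha = np.array([0.0, 0.5, 0.5, 0.0])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    check_dual_solution(alpha, y, C=1.0)
```

pytest collects only the function whose name starts with test_, and a solver change that pushed any coefficient outside the range 0 to C would make it fail.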

Run everything at once

From the repository root:

uv run python -m pytest

This discovers and runs every project's tests in one pass. Add detail or brevity as needed:

uv run python -m pytest -v      # verbose: one line per test
uv run python -m pytest -q      # quiet: compact summary

Not using uv? Every command in this section also works without the uv run prefix: call python -m pytest directly. The only requirement is that the dependencies are installed in the active Python environment (for example, you ran uv sync and activated .venv, or installed the packages another way).

python -m pytest          # run everything
python -m pytest -v       # verbose
python -m pytest -q       # quiet

Run one project

uv run python -m pytest "proj2-soft-margin-svm"
# without uv:
python -m pytest "proj2-soft-margin-svm"

Run a single test file or test

uv run python -m pytest "proj3-bayes-naive-bayes/tests/test_bayes_classifiers.py"
uv run python -m pytest "proj3-bayes-naive-bayes/tests/test_bayes_classifiers.py::test_accuracy_error_sum_to_100"
# without uv:
python -m pytest "proj3-bayes-naive-bayes/tests/test_bayes_classifiers.py"

A note on the conftest.py files

Each project has a conftest.py at its root (next to the source files, not inside tests/). It is required: the test files live in tests/, but the modules they import live one level up in the project root. Under pytest's default import mode, the folder containing a conftest.py is added to sys.path when that folder has no __init__.py, so this file is what lets a test do from classifiers import least_squares without any package setup or sys.path hacks.

Some of these conftest.py files are more than just a marker. Where a project has source in more than one folder (for example proj2, whose extra-credit RBF code lives in a subfolder), the conftest.py adds those folders to the path too. Project-specific module name prefixes keep the suites from colliding when every project is collected together in a single pytest run. Don't delete these files: without them, the tests in tests/ fail with ModuleNotFoundError.
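For a project with source in more than one folder, the path handling in its conftest.py could be as small as the sketch below. This is an assumption about the general shape, not a copy of the repository's file: the helper name add_to_import_path and the folder name "extra_credit" are hypothetical.

```python
import sys
from pathlib import Path


def add_to_import_path(project_root, subfolders=()):
    """Put the project root and any extra source folders on sys.path."""
    root = Path(project_root).resolve()
    for folder in (root, *(root / name for name in subfolders)):
        entry = str(folder)
        if entry not in sys.path:
            sys.path.insert(0, entry)   # front of the list, so project modules win


# In a real conftest.py this would run at import time, for example:
# add_to_import_path(Path(__file__).parent, subfolders=("extra_credit",))
# where "extra_credit" is a hypothetical folder name.
```

Because pytest imports conftest.py before it collects the tests beside it, the extra folders are importable by the time any test module runs.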
