
Commit 053d2f5

updated docs of the new clustering models APIs (#21)
1 parent a3e6126 commit 053d2f5

7 files changed

Lines changed: 175 additions & 20 deletions


docs/clustering/README.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
# Clustering models
2+
3+
The `pyvisim.clustering` module holds the small models the encoders use to build
4+
their visual vocabulary and to reduce dimensionality. Each one owns an underlying
5+
scikit-learn estimator and exposes just the attributes the encoders need through
6+
typed getters, so the encoders never touch scikit-learn's `*_` fitted attributes
7+
directly.
8+
9+
| Object | File | Backed by | Used by |
10+
|--------|------|-----------|---------|
11+
| `KMeans` | [`kmeans.py`](../../pyvisim/clustering/kmeans.py) | `sklearn.cluster.KMeans` | `VLADEncoder` |
12+
| `GaussianMixtureModel` | [`gmm.py`](../../pyvisim/clustering/gmm.py) | `sklearn.mixture.GaussianMixture` | `FisherVectorEncoder` |
13+
| `PCA` | [`pca.py`](../../pyvisim/clustering/pca.py) | `sklearn.decomposition.PCA` | both encoders (optional) |
14+
15+
`KMeans` and `GaussianMixtureModel` are clustering models and share
16+
[`ClusteringModelBase`](../../pyvisim/clustering/_base_clustering.py). `PCA` is not a
17+
clustering model, so it sits directly on the shared `_SklearnModelBase` instead.
18+
19+
## How they work
20+
21+
You create a model unfitted, passing the scikit-learn constructor parameters straight
22+
through:
23+
24+
```python
25+
from pyvisim.clustering import KMeans, GaussianMixtureModel, PCA
26+
27+
kmeans = KMeans(n_clusters=256, random_state=0)
28+
gmm = GaussianMixtureModel(n_components=256, random_state=0)
29+
pca = PCA(n_components=64, whiten=True)
30+
```
31+
32+
`n_clusters` / `n_components` are explicit; everything else is forwarded verbatim to
33+
the wrapped estimator. Call `fit(features)` to train, and check `is_fitted` at any
34+
time. The fitted-only getters (`cluster_centers`, `weights`, `means`, `covariances`,
35+
`n_features_in`, ...) raise `NotFittedError` if you read them before fitting, so you
36+
get a clear error instead of an `AttributeError` from scikit-learn.
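The getter pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pyvisim's code: `NotFittedError`, `_SklearnModelBase`, `ClusteringModelBase`, `KMeans` and `_StandInEstimator` below are hypothetical stand-ins (the last one replaces scikit-learn so the snippet runs on its own). Only the idea comes from the text: a typed property reads the wrapped estimator's trailing-underscore attribute, and checks that the model is fitted first.

```python
import numpy as np


class NotFittedError(Exception):
    """Hypothetical stand-in for the error raised by fitted-only getters."""


class _StandInEstimator:
    """Hypothetical estimator mimicking scikit-learn's `*_` fitted attributes."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Pretend "training": take the first n_clusters rows as centroids.
        self.cluster_centers_ = X[: self.n_clusters].copy()
        self.n_features_in_ = X.shape[1]
        return self


class _SklearnModelBase:
    def __init__(self, estimator):
        self._estimator = estimator

    @property
    def is_fitted(self):
        # scikit-learn estimators set `n_features_in_` during fit; use it as the marker.
        return hasattr(self._estimator, "n_features_in_")

    def _fitted_attr(self, name):
        if not self.is_fitted:
            raise NotFittedError(f"{type(self).__name__} must be fitted before reading {name!r}")
        return getattr(self._estimator, name + "_")

    def fit(self, features):
        self._estimator.fit(features)
        return self

    @property
    def n_features_in(self):
        return self._fitted_attr("n_features_in")


class ClusteringModelBase(_SklearnModelBase):
    @property
    def n_clusters(self):
        return self._estimator.n_clusters


class KMeans(ClusteringModelBase):
    def __init__(self, n_clusters, **params):
        # A real wrapper would forward **params to the estimator; this sketch ignores them.
        super().__init__(_StandInEstimator(n_clusters))

    @property
    def cluster_centers(self):
        return self._fitted_attr("cluster_centers")
```

Reading `KMeans(n_clusters=2).cluster_centers` before `fit` raises the custom error rather than an `AttributeError`, which is the behaviour the paragraph above describes.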
37+
38+
In normal use you don't build these yourself: the encoders create the matching model
39+
for you from the parameters passed to their constructors (see
40+
[encoders/base_encoder.md](../encoders/base_encoder.md)).
41+
42+
## What each model exposes
43+
44+
**`KMeans`**
45+
- `n_clusters` — number of clusters.
46+
- `cluster_centers`: `(n_clusters, n_features)` array of centroid coordinates.
47+
- `predict(features)` — nearest cluster index per row. This is the hard assignment
48+
VLAD uses.
49+
50+
**`GaussianMixtureModel`**
51+
- `n_clusters` — number of mixture components (the `ClusteringModelBase` name for it).
52+
- `weights`, `means`, `covariances` — the GMM parameters the Fisher Vector gradients
53+
are computed from.
54+
- `predict_proba(features)` — posterior probability per component, i.e. the soft
55+
assignment Fisher Vectors use.
56+
- Diagonal covariance only. Asking for any other `covariance_type` raises `ValueError`
57+
up front, because the Fisher Vector math assumes diagonal covariances (and training
58+
is much faster that way).
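To show what `predict_proba` computes and why diagonal covariances keep it cheap, here is a minimal numpy sketch of the posterior for a diagonal-covariance GMM. It illustrates the standard formula and is not pyvisim's or scikit-learn's implementation; the function name `diag_gmm_posteriors` is hypothetical. The `(K, D)` layout of `covariances` matches what scikit-learn stores for `covariance_type="diag"`. Because each covariance is diagonal, a component's log-density is a sum over dimensions and no `D x D` matrix has to be inverted.

```python
import numpy as np


def diag_gmm_posteriors(features, weights, means, covariances):
    """Posterior p(k | x_n) for a GMM with diagonal covariances.

    features: (N, D); weights: (K,); means: (K, D); covariances: (K, D) per-dimension variances.
    Returns an (N, K) array whose rows sum to 1.
    """
    x = np.asarray(features, dtype=float)[:, None, :]     # (N, 1, D)
    w = np.asarray(weights, dtype=float)                   # (K,)
    mu = np.asarray(means, dtype=float)[None, :, :]        # (1, K, D)
    var = np.asarray(covariances, dtype=float)             # (K, D)
    # Squared, variance-scaled distance to each component: a plain sum over dimensions.
    sq = ((x - mu) ** 2 / var[None, :, :]).sum(axis=2)     # (N, K)
    log_norm = -0.5 * np.log(2.0 * np.pi * var).sum(axis=1)  # (K,)
    log_joint = np.log(w)[None, :] + log_norm[None, :] - 0.5 * sq
    # Normalize in log space (subtract the row max) for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)
```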
59+
60+
**`PCA`**
61+
- `n_components` — number of components of the fitted PCA.
62+
- `transform(features)` — projects features onto the principal components.
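For intuition about `transform`, this numpy sketch fits a PCA with an SVD and projects centered features onto the top components, optionally whitening them as the `whiten=True` constructor example does. It assumes the usual center-then-project definition and is not the wrapped scikit-learn code; `fit_pca` and `pca_transform` are hypothetical names.

```python
import numpy as np


def fit_pca(features, n_components):
    """Return (mean, components, explained_variance) for the top n_components."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, vt[:n_components], explained_variance


def pca_transform(features, mean, components, explained_variance, whiten=False):
    """Project features onto the principal components, giving shape (N, n_components)."""
    Z = (np.asarray(features, dtype=float) - mean) @ components.T
    if whiten:
        # Scale each projected component to unit variance.
        Z = Z / np.sqrt(explained_variance)
    return Z
```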
63+
64+
## Adopting a pretrained scikit-learn estimator
65+
66+
There's an internal `_from_sklearn` classmethod used to wrap an already-fitted
67+
estimator loaded from a legacy `KMeansWeights` / `GMMWeights` pickle. It type-checks
68+
the estimator (and, for the GMM, re-validates the diagonal covariance) before adopting
69+
it. You won't call this directly; it backs the deprecated weight-loading path described
70+
in [encoders/weights.md](../encoders/weights.md).

docs/encoders/README.md

Lines changed: 5 additions & 3 deletions
@@ -12,9 +12,11 @@ vectors are used for retrieval, clustering, and classification.
1212

1313
where `K` is the number of clusters and `D` is the local descriptor dimension.
1414

15-
Shared machinery lives in [`ImageEncoderBase`](base_encoder.md), and pretrained
16-
clustering/PCA models are exposed through the enums documented in
17-
[weights.md](weights.md).
15+
Shared machinery lives in [`ImageEncoderBase`](base_encoder.md). Each encoder builds its
16+
aggregation model from the [`pyvisim.clustering`](../clustering/README.md) package using
17+
the parameters you pass at construction, then fits it in `learn`. Trained encoders are
18+
saved and restored with `save_to_disk` / `load_from_disk`; the older pretrained-weight
19+
enums in [weights.md](weights.md) still work but are deprecated.
1820

1921
## VLAD vs Fisher Vector
2022

docs/encoders/base_encoder.md

Lines changed: 32 additions & 9 deletions
@@ -12,19 +12,42 @@ A concrete encoder is the combination of:
1212

1313
1. a **feature extractor** (`FeatureExtractorBase`),
1414
2. an optional **PCA** model,
15-
3. a **clustering model** (`KMeans` for VLAD, `GaussianMixture` for Fisher),
15+
3. a **clustering model** (`KMeans` for VLAD, `GaussianMixtureModel` for Fisher; both
16+
from [`pyvisim.clustering`](../clustering/README.md)),
1617
4. a **similarity function**.
1718

1819
The base class wires these together, validates their dimensions, and provides
19-
`learn`, `encode` (abstract), `generate_encoding_map`, and `similarity_score`.
20+
`learn`, `save_to_disk`/`load_from_disk`, `encode` (abstract), `generate_encoding_map`,
21+
and `similarity_score`.
2022

2123
## Constructing an encoder
2224

23-
Two mutually exclusive ways to supply a clustering model:
25+
Each encoder builds its own clustering model (and optional PCA) from constructor parameters:
2426

25-
- Pass `weights=` (a `KMeansWeights` / `GMMWeights` enum member). The base class loads
26-
the pickled model, and if the weight name contains `PCA` it also loads the matching
27-
PCA model automatically. When `weights` is given, the `clustering_model` and `pca`
28-
arguments are ignored.
29-
- Pass an explicit `clustering_model=` (and optionally `pca=`) that you trained
30-
yourself or loaded elsewhere.
27+
- `VLADEncoder` takes `n_clusters` plus an optional `kmeans_params` dict.
28+
- `FisherVectorEncoder` takes `n_components` plus an optional `gmm_params` dict.
29+
- Both take an optional `pca_params` dict (must include `n_components`) to add a PCA
30+
step. Leave it out and no PCA is applied.
31+
32+
Everything in `kmeans_params` / `gmm_params` / `pca_params` is forwarded verbatim to the
33+
underlying scikit-learn estimators (see the scikit-learn documentation for `KMeans`, `GaussianMixture`, and `PCA`). See
34+
[vlad.md](vlad.md) and [fisher_vector.md](fisher_vector.md) for the per-encoder details.
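The split between the explicit count and the forwarded params dict can be sketched as a small helper. `build_estimator_kwargs` and `build_gmm_kwargs` are hypothetical names, and defaulting `covariance_type` to `"diag"` is an assumption about how the diagonal-only rule is enforced. The behaviour encoded comes from the per-encoder pages: passing the count both directly and inside the params dict raises `ValueError`, and a non-diagonal GMM is rejected.

```python
def build_estimator_kwargs(count_name, count, params=None):
    """Merge an explicit cluster/component count with a forwarded params dict.

    Returns the keyword arguments to hand to the scikit-learn constructor.
    """
    params = dict(params or {})  # copy so the caller's dict is left untouched
    if count_name in params:
        raise ValueError(f"pass {count_name} directly, not inside the params dict")
    return {count_name: count, **params}


def build_gmm_kwargs(n_components, gmm_params=None):
    kwargs = build_estimator_kwargs("n_components", n_components, gmm_params)
    # Assumed: default to diagonal covariances and reject any other type.
    covariance_type = kwargs.setdefault("covariance_type", "diag")
    if covariance_type != "diag":
        raise ValueError("Fisher Vectors require covariance_type='diag'")
    return kwargs
```

Everything not named explicitly passes through unchanged, so any scikit-learn constructor argument (such as `random_state`) works without the encoder having to know about it.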
35+
36+
## Training and persistence
37+
38+
The models start unfitted, so you have to train before encoding:
39+
40+
- `learn(images)` extracts features from the images, fits the configured PCA first (if
41+
any), then fits the clustering model. Dimension checks against the feature extractor
42+
and PCA are deferred until the models are actually fitted.
43+
- `save_to_disk(path)` writes the fitted clustering model, the PCA model, and the
44+
normalization hyperparameters to a versioned `.encoder` file (the `.encoder` suffix is
45+
added if you leave it off). It raises `NotFittedError` if you haven't called `learn`
46+
yet.
47+
- `load_from_disk(path)` rebuilds the encoder from that file. The feature extractor and
48+
similarity function aren't serialized, so you pass them again here (the feature
49+
extractor defaults to `RootSIFT`), and the extractor's output dimension has to match the saved PCA or
50+
clustering model.
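A versioned save/load round trip of this kind can be sketched with `pickle`. The actual `.encoder` format isn't documented here, so this is an assumption about its general shape only: the payload keys, `FORMAT_VERSION`, the version check on load, and the helper names `save_encoder_state` / `load_encoder_state` are all hypothetical. It mirrors the behaviours listed above: the suffix is appended when missing, and saving an unfitted encoder raises `NotFittedError`.

```python
import pickle
from pathlib import Path

FORMAT_VERSION = 1  # hypothetical version tag


class NotFittedError(Exception):
    """Hypothetical stand-in for pyvisim's error type."""


def save_encoder_state(path, clustering_model, pca_model, normalization, is_fitted):
    if not is_fitted:
        raise NotFittedError("call learn() before save_to_disk()")
    path = Path(path)
    if path.suffix != ".encoder":
        # Append the suffix rather than replacing any existing one.
        path = path.with_name(path.name + ".encoder")
    payload = {
        "version": FORMAT_VERSION,
        "clustering_model": clustering_model,
        "pca": pca_model,
        "normalization": normalization,
    }
    with path.open("wb") as fh:
        pickle.dump(payload, fh)
    return path


def load_encoder_state(path):
    with Path(path).open("rb") as fh:
        payload = pickle.load(fh)
    # Assumed: refuse files written by an unknown format version.
    if payload.get("version") != FORMAT_VERSION:
        raise ValueError(f"unsupported .encoder version: {payload.get('version')!r}")
    return payload
```

The feature extractor and similarity function are deliberately absent from the payload, matching the text: they are supplied again at load time.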
51+
52+
This save/load round-trip is the supported way to reuse a trained encoder. The old
53+
`weights=` enum path still works but is deprecated; see [weights.md](weights.md).

docs/encoders/fisher_vector.md

Lines changed: 23 additions & 0 deletions
@@ -7,6 +7,29 @@ The Fisher Vector encodes an image into a vector of shape `(2 * K * D + K,)`, wh
77
optional PCA). The `2 * K * D` term comes from the mean and variance gradients, and
88
the `+ K` term from the mixture-weight gradients.
99

10+
## Constructing one
11+
12+
Fisher Vectors always cluster with a Gaussian Mixture Model, configured through the
13+
encoder:
14+
15+
```python
16+
from pyvisim.encoders import FisherVectorEncoder
17+
18+
fisher = FisherVectorEncoder(
19+
n_components=256, # number of mixture components
20+
gmm_params={"random_state": 0}, # forwarded to sklearn.mixture.GaussianMixture
21+
pca_params={"n_components": 64}, # optional; omit for no PCA
22+
)
23+
fisher.learn(images) # fits the PCA (if any) then the GMM
24+
```
25+
26+
`n_components` is passed directly, not inside `gmm_params` (doing both raises a
27+
`ValueError`). The GMM uses diagonal covariances; passing any other `covariance_type`
28+
in `gmm_params` raises a `ValueError`, since the Fisher Vector math assumes diagonal
29+
covariances. Save a fitted encoder with `fisher.save_to_disk("fisher")` and reload it
30+
with `FisherVectorEncoder.load_from_disk("fisher.encoder")`; see
31+
[base_encoder.md](base_encoder.md).
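To connect the `(2 * K * D + K,)` shape to the GMM parameters, here is a minimal numpy sketch of one common Fisher Vector formulation (gradients with respect to the means, variances, and mixture weights, each scaled by the corresponding Fisher information term). It is not necessarily pyvisim's exact formula, the function name is hypothetical, and any power or L2 normalization is left out. `posteriors` stands for the `(N, K)` soft assignment that `predict_proba` returns, and `weights`, `means`, `covariances` are the diagonal GMM parameters.

```python
import numpy as np


def fisher_vector(features, posteriors, weights, means, covariances):
    """Unnormalized Fisher Vector of shape (2 * K * D + K,).

    features: (N, D) local descriptors; posteriors: (N, K) soft assignments;
    weights: (K,); means, covariances: (K, D) with diagonal variances.
    """
    x = np.asarray(features, dtype=float)
    gamma = np.asarray(posteriors, dtype=float)
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    sigma = np.sqrt(np.asarray(covariances, dtype=float))   # (K, D) standard deviations
    n = x.shape[0]
    # Whitened residual of every descriptor with respect to every component.
    u = (x[:, None, :] - mu[None, :, :]) / sigma[None]        # (N, K, D)

    # Mean and variance gradients: K * D values each (the 2 * K * D term).
    g_mu = np.einsum("nk,nkd->kd", gamma, u) / (n * np.sqrt(w)[:, None])
    g_var = np.einsum("nk,nkd->kd", gamma, u ** 2 - 1.0) / (n * np.sqrt(2.0 * w)[:, None])
    # Mixture-weight gradient: K values (the + K term).
    g_w = (gamma - w[None, :]).sum(axis=0) / (n * np.sqrt(w))

    return np.concatenate([g_mu.ravel(), g_var.ravel(), g_w])
```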
32+
1033
## How `encode` works
1134

1235
For each image:

docs/encoders/vlad.md

Lines changed: 21 additions & 0 deletions
@@ -6,6 +6,27 @@ VLAD (Vector of Locally Aggregated Descriptors) encodes an image into a vector o
66
shape `(K * D,)`, where `K` is the number of KMeans clusters and `D` is the local
77
descriptor dimension (after optional PCA).
88

9+
## Constructing one
10+
11+
VLAD always clusters with K-Means, so you configure that model through the encoder:
12+
13+
```python
14+
from pyvisim.encoders import VLADEncoder
15+
16+
vlad = VLADEncoder(
17+
n_clusters=256, # number of visual words
18+
kmeans_params={"random_state": 0}, # forwarded to sklearn.cluster.KMeans
19+
pca_params={"n_components": 64}, # optional; omit for no PCA
20+
)
21+
vlad.learn(images) # fits the PCA (if any) then K-Means
22+
```
23+
24+
`n_clusters` is passed directly, not inside `kmeans_params` (doing both raises a
25+
`ValueError`). Everything else in `kmeans_params` / `pca_params` is handed straight to
26+
the matching scikit-learn estimator. Once fitted, save with `vlad.save_to_disk("vlad")`
27+
and reload with `VLADEncoder.load_from_disk("vlad.encoder")`; see
28+
[base_encoder.md](base_encoder.md).
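The `(K * D,)` shape follows directly from hard-assignment aggregation. The numpy sketch below shows the textbook VLAD step: assign each descriptor to its nearest centroid (what `KMeans.predict` returns), sum the residuals per cluster, and flatten. The function name is hypothetical, and any power or L2 normalization pyvisim applies afterwards is not reproduced.

```python
import numpy as np


def vlad(features, cluster_centers):
    """Unnormalized VLAD vector of shape (K * D,)."""
    x = np.asarray(features, dtype=float)              # (N, D)
    c = np.asarray(cluster_centers, dtype=float)       # (K, D)
    # Hard assignment: index of the nearest centroid for each descriptor.
    dists = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)   # (N, K)
    labels = dists.argmin(axis=1)
    # Accumulate the residuals x - c_k of the descriptors assigned to each cluster k.
    residuals = np.zeros_like(c)
    np.add.at(residuals, labels, x - c[labels])
    return residuals.ravel()
```

Clusters that receive no descriptors contribute a block of zeros, so the output length stays `K * D` regardless of the image.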
29+
930
## How `encode` works
1031

1132
For each image:

docs/encoders/weights.md

Lines changed: 17 additions & 5 deletions
@@ -2,6 +2,12 @@
22

33
File: [`_base_encoder.py`](../../pyvisim/encoders/_base_encoder.py)
44

5+
> ⚠️ **Deprecated.** Loading pretrained models through the `KMeansWeights` / `GMMWeights`
6+
> enums now emits a `DeprecationWarning` and will be removed in a future release. Train
7+
> an encoder with `learn()` and persist it with `save_to_disk()` / `load_from_disk()`
8+
> instead; see [base_encoder.md](base_encoder.md). The rest of this page documents the
9+
> legacy path while it's still around.
10+
511
Pretrained clustering models are exposed as enums so users can select a model by name
612
instead of handling file paths. Each enum member's value is a path to a pickled
713
scikit-learn model under `pyvisim/res/model_files/`, and the shared `_PretrainedModels`
@@ -21,12 +27,18 @@ each with and without PCA, for example `OXFORD102_K256_ROOTSIFT_PCA`.
2127

2228
A weight name containing `PCA` requires the matching PCA model so descriptors are
2329
reduced before clustering. `_CLUSTERING_TO_PCA_MAPPING` maps each `_PCA` variant to its
24-
clustering weight, and `ImageEncoderBase.__init__` loads the PCA automatically when a
30+
clustering weight, and `_load_pretrained_weights` loads the PCA automatically when a
2531
`PCA` weight is selected. This is why you never reference `_PCA` directly: choosing a
2632
`*_PCA` clustering weight pulls in the correct PCA for you.
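The legacy lookup can be sketched with the standard `enum` and `warnings` modules. The member names and file names below are made up (the real enum values are paths under `pyvisim/res/model_files/`), the `_CLUSTERING_TO_PCA_MAPPING` and `_PCA` shown here are hypothetical stand-ins for the real ones, and `resolve_weights` is not a pyvisim function. The sketch shows two behaviours this page describes: picking a `*_PCA` weight also resolves the matching PCA file, and using the enum path emits a `DeprecationWarning`.

```python
import warnings
from enum import Enum


class KMeansWeights(Enum):
    """Hypothetical members; the real values are paths under pyvisim/res/model_files/."""
    EXAMPLE_K256_ROOTSIFT = "kmeans_k256_rootsift.pkl"
    EXAMPLE_K256_ROOTSIFT_PCA = "kmeans_k256_rootsift_pca.pkl"


class _PCA(Enum):
    EXAMPLE_ROOTSIFT_PCA = "pca_rootsift.pkl"


# Each *_PCA clustering weight points at the PCA it was trained with.
_CLUSTERING_TO_PCA_MAPPING = {
    KMeansWeights.EXAMPLE_K256_ROOTSIFT_PCA: _PCA.EXAMPLE_ROOTSIFT_PCA,
}


def resolve_weights(weights):
    """Return (clustering_path, pca_path_or_None) for a legacy weight enum member."""
    warnings.warn(
        "loading pretrained weights via enums is deprecated; use save_to_disk/load_from_disk",
        DeprecationWarning,
        stacklevel=2,
    )
    if "PCA" in weights.name:
        # A missing mapping entry raises KeyError instead of silently skipping the PCA.
        pca_path = _CLUSTERING_TO_PCA_MAPPING[weights].value
    else:
        pca_path = None
    return weights.value, pca_path
```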
2733

28-
## Adding your own weights
34+
The pickled scikit-learn estimators aren't used raw. Each one is adopted into the
35+
matching [`pyvisim.clustering`](../clustering/README.md) model via its internal
36+
`_from_sklearn` classmethod, which type-checks the estimator (and re-validates the
37+
diagonal covariance for the GMM) before wrapping it.
38+
39+
## Using your own trained model
2940

30-
To use a model you trained yourself, skip the enums and pass the fitted model directly
31-
via the encoder's `kmeans_model` / `gmm_model` (and optional `pca`) arguments. See
32-
[base_encoder.md](base_encoder.md).
41+
Don't reach for the enums for this. Configure the encoder from parameters, call
42+
`learn()` on your images, and save the result with `save_to_disk()`; reload it later
43+
with `load_from_disk()`. That's the supported replacement for the weight enums, and it
44+
round-trips the PCA and normalization settings too. See [base_encoder.md](base_encoder.md).

docs/overview.md

Lines changed: 7 additions & 3 deletions
@@ -14,6 +14,7 @@ pyvisim/
1414
├── _errors.py Custom exceptions
1515
├── eval.py Retrieval metrics (top-k, mAP, accuracy)
1616
├── encoders/ VLAD, Fisher Vector, Pipeline, pretrained weights
17+
├── clustering/ KMeans, GaussianMixtureModel, PCA
1718
├── features/ SIFT, RootSIFT, DeepConvFeature, Lambda
1819
├── datasets/ OxfordFlowerDataset
1920
└── neural_networks/ Siamese network (planned, not yet implemented)
@@ -22,6 +23,8 @@ pyvisim/
2223
Per-area docs:
2324

2425
- [Encoders](encoders/): how images become vectors.
26+
- [Clustering](clustering/): the KMeans, GMM, and PCA models the encoders build their
27+
vocabulary with.
2528
- [Features](features/): how local descriptors are extracted from an image.
2629
- [Neural networks](neural_networks/): planned Siamese network.
2730
- [Dataset](dataset/): the bundled Oxford Flowers dataset class.
@@ -66,9 +69,10 @@ Everything is built on the two abstract base classes in
6669
and `(M, D)` arrays and returning an `(N, M)` matrix can be used. On assignment it
6770
is probed with dummy input; if it does not return the expected shape,
6871
a row-by-row loop is used instead. Default is cosine similarity.
69-
- **Pretrained weights are enums.** `KMeansWeights` and `GMMWeights` are enums whose
70-
values are file paths to pickled scikit-learn models. PCA models are paired to the
71-
PCA weight variants automatically. See [encoders/weights.md](encoders/weights.md).
72+
- **Trained encoders persist to `.encoder` files.** `save_to_disk` / `load_from_disk`
73+
serialize the fitted clustering model, PCA, and normalization settings. This replaces
74+
the deprecated `KMeansWeights` / `GMMWeights` enum loading path, which still works for
75+
now but warns. See [encoders/weights.md](encoders/weights.md).
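The probe-and-fallback behaviour from the similarity-function bullet can be sketched as follows. The helper names (`cosine_similarity_matrix`, `make_similarity`), the probe sizes, and the choice to call the function on one pair of rows at a time in the fallback are assumptions; only the contract comes from the text: a function taking `(N, D)` and `(M, D)` arrays should return an `(N, M)` matrix, and one that doesn't is looped over instead.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Vectorized cosine similarity: (N, D) x (M, D) -> (N, M)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def make_similarity(func, dim=4):
    """Wrap `func` so it always returns an (N, M) matrix.

    Probe with dummy (2, dim) and (3, dim) inputs; if the result is not (2, 3),
    fall back to calling `func` on one pair of rows at a time.
    """
    probe_a, probe_b = np.ones((2, dim)), np.ones((3, dim))
    try:
        vectorized = np.shape(func(probe_a, probe_b)) == (2, 3)
    except Exception:
        vectorized = False
    if vectorized:
        return func

    def looped(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        out = np.empty((a.shape[0], b.shape[0]))
        for i, row_a in enumerate(a):
            for j, row_b in enumerate(b):
                out[i, j] = float(func(row_a, row_b))
        return out

    return looped
```

Probing once at assignment time keeps the fast path free of per-call shape checks, at the cost of one dummy evaluation.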
7276

7377
## Evaluation
7478
