Feat/clustering models #17

Closed
MechaCritter wants to merge 5 commits into main from feat/clustering-models


@MechaCritter

Owner

Moved the PR template to .github/PULL_REQUEST_TEMPLATE.

MechaCritter and others added 5 commits June 13, 2026 00:41
Introduce pyvisim/clustering with KMeans, GaussianMixtureModel and PCA,
models that own the underlying scikit-learn estimator and expose the
attributes the encoders need (cluster_centers, weights, means,
covariances, n_components, n_features_in, ...) through typed getters.

The models take the scikit-learn constructor parameters directly and
are created unfitted; this prepares for removing scikit-learn objects
from the encoder constructors.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
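
The wrapper pattern described in the commit above (a model that owns its estimator, is created unfitted, and exposes fitted attributes through typed getters) might look roughly like the sketch below. This is an illustration only, not the pyvisim implementation: `KMeansSketch` and `_StubEstimator` are hypothetical names, and the stub stands in for a scikit-learn estimator so the snippet runs with the standard library alone.

```python
from typing import Any, List, Optional


class _StubEstimator:
    """Stand-in for a scikit-learn estimator (illustration only)."""

    def __init__(self, n_clusters: int = 8) -> None:
        self.n_clusters = n_clusters

    def fit(self, X: List[List[float]]) -> "_StubEstimator":
        # Pretend the first n_clusters rows are the learned centers.
        self.cluster_centers_ = [list(row) for row in X[: self.n_clusters]]
        self.n_features_in_ = len(X[0])
        return self


class KMeansSketch:
    """Hypothetical wrapper: owns the estimator, exposes typed getters."""

    def __init__(self, **params: Any) -> None:
        # Constructor parameters are passed straight to the estimator,
        # which is created unfitted.
        self._estimator = _StubEstimator(**params)

    @property
    def is_fitted(self) -> bool:
        return hasattr(self._estimator, "cluster_centers_")

    def fit(self, X: List[List[float]]) -> "KMeansSketch":
        self._estimator.fit(X)
        return self

    @property
    def cluster_centers(self) -> List[List[float]]:
        if not self.is_fitted:
            raise RuntimeError("model is not fitted")
        return self._estimator.cluster_centers_

    @property
    def n_features_in(self) -> Optional[int]:
        # None until the estimator has been fitted.
        return getattr(self._estimator, "n_features_in_", None)
```

Callers then read `model.cluster_centers` rather than reaching into the estimator's trailing-underscore attributes.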
…earn objects

Breaking change: VLADEncoder and FisherVectorEncoder no longer accept
scikit-learn estimators (kmeans_model/gmm_model/pca) in their
constructors. VLAD always uses K-Means and Fisher Vectors always use a
GMM, so the encoders now build the matching pyvisim.clustering models
themselves from the parameters passed at initialization:
n_clusters/n_components plus the optional kmeans_params/gmm_params and
pca_params dictionaries, whose entries are forwarded verbatim to the
underlying scikit-learn estimators.

- learn() no longer takes n_clusters/kwargs; it fits the models that
  were configured at initialization. A configured PCA is now applied
  (and fitted first if necessary) before fitting the clustering model;
  previously it was silently reset with a warning.
- All scikit-learn attribute access (cluster_centers_, weights_,
  means_, covariances_, n_features_in_, ...) goes through the
  clustering and PCA model getters.
- Dimension validation is skipped for unfitted models and applies once
  the models are fitted.
- The default RootSIFT feature extractor moved into ImageEncoderBase.
- Loading pretrained KMeansWeights/GMMWeights still works; the loaded
  estimators are adopted by the corresponding pyvisim models.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
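
A rough sketch of the construction and `learn()` flow described in the commit above, assuming the encoder forwards `kmeans_params`/`pca_params` verbatim and fits a configured PCA before the clustering model. `VLADEncoderSketch` and `StubModel` are hypothetical stand-ins, not the pyvisim classes; the stub only records the order of calls.

```python
from typing import Any, Dict, List, Optional


class StubModel:
    """Stand-in for a pyvisim.clustering model (illustration only)."""

    def __init__(self, name: str, log: List[str], **params: Any) -> None:
        self.name = name
        self.params = params
        self.fitted = False
        self._log = log

    def fit(self, X: Any) -> "StubModel":
        self.fitted = True
        self._log.append(f"fit {self.name}")
        return self

    def transform(self, X: Any) -> Any:
        self._log.append(f"transform {self.name}")
        return X


class VLADEncoderSketch:
    """Hypothetical encoder that builds its own models from parameters."""

    def __init__(
        self,
        n_clusters: int,
        kmeans_params: Optional[Dict[str, Any]] = None,
        pca_params: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.log: List[str] = []
        # Dictionary entries are forwarded verbatim to the model.
        self.kmeans = StubModel(
            "kmeans", self.log, n_clusters=n_clusters, **(kmeans_params or {})
        )
        self.pca = (
            StubModel("pca", self.log, **pca_params)
            if pca_params is not None
            else None
        )

    def learn(self, features: Any) -> "VLADEncoderSketch":
        # A configured PCA is fitted first if necessary, then applied,
        # before the clustering model is fitted.
        if self.pca is not None:
            if not self.pca.fitted:
                self.pca.fit(features)
            features = self.pca.transform(features)
        self.kmeans.fit(features)
        return self
```

Note that `learn()` takes only the training features here; the number of clusters and all estimator options are fixed at construction time.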
Encoders can now persist their learned state to a versioned .encoder
file (fitted clustering model, PCA model and normalization
hyperparameters) and be restored from it via the load_from_disk
classmethod. The feature extractor and similarity function are not
serialized and are provided again at load time; dimension validation
runs on restore.

This is the designated replacement for loading pretrained models via
the KMeansWeights/GMMWeights enums.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
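
The idea of a versioned state file from the commit above can be sketched as follows. The actual `.encoder` layout is not shown in this PR, so everything here is an assumption: JSON is used purely for illustration, and `save_state`, `load_state` and `FORMAT_VERSION` are hypothetical names rather than the pyvisim API.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

FORMAT_VERSION = 1  # assumed version number, for illustration


def save_state(path: Union[str, Path], state: Dict[str, Any]) -> None:
    """Write the state together with a format version tag."""
    payload = {"format_version": FORMAT_VERSION, "state": state}
    Path(path).write_text(json.dumps(payload))


def load_state(path: Union[str, Path]) -> Dict[str, Any]:
    """Read the state back, rejecting files with an unknown version."""
    payload = json.loads(Path(path).read_text())
    version = payload.get("format_version")
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported .encoder format version: {version!r}")
    return payload["state"]
```

Checking the version on load lets a later release change the file layout without silently misreading older files.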
Passing the weights enums to the encoder constructors now emits a
DeprecationWarning; the enums and the loading path will be removed in a
future release in favor of save_to_disk()/load_from_disk() with
.encoder files. The enum docstrings carry the same notice.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
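
For reference, emitting the deprecation notice from the commit above would typically use the standard `warnings` module. The sketch below assumes a check on the argument type; `KMeansWeightsSketch`, its `EXAMPLE` member and `resolve_weights` are hypothetical and only illustrate the pattern.

```python
import warnings
from enum import Enum
from typing import Optional


class KMeansWeightsSketch(Enum):
    """Hypothetical stand-in for the deprecated weights enum."""

    EXAMPLE = "example-weights"  # hypothetical member


def resolve_weights(
    weights: Optional[KMeansWeightsSketch],
) -> Optional[KMeansWeightsSketch]:
    """Warn when a weights enum member is passed, then return it unchanged."""
    if isinstance(weights, KMeansWeightsSketch):
        warnings.warn(
            "Loading pretrained weights via the enum is deprecated; "
            "use save_to_disk()/load_from_disk() with .encoder files.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
    return weights
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so users may need `-W default` or a `warnings.simplefilter` call to see it.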
Quickstart now configures the encoder from parameters, calls learn()
and shows save_to_disk/load_from_disk with .encoder files. Document the
kmeans_params/gmm_params/pca_params dictionaries in the encoders README
and mark KMeansWeights/GMMWeights loading as deprecated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@MechaCritter

Owner Author

oops, wrong PR!

