
clustering feature in napari deeplabcut#38

Draft
sabrinabenas wants to merge 15 commits into DeepLabCut:main from DeepLabCutAIResidency:cluster1

Conversation

@sabrinabenas sabrinabenas commented Aug 31, 2022

Adds a new way to detect outlier frames in the napari-deeplabcut plugin by clustering keypoints.
Recipe: https://deeplabcut.github.io/DeepLabCut/docs/recipes/ClusteringNapari.html

@MMathisLab MMathisLab requested a review from jeylau August 31, 2022 12:07
@sabrinabenas sabrinabenas marked this pull request as draft August 31, 2022 12:15
@jeylau jeylau self-assigned this Sep 6, 2022
@jeylau jeylau added the enhancement New feature or request label Sep 6, 2022
@C-Achard C-Achard added the DLC Related to functionality accessed from within DeepLabCut (e.g. clustering) label Jan 13, 2026
@C-Achard (Collaborator) commented

@deruyter92 Isn't this accessible from the main DLC package by opening the napari plugin from within it?

@C-Achard C-Achard assigned C-Achard and unassigned jeylau Jan 23, 2026
@C-Achard C-Achard removed the request for review from jeylau January 23, 2026 08:44
@C-Achard C-Achard added the new feature New and improved ! label Jan 23, 2026
This was referenced Jan 23, 2026
@C-Achard C-Achard mentioned this pull request Mar 27, 2026
@C-Achard C-Achard requested a review from Copilot March 27, 2026 18:28
Copilot AI (Contributor) left a comment


Pull request overview

Adds a clustering-based workflow to help identify and refine outlier frames in the napari DeepLabCut plugin, as described in the linked recipe.

Changes:

  • Introduces a new clustering implementation (kmeans.py) and wiring in the main widget to create a “cluster” points layer and preview frames/keypoints.
  • Adds a new path-parsing helper (find_project_name) used to locate project assets from layer metadata.
  • Updates README with a new suggested workflow for detecting outliers.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 16 comments.

File Description
src/napari_deeplabcut/napari.yaml Adds a new (non-standard) manifest section intended for clustering integration.
src/napari_deeplabcut/misc.py Adds find_project_name helper for DLC project path identification.
src/napari_deeplabcut/kmeans.py New clustering logic (PCA + DBSCAN) for pose/keypoint distance features.
src/napari_deeplabcut/_widgets.py Adds clustering UI/buttons, threading helper, and frame preview via matplotlib canvas.
src/napari_deeplabcut/_reader.py Minor edits around stacking behavior (currently left as commented code).
README.md Documents the new clustering/outlier refinement workflow.
src/napari_deeplabcut/_writer.py Minor formatting/line-number correction only.
src/napari_deeplabcut/__init__.py Whitespace-only change.
Comments suppressed due to low confidence (1)

src/napari_deeplabcut/_widgets.py:26

  • The import section now contains multiple duplicate/conflicting imports (e.g., defaultdict, partial, numpy, pandas, MethodType, typing imports repeated; and FigureCanvas imported from both backend_qt5agg and backend_qtagg). This increases import time and makes it ambiguous which FigureCanvas is actually used. Please deduplicate and keep a single, consistent set of imports/backends.
import os
from collections import defaultdict
from functools import partial
import numpy as np
import pandas as pd
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
from types import MethodType
from typing import Optional, Sequence, Union
from napari.layers import Image, Points
from collections import defaultdict, namedtuple
from copy import deepcopy
from datetime import datetime
from functools import partial, cached_property
from math import ceil, log10
import matplotlib.pyplot as plt
import matplotlib.style as mplstyle
import napari
import pandas as pd
from pathlib import Path
from types import MethodType
from typing import Optional, Sequence, Union

from matplotlib.backends.backend_qtagg import FigureCanvas, NavigationToolbar2QT

import numpy as np


Comment on lines +1 to +5
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

Copilot AI Mar 27, 2026


This module introduces hard dependencies on scipy (pdist) and scikit-learn (DBSCAN, PCA), but the project’s declared install_requires doesn’t include them. Without adding these to the package dependencies, the plugin will fail to import in a clean environment. Please add the dependencies (or guard the imports and provide a clear error) and document the requirement.
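One way to guard the imports as the comment suggests (a sketch, not the project's actual code) is to catch the `ImportError` once at module load and re-raise with install instructions:

```python
# Sketch: guard optional clustering dependencies so a missing package
# produces an actionable message instead of a bare ImportError at plugin load.
try:
    from scipy.spatial.distance import pdist
    from sklearn.cluster import DBSCAN
    from sklearn.decomposition import PCA
except ImportError as err:
    raise ImportError(
        "The clustering feature requires scipy and scikit-learn; "
        "install them with `pip install scipy scikit-learn`."
    ) from err
```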

Comment on lines +4 to +21
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader


def _cluster(data):
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(data)

# putting components in a dataframe for later
PCA_components = pd.DataFrame(principalComponents)

dbscan=DBSCAN(eps=9.7, min_samples=20, algorithm='ball_tree', metric='minkowski', leaf_size=90, p=2)

# fit - perform DBSCAN clustering from features, or distance matrix.
dbscan = dbscan.fit(PCA_components)
cluster1 = dbscan.labels_

Copilot AI Mar 27, 2026


The file/module is named kmeans.py but the implementation uses DBSCAN (density-based clustering) rather than k-means. This mismatch is confusing for maintenance and discoverability. Consider renaming the module/functions to reflect DBSCAN (or implement actual k-means if that’s the intended algorithm).

Suggested change
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader
def _cluster(data):
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(data)
# putting components in a dataframe for later
PCA_components = pd.DataFrame(principalComponents)
dbscan=DBSCAN(eps=9.7, min_samples=20, algorithm='ball_tree', metric='minkowski', leaf_size=90, p=2)
# fit - perform DBSCAN clustering from features, or distance matrix.
dbscan = dbscan.fit(PCA_components)
cluster1 = dbscan.labels_
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader
def _cluster(data, n_clusters: int = 8):
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(data)
# putting components in a dataframe for later
PCA_components = pd.DataFrame(principalComponents)
kmeans = KMeans(n_clusters=n_clusters, random_state=0)
# fit - perform k-means clustering from features.
kmeans = kmeans.fit(PCA_components)
cluster1 = kmeans.labels_

Comment on lines +26 to +45
def cluster_data(points_layer):
df = _conv_layer_to_df(
points_layer.data, points_layer.metadata, points_layer.properties
)
try:
df = df.drop('single', axis=1, level='individuals')
except KeyError:
pass
df.dropna(inplace=True)
header = DLCHeader(df.columns)
try:
df = df.stack('individuals').droplevel('individuals')
except KeyError:
pass
df.index = ['/'.join(row) for row in df.index]
xy = df.to_numpy().reshape((-1, len(header.bodyparts), 2))
# TODO Normalize dists by longest length?
dists = np.vstack([pdist(data, "euclidean") for data in xy])
points = np.c_[_cluster(dists)] # x, y, label
return points, list(df.index)

Copilot AI Mar 27, 2026


cluster_data introduces non-trivial data reshaping and clustering logic but currently has no tests. Please add unit tests for expected shapes/labels (including noise label -1 from DBSCAN) using a small synthetic Points-layer-like input, similar to existing pytest coverage in src/napari_deeplabcut/_tests.
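A minimal synthetic test of the distance-feature/clustering pipeline could look like the following (a sketch assuming only numpy, scipy, and scikit-learn; the `eps`/`min_samples` values here are illustrative, not the PR's tuned parameters):

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# Synthetic keypoints: 50 frames, 4 bodyparts, (x, y) per bodypart.
rng = np.random.default_rng(0)
xy = rng.normal(size=(50, 4, 2))

# Per-frame pairwise distances, as in cluster_data: C(4, 2) = 6 features.
dists = np.vstack([pdist(frame, "euclidean") for frame in xy])
assert dists.shape == (50, 6)

# PCA to 2D, then DBSCAN; labels are integers, with -1 marking noise frames.
components = PCA(n_components=2).fit_transform(dists)
labels = DBSCAN(eps=0.5, min_samples=5).fit(components).labels_
assert labels.shape == (50,)
assert labels.dtype.kind == "i"
```

The same structure, fed through a small fake Points-layer object, would cover the shape and noise-label expectations the comment asks for.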

Comment on lines +15 to +19
def find_project_name(s):
pat = re.compile('.+-.+-\d{4}-\d{1,2}-\d{1,2}')
for part in Path(s).parts[::-1]:
if pat.search(part):
return part

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New helper find_project_name is used by the clustering workflow but has no test coverage. Given the path parsing/regex sensitivity across OSes, please add unit tests (e.g., POSIX + Windows style paths, and a case where no match is found) alongside existing test_misc.py coverage.
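Such tests might be sketched as follows (the helper is re-implemented locally so the example runs standalone; using `pathlib`'s pure-path classes makes the POSIX and Windows cases explicit without needing both OSes):

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Local copy of the helper under test: finds the path component that
# looks like a DLC project folder, "task-scorer-YYYY-M-D".
_PAT = re.compile(r".+-.+-\d{4}-\d{1,2}-\d{1,2}")

def find_project_name(s, path_cls=PurePosixPath):
    # Walk path parts from deepest to shallowest; return the first match.
    for part in path_cls(s).parts[::-1]:
        if _PAT.search(part):
            return part
    return None  # explicit no-match case

# POSIX-style path
assert find_project_name(
    "/home/u/mouse-alice-2022-8-31/labeled-data/v1"
) == "mouse-alice-2022-8-31"
# Windows-style path
assert find_project_name(
    r"C:\data\mouse-alice-2022-8-31\videos", PureWindowsPath
) == "mouse-alice-2022-8-31"
# No project folder anywhere in the path
assert find_project_name("/tmp/unrelated/frames") is None
```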

Comment on lines +909 to +920
self.viewer.layers[0].visible = True
self.viewer.layers[1].visible = False
self.viewer.dims.set_current_step(0, self.step)
self.viewer.add_image(self._im.get_array(), name='image refine label')
self.viewer.layers.move_selected(0, 2)

def on_click_close_img(self):
self.viewer.layers.remove('image refine label')
self.viewer.layers.move_selected(0, 1)
self.viewer.layers[0].visible = False
self.viewer.layers[1].visible = True


Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These handlers assume specific layer ordering (layers[0] is the source Points layer and layers[1] is the cluster layer) and use layers.remove('image refine label'), which likely won’t work because LayersList.remove typically expects a Layer instance (not a name). This is brittle if users have other layers open, or if the new image layer isn’t present. Please store explicit references to the created layers (cluster layer + refine image layer) and show/hide/remove them via those references (or by name indexing like del viewer.layers[name]).

Suggested change
self.viewer.layers[0].visible = True
self.viewer.layers[1].visible = False
self.viewer.dims.set_current_step(0, self.step)
self.viewer.add_image(self._im.get_array(), name='image refine label')
self.viewer.layers.move_selected(0, 2)
def on_click_close_img(self):
self.viewer.layers.remove('image refine label')
self.viewer.layers.move_selected(0, 1)
self.viewer.layers[0].visible = False
self.viewer.layers[1].visible = True
# Lazily determine and cache source and cluster layers to avoid relying
# on fixed layer positions in the viewer.
source_layer = getattr(self, "_source_points_layer", None)
cluster_layer = getattr(self, "_cluster_points_layer", None)
if source_layer is None or cluster_layer is None:
layers_list = list(self.viewer.layers)
if len(layers_list) < 2:
# Not enough layers to determine source/cluster; abort safely.
return
source_layer = layers_list[0]
cluster_layer = layers_list[1]
self._source_points_layer = source_layer
self._cluster_points_layer = cluster_layer
source_layer.visible = True
cluster_layer.visible = False
self.viewer.dims.set_current_step(0, self.step)
# Store a reference to the refine image layer so it can be removed safely.
refine_layer = self.viewer.add_image(
self._im.get_array(), name='image refine label'
)
self._refine_image_layer = refine_layer
self.viewer.layers.move_selected(0, 2)
def on_click_close_img(self):
# Safely remove the refine image layer if it exists.
refine_layer = getattr(self, "_refine_image_layer", None)
if refine_layer is not None and refine_layer in self.viewer.layers:
self.viewer.layers.remove(refine_layer)
self._refine_image_layer = None
self.viewer.layers.move_selected(0, 1)
source_layer = getattr(self, "_source_points_layer", None)
cluster_layer = getattr(self, "_cluster_points_layer", None)
if source_layer is not None:
source_layer.visible = False
if cluster_layer is not None:
cluster_layer.visible = True

self.viewer.layers[0].visible = False

self._df = pd.read_hdf(self.viewer.layers[0].source.path)
self._df.index = ['/'.join(row) for row in list(self._df.index)]

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frame-index normalization uses '/'.join(...) on self._df.index. This is not OS-agnostic (Windows paths will use \\ from os.path.join in read_hdf) and will also behave incorrectly if the index values are already strings (it will join characters). Prefer using the existing to_os_dir_sep() helper (or consistently use os.path.join/Path) and only join when the index value is a tuple/MultiIndex entry.

Suggested change
self._df.index = ['/'.join(row) for row in list(self._df.index)]
# Normalize frame index to OS-appropriate path strings.
self._df.index = [
to_os_dir_sep(os.path.join(*idx)) if isinstance(idx, tuple) else to_os_dir_sep(idx)
for idx in self._df.index
]

Comment on lines +40 to +41
df.index = ['/'.join(row) for row in df.index]
xy = df.to_numpy().reshape((-1, len(header.bodyparts), 2))

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df.index = ['/'.join(row) for row in df.index] assumes each index entry is an iterable of path parts and forces POSIX separators. If the index entries are already strings (common after read_hdf) this will join characters, and on Windows it won’t match the os.path.join paths used elsewhere. Use to_os_dir_sep() / Path normalization and only join when dealing with tuples/MultiIndex entries.

Comment on lines +1013 to +1014
# FIXME Is the following necessary?
if any(s in str(layer) for s in ('cluster', 'refine')):

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering inserted layers via any(s in str(layer) ...) is unreliable: str(layer) isn’t a stable API and may match unrelated layers, causing metadata propagation and store setup to be skipped unexpectedly. If this guard is needed, it should check explicit layer attributes (e.g., layer.name against exact names) or use a dedicated flag on layers created by this widget.

Suggested change
# FIXME Is the following necessary?
if any(s in str(layer) for s in ('cluster', 'refine')):
# Skip auxiliary layers created by this widget (e.g. clustering/refinement results)
layer_name = getattr(layer, "name", "")
if isinstance(layer, Points) and layer_name in ("cluster", "refine"):

Comment on lines 56 to +63
display_name: Keypoint controls
kmeans:
- command: napari-deeplabcut.get_hdf_reader1
accepts_directories: false
filename_patterns: ['*.h5']
- command: napari-deeplabcut.get_folder_parser1
accepts_directories: true
filename_patterns: ['*'] No newline at end of file

Copilot AI Mar 27, 2026


contributions only supports recognized extension points (e.g., commands, readers, writers, widgets). The new kmeans: section is not a valid napari manifest entry and references napari-deeplabcut.get_hdf_reader1/get_folder_parser1, which are not declared under commands (and don’t exist in the codebase). This will likely make the plugin manifest invalid and prevent the plugin from loading. Please remove this section or wire the feature through existing commands/widgets (and add any new commands under contributions.commands).

Suggested change
display_name: Keypoint controls
kmeans:
- command: napari-deeplabcut.get_hdf_reader1
accepts_directories: false
filename_patterns: ['*.h5']
- command: napari-deeplabcut.get_folder_parser1
accepts_directories: true
filename_patterns: ['*']
display_name: Keypoint controls

Comment on lines +6 to +7
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_conv_layer_to_df is imported from napari_deeplabcut._writer, but that function doesn’t exist in _writer.py (only _form_df is defined). This import will raise at runtime and break clustering. Either import and use the existing _form_df (wrapping the layer metadata/properties like other code in _widgets.py), or add the missing conversion function to _writer.py.
