clustering feature in napari deeplabcut #38
sabrinabenas wants to merge 15 commits into DeepLabCut:main from
Conversation
@deruyter92 Isn't this accessible from the main DLC package by opening the napari plugin from within?
Pull request overview
Adds a clustering-based workflow to help identify and refine outlier frames in the napari DeepLabCut plugin, as described in the linked recipe.
Changes:
- Introduces a new clustering implementation (kmeans.py) and wiring in the main widget to create a "cluster" points layer and preview frames/keypoints.
- Adds a new path-parsing helper (find_project_name) used to locate project assets from layer metadata.
- Updates README with a new suggested workflow for detecting outliers.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| src/napari_deeplabcut/napari.yaml | Adds a new (non-standard) manifest section intended for clustering integration. |
| src/napari_deeplabcut/misc.py | Adds find_project_name helper for DLC project path identification. |
| src/napari_deeplabcut/kmeans.py | New clustering logic (PCA + DBSCAN) for pose/keypoint distance features. |
| src/napari_deeplabcut/_widgets.py | Adds clustering UI/buttons, threading helper, and frame preview via matplotlib canvas. |
| src/napari_deeplabcut/_reader.py | Minor edits around stacking behavior (currently left as commented code). |
| README.md | Documents the new clustering/outlier refinement workflow. |
| src/napari_deeplabcut/_writer.py | Minor formatting/line-number correction only. |
| src/napari_deeplabcut/__init__.py | Whitespace-only change. |
Comments suppressed due to low confidence (1)
src/napari_deeplabcut/_widgets.py:26
- The import section now contains multiple duplicate/conflicting imports (e.g., defaultdict, partial, numpy, pandas, MethodType, and the typing imports are repeated; FigureCanvas is imported from both backend_qt5agg and backend_qtagg). This increases import time and makes it ambiguous which FigureCanvas is actually used. Please deduplicate and keep a single, consistent set of imports/backends.
import os
from collections import defaultdict
from functools import partial
import numpy as np
import pandas as pd
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
from types import MethodType
from typing import Optional, Sequence, Union
from napari.layers import Image, Points
from collections import defaultdict, namedtuple
from copy import deepcopy
from datetime import datetime
from functools import partial, cached_property
from math import ceil, log10
import matplotlib.pyplot as plt
import matplotlib.style as mplstyle
import napari
import pandas as pd
from pathlib import Path
from types import MethodType
from typing import Optional, Sequence, Union
from matplotlib.backends.backend_qtagg import FigureCanvas, NavigationToolbar2QT
import numpy as np
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
This module introduces hard dependencies on scipy (pdist) and scikit-learn (DBSCAN, PCA), but the project’s declared install_requires doesn’t include them. Without adding these to the package dependencies, the plugin will fail to import in a clean environment. Please add the dependencies (or guard the imports and provide a clear error) and document the requirement.
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader


def _cluster(data):
    pca = PCA(n_components=2)
    principalComponents = pca.fit_transform(data)

    # putting components in a dataframe for later
    PCA_components = pd.DataFrame(principalComponents)

    dbscan = DBSCAN(eps=9.7, min_samples=20, algorithm='ball_tree', metric='minkowski', leaf_size=90, p=2)

    # fit - perform DBSCAN clustering from features, or distance matrix.
    dbscan = dbscan.fit(PCA_components)
    cluster1 = dbscan.labels_
The file/module is named kmeans.py but the implementation uses DBSCAN (density-based clustering) rather than k-means. This mismatch is confusing for maintenance and discoverability. Consider renaming the module/functions to reflect DBSCAN (or implement actual k-means if that’s the intended algorithm).
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader


def _cluster(data):
    pca = PCA(n_components=2)
    principalComponents = pca.fit_transform(data)
    # putting components in a dataframe for later
    PCA_components = pd.DataFrame(principalComponents)
    dbscan = DBSCAN(eps=9.7, min_samples=20, algorithm='ball_tree', metric='minkowski', leaf_size=90, p=2)
    # fit - perform DBSCAN clustering from features, or distance matrix.
    dbscan = dbscan.fit(PCA_components)
    cluster1 = dbscan.labels_

Suggested change:

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader


def _cluster(data, n_clusters: int = 8):
    pca = PCA(n_components=2)
    principalComponents = pca.fit_transform(data)
    # putting components in a dataframe for later
    PCA_components = pd.DataFrame(principalComponents)
    kmeans = KMeans(n_clusters=n_clusters, random_state=0)
    # fit - perform k-means clustering from features.
    kmeans = kmeans.fit(PCA_components)
    cluster1 = kmeans.labels_
def cluster_data(points_layer):
    df = _conv_layer_to_df(
        points_layer.data, points_layer.metadata, points_layer.properties
    )
    try:
        df = df.drop('single', axis=1, level='individuals')
    except KeyError:
        pass
    df.dropna(inplace=True)
    header = DLCHeader(df.columns)
    try:
        df = df.stack('individuals').droplevel('individuals')
    except KeyError:
        pass
    df.index = ['/'.join(row) for row in df.index]
    xy = df.to_numpy().reshape((-1, len(header.bodyparts), 2))
    # TODO Normalize dists by longest length?
    dists = np.vstack([pdist(data, "euclidean") for data in xy])
    points = np.c_[_cluster(dists)]  # x, y, label
    return points, list(df.index)
cluster_data introduces non-trivial data reshaping and clustering logic but currently has no tests. Please add unit tests for expected shapes/labels (including noise label -1 from DBSCAN) using a small synthetic Points-layer-like input, similar to existing pytest coverage in src/napari_deeplabcut/_tests.
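A shape test for the reshaping step could look like the sketch below. It uses a NumPy-only stand-in for the per-frame `pdist` call so the test stays dependency-light; `_pairwise_dists` is a hypothetical helper, not code from this PR:

```python
import numpy as np


def _pairwise_dists(xy):
    """NumPy-only stand-in for the per-frame pdist step in cluster_data.

    xy has shape (n_frames, n_bodyparts, 2); the result has shape
    (n_frames, n_bodyparts * (n_bodyparts - 1) // 2), i.e. one condensed
    distance vector per frame, in the same pair order as scipy's pdist.
    """
    iu, ju = np.triu_indices(xy.shape[1], k=1)
    return np.linalg.norm(xy[:, iu, :] - xy[:, ju, :], axis=-1)


# Synthetic input: 5 frames, 3 bodyparts -> 3 condensed distances per frame.
xy = np.arange(5 * 3 * 2, dtype=float).reshape(5, 3, 2)
dists = _pairwise_dists(xy)
```

A real test would additionally feed `dists` through the clustering step and assert that the returned labels include DBSCAN's noise label `-1` for deliberately injected outlier frames.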
def find_project_name(s):
    pat = re.compile('.+-.+-\d{4}-\d{1,2}-\d{1,2}')
    for part in Path(s).parts[::-1]:
        if pat.search(part):
            return part
New helper find_project_name is used by the clustering workflow but has no test coverage. Given the path parsing/regex sensitivity across OSes, please add unit tests (e.g., POSIX + Windows style paths, and a case where no match is found) alongside existing test_misc.py coverage.
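A self-contained test sketch, re-implementing the helper with a `PurePath` parameter so both path flavors can be exercised on any OS (the `_path_cls` parameter is an addition for testability, and a raw string avoids the invalid-escape warning in the original pattern):

```python
import re
from pathlib import PurePosixPath, PureWindowsPath


def find_project_name(s, _path_cls=PurePosixPath):
    # Re-implementation of the helper for testing; DLC project directories
    # look like "<task>-<scorer>-YYYY-M-D".
    pat = re.compile(r'.+-.+-\d{4}-\d{1,2}-\d{1,2}')
    for part in _path_cls(s).parts[::-1]:
        if pat.search(part):
            return part
    return None


# POSIX-style path
assert find_project_name('/home/u/proj-me-2023-01-02/videos/a.mp4') == 'proj-me-2023-01-02'
# Windows-style path
assert find_project_name(r'C:\data\proj-me-2023-1-2\labeled-data', PureWindowsPath) == 'proj-me-2023-1-2'
# No match found
assert find_project_name('/tmp/nothing/here') is None
```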
self.viewer.layers[0].visible = True
self.viewer.layers[1].visible = False
self.viewer.dims.set_current_step(0, self.step)
self.viewer.add_image(self._im.get_array(), name='image refine label')
self.viewer.layers.move_selected(0, 2)

def on_click_close_img(self):
    self.viewer.layers.remove('image refine label')
    self.viewer.layers.move_selected(0, 1)
    self.viewer.layers[0].visible = False
    self.viewer.layers[1].visible = True
These handlers assume specific layer ordering (layers[0] is the source Points layer and layers[1] is the cluster layer) and use layers.remove('image refine label'), which likely won’t work because LayersList.remove typically expects a Layer instance (not a name). This is brittle if users have other layers open, or if the new image layer isn’t present. Please store explicit references to the created layers (cluster layer + refine image layer) and show/hide/remove them via those references (or by name indexing like del viewer.layers[name]).
self.viewer.layers[0].visible = True
self.viewer.layers[1].visible = False
self.viewer.dims.set_current_step(0, self.step)
self.viewer.add_image(self._im.get_array(), name='image refine label')
self.viewer.layers.move_selected(0, 2)

def on_click_close_img(self):
    self.viewer.layers.remove('image refine label')
    self.viewer.layers.move_selected(0, 1)
    self.viewer.layers[0].visible = False
    self.viewer.layers[1].visible = True

Suggested change:

# Lazily determine and cache source and cluster layers to avoid relying
# on fixed layer positions in the viewer.
source_layer = getattr(self, "_source_points_layer", None)
cluster_layer = getattr(self, "_cluster_points_layer", None)
if source_layer is None or cluster_layer is None:
    layers_list = list(self.viewer.layers)
    if len(layers_list) < 2:
        # Not enough layers to determine source/cluster; abort safely.
        return
    source_layer = layers_list[0]
    cluster_layer = layers_list[1]
    self._source_points_layer = source_layer
    self._cluster_points_layer = cluster_layer
source_layer.visible = True
cluster_layer.visible = False
self.viewer.dims.set_current_step(0, self.step)
# Store a reference to the refine image layer so it can be removed safely.
refine_layer = self.viewer.add_image(
    self._im.get_array(), name='image refine label'
)
self._refine_image_layer = refine_layer
self.viewer.layers.move_selected(0, 2)

def on_click_close_img(self):
    # Safely remove the refine image layer if it exists.
    refine_layer = getattr(self, "_refine_image_layer", None)
    if refine_layer is not None and refine_layer in self.viewer.layers:
        self.viewer.layers.remove(refine_layer)
    self._refine_image_layer = None
    self.viewer.layers.move_selected(0, 1)
    source_layer = getattr(self, "_source_points_layer", None)
    cluster_layer = getattr(self, "_cluster_points_layer", None)
    if source_layer is not None:
        source_layer.visible = False
    if cluster_layer is not None:
        cluster_layer.visible = True
self.viewer.layers[0].visible = False

self._df = pd.read_hdf(self.viewer.layers[0].source.path)
self._df.index = ['/'.join(row) for row in list(self._df.index)]
The frame-index normalization uses '/'.join(...) on self._df.index. This is not OS-agnostic (Windows paths will use \\ from os.path.join in read_hdf) and will also behave incorrectly if the index values are already strings (it will join characters). Prefer using the existing to_os_dir_sep() helper (or consistently use os.path.join/Path) and only join when the index value is a tuple/MultiIndex entry.
self._df.index = ['/'.join(row) for row in list(self._df.index)]

Suggested change:

# Normalize frame index to OS-appropriate path strings.
self._df.index = [
    to_os_dir_sep(os.path.join(*idx)) if isinstance(idx, tuple) else to_os_dir_sep(idx)
    for idx in self._df.index
]
df.index = ['/'.join(row) for row in df.index]
xy = df.to_numpy().reshape((-1, len(header.bodyparts), 2))
df.index = ['/'.join(row) for row in df.index] assumes each index entry is an iterable of path parts and forces POSIX separators. If the index entries are already strings (common after read_hdf) this will join characters, and on Windows it won’t match the os.path.join paths used elsewhere. Use to_os_dir_sep() / Path normalization and only join when dealing with tuples/MultiIndex entries.
# FIXME Is the following necessary?
if any(s in str(layer) for s in ('cluster', 'refine')):
Filtering inserted layers via any(s in str(layer) ...) is unreliable: str(layer) isn’t a stable API and may match unrelated layers, causing metadata propagation and store setup to be skipped unexpectedly. If this guard is needed, it should check explicit layer attributes (e.g., layer.name against exact names) or use a dedicated flag on layers created by this widget.
# FIXME Is the following necessary?
if any(s in str(layer) for s in ('cluster', 'refine')):

Suggested change:

# Skip auxiliary layers created by this widget (e.g. clustering/refinement results)
layer_name = getattr(layer, "name", "")
if isinstance(layer, Points) and layer_name in ("cluster", "refine"):
display_name: Keypoint controls
kmeans:
  - command: napari-deeplabcut.get_hdf_reader1
    accepts_directories: false
    filename_patterns: ['*.h5']
  - command: napari-deeplabcut.get_folder_parser1
    accepts_directories: true
    filename_patterns: ['*']
\ No newline at end of file
contributions only supports recognized extension points (e.g., commands, readers, writers, widgets). The new kmeans: section is not a valid napari manifest entry and references napari-deeplabcut.get_hdf_reader1/get_folder_parser1, which are not declared under commands (and don’t exist in the codebase). This will likely make the plugin manifest invalid and prevent the plugin from loading. Please remove this section or wire the feature through existing commands/widgets (and add any new commands under contributions.commands).
display_name: Keypoint controls
kmeans:
  - command: napari-deeplabcut.get_hdf_reader1
    accepts_directories: false
    filename_patterns: ['*.h5']
  - command: napari-deeplabcut.get_folder_parser1
    accepts_directories: true
    filename_patterns: ['*']

Suggested change:

display_name: Keypoint controls
from napari_deeplabcut._writer import _conv_layer_to_df
from napari_deeplabcut.misc import DLCHeader
_conv_layer_to_df is imported from napari_deeplabcut._writer, but that function doesn’t exist in _writer.py (only _form_df is defined). This import will raise at runtime and break clustering. Either import and use the existing _form_df (wrapping the layer metadata/properties like other code in _widgets.py), or add the missing conversion function to _writer.py.
New way to detect outlier frames in the napari DeepLabCut plugin by clustering keypoints.
Recipe: https://deeplabcut.github.io/DeepLabCut/docs/recipes/ClusteringNapari.html