Skip to content

Face Quality and Adaptive Density-Aware Face Clustering#1290

Merged
rahulharpal1603 merged 7 commits into
AOSSIE-Org:mainfrom
rohan-pandeyy:feat/adaptive-density-aware-face-clustering
Jun 7, 2026
Merged

Face Quality and Adaptive Density-Aware Face Clustering#1290
rahulharpal1603 merged 7 commits into
AOSSIE-Org:mainfrom
rohan-pandeyy:feat/adaptive-density-aware-face-clustering

Conversation

@rohan-pandeyy

@rohan-pandeyy rohan-pandeyy commented May 25, 2026

Copy link
Copy Markdown
Member

Fixes #1271

Summary

This pull request introduces a comprehensive quality gate for face detection and clustering, making the clustering process more robust by filtering out low-quality faces based on configurable thresholds. It also tracks and reports the number of faces skipped due to these quality checks throughout the pipeline, from detection to clustering and API responses. Additionally, the clustering logic now adapts the DBSCAN eps parameter based on the data distribution for improved clustering accuracy.

Key improvements and features:

Face Quality Filtering

  • Added a face_passes_quality_gate utility in face_quality.py to filter detected faces based on confidence, size, and blur, using configurable thresholds (PICTO_CLUSTERING_CONF_THRESHOLD, PICTO_CLUSTERING_BLUR_THRESHOLD, PICTO_CLUSTERING_MIN_FACE_SIZE). This is now integrated into the face detection pipeline, ensuring only high-quality faces are processed. [1] [2] [3] [4]

Configurability

  • Introduced new clustering and quality thresholds in settings.py (e.g., PICTO_CLUSTERING_EPS, PICTO_CLUSTERING_SIMILARITY_THRESHOLD, etc.) to allow runtime tuning of clustering and filtering behavior.

Clustering Logic Enhancements

  • The clustering function now adaptively estimates the DBSCAN eps parameter based on the data using a new estimate_eps function, leading to more data-driven clustering. [1] [2]
  • All clustering and cluster assignment functions now return the number of faces skipped due to quality or embedding validation, and this is propagated through the pipeline. [1] [2] [3] [4] [5] [6] [7]

API and Schema Updates

  • The /recluster endpoint and corresponding response schema now report the number of faces skipped during clustering, giving users visibility into data quality issues. [1] [2]

Pipeline Integration

  • The image processing pipeline and face detection now accumulate and return the total number of faces skipped due to quality checks, ensuring accurate reporting and traceability. [1] [2] [3]

These changes collectively improve clustering reliability, transparency, and configurability, and provide better diagnostics for skipped/filtered faces.

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • New Features

    • Face detection adds per-face quality checks and skip counting; reclustering reports skipped faces and respects configurable clustering thresholds.
    • Clustering gains adaptive parameter estimation (eps) and improved merge behavior.
    • Global alert UI added and wired to notify when reclustering skips faces.
  • Tests

    • Unit tests for clustering behavior, adaptive eps estimation, and face-quality checks.
  • Docs

    • API schema updated to include skipped-face info in recluster responses.

@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Wondering what really moved? Review this PR in Change Stack to inspect semantic changes, definitions, and references.

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5876e21f-d86e-4fa7-b79a-d2f3d8421873

📥 Commits

Reviewing files that changed from the base of the PR and between 31b5b59 and 188d84f.

📒 Files selected for processing (4)
  • backend/app/database/images.py
  • backend/app/routes/images.py
  • docs/backend/backend_python/openapi.json
  • frontend/src/app/store.ts
✅ Files skipped from review due to trivial changes (2)
  • backend/app/database/images.py
  • backend/app/routes/images.py

Walkthrough

Centralizes clustering config, adds a pre-embedding face quality gate, implements adaptive eps estimation, propagates skipped-face counts through clustering utilities and API, and surfaces skipped counts in the frontend via a global alert.

Changes

Adaptive Clustering and Quality Gating

Layer / File(s) Summary
Clustering Configuration Foundation
backend/app/config/settings.py
Introduces _get_env_float/_get_env_int helpers and module-level PICTO_CLUSTERING_* constants populated from env vars with defaults and bounds.
Face Quality Gate & Detection Integration
backend/app/utils/face_quality.py, backend/app/models/FaceDetector.py, backend/app/utils/images.py
Adds face_passes_quality_gate (confidence, bbox size, Laplacian blur) and integrates it into FaceDetector; detection path now counts and returns faces_skipped. Images utility accumulates skipped counts.
Adaptive EPS and Clustering Utilities
backend/app/utils/face_clusters.py
Adds estimate_eps() using k‑NN distances; updates clustering functions to validate embeddings, track and propagate total_faces_skipped, return (results, total_faces_skipped), and use config merge threshold when needed; adaptive eps falls back to config when insufficient data.
API Response Contract Extension
backend/app/schemas/face_clusters.py, backend/app/routes/face_clusters.py, docs/backend/backend_python/openapi.json
Extends GlobalReclusterData with optional faces_skipped. Route unpacks (result, total_faces_skipped) and populates both clusters_created and faces_skipped in responses; OpenAPI updated.
Frontend Types, Store, and Global Alert
frontend/src/api/api-functions/face_clusters.ts, frontend/src/features/globalAlertSlice.ts, frontend/src/app/store.ts
Adds GlobalReclusterData TS interface and typed API return; introduces globalAlert slice and wires its reducer into the store.
Global Alert UI and Integration
frontend/src/components/GlobalAlert/GlobalAlert.tsx, frontend/src/App.tsx, frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx
Adds GlobalAlert component (auto-dismiss, manual close), mounts it in App, and dispatches showGlobalAlert from Settings page when faces_skipped > 0.
Clustering and Quality Gate Tests
backend/tests/test_face_clusters.py
Adds synthetic embedding generators, mock DB fixtures, and tests verifying clustering regression, adaptive-eps stability/fallback, and quality-gate pass/fail cases.
Misc backend formatting
backend/app/database/images.py, backend/app/routes/images.py
Minor SQL string formatting and message string cleanup with no behavioural change.

Sequence Diagram

sequenceDiagram
  participant FaceDetector
  participant QualityGate as face_passes_quality_gate
  participant EmbeddingSvc as FaceNetEmbedding
  participant EpsEstimator as estimate_eps
  participant DBSCAN
  participant Assigner as cluster_util_cluster_all_face_embeddings
  participant API as trigger_global_reclustering
  participant Client

  FaceDetector->>QualityGate: crop, bbox, conf_score
  QualityGate-->>FaceDetector: pass / fail (increment faces_skipped if fail)
  FaceDetector->>EmbeddingSvc: preprocess face crop -> embedding
  EmbeddingSvc->>EpsEstimator: embeddings (k = min_samples)
  EpsEstimator-->>DBSCAN: eps (or None -> use config)
  DBSCAN->>Assigner: cluster labels
  Assigner-->>API: (clusters_created, total_faces_skipped)
  API-->>Client: GlobalReclusterResponse{clusters_created, faces_skipped}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AOSSIE-Org/PictoPy#771: Overlaps clustering logic and thresholding changes in backend/app/utils/face_clusters.py.
  • AOSSIE-Org/PictoPy#560: Related global reclustering endpoint work; this PR extends response payload to include faces_skipped.
  • AOSSIE-Org/PictoPy#524: Related edits to FaceDetector detection/embedding flow; both PRs change detection-to-embedding behavior.

Suggested labels

Python, TypeScript/JavaScript, Documentation

Suggested reviewers

  • rahulharpal1603

Poem

🐰 I hop through embeddings late at night,
Skipping the blurred and the tiny in sight,
DBSCAN learns the local beat,
Clusters settle tidy and neat,
Alerts chirp gently — count the faces skipped right.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: implementing a face quality gate and adaptive density-aware clustering to improve clustering robustness.
Linked Issues check ✅ Passed All objectives from issue #1271 are implemented: face quality gate filtering [FaceDetector.py, face_quality.py], adaptive eps estimation [face_clusters.py], clustering parameters centralized in settings [settings.py], faces_skipped count propagation [face_clusters.py, images.py, face_clusters.py route, schema], frontend TypeScript typing and UI alert [face_clusters.ts, ApplicationControlsCard.tsx, globalAlertSlice.ts], and regression tests [test_face_clusters.py].
Out of Scope Changes check ✅ Passed Minor formatting/cleanup changes in images.py and routes/images.py are peripheral housekeeping. The GlobalAlert component and Redux slice are reasonable supporting infrastructure for the faces_skipped UI feature. All changes directly support the core clustering quality and adaptive density objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/utils/face_clusters.py (1)

232-233: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix tuple contract on empty-input return path.

At Line 233, cluster_util_cluster_all_face_embeddings returns [], but cluster_util_face_clusters_sync (Line 112) unpacks two values. This raises unpacking errors when there are no faces.

Suggested fix
     if not faces_data:
-        return []
+        return [], 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/utils/face_clusters.py` around lines 232 - 233, The early-return
in cluster_util_cluster_all_face_embeddings currently returns [] but
cluster_util_face_clusters_sync expects two values; change the empty-input
return to match the tuple contract (e.g., return an empty list and an empty
dict) so callers can safely unpack: update
cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is
falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.
🧹 Nitpick comments (1)
backend/tests/test_face_clusters.py (1)

468-614: ⚡ Quick win

Add an empty-dataset regression test for the new tuple return contract.

Please add a test that patches db_get_all_faces_with_cluster_names to [] and asserts cluster_util_cluster_all_face_embeddings() returns ([], 0). This would catch the current unpacking failure path early.

As per coding guidelines "Verify that all critical functionality is covered by tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/test_face_clusters.py` around lines 468 - 614, Add a unit test
that patches db_get_all_faces_with_cluster_names to return an empty list and
asserts cluster_util_cluster_all_face_embeddings() returns ([], 0);
specifically, create a new test method (e.g.,
test_empty_dataset_returns_empty_and_zero) in TestFaceClusteringAlgo, use
`@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set
mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings()
with default args, and assert the returned tuple equals ([], 0) to prevent the
current unpacking failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/config/settings.py`:
- Around line 40-56: The clustering env var parsing (PICTO_CLUSTERING_EPS,
PICTO_CLUSTERING_MIN_SAMPLES, PICTO_CLUSTERING_SIMILARITY_THRESHOLD,
PICTO_CLUSTERING_MERGE_THRESHOLD, PICTO_CLUSTERING_CONF_THRESHOLD,
PICTO_CLUSTERING_BLUR_THRESHOLD, PICTO_CLUSTERING_MIN_FACE_SIZE) needs robust
validation: replace the direct int/float casts with guarded parsing (try/except)
or helper functions that return defaults on parse failure, then clamp numeric
results to safe ranges (e.g., eps/similarity/merge/conf in [0.0,1.0],
min_samples >= 1, blur_threshold >= 0, min_face_size >= 1) and log or raise a
clear configuration error if values are out-of-bounds; update the constants
initialization to use these validators so malformed env vars won't crash startup
or produce invalid clustering behavior.

In `@backend/app/utils/face_quality.py`:
- Around line 27-35: In face_passes_quality_gate(), add an early-return guard
for empty/invalid face crops before the grayscale conversion: check that
face_crop is not None and face_crop.size > 0 (and that shape dimensions are
valid) right before the existing "if len(face_crop.shape) == 3:" block; if the
crop is empty or invalid, return False to avoid calling cv2.cvtColor /
cv2.Laplacian and crashing (this protects the variance = cv2.Laplacian(...) line
that uses blur_threshold).

In `@frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx`:
- Line 45: The alert message currently states that skipped faces were due to
"invalid embeddings" but facesSkipped includes other filters (e.g.,
quality-gated faces); update the text in ApplicationControlsCard (where
facesSkipped is used) to reflect the broader reason (e.g., "were skipped during
clustering due to filtering or invalid embeddings" or "due to filtering (e.g.,
low quality) or invalid embeddings") so the alert is accurate and not
misleading.

---

Outside diff comments:
In `@backend/app/utils/face_clusters.py`:
- Around line 232-233: The early-return in
cluster_util_cluster_all_face_embeddings currently returns [] but
cluster_util_face_clusters_sync expects two values; change the empty-input
return to match the tuple contract (e.g., return an empty list and an empty
dict) so callers can safely unpack: update
cluster_util_cluster_all_face_embeddings to return ([], {}) when faces_data is
falsy, ensuring the shape matches what cluster_util_face_clusters_sync unpacks.

---

Nitpick comments:
In `@backend/tests/test_face_clusters.py`:
- Around line 468-614: Add a unit test that patches
db_get_all_faces_with_cluster_names to return an empty list and asserts
cluster_util_cluster_all_face_embeddings() returns ([], 0); specifically, create
a new test method (e.g., test_empty_dataset_returns_empty_and_zero) in
TestFaceClusteringAlgo, use
`@patch`("app.utils.face_clusters.db_get_all_faces_with_cluster_names") to set
mock_db_get.return_value = [], call cluster_util_cluster_all_face_embeddings()
with default args, and assert the returned tuple equals ([], 0) to prevent the
current unpacking failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07fa7eb3-5305-4d3e-ab5e-e93fb9da28fd

📥 Commits

Reviewing files that changed from the base of the PR and between 0528751 and e11cc68.

📒 Files selected for processing (15)
  • backend/app/config/settings.py
  • backend/app/models/FaceDetector.py
  • backend/app/routes/face_clusters.py
  • backend/app/schemas/face_clusters.py
  • backend/app/utils/face_clusters.py
  • backend/app/utils/face_quality.py
  • backend/app/utils/images.py
  • backend/tests/test_face_clusters.py
  • docs/backend/backend_python/openapi.json
  • frontend/src/App.tsx
  • frontend/src/api/api-functions/face_clusters.ts
  • frontend/src/app/store.ts
  • frontend/src/components/GlobalAlert/GlobalAlert.tsx
  • frontend/src/features/globalAlertSlice.ts
  • frontend/src/pages/SettingsPage/components/ApplicationControlsCard.tsx

Comment thread backend/app/config/settings.py Outdated
Comment thread backend/app/utils/face_quality.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/config/settings.py`:
- Around line 115-117: The PICTO_CLUSTERING_MIN_SAMPLES configuration currently
allows a value of 1 which can reintroduce the chaining regression; update the
call to _get_env_int that defines PICTO_CLUSTERING_MIN_SAMPLES so its min_value
is 2 (i.e., change the min_value argument from 1 to 2) to enforce min_samples >=
2 and prevent the chaining issue.

In `@backend/tests/test_face_clusters.py`:
- Around line 615-627: The test currently uses bbox=(0, 0, 0, 0) so the
min-face-size check triggers before empty-crop handling; change the bbox to a
non-zero, sufficiently large box (for example bbox=(0, 0, 500, 500)) so the
min_face_size gate won't fail first and the call to
face_passes_quality_gate(face_crop=empty_crop, bbox=..., ...) will exercise the
empty-crop branch; keep the other args (conf_score, conf_threshold,
blur_threshold, min_face_size) the same and assert the result is False.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c21ef6ee-8549-43b7-b361-5a8946270f3e

📥 Commits

Reviewing files that changed from the base of the PR and between e11cc68 and 7ad239f.

📒 Files selected for processing (3)
  • backend/app/config/settings.py
  • backend/app/utils/face_quality.py
  • backend/tests/test_face_clusters.py

Comment thread backend/app/config/settings.py
Comment thread backend/tests/test_face_clusters.py
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ This PR has merge conflicts.

Please resolve the merge conflicts before review.

Your PR will only be reviewed by a maintainer after all conflicts have been resolved.

📺 Watch this video to understand why conflicts occur and how to resolve them:
https://www.youtube.com/watch?v=Sqsz1-o7nXk

@rahulharpal1603 rahulharpal1603 merged commit 07a2e78 into AOSSIE-Org:main Jun 7, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adaptive Density-Aware Face Clustering

2 participants