Feat: No content-based duplicate image detection (same image added multiple times in DB) #1069

@harsh1519

Describe the feature

Currently, PictoPy identifies images only by their file path. If the same image file is:

Copied to another folder

Downloaded multiple times

Renamed

Or present in backups

…each copy is stored as a new, independent image in the database.

This causes:

Duplicate thumbnails

Duplicate metadata

Duplicate face processing

Duplicate tagging

Waste of storage and processing

No way to detect or manage duplicates

Add ScreenShots

Image: the same image, Harsh_Shah.jpg, exists in two folders

🔍 Current Behavior:

Images are uniquely identified by path

Same image in different folders = separate DB rows

No content hash or duplicate detection exists
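
To make the current behavior concrete, here is a minimal sketch using an in-memory SQLite table keyed by path. The table name, columns, and folder paths are hypothetical and only illustrate the idea; the actual PictoPy schema may differ.

```python
import sqlite3

# Hypothetical, simplified schema: the real PictoPy tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (path TEXT PRIMARY KEY, thumbnail TEXT)")

# Two copies of the same file in different folders (illustrative paths).
copies = ["/photos/a/Harsh_Shah.jpg", "/photos/b/Harsh_Shah.jpg"]
for path in copies:
    conn.execute(
        "INSERT INTO images (path, thumbnail) VALUES (?, ?)",
        (path, path + ".thumb"),
    )

# Because the path is the only identity, both copies become separate rows.
row_count = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
print(row_count)
```

Nothing in such a table ties the two rows together, so each copy would be thumbnailed, tagged, and face-processed independently.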

✅ Expected Behavior:

The system should compute a content hash (SHA-256 or similar)

Store it in the DB as image_hash

Allow:

Finding all images with same content

Showing duplicate groups

Letting user decide what to do with them (keep/delete/merge)
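
The expected behavior above could be sketched roughly as follows. This is only an illustration under assumptions: the function names (`compute_image_hash`, `index_images`, `find_duplicate_groups`), the table layout, and the use of SQLite via Python's `sqlite3` module are hypothetical and not taken from the PictoPy codebase; only the `image_hash` column name comes from this issue.

```python
import hashlib
import sqlite3


def compute_image_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large images are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_images(conn, paths):
    """Store each path with its content hash (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " path TEXT PRIMARY KEY,"
        " image_hash TEXT NOT NULL)"
    )
    # An index on image_hash keeps duplicate lookups cheap.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_images_hash ON images (image_hash)"
    )
    for p in paths:
        conn.execute(
            "INSERT OR REPLACE INTO images (path, image_hash) VALUES (?, ?)",
            (str(p), compute_image_hash(p)),
        )
    conn.commit()


def find_duplicate_groups(conn):
    """Return {image_hash: [paths]} for hashes shared by more than one row."""
    rows = conn.execute(
        "SELECT image_hash, path FROM images"
        " WHERE image_hash IN ("
        "   SELECT image_hash FROM images"
        "   GROUP BY image_hash HAVING COUNT(*) > 1)"
        " ORDER BY image_hash, path"
    ).fetchall()
    groups = {}
    for image_hash, path in rows:
        groups.setdefault(image_hash, []).append(path)
    return groups
```

Because the hash covers only the file's bytes, renamed or copied files land in the same group. A byte-level hash would not match re-encoded or resized versions of the same picture, so this sketch covers exact duplicates only. What to do with each group (keep/delete/merge) is left to the user, as described above.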

Record

  • I agree to follow this project's Code of Conduct
  • I want to work on this issue
