Skip to content

Add a "hide duplicate images" option to import dialogs#21319

Open
jeffc wants to merge 5 commits into
darktable-org:masterfrom
jeffc:skip-dupes
Open

Add a "hide duplicate images" option to import dialogs#21319
jeffc wants to merge 5 commits into
darktable-org:masterfrom
jeffc:skip-dupes

Conversation

@jeffc

@jeffc jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

I have a camera with two SD card slots (Canon R7), and when I have "save to both cards" enabled, darktable shows both copies of every file in the "import from camera" window. This is frustrating, because I have to then sort by name and manually deselect all of the duplicates, or import them and delete them manually.

This PR adds a hide duplicate images option to the import dialogs, which looks at the timestamp (from the photo metadata) and filesize of every image. If two (or more) match, it considers the one with the alphabetically-first path the "canonical" version and marks the others with a "possible duplicate" flag. Checking the "hide duplicate images" box filters the possible duplicates out of the list. It only compares against other images in the same import job, not against anything already in the darktable library/database.

This doesn't take into account the filename(s) of the images, which is how the "only new images" box works (in addition to timestamp), but relying on a size check felt more reliable to me. I'm happy to update that to match if you'd prefer.

image

note: I used AI tools to write most of this code, since C++ is not my strongest language. I reviewed the changes as carefully as I could and they all look reasonable

@jeffc jeffc changed the title Add a "skip duplicates" option to import dialogs Add a "hide duplicate images" option to import dialogs Jun 15, 2026

@victoryforce victoryforce left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit changes the 'swap' argument of dt_gui_preferences_bool function (which controls the position in the grid and has nothing to do with memory allocation). I've failed to see how this can fix "pre-existing memory leak". Could you please explain why the commit comment mentions a memory leak?

EDIT: Unexpected behavior from GitHub UI, I wrote this comment under a specific commit, and it is published in the general thread. It is about commit 71d8f2a

@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@victoryforce

That's my bad; I made two back-to-back commits (in order to put the leak fix in its own), but added the wrong files to each. I'll add a comment on the offending commit to explain for any other reviewers

The first leak fix is in camera_jobs.c here, just switching a g_list_free() to a g_list_free_full() in order to free the list contents in addition to the list object itself.

The second leak fix is properly in its own commit, here (making a scope-local copy of the list so that we don't inhibit the caller's ability to free the images list)

@jenshannoschwalm

Copy link
Copy Markdown
Collaborator

You should do a proper PR with the proposed fix for 5.6 first i think.

@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@jenshannoschwalm by that, do you mean separate out the memory leak fixes into their own PR? (I'm happy to do that, just making sure I understand you properly)

@jenshannoschwalm

Copy link
Copy Markdown
Collaborator

by that, do you mean separate out the memory leak fixes into their own PR?

Exactly.

  1. If there is a bug you found it should be fixed for 5.6 if still possible
  2. Rebasing on that would make your pr more easy to discuss.

@wpferguson

Copy link
Copy Markdown
Member

I think this is an enhancement and not a bug. It's also somewhat of an edge cause because you have to be saving the same image to both cards and then importing from the camera

The question I have is can one image be corrupt and the other not and still have the same file size. Would check summing be a better test of duplicate (though painfully slow).

@jeffc jeffc requested a review from LebedevRI as a code owner June 15, 2026 18:00
@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

I separated out the memory leaks into their own PR (#21326), but seem to have goofed up this PR with my rebase. Fixing that now

jeffc added 4 commits June 15, 2026 14:10
 - skip duplicate checking when size or timestamp is not set
 - resolve "chains" (all dupes point to the same canonical image)
 - match UI defaults and config defaults ("hide duplicates" starts
   unchecked)
@jeffc

jeffc commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Okay, rebase is untangled and should be good for further review

The question I have is can one image be corrupt and the other not and still have the same file size. Would check summing be a better test of duplicate (though painfully slow).

I actually did implement this at first -- the timestamp+size test marked "possible duplicate" images, then they were still copied from the camera and checksummed. It only discarded images if their "canonical" version's checksum matched perfectly (I had originally wanted to read the "thumbnail preview" for every image and compare the checksums of those, but I realized this would indeed be painfully slow). I later changed to the current behavior because I realized that doing it that way effectively doubled the import time. I suspect (without evidence) that a corrupted image would most likely be due to an incomplete write (and would thus have a different filesize), but even if that's not true I think it's fair to let the user handle that case on their own. The current system for detecting "already imported" doesn't take into account file checksums, only the file name and timestamp, so doing this the performant way (with metadata only) rather than the "catches any edge case" way (with checksums) felt like the right course.

@jeffc

jeffc commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

I think this is an enhancement and not a bug.

Agreed. I don't have permissions to add tags to this PR, but please add whatever you think is best.

It's also somewhat of an edge cause because you have to be saving the same image to both cards and then importing from the camera

I'd argue this isn't an edge case at all; I think it's actually the way the camera is intended to be used. The point of having two memory card slots is for redundancy - write every image to both, and if one card dies you don't lose anything. Importing from the camera is, to me, the "most correct" way to make sure you actually import everything. If there's a write error or other hiccup and most images are written to both but one or two images are only written to one, you have a few options:

  1. Import everything from the camera and deduplicate afterwards
  2. Import from the camera and deduplicate in the process (this PR)
  3. Import from the cards directly (ie, with a card reader) and deduplicate, either afterwards or as part of the import process
  4. Import from only one card and just hope that there isn't anything that's only on the other one.

Right now, options 1 and 3 are the only viable ways to make sure you get everything that you shot (off of both cards). Deduplicating by hand is a pain in the butt, and swapping cards in and out of the reader is mostly automated but still involves extra steps, which can be either tedious or forgotten.

@wpferguson wpferguson added the feature: enhancement current features to improve label Jun 20, 2026
@wpferguson

Copy link
Copy Markdown
Member

The point of having two memory card slots is for redundancy

In my case the point of having 2 cards is twice the storage and automatically switching when I fill the first one. I shoot sports, so lots of images with limited/no breaks to swap cards. When I do get a break I swap out the full card so I have an empty one to swap to when the second one gets filled.

@jeffc

jeffc commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

Ah, that makes sense. I've never had a card go bad, but I'm paranoid enough about it that I always end up saving to both if I can. For your use case, this change wouldn't cause any behavior differences for the import

@wpferguson

Copy link
Copy Markdown
Member

I've never had a card go bad

I didn't want to say that in case the card gods were listening 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature: enhancement current features to improve

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants