Skip to content

pack-files: never drop a whole pack to delete a single object#9813

Open
mr-raj12 wants to merge 4 commits into
borgbackup:masterfrom
mr-raj12:pack-files-cleanup
Open

pack-files: never drop a whole pack to delete a single object#9813
mr-raj12 wants to merge 4 commits into
borgbackup:masterfrom
mr-raj12:pack-files-cleanup

Conversation

@mr-raj12

@mr-raj12 mr-raj12 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

What

This branch started as a comment cleanup and grew to fix the over-deletion it exposed: nothing in the pack-files path should drop a whole pack when only a single object was meant. A pack now holds several objects, so dropping the pack to remove one chunk would take its neighbours with it.

Single-object delete no longer drops a pack

borg debug delete-obj is now a no-op that reports it ignored the deletion, mirroring Repository.delete() (already a no-op with a warning and a TODO). It no longer calls store_delete on the pack. A single object cannot be removed without rewriting its pack, so the debug tool refuses rather than taking the neighbours down with it.

The delete_chunk test damage helper now routes through Repository.compact_pack: it rewrites the pack without the target object (the survivors are copied forward and stay readable) and persists the chunk index the way borg compact does, so a following borg check or borg extract in a separate process sees the change.

Tests

Three check tests that were skipped (refs #8572) are revived: test_missing_file_chunk, test_missing_archive_item_chunk and test_missing_archive_metadata. They were skipped because the old helper dropped the whole pack but left the dropped chunk's index entry in place, an inconsistent state a non-repair check could not report cleanly. With the helper removing the object from both the pack and the index, the state is consistent and the tests pass.

test_debug_put_get_delete_obj is updated for the no-op: it asserts the object is reported as ignored and then re-reads it to prove it survives.

test_basic_operations and test_chunk_index_persisted_on_close get an explicit flush() after their put loop. They were only passing because the put count happened to fill whole packs before close; the flush makes them independent of the pack size.

Comments

Dropped the N=1, N>1 and pack_id == chunk_id comments left over from development. They described stages the code passed through rather than how it works now, where a pack holds several objects. The comments are reworded to describe the general case.

Left deliberately

Repository.delete() stays a no-op with a warning and a TODO. Real single-object removal needs the pack rewrite that compact performs; until that is wired into delete, removal happens at the pack level via borg compact.

The new_pack_id == pack_id comparison in compact_pack is accurate (a defrag that reproduces the pack returns the same sha256 name) and is left unchanged.

These comments were phrased around development milestones (N=1, N>1, pack_id == chunk_id); with multi-object packs in place that framing is stale. Reword for the general case. Comment-only, no behavior change.
@ThomasWaldmann

ThomasWaldmann commented Jun 25, 2026

Copy link
Copy Markdown
Member

debug delete-obj drops the whole pack holding an object,
so it also removes any neighbouring objects in that pack.

That needs to get fixed.

Make it a no-op for now, similar as Repository.delete(), with a warning / "TODO" comment as in Repository.delete.

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.99%. Comparing base (4a28995) to head (e74d7ef).
⚠️ Report is 2 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9813      +/-   ##
==========================================
+ Coverage   84.77%   84.99%   +0.21%     
==========================================
  Files          92       92              
  Lines       15289    15291       +2     
  Branches     2297     2296       -1     
==========================================
+ Hits        12961    12996      +35     
+ Misses       1630     1603      -27     
+ Partials      698      692       -6     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

@ThomasWaldmann ThomasWaldmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this surfaced some serious issues.

never delete a pack when only a single object was meant.

Comment thread src/borg/archiver/debug_cmd.py Outdated
Comment thread src/borg/testsuite/archiver/__init__.py Outdated
Comment thread src/borg/testsuite/repository_test.py
Comment thread src/borg/repository.py Outdated
mr-raj12 added 2 commits June 26, 2026 03:40
…ng the whole pack

Add explicit flush() in two repository tests so they no longer rely on the put count filling whole packs before close()/get().
…hole pack

Rewrite the damage helper to rewrite the pack without the target chunk (neighbours survive) and persist the index, then revive three check tests that were skipped because the old whole-pack drop left an inconsistent index.
@mr-raj12 mr-raj12 requested a review from ThomasWaldmann June 25, 2026 22:37
@mr-raj12 mr-raj12 changed the title comments: drop N=1/N>1 pack-files scaffolding, describe the general case pack-files: never drop a whole pack to delete a single object Jun 25, 2026

@ThomasWaldmann ThomasWaldmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the functionality again, but if nothing speaks against it, that is Repository.delete.

@@ -181,17 +181,23 @@ def open_archive(repo_path, name):


def delete_chunk(repository, id):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess you just implemented Repository.delete here?

It is not very fast, but considering it is rarely ever used, this is not a problem.

Rewrite the object's pack via compact_pack and persist a full chunk index. Wire debug delete-obj and the delete_chunk test helper to it, and drop the store.info HEAD coverage check from compact_pack.

@ThomasWaldmann ThomasWaldmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good cleanup!

Comment on lines 182 to +184
def delete_chunk(repository, id):
"""Remove a single chunk from the repo, leaving the rest of its pack intact (test damage helper).

A pack holds several objects, so dropping the whole pack would take innocent neighbours with it.
Route through Repository.compact_pack, which rewrites the pack without the target and re-points the
survivors, then persist the index the way `borg compact` does (invalidate the cached index before
the store change, write it back after) so a following borg process, e.g. `borg check`, sees it.
"""
entry = repository.chunks.get(id)
if entry is None:
raise Repository.ObjectNotFound(id, repository)
pack_id = entry.pack_id
delete_chunkindex_from_repo(repository) # invalidate the cached index before changing the store
# the index now rebuilds from the packs; keep every object in this pack except the target
keep_ids = {cid for cid, e in repository.chunks.iteritems() if e.pack_id == pack_id}
keep_ids.discard(id)
repository.compact_pack(pack_id, keep_ids=keep_ids, drop_ids={id})
write_chunkindex_to_repo(repository, repository.chunks, incremental=False, force_write=True, delete_other=True)
"""Remove a single chunk from the repo, leaving the rest of its pack intact (test damage helper)."""
repository.delete(id)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a special reason why you kept this wrapper?

Comment thread src/borg/repository.py
# objects; real removal is store_delete at the pack level (compact). For now just check the
# object exists (ObjectNotFound contract), log, and do nothing.
# TODO: implement single-object delete (today removal only happens at the pack level, via compact).
from .cache import write_chunkindex_to_repo

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a circular import issue if you import this on module level?

Comment thread src/borg/repository.py
Comment on lines +859 to +861
# pass. we do not cross-check against the pack's on-disk size: that needs a store.info() HEAD,
# which the stdio rest backend mishandles (it blocks reading a body the HEAD response never
# carries), and the caller already guarantees the two sets are the pack's complete object set.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like it should be in a issue on the borgstore repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants