repository: add max_size byte limit to PackWriter, OR-ed with max_count#9821
Conversation
max_size=None (default) keeps count-only behaviour. Also bump default max_count 1->3 and flush test_list's last partial pack explicitly.
|
On the default can keep it at 3 as a temporary diagnostic, or revert to 2 before merge, whichever you prefer. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #9821 +/- ##
=======================================
Coverage 84.94% 84.95%
=======================================
Files 92 92
Lines 15291 15297 +6
Branches 2296 2297 +1
=======================================
+ Hits 12989 12995 +6
- Misses 1611 1612 +1
+ Partials 691 690 -1 ☔ View full report in Codecov by Harness. |
There was a problem hiding this comment.
Guess the new feature works, good.
But, as a general comment (not limited to this PR):
I am not happy with the quality of code comments and doc strings.
If the AI is doing a bad job, it is YOUR job to find that and improve that before submitting.
Comments are talking about irrelevant stuff (e.g. things we do NOT do), hacks of the past (things that were temporary hacks and we stopped doing), they are too repetitive, talking about what a function CALLER does instead of clearly saying what the function does, stating the obvious (like that a pack starts empty), ...
|
max_count: keep it at 3 for now. |
113cb38 to
0e20258
Compare
What
PackWriterpreviously decided a pack was full by object count alone (max_count). This adds an optionalmax_sizelimit so a pack can also be closed once it reaches a target byte size. A flush happens as soon as either the count or the size limit is reached, whichever comes first.max_sizedefaults toNone, which leaves the size check off and keeps the current count-only behaviour. A caller opts in by passing a byte limit.Why
Object count says nothing about the resulting file size: the same number of chunks gives a tiny pack for small chunks and a very large one for big chunks. Limiting by byte size gives direct control over how large packs grow on disk, which is the next step toward size-controlled packs.
Also in this PR
max_countis raised to 3, and the explicitmax_count=2at theRepository.open()call site is removed so the repository relies on PackWriter's own default. Running with a higher default exposed tests that only passed because packs happened to tile evenly at a count of 2.test_listnow flushes its last partial pack before listing, instead of depending on the object count dividing evenly into whole packs.Tests
test_pack_writer_flushes_on_max_size: confirms the size limit triggers a flush, with the count kept high so only size can fire.test_pack_writer_max_size_none_is_count_only: confirmsmax_size=Noneleaves the size check off and count remains the only trigger.