Extract DeletionVector logic from PuffinFile #3491
Thanks for your review! Addressed comments.
Thanks for this contribution, @ebyhr. I've approved it, and I'll leave it open for a day for any additional reviews before we merge.
Thanks for splitting this out. The read path looks behavior-preserving and the separation is clean. For the write side, we'd like to coordinate before building on top of this: once it lands we're planning to rebase #3474 (the DV writer, currently Given that, would it make sense to decide now whether the write/serialize path should live on Our preference would be to mirror the read-side split: keep the DV bitmap serialization on
Rationale for this change
PuffinFile handles two tasks: format parsing (magic bytes, footer, blobs) and deletion vector domain logic (bitmap deserialization and PyArrow conversion).
This will become problematic when we introduce support for the NDV
apache-datasketches-theta-v1 blob in the future.
Are these changes tested?
Yes
Are there any user-facing changes?
Yes. Users of the PuffinFile class now need to call DeletionVector for the deletion vector logic.
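To illustrate the kind of split described in the rationale, here is a minimal, self-contained sketch. It is not the code in this PR: `ToyContainer`, `ToyDeletionVector`, `encode_blob`, the `TOY1` magic, and the byte layout are all hypothetical, and the toy payload is a packed list of 64-bit positions rather than a real serialized bitmap. It only shows the idea that the format layer frames blobs while a separate class interprets one blob type.

```python
import struct
from dataclasses import dataclass

TOY_MAGIC = b"TOY1"  # hypothetical magic bytes, not the real file format


@dataclass(frozen=True)
class Blob:
    blob_type: str
    payload: bytes


def encode_blob(blob_type: str, payload: bytes) -> bytes:
    """Frame a blob as: u16 type length, u32 payload length, type, payload."""
    name = blob_type.encode("utf-8")
    return struct.pack("<HI", len(name), len(payload)) + name + payload


class ToyContainer:
    """Format layer: validates magic bytes and splits the file into blobs."""

    def __init__(self, data: bytes) -> None:
        if data[:4] != TOY_MAGIC:
            raise ValueError("bad magic bytes")
        self.blobs: list[Blob] = []
        offset = 4
        while offset < len(data):
            type_len, payload_len = struct.unpack_from("<HI", data, offset)
            offset += 6
            blob_type = data[offset:offset + type_len].decode("utf-8")
            offset += type_len
            self.blobs.append(Blob(blob_type, data[offset:offset + payload_len]))
            offset += payload_len


class ToyDeletionVector:
    """Domain layer: interprets one blob payload as deleted row positions."""

    BLOB_TYPE = "deletion-vector-v1"

    def __init__(self, positions) -> None:
        self.positions = frozenset(positions)

    @classmethod
    def from_blob(cls, blob: Blob) -> "ToyDeletionVector":
        if blob.blob_type != cls.BLOB_TYPE:
            raise ValueError(f"unexpected blob type: {blob.blob_type}")
        if len(blob.payload) % 8 != 0:
            raise ValueError("payload is not a whole number of 64-bit positions")
        count = len(blob.payload) // 8
        return cls(struct.unpack(f"<{count}Q", blob.payload))

    def is_deleted(self, position: int) -> bool:
        return position in self.positions


# Build a toy file holding two blob types, then read it back.
data = (
    TOY_MAGIC
    + encode_blob("deletion-vector-v1", struct.pack("<3Q", 2, 5, 7))
    + encode_blob("apache-datasketches-theta-v1", b"\x00\x01")
)
container = ToyContainer(data)
vectors = [
    ToyDeletionVector.from_blob(blob)
    for blob in container.blobs
    if blob.blob_type == ToyDeletionVector.BLOB_TYPE
]
```

In this sketch the container never inspects blob payloads, so a second blob type can be stored and listed without touching the deletion vector class. The caller chooses which domain class to apply to which blob.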