Add rounding versions of size methods on Count#76
Conversation
Add integer-returning size and size_with_sketch methods that round the floating-point estimates. Update all usages to call the new methods.
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the Count trait to provide both real-valued and integer-returning size estimation methods. The changes rename existing methods to *_real variants and introduce new integer-returning methods that round the floating-point estimates.
- Renamed
size()andsize_with_sketch()to*_realvariants that returnf32 - Added new
size()andsize_with_sketch()methods that returnusizeby rounding the real values - Added debug assertions to validate that floating-point estimates are finite, non-negative, and within
usizerange
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/geo_filters/src/lib.rs | Updated trait definition with new method signatures and added debug assertion helper |
| crates/geo_filters/src/evaluation/simulation.rs | Updated trait implementations to call size_real() instead of size() |
| crates/geo_filters/src/evaluation/hll.rs | Renamed method implementations from size to size_real and size_with_sketch to size_with_sketch_real |
| crates/geo_filters/src/distinct_count.rs | Updated method implementations and test calls to use size_real() |
| crates/geo_filters/src/diff_count.rs | Updated method implementations and test calls to use size_real() |
| crates/geo_filters/src/config.rs | Updated test code to call size_real() instead of size() |
|
Lovely, the #[doc = include_str!("../README.md")]
#[cfg(doctest)]
pub struct ReadmeDocTests;code caught a regression in the README :) |
| } | ||
|
|
||
| fn size(&self) -> f32 { | ||
| fn size_real(&self) -> f32 { |
There was a problem hiding this comment.
why the suffix real?
Isn't a f32 suffix usually used in rust?
There was a problem hiding this comment.
while size_f32 is technical, it is also pretty clear what you will get :)
There was a problem hiding this comment.
I guess I wanted to encode the semantics relating to the use of the function rather than the technical implementation.
I'll just go with size_f32 so we can sidestep the bike shedding :)
Why?
When estimating the size of a set it makes sense that you'd want an unsigned integer number value. I've kept the ones that return a real in case.
How?
Rename size methods to
size_realandsize_with_sketch_real. Add integer-returningsizeandsize_with_sketchmethods that round the floating-point estimates. Update all usages in test to call the new methods. Specifically, there are tests that assert that the estimates returned are a specific real value. It is useful that these stay reals because it's possible that if the hashing function changes then we might end up with the same rounded integer but a different real value and I think its good that we know the hash has changed?Added some debug assertions to panic if the math ever goes awry.
Open Questions
I'm not super in love with the
_realaffix._f32feels a bit technical? Maybe_decimal?_estimate?_exact?_exact_estimate?