Skip to content

Add rounding versions of size methods on Count#76

Merged
itsibitzi merged 5 commits into
mainfrom
sc-20250730-size-as-usize
Jul 30, 2025
Merged

Add rounding versions of size methods on Count#76
itsibitzi merged 5 commits into
mainfrom
sc-20250730-size-as-usize

Conversation

@itsibitzi
Copy link
Copy Markdown
Contributor

@itsibitzi itsibitzi commented Jul 30, 2025

Why?

When estimating the size of a set it makes sense that you'd want an unsigned integer number value. I've kept the ones that return a real in case.

How?

Rename size methods to size_real and size_with_sketch_real. Add integer-returning size and size_with_sketch methods that round the floating-point estimates. Update all usages in test to call the new methods. Specifically, there are tests that assert that the estimates returned are a specific real value. It is useful that these stay reals because it's possible that if the hashing function changes then we might end up with the same rounded integer but a different real value and I think its good that we know the hash has changed?

Added some debug assertions to panic if the math ever goes awry.

Open Questions

I'm not super in love with the _real affix. _f32 feels a bit technical? Maybe _decimal? _estimate? _exact? _exact_estimate?

Add integer-returning size and size_with_sketch methods that round the
floating-point estimates. Update all usages to call the new methods.
Copilot AI review requested due to automatic review settings July 30, 2025 14:10
@itsibitzi itsibitzi requested a review from a team as a code owner July 30, 2025 14:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the Count trait to provide both real-valued and integer-returning size estimation methods. The changes rename existing methods to *_real variants and introduce new integer-returning methods that round the floating-point estimates.

  • Renamed size() and size_with_sketch() to *_real variants that return f32
  • Added new size() and size_with_sketch() methods that return usize by rounding the real values
  • Added debug assertions to validate that floating-point estimates are finite, non-negative, and within usize range

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/geo_filters/src/lib.rs Updated trait definition with new method signatures and added debug assertion helper
crates/geo_filters/src/evaluation/simulation.rs Updated trait implementations to call size_real() instead of size()
crates/geo_filters/src/evaluation/hll.rs Renamed method implementations from size to size_real and size_with_sketch to size_with_sketch_real
crates/geo_filters/src/distinct_count.rs Updated method implementations and test calls to use size_real()
crates/geo_filters/src/diff_count.rs Updated method implementations and test calls to use size_real()
crates/geo_filters/src/config.rs Updated test code to call size_real() instead of size()

Comment thread crates/geo_filters/src/lib.rs
Comment thread crates/geo_filters/src/lib.rs
@itsibitzi
Copy link
Copy Markdown
Contributor Author

Lovely, the

#[doc = include_str!("../README.md")]
#[cfg(doctest)]
pub struct ReadmeDocTests;

code caught a regression in the README :)

}

fn size(&self) -> f32 {
fn size_real(&self) -> f32 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the suffix real?
Isn't a f32 suffix usually used in rust?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while size_f32 is technical, it is also pretty clear what you will get :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I wanted to encode the semantics relating to the use of the function rather than the technical implementation.

I'll just go with size_f32 so we can sidestep the bike shedding :)

@itsibitzi itsibitzi enabled auto-merge July 30, 2025 16:43
@itsibitzi itsibitzi merged commit f6bc74b into main Jul 30, 2025
7 checks passed
@itsibitzi itsibitzi deleted the sc-20250730-size-as-usize branch July 30, 2025 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants