Skip to content

fix: return None from SUM when no values were collected#2878

Open
mdashti wants to merge 4 commits into
quickwit-oss:mainfrom
paradedb:fix/sum-none-on-empty-values
Open

fix: return None from SUM when no values were collected#2878
mdashti wants to merge 4 commits into
quickwit-oss:mainfrom
paradedb:fix/sum-none-on-empty-values

Conversation

@mdashti
Copy link
Copy Markdown
Collaborator

@mdashti mdashti commented Apr 2, 2026

What

Return None from IntermediateSum::finalize() when count == 0.

Why

IntermediateSum::finalize() returned Some(0.0) even when all documents had missing/NULL values, unlike MIN, MAX, and AVG which all return None for count == 0. The 0.0 came from IntermediateStats' default sum initialization. Consumers (like ParadeDB) that map None to SQL NULL were incorrectly getting 0 for SUM on all-NULL groups.

How

Check count == 0 before returning the sum value, returning None instead of Some(0.0).

Tests

  • Verify SUM returns None when all values are missing/NULL
  • Verify SUM still returns correct values for non-empty groups
  • Verify MIN, MAX, AVG behavior unchanged

@mdashti mdashti force-pushed the fix/sum-none-on-empty-values branch from db4b308 to f62c345 Compare April 2, 2026 17:40
@mdashti mdashti self-assigned this Apr 2, 2026
@mdashti mdashti force-pushed the fix/sum-none-on-empty-values branch 2 times, most recently from fd8250a to b29a7be Compare April 2, 2026 17:59
@mdashti mdashti force-pushed the fix/sum-none-on-empty-values branch from b29a7be to 65a9e03 Compare April 2, 2026 18:05
mdashti added a commit to paradedb/tantivy that referenced this pull request Apr 2, 2026
## What
Return `None` from `IntermediateSum::finalize()` when `count == 0`.

## Why
`IntermediateSum::finalize()` returned `Some(0.0)` even when all
documents had missing/NULL values, unlike MIN, MAX, and AVG which all
return `None` for `count == 0`. The `0.0` came from `IntermediateStats`'
default sum initialization. Consumers (like ParadeDB) that map `None` to
SQL `NULL` were incorrectly getting `0` for SUM on all-NULL groups.

Fixes paradedb/paradedb#4621

## How
Check `count == 0` before returning the sum value, returning `None`
instead of `Some(0.0)`.

## Tests
- Verify SUM returns `None` when all values are missing/NULL
- Verify SUM still returns correct values for non-empty groups
- Verify MIN, MAX, AVG behavior unchanged

Upstream PR: quickwit-oss#2878
mdashti added a commit to paradedb/tantivy that referenced this pull request Apr 2, 2026
## What
Return `None` from `IntermediateSum::finalize()` when `count == 0`.

## Why
`IntermediateSum::finalize()` returned `Some(0.0)` even when all
documents had missing/NULL values, unlike MIN, MAX, and AVG which all
return `None` for `count == 0`. The `0.0` came from `IntermediateStats`'
default sum initialization. Consumers (like ParadeDB) that map `None` to
SQL `NULL` were incorrectly getting `0` for SUM on all-NULL groups.

Fixes paradedb/paradedb#4621

## How
Check `count == 0` before returning the sum value, returning `None`
instead of `Some(0.0)`.

## Tests
- Verify SUM returns `None` when all values are missing/NULL
- Verify SUM still returns correct values for non-empty groups
- Verify MIN, MAX, AVG behavior unchanged

Upstream PR: quickwit-oss#2878
IntermediateSum::finalize() returned Some(0.0) even when count==0
(all documents had missing/NULL values). This differs from MIN, MAX,
and AVG which all return None for count==0.

The 0.0 came from IntermediateStats' default sum initialization.
Consumers (like ParadeDB) that map None to SQL NULL were incorrectly
getting 0 for SUM on all-NULL groups.

Fixes paradedb/paradedb#4621
@mdashti mdashti force-pushed the fix/sum-none-on-empty-values branch from 65a9e03 to 32d07c0 Compare April 2, 2026 18:19
/// `IntermediateMin`, `IntermediateMax`, and `IntermediateAvg`.
pub fn finalize(&self) -> Option<f64> {
Some(self.stats.finalize().sum)
let stats = self.stats.finalize();
Copy link
Copy Markdown
Collaborator

@PSeitz PSeitz Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we change sum on the Stats struct to Option instead?
Update: actually that would make de/serialization a bit weird with serde

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the spec. Sum of 0 elements is defined. Why do we return an Option?
What does elastic return?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the struct itself we have some freedom how to express null, only the serialization needs to follow elastic search spec

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the sum of no number is defined. It is 0. Why are we having this discussion? Is this about reproducing a glitch in Elasticsearch?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sum of NULL is NULL in SQL

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmation of the behavior
https://onecompiler.com/postgresql/44jr6em28

Aggregation are not meant to align themselves with SQL. We want elasticsearch behavior by default.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we match elastic search behavior only on the serialized input and output, not on the intermediate formats.

Currently the change is a breaking change, because of

    /// Sum metric result.
    Sum(SingleMetricResult),

/// Single-metric aggregations use this common result structure.
///
/// Main reason to wrap it in value is to match elasticsearch output structure.
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct SingleMetricResult {
    /// The value of the single value metric.
    pub value: Option<f64>,
}

Currently sum is weird (on main), SingleMetricResult is Option, but it's never None.
I think we should fix that one way or the other.

@mdashti Can you check how elasticsearch behaves?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On flipping Stats.sum itself to Option, agree the serde dance isn't worth it. Kept IntermediateStats untouched and branched only inside IntermediateSum::finalize so the broader stats path stays as-is.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked ES on this. Sum on an empty / all-missing bucket comes back as "value": 0, while min/max/avg return "value": null. So strict ES parity does mean Some(0.0) for sum, and you're right that this PR diverges on the wire.

The reason we leaned toward None: ParadeDB's SQL SUM over zero rows is NULL, and Some(0.0) is indistinguishable from a bucket that genuinely sums to zero, so the SQL boundary loses that signal. Hence the asymmetry feels worse than the ES divergence from where we sit.

Re the breaking-change point, fair. SingleMetricResult.value is already Option<f64> for sum but never None on main, so consumers that already match on Option won't blow up. Anything coercing null → 0 to mirror the ES wire format will see a diff though.

Pulled in 4486166 a small doc note on finalize calling out the ES divergence and the SQL motivation, so future readers see the trade-off without rediscovering this thread.

Happy to take this whichever way you want, keep None, revert to Some(0.0) and just document the Option-but-never-None quirk, or gate the None behind something opt-in.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add a none_if_no_match: Option to let users opt-in into a postgres-like behavior if they want. We don't want to differ from ES, but we could add our own extensions. wdyt?

@trinity-1686a Love the opt-in framing. Pushed 4daa35d in that shape.

  • SumAggregation gets a new none_if_no_match: Option<bool> (skipped on serialize when unset, so JSON stays clean for the ES default path).
  • IntermediateSum::finalize still returns Option<f64> internally so the Rust side stays parallel to min/max/avg.
  • The ES-vs-SQL choice now lives at the boundary in IntermediateMetricResult::into_final_metric_result: None is coerced to Some(0.0) by default, and the flag opts out of that coercion. Added AggregationVariants::as_sum() to read it.
  • Two end-to-end tests on an empty index, one for the ES default ("value": 0) and one for none_if_no_match: true ("value": null). Full aggregation:: suite green locally.

So out of the box the wire format is back to matching ES, and pg-like behaviour just sets the flag on its side.

jamessewell pushed a commit to paradedb/tantivy that referenced this pull request Apr 8, 2026
## What
Return `None` from `IntermediateSum::finalize()` when `count == 0`.

## Why
`IntermediateSum::finalize()` returned `Some(0.0)` even when all
documents had missing/NULL values, unlike MIN, MAX, and AVG which all
return `None` for `count == 0`. The `0.0` came from `IntermediateStats`'
default sum initialization. Consumers (like ParadeDB) that map `None` to
SQL `NULL` were incorrectly getting `0` for SUM on all-NULL groups.

Fixes paradedb/paradedb#4621

## How
Check `count == 0` before returning the sum value, returning `None`
instead of `Some(0.0)`.

## Tests
- Verify SUM returns `None` when all values are missing/NULL
- Verify SUM still returns correct values for non-empty groups
- Verify MIN, MAX, AVG behavior unchanged

Upstream PR: quickwit-oss#2878
rebasedming pushed a commit to paradedb/tantivy that referenced this pull request Apr 15, 2026
## What
Return `None` from `IntermediateSum::finalize()` when `count == 0`.

## Why
`IntermediateSum::finalize()` returned `Some(0.0)` even when all
documents had missing/NULL values, unlike MIN, MAX, and AVG which all
return `None` for `count == 0`. The `0.0` came from `IntermediateStats`'
default sum initialization. Consumers (like ParadeDB) that map `None` to
SQL `NULL` were incorrectly getting `0` for SUM on all-NULL groups.

Fixes paradedb/paradedb#4621

## How
Check `count == 0` before returning the sum value, returning `None`
instead of `Some(0.0)`.

## Tests
- Verify SUM returns `None` when all values are missing/NULL
- Verify SUM still returns correct values for non-empty groups
- Verify MIN, MAX, AVG behavior unchanged

Upstream PR: quickwit-oss#2878
Surface the trade-off in the doc comment so future reviewers see why
this differs from ES (which returns "value": 0 for sum over
empty/all-missing buckets) and what consumers (ParadeDB SQL NULL) the
None variant is meant to serve.
Copy link
Copy Markdown
Collaborator Author

@mdashti mdashti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fulmicoton @PSeitz Thanks for the comments.

/// `IntermediateMin`, `IntermediateMax`, and `IntermediateAvg`.
pub fn finalize(&self) -> Option<f64> {
Some(self.stats.finalize().sum)
let stats = self.stats.finalize();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On flipping Stats.sum itself to Option, agree the serde dance isn't worth it. Kept IntermediateStats untouched and branched only inside IntermediateSum::finalize so the broader stats path stays as-is.

/// `IntermediateMin`, `IntermediateMax`, and `IntermediateAvg`.
pub fn finalize(&self) -> Option<f64> {
Some(self.stats.finalize().sum)
let stats = self.stats.finalize();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked ES on this. Sum on an empty / all-missing bucket comes back as "value": 0, while min/max/avg return "value": null. So strict ES parity does mean Some(0.0) for sum, and you're right that this PR diverges on the wire.

The reason we leaned toward None: ParadeDB's SQL SUM over zero rows is NULL, and Some(0.0) is indistinguishable from a bucket that genuinely sums to zero, so the SQL boundary loses that signal. Hence the asymmetry feels worse than the ES divergence from where we sit.

Re the breaking-change point, fair. SingleMetricResult.value is already Option<f64> for sum but never None on main, so consumers that already match on Option won't blow up. Anything coercing null → 0 to mirror the ES wire format will see a diff though.

Pulled in 4486166 a small doc note on finalize calling out the ES divergence and the SQL motivation, so future readers see the trade-off without rediscovering this thread.

Happy to take this whichever way you want, keep None, revert to Some(0.0) and just document the Option-but-never-None quirk, or gate the None behind something opt-in.

@trinity-1686a
Copy link
Copy Markdown
Collaborator

we could add a none_if_no_match: Option<bool> to let users opt-in into a postgres-like behavior if they want. We don't want to differ from ES, but we could add our own extensions. wdyt?

Switch the default serialized output of `sum` on empty / all-missing
buckets back to `"value": 0` to match Elasticsearch, and gate the
SQL-style `"value": null` behavior behind a new
`none_if_no_match: Option<bool>` flag on `SumAggregation`.

`IntermediateSum::finalize` still returns `Option<f64>` internally so
the Rust API stays parallel to min/max/avg, but the ES-vs-SQL choice is
made at the boundary in `IntermediateMetricResult::into_final_metric_result`:
`None` is coerced to `Some(0.0)` unless `none_if_no_match` is set on the
aggregation request.

Adds `AggregationVariants::as_sum()` accessor for that boundary check
and two end-to-end tests covering both the default ES behavior and the
opt-in null behavior on an empty index.
Copy link
Copy Markdown
Collaborator Author

@mdashti mdashti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trinity-1686a @fulmicoton Please take another look.

/// `IntermediateMin`, `IntermediateMax`, and `IntermediateAvg`.
pub fn finalize(&self) -> Option<f64> {
Some(self.stats.finalize().sum)
let stats = self.stats.finalize();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add a none_if_no_match: Option to let users opt-in into a postgres-like behavior if they want. We don't want to differ from ES, but we could add our own extensions. wdyt?

@trinity-1686a Love the opt-in framing. Pushed 4daa35d in that shape.

  • SumAggregation gets a new none_if_no_match: Option<bool> (skipped on serialize when unset, so JSON stays clean for the ES default path).
  • IntermediateSum::finalize still returns Option<f64> internally so the Rust side stays parallel to min/max/avg.
  • The ES-vs-SQL choice now lives at the boundary in IntermediateMetricResult::into_final_metric_result: None is coerced to Some(0.0) by default, and the flag opts out of that coercion. Added AggregationVariants::as_sum() to read it.
  • Two end-to-end tests on an empty index, one for the ES default ("value": 0) and one for none_if_no_match: true ("value": null). Full aggregation:: suite green locally.

So out of the box the wire format is back to matching ES, and pg-like behaviour just sets the flag on its side.

@mdashti mdashti requested review from PSeitz and fulmicoton May 15, 2026 02:44
The flag's purpose is described well enough by "SQL-style consumers";
no need to call out a specific downstream.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants