Skip to content

[MINOR][CORE] Fix swapped depth/width formulas in CountMinSketch class doc#56897

Closed
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:minor-countminsketch-doc-fix
Closed

[MINOR][CORE] Fix swapped depth/width formulas in CountMinSketch class doc#56897
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:minor-countminsketch-doc-fix

Conversation

@LuciferYang

@LuciferYang LuciferYang commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR fixes the class-level Javadoc of org.apache.spark.util.sketch.CountMinSketch, which assigned the two table dimensions to the wrong symbols. The doc previously read:

d = ceil(2 / eps)
w = ceil(-log(1 - confidence) / log(2))

but the implementation (CountMinSketchImpl, the (eps, confidence, seed) constructor) actually sets:

this.width = (int) Math.ceil(2 / eps);
this.depth = (int) Math.ceil(-Math.log1p(-confidence) / Math.log(2));

i.e. width is derived from eps and depth from confidence -- the opposite of what the doc stated. The fix swaps the two lines so the doc matches the code:

w = ceil(2 / eps)
d = ceil(-log(1 - confidence) / log(2))

Why are the changes needed?

Doc fix

Does this PR introduce any user-facing change?

No. Documentation-only change; no behavior, serialization, or API impact.

How was this patch tested?

Pass Github Actions

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

…s doc

The class doc had `d` and `w` swapped relative to the implementation, which sets `width = ceil(2 / eps)` and `depth = ceil(-log(1 - confidence) / log(2))`.
@MaxGekk

MaxGekk commented Jun 30, 2026

Copy link
Copy Markdown
Member

LGTM. The corrected formulas match CountMinSketchImpl: width = ceil(2 / eps) and depth = ceil(-log(1 - confidence) / log(2)) (and table = new long[depth][width]). Thanks for the fix!

@MaxGekk

MaxGekk commented Jun 30, 2026

Copy link
Copy Markdown
Member

+1, LGTM. Merging to master/4.x.
Thank you, @LuciferYang.

@MaxGekk MaxGekk closed this in 1409d88 Jun 30, 2026
MaxGekk pushed a commit that referenced this pull request Jun 30, 2026
…s doc

### What changes were proposed in this pull request?

This PR fixes the class-level Javadoc of `org.apache.spark.util.sketch.CountMinSketch`, which assigned the two table dimensions to the wrong symbols. The doc previously read:

```
d = ceil(2 / eps)
w = ceil(-log(1 - confidence) / log(2))
```

but the implementation (`CountMinSketchImpl`, the `(eps, confidence, seed)` constructor) actually sets:

```java
this.width = (int) Math.ceil(2 / eps);
this.depth = (int) Math.ceil(-Math.log1p(-confidence) / Math.log(2));
```

i.e. `width` is derived from `eps` and `depth` from `confidence` -- the opposite of what the doc stated. The fix swaps the two lines so the doc matches the code:

```
w = ceil(2 / eps)
d = ceil(-log(1 - confidence) / log(2))
```

### Why are the changes needed?

Doc fix

### Does this PR introduce _any_ user-facing change?

No. Documentation-only change; no behavior, serialization, or API impact.

### How was this patch tested?

Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes #56897 from LuciferYang/minor-countminsketch-doc-fix.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 1409d88)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants