Commit 43626cd
[FLINK-39399] Fix integer overflow in HyperLogLogPlusPlus causing APPROX_COUNT_DISTINCT undercount
In HyperLogLogPlusPlus.query(), the expression "1 << mIdx" is an int
shift, so only the low 5 bits of the shift count are used (the count
wraps modulo 32). At high cardinality (~100M+ distinct values), some
registers accumulate values >= 32, so the shift produces incorrect
zInverse contributions and wildly wrong estimates.
Change "1 << mIdx" to "1L << mIdx" to use long shift arithmetic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 463330c commit 43626cd
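The shift behavior described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the Flink code: the class `ShiftOverflowDemo` and the helpers `intShiftTerm` and `longShiftTerm` are hypothetical names, and the `1.0 / (1 << v)` form is an assumption about how a register value feeds the zInverse sum.

```java
public class ShiftOverflowDemo {

    // Hypothetical helper: a register's contribution computed with an int shift.
    // Java uses only the low 5 bits of the shift count for int operands,
    // so a count of 32 behaves like 0 and a count of 33 behaves like 1.
    static double intShiftTerm(int registerValue) {
        return 1.0 / (1 << registerValue);
    }

    // Hypothetical helper: the same contribution with a long shift.
    // For long operands the low 6 bits of the count are used,
    // so counts up to 63 shift as expected.
    static double longShiftTerm(int registerValue) {
        return 1.0 / (1L << registerValue);
    }

    public static void main(String[] args) {
        for (int v : new int[] {5, 32, 33}) {
            System.out.printf(
                    "register=%d intShift=%s longShift=%s%n",
                    v, intShiftTerm(v), longShiftTerm(v));
        }
    }
}
```

For a register value of 5 both helpers return 0.03125. For 32 and 33 the int variant returns 1.0 and 0.5, while the long variant returns 2^-32 and 2^-33, so one affected register can dominate the sum.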
2 files changed
Lines changed: 32 additions & 2 deletions
File tree
- flink-table/flink-table-runtime/src
- main/java/org/apache/flink/table/runtime/functions/aggregate/hyperloglog
- test/java/org/apache/flink/table/runtime/functions/aggregate/hyperloglog
Lines changed: 2 additions & 2 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 4946-4948 | 4946-4948 | unchanged context |
| 4949-4950 | | removed |
| | 4949-4950 | added |
| 4951-4953 | 4951-4953 | unchanged context |
Lines changed: 30 additions & 0 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 159-161 | 159-161 | unchanged context |
| | 162-191 | added |
| 162-164 | 192-194 | unchanged context |