CAGRA Bloom Filter#2236
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR integrates cuco bloom filters into CAGRA search by adding a new ChangesBloom Filter Support for CAGRA Search
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/cpp/plot_filter_benchmark.py (1)
120-153: 💤 Low value
plot_panel_linesappears unused.This function is defined but never called in the script. Consider removing it if not needed, or adding a comment indicating its intended future use.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/cpp/plot_filter_benchmark.py` around lines 120 - 153, The function plot_panel_lines is defined but is never called anywhere in the script. Either remove this function entirely if it is not needed, or if it is intended for future use, add a docstring or comment at the top of the function that clearly explains its intended use case and why it is being retained. This will help clarify whether the function is dead code or work-in-progress that should remain in the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/cpp/src/cagra_filter_benchmark.cu`:
- Around line 146-151: The `valid_ids_device` variable is destroyed when the if
block ends at line 151, but the `bloom.add_async()` call is asynchronous and may
still access this memory after it's freed. To fix this, either move the stream
synchronization call (currently at line 158) inside the if block before the
closing brace to ensure the async operation completes before valid_ids_device is
destroyed, or replace the `bloom.add_async()` call with the synchronous
`bloom.add()` method to avoid the race condition entirely.
---
Nitpick comments:
In `@examples/cpp/plot_filter_benchmark.py`:
- Around line 120-153: The function plot_panel_lines is defined but is never
called anywhere in the script. Either remove this function entirely if it is not
needed, or if it is intended for future use, add a docstring or comment at the
top of the function that clearly explains its intended use case and why it is
being retained. This will help clarify whether the function is dead code or
work-in-progress that should remain in the codebase.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 232cce46-f7f4-454f-98de-3f4904ed45c9
📒 Files selected for processing (3)
examples/cpp/CMakeLists.txtexamples/cpp/plot_filter_benchmark.pyexamples/cpp/src/cagra_filter_benchmark.cu
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/cpp/CMakeLists.txt
| if (!valid_ids_host.empty()) { | ||
| rmm::device_uvector<key_type> valid_ids_device(valid_ids_host.size(), stream); | ||
| raft::copy(valid_ids_device.data(), valid_ids_host.data(), valid_ids_host.size(), stream); | ||
| bloom.add_async( | ||
| valid_ids_device.data(), valid_ids_device.data() + valid_ids_device.size(), stream); | ||
| } |
There was a problem hiding this comment.
Use-after-free: valid_ids_device destroyed before async operation completes.
valid_ids_device is destroyed when the if block ends (line 151), but bloom.add_async is asynchronous. The device memory may be freed while the kernel is still accessing it. The sync_stream at line 158 is outside this block, so it does not protect this memory.
🐛 Proposed fix: move sync inside the if-block
if (!valid_ids_host.empty()) {
rmm::device_uvector<key_type> valid_ids_device(valid_ids_host.size(), stream);
raft::copy(valid_ids_device.data(), valid_ids_host.data(), valid_ids_host.size(), stream);
bloom.add_async(
valid_ids_device.data(), valid_ids_device.data() + valid_ids_device.size(), stream);
+ raft::resource::sync_stream(res);
}Alternatively, use the synchronous bloom.add() instead of add_async().
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (!valid_ids_host.empty()) { | |
| rmm::device_uvector<key_type> valid_ids_device(valid_ids_host.size(), stream); | |
| raft::copy(valid_ids_device.data(), valid_ids_host.data(), valid_ids_host.size(), stream); | |
| bloom.add_async( | |
| valid_ids_device.data(), valid_ids_device.data() + valid_ids_device.size(), stream); | |
| } | |
| if (!valid_ids_host.empty()) { | |
| rmm::device_uvector<key_type> valid_ids_device(valid_ids_host.size(), stream); | |
| raft::copy(valid_ids_device.data(), valid_ids_host.data(), valid_ids_host.size(), stream); | |
| bloom.add_async( | |
| valid_ids_device.data(), valid_ids_device.data() + valid_ids_device.size(), stream); | |
| raft::resource::sync_stream(res); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/cpp/src/cagra_filter_benchmark.cu` around lines 146 - 151, The
`valid_ids_device` variable is destroyed when the if block ends at line 151, but
the `bloom.add_async()` call is asynchronous and may still access this memory
after it's freed. To fix this, either move the stream synchronization call
(currently at line 158) inside the if block before the closing brace to ensure
the async operation completes before valid_ids_device is destroyed, or replace
the `bloom.add_async()` call with the synchronous `bloom.add()` method to avoid
the race condition entirely.
No description provided.