[QA test fix] Adjust image dedup example #1781
Greptile Summary
This PR fixes three QA issues in the image deduplication example script: it extracts the download logic into `_download_step()`, restarts Ray between steps 2.2 and 2.3, and reduces `num_threads` from 16 to 4 in the dedup pipeline.
Confidence Score: 5/5
This PR is safe to merge: all changes are targeted bug fixes with no logic errors introduced.
No P0 or P1 findings. The three changes (function extraction, Ray restart, reduced `num_threads`) are each correct and directly address the stated bugs. The Ray lifecycle is handled properly: the new client created on restart is the one stopped at the end of `main()`.
No files require special attention.
| Filename | Overview |
|---|---|
| tutorials/image/getting-started/image_dedup_example.py | Extracted download logic into `_download_step()`, added a Ray restart between steps 2.2 and 2.3, and reduced `num_threads` from 16 to 4 in the dedup pipeline. All changes are correct and match the stated intent. |
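The third change lowers `num_threads` for the dedup pipeline's `ImageReaderStage`. Assuming, purely for illustration, that `num_threads` behaves like the size of a worker pool (we have not checked how `ImageReaderStage` uses it internally), the following sketch shows how the pool size caps how many decodes are in flight at once, which in turn bounds how many decode buffers can be alive simultaneously. `peak_concurrency` and `fake_decode` are hypothetical names, and the sleep stands in for real image decoding.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_threads: int, num_tasks: int = 32) -> int:
    """Run num_tasks fake decodes on a pool of num_threads workers and
    return the highest number that were running at the same moment."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def fake_decode(_: int) -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # hypothetical stand-in for decoding one image
        with lock:
            active -= 1

    # max_workers is the only thing limiting how many fake_decode calls overlap.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(fake_decode, range(num_tasks)))
    return peak
```

With 32 tasks, a pool of 4 never has more than 4 decodes running together, whereas a pool of 16 can have up to 16.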
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Start main] --> B[ray_client.start]
    B --> C[_download_step]
    C --> D{skip_download?}
    D -- No --> E[download_webdataset]
    D -- Yes --> F[Use existing dataset]
    E --> G[Step 2.1: Image Embedding Pipeline\nImageReaderStage num_threads=16]
    F --> G
    G --> H[Step 2.2: Semantic Dedup Workflow\nembeddings → removal parquets]
    H --> I[ray_client.stop\nEvict CLIP actors ~17GB each]
    I --> J[ray_client = RayClient\nray_client.start]
    J --> K[Step 2.3: Image Deduplication Pipeline\nImageReaderStage num_threads=4]
    K --> L[ray_client.stop]
    L --> M[End]
```
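The important detail in the flowchart is that `ray_client` is rebound after the restart, so the final `ray_client.stop` acts on the second session rather than the first. The following sketch models only that lifecycle. `StubRayClient` and `run_example` are hypothetical stand-ins (no Ray is involved), modelled on the `start`/`stop` calls shown in the flowchart, and each pipeline step is reduced to a log entry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StubRayClient:
    """Hypothetical stand-in for the example's RayClient; only start/stop are modelled."""

    name: str
    log: list[str]

    def start(self) -> None:
        self.log.append(f"{self.name}:start")

    def stop(self) -> None:
        self.log.append(f"{self.name}:stop")


def run_example(log: list[str]) -> None:
    """Mirror the flowchart: one session for steps 2.1-2.2, a fresh one for 2.3."""
    ray_client = StubRayClient("client1", log)
    ray_client.start()
    log.append("download")          # _download_step
    log.append("step2.1:embed")     # image embedding pipeline
    log.append("step2.2:semantic")  # semantic dedup workflow

    # Stopping the first session is what releases the idle embedding actors.
    ray_client.stop()
    ray_client = StubRayClient("client2", log)  # rebind the name to a fresh session
    ray_client.start()

    log.append("step2.3:dedup")     # image deduplication pipeline
    # Because ray_client was rebound, this stops the second session.
    ray_client.stop()
```

Each session is started and stopped exactly once. One possible hardening, not part of this PR, would be to wrap each session in `try`/`finally` so that the active client is still stopped if a step raises.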
Reviews (3): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..."
- Restart Ray between steps 2.2 and 2.3 to evict idle CLIP actors and prevent OOM (Bug 1)
- Reduce ImageReaderStage num_threads 16→4 in the dedup pipeline to prevent SIGSEGV from DALI GC pressure (Bug 2)
- Extract download logic into _download_step() to fix PLR0915 lint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```python
# Restart Ray between steps 2.2 and 2.3 to evict idle CLIP embedding actors
# (~17 GB each) left over from step 2.1.
```
I think we should fix this issue at the root.
Description
Fix QA issue: https://nvbugspro.nvidia.com/bug/6054589
Usage
```python
# Add snippet demonstrating usage
```

Checklist