Skip to content

[QA test fix] Adjust image dedup example#1781

Open
huvunvidia wants to merge 3 commits intomainfrom
huvu/image_qa_test
Open

[QA test fix] Adjust image dedup example#1781
huvunvidia wants to merge 3 commits intomainfrom
huvu/image_qa_test

Conversation

@huvunvidia
Copy link
Copy Markdown
Contributor

@huvunvidia huvunvidia commented Apr 9, 2026

Description

Fix QA issue: https://nvbugspro.nvidia.com/bug/6054589

  • Extract download logic into _download_step() to fix PLR0915 lint
  • Restart Ray between steps 2.2 and 2.3 to evict idle CLIP actors and prevent OOM (Bug 1)
  • Reduce ImageReaderStage num_threads 16→4 in the dedup pipeline to prevent SIGSEGV from DALI GC pressure (Bug 2)

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

@huvunvidia huvunvidia requested a review from a team as a code owner April 9, 2026 20:59
@huvunvidia huvunvidia requested review from sarahyurick and removed request for a team April 9, 2026 20:59
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 9, 2026

Greptile Summary

This PR fixes three QA issues in the image deduplication example script: it extracts the download logic into _download_step() to fix a PLR0915 lint error, adds a Ray cluster restart between steps 2.2 and 2.3 to evict idle CLIP embedding actors (~17 GB each) and prevent OOM, and reduces ImageReaderStage.num_threads from 16 to 4 in create_image_deduplication_pipeline to prevent SIGSEGV under DALI GC pressure. All three changes are targeted and well-scoped.

Confidence Score: 5/5

This PR is safe to merge — all changes are targeted bug fixes with no logic errors introduced.

No P0 or P1 findings. The three changes (function extraction, Ray restart, reduced num_threads) are each correct and directly address the stated bugs. Ray lifecycle is handled properly: the new client created on restart is the one stopped at the end of main().

No files require special attention.

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
tutorials/image/getting-started/image_dedup_example.py Extracted download logic into _download_step(), added Ray restart between steps 2.2 and 2.3, and reduced num_threads from 16 to 4 in the dedup pipeline — all changes are correct and match the stated intent.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Start main] --> B[ray_client.start]
    B --> C[_download_step]
    C --> D{skip_download?}
    D -- No --> E[download_webdataset]
    D -- Yes --> F[Use existing dataset]
    E --> G[Step 2.1: Image Embedding Pipeline\nImageReaderStage num_threads=16]
    F --> G
    G --> H[Step 2.2: Semantic Dedup Workflow\nembeddings → removal parquets]
    H --> I[ray_client.stop\nEvict CLIP actors ~17GB each]
    I --> J[ray_client = RayClient\nray_client.start]
    J --> K[Step 2.3: Image Deduplication Pipeline\nImageReaderStage num_threads=4]
    K --> L[ray_client.stop]
    L --> M[End]
Loading

Reviews (3): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile

Huy Vu2 and others added 2 commits April 9, 2026 14:05
- Restart Ray between steps 2.2 and 2.3 to evict idle CLIP actors and prevent OOM (Bug 1)
- Reduce ImageReaderStage num_threads 16→4 in dedup pipeline to prevent SIGSEGV from DALI GC pressure (Bug 2)
- Extract download logic into _download_step() to fix PLR0915 lint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +177 to +178
# Restart Ray between steps 2.2 and 2.3 to evict idle CLIP embedding actors
# (~17 GB each) left over from step 2.1.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fix this issue at the root

@sarahyurick sarahyurick added the r1.2.0 Pick this label for auto cherry-picking into r1.2.0 label Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

r1.2.0 Pick this label for auto cherry-picking into r1.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants