fix(easyocr): unwrap DataParallel to prevent SIGABRT under concurrent load by anandray · Pull Request #1070 · rocketride-org/rocketride-server

anandray · 2026-06-02T16:46:47Z

Summary

EasyOCR initialises its detector and recogniser wrapped in torch.nn.DataParallel, which scatters every inference call across all visible GPUs via parallel_apply() worker threads. On an 8× H200 box this spawns up to 8 CUDA threads per inference request. Under the chaos test's OVERLOAD phase (32 concurrent workers), these threads collide and produce a FATAL crash — tcache_thread_shutdown() SIGABRT from a parallel_apply worker thread:

easyocr/recognition.py: recognizer_predict
easyocr/recognition.py: get_text
easyocr/easyocr.py: recognize / readtext
torch/nn/parallel/parallel_apply.py: parallel_apply   ← crash here
FATAL ERROR: Application has terminated unexpectedly
[14,250MB] Minidump created: /tmp/...dmp

Fix: After Reader() initialises, unwrap DataParallel for both detector and recogniser and move them to the single GPU the model server allocated. Each EasyOCR instance stays on its own device with no cross-GPU scatter.
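The unwrapping step described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `FakeModule`, `FakeDataParallel`, `FakeReader`, and `pin_to_single_gpu` are hypothetical stand-ins so the sketch runs without torch or a GPU. In the real loader the `isinstance` check would presumably be against `torch.nn.DataParallel`, and the reader would be an `easyocr.Reader`.

```python
# Stand-ins (assumptions) so the sketch runs without torch or a GPU.
class FakeModule:
    def __init__(self):
        self.device = 'cuda'

    def to(self, device):
        # Mirrors torch.nn.Module.to(): moves the module and returns self.
        self.device = device
        return self


class FakeDataParallel:
    def __init__(self, module):
        # Like DataParallel, the wrapped model is kept in .module.
        self.module = module


class FakeReader:
    def __init__(self):
        self.detector = FakeDataParallel(FakeModule())
        self.recognizer = FakeDataParallel(FakeModule())


def pin_to_single_gpu(reader, gpu_index, wrapper_type=FakeDataParallel):
    """Replace each wrapped sub-model with its inner module on cuda:{gpu_index}."""
    target = f'cuda:{gpu_index}'
    for attr in ('detector', 'recognizer'):
        module = getattr(reader, attr, None)
        if isinstance(module, wrapper_type):
            # Drop the wrapper and keep the inner module on one device.
            setattr(reader, attr, module.module.to(target))
    return reader


reader = pin_to_single_gpu(FakeReader(), gpu_index=3)
print(type(reader.detector).__name__, reader.detector.device)
```

Once the wrapper is gone, a forward call goes straight to the inner module, so no scatter or `parallel_apply()` step is involved.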

Test plan

./builder model_server:test — 35 passed, 11 deselected
./builder model_server:test-chaos — 1 passed (17 min). The server previously crashed 9× per run; with this fix it stays up for the full chaos duration.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved OCR GPU handling to keep OCR processing components pinned to the selected GPU, preventing unintended replication across devices and improving stability and performance under concurrent workloads.
- Added safer handling and informative logging when GPU pinning isn't applicable to avoid crashes or silent failures.

… load EasyOCR initialises its detector and recogniser wrapped in torch.nn.DataParallel, which scatters every inference call across ALL visible GPUs via parallel_apply() worker threads. On an 8× H200 box this spawns up to 8 CUDA threads per request; under the 32-worker chaos test's OVERLOAD phase they collide and produce a FATAL crash (tcache/SIGABRT from a CUDA kernel thread). After Reader() initialises, unwrap DataParallel for both sub-models and move them to the single GPU that the model server allocated. This keeps each EasyOCR copy on its own device without cross-GPU scatter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-06-02T16:47:02Z

Walkthrough

Pins EasyOCR's internal detector and recognizer modules to a single CUDA device by unwrapping torch.nn.DataParallel and moving the underlying modules to cuda:{gpu_index} when GPU use is enabled and gpu_index >= 0.

Changes

EasyOCR GPU Pinning

Layer / File(s)	Summary
DataParallel unwrapping for GPU device pinning `packages/ai/src/ai/common/models/ocr/easyocr.py`	After reader creation, inspects `detector` and `recognizer`; skips missing attributes, unwraps `torch.nn.DataParallel` by replacing with `.module` and moves the module to `cuda:{gpu_index}`, otherwise logs module type and inferred parameter device.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through code at break of dawn,
Unwrapped the wraps that wandered on,
Pinned detector, nudged recognizer near,
Now GPUs hum without a fear,
Quiet carrots, compute, and brawn.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: unwrapping DataParallel to fix a SIGABRT crash under concurrent load, which is the core objective of the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


github-actions · 2026-06-02T16:47:10Z

No description provided.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/ocr/easyocr.py`:
- Around line 148-152: Add info-level logging around the existing loop over
('detector','recognizer') to surface missing or non-DataParallel attributes: for
each attr, retrieve module = getattr(reader, attr, None); if module is None log
logger.info indicating the reader is missing that expected attribute (include
attr and reader class/name), else if not isinstance(module,
torch.nn.DataParallel) log logger.info that the attribute exists but is not
wrapped in DataParallel (include attr and type(module) and target/gpu_index
context); leave the current DataParallel unwrapping and logger.debug line
unchanged.
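A rough sketch of the diagnostic logging requested above, assuming a stand-in `Wrapper` class in place of `torch.nn.DataParallel` and a hypothetical helper name `classify_attr`. The logger name and message wording are illustrative only and are not taken from the PR.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger('easyocr_pinning_sketch')  # hypothetical logger name


class Wrapper:
    """Stand-in for torch.nn.DataParallel so the sketch runs without torch."""

    def __init__(self, module):
        self.module = module


def classify_attr(reader, attr, gpu_index, wrapper_type=Wrapper):
    """Return 'missing', 'not_wrapped' or 'wrapped'; log the first two at info level."""
    module = getattr(reader, attr, None)
    if module is None:
        logger.info('EasyOCR %s: missing on %s, cannot pin to cuda:%s',
                    attr, type(reader).__name__, gpu_index)
        return 'missing'
    if not isinstance(module, wrapper_type):
        logger.info('EasyOCR %s: not wrapped in DataParallel (type=%s, target=cuda:%s)',
                    attr, type(module).__name__, gpu_index)
        return 'not_wrapped'
    return 'wrapped'


# A reader whose detector is a bare module and whose recognizer is absent.
reader = SimpleNamespace(detector=object())
states = {attr: classify_attr(reader, attr, gpu_index=3)
          for attr in ('detector', 'recognizer')}
print(states)
```

Both fallback paths return without raising, so a change in EasyOCR's internals would show up in the logs rather than as a load failure.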

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c9e4c28-7723-4d89-b1b7-b7a62e7bc69e

📥 Commits

Reviewing files that changed from the base of the PR and between eae941c and 6559e59.

📒 Files selected for processing (1)

packages/ai/src/ai/common/models/ocr/easyocr.py

…butes Log at info level when detector or recognizer is absent from the reader (EasyOCR API change) or present but not wrapped in DataParallel (future version change), so device-pinning failures are diagnosable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/ocr/easyocr.py`:
- Around line 155-158: The non-DataParallel logging path in easyocr.py can raise
StopIteration when calling next(module.parameters()) for parameterless modules;
update the logic in the block that logs "EasyOCR {attr}: not wrapped in
DataParallel" to use a safe probe like next(module.parameters(), None) and if it
returns None set device='unknown' (or infer from module if possible) before
calling logger.info so parameterless modules do not throw during model loading;
adjust the code around the module/device determination where module and attr are
referenced.
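The safe probe suggested above relies on the two-argument form of Python's built-in `next()`, which returns the default instead of raising `StopIteration` when the iterator is empty. A minimal sketch, using hypothetical stand-ins `Param` and `Module` for the torch types and a hypothetical helper name `probe_device`:

```python
class Param:
    """Stand-in for a torch parameter; only the .device attribute matters here."""

    def __init__(self, device):
        self.device = device


class Module:
    """Stand-in for torch.nn.Module; parameters() returns an iterator like the real one."""

    def __init__(self, params):
        self._params = list(params)

    def parameters(self):
        return iter(self._params)


def probe_device(module):
    """Return the device of the first parameter, or 'unknown' for parameterless modules."""
    first = next(module.parameters(), None)  # default avoids StopIteration
    return 'unknown' if first is None else str(first.device)


print(probe_device(Module([Param('cuda:3')])))
print(probe_device(Module([])))

# The single-argument form is what fails on an empty iterator.
try:
    next(Module([]).parameters())
except StopIteration:
    print('bare next() raised StopIteration')
```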

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0bd828ec-7ef7-4f17-a29e-92aedc66887e

📥 Commits

Reviewing files that changed from the base of the PR and between 6559e59 and e2fbdcc.

📒 Files selected for processing (1)

packages/ai/src/ai/common/models/ocr/easyocr.py

…less modules next(module.parameters()) raises StopIteration when the module has no registered parameters — a valid case for some EasyOCR sub-modules. Use the default-value form and guard device access on None. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/ai/src/ai/common/models/ocr/easyocr.py (1)
132-159: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass the allocated CUDA device into easyocr.Reader as well.

The loader unwraps reader.detector and reader.recognizer and moves them to cuda:{gpu_index}, but easyocr.Reader is constructed with gpu=use_gpu (a boolean), so EasyOCR keeps reader.device as the generic CUDA device (typically cuda, which resolves to cuda:0). EasyOCR uses reader.device during inference to place tensors, which can cause device mismatches or silently target the wrong GPU when gpu_index != 0. EasyOCR supports passing a concrete device string to gpu (e.g., gpu='cuda:3').
🔧 Minimal fix
         try:
             reader = easyocr.Reader(
                 languages,
-                gpu=use_gpu,
+                gpu=torch_device if use_gpu else False,
                 verbose=False,
             )
         except Exception as e:
             logger.error(f'Failed to load EasyOCR: {e}')
             raise Exception(f'Failed to load EasyOCR: {e}')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/src/ai/common/models/ocr/easyocr.py` around lines 132 - 159, When
constructing easyocr.Reader, pass the concrete CUDA device string when a
specific GPU index is allocated instead of the boolean use_gpu; i.e., compute
gpu_arg = f'cuda:{gpu_index}' if use_gpu and gpu_index >= 0 else use_gpu and
pass that into easyocr.Reader(...) so reader.device is set to the same device
you later pin detector/recognizer to (symbols: easyocr.Reader, reader, use_gpu,
gpu_index, detector, recognizer, reader.device).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/ai/src/ai/common/models/ocr/easyocr.py`:
- Around line 132-159: When constructing easyocr.Reader, pass the concrete CUDA
device string when a specific GPU index is allocated instead of the boolean
use_gpu; i.e., compute gpu_arg = f'cuda:{gpu_index}' if use_gpu and gpu_index >=
0 else use_gpu and pass that into easyocr.Reader(...) so reader.device is set to
the same device you later pin detector/recognizer to (symbols: easyocr.Reader,
reader, use_gpu, gpu_index, detector, recognizer, reader.device).
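The `gpu_arg` expression in the prompt above can be isolated into a small helper for clarity. This is only a sketch: `resolve_gpu_arg` is a hypothetical name, and the actual call to `easyocr.Reader` is left out so the snippet runs without EasyOCR installed.

```python
from typing import Union


def resolve_gpu_arg(use_gpu: bool, gpu_index: int) -> Union[str, bool]:
    """Return a concrete device string when a GPU index is allocated, else the boolean."""
    if use_gpu and gpu_index >= 0:
        return f'cuda:{gpu_index}'
    return use_gpu


# The result would be passed as the gpu= argument when constructing the reader.
for use_gpu, gpu_index in [(True, 3), (True, -1), (False, 3)]:
    print(use_gpu, gpu_index, '->', resolve_gpu_arg(use_gpu, gpu_index))
```

Passing the same device string to the reader and to the later pinning step keeps `reader.device` and the sub-model devices in agreement.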

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 731a4e10-a401-46c0-acf9-b78b411e5b4a

📥 Commits

Reviewing files that changed from the base of the PR and between e2fbdcc and 47925a4.

📒 Files selected for processing (1)

packages/ai/src/ai/common/models/ocr/easyocr.py

anandray requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners June 2, 2026 16:46

anandray requested review from asclearuc, dsapandora and kwit75 June 2, 2026 16:46

github-actions Bot added the module:ai AI/ML modules label Jun 2, 2026

anandray wants to merge 3 commits into develop from fix/easyocr-datparallel-crash
