🐛 Fix multi-gpu error (AttributeError) for kongnet and nucleus_detector #1074
gozdeg wants to merge 1 commit
      with torch.inference_mode():
          logits = model(imgs)
    -     target_logits = logits[:, model.target_channels, :, :]
    +     target_logits = logits[:, target_channels, :, :]
@shaneahmed _get_model_attr() was defined in engineABC, I don't think we can reach it from here
Can you try this?
    if hasattr(model, "target_channels"):
        target_channels = model.target_channels
    elif hasattr(model.module, "target_channels"):
        target_channels = model.module.target_channels
    else:
        raise AttributeError("model has no attribute 'target_channels'")
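The same lookup could be factored into a small free function so that code outside the engine class can use it. The sketch below is only an illustration: `resolve_model_attr`, `FakeWrapper`, and `FakeModel` are hypothetical names, and `FakeWrapper` is a plain-Python stand-in that mimics how `nn.DataParallel` keeps the real model on `.module`. It also looks through more than one wrapper layer, which the snippet above does not attempt.

```python
class FakeModel:
    """Hypothetical model exposing the attribute that inference needs."""

    target_channels = [0, 2]


class FakeWrapper:
    """Hypothetical stand-in for nn.DataParallel: the real model lives on `.module`."""

    def __init__(self, module):
        self.module = module


def resolve_model_attr(model, name, max_depth=8):
    """Return attribute `name` from `model`, looking through `.module` wrappers.

    `max_depth` bounds the walk so a self-referencing `.module` cannot loop forever.
    """
    current = model
    for _ in range(max_depth):
        if current is None:
            break
        if hasattr(current, name):
            return getattr(current, name)
        # Step one wrapper layer down; None ends the walk.
        current = getattr(current, "module", None)
    msg = f"{type(model).__name__} (or its wrapped module) has no attribute {name!r}"
    raise AttributeError(msg)


plain = FakeModel()
wrapped = FakeWrapper(plain)
channels_plain = resolve_model_attr(plain, "target_channels")
channels_wrapped = resolve_model_attr(wrapped, "target_channels")
```

Raising an explicit `AttributeError` that names the missing attribute keeps the failure readable when neither the wrapper nor the wrapped model defines it.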
Pull request overview
This PR addresses a multi-GPU crash in TIAToolbox nucleus detection paths by ensuring model attributes are read safely when the model is wrapped by nn.DataParallel / DistributedDataParallel.
Changes:
- Update `NucleusDetector.post_process_wsi` to read `min_distance` and `tile_shape` via `_get_model_attr()` instead of directly from `self.model`.
- Update `KongNet.infer_batch` to resolve `target_channels` when `model` is wrapped, avoiding `AttributeError` under multi-GPU wrappers.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tiatoolbox/models/engine/nucleus_detector.py` | Uses the existing unwrapping helper to access model attributes under DP/DDP. |
| `tiatoolbox/models/architecture/kongnet.py` | Adds wrapper-aware access to `target_channels` so inference works under DP/DDP. |
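As a rough illustration of the `nucleus_detector.py` change, the sketch below shows the "keyword argument first, model attribute as fallback" pattern with a wrapper-aware lookup. Everything here is an assumption for demonstration: `DetectorSketch`, `FakeModel`, `FakeWrapper`, the default values, and the body of `_get_model_attr` are hypothetical and are not taken from the TIAToolbox source, whose real helper is not shown in this PR.

```python
class FakeModel:
    """Hypothetical model carrying post-processing defaults (values are made up)."""

    min_distance = 5
    tile_shape = (2048, 2048)


class FakeWrapper:
    """Hypothetical stand-in for nn.DataParallel: the real model lives on `.module`."""

    def __init__(self, module):
        self.module = module


class DetectorSketch:
    """Hypothetical engine showing kwargs-first, model-attribute-fallback resolution."""

    def __init__(self, model):
        self.model = model

    def _get_model_attr(self, name):
        # Assumed behaviour: unwrap one `.module` layer if present, then read the attribute.
        model = getattr(self.model, "module", self.model)
        return getattr(model, name)

    def resolve_post_process_args(self, **kwargs):
        # Compare against None so that an explicit falsy value such as 0 is respected.
        min_distance = kwargs.get("min_distance")
        if min_distance is None:
            min_distance = self._get_model_attr("min_distance")
        tile_shape = kwargs.get("tile_shape")
        if tile_shape is None:
            tile_shape = self._get_model_attr("tile_shape")
        return min_distance, tile_shape


detector = DetectorSketch(FakeWrapper(FakeModel()))
defaults = detector.resolve_post_process_args()
overridden = detector.resolve_post_process_args(min_distance=0)
```

Reading the attribute through a single helper means the engine code does not need to know whether the model was wrapped for multi-GPU execution.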
      # min_distance and postproc_tile_shape cannot be None here
      min_distance = kwargs.get("min_distance")
      if min_distance is None:
    -     min_distance = self.model.min_distance
    +     min_distance = self._get_model_attr("min_distance")
      tile_shape = kwargs.get("tile_shape")
    + try:
    +     target_channels = model.target_channels
    + except AttributeError:
    +     target_channels = model.module.target_channels
    +
      with torch.inference_mode():
          logits = model(imgs)
    -     target_logits = logits[:, model.target_channels, :, :]
    +     target_logits = logits[:, target_channels, :, :]
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | develop | #1074 | +/- |
|---|---|---|---|
| Coverage | 99.88% | 99.87% | -0.02% |
| Files | 85 | 85 | |
| Lines | 11626 | 11630 | +4 |
| Branches | 1524 | 1524 | |
| Hits | 11613 | 11615 | +2 |
| Misses | 7 | 9 | +2 |
| Partials | 6 | 6 | |
Follow-up to the earlier multi-GPU fix. kongnet and nucleus_detector were missed and still crash on multi-GPU machines with:
AttributeError: 'DataParallel' object has no attribute 'target_channels'
When running on multiple GPUs, the model is wrapped in `nn.DataParallel`/`DistributedDataParallel`, so these attribute reads need to go through the `_get_model_attr()` helper function (or a similar approach), which unwraps the module before reading the attribute.

Changes
- `nucleus_detector.py`: read `min_distance`/`tile_shape` via the `_get_model_attr` helper instead of directly off `self.model`.
- `kongnet.py`: unwrap the module before reading `target_channels` in `infer_batch`. This code can't reach the helper function, so it applies the same unwrap inline.
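The inline unwrap in `infer_batch` has two branches, and the wrapped branch may be hard to reach on a CPU-only test machine. One possible way to exercise both is sketched below with plain-Python stand-ins. `StandInModel`, `StandInWrapper`, and `read_target_channels` are hypothetical names introduced for this example; an actual test in the repository would more likely wrap the real model in `nn.DataParallel`.

```python
class StandInModel:
    """Hypothetical model exposing `target_channels` directly."""

    target_channels = [1, 3]


class StandInWrapper:
    """Hypothetical stand-in for a multi-GPU wrapper: no attribute forwarding."""

    def __init__(self, module):
        self.module = module


def read_target_channels(model):
    """Mirror of the inline unwrap: try the model, then fall back to `.module`."""
    try:
        return model.target_channels
    except AttributeError:
        return model.module.target_channels


def test_plain_model_branch():
    # The attribute is found on the model itself, so the fallback never runs.
    assert read_target_channels(StandInModel()) == [1, 3]


def test_wrapped_model_branch():
    wrapped = StandInWrapper(StandInModel())
    # The wrapper does not expose the attribute, which is the failure being fixed.
    assert not hasattr(wrapped, "target_channels")
    assert read_target_channels(wrapped) == [1, 3]
```

Both functions follow the pytest naming convention, so a test runner would collect them without extra configuration.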