eager: dequantize INT8 weights instead of torch._int_mm on AMD RDNA / CPU#56
eager: dequantize INT8 weights instead of torch._int_mm on AMD RDNA / CPU#560xDELUXA wants to merge 1 commit into
Conversation
|
Important Review skippedNo new commits to review since the last review. ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe eager quantization path now selects INT8 matmul at runtime using environment and device checks. When disabled, ChangesRuntime INT8 path selection
Possibly related PRs
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
✅ All contributors have signed the CLA. Thank you! This PR is ready to be merged. |
|
I have read and agree to the Contributor License Agreement |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy_kitchen/backends/eager/quantization.py`:
- Around line 845-857: The _use_int8_mm decision is currently cached only by
device type and reads the current CUDA device, so it can return the wrong result
on multi-GPU/ROCm setups. Update the call path around _use_int8_mm so the cache
key includes the tensor’s CUDA device index (not just x.device.type), and use
that index when querying device properties instead of
torch.cuda.current_device(). Keep the existing mode checks and fallback behavior
intact while making the lookup specific to the actual device.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 5801200e-940a-47fe-a0a2-24550ec2aa0f
📒 Files selected for processing (1)
comfy_kitchen/backends/eager/quantization.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
comfy_kitchen/backends/eager/quantization.py (1)
846-860: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winCache the hardware probe, not the env override.
Because
_use_int8_mmis cached, line 847’sCOMFY_KITCHEN_INT8_EAGERread can stay stale after the first call in long-lived tests/notebooks. Keep env parsing uncached and cache only the auto hardware detection.♻️ Proposed fix
-@functools.lru_cache(maxsize=None) def _use_int8_mm(device_type: str, device_index: int) -> bool: mode = os.environ.get("COMFY_KITCHEN_INT8_EAGER", "auto").lower() if mode == "int_mm": return True if mode == "dequant": return False + return _use_int8_mm_auto(device_type, device_index) + + +@functools.lru_cache(maxsize=None) +def _use_int8_mm_auto(device_type: str, device_index: int) -> bool: if device_type != "cuda" or not torch.cuda.is_available(): return False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_kitchen/backends/eager/quantization.py` around lines 846 - 860, _update _use_int8_mm so COMFY_KITCHEN_INT8_EAGER is read and applied on every call, while only the hardware-probe branch is cached; move the lru/cache boundary to the CUDA/HIP detection logic inside _use_int8_mm (or a helper it calls) so env overrides like int_mm and dequant are never stale in long-lived sessions._
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@comfy_kitchen/backends/eager/quantization.py`:
- Around line 846-860: _update _use_int8_mm so COMFY_KITCHEN_INT8_EAGER is read
and applied on every call, while only the hardware-probe branch is cached; move
the lru/cache boundary to the CUDA/HIP detection logic inside _use_int8_mm (or a
helper it calls) so env overrides like int_mm and dequant are never stale in
long-lived sessions._
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: e27cfb7d-5acc-4a35-a4ee-f23bf68a3163
📒 Files selected for processing (1)
comfy_kitchen/backends/eager/quantization.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
comfy_kitchen/backends/eager/quantization.py (1)
897-905: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winPreserve ConvRot before the dequant hop — or results may flop.
The new early
F.linearfallback appears to bypass the activation rotation used byquantize_and_rotate_rowwise, whileconvrotis still forwarded into this op. Forconvrot=Trueon CPU/RDNA, rotatexbefore the dequantized linear path or gate this fallback until the ConvRot-equivalent path is applied.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_kitchen/backends/eager/quantization.py` around lines 897 - 905, The early fallback in the `quantization.py` linear path bypasses the activation rotation, so `convrot=True` can produce incorrect results on the dequantized `F.linear` branch. Update the fallback around the `torch.nn.functional.linear` call to apply the same rotation handling used by `quantize_and_rotate_rowwise` before dequantizing, or otherwise prevent this branch from running when `convrot` is enabled until the ConvRot-equivalent path is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@comfy_kitchen/backends/eager/quantization.py`:
- Around line 897-905: The early fallback in the `quantization.py` linear path
bypasses the activation rotation, so `convrot=True` can produce incorrect
results on the dequantized `F.linear` branch. Update the fallback around the
`torch.nn.functional.linear` call to apply the same rotation handling used by
`quantize_and_rotate_rowwise` before dequantizing, or otherwise prevent this
branch from running when `convrot` is enabled until the ConvRot-equivalent path
is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 1546ff0a-b444-4e0a-ac41-db9175152222
📒 Files selected for processing (1)
comfy_kitchen/backends/eager/quantization.py
29dafa0 to
cceb93f
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy_kitchen/backends/eager/quantization.py`:
- Around line 931-935: The ConvRot validation in quantization.py should reject a
zero `convrot_groupsize` before using it in the modulus check. Update the
`convrot` guard around the existing `ValueError` path so `convrot_groupsize ==
0` raises a clear `ValueError` instead of triggering `ZeroDivisionError`,
keeping the check alongside the current feature-divisibility validation in the
same block.
- Around line 870-876: The _use_int8_mm helper currently treats any unknown
COMFY_KITCHEN_INT8_EAGER value as auto, which hides configuration mistakes.
Update _use_int8_mm to validate the env var before falling back, explicitly
handling only the supported modes (int_mm, dequant, auto) and raising an error
for any other value. Keep the behavior centered in _use_int8_mm and its
_use_int8_mm_auto fallback so invalid overrides are rejected instead of silently
accepted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: c330eaf5-9a6d-4924-8c9a-61e1ad062c66
📒 Files selected for processing (1)
comfy_kitchen/backends/eager/quantization.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy_kitchen/backends/eager/quantization.py`:
- Around line 931-935: The ConvRot validation in quantization.py should reject a
zero `convrot_groupsize` before using it in the modulus check. Update the
`convrot` guard around the existing `ValueError` path so `convrot_groupsize ==
0` raises a clear `ValueError` instead of triggering `ZeroDivisionError`,
keeping the check alongside the current feature-divisibility validation in the
same block.
- Around line 870-876: The _use_int8_mm helper currently treats any unknown
COMFY_KITCHEN_INT8_EAGER value as auto, which hides configuration mistakes.
Update _use_int8_mm to validate the env var before falling back, explicitly
handling only the supported modes (int_mm, dequant, auto) and raising an error
for any other value. Keep the behavior centered in _use_int8_mm and its
_use_int8_mm_auto fallback so invalid overrides are rejected instead of silently
accepted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: c330eaf5-9a6d-4924-8c9a-61e1ad062c66
📒 Files selected for processing (1)
comfy_kitchen/backends/eager/quantization.py
🛑 Comments failed to post (2)
comfy_kitchen/backends/eager/quantization.py (2)
870-876: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Reject invalid INT8 override values instead of silently using
auto.Line 871 treats any unknown
COMFY_KITCHEN_INT8_EAGERvalue asauto, so a typo can summon the wrong matmul goblin.Proposed fix
def _use_int8_mm(device_type: str, device_index: int) -> bool: - mode = os.environ.get("COMFY_KITCHEN_INT8_EAGER", "auto").lower() + mode = os.environ.get("COMFY_KITCHEN_INT8_EAGER", "auto").strip().lower() if mode == "int_mm": return True if mode == "dequant": return False - return _use_int8_mm_auto(device_type, device_index) + if mode == "auto": + return _use_int8_mm_auto(device_type, device_index) + raise ValueError("COMFY_KITCHEN_INT8_EAGER must be one of: auto, int_mm, dequant")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.def _use_int8_mm(device_type: str, device_index: int) -> bool: mode = os.environ.get("COMFY_KITCHEN_INT8_EAGER", "auto").strip().lower() if mode == "int_mm": return True if mode == "dequant": return False if mode == "auto": return _use_int8_mm_auto(device_type, device_index) raise ValueError("COMFY_KITCHEN_INT8_EAGER must be one of: auto, int_mm, dequant")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_kitchen/backends/eager/quantization.py` around lines 870 - 876, The _use_int8_mm helper currently treats any unknown COMFY_KITCHEN_INT8_EAGER value as auto, which hides configuration mistakes. Update _use_int8_mm to validate the env var before falling back, explicitly handling only the supported modes (int_mm, dequant, auto) and raising an error for any other value. Keep the behavior centered in _use_int8_mm and its _use_int8_mm_auto fallback so invalid overrides are rejected instead of silently accepted.
931-935: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Guard
convrot_groupsizebefore using it as a divisor.Line 932 can raise
ZeroDivisionErrorwhenconvrot_groupsizeis0; return a clearValueErrorinstead.Proposed fix
if convrot: + if convrot_groupsize <= 0: + raise ValueError(f"ConvRot group size must be positive, got {convrot_groupsize}") if x.shape[-1] % convrot_groupsize != 0: raise ValueError( f"ConvRot group size {convrot_groupsize} does not divide input features {x.shape[-1]}" )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if convrot: if convrot_groupsize <= 0: raise ValueError(f"ConvRot group size must be positive, got {convrot_groupsize}") if x.shape[-1] % convrot_groupsize != 0: raise ValueError( f"ConvRot group size {convrot_groupsize} does not divide input features {x.shape[-1]}" )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_kitchen/backends/eager/quantization.py` around lines 931 - 935, The ConvRot validation in quantization.py should reject a zero `convrot_groupsize` before using it in the modulus check. Update the `convrot` guard around the existing `ValueError` path so `convrot_groupsize == 0` raises a clear `ValueError` instead of triggering `ZeroDivisionError`, keeping the check alongside the current feature-divisibility validation in the same block.
cceb93f to
77a8854
Compare
|
The @coderabbitai resolve |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy_kitchen/backends/eager/quantization.py`:
- Around line 941-948: The dequant fallback in the linear path is downcasting
`weight_scale` too early, which reduces accuracy for bf16 outputs. Update the
fallback inside `quantization.py` so the scale multiply happens in float32,
matching the `_int_mm` path’s `chunk.float() * chunk_scales` behavior, and only
cast the final product to `out_dtype` just before calling
`torch.nn.functional.linear`. Keep the existing `out_dtype`, `weight_scale`, and
`weight` flow intact, but move the precision-sensitive conversion to the end of
the computation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: b9a69d60-cc0f-44b4-8102-050d5c65597d
📒 Files selected for processing (1)
comfy_kitchen/backends/eager/quantization.py
✅ Action performedComments resolved. Approval is disabled; enable |
|
✏️ Learnings added
✅ Action performedComments resolved. Approval is disabled; enable |
|
✅ Action performedComments resolved. Approval is disabled; enable |
77a8854 to
d1dbc71
Compare
|
@coderabbitai review |
✅ Action performedReviews resumed. |
|
@rattus128 I hope you don't mind the ping, but based on my experience, you are one of the people from the ComfyUI team open to AMD support. |
✅ Action performedReview finished.
|
d1dbc71 to
9f60e02
Compare
Action performedReview triggered.
|
✅ Action performedReview finished.
|
On AMD RDNA GPUs (and CPU), eager
int8_linearrelies ontorch._int_mm, which has no fast hipBLASLt INT8 path there - a performance cliff on RDNA3/4, and a hard failure on RDNA2 and other archs that have no int8 GEMM backend at all.This adds a device-aware branch: when a fast int8 GEMM isn't available, dequantize the INT8 weight to the compute dtype and run a normal
F.linear(a bf16/fp16 GEMM, which rocBLAS supports broadly). NVIDIA and AMD CDNA keep the_int_mmpath. Weights remain stored as INT8 (VRAM saving preserved) and activations stay full precision.Benchmarked on an RX 9060 XT (gfx1200, ROCm 7.14, bf16) across DiT linear shapes -
_int_mmlost every case; dequantize+bf16 was 2-9x faster.End-to-end on the Krea2 INT8 model: the old
_int_mmpath (COMFY_KITCHEN_INT8_EAGER=int_mm) gives ~17 s/it, while the new automatic dequant path (no env var) gives ~5 s/it.Selection is automatic by arch, with
COMFY_KITCHEN_INT8_EAGER=auto|int_mm|dequantas an override.Scope: Improves the eager backend on AMD RDNA3/4 and unblocks archs with no int8 GEMM (e.g., RDNA2).