Change rounding method in quantization process (triton backend fix for amd gpus)#54

Open
patientx wants to merge 1 commit into Comfy-Org:main from patientx:patch-1

@patientx

The quantization kernel's libdevice.rint call fails on ROCm:
AttributeError("module 'triton.language.extra.hip.libdevice' has no attribute 'rint'")
The generic import, from triton.language.extra import libdevice, resolves to triton.language.extra.hip.libdevice on AMD GPUs, and that module does not expose rint. This is the same gap as triton-lang#7135, where libdevice.round is declared but raises the same AttributeError when the kernel is actually compiled on HIP; the workaround documented there is the same swap used in this PR.
Repro: Load an INT8/ConvRot-quantized checkpoint with the native UNETLoader on an AMD GPU with the triton backend enabled (COMFY_KITCHEN_BACKEND=triton or equivalent). The failure occurs on the first call into int8_linear → triton_quantize_rowwise.

tl.floor(x + 0.5) is portable across the CUDA and HIP backends, so no branching is needed. It rounds halves up rather than to even as rint does, so results can differ by one quantization step on exact .5 ties (for example 0.5 or 2.5), which is negligible for INT8 quantization.
Confirmed against current main (v0.2.12): this is the only occurrence of libdevice.rint/libdevice.round in the codebase, so this single-line change should resolve ROCm compatibility for this path.
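
To make the rounding difference concrete, here is a small host-side sketch in plain Python. It is not the Triton kernel: it assumes Python's built-in round() as a stand-in for rint (both use round-half-to-even by default) and math.floor(x + 0.5) as a stand-in for tl.floor(x + 0.5). The helper names are hypothetical, and Python floats are double precision, so this only illustrates the tie behavior.

```python
import math


def round_half_up(x: float) -> int:
    """Round the way floor(x + 0.5) does: exact ties go toward +infinity."""
    return math.floor(x + 0.5)


def round_half_even(x: float) -> int:
    """Stand-in for rint: exact ties go to the nearest even integer."""
    return round(x)  # Python's built-in round() uses round-half-to-even


# A mix of exact .5 ties and ordinary values.
samples = [-2.5, -1.5, -0.5, 0.4, 0.5, 1.5, 2.5, 2.6]

for x in samples:
    a, b = round_half_up(x), round_half_even(x)
    marker = "  <-- differs" if a != b else ""
    print(f"{x:5.1f}  floor(x+0.5)={a:3d}  half-to-even={b:3d}{marker}")
```

In this sample the two methods disagree only on some of the exact ties (-1.5, 0.5 and 2.5), and then by a single integer step.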

@github-actions

github-actions Bot commented Jun 26, 2026

✅ All contributors have signed the CLA. Thank you! This PR is ready to be merged.
Posted by the CLA Assistant Lite bot.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 4224ca0 and 3cb5be6.

📒 Files selected for processing (1)
  • comfy_kitchen/backends/triton/quantization.py

Walkthrough

The Triton INT8 row-wise quantization kernel now rounds q_f with tl.floor(q_f + 0.5) instead of libdevice.rint(q_f) before clamping to int8. Scaling, clamping, and stored outputs remain unchanged.
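
The scale, round, and clamp steps described here can be sketched on the host in plain Python. This is only an illustrative reference, not the kernel itself: the function name quantize_row is hypothetical, the symmetric clamp to [-127, 127] is an assumption (the kernel's actual clamp bounds are not shown in this thread), and the scale guard mirrors the tl.maximum(max_val / 127.0, 1e-30) line quoted in the traceback further down.

```python
import math

QMAX = 127  # assumed symmetric INT8 range; the real kernel may clamp differently


def quantize_row(row):
    """Quantize one row to integers in [-QMAX, QMAX] and return (values, scale)."""
    # 1. Reduction to find the largest magnitude in the row.
    max_val = max((abs(v) for v in row), default=0.0)

    # 2. Compute the scale, guarding against an all-zero row.
    scale = max(max_val / QMAX, 1e-30)

    # 3. Quantize, round with floor(x + 0.5), then clamp.
    quantized = []
    for v in row:
        q_i = math.floor(v / scale + 0.5)
        quantized.append(max(-QMAX, min(QMAX, q_i)))
    return quantized, scale


q, scale = quantize_row([0.0, 0.3, -1.2, 2.0])
print(q, scale)
```

Multiplying each quantized value by the returned scale recovers an approximation of the original row, with an error of at most about half a quantization step away from ties.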

Changes

INT8 quantization rounding

| Layer / File(s) | Summary |
| --- | --- |
| Rounding expression change (comfy_kitchen/backends/triton/quantization.py) | The quantization kernel changes the q_i rounding expression from libdevice.rint(q_f) to tl.floor(q_f + 0.5) before clamping. |

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@patientx

Author

I have read and agree to the Contributor License Agreement

comfy-legal added a commit to Comfy-Org/comfy-cla that referenced this pull request Jun 26, 2026
@patientx

Author

Just loading a model converted to int8-convrot without this patch fails. This is the log from a failed run:

[ERROR] !!! Exception during processing !!! at 35:10:

    # Reduction to find max
    max_val = tl.max(abs_x, axis=0)

    # 2. Compute Scale
    scale = tl.maximum(max_val / 127.0, 1e-30)

    # 3. Quantize
    q_f = x / scale

    # Round and Clamp
    q_i = libdevice.rint(q_f).to(tl.int32)
          ^
AttributeError("module 'triton.language.extra.hip.libdevice' has no attribute 'rint'")
[ERROR] Traceback (most recent call last):
  File "d:\comfyui\execution.py", line 542, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\execution.py", line 341, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\execution.py", line 315, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "d:\comfyui\execution.py", line 303, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "d:\comfyui\comfy_api\internal\__init__.py", line 152, in wrapped_func
    return method(locked_class, **inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy_api\latest\_io.py", line 1935, in EXECUTE_NORMALIZED
    to_return = cls.execute(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\comfyui\comfy_extras\nodes_custom_sampler.py", line 1048, in execute
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 1316, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\patcher_extension.py", line 113, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 1254, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 1229, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\patcher_extension.py", line 113, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 999, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\k_diffusion\sampling.py", line 205, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 639, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 1202, in __call__
    return self.outer_predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 1209, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\patcher_extension.py", line 113, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 1212, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 619, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 210, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 218, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\patcher_extension.py", line 113, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\samplers.py", line 334, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\model_base.py", line 191, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\patcher_extension.py", line 113, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\model_base.py", line 235, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ldm\lightricks\av_model.py", line 1069, in forward
    return super().forward(
           ^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ldm\lightricks\model.py", line 936, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\patcher_extension.py", line 113, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ldm\lightricks\model.py", line 989, in _forward
    x = self._process_transformer_blocks(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ldm\lightricks\av_model.py", line 969, in _process_transformer_blocks
    vx, ax = block(
             ^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ldm\lightricks\av_model.py", line 276, in forward
    attn1_out = self.attn1(norm_vx, pe=v_pe, mask=self_attention_mask, transformer_options=transformer_options)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ldm\lightricks\model.py", line 456, in forward
    q = self.to_q(x)
        ^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ops.py", line 1288, in forward
    output = self.forward_comfy_cast_weights(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ops.py", line 1230, in forward_comfy_cast_weights
    x = self._forward(input, weight, bias)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\comfyui\comfy\ops.py", line 1201, in _forward
    return torch.nn.functional.linear(input, weight, bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\base.py", line 348, in __torch_dispatch__
    return op_handlers[parent_cls](qt, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\comfy_kitchen\tensor\int8.py", line 232, in _handle_int8_linear_tensorwise
    return torch.ops.comfy_kitchen.int8_linear(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\_ops.py", line 1275, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\_library\custom_ops.py", line 375, in backend_impl
    result = self._backend_fns[device_type](*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\_compile.py", line 54, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\_dynamo\eval_frame.py", line 1298, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\torch\_library\custom_ops.py", line 410, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\comfy_kitchen\backends\eager\quantization.py", line 981, in _op_int8_linear
    return impl(**kwargs)
           ^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\comfy_kitchen\backends\triton\quantization.py", line 1039, in int8_linear
    x_int8, x_scale = triton_quantize_rowwise(x_rotated)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\comfy_kitchen\backends\triton\quantization.py", line 826, in triton_quantize_rowwise
    _quantize_rowwise_kernel[grid](x, y, s, cols, block_size=block_size)
  File "D:\ComfyUI\venv\Lib\site-packages\triton\runtime\jit.py", line 370, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\triton\runtime\jit.py", line 720, in run
    kernel = self._do_compile(key, signature, device, constexprs, options, attrs, warmup)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\triton\runtime\jit.py", line 849, in _do_compile
    kernel = self.compile(src, target=target, options=options.__dict__)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\triton\compiler\compiler.py", line 304, in compile
    module = src.make_ir(target, options, codegen_fns, module_map, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\venv\Lib\site-packages\triton\compiler\compiler.py", line 80, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
triton.compiler.errors.CompilationError: at 35:10:

    # Reduction to find max
    max_val = tl.max(abs_x, axis=0)

    # 2. Compute Scale
    scale = tl.maximum(max_val / 127.0, 1e-30)

    # 3. Quantize
    q_f = x / scale

    # Round and Clamp
    q_i = libdevice.rint(q_f).to(tl.int32)
          ^
AttributeError("module 'triton.language.extra.hip.libdevice' has no attribute 'rint'")

@patientx

patientx commented Jun 26, 2026

Author

PR #56, which targets the eager backend, works more reliably on AMD RDNA2, although the triton backend still looks faster.

@patientx patientx changed the title Change rounding method in quantization process Change rounding method in quantization process (triton backend fix for amd gpus) Jun 27, 2026
@LordFlashmeow

@patientx

> Just loading a model converted to int8-convrot without this patch results in this log from a failed run :

Upgrading from triton-windows==3.6.0.post25 to triton-windows==3.7.0.post26 fixed that issue for me.
I think support for some int8 operations was added in 3.7.0.
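
A quick way to check which side of that version boundary an environment is on is to read the installed distribution's version with the standard library. The sketch below is only an assumption-laden helper: the 3.7.0 threshold comes from the report above rather than from release notes, the helper names are hypothetical, and other Triton builds may be installed under a different distribution name than triton-windows.

```python
import re
from importlib import metadata

# Assumption based on the report above, not on official release notes.
ASSUMED_MIN = (3, 7, 0)


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '3.7.0.post26' -> (3, 7, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def installed_release(dist_name: str = "triton-windows"):
    """Return the installed release tuple, or None if the distribution is absent."""
    try:
        return release_tuple(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


release = installed_release()
if release is None:
    print("triton-windows is not installed")
elif release < ASSUMED_MIN:
    print(f"{release} predates {ASSUMED_MIN}; libdevice.rint may be missing on HIP")
else:
    print(f"{release} is at or above {ASSUMED_MIN}")
```

A check like this only reports the version; it does not confirm that rint is actually exposed by the HIP libdevice module in that build.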

@patientx

Author

> @patientx
>
> > Just loading a model converted to int8-convrot without this patch results in this log from a failed run :
>
> upgrading from triton-windows-3.6.0.post25 to triton-windows==3.7.0.post26 fixed that issue for me. I think support for some int8 operations was added in 3.7.0.

There was a reason I stayed on 3.6.0.post25, and I remembered it after trying 3.7.0. Yes, 3.7.0 lets me use the triton backend without a patch, but it is still slower than the node route (I tried 8 generations on both). More importantly, that version combined with the native loader stops generation on about the 3rd or 4th run, and a few seconds later the whole PC freezes. I was monitoring VRAM and so on: temperatures were not rising and GPU usage was not maxing out. It just kills Comfy and Windows, and only a hard reset recovers it.
