Add active-MoE AutoQuant cost accounting by meenchen · Pull Request #1497 · NVIDIA/Model-Optimizer

meenchen · 2026-05-14T22:31:10Z

What does this PR do?

• Type of change: new feature

Adds an active_moe cost model for the auto_quantize effective-bits search. This lets AutoQuant account for routed MoE expert weights by the weight traffic that is active during decode instead of by total checkpoint weight size, using active_moe_expert_ratio = num_experts_per_tok / num_experts.

The default behavior is unchanged: cost_model="weight" still counts all quantizable weights equally.
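
To make the difference between the two cost models concrete, here is a minimal sketch. It is not ModelOpt's implementation. It assumes that effective bits is a cost-weighted average of per-module bit widths and that routed expert weights are scaled by active_moe_expert_ratio while every other weight keeps a scale of 1.0. The `ModuleCost` type, the `effective_bits` helper, the module sizes, and the round bit widths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModuleCost:
    name: str
    numel: int           # number of weight elements in the module
    bits: float          # illustrative bits per weight for the chosen format
    routed_expert: bool  # True for routed MoE expert weights


def effective_bits(modules, cost_model="weight", active_moe_expert_ratio=1.0):
    """Cost-weighted average bits per weight (an assumption of this sketch)."""
    total_bits = 0.0
    total_weight = 0.0
    for m in modules:
        scale = 1.0
        if cost_model == "active_moe" and m.routed_expert:
            scale = active_moe_expert_ratio
        total_bits += scale * m.numel * m.bits
        total_weight += scale * m.numel
    return total_bits / total_weight


modules = [
    ModuleCost("attn", numel=1_000, bits=8.0, routed_expert=False),
    ModuleCost("experts", numel=64_000, bits=4.0, routed_expert=True),
]
ratio = 2 / 64  # num_experts_per_tok / num_experts

weight_bits = effective_bits(modules, "weight")
active_bits = effective_bits(modules, "active_moe", ratio)
print(weight_bits, active_bits)
```

In this toy setup the large expert tensor dominates the average under "weight", while under "active_moe" the dense attention weights count for relatively more because only 2 of 64 experts are read per token.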

Usage

import modelopt.torch.quantization as mtq

model, search_state = mtq.auto_quantize(
    model,
    constraints={"effective_bits": 5.0},
    quantization_formats=[
        mtq.NVFP4_DEFAULT_CFG,
        mtq.FP8_DEFAULT_CFG,
    ],
    data_loader=calib_dataloader,
    forward_step=forward_step,
    loss_func=loss_func,
    cost_model="active_moe",
    # Optional. If omitted, ModelOpt tries to infer this from model.config.
    active_moe_expert_ratio=2 / 64,
)

The HF PTQ example also exposes:

--auto_quantize_cost_model active_moe
--auto_quantize_active_moe_expert_ratio 0.03125
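
The CodeRabbit summary in this PR notes that the CLI enforces a valid expert-ratio range and requires active-MoE mode when a ratio is given. A minimal sketch of that kind of post-parse check with standard argparse follows. The two flag names and the first error message come from this PR; the `parse_args` helper, the defaults, and the wording of the range error are assumptions and are not copied from hf_ptq.py.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--auto_quantize_cost_model",
        choices=["weight", "active_moe"],
        default="weight",  # assumed default, matching the unchanged behavior
    )
    parser.add_argument(
        "--auto_quantize_active_moe_expert_ratio",
        type=float,
        default=None,
    )
    args = parser.parse_args(argv)

    # Post-parse validation: the ratio only makes sense for active_moe,
    # and it must lie in the half-open interval (0.0, 1.0].
    ratio = args.auto_quantize_active_moe_expert_ratio
    if ratio is not None:
        if args.auto_quantize_cost_model != "active_moe":
            parser.error(
                "--auto_quantize_active_moe_expert_ratio requires "
                "--auto_quantize_cost_model active_moe."
            )
        if not 0.0 < ratio <= 1.0:
            parser.error(
                "--auto_quantize_active_moe_expert_ratio must be in (0.0, 1.0]."
            )
    return args


args = parse_args(
    [
        "--auto_quantize_cost_model", "active_moe",
        "--auto_quantize_active_moe_expert_ratio", "0.03125",
    ]
)
print(args.auto_quantize_cost_model, args.auto_quantize_active_moe_expert_ratio)
```

`parser.error` prints the usage message and exits with status 2, so invalid combinations fail before any model is loaded.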

Testing

python -m pytest tests/unit/torch/quantization/test_autoquant.py -q -k 'active_moe or quant_recipe_hparam_cost_weight'
python -m pytest tests/unit/torch/quantization/test_autoquant.py -q -k 'not data_parallel_auto_quantize'

Results:

4 passed
58 passed, 1 deselected

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

New Features
- Added active-MoE cost model option for auto-quantization with configurable expert ratio; API and CLI accept cost_model and active_moe_expert_ratio
- Unified auto-quantize supports new quant format w4a16_nvfp4
Bug Fixes
- Ensure labels are moved to the logits device for base models without an lm_head
- CLI enforces valid expert-ratio range and requires active-MoE mode when a ratio is provided
Tests
- Added unit tests for active-MoE behavior, cost-weighting, ratio handling, and search budget selection

copy-pr-bot · 2026-05-14T22:31:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

coderabbitai · 2026-05-14T22:31:23Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds an active-MoE cost model option to auto-quantization: detects routed MoE modules, applies per-module cost weighting using an expert-activity ratio, threads cost_model and active_moe_expert_ratio through the searcher and API/CLI, and adds unit tests covering behavior and searcher selection.

Changes

Active-MoE Cost Model Support

Layer / File(s)	Summary
MoE cost model foundation `modelopt/torch/quantization/algorithms.py`	Introduces `_is_routed_moe_module_name()` and `_get_active_moe_cost_weight()` utilities for MoE detection and scaling. Extends `QuantRecipeHparam` with `cost_weight` parameter for per-module cost scaling and updates `get_cost()` to accept optional cost weight override. Adds `cost_model` and `active_moe_expert_ratio` to search configuration defaults and validation.
Searcher cost computation and integration `modelopt/torch/quantization/algorithms.py`	Updates hparam insertion to compute per-group `cost_weight` from routed MoE modules and pass it into `QuantRecipeHparam`. Extends candidate stats initialization to track both constraint costs and active costs with `cost_weight` recorded. Modifies `before_search` to validate and set cost model fields, and `run_search` to branch weight-size computation based on `cost_model` using new helpers `_get_total_weight_size_from_named_modules()` and `_get_search_lower_bounds()`. Updates LP lower-bound retry logic and best-recipe resolution to prefer persisted cost denominator.
User API and CLI integration `modelopt/torch/quantization/model_quant.py`, `examples/llm_ptq/hf_ptq.py`	Extends `auto_quantize()` with `cost_model` and `active_moe_expert_ratio` parameters, adds internal helpers to infer ratio from model config attributes, and validates inputs. Adds CLI arguments `--auto_quantize_cost_model` and `--auto_quantize_active_moe_expert_ratio` with post-parse validation ensuring ratio is in `(0.0, 1.0]` and only set when `cost_model` is `"active_moe"`. Parameters propagate through to searcher configuration.
Tests for active-MoE cost model `tests/unit/torch/quantization/test_autoquant.py`	Adds `_AutoQuantMoeModel` fixture with routed expert and shared expert submodules. Validates `QuantRecipeHparam.get_cost()` scaling with `cost_weight` across recipes. Tests `auto_quantize()` with `cost_model="active_moe"` verifying expert/shared-expert cost-weight assignments (0.25 and 1.0 respectively) and active-cost tracking in search history. Verifies `AutoQuantizeGradientSearcher` selects budget-lower-bound recipes under MoE cost scenarios.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

ajrasane
cjluo-nv

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 48.72% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the main change: adding active-MoE cost accounting to the AutoQuant system, which is the central feature across all modified files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No security anti-patterns detected. All modified files pass checks: no unsafe torch.load/numpy.load, no hardcoded trust_remote_code, no eval/exec, no nosec comments, no unsafe dependencies.

github-actions · 2026-05-14T22:35:19Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-02 09:06 UTC

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/quantization/model_quant.py`:
- Around line 300-315: _infer_active_moe_expert_ratio currently calls
_get_first_numeric_config_attr twice which can pick values from two different
config objects; instead iterate the same configs (use _iter_model_configs) and
for each config check both attribute groups (_ACTIVE_MOE_TOP_K_ATTRS and
_ACTIVE_MOE_NUM_EXPERTS_ATTRS) on that single config object, ensure both are
numeric and num_experts > 0, then return min(num_active_experts / num_experts,
1.0); if no single config contains both numeric values return None.
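
The fix requested above can be sketched as follows. This is an illustration of the single-config pairing rule, not the code that was merged. The attribute tuples are hypothetical stand-ins for _ACTIVE_MOE_TOP_K_ATTRS and _ACTIVE_MOE_NUM_EXPERTS_ATTRS, whose real contents are not shown in this thread, and `first_numeric_attr` and `infer_ratio` are made-up helper names.

```python
from types import SimpleNamespace

# Hypothetical attribute lists; the real tuples live in ModelOpt and may differ.
TOP_K_ATTRS = ("num_experts_per_tok",)
NUM_EXPERTS_ATTRS = ("num_experts", "num_local_experts")


def first_numeric_attr(config, names):
    """Return the first attribute in `names` that holds a number, else None."""
    for name in names:
        value = getattr(config, name, None)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    return None


def infer_ratio(configs):
    """Infer top_k / num_experts from one config object that defines both."""
    for config in configs:
        top_k = first_numeric_attr(config, TOP_K_ATTRS)
        num_experts = first_numeric_attr(config, NUM_EXPERTS_ATTRS)
        if top_k is None or num_experts is None or num_experts <= 0:
            continue  # never mix values taken from two different configs
        return min(top_k / num_experts, 1.0)
    return None


wrapper = SimpleNamespace(num_experts_per_tok=8)     # top-k only
text_config = SimpleNamespace(num_local_experts=64)  # expert count only
complete = SimpleNamespace(num_experts_per_tok=2, num_experts=64)

print(infer_ratio([wrapper, text_config]))  # None: no single config has both
print(infer_ratio([wrapper, complete]))     # 0.03125
```

Pairing 8 from the wrapper with 64 from the text config would have produced 0.125, a ratio that describes neither config.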

📥 Commits

Reviewing files that changed from the base of the PR and between e27f76f and 9eb1ee0.

📒 Files selected for processing (4)

examples/llm_ptq/hf_ptq.py
modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/model_quant.py
tests/unit/torch/quantization/test_autoquant.py

codecov · 2026-05-15T23:04:17Z

Codecov Report

❌ Patch coverage is 96.40288% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.26%. Comparing base (905259f) to head (b6b8be2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/quantization/_auto_quantize_cost.py	96.87%	3 Missing ⚠️
modelopt/torch/quantization/algorithms.py	94.87%	2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (905259f) and HEAD (b6b8be2).

HEAD has 12 uploads less than BASE

Flag	BASE (905259f)	HEAD (b6b8be2)
gpu	4	1
examples	12	3

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1497       +/-   ##
===========================================
- Coverage   77.38%   55.26%   -22.12%     
===========================================
  Files         479      479               
  Lines       52435    52496       +61     
===========================================
- Hits        40578    29014    -11564     
- Misses      11857    23482    +11625

Flag	Coverage Δ
examples	`15.13% <26.61%> (-25.69%)`	⬇️
gpu	`14.80% <26.61%> (-45.65%)`	⬇️
regression	`15.23% <26.61%> (+0.10%)`	⬆️
unit	`53.73% <96.40%> (+0.10%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

juhi10071998 · 2026-05-26T21:04:58Z

        )
        return config

+    def _get_cost_constraints(self) -> tuple[str, float | None]:


Hi @meenchen, is it possible to unify the check in model_quant.py's _normalize_auto_quantize_constraints with this function, for example by adding a hook? The constraint checks look duplicated, and it would be easy to miss one of them as we scale and add more cost models.

Updated in fac22b7. I moved shared validation/normalization into _auto_quantize_cost.normalize_auto_quantize_constraints() and both the public API and the searcher call it, so adding a cost model has one validation path.
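
A single validation path of the kind described in this reply can be pictured roughly as below. This is a sketch under assumptions, not the merged normalize_auto_quantize_constraints(): the function name `normalize_constraints`, the `SUPPORTED_COST_MODELS` tuple, and the error wording are illustrative, and the model argument is left out for brevity. The default of 4.8 and the two cost-model names are taken from the diff quoted in this thread.

```python
from typing import Any, Dict, Optional

# Constant name mirrors the review thread; the behavior here is illustrative.
DEFAULT_AUTO_QUANTIZE_EFFECTIVE_BITS = 4.8
SUPPORTED_COST_MODELS = ("weight", "active_moe")


def normalize_constraints(constraints: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Fill defaults and validate the cost model in a single place."""
    if constraints is None:
        normalized = {"effective_bits": DEFAULT_AUTO_QUANTIZE_EFFECTIVE_BITS}
    else:
        normalized = dict(constraints)  # copy so the caller's dict is untouched
    cost_model = normalized.setdefault("cost_model", "weight")
    if cost_model not in SUPPORTED_COST_MODELS:
        raise ValueError(
            f"Invalid cost_model: {cost_model}. Valid options are {SUPPORTED_COST_MODELS}."
        )
    return normalized


once = normalize_constraints(None)
twice = normalize_constraints(once)  # calling it again changes nothing
print(once)
```

Because the function is idempotent, the public API and the searcher can both route through it without changing the result.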

shengliangxu · 2026-05-27T01:42:26Z

+def _normalize_auto_quantize_constraints(
+    model: nn.Module, constraints: dict[str, Any] | None
+) -> dict[str, Any]:
+    constraints = {"effective_bits": 4.8} if constraints is None else dict(constraints)


Can we move this 4.8 magic number into a constant and add a comment explaining what it means? It's scattered across multiple places.

Done in fac22b7. DEFAULT_AUTO_QUANTIZE_EFFECTIVE_BITS is centralized in _auto_quantize_cost.py with a compatibility comment; auto_quantize() gets the default through the shared normalizer.

shengliangxu · 2026-05-27T01:45:03Z

+) -> dict[str, Any]:
+    constraints = {"effective_bits": 4.8} if constraints is None else dict(constraints)
+    cost_model = constraints.get("cost_model", "weight")
+    if cost_model not in ("weight", "active_moe"):


We should also move these constants somewhere central and add comments explaining what they are.

Better yet, define two classes and describe each cost model's behavior and properties through its class.

Done in fac22b7. Added AutoQuantizeCostModel, WeightCostModel, and ActiveMoECostModel; the searcher now asks the selected class for module cost weights and total cost denominator.

Done in fac22b7. Constants and cost-model-specific keys moved into _auto_quantize_cost.py, with comments/docstrings for the default target and active-MoE behavior.
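
As a rough picture of the class-based design mentioned in these replies, the sketch below shows one way a searcher could ask a cost-model object for per-module weights and a total denominator. Only the three class names come from the thread. The method names `module_cost_weight` and `cost_denominator`, the constructor signature, and the name-based routed-expert heuristic are hypothetical and may not match _auto_quantize_cost.py.

```python
from abc import ABC, abstractmethod


class AutoQuantizeCostModel(ABC):
    """Base class: decides how much each module counts toward the budget."""

    name: str

    @abstractmethod
    def module_cost_weight(self, module_name: str) -> float: ...

    def cost_denominator(self, module_sizes: dict) -> float:
        """Weighted total size that the effective-bits budget is measured against."""
        return sum(
            self.module_cost_weight(name) * size for name, size in module_sizes.items()
        )


class WeightCostModel(AutoQuantizeCostModel):
    name = "weight"

    def module_cost_weight(self, module_name: str) -> float:
        return 1.0  # every quantizable weight counts equally


class ActiveMoECostModel(AutoQuantizeCostModel):
    name = "active_moe"

    def __init__(self, active_moe_expert_ratio: float):
        self.ratio = active_moe_expert_ratio

    def module_cost_weight(self, module_name: str) -> float:
        # Hypothetical name heuristic, for illustration only.
        routed = ".experts." in module_name and "shared" not in module_name
        return self.ratio if routed else 1.0


sizes = {"mlp.experts.0.up_proj": 100, "mlp.shared_expert.up_proj": 100}
model = ActiveMoECostModel(0.25)
print(model.module_cost_weight("mlp.experts.0.up_proj"))      # 0.25
print(model.module_cost_weight("mlp.shared_expert.up_proj"))  # 1.0
print(model.cost_denominator(sizes))                          # 125.0
```

The 0.25 and 1.0 weights match the routed-expert and shared-expert values that the walkthrough reports for the unit tests.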

cjluo-nv

Bot review — DM the bot to share feedback.

Adds an active_moe cost model for AutoQuantize effective-bits accounting. Overall structure is reasonable: separate _auto_quantize_cost.py with a small registry, propagation of cost_weight through QuantRecipeHparam, checkpoint compat (falls back to sum(costs[-1]) when cost_denominator is missing), and unit tests covering the routed-vs-shared expert weighting, config inference, and search-budget lower-bound ordering.

A few concrete things to look at before merging:

- active_costs looks like dead code. In initialize_candidate_stats, hparam.get_cost(recipe) and hparam.get_cost(recipe, cost_weight=hparam.cost_weight) are equivalent because get_cost's default already substitutes self.cost_weight when cost_weight is None. So active_costs == costs for every recipe, and the only test assertion on it is "active_costs" in stats. Either compute something genuinely different (e.g. the unweighted physical cost with cost_weight=1.0, which would actually be useful for reporting) or drop the field and the corresponding cost_weight/state-dict plumbing.
- Constraints are normalized twice. model_quant.auto_quantize calls normalize_auto_quantize_constraints before apply_mode, and _AutoQuantizeBaseSearcher.before_search calls it again on self.constraints. Idempotent today, but it's an easy footgun if either side changes shape; pick one site.
- _resolve_best_recipe reconstructs a partial searcher config: searcher.config = {"cost_model": searcher.cost_model} (no "cost" key, no other defaults). Today _get_search_lower_bounds only reads cost_model so this works, but it's fragile: any future code in run_search_with_stats that touches self.config[...] will break the resolve path. Consider populating from search_state more completely, or routing through default_search_config.
- Minor: in modelopt/torch/quantization/model_quant.py the new from ._auto_quantize_cost import normalize_auto_quantize_constraints is placed after from .algorithms import ...; isort would put the underscore-prefixed sibling first. Also the deleted blank line above def auto_quantize(...) makes the module-level constant _AUTO_QUANTIZE_SUPPORTED_ALGORITHMS visually run into the function definition.

cjluo-nv · 2026-05-29T17:03:24Z

                score = hparam.get_score(recipe)  # type: ignore [arg-type]
                cost = hparam.get_cost(recipe)  # type: ignore [arg-type]
+                active_cost = hparam.get_cost(recipe, cost_weight=hparam.cost_weight)  # type: ignore [arg-type]



Bot comment.

active_cost = hparam.get_cost(recipe, cost_weight=hparam.cost_weight) is identical to hparam.get_cost(recipe) because get_cost defaults cost_weight to self.cost_weight (which is hparam.cost_weight). So active_costs ends up equal to costs for every entry, and the test only checks key presence. Either drop the field, or compute something actually distinct — e.g. the unweighted physical cost via cost_weight=1.0 for reporting routed-MoE savings vs. total weight size.

Fixed in 34a6baf. I dropped active_costs from newly generated candidate stats instead of keeping a duplicate of costs; the active-MoE test now asserts the duplicate field is absent while preserving cost_weight for the actual cost accounting.
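
The duplication discussed in this thread is easy to reproduce with a toy stand-in. The class below is not QuantRecipeHparam; it only models the default-substitution behavior described in the review, and the recipe names and cost numbers are made up.

```python
class RecipeHparamSketch:
    """Toy stand-in for QuantRecipeHparam; only the cost logic is modeled."""

    def __init__(self, base_costs, cost_weight=1.0):
        self.base_costs = base_costs  # recipe name -> unweighted cost
        self.cost_weight = cost_weight

    def get_cost(self, recipe, cost_weight=None):
        # A missing override falls back to the instance's own weight.
        if cost_weight is None:
            cost_weight = self.cost_weight
        return self.base_costs[recipe] * cost_weight


hparam = RecipeHparamSketch({"nvfp4": 400.0, "fp8": 800.0}, cost_weight=0.25)

cost = hparam.get_cost("nvfp4")
active_cost = hparam.get_cost("nvfp4", cost_weight=hparam.cost_weight)
physical_cost = hparam.get_cost("nvfp4", cost_weight=1.0)
print(cost, active_cost, physical_cost)  # 100.0 100.0 400.0
```

Passing the instance's own weight explicitly yields the same number as passing nothing, which is why the extra field carried no information. Only an override such as cost_weight=1.0 gives a distinct value.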

cjluo-nv · 2026-05-29T17:03:24Z

        )

    searcher.candidate_stats = candidate_stats
+    searcher.cost_model = search_state.get("cost_model", COST_MODEL_WEIGHT)


Bot comment.

This builds a stub searcher.config = {"cost_model": ...} with no "cost" key and none of the other default search-config entries. It works today only because _get_search_lower_bounds reads just cost_model. If anything in run_search_with_stats (or anything it calls) starts touching self.config["cost"] or other keys, the get_auto_quantize_config(..., constraints=...) re-solve path will silently break. Consider initializing through default_search_config or persisting/restoring the full config from search_state.

Fixed in 34a6baf. _resolve_best_recipe() now starts from default_search_config and fills the persisted cost fields (cost_model, cost, active_moe_expert_ratio) instead of using a one-key stub config.
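
The pattern described in this fix, starting from complete defaults and overlaying only the persisted cost fields, can be sketched as below. The three persisted key names come from the reply above. The `DEFAULT_SEARCH_CONFIG` contents and the `restore_config` helper are hypothetical; the real default_search_config has its own entries.

```python
import copy

# Hypothetical defaults; the real default_search_config has more entries.
DEFAULT_SEARCH_CONFIG = {"cost_model": "weight", "cost": None, "verbose": False}
PERSISTED_COST_KEYS = ("cost_model", "cost", "active_moe_expert_ratio")


def restore_config(search_state: dict) -> dict:
    """Start from full defaults, then overlay the cost fields that were saved."""
    config = copy.deepcopy(DEFAULT_SEARCH_CONFIG)
    for key in PERSISTED_COST_KEYS:
        if key in search_state:
            config[key] = search_state[key]
    return config


old_checkpoint = {}  # saved before these fields existed
new_checkpoint = {"cost_model": "active_moe", "active_moe_expert_ratio": 0.25}

print(restore_config(old_checkpoint))
print(restore_config(new_checkpoint))
```

With this shape, code that later reads any default key still finds it, and a search state without the new fields falls back to the "weight" behavior.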

cjluo-nv · 2026-05-29T17:03:24Z

    else:
        raise ValueError(f"Invalid method: {method}. Valid options are 'gradient' or 'kl_div'.")

+    constraints = normalize_auto_quantize_constraints(model, constraints)


Bot comment.

Constraints are now normalized here and again in _AutoQuantizeBaseSearcher.before_search. The function is idempotent today but it's an easy place for the two sites to drift. Pick one (probably before_search, so external callers using the searcher directly also get normalized constraints) and drop the other.

Fixed in 34a6baf. Normalization now happens in _AutoQuantizeBaseSearcher.before_search() only, so direct searcher callers and the public API share the same normalization path.

meenchen · 2026-05-29T17:09:17Z

Pushed follow-up fixes for the latest review comments: 34a6baf handles the three substantive AutoQuantize cost-model comments, and 3411e4a fixes the minor API formatting issue. Validation: compileall passed, git diff --check passed, focused active-MoE/get-config tests passed, and tests/unit/torch/quantization/test_autoquant.py -q -k "not data_parallel_auto_quantize" passed with 59 passed / 1 deselected.

shengliangxu

LGTM now.

juhi10071998 · 2026-05-29T20:34:56Z

+        parser.error(
+            "--auto_quantize_active_moe_expert_ratio requires "
+            "--auto_quantize_cost_model active_moe."
+        )


I think we should keep this for this PR. Once I update the AutoQuant YAML recipe PR to reflect the cost model, I will remove the CLI arg support for AutoQuant, as we discussed yesterday. Does that align, @meenchen, @shengliangxu?

juhi10071998

LGTM, left a minor comment

cjluo-nv

Bot review — DM the bot to share feedback.

Re-review: all important previous review comments are addressed in the current diff.

Critical comments resolved:

💬 CodeRabbit's _infer_active_moe_expert_ratio cross-config pairing bug — fixed; logic moved into _auto_quantize_cost.infer_active_moe_expert_ratio which iterates configs and checks both top-k and num-experts attributes on the same config object. New test test_active_moe_ratio_requires_single_config_object covers the wrapper/text_config mismatch.
💬 cjluo-nv's "active_costs is dead code" — resolved; the duplicated field was dropped from initialize_candidate_stats, and the test now asserts its absence while still validating cost_weight.
💬 cjluo-nv's "constraints normalized twice" — resolved; normalization now lives only in _AutoQuantizeBaseSearcher.before_search, and auto_quantize() in model_quant.py no longer pre-normalizes.
💬 cjluo-nv's "_resolve_best_recipe partial searcher config" — resolved; it now seeds from default_search_config and populates cost_model, cost, and active_moe_expert_ratio from search_state.
💬 realAsma / juhi10071998 design feedback (move cost knobs into constraints, unify validation) — implemented; cost_model/cost live under constraints, and normalize_auto_quantize_constraints is the single validation entry point.
💬 shengliangxu's magic-number / class-refactor asks — resolved via DEFAULT_AUTO_QUANTIZE_EFFECTIVE_BITS and the AutoQuantizeCostModel / WeightCostModel / ActiveMoECostModel hierarchy.

Minor remaining nits (non-blocking): default_state_dict hardcodes the literal "weight" instead of using COST_MODEL_WEIGHT, and _resolve_best_recipe reaches into search_state for cost fields that older checkpoints predating this PR will not contain (the .get(..., default) calls do degrade gracefully, but a checkpoint saved before this PR has no cost_denominator, so the or sum(...) fallback kicks in — that path is exercised but not unit-tested).

Test coverage is solid: routed-vs-shared expert weighting, num_experts vs num_local_experts config-attr fallback, single-config inference requirement, and lower-bound search-order preference for active-MoE are all covered.

cjluo-nv

Left a comment. Need clarification on the added flags.

meenchen · 2026-06-01T20:48:28Z

/ok to test adbf9f0

meenchen · 2026-06-02T05:30:55Z

/ok to test 93c4138

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

kevalmorabia97 · 2026-06-02T08:06:43Z

/ok to test b6b8be2

codecov · 2026-06-02T08:27:17Z

Codecov Report

❌ Patch coverage is 96.40288% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.29%. Comparing base (905259f) to head (b6b8be2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/quantization/_auto_quantize_cost.py	96.87%	3 Missing ⚠️
modelopt/torch/quantization/algorithms.py	94.87%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1497      +/-   ##
==========================================
- Coverage   77.38%   77.29%   -0.10%     
==========================================
  Files         479      480       +1     
  Lines       52435    52564     +129     
==========================================
+ Hits        40578    40628      +50     
- Misses      11857    11936      +79

Flag	Coverage Δ
examples	`41.72% <62.58%> (+0.90%)`	⬆️
gpu	`59.85% <62.58%> (-0.59%)`	⬇️
regression	`15.23% <26.61%> (+0.10%)`	⬆️
unit	`53.73% <96.40%> (+0.10%)`	⬆️

meenchen requested review from a team as code owners May 14, 2026 22:31

meenchen requested a review from Edwardf0t1 May 14, 2026 22:31

coderabbitai Bot reviewed May 14, 2026 (comment thread on modelopt/torch/quantization/model_quant.py, outdated)

realAsma reviewed May 15, 2026 (comment thread on modelopt/torch/quantization/model_quant.py, outdated)

meenchen force-pushed the weimingc/autoquant_edge branch 4 times, most recently from b721f1d to f681009 May 15, 2026 22:50

meenchen force-pushed the weimingc/autoquant_edge branch from f681009 to 6f791d1 May 18, 2026 19:34

juhi10071998 reviewed May 20, 2026 (comment thread on modelopt/torch/quantization/model_quant.py)

juhi10071998 mentioned this pull request May 21, 2026: Add YAML based AutoQuantize recipe (currently only CLI is supported) #1523 (open, 6 tasks)

meenchen requested review from juhi10071998 and realAsma May 26, 2026 19:16

juhi10071998 reviewed May 26, 2026 (comment thread on examples/llm_ptq/hf_ptq.py)

shengliangxu reviewed May 27, 2026

meenchen requested review from cjluo-nv, juhi10071998 and shengliangxu and removed request for juhi10071998 May 29, 2026 16:11

cjluo-nv reviewed May 29, 2026

shengliangxu approved these changes May 29, 2026

juhi10071998 reviewed May 29, 2026

juhi10071998 approved these changes May 29, 2026

cjluo-nv approved these changes May 29, 2026

meenchen force-pushed the weimingc/autoquant_edge branch from 3411e4a to 0d0cef1 May 29, 2026 22:35

cjluo-nv reviewed Jun 1, 2026 (comment thread on examples/llm_ptq/hf_ptq.py)

cjluo-nv requested changes Jun 1, 2026

meenchen requested a review from cjluo-nv June 1, 2026 16:42

cjluo-nv approved these changes Jun 1, 2026

coderabbitai Bot approved these changes Jun 1, 2026

meenchen enabled auto-merge (squash) June 1, 2026 17:39

meenchen force-pushed the weimingc/autoquant_edge branch from adbf9f0 to 93c4138 June 2, 2026 04:03

meenchen added 8 commits June 2, 2026 00:23

Add active-MoE AutoQuant cost accounting

f66b985

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Refactor AutoQuantize cost model handling

6c44200

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Address AutoQuantize cost model review feedback

28b215e

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Fix AutoQuantize API formatting

5708136

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Fix AutoQuantize code quality issues

e67a1aa

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Fix AutoQuantize cast lint

24a46ad

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Clarify active MoE AutoQuant ratio

09da485

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Increase GPU Megatron test timeout

b6b8be2

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

meenchen force-pushed the weimingc/autoquant_edge branch from 93c4138 to b6b8be2 June 2, 2026 07:27

meenchen requested a review from a team as a code owner June 2, 2026 07:27

meenchen requested a review from kevalmorabia97 June 2, 2026 07:27

kevalmorabia97 approved these changes Jun 2, 2026

meenchen merged commit 72df833 into main Jun 2, 2026
52 checks passed

meenchen deleted the weimingc/autoquant_edge branch June 2, 2026 09:06

coderabbitai Bot mentioned this pull request Jun 3, 2026

Adds AutoQuant support for VLM / Qwen3.5-Qwen3.6 style models #1381

Merged
