Add active-MoE AutoQuant cost accounting #1497

Merged
meenchen merged 8 commits into main from weimingc/autoquant_edge
Jun 2, 2026

@meenchen meenchen commented May 14, 2026

Contributor

What does this PR do?

• Type of change: new feature

Adds an active_moe cost model for auto_quantize effective-bits search. This lets AutoQuant account for routed MoE expert weights by active decode weight traffic instead of total checkpoint weight size, using active_moe_expert_ratio = num_experts_per_tok / num_experts.

The default behavior is unchanged: cost_model="weight" still counts all quantizable weights equally.
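
To make the difference between the two cost models concrete, here is a minimal arithmetic sketch. It assumes that effective bits is a cost-weighted average of bits per weight, with routed-expert weights scaled by active_moe_expert_ratio. The helper `effective_bits` and the toy layer sizes are hypothetical and are not ModelOpt APIs.

```python
def effective_bits(modules, active_moe_expert_ratio=None):
    """Cost-weighted average bits per weight (illustrative assumption).

    modules: list of (num_params, bits_per_weight, is_routed_expert) tuples.
    Passing active_moe_expert_ratio=None mimics the default "weight" view.
    """
    total_bits = 0.0
    total_params = 0.0
    for num_params, bits, is_routed_expert in modules:
        scale = 1.0
        if is_routed_expert and active_moe_expert_ratio is not None:
            # Routed experts count only for the fraction active per token.
            scale = active_moe_expert_ratio
        total_bits += scale * num_params * bits
        total_params += scale * num_params
    return total_bits / total_params


# Hypothetical toy model: always-active layers at 8 bits, routed experts at 4 bits.
modules = [
    (1_000_000_000, 8.0, False),   # attention and other always-active weights
    (64_000_000_000, 4.0, True),   # routed expert weights (64 experts, top-2)
]

weight_view = effective_bits(modules)
active_view = effective_bits(modules, active_moe_expert_ratio=2 / 64)
print(f"weight: {weight_view:.3f} bits, active_moe: {active_view:.3f} bits")
```

In this toy setup the same format assignment scores lower under the weight view than under the active view, because the 8-bit always-active layers make up a larger share of per-token weight traffic than of checkpoint size.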

Usage

import modelopt.torch.quantization as mtq

model, search_state = mtq.auto_quantize(
    model,
    constraints={"effective_bits": 5.0},
    quantization_formats=[
        mtq.NVFP4_DEFAULT_CFG,
        mtq.FP8_DEFAULT_CFG,
    ],
    data_loader=calib_dataloader,
    forward_step=forward_step,
    loss_func=loss_func,
    cost_model="active_moe",
    # Optional. If omitted, ModelOpt tries to infer this from model.config.
    active_moe_expert_ratio=2 / 64,
)

The HF PTQ example also exposes:

--auto_quantize_cost_model active_moe
--auto_quantize_active_moe_expert_ratio 0.03125
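
The post-parse validation for these two flags could look roughly like the sketch below. It is a standalone argparse example, not the code in hf_ptq.py: the function name `parse_args`, the default of "weight" for the CLI flag, and the wording of the range error are assumptions. Only the flag names, the (0.0, 1.0] range, and the "requires active_moe" rule come from this PR.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--auto_quantize_cost_model",
        choices=["weight", "active_moe"],
        default="weight",  # assumed default, mirroring the Python API
    )
    parser.add_argument(
        "--auto_quantize_active_moe_expert_ratio", type=float, default=None
    )
    args = parser.parse_args(argv)

    ratio = args.auto_quantize_active_moe_expert_ratio
    if ratio is not None:
        # A ratio only makes sense when the active-MoE cost model is selected.
        if args.auto_quantize_cost_model != "active_moe":
            parser.error(
                "--auto_quantize_active_moe_expert_ratio requires "
                "--auto_quantize_cost_model active_moe."
            )
        # The ratio is a fraction of experts, so it must lie in (0.0, 1.0].
        if not 0.0 < ratio <= 1.0:
            parser.error(
                "--auto_quantize_active_moe_expert_ratio must be in (0.0, 1.0]."
            )
    return args


args = parse_args(
    [
        "--auto_quantize_cost_model", "active_moe",
        "--auto_quantize_active_moe_expert_ratio", "0.03125",
    ]
)
```

Because parser.error() exits the process, an invalid combination fails before any model loading starts.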

Testing

python -m pytest tests/unit/torch/quantization/test_autoquant.py -q -k 'active_moe or quant_recipe_hparam_cost_weight'
python -m pytest tests/unit/torch/quantization/test_autoquant.py -q -k 'not data_parallel_auto_quantize'

Results:

  • 4 passed
  • 58 passed, 1 deselected

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A
  • Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added active-MoE cost model option for auto-quantization with configurable expert ratio; API and CLI accept cost_model and active_moe_expert_ratio
    • Unified auto-quantize supports new quant format w4a16_nvfp4
  • Bug Fixes

    • Ensure labels are moved to the logits device for base models without an lm_head
    • CLI enforces valid expert-ratio range and requires active-MoE mode when a ratio is provided
  • Tests

    • Added unit tests for active-MoE behavior, cost-weighting, ratio handling, and search budget selection

@meenchen meenchen requested review from a team as code owners May 14, 2026 22:31
@meenchen meenchen requested a review from Edwardf0t1 May 14, 2026 22:31
@copy-pr-bot

copy-pr-bot Bot commented May 14, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 14, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds an active-MoE cost model option to auto-quantization: detects routed MoE modules, applies per-module cost weighting using an expert-activity ratio, threads cost_model and active_moe_expert_ratio through the searcher and API/CLI, and adds unit tests covering behavior and searcher selection.

Changes

Active-MoE Cost Model Support

Layer: MoE cost model foundation
File(s): modelopt/torch/quantization/algorithms.py
Summary: Introduces _is_routed_moe_module_name() and _get_active_moe_cost_weight() utilities for MoE detection and scaling. Extends QuantRecipeHparam with cost_weight parameter for per-module cost scaling and updates get_cost() to accept optional cost weight override. Adds cost_model and active_moe_expert_ratio to search configuration defaults and validation.

Layer: Searcher cost computation and integration
File(s): modelopt/torch/quantization/algorithms.py
Summary: Updates hparam insertion to compute per-group cost_weight from routed MoE modules and pass it into QuantRecipeHparam. Extends candidate stats initialization to track both constraint costs and active costs with cost_weight recorded. Modifies before_search to validate and set cost model fields, and run_search to branch weight-size computation based on cost_model using new helpers _get_total_weight_size_from_named_modules() and _get_search_lower_bounds(). Updates LP lower-bound retry logic and best-recipe resolution to prefer persisted cost denominator.

Layer: User API and CLI integration
File(s): modelopt/torch/quantization/model_quant.py, examples/llm_ptq/hf_ptq.py
Summary: Extends auto_quantize() with cost_model and active_moe_expert_ratio parameters, adds internal helpers to infer ratio from model config attributes, and validates inputs. Adds CLI arguments --auto_quantize_cost_model and --auto_quantize_active_moe_expert_ratio with post-parse validation ensuring ratio is in (0.0, 1.0] and only set when cost_model is "active_moe". Parameters propagate through to searcher configuration.

Layer: Tests for active-MoE cost model
File(s): tests/unit/torch/quantization/test_autoquant.py
Summary: Adds _AutoQuantMoeModel fixture with routed expert and shared expert submodules. Validates QuantRecipeHparam.get_cost() scaling with cost_weight across recipes. Tests auto_quantize() with cost_model="active_moe" verifying expert/shared-expert cost-weight assignments (0.25 and 1.0 respectively) and active-cost tracking in search history. Verifies AutoQuantizeGradientSearcher selects budget-lower-bound recipes under MoE cost scenarios.
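
The routed-versus-shared weighting described in the walkthrough can be illustrated with a small sketch. The naming rule below (a regular expression on ".experts.<index>.") is a hypothetical stand-in for _is_routed_moe_module_name(), whose real matching logic is not shown in this conversation; the module names are invented for the example.

```python
import re

# Hypothetical naming rule: ".experts.<index>." marks a routed expert, while
# shared experts run for every token and therefore keep full weight.
_ROUTED_EXPERT_PATTERN = re.compile(r"(^|\.)experts\.\d+(\.|$)")


def is_routed_expert(module_name: str) -> bool:
    return bool(_ROUTED_EXPERT_PATTERN.search(module_name))


def cost_weight(module_name: str, active_moe_expert_ratio: float) -> float:
    # Routed experts are scaled by the active ratio; everything else counts fully.
    return active_moe_expert_ratio if is_routed_expert(module_name) else 1.0


names = [
    "layers.0.mlp.experts.3.up_proj",
    "layers.0.mlp.shared_expert.up_proj",
    "layers.0.self_attn.q_proj",
]
# A ratio of 1/4 matches the 0.25 routed-expert weight asserted in the tests.
weights = {name: cost_weight(name, 1 / 4) for name in names}
```

With a ratio of 1/4 the routed expert gets a cost weight of 0.25, and the shared expert and attention projection stay at 1.0, which mirrors the assignments the unit tests check.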

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • ajrasane
  • cjluo-nv
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.72% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title accurately summarizes the main change: adding active-MoE cost accounting to the AutoQuant system, which is the central feature across all modified files.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns | ✅ Passed | No security anti-patterns detected. All modified files pass checks: no unsafe torch.load/numpy.load, no hardcoded trust_remote_code, no eval/exec, no nosec comments, no unsafe dependencies.


@github-actions

github-actions Bot commented May 14, 2026

Contributor
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-02 09:06 UTC

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/quantization/model_quant.py`:
- Around line 300-315: _infer_active_moe_expert_ratio currently calls
_get_first_numeric_config_attr twice which can pick values from two different
config objects; instead iterate the same configs (use _iter_model_configs) and
for each config check both attribute groups (_ACTIVE_MOE_TOP_K_ATTRS and
_ACTIVE_MOE_NUM_EXPERTS_ATTRS) on that single config object, ensure both are
numeric and num_experts > 0, then return min(num_active_experts / num_experts,
1.0); if no single config contains both numeric values return None.
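
The fix suggested above can be sketched as follows. This is an illustration of pairing both values on one config object, not the merged implementation: the attribute tuples, `_first_number`, and `infer_ratio` are hypothetical names, and SimpleNamespace objects stand in for Hugging Face config objects.

```python
from types import SimpleNamespace

# Hypothetical attribute lists; the real _ACTIVE_MOE_* tuples may differ.
TOP_K_ATTRS = ("num_experts_per_tok",)
NUM_EXPERTS_ATTRS = ("num_experts", "num_local_experts")


def _first_number(config, attrs):
    """Return the first numeric attribute found on this config, else None."""
    for attr in attrs:
        value = getattr(config, attr, None)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    return None


def infer_ratio(configs):
    """Infer top_k / num_experts, requiring both values on the same config."""
    for config in configs:
        top_k = _first_number(config, TOP_K_ATTRS)
        num_experts = _first_number(config, NUM_EXPERTS_ATTRS)
        if top_k is not None and num_experts is not None and num_experts > 0:
            return min(top_k / num_experts, 1.0)
    return None


# Values split across a wrapper config and a nested text config are rejected.
wrapper = SimpleNamespace(num_experts_per_tok=2)
text_config = SimpleNamespace(num_local_experts=64)
mismatched = infer_ratio([wrapper, text_config])

# Both values on one config object yield a ratio.
matched = infer_ratio([SimpleNamespace(num_experts_per_tok=2, num_local_experts=64)])
```

Returning None for the split case lets the caller ask the user for an explicit ratio instead of silently combining numbers that describe different sub-models.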
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5b320520-fd7c-4c67-b182-efe01e721d39

📥 Commits

Reviewing files that changed from the base of the PR and between e27f76f and 9eb1ee0.

📒 Files selected for processing (4)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/algorithms.py
  • modelopt/torch/quantization/model_quant.py
  • tests/unit/torch/quantization/test_autoquant.py

Comment thread modelopt/torch/quantization/model_quant.py Outdated
Comment thread modelopt/torch/quantization/model_quant.py Outdated
@meenchen meenchen force-pushed the weimingc/autoquant_edge branch 4 times, most recently from b721f1d to f681009 Compare May 15, 2026 22:50
@codecov

codecov Bot commented May 15, 2026


Codecov Report

❌ Patch coverage is 96.40288% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.26%. Comparing base (905259f) to head (b6b8be2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/torch/quantization/_auto_quantize_cost.py | 96.87% | 3 Missing ⚠️
modelopt/torch/quantization/algorithms.py | 94.87% | 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (905259f) and HEAD (b6b8be2). Click for more details.

HEAD has 12 uploads less than BASE
Flag | BASE (905259f) | HEAD (b6b8be2)
gpu | 4 | 1
examples | 12 | 3
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1497       +/-   ##
===========================================
- Coverage   77.38%   55.26%   -22.12%     
===========================================
  Files         479      479               
  Lines       52435    52496       +61     
===========================================
- Hits        40578    29014    -11564     
- Misses      11857    23482    +11625     
Flag | Coverage Δ
examples | 15.13% <26.61%> (-25.69%) ⬇️
gpu | 14.80% <26.61%> (-45.65%) ⬇️
regression | 15.23% <26.61%> (+0.10%) ⬆️
unit | 53.73% <96.40%> (+0.10%) ⬆️


@meenchen meenchen force-pushed the weimingc/autoquant_edge branch from f681009 to 6f791d1 Compare May 18, 2026 19:34
Comment thread modelopt/torch/quantization/model_quant.py
)
return config

def _get_cost_constraints(self) -> tuple[str, float | None]:

Contributor

Hi @meenchen, is it possible to unify (by adding a hook or something similar) the check in model_quant.py's _normalize_auto_quantize_constraints with this function? The constraint checks look duplicated, and it would be easy to miss one of them as we scale and add more cost models.

Contributor Author

Updated in fac22b7. I moved shared validation/normalization into _auto_quantize_cost.normalize_auto_quantize_constraints() and both the public API and the searcher call it, so adding a cost model has one validation path.

Comment thread examples/llm_ptq/hf_ptq.py
def _normalize_auto_quantize_constraints(
    model: nn.Module, constraints: dict[str, Any] | None
) -> dict[str, Any]:
    constraints = {"effective_bits": 4.8} if constraints is None else dict(constraints)

Collaborator

Can we move this 4.8 magic number into a constant and add a comment explaining its meaning? It's scattered across multiple places.

Contributor Author

Done in fac22b7. DEFAULT_AUTO_QUANTIZE_EFFECTIVE_BITS is centralized in _auto_quantize_cost.py with a compatibility comment; auto_quantize() gets the default through the shared normalizer.
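
A single shared normalizer along these lines might look like the sketch below. It is modeled on the snippet under review (default of 4.8 effective bits, cost models "weight" and "active_moe") but is not the code in _auto_quantize_cost.py: the function name, constant names, the omission of the model argument, and the error message are assumptions.

```python
from typing import Any, Optional

# Assumed stand-ins for the centralized constants mentioned in the reply.
DEFAULT_EFFECTIVE_BITS = 4.8
SUPPORTED_COST_MODELS = ("weight", "active_moe")


def normalize_constraints(constraints: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Fill in defaults and validate the cost model in one place."""
    if constraints is None:
        normalized = {"effective_bits": DEFAULT_EFFECTIVE_BITS}
    else:
        normalized = dict(constraints)  # copy so the caller's dict is untouched
    cost_model = normalized.setdefault("cost_model", "weight")
    if cost_model not in SUPPORTED_COST_MODELS:
        raise ValueError(
            f"Unsupported cost_model {cost_model!r}; expected one of {SUPPORTED_COST_MODELS}."
        )
    return normalized


once = normalize_constraints(None)
twice = normalize_constraints(once)  # normalizing again changes nothing
```

Keeping the function idempotent means it is harmless if both the public API and the searcher happen to call it, although a single call site is still easier to maintain.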

) -> dict[str, Any]:
    constraints = {"effective_bits": 4.8} if constraints is None else dict(constraints)
    cost_model = constraints.get("cost_model", "weight")
    if cost_model not in ("weight", "active_moe"):

Collaborator

We should also move these constants somewhere central and add comments explaining what they are.

Collaborator

Better to define two classes and express each cost model's behavior and properties through its class.

Contributor Author

Done in fac22b7. Added AutoQuantizeCostModel, WeightCostModel, and ActiveMoECostModel; the searcher now asks the selected class for module cost weights and total cost denominator.
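
One way to picture a class-based design like this is sketched below. The classes `CostModel`, `WeightCost`, and `ActiveMoECost` are illustrative stand-ins for AutoQuantizeCostModel, WeightCostModel, and ActiveMoECostModel; their method names, signatures, and the `make_cost_model` factory are assumptions, since the real interface is not shown in this conversation.

```python
from abc import ABC, abstractmethod


class CostModel(ABC):
    """Illustrative base class: answers the two questions the searcher asks."""

    @abstractmethod
    def module_cost_weight(self, is_routed_expert: bool) -> float:
        """Scale factor applied to one module's cost."""

    def cost_denominator(self, modules) -> float:
        """Total weighted size; modules is an iterable of (size, is_routed_expert)."""
        return sum(size * self.module_cost_weight(routed) for size, routed in modules)


class WeightCost(CostModel):
    def module_cost_weight(self, is_routed_expert: bool) -> float:
        return 1.0  # every quantizable weight counts equally


class ActiveMoECost(CostModel):
    def __init__(self, active_moe_expert_ratio: float):
        if not 0.0 < active_moe_expert_ratio <= 1.0:
            raise ValueError("active_moe_expert_ratio must be in (0.0, 1.0].")
        self.ratio = active_moe_expert_ratio

    def module_cost_weight(self, is_routed_expert: bool) -> float:
        return self.ratio if is_routed_expert else 1.0


def make_cost_model(name: str, active_moe_expert_ratio=None) -> CostModel:
    """Hypothetical factory keyed by the cost_model string."""
    if name == "weight":
        return WeightCost()
    if name == "active_moe":
        return ActiveMoECost(active_moe_expert_ratio)
    raise ValueError(f"Unknown cost model: {name!r}")


# Toy sizes: one always-active module and one routed-expert group.
modules = [(100, False), (800, True)]
weight_total = make_cost_model("weight").cost_denominator(modules)
active_total = make_cost_model("active_moe", 0.25).cost_denominator(modules)
```

Adding another cost model then means adding one subclass and one factory entry, rather than touching string comparisons spread across the searcher.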

Contributor Author

Done in fac22b7. Constants and cost-model-specific keys moved into _auto_quantize_cost.py, with comments/docstrings for the default target and active-MoE behavior.

@meenchen meenchen requested review from cjluo-nv, juhi10071998 and shengliangxu and removed request for juhi10071998 May 29, 2026 16:11

@cjluo-nv cjluo-nv left a comment

Collaborator

Bot review — DM the bot to share feedback.

Adds an active_moe cost model for AutoQuantize effective-bits accounting. Overall structure is reasonable: separate _auto_quantize_cost.py with a small registry, propagation of cost_weight through QuantRecipeHparam, checkpoint compat (falls back to sum(costs[-1]) when cost_denominator is missing), and unit tests covering the routed-vs-shared expert weighting, config inference, and search-budget lower-bound ordering.

A few concrete things to look at before merging:

  1. active_costs looks like dead code. In initialize_candidate_stats, hparam.get_cost(recipe) and hparam.get_cost(recipe, cost_weight=hparam.cost_weight) are equivalent because get_cost's default already substitutes self.cost_weight when cost_weight is None. So active_costs == costs for every recipe, and the only test assertion on it is "active_costs" in stats. Either compute something genuinely different (e.g. the unweighted physical cost with cost_weight=1.0, which would actually be useful for reporting) or drop the field and the corresponding cost_weight/state-dict plumbing.

  2. Constraints are normalized twice. model_quant.auto_quantize calls normalize_auto_quantize_constraints before apply_mode, and _AutoQuantizeBaseSearcher.before_search calls it again on self.constraints. Idempotent today, but it's an easy footgun if either side changes shape — pick one site.

  3. _resolve_best_recipe reconstructs a partial searcher config: searcher.config = {"cost_model": searcher.cost_model} (no "cost" key, no other defaults). Today _get_search_lower_bounds only reads cost_model so this works, but it's fragile — any future code in run_search_with_stats that touches self.config[...] will break the resolve path. Consider populating from search_state more completely, or routing through default_search_config.

  4. Minor: in modelopt/torch/quantization/model_quant.py the new from ._auto_quantize_cost import normalize_auto_quantize_constraints is placed after from .algorithms import ...; isort would put the underscore-prefixed sibling first. Also the deleted blank line above def auto_quantize(...) makes the module-level constant _AUTO_QUANTIZE_SUPPORTED_ALGORITHMS visually run into the function definition.

score = hparam.get_score(recipe) # type: ignore [arg-type]
cost = hparam.get_cost(recipe) # type: ignore [arg-type]
active_cost = hparam.get_cost(recipe, cost_weight=hparam.cost_weight) # type: ignore [arg-type]

Collaborator

Bot comment.

active_cost = hparam.get_cost(recipe, cost_weight=hparam.cost_weight) is identical to hparam.get_cost(recipe) because get_cost defaults cost_weight to self.cost_weight (which is hparam.cost_weight). So active_costs ends up equal to costs for every entry, and the test only checks key presence. Either drop the field, or compute something actually distinct — e.g. the unweighted physical cost via cost_weight=1.0 for reporting routed-MoE savings vs. total weight size.
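
The duplication is easy to reproduce with a toy class. `HparamSketch` below is a hypothetical stand-in for QuantRecipeHparam that keeps only the default-substitution behavior described in the comment; the recipe names and costs are made up.

```python
class HparamSketch:
    """Toy stand-in that mimics only the cost_weight default behavior."""

    def __init__(self, recipe_costs, cost_weight=1.0):
        self.recipe_costs = recipe_costs
        self.cost_weight = cost_weight

    def get_cost(self, recipe, cost_weight=None):
        # None falls back to the instance's own cost_weight.
        if cost_weight is None:
            cost_weight = self.cost_weight
        return self.recipe_costs[recipe] * cost_weight


hparam = HparamSketch({"nvfp4": 400.0, "fp8": 800.0}, cost_weight=0.25)

cost = hparam.get_cost("fp8")
active_cost = hparam.get_cost("fp8", cost_weight=hparam.cost_weight)  # same value
physical_cost = hparam.get_cost("fp8", cost_weight=1.0)  # unweighted size
```

Only the explicit cost_weight=1.0 call produces a different number, which is why the reviewer suggests it as the one variant worth reporting.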

Contributor Author

Fixed in 34a6baf. I dropped active_costs from newly generated candidate stats instead of keeping a duplicate of costs; the active-MoE test now asserts the duplicate field is absent while preserving cost_weight for the actual cost accounting.

)

searcher.candidate_stats = candidate_stats
searcher.cost_model = search_state.get("cost_model", COST_MODEL_WEIGHT)

Collaborator

Bot comment.

This builds a stub searcher.config = {"cost_model": ...} with no "cost" key and none of the other default search-config entries. It works today only because _get_search_lower_bounds reads just cost_model. If anything in run_search_with_stats (or anything it calls) starts touching self.config["cost"] or other keys, the get_auto_quantize_config(..., constraints=...) re-solve path will silently break. Consider initializing through default_search_config or persisting/restoring the full config from search_state.

Contributor Author

Fixed in 34a6baf. _resolve_best_recipe() now starts from default_search_config and fills the persisted cost fields (cost_model, cost, active_moe_expert_ratio) instead of using a one-key stub config.
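
The approach described in this reply amounts to overlaying persisted fields on a full default config. A minimal sketch, assuming a hypothetical default dictionary (the real default_search_config has other keys) and a hypothetical helper name:

```python
# Hypothetical defaults; only the three cost fields are named in this thread.
DEFAULT_SEARCH_CONFIG = {
    "cost_model": "weight",
    "cost": None,
    "active_moe_expert_ratio": None,
    "verbose": False,  # placeholder for unrelated default entries
}
PERSISTED_COST_FIELDS = ("cost_model", "cost", "active_moe_expert_ratio")


def restore_search_config(search_state: dict) -> dict:
    """Start from full defaults, then overlay whatever the checkpoint saved."""
    config = dict(DEFAULT_SEARCH_CONFIG)
    for key in PERSISTED_COST_FIELDS:
        if key in search_state:
            config[key] = search_state[key]
    return config


old_checkpoint = restore_search_config({})  # saved before the cost fields existed
new_checkpoint = restore_search_config(
    {"cost_model": "active_moe", "active_moe_expert_ratio": 0.03125}
)
```

Every key that later code might read is present in both cases, so an older checkpoint degrades to the default weight cost model instead of raising a KeyError.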

    else:
        raise ValueError(f"Invalid method: {method}. Valid options are 'gradient' or 'kl_div'.")

    constraints = normalize_auto_quantize_constraints(model, constraints)

Collaborator

Bot comment.

Constraints are now normalized here and again in _AutoQuantizeBaseSearcher.before_search. The function is idempotent today but it's an easy place for the two sites to drift. Pick one (probably before_search, so external callers using the searcher directly also get normalized constraints) and drop the other.

Contributor Author

Fixed in 34a6baf. Normalization now happens in _AutoQuantizeBaseSearcher.before_search() only, so direct searcher callers and the public API share the same normalization path.

@meenchen

Contributor Author

Pushed follow-up fixes for the latest review comments: 34a6baf handles the three substantive AutoQuantize cost-model comments, and 3411e4a fixes the minor API formatting issue. Validation: compileall passed, git diff --check passed, focused active-MoE/get-config tests passed, and tests/unit/torch/quantization/test_autoquant.py -q -k "not data_parallel_auto_quantize" passed with 59 passed / 1 deselected.

@shengliangxu shengliangxu left a comment

Collaborator

LGTM now.

parser.error(
    "--auto_quantize_active_moe_expert_ratio requires "
    "--auto_quantize_cost_model active_moe."
)

Contributor

I think we should keep this for this PR. Once I update the AutoQuant YAML recipe PR to reflect the cost model, I will remove the CLI arg support for AutoQuant, as we discussed yesterday. Does that align, @meenchen, @shengliangxu?

Contributor Author

SGTM

@juhi10071998 juhi10071998 left a comment

Contributor

LGTM, left a minor comment

@cjluo-nv cjluo-nv left a comment

Collaborator

Bot review — DM the bot to share feedback.

Re-review: all important previous review comments are addressed in the current diff.

Critical comments resolved:

  • 💬 CodeRabbit's _infer_active_moe_expert_ratio cross-config pairing bug — fixed; logic moved into _auto_quantize_cost.infer_active_moe_expert_ratio which iterates configs and checks both top-k and num-experts attributes on the same config object. New test test_active_moe_ratio_requires_single_config_object covers the wrapper/text_config mismatch.
  • 💬 cjluo-nv's "active_costs is dead code" — resolved; the duplicated field was dropped from initialize_candidate_stats, and the test now asserts its absence while still validating cost_weight.
  • 💬 cjluo-nv's "constraints normalized twice" — resolved; normalization now lives only in _AutoQuantizeBaseSearcher.before_search, and auto_quantize() in model_quant.py no longer pre-normalizes.
  • 💬 cjluo-nv's "_resolve_best_recipe partial searcher config" — resolved; it now seeds from default_search_config and populates cost_model, cost, and active_moe_expert_ratio from search_state.
  • 💬 realAsma / juhi10071998 design feedback (move cost knobs into constraints, unify validation) — implemented; cost_model/cost live under constraints, and normalize_auto_quantize_constraints is the single validation entry point.
  • 💬 shengliangxu's magic-number / class-refactor asks — resolved via DEFAULT_AUTO_QUANTIZE_EFFECTIVE_BITS and the AutoQuantizeCostModel / WeightCostModel / ActiveMoECostModel hierarchy.

Minor remaining nits (non-blocking): default_state_dict hardcodes the literal "weight" instead of using COST_MODEL_WEIGHT, and _resolve_best_recipe reaches into search_state for cost fields that older checkpoints predating this PR will not contain (the .get(..., default) calls do degrade gracefully, but a checkpoint saved before this PR has no cost_denominator, so the or sum(...) fallback kicks in — that path is exercised but not unit-tested).
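
The untested fallback path mentioned above could be covered by a test shaped like the following sketch. The helper `resolve_cost_denominator` and the layout of `candidate_costs` (one list of recipe costs per module, with the entry used for the denominator last) are assumptions made for illustration; only the "persisted value, or else a sum over costs[-1]" behavior comes from the review.

```python
def resolve_cost_denominator(search_state: dict, candidate_costs) -> float:
    """Prefer the persisted denominator; otherwise sum each module's last cost.

    candidate_costs: per-module lists of recipe costs (assumed layout).
    """
    return search_state.get("cost_denominator") or sum(
        costs[-1] for costs in candidate_costs
    )


def test_old_checkpoint_falls_back_to_summed_costs():
    # A checkpoint saved before this PR has no "cost_denominator" key.
    assert resolve_cost_denominator({}, [[1.0, 4.0], [2.0, 8.0]]) == 12.0


def test_new_checkpoint_uses_persisted_denominator():
    state = {"cost_denominator": 5.0}
    assert resolve_cost_denominator(state, [[1.0, 4.0], [2.0, 8.0]]) == 5.0


test_old_checkpoint_falls_back_to_summed_costs()
test_new_checkpoint_uses_persisted_denominator()
```

Note that the `or` idiom also falls back when the stored value is 0 or None, which is acceptable here only because a zero denominator would be invalid anyway.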

Test coverage is solid: routed-vs-shared expert weighting, num_experts vs num_local_experts config-attr fallback, single-config inference requirement, and lower-bound search-order preference for active-MoE are all covered.

@meenchen meenchen force-pushed the weimingc/autoquant_edge branch from 3411e4a to 0d0cef1 Compare May 29, 2026 22:35
Comment thread examples/llm_ptq/hf_ptq.py

@cjluo-nv cjluo-nv left a comment

Collaborator

Left a comment. I need clarification on the added flags.

@meenchen meenchen requested a review from cjluo-nv June 1, 2026 16:42
@meenchen meenchen enabled auto-merge (squash) June 1, 2026 17:39
@meenchen

meenchen commented Jun 1, 2026

Contributor Author

/ok to test adbf9f0

@meenchen meenchen force-pushed the weimingc/autoquant_edge branch from adbf9f0 to 93c4138 Compare June 2, 2026 04:03
@meenchen

meenchen commented Jun 2, 2026

Contributor Author

/ok to test 93c4138

meenchen added 8 commits June 2, 2026 00:23
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
@meenchen meenchen force-pushed the weimingc/autoquant_edge branch from 93c4138 to b6b8be2 Compare June 2, 2026 07:27
@meenchen meenchen requested a review from a team as a code owner June 2, 2026 07:27
@meenchen meenchen requested a review from kevalmorabia97 June 2, 2026 07:27
@kevalmorabia97

Collaborator

/ok to test b6b8be2

@codecov

codecov Bot commented Jun 2, 2026


Codecov Report

❌ Patch coverage is 96.40288% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.29%. Comparing base (905259f) to head (b6b8be2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/torch/quantization/_auto_quantize_cost.py | 96.87% | 3 Missing ⚠️
modelopt/torch/quantization/algorithms.py | 94.87% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1497      +/-   ##
==========================================
- Coverage   77.38%   77.29%   -0.10%     
==========================================
  Files         479      480       +1     
  Lines       52435    52564     +129     
==========================================
+ Hits        40578    40628      +50     
- Misses      11857    11936      +79     
Flag | Coverage Δ
examples | 41.72% <62.58%> (+0.90%) ⬆️
gpu | 59.85% <62.58%> (-0.59%) ⬇️
regression | 15.23% <26.61%> (+0.10%) ⬆️
unit | 53.73% <96.40%> (+0.10%) ⬆️


@meenchen meenchen merged commit 72df833 into main Jun 2, 2026
52 checks passed
@meenchen meenchen deleted the weimingc/autoquant_edge branch June 2, 2026 09:06

6 participants