Add Alpamayo-1 example#1594
Conversation
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
📝 WalkthroughWalkthroughThis PR adds a complete Alpamayo-R1 quantization example for ModelOpt, including documentation and a 688-line CLI script that supports both post-training quantization (FP8/NVFP4) and AutoQuantize search modes with joint VLM+diffusion calibration. ChangesAlpamayo-R1 Quantization Example
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 3
🧹 Nitpick comments (1)
examples/alpamayo/quantize.py (1)
655-681: ⚡ Quick winGuard checkpoint writes to rank 0 for distributed-safe execution.
Current save/write operations are unconditional; in multi-rank runs, each rank may write the same files concurrently.
🛡️ Proposed fix
+ is_rank0 = ( + not torch.distributed.is_available() + or not torch.distributed.is_initialized() + or torch.distributed.get_rank() == 0 + ) @@ - os.makedirs(args.output_dir, exist_ok=True) - print(f"Saving quantized checkpoint to {args.output_dir!r} ...") + if is_rank0: + os.makedirs(args.output_dir, exist_ok=True) + print(f"Saving quantized checkpoint to {args.output_dir!r} ...") @@ - with torch.inference_mode(): - model.save_pretrained(args.output_dir) - processor.save_pretrained(args.output_dir) - model.config.save_pretrained(args.output_dir) + if is_rank0: + with torch.inference_mode(): + model.save_pretrained(args.output_dir) + processor.save_pretrained(args.output_dir) + model.config.save_pretrained(args.output_dir) @@ - with torch.inference_mode(): - model.save_pretrained(args.output_dir) - - processor.save_pretrained(args.output_dir) - model.config.save_pretrained(args.output_dir) - - quant_cfg = get_quant_config(model) - with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f: - json.dump(quant_cfg, f) + if is_rank0: + with torch.inference_mode(): + model.save_pretrained(args.output_dir) + processor.save_pretrained(args.output_dir) + model.config.save_pretrained(args.output_dir) + quant_cfg = get_quant_config(model) + with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f: + json.dump(quant_cfg, f) + + if torch.distributed.is_available() and torch.distributed.is_initialized(): + torch.distributed.barrier()As per coding guidelines "Develop with distributed processing in mind ... guarding shared side effects such as file writes ... against race conditions between ranks."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` around lines 655 - 681, The saves under both the real_quant and else branches (mtq.compress + model.save_pretrained, processor.save_pretrained, model.config.save_pretrained, and writing hf_quant_config.json from get_quant_config) must be guarded so only the main process/rank 0 performs file writes in distributed runs; detect distributed mode via torch.distributed.is_initialized() and torch.distributed.get_rank() (or the project’s existing is_main_process helper) and wrap all save_pretrained calls and the json file write in a conditional that only runs on rank 0 to avoid concurrent writes to args.output_dir.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/alpamayo/quantize.py`:
- Around line 346-355: The function read_clip_ids_from_parquet currently indexes
cols_lower["key"] which can KeyError; update it to search cols_lower for common
column names (e.g., "key", "clip_id", "clipid", "id", "filename") and use the
first match if present, otherwise fall back to using df.index.astype(str). After
obtaining the list, convert to strings and deduplicate while preserving first
occurrence order (use a seen set/ordered iteration); reference
read_clip_ids_from_parquet and the cols_lower variable to locate the change.
- Around line 27-55: Add an explicit __all__ list at module top to define the
public API; create a single-line assignment __all__ = [...] after the imports
that enumerates the module's public helpers and re-exports (e.g.,
"load_physical_aiavdataset", "AlpamayoR1", "to_special_token",
"create_forward_loop", "get_dataset_dataloader", "export_hf_checkpoint",
"get_quant_config", "mtq", "patch_pretrained_methods", "logger" — adjust to the
actual names this module exposes) and keep it updated when symbols change so
star-import behavior is explicit.
- Around line 500-512: The prints in the hot loop (inside forward_step's
tensor-stat print and loss_func) call .item() on tensors every iteration,
causing unwanted CUDA synchronizations; wrap these debug prints so they only run
when args.debug (and optionally only on the main rank) is true, and compute
.item() values only inside that conditional; update the debug logging in
forward_step (the v_pred/v_target print) and in loss_func (loss print) to be
gated by args.debug and a rank check (e.g., only on rank 0 or when not
distributed) so normal runs avoid per-batch .item() calls and GPU syncs.
---
Nitpick comments:
In `@examples/alpamayo/quantize.py`:
- Around line 655-681: The saves under both the real_quant and else branches
(mtq.compress + model.save_pretrained, processor.save_pretrained,
model.config.save_pretrained, and writing hf_quant_config.json from
get_quant_config) must be guarded so only the main process/rank 0 performs file
writes in distributed runs; detect distributed mode via
torch.distributed.is_initialized() and torch.distributed.get_rank() (or the
project’s existing is_main_process helper) and wrap all save_pretrained calls
and the json file write in a conditional that only runs on rank 0 to avoid
concurrent writes to args.output_dir.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 81978cc3-c66c-4b24-8be8-fe303abf7d0d
⛔ Files ignored due to path filters (1)
examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquetis excluded by!**/*.parquet
📒 Files selected for processing (2)
examples/alpamayo/README.mdexamples/alpamayo/quantize.py
| import argparse | ||
| import collections.abc | ||
| import copy | ||
| import json | ||
| import logging | ||
| import os | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| import einops | ||
| import pandas as pd | ||
| import torch | ||
| from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset | ||
| from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1 | ||
| from alpamayo_r1.models.token_utils import to_special_token | ||
| from tqdm import tqdm | ||
| from transformers import AutoProcessor, AutoTokenizer | ||
|
|
||
| import modelopt.torch.quantization as mtq | ||
| from modelopt.torch.export import export_hf_checkpoint | ||
| from modelopt.torch.export.quant_utils import get_quant_config | ||
| from modelopt.torch.opt.plugins.huggingface import ( | ||
| _LIBRARY_CLASSES_FOR_PATCHING, | ||
| _PATCHED_CLASSES, | ||
| patch_pretrained_methods, | ||
| ) | ||
| from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader | ||
|
|
||
| logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add explicit module exports with __all__.
This module introduces several public helpers but does not define __all__, which makes star-import behavior implicit.
♻️ Proposed fix
+__all__ = [
+ "create_message",
+ "get_processor",
+ "to_device",
+ "read_clip_ids_from_parquet",
+ "quantize_model",
+ "auto_quantize_model",
+ "main",
+]As per coding guidelines "**/*.py: Define the public API with __all__ at the top of each Python module..."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import argparse | |
| import collections.abc | |
| import copy | |
| import json | |
| import logging | |
| import os | |
| from pathlib import Path | |
| from typing import Any | |
| import einops | |
| import pandas as pd | |
| import torch | |
| from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset | |
| from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1 | |
| from alpamayo_r1.models.token_utils import to_special_token | |
| from tqdm import tqdm | |
| from transformers import AutoProcessor, AutoTokenizer | |
| import modelopt.torch.quantization as mtq | |
| from modelopt.torch.export import export_hf_checkpoint | |
| from modelopt.torch.export.quant_utils import get_quant_config | |
| from modelopt.torch.opt.plugins.huggingface import ( | |
| _LIBRARY_CLASSES_FOR_PATCHING, | |
| _PATCHED_CLASSES, | |
| patch_pretrained_methods, | |
| ) | |
| from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader | |
| logger = logging.getLogger(__name__) | |
| import argparse | |
| import collections.abc | |
| import copy | |
| import json | |
| import logging | |
| import os | |
| from pathlib import Path | |
| from typing import Any | |
| import einops | |
| import pandas as pd | |
| import torch | |
| from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset | |
| from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1 | |
| from alpamayo_r1.models.token_utils import to_special_token | |
| from tqdm import tqdm | |
| from transformers import AutoProcessor, AutoTokenizer | |
| import modelopt.torch.quantization as mtq | |
| from modelopt.torch.export import export_hf_checkpoint | |
| from modelopt.torch.export.quant_utils import get_quant_config | |
| from modelopt.torch.opt.plugins.huggingface import ( | |
| _LIBRARY_CLASSES_FOR_PATCHING, | |
| _PATCHED_CLASSES, | |
| patch_pretrained_methods, | |
| ) | |
| from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader | |
| __all__ = [ | |
| "create_message", | |
| "get_processor", | |
| "to_device", | |
| "read_clip_ids_from_parquet", | |
| "quantize_model", | |
| "auto_quantize_model", | |
| "main", | |
| ] | |
| logger = logging.getLogger(__name__) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/alpamayo/quantize.py` around lines 27 - 55, Add an explicit __all__
list at module top to define the public API; create a single-line assignment
__all__ = [...] after the imports that enumerates the module's public helpers
and re-exports (e.g., "load_physical_aiavdataset", "AlpamayoR1",
"to_special_token", "create_forward_loop", "get_dataset_dataloader",
"export_hf_checkpoint", "get_quant_config", "mtq", "patch_pretrained_methods",
"logger" — adjust to the actual names this module exposes) and keep it updated
when symbols change so star-import behavior is explicit.
| def read_clip_ids_from_parquet(parquet_path: str) -> list[str]: | ||
| """ | ||
| Reads clip_ids from parquet. Tries common column names; falls back to index if needed. | ||
| Returns clip_ids as a list of strings (unique, preserving first occurrence order). | ||
| """ | ||
| parquet_path = str(parquet_path) | ||
| df = pd.read_parquet(parquet_path) | ||
| cols_lower = {c.lower(): c for c in df.columns} | ||
| clip_ids = df[cols_lower["key"]].astype(str).tolist() | ||
|
|
There was a problem hiding this comment.
read_clip_ids_from_parquet can raise KeyError despite documented fallbacks.
Line 354 directly indexes cols_lower["key"], but the docstring says common names and index fallback are handled. This can hard-fail on valid parquet schemas.
🐛 Proposed fix
def read_clip_ids_from_parquet(parquet_path: str) -> list[str]:
@@
parquet_path = str(parquet_path)
df = pd.read_parquet(parquet_path)
cols_lower = {c.lower(): c for c in df.columns}
- clip_ids = df[cols_lower["key"]].astype(str).tolist()
+ candidate_cols = ("key", "clip_id", "clipid", "id")
+ selected_col = next((cols_lower[c] for c in candidate_cols if c in cols_lower), None)
+ if selected_col is not None:
+ clip_ids = df[selected_col].astype(str).tolist()
+ else:
+ clip_ids = df.index.astype(str).tolist()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/alpamayo/quantize.py` around lines 346 - 355, The function
read_clip_ids_from_parquet currently indexes cols_lower["key"] which can
KeyError; update it to search cols_lower for common column names (e.g., "key",
"clip_id", "clipid", "id", "filename") and use the first match if present,
otherwise fall back to using df.index.astype(str). After obtaining the list,
convert to strings and deduplicate while preserving first occurrence order (use
a seen set/ordered iteration); reference read_clip_ids_from_parquet and the
cols_lower variable to locate the change.
| print( | ||
| f"[autoquant-fwd] v_pred: finite={torch.isfinite(v_pred).all().item()} " | ||
| f"min={v_pred.min().item():.4g} max={v_pred.max().item():.4g} " | ||
| f"abs_mean={v_pred.abs().mean().item():.4g} | " | ||
| f"v_target: finite={torch.isfinite(v_target).all().item()} " | ||
| f"min={v_target.min().item():.4g} max={v_target.max().item():.4g}" | ||
| ) | ||
| return out | ||
|
|
||
| def loss_func(output, batch): | ||
| loss = torch.nn.functional.mse_loss(output["v_pred"], output["v_target"]) | ||
| print(f"[autoquant-loss] loss={loss.item():.6g} finite={torch.isfinite(loss).item()}") | ||
| return loss |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la
fd -t f "examples/alpamayo/quantize.py" . || true
sed -n '450,560p' examples/alpamayo/quantize.py
rg -n "args\.debug|logger\.|print_rank_0|warn_rank_0|autoquant-fwd|\[autoquant-loss\]" examples/alpamayo/quantize.py
rg -n "def forward_step|def loss_func" examples/alpamayo/quantize.pyRepository: NVIDIA/Model-Optimizer
Length of output: 7167
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '480,530p' examples/alpamayo/quantize.py
sed -n '530,590p' examples/alpamayo/quantize.py
rg -n "autoquant-fwd|autoquant-loss|print\(|logger\.|print_rank_0|warn_rank_0|args\.debug" examples/alpamayo/quantize.py
rg -n "def forward_step|def loss_func" examples/alpamayo/quantize.pyRepository: NVIDIA/Model-Optimizer
Length of output: 5442
🏁 Script executed:
sed -n '1,120p' examples/alpamayo/quantize.py
rg -n "print_rank_0|warn_rank_0" examples/alpamayo/quantize.py || true
rg -n "^logger\\b|logger\\s*=|getLogger" examples/alpamayo/quantize.py || true
sed -n '380,520p' examples/alpamayo/quantize.pyRepository: NVIDIA/Model-Optimizer
Length of output: 10148
Unconditional .item()-based tensor stats in AutoQuantize hot loop trigger CUDA syncs (even when args.debug is false).
forward_step() and loss_func() print every iteration and call .item() on torch.isfinite(...).all(), min()/max()/mean(), and loss, forcing per-batch CPU-GPU synchronization. Gate these prints behind args.debug (and use rank-aware logging if distributed).
⚡ Proposed fix
def forward_step(runtime_model, data):
with torch.autocast("cuda", dtype=torch.bfloat16):
out = runtime_model.teacher_forced_flow_loss_forward(data=data)
- v_pred, v_target = out["v_pred"], out["v_target"]
- print(
- f"[autoquant-fwd] v_pred: finite={torch.isfinite(v_pred).all().item()} "
- f"min={v_pred.min().item():.4g} max={v_pred.max().item():.4g} "
- f"abs_mean={v_pred.abs().mean().item():.4g} | "
- f"v_target: finite={torch.isfinite(v_target).all().item()} "
- f"min={v_target.min().item():.4g} max={v_target.max().item():.4g}"
- )
+ if args.debug:
+ v_pred, v_target = out["v_pred"], out["v_target"]
+ if not bool(torch.isfinite(v_pred).all() and torch.isfinite(v_target).all()):
+ logger.warning("[autoquant-fwd] Non-finite tensor detected.")
return out
def loss_func(output, batch):
loss = torch.nn.functional.mse_loss(output["v_pred"], output["v_target"])
- print(f"[autoquant-loss] loss={loss.item():.6g} finite={torch.isfinite(loss).item()}")
+ if args.debug and not bool(torch.isfinite(loss)):
+ logger.warning("[autoquant-loss] Non-finite loss detected.")
return loss🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/alpamayo/quantize.py` around lines 500 - 512, The prints in the hot
loop (inside forward_step's tensor-stat print and loss_func) call .item() on
tensors every iteration, causing unwanted CUDA synchronizations; wrap these
debug prints so they only run when args.debug (and optionally only on the main
rank) is true, and compute .item() values only inside that conditional; update
the debug logging in forward_step (the v_pred/v_target print) and in loss_func
(loss print) to be gated by args.debug and a rank check (e.g., only on rank 0 or
when not distributed) so normal runs avoid per-batch .item() calls and GPU
syncs.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1594 +/- ##
==========================================
+ Coverage 73.23% 73.60% +0.36%
==========================================
Files 479 479
Lines 52435 52435
==========================================
+ Hits 38401 38595 +194
+ Misses 14034 13840 -194
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
New ~700-line example. A few real concerns:
-
Likely bug — backslash-escaped special tokens. In
create_message, the f-strings use<\|traj_history_start\|>,<\|traj_history\|>,<\|traj_history_end\|>,<\|cot_start\|>. In Python\|is not a recognized escape sequence, so the literal string contains a backslash before each pipe. Qwen-family special tokens are<|...|>without backslashes — passing the backslash form toapply_chat_templatewill not match the tokenizer's added special tokens and these will tokenize as ordinary text, silently corrupting calibration. Looks like a copy-paste from a markdown-escaped source. (Bonus: these will start emittingSyntaxWarning: invalid escape sequenceon newer Python.) -
Reaches into private API instead of using the public one.
enable_huggingface_checkpointing_patch()re-implementsmto.enable_huggingface_checkpointing()by importing the private_LIBRARY_CLASSES_FOR_PATCHING,_PATCHED_CLASSES, andpatch_pretrained_methodsfrommodelopt.torch.opt.plugins.huggingface, just to skip the_from_configpatch. If the only goal is to skip_from_config, please make that an option on the public API (or document why this divergence is needed) instead of duplicating the loop with private symbols. Also, the file does this at import time as a top-level side effect, which is surprising. -
Monkey-patching
AlpamayoR1.teacher_forced_flow_loss_forwardat module import time is acknowledged in a docstring but it's a strong coupling: the body is a verbatim copy of an internal training-fork method, and any drift innvlabs/alpamayo(e.g.action_in_projsignature,expert_non_causal_attention,action_spaceAPI) will silently produce wrong calibration. At minimum: assert that the publicAlpamayoR1exposes the methods/attrs this body relies on, and pin a known-good Alpamayo SHA in the README setup section instead ofgit cloneofmain. -
read_clip_ids_from_parquetdoesn't match its docstring. The docstring says it "tries common column names; falls back to index if needed", but the implementation hard-codescols_lower["key"]and willKeyErroron any parquet without a literalkeycolumn. Either implement the documented fallback or simplify the docstring. -
Binary parquet checked into the repo. A
.parquetcalibration set is being added toexamples/alpamayo/. The repo doesn't appear to use git-LFS (.gitattributesis missing). Even if this file is small, distributing calibration data through the example tree is unusual for this repo — most examples download from HF on demand. Please confirm the file size is small and that there are no licensing/redistribution restrictions on the AV calibration clips list, or fetch it at runtime instead. -
No tests / no CHANGELOG entry. The PR template's test and changelog checkboxes are still unfilled. New examples in this repo (
vlm_ptq, etc.) typically don't get unit tests, but at least a CHANGELOG entry under "New Features" is expected. -
Minor:
args.parquetdefault is the bare filename, joined toscript_dir; if the user passes an absolute path it works (pathlib handles that), but a plain relative path in the user's CWD will be silently resolved against the script's directory, which is surprising. Consider documenting this or usingPath(args.parquet)as-is when absolute/exists, else fall back toscript_dir / args.parquet.
| { | ||
| "role": "system", | ||
| "content": [ | ||
| { |
There was a problem hiding this comment.
Bot comment.
These tokens contain literal backslashes (<\|traj_history_start\|>, etc.) — \| is not an escape sequence in Python, so the string literal keeps the backslash. Qwen-family special tokens are <|traj_history_start|> (no backslash); passing the backslash form to apply_chat_template will tokenize them as ordinary text and silently break calibration. Please drop the backslashes (and on 3.12+ this also avoids the SyntaxWarning: invalid escape sequence).
| """Recursively cast data into the specified device, dtype.""" | ||
| if isinstance(data, torch.Tensor): | ||
| data = data.to( | ||
| device=device, |
There was a problem hiding this comment.
Bot comment.
This duplicates mto.enable_huggingface_checkpointing() and pulls in three private symbols (_LIBRARY_CLASSES_FOR_PATCHING, _PATCHED_CLASSES, patch_pretrained_methods) just to skip the _from_config patch. Please use the public API, or — if the _from_config patch genuinely breaks AlpamayoR1 — add a public option to skip it rather than reaching into module internals from an example. Also, calling this at top-level import is a surprising side effect; prefer calling it from main().
|
|
||
|
|
||
| def patch_teacher_forced_flow_loss_forward() -> None: | ||
| """Attach teacher_forced_flow_loss_forward to AlpamayoR1 if missing. |
There was a problem hiding this comment.
Bot comment.
Module-level monkey-patching of an upstream class is fragile: the body assumes specific attrs/methods on AlpamayoR1 (action_in_proj signature, expert_non_causal_attention, fuse_traj_tokens, action_space.traj_to_action, action_space.get_action_space_dims, vlm.model.rope_deltas, etc.). If the public nvlabs/alpamayo drifts, this will silently produce wrong gradients/loss for AutoQuantize. Please (a) pin an Alpamayo SHA in the README setup, and (b) add explicit hasattr assertions before patching so a drift fails loudly.
| max_generation_length: int, | ||
| calibration_traj_samples: int, | ||
| device: str, | ||
| ): |
There was a problem hiding this comment.
Bot comment.
Docstring says "Tries common column names; falls back to index if needed", but the code only handles a literal key column and will KeyError on anything else. Either implement the fallback (e.g. try clip_id/clipid/index) or rewrite the docstring to match the actual behavior.
| from modelopt.torch.export.quant_utils import get_quant_config | ||
| from modelopt.torch.opt.plugins.huggingface import ( | ||
| _LIBRARY_CLASSES_FOR_PATCHING, | ||
| _PATCHED_CLASSES, |
There was a problem hiding this comment.
Bot comment.
assert torch.ops.tensorrt.quantize_op.default inside a try/except that only logs a warning on failure: this won't actually stop a broken install, and the assert form is also unusual here. Consider a plain attribute check + clearer error message, or remove this block (mtq.quantize will fail loudly later anyway).
| return_tensors="pt", | ||
| ) | ||
| model_inputs = { | ||
| "tokenized_data": inputs, |
There was a problem hiding this comment.
Bot comment.
disabled_layers here uses fnmatch wildcards (per auto_quantize docs), but the literal module names appended below (_name) are not wildcards. The corresponding exclusion in quantize_model uses f"{_name}.*" — please be consistent. If you want exact-name matching here, prepend/append * or document the difference; otherwise an unwrapped name like vlm.foo.bar may not match the way you expect against quantizer names like vlm.foo.bar.weight_quantizer.
|
|
||
| ```bash | ||
| git clone https://github.com/nvlabs/alpamayo | ||
| pip install ./alpamayo |
There was a problem hiding this comment.
Bot comment.
Setup says git clone https://github.com/nvlabs/alpamayo without pinning a commit. Given that this example monkey-patches an internal method onto AlpamayoR1, please pin a known-good SHA so future upstream changes don't silently break calibration.
What does this PR do?
Type of change: ? New example
Adds example for Alpamayo-1 quantization with ModelOpt (FP8, NVFP4, AutoQuant)
Usage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information
Summary by CodeRabbit
Documentation
New Features