Add Alpamayo-1 example by rohansjoshi · Pull Request #1594 · NVIDIA/Model-Optimizer

rohansjoshi · 2026-06-01T22:49:53Z

What does this PR do?

Type of change: ? New example

Adds example for Alpamayo-1 quantization with ModelOpt (FP8, NVFP4, AutoQuant)

Usage

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Documentation
- Added comprehensive guide for Alpamayo model quantization, including setup and usage instructions.
New Features
- Quantization tool supporting multiple compression formats for optimized model deployment.
- Automatic quantization mode with intelligent per-layer format selection.
- Calibration dataset preparation workflow from parquet data sources.

Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>

coderabbitai · 2026-06-01T22:50:03Z

📝 Walkthrough

Walkthrough

This PR adds a complete Alpamayo-R1 quantization example for ModelOpt, including documentation and a 688-line CLI script that supports both post-training quantization (FP8/NVFP4) and AutoQuantize search modes with joint VLM+diffusion calibration.

Changes

Alpamayo-R1 Quantization Example

Layer / File(s)	Summary
Documentation and overview `examples/alpamayo/README.md`	README documenting the Alpamayo 1 quantization example, calibration dataset expectations, fake-quant vs real-quant modes, precision exceptions (vision tower, action-projection heads), and AutoQuantize behavior with effective-bits budgeting.
Script initialization and utilities `examples/alpamayo/quantize.py`	Module scaffolding, global constants for processor pixel constraints, and utility functions for chat message construction, processor configuration, and recursive device/dtype casting of nested data structures.
Calibration infrastructure and patching `examples/alpamayo/quantize.py`	HuggingFace checkpointing patch enabling ModelOpt state restoration during `from_pretrained`/`save_pretrained`, and monkeypatch of `teacher_forced_flow_loss_forward` that computes diffusion-style calibration outputs (`v_pred`/`v_target`) using cached VLM prompt KV and non-autoregressive expert prediction.
Calibration data pipeline `examples/alpamayo/quantize.py`	Joint calibration forward loop that iterates over clip IDs, loads frame/history/future tensors from parquet dataset, applies processor chat template, moves inputs to target device, and runs VLM rollout trajectory sampling; includes parquet clip ID reader with deduplication and column name flexibility.
Quantization implementations `examples/alpamayo/quantize.py`	Two quantization modes: ModelOpt PTQ with FP8/NVFP4 configs, vision tower and dimension-based exclusions for real-quant GEMM constraints, and configurable calibration loop; and ModelOpt AutoQuantize search that runs differentiable forward passes, minimizes MSE velocity field loss between predicted and target, and respects layer exclusions under effective-bits constraint.
CLI orchestration and checkpoint export `examples/alpamayo/quantize.py`	Main CLI entry point that parses arguments for model checkpoint, quantization mode (fp8/nvfp4/auto), and calibration parameters; loads model and processor, reads calibration clip IDs from parquet with truncation, selects PTQ or AutoQuantize path (building joint calibration loop when needed), saves quantized model with processor/config, optionally compresses to packed real-quant weights via `mtq.compress`, and writes `hf_quant_config.json`.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add Alpamayo-1 example' directly and clearly describes the main change: adding a new example for Alpamayo-1 quantization including README and quantize.py script.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR adds only README and quantize.py example. No critical security anti-patterns found: no unsafe torch.load, numpy.load, trust_remote_code, eval/exec, nosec comments, or new PIP dependencies.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch rohjoshi/alpamayo-example

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🧹 Nitpick comments (1)

examples/alpamayo/quantize.py (1)

655-681: ⚡ Quick win

Guard checkpoint writes to rank 0 for distributed-safe execution.

Current save/write operations are unconditional; in multi-rank runs, each rank may write the same files concurrently.

🛡️ Proposed fix

+    is_rank0 = (
+        not torch.distributed.is_available()
+        or not torch.distributed.is_initialized()
+        or torch.distributed.get_rank() == 0
+    )
@@
-    os.makedirs(args.output_dir, exist_ok=True)
-    print(f"Saving quantized checkpoint to {args.output_dir!r} ...")
+    if is_rank0:
+        os.makedirs(args.output_dir, exist_ok=True)
+        print(f"Saving quantized checkpoint to {args.output_dir!r} ...")
@@
-        with torch.inference_mode():
-            model.save_pretrained(args.output_dir)
-        processor.save_pretrained(args.output_dir)
-        model.config.save_pretrained(args.output_dir)
+        if is_rank0:
+            with torch.inference_mode():
+                model.save_pretrained(args.output_dir)
+            processor.save_pretrained(args.output_dir)
+            model.config.save_pretrained(args.output_dir)
@@
-        with torch.inference_mode():
-            model.save_pretrained(args.output_dir)
-
-        processor.save_pretrained(args.output_dir)
-        model.config.save_pretrained(args.output_dir)
-
-        quant_cfg = get_quant_config(model)
-        with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f:
-            json.dump(quant_cfg, f)
+        if is_rank0:
+            with torch.inference_mode():
+                model.save_pretrained(args.output_dir)
+            processor.save_pretrained(args.output_dir)
+            model.config.save_pretrained(args.output_dir)
+            quant_cfg = get_quant_config(model)
+            with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f:
+                json.dump(quant_cfg, f)
+
+    if torch.distributed.is_available() and torch.distributed.is_initialized():
+        torch.distributed.barrier()

As per coding guidelines "Develop with distributed processing in mind ... guarding shared side effects such as file writes ... against race conditions between ranks."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/alpamayo/quantize.py` around lines 655 - 681, The saves under both
the real_quant and else branches (mtq.compress + model.save_pretrained,
processor.save_pretrained, model.config.save_pretrained, and writing
hf_quant_config.json from get_quant_config) must be guarded so only the main
process/rank 0 performs file writes in distributed runs; detect distributed mode
via torch.distributed.is_initialized() and torch.distributed.get_rank() (or the
project’s existing is_main_process helper) and wrap all save_pretrained calls
and the json file write in a conditional that only runs on rank 0 to avoid
concurrent writes to args.output_dir.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/alpamayo/quantize.py`:
- Around line 346-355: The function read_clip_ids_from_parquet currently indexes
cols_lower["key"] which can KeyError; update it to search cols_lower for common
column names (e.g., "key", "clip_id", "clipid", "id", "filename") and use the
first match if present, otherwise fall back to using df.index.astype(str). After
obtaining the list, convert to strings and deduplicate while preserving first
occurrence order (use a seen set/ordered iteration); reference
read_clip_ids_from_parquet and the cols_lower variable to locate the change.
- Around line 27-55: Add an explicit __all__ list at module top to define the
public API; create a single-line assignment __all__ = [...] after the imports
that enumerates the module's public helpers and re-exports (e.g.,
"load_physical_aiavdataset", "AlpamayoR1", "to_special_token",
"create_forward_loop", "get_dataset_dataloader", "export_hf_checkpoint",
"get_quant_config", "mtq", "patch_pretrained_methods", "logger" — adjust to the
actual names this module exposes) and keep it updated when symbols change so
star-import behavior is explicit.
- Around line 500-512: The prints in the hot loop (inside forward_step's
tensor-stat print and loss_func) call .item() on tensors every iteration,
causing unwanted CUDA synchronizations; wrap these debug prints so they only run
when args.debug (and optionally only on the main rank) is true, and compute
.item() values only inside that conditional; update the debug logging in
forward_step (the v_pred/v_target print) and in loss_func (loss print) to be
gated by args.debug and a rank check (e.g., only on rank 0 or when not
distributed) so normal runs avoid per-batch .item() calls and GPU syncs.

---

Nitpick comments:
In `@examples/alpamayo/quantize.py`:
- Around line 655-681: The saves under both the real_quant and else branches
(mtq.compress + model.save_pretrained, processor.save_pretrained,
model.config.save_pretrained, and writing hf_quant_config.json from
get_quant_config) must be guarded so only the main process/rank 0 performs file
writes in distributed runs; detect distributed mode via
torch.distributed.is_initialized() and torch.distributed.get_rank() (or the
project’s existing is_main_process helper) and wrap all save_pretrained calls
and the json file write in a conditional that only runs on rank 0 to avoid
concurrent writes to args.output_dir.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 81978cc3-c66c-4b24-8be8-fe303abf7d0d

📥 Commits

Reviewing files that changed from the base of the PR and between 905259f and 6cc978e.

⛔ Files ignored due to path filters (1)

examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquet is excluded by !**/*.parquet

📒 Files selected for processing (2)

examples/alpamayo/README.md
examples/alpamayo/quantize.py

coderabbitai · 2026-06-01T22:57:13Z

+import argparse
+import collections.abc
+import copy
+import json
+import logging
+import os
+from pathlib import Path
+from typing import Any
+
+import einops
+import pandas as pd
+import torch
+from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
+from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
+from alpamayo_r1.models.token_utils import to_special_token
+from tqdm import tqdm
+from transformers import AutoProcessor, AutoTokenizer
+
+import modelopt.torch.quantization as mtq
+from modelopt.torch.export import export_hf_checkpoint
+from modelopt.torch.export.quant_utils import get_quant_config
+from modelopt.torch.opt.plugins.huggingface import (
+    _LIBRARY_CLASSES_FOR_PATCHING,
+    _PATCHED_CLASSES,
+    patch_pretrained_methods,
+)
+from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader
+
+logger = logging.getLogger(__name__)


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add explicit module exports with __all__.

This module introduces several public helpers but does not define __all__, which makes star-import behavior implicit.

♻️ Proposed fix

+__all__ = [ + "create_message", + "get_processor", + "to_device", + "read_clip_ids_from_parquet", + "quantize_model", + "auto_quantize_model", + "main", +]

As per coding guidelines "**/*.py: Define the public API with __all__ at the top of each Python module..."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import argparse

import collections.abc

import copy

import json

import logging

import os

from pathlib import Path

from typing import Any

import einops

import pandas as pd

import torch

from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset

from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1

from alpamayo_r1.models.token_utils import to_special_token

from tqdm import tqdm

from transformers import AutoProcessor, AutoTokenizer

import modelopt.torch.quantization as mtq

from modelopt.torch.export import export_hf_checkpoint

from modelopt.torch.export.quant_utils import get_quant_config

from modelopt.torch.opt.plugins.huggingface import (

_LIBRARY_CLASSES_FOR_PATCHING,

_PATCHED_CLASSES,

patch_pretrained_methods,

)

from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader

logger = logging.getLogger(__name__)

import argparse

import collections.abc

import copy

import json

import logging

import os

from pathlib import Path

from typing import Any

import einops

import pandas as pd

import torch

from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset

from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1

from alpamayo_r1.models.token_utils import to_special_token

from tqdm import tqdm

from transformers import AutoProcessor, AutoTokenizer

import modelopt.torch.quantization as mtq

from modelopt.torch.export import export_hf_checkpoint

from modelopt.torch.export.quant_utils import get_quant_config

from modelopt.torch.opt.plugins.huggingface import (

_LIBRARY_CLASSES_FOR_PATCHING,

_PATCHED_CLASSES,

patch_pretrained_methods,

)

from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader

__all__ = [

"create_message",

"get_processor",

"to_device",

"read_clip_ids_from_parquet",

"quantize_model",

"auto_quantize_model",

"main",

]

logger = logging.getLogger(__name__)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` around lines 27 - 55, Add an explicit __all__ list at module top to define the public API; create a single-line assignment __all__ = [...] after the imports that enumerates the module's public helpers and re-exports (e.g., "load_physical_aiavdataset", "AlpamayoR1", "to_special_token", "create_forward_loop", "get_dataset_dataloader", "export_hf_checkpoint", "get_quant_config", "mtq", "patch_pretrained_methods", "logger" — adjust to the actual names this module exposes) and keep it updated when symbols change so star-import behavior is explicit.

coderabbitai · 2026-06-01T22:57:14Z

+def read_clip_ids_from_parquet(parquet_path: str) -> list[str]:
+    """
+    Reads clip_ids from parquet. Tries common column names; falls back to index if needed.
+    Returns clip_ids as a list of strings (unique, preserving first occurrence order).
+    """
+    parquet_path = str(parquet_path)
+    df = pd.read_parquet(parquet_path)
+    cols_lower = {c.lower(): c for c in df.columns}
+    clip_ids = df[cols_lower["key"]].astype(str).tolist()
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

read_clip_ids_from_parquet can raise KeyError despite documented fallbacks.

Line 354 directly indexes cols_lower["key"], but the docstring says common names and index fallback are handled. This can hard-fail on valid parquet schemas.

🐛 Proposed fix

def read_clip_ids_from_parquet(parquet_path: str) -> list[str]: @@ parquet_path = str(parquet_path) df = pd.read_parquet(parquet_path) cols_lower = {c.lower(): c for c in df.columns} - clip_ids = df[cols_lower["key"]].astype(str).tolist() + candidate_cols = ("key", "clip_id", "clipid", "id") + selected_col = next((cols_lower[c] for c in candidate_cols if c in cols_lower), None) + if selected_col is not None: + clip_ids = df[selected_col].astype(str).tolist() + else: + clip_ids = df.index.astype(str).tolist()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` around lines 346 - 355, The function read_clip_ids_from_parquet currently indexes cols_lower["key"] which can KeyError; update it to search cols_lower for common column names (e.g., "key", "clip_id", "clipid", "id", "filename") and use the first match if present, otherwise fall back to using df.index.astype(str). After obtaining the list, convert to strings and deduplicate while preserving first occurrence order (use a seen set/ordered iteration); reference read_clip_ids_from_parquet and the cols_lower variable to locate the change.

coderabbitai · 2026-06-01T22:57:14Z

+        print(
+            f"[autoquant-fwd] v_pred: finite={torch.isfinite(v_pred).all().item()} "
+            f"min={v_pred.min().item():.4g} max={v_pred.max().item():.4g} "
+            f"abs_mean={v_pred.abs().mean().item():.4g} | "
+            f"v_target: finite={torch.isfinite(v_target).all().item()} "
+            f"min={v_target.min().item():.4g} max={v_target.max().item():.4g}"
+        )
+        return out
+
+    def loss_func(output, batch):
+        loss = torch.nn.functional.mse_loss(output["v_pred"], output["v_target"])
+        print(f"[autoquant-loss] loss={loss.item():.6g} finite={torch.isfinite(loss).item()}")
+        return loss


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail ls -la fd -t f "examples/alpamayo/quantize.py" . || true sed -n '450,560p' examples/alpamayo/quantize.py rg -n "args\.debug|logger\.|print_rank_0|warn_rank_0|autoquant-fwd|\[autoquant-loss\]" examples/alpamayo/quantize.py rg -n "def forward_step|def loss_func" examples/alpamayo/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 7167

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '480,530p' examples/alpamayo/quantize.py sed -n '530,590p' examples/alpamayo/quantize.py rg -n "autoquant-fwd|autoquant-loss|print\(|logger\.|print_rank_0|warn_rank_0|args\.debug" examples/alpamayo/quantize.py rg -n "def forward_step|def loss_func" examples/alpamayo/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 5442

🏁 Script executed:

sed -n '1,120p' examples/alpamayo/quantize.py rg -n "print_rank_0|warn_rank_0" examples/alpamayo/quantize.py || true rg -n "^logger\\b|logger\\s*=|getLogger" examples/alpamayo/quantize.py || true sed -n '380,520p' examples/alpamayo/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 10148

Unconditional .item()-based tensor stats in AutoQuantize hot loop trigger CUDA syncs (even when args.debug is false).

forward_step() and loss_func() print every iteration and call .item() on torch.isfinite(...).all(), min()/max()/mean(), and loss, forcing per-batch CPU-GPU synchronization. Gate these prints behind args.debug (and use rank-aware logging if distributed).

⚡ Proposed fix

def forward_step(runtime_model, data): with torch.autocast("cuda", dtype=torch.bfloat16): out = runtime_model.teacher_forced_flow_loss_forward(data=data) - v_pred, v_target = out["v_pred"], out["v_target"] - print( - f"[autoquant-fwd] v_pred: finite={torch.isfinite(v_pred).all().item()} " - f"min={v_pred.min().item():.4g} max={v_pred.max().item():.4g} " - f"abs_mean={v_pred.abs().mean().item():.4g} | " - f"v_target: finite={torch.isfinite(v_target).all().item()} " - f"min={v_target.min().item():.4g} max={v_target.max().item():.4g}" - ) + if args.debug: + v_pred, v_target = out["v_pred"], out["v_target"] + if not bool(torch.isfinite(v_pred).all() and torch.isfinite(v_target).all()): + logger.warning("[autoquant-fwd] Non-finite tensor detected.") return out def loss_func(output, batch): loss = torch.nn.functional.mse_loss(output["v_pred"], output["v_target"]) - print(f"[autoquant-loss] loss={loss.item():.6g} finite={torch.isfinite(loss).item()}") + if args.debug and not bool(torch.isfinite(loss)): + logger.warning("[autoquant-loss] Non-finite loss detected.") return loss

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` around lines 500 - 512, The prints in the hot loop (inside forward_step's tensor-stat print and loss_func) call .item() on tensors every iteration, causing unwanted CUDA synchronizations; wrap these debug prints so they only run when args.debug (and optionally only on the main rank) is true, and compute .item() values only inside that conditional; update the debug logging in forward_step (the v_pred/v_target print) and in loss_func (loss print) to be gated by args.debug and a rank check (e.g., only on rank 0 or when not distributed) so normal runs avoid per-batch .item() calls and GPU syncs.

codecov · 2026-06-01T23:03:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.60%. Comparing base (f0d2237) to head (6cc978e).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1594      +/-   ##
==========================================
+ Coverage   73.23%   73.60%   +0.36%     
==========================================
  Files         479      479              
  Lines       52435    52435              
==========================================
+ Hits        38401    38595     +194     
+ Misses      14034    13840     -194

Flag	Coverage Δ
examples	`41.67% <ø> (+0.85%)`	⬆️
unit	`53.62% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

cjluo-nv

Bot review — DM the bot to share feedback.

New ~700-line example. A few real concerns:

Likely bug — backslash-escaped special tokens. In create_message, the f-strings use <\|traj_history_start\|>, <\|traj_history\|>, <\|traj_history_end\|>, <\|cot_start\|>. In Python \| is not a recognized escape sequence, so the literal string contains a backslash before each pipe. Qwen-family special tokens are <|...|> without backslashes — passing the backslash form to apply_chat_template will not match the tokenizer's added special tokens and these will tokenize as ordinary text, silently corrupting calibration. Looks like a copy-paste from a markdown-escaped source. (Bonus: these will start emitting SyntaxWarning: invalid escape sequence on newer Python.)
Reaches into private API instead of using the public one. enable_huggingface_checkpointing_patch() re-implements mto.enable_huggingface_checkpointing() by importing the private _LIBRARY_CLASSES_FOR_PATCHING, _PATCHED_CLASSES, and patch_pretrained_methods from modelopt.torch.opt.plugins.huggingface, just to skip the _from_config patch. If the only goal is to skip _from_config, please make that an option on the public API (or document why this divergence is needed) instead of duplicating the loop with private symbols. Also, the file does this at import time as a top-level side effect, which is surprising.
Monkey-patching AlpamayoR1.teacher_forced_flow_loss_forward at module import time is acknowledged in a docstring but it's a strong coupling: the body is a verbatim copy of an internal training-fork method, and any drift in nvlabs/alpamayo (e.g. action_in_proj signature, expert_non_causal_attention, action_space API) will silently produce wrong calibration. At minimum: assert that the public AlpamayoR1 exposes the methods/attrs this body relies on, and pin a known-good Alpamayo SHA in the README setup section instead of git clone of main.
read_clip_ids_from_parquet doesn't match its docstring. The docstring says it "tries common column names; falls back to index if needed", but the implementation hard-codes cols_lower["key"] and will KeyError on any parquet without a literal key column. Either implement the documented fallback or simplify the docstring.
Binary parquet checked into the repo. A .parquet calibration set is being added to examples/alpamayo/. The repo doesn't appear to use git-LFS (.gitattributes is missing). Even if this file is small, distributing calibration data through the example tree is unusual for this repo — most examples download from HF on demand. Please confirm the file size is small and that there are no licensing/redistribution restrictions on the AV calibration clips list, or fetch it at runtime instead.
No tests / no CHANGELOG entry. The PR template's test and changelog checkboxes are still unfilled. New examples in this repo (vlm_ptq, etc.) typically don't get unit tests, but at least a CHANGELOG entry under "New Features" is expected.
Minor: args.parquet default is the bare filename, joined to script_dir; if the user passes an absolute path it works (pathlib handles that), but a plain relative path in the user's CWD will be silently resolved against the script's directory, which is surprising. Consider documenting this or using Path(args.parquet) as-is when absolute/exists, else fall back to script_dir / args.parquet.

cjluo-nv · 2026-06-01T23:16:44Z

+        {
+            "role": "system",
+            "content": [
+                {


Bot comment.

These tokens contain literal backslashes (<\|traj_history_start\|>, etc.) — \| is not an escape sequence in Python, so the string literal keeps the backslash. Qwen-family special tokens are <|traj_history_start|> (no backslash); passing the backslash form to apply_chat_template will tokenize them as ordinary text and silently break calibration. Please drop the backslashes (and on 3.12+ this also avoids the SyntaxWarning: invalid escape sequence).

cjluo-nv · 2026-06-01T23:16:44Z

+    """Recursively cast data into the specified device, dtype."""
+    if isinstance(data, torch.Tensor):
+        data = data.to(
+            device=device,


Bot comment.

This duplicates mto.enable_huggingface_checkpointing() and pulls in three private symbols (_LIBRARY_CLASSES_FOR_PATCHING, _PATCHED_CLASSES, patch_pretrained_methods) just to skip the _from_config patch. Please use the public API, or — if the _from_config patch genuinely breaks AlpamayoR1 — add a public option to skip it rather than reaching into module internals from an example. Also, calling this at top-level import is a surprising side effect; prefer calling it from main().

cjluo-nv · 2026-06-01T23:16:44Z

+
+
+def patch_teacher_forced_flow_loss_forward() -> None:
+    """Attach teacher_forced_flow_loss_forward to AlpamayoR1 if missing.


Bot comment.

Module-level monkey-patching of an upstream class is fragile: the body assumes specific attrs/methods on AlpamayoR1 (action_in_proj signature, expert_non_causal_attention, fuse_traj_tokens, action_space.traj_to_action, action_space.get_action_space_dims, vlm.model.rope_deltas, etc.). If the public nvlabs/alpamayo drifts, this will silently produce wrong gradients/loss for AutoQuantize. Please (a) pin an Alpamayo SHA in the README setup, and (b) add explicit hasattr assertions before patching so a drift fails loudly.

cjluo-nv · 2026-06-01T23:16:44Z

+    max_generation_length: int,
+    calibration_traj_samples: int,
+    device: str,
+):


Bot comment.

Docstring says "Tries common column names; falls back to index if needed", but the code only handles a literal key column and will KeyError on anything else. Either implement the fallback (e.g. try clip_id/clipid/index) or rewrite the docstring to match the actual behavior.

cjluo-nv · 2026-06-01T23:16:45Z

+from modelopt.torch.export.quant_utils import get_quant_config
+from modelopt.torch.opt.plugins.huggingface import (
+    _LIBRARY_CLASSES_FOR_PATCHING,
+    _PATCHED_CLASSES,


Bot comment.

assert torch.ops.tensorrt.quantize_op.default inside a try/except that only logs a warning on failure: this won't actually stop a broken install, and the assert form is also unusual here. Consider a plain attribute check + clearer error message, or remove this block (mtq.quantize will fail loudly later anyway).

cjluo-nv · 2026-06-01T23:16:45Z

+                return_tensors="pt",
+            )
+            model_inputs = {
+                "tokenized_data": inputs,


Bot comment.

disabled_layers here uses fnmatch wildcards (per auto_quantize docs), but the literal module names appended below (_name) are not wildcards. The corresponding exclusion in quantize_model uses f"{_name}.*" — please be consistent. If you want exact-name matching here, prepend/append * or document the difference; otherwise an unwrapped name like vlm.foo.bar may not match the way you expect against quantizer names like vlm.foo.bar.weight_quantizer.

cjluo-nv · 2026-06-01T23:16:45Z

+
+```bash
+git clone https://github.com/nvlabs/alpamayo
+pip install ./alpamayo


Bot comment.

Setup says git clone https://github.com/nvlabs/alpamayo without pinning a commit. Given that this example monkey-patches an internal method onto AlpamayoR1, please pin a known-good SHA so future upstream changes don't silently break calibration.

rohansjoshi added 3 commits June 1, 2026 22:45

First commit

d96a411

Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>

Fixed real quant path

579a10d

Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>

Added README and calibration clips

6cc978e

Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>

rohansjoshi requested a review from a team as a code owner June 1, 2026 22:49

rohansjoshi requested a review from jingyu-ml June 1, 2026 22:49

rohansjoshi requested a review from cjluo-nv June 1, 2026 22:50

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

cjluo-nv reviewed Jun 1, 2026

View reviewed changes



		def patch_teacher_forced_flow_loss_forward() -> None:
		"""Attach teacher_forced_flow_loss_forward to AlpamayoR1 if missing.

Conversation

rohansjoshi commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rohansjoshi commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

codecov Bot commented Jun 1, 2026 •

edited

Loading