Skip to content

Add Alpamayo-1 example#1594

Open
rohansjoshi wants to merge 3 commits into
mainfrom
rohjoshi/alpamayo-example
Open

Add Alpamayo-1 example#1594
rohansjoshi wants to merge 3 commits into
mainfrom
rohjoshi/alpamayo-example

Conversation

@rohansjoshi
Copy link
Copy Markdown
Contributor

@rohansjoshi rohansjoshi commented Jun 1, 2026

What does this PR do?

Type of change: ? New example

Adds example for Alpamayo-1 quantization with ModelOpt (FP8, NVFP4, AutoQuant)

Usage

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A
  • Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guide for Alpamayo model quantization, including setup and usage instructions.
  • New Features

    • Quantization tool supporting multiple compression formats for optimized model deployment.
    • Automatic quantization mode with intelligent per-layer format selection.
    • Calibration dataset preparation workflow from parquet data sources.

Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
@rohansjoshi rohansjoshi requested a review from a team as a code owner June 1, 2026 22:49
@rohansjoshi rohansjoshi requested a review from jingyu-ml June 1, 2026 22:49
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a complete Alpamayo-R1 quantization example for ModelOpt, including documentation and a 688-line CLI script that supports both post-training quantization (FP8/NVFP4) and AutoQuantize search modes with joint VLM+diffusion calibration.

Changes

Alpamayo-R1 Quantization Example

Layer / File(s) Summary
Documentation and overview
examples/alpamayo/README.md
README documenting the Alpamayo 1 quantization example, calibration dataset expectations, fake-quant vs real-quant modes, precision exceptions (vision tower, action-projection heads), and AutoQuantize behavior with effective-bits budgeting.
Script initialization and utilities
examples/alpamayo/quantize.py
Module scaffolding, global constants for processor pixel constraints, and utility functions for chat message construction, processor configuration, and recursive device/dtype casting of nested data structures.
Calibration infrastructure and patching
examples/alpamayo/quantize.py
HuggingFace checkpointing patch enabling ModelOpt state restoration during from_pretrained/save_pretrained, and monkeypatch of teacher_forced_flow_loss_forward that computes diffusion-style calibration outputs (v_pred/v_target) using cached VLM prompt KV and non-autoregressive expert prediction.
Calibration data pipeline
examples/alpamayo/quantize.py
Joint calibration forward loop that iterates over clip IDs, loads frame/history/future tensors from parquet dataset, applies processor chat template, moves inputs to target device, and runs VLM rollout trajectory sampling; includes parquet clip ID reader with deduplication and column name flexibility.
Quantization implementations
examples/alpamayo/quantize.py
Two quantization modes: ModelOpt PTQ with FP8/NVFP4 configs, vision tower and dimension-based exclusions for real-quant GEMM constraints, and configurable calibration loop; and ModelOpt AutoQuantize search that runs differentiable forward passes, minimizes MSE velocity field loss between predicted and target, and respects layer exclusions under effective-bits constraint.
CLI orchestration and checkpoint export
examples/alpamayo/quantize.py
Main CLI entry point that parses arguments for model checkpoint, quantization mode (fp8/nvfp4/auto), and calibration parameters; loads model and processor, reads calibration clip IDs from parquet with truncation, selects PTQ or AutoQuantize path (building joint calibration loop when needed), saves quantized model with processor/config, optionally compresses to packed real-quant weights via mtq.compress, and writes hf_quant_config.json.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add Alpamayo-1 example' directly and clearly describes the main change: adding a new example for Alpamayo-1 quantization including README and quantize.py script.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed PR adds only README and quantize.py example. No critical security anti-patterns found: no unsafe torch.load, numpy.load, trust_remote_code, eval/exec, nosec comments, or new PIP dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch rohjoshi/alpamayo-example

Comment @coderabbitai help to get the list of available commands and usage tips.

@rohansjoshi rohansjoshi requested a review from cjluo-nv June 1, 2026 22:50
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🧹 Nitpick comments (1)
examples/alpamayo/quantize.py (1)

655-681: ⚡ Quick win

Guard checkpoint writes to rank 0 for distributed-safe execution.

Current save/write operations are unconditional; in multi-rank runs, each rank may write the same files concurrently.

🛡️ Proposed fix
+    is_rank0 = (
+        not torch.distributed.is_available()
+        or not torch.distributed.is_initialized()
+        or torch.distributed.get_rank() == 0
+    )
@@
-    os.makedirs(args.output_dir, exist_ok=True)
-    print(f"Saving quantized checkpoint to {args.output_dir!r} ...")
+    if is_rank0:
+        os.makedirs(args.output_dir, exist_ok=True)
+        print(f"Saving quantized checkpoint to {args.output_dir!r} ...")
@@
-        with torch.inference_mode():
-            model.save_pretrained(args.output_dir)
-        processor.save_pretrained(args.output_dir)
-        model.config.save_pretrained(args.output_dir)
+        if is_rank0:
+            with torch.inference_mode():
+                model.save_pretrained(args.output_dir)
+            processor.save_pretrained(args.output_dir)
+            model.config.save_pretrained(args.output_dir)
@@
-        with torch.inference_mode():
-            model.save_pretrained(args.output_dir)
-
-        processor.save_pretrained(args.output_dir)
-        model.config.save_pretrained(args.output_dir)
-
-        quant_cfg = get_quant_config(model)
-        with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f:
-            json.dump(quant_cfg, f)
+        if is_rank0:
+            with torch.inference_mode():
+                model.save_pretrained(args.output_dir)
+            processor.save_pretrained(args.output_dir)
+            model.config.save_pretrained(args.output_dir)
+            quant_cfg = get_quant_config(model)
+            with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f:
+                json.dump(quant_cfg, f)
+
+    if torch.distributed.is_available() and torch.distributed.is_initialized():
+        torch.distributed.barrier()

As per coding guidelines "Develop with distributed processing in mind ... guarding shared side effects such as file writes ... against race conditions between ranks."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/alpamayo/quantize.py` around lines 655 - 681, The saves under both
the real_quant and else branches (mtq.compress + model.save_pretrained,
processor.save_pretrained, model.config.save_pretrained, and writing
hf_quant_config.json from get_quant_config) must be guarded so only the main
process/rank 0 performs file writes in distributed runs; detect distributed mode
via torch.distributed.is_initialized() and torch.distributed.get_rank() (or the
project’s existing is_main_process helper) and wrap all save_pretrained calls
and the json file write in a conditional that only runs on rank 0 to avoid
concurrent writes to args.output_dir.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/alpamayo/quantize.py`:
- Around line 346-355: The function read_clip_ids_from_parquet currently indexes
cols_lower["key"] which can KeyError; update it to search cols_lower for common
column names (e.g., "key", "clip_id", "clipid", "id", "filename") and use the
first match if present, otherwise fall back to using df.index.astype(str). After
obtaining the list, convert to strings and deduplicate while preserving first
occurrence order (use a seen set/ordered iteration); reference
read_clip_ids_from_parquet and the cols_lower variable to locate the change.
- Around line 27-55: Add an explicit __all__ list at module top to define the
public API; create a single-line assignment __all__ = [...] after the imports
that enumerates the module's public helpers and re-exports (e.g.,
"load_physical_aiavdataset", "AlpamayoR1", "to_special_token",
"create_forward_loop", "get_dataset_dataloader", "export_hf_checkpoint",
"get_quant_config", "mtq", "patch_pretrained_methods", "logger" — adjust to the
actual names this module exposes) and keep it updated when symbols change so
star-import behavior is explicit.
- Around line 500-512: The prints in the hot loop (inside forward_step's
tensor-stat print and loss_func) call .item() on tensors every iteration,
causing unwanted CUDA synchronizations; wrap these debug prints so they only run
when args.debug (and optionally only on the main rank) is true, and compute
.item() values only inside that conditional; update the debug logging in
forward_step (the v_pred/v_target print) and in loss_func (loss print) to be
gated by args.debug and a rank check (e.g., only on rank 0 or when not
distributed) so normal runs avoid per-batch .item() calls and GPU syncs.

---

Nitpick comments:
In `@examples/alpamayo/quantize.py`:
- Around line 655-681: The saves under both the real_quant and else branches
(mtq.compress + model.save_pretrained, processor.save_pretrained,
model.config.save_pretrained, and writing hf_quant_config.json from
get_quant_config) must be guarded so only the main process/rank 0 performs file
writes in distributed runs; detect distributed mode via
torch.distributed.is_initialized() and torch.distributed.get_rank() (or the
project’s existing is_main_process helper) and wrap all save_pretrained calls
and the json file write in a conditional that only runs on rank 0 to avoid
concurrent writes to args.output_dir.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 81978cc3-c66c-4b24-8be8-fe303abf7d0d

📥 Commits

Reviewing files that changed from the base of the PR and between 905259f and 6cc978e.

⛔ Files ignored due to path filters (1)
  • examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquet is excluded by !**/*.parquet
📒 Files selected for processing (2)
  • examples/alpamayo/README.md
  • examples/alpamayo/quantize.py

Comment on lines +27 to +55
import argparse
import collections.abc
import copy
import json
import logging
import os
from pathlib import Path
from typing import Any

import einops
import pandas as pd
import torch
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.models.token_utils import to_special_token
from tqdm import tqdm
from transformers import AutoProcessor, AutoTokenizer

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint
from modelopt.torch.export.quant_utils import get_quant_config
from modelopt.torch.opt.plugins.huggingface import (
_LIBRARY_CLASSES_FOR_PATCHING,
_PATCHED_CLASSES,
patch_pretrained_methods,
)
from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader

logger = logging.getLogger(__name__)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add explicit module exports with __all__.

This module introduces several public helpers but does not define __all__, which makes star-import behavior implicit.

♻️ Proposed fix
+__all__ = [
+    "create_message",
+    "get_processor",
+    "to_device",
+    "read_clip_ids_from_parquet",
+    "quantize_model",
+    "auto_quantize_model",
+    "main",
+]

As per coding guidelines "**/*.py: Define the public API with __all__ at the top of each Python module..."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import argparse
import collections.abc
import copy
import json
import logging
import os
from pathlib import Path
from typing import Any
import einops
import pandas as pd
import torch
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.models.token_utils import to_special_token
from tqdm import tqdm
from transformers import AutoProcessor, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint
from modelopt.torch.export.quant_utils import get_quant_config
from modelopt.torch.opt.plugins.huggingface import (
_LIBRARY_CLASSES_FOR_PATCHING,
_PATCHED_CLASSES,
patch_pretrained_methods,
)
from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader
logger = logging.getLogger(__name__)
import argparse
import collections.abc
import copy
import json
import logging
import os
from pathlib import Path
from typing import Any
import einops
import pandas as pd
import torch
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.models.token_utils import to_special_token
from tqdm import tqdm
from transformers import AutoProcessor, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint
from modelopt.torch.export.quant_utils import get_quant_config
from modelopt.torch.opt.plugins.huggingface import (
_LIBRARY_CLASSES_FOR_PATCHING,
_PATCHED_CLASSES,
patch_pretrained_methods,
)
from modelopt.torch.utils.dataset_utils import create_forward_loop, get_dataset_dataloader
__all__ = [
"create_message",
"get_processor",
"to_device",
"read_clip_ids_from_parquet",
"quantize_model",
"auto_quantize_model",
"main",
]
logger = logging.getLogger(__name__)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/alpamayo/quantize.py` around lines 27 - 55, Add an explicit __all__
list at module top to define the public API; create a single-line assignment
__all__ = [...] after the imports that enumerates the module's public helpers
and re-exports (e.g., "load_physical_aiavdataset", "AlpamayoR1",
"to_special_token", "create_forward_loop", "get_dataset_dataloader",
"export_hf_checkpoint", "get_quant_config", "mtq", "patch_pretrained_methods",
"logger" — adjust to the actual names this module exposes) and keep it updated
when symbols change so star-import behavior is explicit.

Comment on lines +346 to +355
def read_clip_ids_from_parquet(parquet_path: str) -> list[str]:
"""
Reads clip_ids from parquet. Tries common column names; falls back to index if needed.
Returns clip_ids as a list of strings (unique, preserving first occurrence order).
"""
parquet_path = str(parquet_path)
df = pd.read_parquet(parquet_path)
cols_lower = {c.lower(): c for c in df.columns}
clip_ids = df[cols_lower["key"]].astype(str).tolist()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

read_clip_ids_from_parquet can raise KeyError despite documented fallbacks.

Line 354 directly indexes cols_lower["key"], but the docstring says common names and index fallback are handled. This can hard-fail on valid parquet schemas.

🐛 Proposed fix
 def read_clip_ids_from_parquet(parquet_path: str) -> list[str]:
@@
     parquet_path = str(parquet_path)
     df = pd.read_parquet(parquet_path)
     cols_lower = {c.lower(): c for c in df.columns}
-    clip_ids = df[cols_lower["key"]].astype(str).tolist()
+    candidate_cols = ("key", "clip_id", "clipid", "id")
+    selected_col = next((cols_lower[c] for c in candidate_cols if c in cols_lower), None)
+    if selected_col is not None:
+        clip_ids = df[selected_col].astype(str).tolist()
+    else:
+        clip_ids = df.index.astype(str).tolist()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/alpamayo/quantize.py` around lines 346 - 355, The function
read_clip_ids_from_parquet currently indexes cols_lower["key"] which can
KeyError; update it to search cols_lower for common column names (e.g., "key",
"clip_id", "clipid", "id", "filename") and use the first match if present,
otherwise fall back to using df.index.astype(str). After obtaining the list,
convert to strings and deduplicate while preserving first occurrence order (use
a seen set/ordered iteration); reference read_clip_ids_from_parquet and the
cols_lower variable to locate the change.

Comment on lines +500 to +512
print(
f"[autoquant-fwd] v_pred: finite={torch.isfinite(v_pred).all().item()} "
f"min={v_pred.min().item():.4g} max={v_pred.max().item():.4g} "
f"abs_mean={v_pred.abs().mean().item():.4g} | "
f"v_target: finite={torch.isfinite(v_target).all().item()} "
f"min={v_target.min().item():.4g} max={v_target.max().item():.4g}"
)
return out

def loss_func(output, batch):
loss = torch.nn.functional.mse_loss(output["v_pred"], output["v_target"])
print(f"[autoquant-loss] loss={loss.item():.6g} finite={torch.isfinite(loss).item()}")
return loss
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
ls -la
fd -t f "examples/alpamayo/quantize.py" . || true
sed -n '450,560p' examples/alpamayo/quantize.py
rg -n "args\.debug|logger\.|print_rank_0|warn_rank_0|autoquant-fwd|\[autoquant-loss\]" examples/alpamayo/quantize.py
rg -n "def forward_step|def loss_func" examples/alpamayo/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 7167


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '480,530p' examples/alpamayo/quantize.py
sed -n '530,590p' examples/alpamayo/quantize.py
rg -n "autoquant-fwd|autoquant-loss|print\(|logger\.|print_rank_0|warn_rank_0|args\.debug" examples/alpamayo/quantize.py
rg -n "def forward_step|def loss_func" examples/alpamayo/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 5442


🏁 Script executed:

sed -n '1,120p' examples/alpamayo/quantize.py
rg -n "print_rank_0|warn_rank_0" examples/alpamayo/quantize.py || true
rg -n "^logger\\b|logger\\s*=|getLogger" examples/alpamayo/quantize.py || true
sed -n '380,520p' examples/alpamayo/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 10148


Unconditional .item()-based tensor stats in AutoQuantize hot loop trigger CUDA syncs (even when args.debug is false).

forward_step() and loss_func() print every iteration and call .item() on torch.isfinite(...).all(), min()/max()/mean(), and loss, forcing per-batch CPU-GPU synchronization. Gate these prints behind args.debug (and use rank-aware logging if distributed).

⚡ Proposed fix
     def forward_step(runtime_model, data):
         with torch.autocast("cuda", dtype=torch.bfloat16):
             out = runtime_model.teacher_forced_flow_loss_forward(data=data)
-        v_pred, v_target = out["v_pred"], out["v_target"]
-        print(
-            f"[autoquant-fwd] v_pred: finite={torch.isfinite(v_pred).all().item()} "
-            f"min={v_pred.min().item():.4g} max={v_pred.max().item():.4g} "
-            f"abs_mean={v_pred.abs().mean().item():.4g} | "
-            f"v_target: finite={torch.isfinite(v_target).all().item()} "
-            f"min={v_target.min().item():.4g} max={v_target.max().item():.4g}"
-        )
+        if args.debug:
+            v_pred, v_target = out["v_pred"], out["v_target"]
+            if not bool(torch.isfinite(v_pred).all() and torch.isfinite(v_target).all()):
+                logger.warning("[autoquant-fwd] Non-finite tensor detected.")
         return out
 
     def loss_func(output, batch):
         loss = torch.nn.functional.mse_loss(output["v_pred"], output["v_target"])
-        print(f"[autoquant-loss] loss={loss.item():.6g} finite={torch.isfinite(loss).item()}")
+        if args.debug and not bool(torch.isfinite(loss)):
+            logger.warning("[autoquant-loss] Non-finite loss detected.")
         return loss
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/alpamayo/quantize.py` around lines 500 - 512, The prints in the hot
loop (inside forward_step's tensor-stat print and loss_func) call .item() on
tensors every iteration, causing unwanted CUDA synchronizations; wrap these
debug prints so they only run when args.debug (and optionally only on the main
rank) is true, and compute .item() values only inside that conditional; update
the debug logging in forward_step (the v_pred/v_target print) and in loss_func
(loss print) to be gated by args.debug and a rank check (e.g., only on rank 0 or
when not distributed) so normal runs avoid per-batch .item() calls and GPU
syncs.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.60%. Comparing base (f0d2237) to head (6cc978e).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1594      +/-   ##
==========================================
+ Coverage   73.23%   73.60%   +0.36%     
==========================================
  Files         479      479              
  Lines       52435    52435              
==========================================
+ Hits        38401    38595     +194     
+ Misses      14034    13840     -194     
Flag Coverage Δ
examples 41.67% <ø> (+0.85%) ⬆️
unit 53.62% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@cjluo-nv cjluo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

New ~700-line example. A few real concerns:

  1. Likely bug — backslash-escaped special tokens. In create_message, the f-strings use <\|traj_history_start\|>, <\|traj_history\|>, <\|traj_history_end\|>, <\|cot_start\|>. In Python \| is not a recognized escape sequence, so the literal string contains a backslash before each pipe. Qwen-family special tokens are <|...|> without backslashes — passing the backslash form to apply_chat_template will not match the tokenizer's added special tokens and these will tokenize as ordinary text, silently corrupting calibration. Looks like a copy-paste from a markdown-escaped source. (Bonus: these will start emitting SyntaxWarning: invalid escape sequence on newer Python.)

  2. Reaches into private API instead of using the public one. enable_huggingface_checkpointing_patch() re-implements mto.enable_huggingface_checkpointing() by importing the private _LIBRARY_CLASSES_FOR_PATCHING, _PATCHED_CLASSES, and patch_pretrained_methods from modelopt.torch.opt.plugins.huggingface, just to skip the _from_config patch. If the only goal is to skip _from_config, please make that an option on the public API (or document why this divergence is needed) instead of duplicating the loop with private symbols. Also, the file does this at import time as a top-level side effect, which is surprising.

  3. Monkey-patching AlpamayoR1.teacher_forced_flow_loss_forward at module import time is acknowledged in a docstring but it's a strong coupling: the body is a verbatim copy of an internal training-fork method, and any drift in nvlabs/alpamayo (e.g. action_in_proj signature, expert_non_causal_attention, action_space API) will silently produce wrong calibration. At minimum: assert that the public AlpamayoR1 exposes the methods/attrs this body relies on, and pin a known-good Alpamayo SHA in the README setup section instead of git clone of main.

  4. read_clip_ids_from_parquet doesn't match its docstring. The docstring says it "tries common column names; falls back to index if needed", but the implementation hard-codes cols_lower["key"] and will KeyError on any parquet without a literal key column. Either implement the documented fallback or simplify the docstring.

  5. Binary parquet checked into the repo. A .parquet calibration set is being added to examples/alpamayo/. The repo doesn't appear to use git-LFS (.gitattributes is missing). Even if this file is small, distributing calibration data through the example tree is unusual for this repo — most examples download from HF on demand. Please confirm the file size is small and that there are no licensing/redistribution restrictions on the AV calibration clips list, or fetch it at runtime instead.

  6. No tests / no CHANGELOG entry. The PR template's test and changelog checkboxes are still unfilled. New examples in this repo (vlm_ptq, etc.) typically don't get unit tests, but at least a CHANGELOG entry under "New Features" is expected.

  7. Minor: args.parquet default is the bare filename, joined to script_dir; if the user passes an absolute path it works (pathlib handles that), but a plain relative path in the user's CWD will be silently resolved against the script's directory, which is surprising. Consider documenting this or using Path(args.parquet) as-is when absolute/exists, else fall back to script_dir / args.parquet.

{
"role": "system",
"content": [
{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot comment.

These tokens contain literal backslashes (<\|traj_history_start\|>, etc.) — \| is not an escape sequence in Python, so the string literal keeps the backslash. Qwen-family special tokens are <|traj_history_start|> (no backslash); passing the backslash form to apply_chat_template will tokenize them as ordinary text and silently break calibration. Please drop the backslashes (and on 3.12+ this also avoids the SyntaxWarning: invalid escape sequence).

"""Recursively cast data into the specified device, dtype."""
if isinstance(data, torch.Tensor):
data = data.to(
device=device,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot comment.

This duplicates mto.enable_huggingface_checkpointing() and pulls in three private symbols (_LIBRARY_CLASSES_FOR_PATCHING, _PATCHED_CLASSES, patch_pretrained_methods) just to skip the _from_config patch. Please use the public API, or — if the _from_config patch genuinely breaks AlpamayoR1 — add a public option to skip it rather than reaching into module internals from an example. Also, calling this at top-level import is a surprising side effect; prefer calling it from main().



def patch_teacher_forced_flow_loss_forward() -> None:
"""Attach teacher_forced_flow_loss_forward to AlpamayoR1 if missing.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot comment.

Module-level monkey-patching of an upstream class is fragile: the body assumes specific attrs/methods on AlpamayoR1 (action_in_proj signature, expert_non_causal_attention, fuse_traj_tokens, action_space.traj_to_action, action_space.get_action_space_dims, vlm.model.rope_deltas, etc.). If the public nvlabs/alpamayo drifts, this will silently produce wrong gradients/loss for AutoQuantize. Please (a) pin an Alpamayo SHA in the README setup, and (b) add explicit hasattr assertions before patching so a drift fails loudly.

max_generation_length: int,
calibration_traj_samples: int,
device: str,
):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot comment.

Docstring says "Tries common column names; falls back to index if needed", but the code only handles a literal key column and will KeyError on anything else. Either implement the fallback (e.g. try clip_id/clipid/index) or rewrite the docstring to match the actual behavior.

from modelopt.torch.export.quant_utils import get_quant_config
from modelopt.torch.opt.plugins.huggingface import (
_LIBRARY_CLASSES_FOR_PATCHING,
_PATCHED_CLASSES,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot comment.

assert torch.ops.tensorrt.quantize_op.default inside a try/except that only logs a warning on failure: this won't actually stop a broken install, and the assert form is also unusual here. Consider a plain attribute check + clearer error message, or remove this block (mtq.quantize will fail loudly later anyway).

return_tensors="pt",
)
model_inputs = {
"tokenized_data": inputs,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot comment.

disabled_layers here uses fnmatch wildcards (per auto_quantize docs), but the literal module names appended below (_name) are not wildcards. The corresponding exclusion in quantize_model uses f"{_name}.*" — please be consistent. If you want exact-name matching here, prepend/append * or document the difference; otherwise an unwrapped name like vlm.foo.bar may not match the way you expect against quantizer names like vlm.foo.bar.weight_quantizer.


```bash
git clone https://github.com/nvlabs/alpamayo
pip install ./alpamayo
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot comment.

Setup says git clone https://github.com/nvlabs/alpamayo without pinning a commit. Given that this example monkey-patches an internal method onto AlpamayoR1, please pin a known-good SHA so future upstream changes don't silently break calibration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants