
Commit f69fc7a

Authored by jrausch
Add support and documentation for AnyModel checkpoints with Nemo evaluator (#894)
This PR adds NeMo Evaluator support to the AnyModel branch. It includes documentation and a deployment script that allow for evaluation of AnyModel Puzzletron checkpoints with NeMo Evaluator. We assume development on a GPU node, following the current tutorial style, so we don't rely on Slurm-based deployment/evaluation, but instead use direct evaluation via `eval-factory run_eval`.

Signed-off-by: jrausch <jrausch@nvidia.com>
1 parent d918c59 commit f69fc7a

4 files changed

Lines changed: 758 additions & 9 deletions


.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -109,6 +109,7 @@ repos:
 examples/speculative_decoding/main.py|
 examples/speculative_decoding/medusa_utils.py|
 examples/speculative_decoding/server_generate.py|
+examples/puzzletron/evaluation/hf_deployable_anymodel\.py|
 modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_.*\.py|
 )$
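This hunk adds the new AnyModel deployable to what looks like a per-hook exclude pattern. As an optional local check (assuming pre-commit is installed in your environment), you can run the hooks against just that file and confirm it is treated as intended:

```bash
# Optional: run the configured hooks against only the newly excluded file
pre-commit run --files examples/puzzletron/evaluation/hf_deployable_anymodel.py
```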

examples/puzzletron/README.md

Lines changed: 30 additions & 8 deletions
@@ -15,11 +15,11 @@ In this example, we compress the [Llama-3.1-8B-Instruct](https://huggingface.co/
 
 ## Environment
 
-- Install Model-Optimizer in editable mode with the corresponding dependencies:
+- Install Model-Optimizer in editable mode with the corresponding dependencies (run from the repo root):
 
 ```bash
 pip install -e .[hf,puzzletron]
-pip install -r requirements.txt
+pip install -r examples/puzzletron/requirements.txt
 ```
 
 - For this example we are using 2x NVIDIA H100 80GB HBM3 to show multi-GPU steps. You can also use a single GPU.
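As a quick aside before the evaluation changes below: after the editable install above, a lightweight sanity check that the package resolved might look like this (the `nvidia-modelopt` distribution name is an assumption; adjust if your checkout publishes under a different name):

```bash
# Optional sanity check that the editable install is importable
pip show nvidia-modelopt
python -c "import modelopt; print(modelopt.__file__)"
```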
@@ -229,16 +229,38 @@ The plot shows how token accuracy changes with different compression rates. High
 
 ## Evaluation
 
-Once the model is ready, you can evaluate it using [Language Model Evaluation Harness](https://pypi.org/project/lm-eval/). For example, run the following to evaluate the model on [Massive Multitask Language Understanding](https://huggingface.co/datasets/cais/mmlu) benchmark.
+Evaluate AnyModel checkpoints by deploying a local OpenAI-compatible completions endpoint and running benchmarks against it.
+
+**1. Deploy the model (2 GPUs example):**
+
+```bash
+# Install the AnyModel-patched deployable (first time only: backs up the original)
+# /opt/Export-Deploy is the default path in NeMo containers — adjust if needed
+cp /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py.bak
+cp examples/puzzletron/evaluation/hf_deployable_anymodel.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py
+
+# Start the server (blocks while running — use a separate terminal)
+ray start --head --num-gpus 2 --port 6379 --disable-usage-stats
+python /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_hf.py \
+    --model_path path/to/checkpoint \
+    --model_id anymodel-hf \
+    --num_gpus 2 --num_gpus_per_replica 2 --num_cpus_per_replica 16 \
+    --trust_remote_code --port 8083 --device_map "auto" --cuda_visible_devices "0,1"
+```
+
+**2. Run MMLU:**
 
 ```bash
-lm_eval --model hf \
-    --model_args pretrained=path/to/model,dtype=bfloat16,trust_remote_code=true,parallelize=True \
-    --tasks mmlu \
-    --num_fewshot 5 \
-    --batch_size 4
+eval-factory run_eval \
+    --eval_type mmlu \
+    --model_id anymodel-hf \
+    --model_type completions \
+    --model_url http://0.0.0.0:8083/v1/completions/ \
+    --output_dir examples/puzzletron/evals/mmlu_anymodel
 ```
 
+For a quick debug run, add `--overrides "config.params.limit_samples=5"`.
+
 ## Inference Performance Benchmarking
 
 Now let's evaluate how much speedup we get with the compressed model in terms of throughput and latency.
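Before launching a full MMLU run against the server, it can be worth confirming that the endpoint answers at all. A minimal smoke test, assuming the deployment exposes the standard OpenAI-style completions schema at the URL used above (response fields may vary by Export-Deploy version):

```bash
# Hypothetical smoke test against the locally deployed completions endpoint
curl -s http://0.0.0.0:8083/v1/completions/ \
  -H "Content-Type: application/json" \
  -d '{"model": "anymodel-hf", "prompt": "The capital of France is", "max_tokens": 8}'
```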

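Because the deployment step overwrites `hf_deployable.py` inside the Export-Deploy install, a possible clean-up after evaluation (paths as used in the README diff above) is:

```bash
# Restore the stock deployable from the backup created earlier and stop the local Ray head
cp /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py.bak /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py
ray stop
```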