Commit 7b88e66

Simplify readme
Signed-off-by: jrausch <jrausch@nvidia.com>
1 parent: 0190f75

2 files changed: 29 additions & 52 deletions

examples/puzzletron/README.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -229,10 +229,7 @@ The plot shows how token accuracy changes with different compression rates. High
 
 ## Evaluation
 
-Evaluate AnyModel checkpoints using [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) directly — no deployment server or Ray needed. The wrapper script handles the heterogeneous layer loading automatically.
-
-> **Note:** NeMo containers ship `nvidia_lm_eval`, an NVIDIA fork that occupies the same
-> `lm_eval` namespace. If installed, uninstall it first: `pip uninstall nvidia-lm-eval -y`
+Evaluate AnyModel checkpoints using [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) directly — no deployment server needed.
 
 ```bash
 python examples/puzzletron/evaluation/lm_eval_anymodel.py \
@@ -245,6 +242,9 @@ python examples/puzzletron/evaluation/lm_eval_anymodel.py \
 
 For a quick smoke test, add `--limit 10`. All standard [lm-eval flags](https://github.com/EleutherAI/lm-evaluation-harness?tab=readme-ov-file#basic-usage) are supported.
 
+> **Note:** NeMo containers may ship `nvidia-lm-eval` which conflicts with upstream `lm-eval`.
+> If so, run `pip uninstall nvidia-lm-eval -y` before installing requirements.
+
 > **Alternative:** For server-based evaluation via an OpenAI-compatible endpoint,
 > see [evaluation/nemo_evaluator_instructions.md](./evaluation/nemo_evaluator_instructions.md).
 
````
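The README's example command is cut off at the hunk boundary above. For orientation, a minimal sketch of a full run, assuming the wrapper accepts upstream lm-eval's standard flags as the new text states (the checkpoint path, task list, and batch size below are illustrative placeholders; how the script actually receives the checkpoint is not shown in this diff):

```bash
# Hedged sketch, not the README's literal command.
# --model_args follows upstream lm-eval convention; the wrapper's real
# checkpoint flag may differ. --limit 10 is the smoke-test flag the README names.
python examples/puzzletron/evaluation/lm_eval_anymodel.py \
  --model_args pretrained=path/to/checkpoint,trust_remote_code=True \
  --tasks mmlu \
  --batch_size 8 \
  --limit 10
```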

examples/puzzletron/evaluation/nemo_evaluator_instructions.md

Lines changed: 25 additions & 48 deletions
````diff
@@ -3,67 +3,44 @@
 > **Recommended approach:** Use lm-eval for direct evaluation without a
 > deployment server. See the main [README](../README.md#evaluation) for details.
 
-This document describes an alternative evaluation flow using NeMo Evaluator
-via `eval-factory`. It deploys the checkpoint as a local OpenAI-style completions
-endpoint and runs evaluation against it.
+This document describes an alternative evaluation flow using NeMo Evaluator.
+It deploys the checkpoint as a local OpenAI-compatible completions endpoint
+and runs evaluation against it.
 
 ## Prerequisites
 
-- NeMo container (e.g. `nemo:25.11`) NeMo Evaluator and NeMo Export-Deploy
-- The AnyModel deploy patch: `examples/puzzletron/evaluation/hf_deployable_anymodel.py`
+- NeMo container (e.g. `nemo:25.11`) with NeMo Evaluator and NeMo Export-Deploy
+- Ray (`pip install -r examples/puzzletron/requirements.txt`)
 
-## Deploy the Model Locally (example: interactive node, 2 GPUs)
+## 1. Deploy the model (2 GPUs example)
 
 ```bash
-# Repo root (not puzzle_dir)
-export MODELOPT_WORKDIR=/path/to/Model-Optimizer
-export NEMO_EXPORT_DEPLOY_DIR=/opt/Export-Deploy  # NeMo container default; adjust if needed
-
-# Choose a checkpoint
-export CHECKPOINT_PATH=/path/to/ckpts/teacher
-# or a pruned checkpoint:
-# export CHECKPOINT_PATH=/path/to/ckpts/ffn_8704_attn_no_op
-
-# First time only: back up the original deployable
-cp $NEMO_EXPORT_DEPLOY_DIR/nemo_deploy/llm/hf_deployable.py \
-  $NEMO_EXPORT_DEPLOY_DIR/nemo_deploy/llm/hf_deployable.py.bak
-
-# Patch the deployable for AnyModel support
-cp examples/puzzletron/evaluation/hf_deployable_anymodel.py \
-  $NEMO_EXPORT_DEPLOY_DIR/nemo_deploy/llm/hf_deployable.py
+# Install the AnyModel-patched deployable (first time only: backs up the original)
+# /opt/Export-Deploy is the default path in NeMo containers — adjust if needed
+cp /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py.bak
+cp examples/puzzletron/evaluation/hf_deployable_anymodel.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py
 
+# Start the server (blocks while running — use a separate terminal)
 ray start --head --num-gpus 2 --port 6379 --disable-usage-stats
-
-# Run in a separate terminal (blocks while server is up)
-python $NEMO_EXPORT_DEPLOY_DIR/scripts/deploy/nlp/deploy_ray_hf.py \
-  --model_path $CHECKPOINT_PATH \
-  --model_id anymodel-hf \
-  --num_replicas 1 \
-  --num_gpus 2 \
-  --num_gpus_per_replica 2 \
-  --num_cpus_per_replica 16 \
-  --trust_remote_code \
-  --port 8083 \
-  --device_map "auto" \
-  --cuda_visible_devices "0,1"
+python /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_hf.py \
+  --model_path path/to/checkpoint \
+  --model_id anymodel-hf \
+  --num_gpus 2 --num_gpus_per_replica 2 --num_cpus_per_replica 16 \
+  --trust_remote_code --port 8083 --device_map "auto" --cuda_visible_devices "0,1"
 ```
 
-`deploy_ray_hf.py` runs a long-lived server. Keep it running in another terminal
-or background it (e.g., tmux) while you run eval. Adjust GPU counts and
-`cuda_visible_devices` to match your node.
+Adjust GPU counts and `cuda_visible_devices` to match your node.
 
-## Run MMLU
+## 2. Run MMLU
 
 ```bash
 eval-factory run_eval \
-  --eval_type mmlu \
-  --model_id anymodel-hf \
-  --model_type completions \
-  --model_url http://0.0.0.0:8083/v1/completions/ \
-  --output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
-  --overrides "config.params.parallelism=2,config.params.task=mmlu,config.params.extra.tokenizer=$CHECKPOINT_PATH,config.params.extra.tokenizer_backend=huggingface,config.params.request_timeout=6000"
+  --eval_type mmlu \
+  --model_id anymodel-hf \
+  --model_type completions \
+  --model_url http://0.0.0.0:8083/v1/completions/ \
+  --output_dir examples/puzzletron/evals/mmlu_anymodel \
+  --overrides "config.params.task=mmlu,config.params.extra.tokenizer=path/to/checkpoint,config.params.extra.tokenizer_backend=huggingface"
 ```
 
-For a quick debug run, add `,config.params.limit_samples=5` to the `--overrides` list.
-
-Results can be viewed in the generated `results.yml` file.
+For a quick debug run, add `,config.params.limit_samples=5` to `--overrides`.
````
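Before launching `eval-factory` against the server from step 1, the endpoint can be probed directly, since `deploy_ray_hf.py` exposes an OpenAI-compatible completions route. A minimal smoke test, assuming the standard completions payload shape (the prompt and `max_tokens` values are placeholders; the URL and model id are taken from the instructions above):

```bash
# Probe the local endpoint started by deploy_ray_hf.py (step 1).
# Payload follows the OpenAI completions convention; values are illustrative.
curl -s http://0.0.0.0:8083/v1/completions/ \
  -H "Content-Type: application/json" \
  -d '{"model": "anymodel-hf", "prompt": "The capital of France is", "max_tokens": 8}'
```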

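Per the text removed in this commit, `eval-factory` writes its scores to a `results.yml` file under `--output_dir`. A short inspection-and-cleanup sketch (the output path matches the `--output_dir` used in step 2; `ray stop` is the standard Ray CLI command for tearing down the local cluster):

```bash
# Inspect the scores written by eval-factory (path from --output_dir above).
cat examples/puzzletron/evals/mmlu_anymodel/results.yml
# Tear down the Ray cluster once evaluation is finished.
ray stop
```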