
Commit 32d69ab: Simplify puzzletron eval readme

Signed-off-by: jrausch <jrausch@nvidia.com>
Parent: 26f37ae

2 files changed: 20 additions & 52 deletions

examples/puzzletron/README.md (19 additions & 52 deletions):

````diff
@@ -199,70 +199,37 @@ block_14: attention no_op ffn intermediate_3072
 
 ## Evaluation
 
-### Local Evaluation with NeMo Evaluator (AnyModel)
+Evaluate AnyModel checkpoints by deploying a local OpenAI-compatible completions endpoint and running benchmarks against it.
 
-AnyModel checkpoints are currently supported via the patched NeMo Evaluator deployable
-in [`examples/puzzletron/evaluation/`](./examples/puzzletron/evaluation/). This deploys a local OpenAI-style completions endpoint that evaluation can be run against.
-
-> **Note:** This flow requires Ray. If it is missing, install it in the container/venv:
->
-> ```bash
-> pip install ray
-> ```
->
-**Deploy the model locally on an interactive node (2 GPUs example):**
+**1. Deploy the model (2 GPUs example):**
 
 ```bash
-# Repo root (not puzzle_dir)
-export MODELOPT_WORKDIR=/path/to/Model-Optimizer
-export NEMO_EXPORT_DEPLOY_DIR=/opt/Export-Deploy # When using a NeMo container, this is where Export-Deploy is located. Adjust if needed.
-
-# Example 1: teacher checkpoint
-export CHECKPOINT_PATH=/path/to/ckpts/teacher
-
-# Example 2: pruned checkpoint (solution_0), for example:
-export CHECKPOINT_PATH=/path/to/ckpts/ffn_8704_attn_no_op
-
-# First time only: back up the original deployable
-cp $NEMO_EXPORT_DEPLOY_DIR/nemo_deploy/llm/hf_deployable.py $NEMO_EXPORT_DEPLOY_DIR/nemo_deploy/llm/hf_deployable.py.bak
+# Install the AnyModel-patched deployable (first time only: backs up the original)
+# /opt/Export-Deploy is the default path in NeMo containers — adjust if needed
+cp /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py.bak
+cp examples/puzzletron/evaluation/hf_deployable_anymodel.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py
 
-cp examples/puzzletron/evaluation/hf_deployable_anymodel.py $NEMO_EXPORT_DEPLOY_DIR/nemo_deploy/llm/hf_deployable.py
+# Start the server (blocks while running — use a separate terminal)
 ray start --head --num-gpus 2 --port 6379 --disable-usage-stats
-
-# Run in a separate terminal (blocks while server is up)
-python $NEMO_EXPORT_DEPLOY_DIR/scripts/deploy/nlp/deploy_ray_hf.py \
-    --model_path $CHECKPOINT_PATH \
-    --model_id anymodel-hf \
-    --num_replicas 1 \
-    --num_gpus 2 \
-    --num_gpus_per_replica 2 \
-    --num_cpus_per_replica 16 \
-    --trust_remote_code \
-    --port 8083 \
-    --device_map "auto" \
-    --cuda_visible_devices "0,1"
+python /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_hf.py \
+    --model_path path/to/checkpoint \
+    --model_id anymodel-hf \
+    --num_gpus 2 --num_gpus_per_replica 2 --num_cpus_per_replica 16 \
+    --trust_remote_code --port 8083 --device_map "auto" --cuda_visible_devices "0,1"
 ```
 
-Note: `deploy_ray_hf.py` runs a long-lived server. Keep it running in another terminal
-or background it (e.g., tmux) while you run eval. Adjust GPU counts and `cuda_visible_devices` to
-match your node.
-
-**Run MMLU (full run on the interactive node):**
+**2. Run MMLU:**
 
 ```bash
 eval-factory run_eval \
-    --eval_type mmlu \
-    --model_id anymodel-hf \
-    --model_type completions \
-    --model_url http://0.0.0.0:8083/v1/completions/ \
-    --output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
-    --overrides "config.params.parallelism=2,config.params.task=mmlu,config.params.extra.tokenizer=$CHECKPOINT_PATH,config.params.extra.tokenizer_backend=huggingface,config.params.request_timeout=6000"
+    --eval_type mmlu \
+    --model_id anymodel-hf \
+    --model_type completions \
+    --model_url http://0.0.0.0:8083/v1/completions/ \
+    --output_dir examples/puzzletron/evals/mmlu_anymodel
 ```
 
-Note: For a quick debug run, add `,config.params.limit_samples=5` to the `--overrides` list.
-
-Results can be viewed in the generated `results.yml` file.
+For a quick debug run, add `--overrides "config.params.limit_samples=5"`.
 
 ## Inference Performance Benchmarking
````
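The two `cp` commands in the new deploy step overwrite the `.bak` file if run twice, losing the pristine original. A minimal Python sketch of the same install step made idempotent; the function name is hypothetical, and the paths in the usage note mirror the README's:

```python
import shutil
from pathlib import Path


def install_patched_deployable(patched: Path, target: Path) -> None:
    """Copy the AnyModel-patched deployable over the original.

    Mirrors the two `cp` commands in the README, but backs up the
    original only once, so re-running never clobbers the first backup.
    """
    backup = Path(str(target) + ".bak")
    if not backup.exists():  # first run only: preserve the pristine original
        shutil.copy2(target, backup)
    shutil.copy2(patched, target)
```

For example, `install_patched_deployable(Path("examples/puzzletron/evaluation/hf_deployable_anymodel.py"), Path("/opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py"))` reproduces the README's copy step.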
Second changed file (1 addition & 0 deletions):

```diff
@@ -1,2 +1,3 @@
 lm-eval==0.4.10
 math-verify
+ray
```
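Once `deploy_ray_hf.py` is up, it serves the OpenAI-style completions endpoint that `eval-factory` queries. A minimal sketch of a hand-rolled request against it, assuming the endpoint accepts the standard OpenAI completions fields (`model`, `prompt`, `max_tokens`); `build_completions_request` and `query_endpoint` are hypothetical helpers, not part of the repo:

```python
import json
import urllib.request


def build_completions_request(prompt: str, model_id: str = "anymodel-hf",
                              max_tokens: int = 32) -> bytes:
    """Build the JSON body for an OpenAI-style /v1/completions call.

    Field names follow the OpenAI completions schema; the locally
    deployed server is assumed to accept the same shape.
    """
    payload = {"model": model_id, "prompt": prompt, "max_tokens": max_tokens}
    return json.dumps(payload).encode("utf-8")


def query_endpoint(prompt: str,
                   url: str = "http://0.0.0.0:8083/v1/completions/") -> dict:
    """POST one completion request to the locally deployed server."""
    req = urllib.request.Request(
        url,
        data=build_completions_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

`query_endpoint` only succeeds while the deploy script from the README is running on the same node; `build_completions_request` works standalone.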
