> **Recommended approach:** Use lm-eval for direct evaluation without a
> deployment server. See the main [README](../README.md#evaluation) for details.

This document describes an alternative evaluation flow using NeMo Evaluator.
It deploys the checkpoint as a local OpenAI-compatible completions endpoint
and runs evaluation against it.

## Prerequisites

- NeMo container (e.g. `nemo:25.11`) with NeMo Evaluator and NeMo Export-Deploy
- Ray (`pip install -r examples/puzzletron/requirements.txt`)

## 1. Deploy the model (2 GPUs example)

```bash
# Install the AnyModel-patched deployable (first time only: backs up the original)
# /opt/Export-Deploy is the default path in NeMo containers; adjust if needed
cp /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py.bak
cp examples/puzzletron/evaluation/hf_deployable_anymodel.py /opt/Export-Deploy/nemo_deploy/llm/hf_deployable.py

# Start the server (blocks while running; use a separate terminal)
ray start --head --num-gpus 2 --port 6379 --disable-usage-stats
python /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_hf.py \
    --model_path path/to/checkpoint \
    --model_id anymodel-hf \
    --num_gpus 2 --num_gpus_per_replica 2 --num_cpus_per_replica 16 \
    --trust_remote_code --port 8083 --device_map "auto" --cuda_visible_devices "0,1"
```

Adjust GPU counts and `cuda_visible_devices` to match your node.
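
Before launching the eval, it can help to confirm the endpoint answers. Below is a minimal smoke-test sketch; it assumes the server above is listening on port 8083 and was registered under `--model_id anymodel-hf`, and the prompt and sampling values are arbitrary:

```python
"""Smoke-test the deployed completions endpoint (sketch).

Assumes the deploy step above is running on http://0.0.0.0:8083 with
--model_id anymodel-hf; endpoint path and payload follow the OpenAI-style
completions convention the server exposes.
"""
import json
import os
import urllib.request

URL = "http://0.0.0.0:8083/v1/completions/"

# Minimal completions payload; "model" must match the --model_id used at deploy.
payload = {
    "model": "anymodel-hf",
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "temperature": 0.0,
}


def smoke_test(url: str = URL) -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Only hit the network when a server is actually up.
    if os.environ.get("RUN_SMOKE_TEST"):
        print(smoke_test())
    else:
        print(json.dumps(payload))
```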

## 2. Run MMLU

```bash
eval-factory run_eval \
    --eval_type mmlu \
    --model_id anymodel-hf \
    --model_type completions \
    --model_url http://0.0.0.0:8083/v1/completions/ \
    --output_dir examples/puzzletron/evals/mmlu_anymodel \
    --overrides "config.params.task=mmlu,config.params.extra.tokenizer=path/to/checkpoint,config.params.extra.tokenizer_backend=huggingface"
```

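When the run finishes, scores land under `--output_dir` (a `results.yml` file in this setup). A small sketch to locate the result files; `find_results` is a hypothetical helper, not part of eval-factory, and the exact directory layout may vary by version:

```python
"""Locate and print eval-factory result files (illustrative sketch).

find_results is a hypothetical helper; the layout under --output_dir
depends on the eval-factory version, so this just walks the tree.
"""
from pathlib import Path


def find_results(output_dir: str) -> list:
    """Return all results.yml files under output_dir, recursively."""
    return sorted(Path(output_dir).rglob("results.yml"))


if __name__ == "__main__":
    for path in find_results("examples/puzzletron/evals/mmlu_anymodel"):
        print(f"== {path} ==")
        print(path.read_text())
```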
For a quick debug run, add `,config.params.limit_samples=5` to `--overrides`.
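
The `--overrides` string is a comma-separated list of dotted `key=value` pairs that address nested config fields. As an illustration of that mapping, here is a simplified re-implementation — not eval-factory's actual parser, which may handle quoting and type coercion differently:

```python
"""Illustrative parser for a comma-separated overrides string.

Simplified sketch of how dotted key=value pairs map onto a nested
config; NOT eval-factory's actual implementation.
"""


def parse_overrides(s: str) -> dict:
    """Turn "a.b=1,a.c=x" into {"a": {"b": "1", "c": "x"}}."""
    config: dict = {}
    for pair in s.split(","):
        dotted, _, value = pair.partition("=")
        node = config
        *parents, leaf = dotted.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


if __name__ == "__main__":
    cfg = parse_overrides("config.params.task=mmlu,config.params.limit_samples=5")
    print(cfg["config"]["params"])  # {'task': 'mmlu', 'limit_samples': '5'}
```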