**Recommended approach:** Use lm-eval for direct evaluation without a deployment server. See the main [README](../README.md#evaluation) for details.

Evaluate AnyModel checkpoints by deploying a local OpenAI-compatible completions endpoint and running benchmarks against it.

**Prerequisites:**

- NeMo container (e.g. `nemo:25.11`) with NeMo Evaluator and NeMo Export-Deploy
- Ray (`pip install -r examples/puzzletron/requirements.txt`)

**1. Deploy the model (2-GPU example):**
```bash
# Install the AnyModel-patched deployable (first time only: backs up the original)
# ...

python /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_hf.py \
    ... \
    --trust_remote_code --port 8083 --device_map "auto" --cuda_visible_devices "0,1"
```
Adjust GPU counts and `cuda_visible_devices` to match your node.
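Once the server reports ready, it is worth a quick sanity check before launching a full benchmark. Below is a minimal smoke test against the completions endpoint, assuming the model is registered under the ID `anymodel-hf` (the `--model_id` used in step 2):

```bash
# Ask the local OpenAI-compatible endpoint for a few tokens.
# The model ID is an assumption; use whatever ID the deployment actually registers.
curl -s http://0.0.0.0:8083/v1/completions/ \
    -H "Content-Type: application/json" \
    -d '{"model": "anymodel-hf", "prompt": "The capital of France is", "max_tokens": 8}'
```

A JSON reply containing a `choices` field indicates the endpoint is serving.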
**2. Run MMLU:**
```bash
eval-factory run_eval \
    --eval_type mmlu \
    --model_id anymodel-hf \
    --model_type completions \
    --model_url http://0.0.0.0:8083/v1/completions/ \
    --output_dir examples/puzzletron/evals/mmlu_anymodel
```
For a quick debug run, add `--overrides "config.params.limit_samples=5"`.
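Results land under `--output_dir`; the exact file layout depends on the eval-factory version, so a simple way to see what a run produced is to list it:

```bash
# List everything the run wrote; exact layout varies by harness version.
find examples/puzzletron/evals/mmlu_anymodel -type f | sort
```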