Add support and documentation for AnyModel checkpoints with NeMo Evaluator (#894)
This PR adds NeMo Evaluator support to the AnyModel branch. It includes
documentation and a deployment script that allow evaluation of
AnyModel Puzzletron checkpoints with NeMo Evaluator.
We assume development on a GPU node, following the current tutorial
style, so we don't rely on Slurm-based deployment/evaluation but
instead run evaluations directly via `eval-factory run_eval`.
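
For reference, a representative `eval-factory run_eval` invocation against a locally deployed completions endpoint might look like the sketch below. The task name, model ID, port, and output directory are placeholders rather than values from this PR, and the available `--eval_type` values depend on which NeMo Evaluator task packages are installed:

```bash
# Hypothetical example: evaluate a locally served model on MMLU.
# --model_url must point at the OpenAI-compatible endpoint started by the
# deployment script; all values below are placeholders.
eval-factory run_eval \
    --eval_type mmlu \
    --model_id anymodel-puzzletron \
    --model_type completions \
    --model_url http://0.0.0.0:8080/v1/completions/ \
    --output_dir ./eval_results
```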
---------
Signed-off-by: jrausch <jrausch@nvidia.com>
- For this example we use 2x NVIDIA H100 80GB HBM3 GPUs to show the multi-GPU steps. You can also use a single GPU.
````diff
@@ -229,16 +229,38 @@ The plot shows how token accuracy changes with different compression rates. High
 ## Evaluation
 
-Once the model is ready, you can evaluate it using [Language Model Evaluation Harness](https://pypi.org/project/lm-eval/). For example, run the following to evaluate the model on [Massive Multitask Language Understanding](https://huggingface.co/datasets/cais/mmlu) benchmark.
+Evaluate AnyModel checkpoints by deploying a local OpenAI-compatible completions endpoint and running benchmarks against it.
+
+**1. Deploy the model (2 GPUs example):**
+
+```bash
+# Install the AnyModel-patched deployable (first time only: backs up the original)
+# /opt/Export-Deploy is the default path in NeMo containers — adjust if needed
````
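
Once the endpoint from step 1 is up, a quick sanity check against the OpenAI-compatible completions API confirms the server responds before any benchmarks are run. The host, port, and model name below are assumptions, not values from this PR:

```bash
# Hypothetical sanity check of the deployed completions endpoint.
# Host, port, and model name are placeholders; match them to your deployment.
curl -s http://0.0.0.0:8080/v1/completions/ \
    -H "Content-Type: application/json" \
    -d '{"model": "anymodel-puzzletron", "prompt": "The capital of France is", "max_tokens": 8}'
```

If this returns a completion, the same URL can be passed to `eval-factory run_eval` as `--model_url`.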