Feat/ollama model #1322
Conversation
Hey, could someone review this PR? It just adds the Ollama interface to models/ to run local evals with text and images. I tested evals such as gsm8k with few_shot, vqa_val_lite, and mme. If there is anything I missed, just let me know!
kcz358
left a comment
Hi, thank you for the contribution. It seems like this model just inherits oai with some changes to the host or base URL, where everything can also be configured using the original openai class itself. If this is just a self-hosted oai server, I think we can just use the original oai chat models instead of creating a new ollama model.
Yes, that actually makes a lot of sense. The only reason for the push I see now is that with the ollama model available, more people would be aware that they can use it, because I don't think everyone knows they can use the openai model with an Ollama self-host.
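For reference, the reviewer's point can be sketched concretely: Ollama serves an OpenAI-compatible API under `/v1` (default port 11434), so an openai-style chat model only needs its base URL pointed at the local server. A minimal sketch of the payload that would go over the wire — `build_chat_request` is a hypothetical helper, and the model name is taken from the validation runs in this PR:

```python
import json

# Ollama's OpenAI-compatible endpoint lives under /v1; 11434 is Ollama's
# default port. With an openai-style chat model, pointing base_url here
# (plus a dummy api_key) is essentially the only change needed.
OLLAMA_BASE_URL = "http://localhost:11434/v1"
MODEL = "smollm2:135m"  # model name taken from the validation runs below


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    # Hypothetical helper: the chat-completions payload an oai-style
    # class would POST to f"{OLLAMA_BASE_URL}/chat/completions".
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("What is 2 + 2?")
print(json.dumps(payload, indent=2))
```

With the official `openai` Python client, the equivalent is constructing the client with `base_url="http://localhost:11434/v1"` and any placeholder `api_key`; Ollama ignores the key, but the client typically requires a non-empty one.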
Summary
- `ollama` chat backend for local inference through Ollama's OpenAI-compatible `/v1` API.
- `generate_until` evals with Ollama models.
- `loglikelihood` non-support.

In scope
- `lmms_eval/models/chat/ollama.py`.
- `"ollama": "Ollama"` in `AVAILABLE_CHAT_TEMPLATE_MODELS`.
- `test/models/test_ollama.py` for backend registration and constructor behavior.
- `generate_until` through Ollama's OpenAI-compatible chat completions endpoint.

Out of scope
- `loglikelihood` support; Ollama returns generated-token logprobs, not the prompt/continuation likelihoods required by lmms-eval.

Validation
| Command | Sample size | Key metrics | Result |
| --- | --- | --- | --- |
| `uv --cache-dir .\.uv-cache run --with pytest python -m pytest test/models/test_ollama.py -v` | N=7 tests | 7 passed | pass |
| `uv --cache-dir .\.uv-cache run python -m lmms_eval --model ollama --model_args model_version=smollm2:135m --include_path C:\tmp\lmms_tasks --tasks gsm8k_ollama_1shot --limit 8 --batch_size 2` | N=8 | flexible-extract exact_match=0.125, strict-match exact_match=0.000 | pass |
| `uv --cache-dir .\.uv-cache run python -m lmms_eval --model ollama --model_args model_version=<vision-model> --tasks ok_vqa_val2014_lite --limit 1 --batch_size 1` | N=1 | vision generate_until completed and produced metrics table | pass |

Risk / Compatibility
Type of Change