Description
The bench command accepts --model <name> and records it in
BenchmarkResult.model, but does not forward it to the agents under
test. Agents fall back to OllamaProvider._DEFAULT_MODEL (or whatever
the env var OLLAMA_MODEL is set to).
This makes the --model flag misleading: the result JSON's model field
reflects what was requested on the CLI, not what was actually used in
inference.
Surfaced during PR 2 health-check work. The workaround is to set
OLLAMA_MODEL env var before invoking the CLI.
Steps to reproduce
- Run
industrial-agents bench --suite all --provider ollama --model llama3.1:8b
on a system where Ollama's default model is llama3.2:1b
- Inspect the result JSON —
"model": "llama3.1:8b" is logged
- But the agent's actual inference calls hit
llama3.2:1b
Expected behaviour
The --model flag should be propagated to the LLM provider so the agent
uses the requested model. Result JSON's model field should reflect
ground truth — the model actually used in inference, not the one
requested.
Actual behaviour
The --model flag is stored in BenchmarkResult.model metadata but is
not forwarded to agents' llm.complete() calls. The result JSON's
model field is therefore misleading.
Likely cause: get_llm_provider("ollama") constructs an OllamaProvider
instance that ignores the model argument, or the agents instantiate
their own provider with default settings.
Suggested fix: Thread the resolved model name from the CLI through the
agent construction path. Update OllamaProvider.complete() (and the
other provider classes) so the model kwarg overrides the instance
default.
Framework version
v0.1.0-pre (bench/iabench-retrieval-hallucination @ 2d2834f)
LLM provider
ollama
Environment
Windows 11, Python 3.12
Description
The
benchcommand accepts--model <name>and records it inBenchmarkResult.model, but does not forward it to the agents undertest. Agents fall back to
OllamaProvider._DEFAULT_MODEL(or whateverthe env var
OLLAMA_MODELis set to).This makes the
--modelflag misleading: the result JSON'smodelfieldreflects what was requested on the CLI, not what was actually used in
inference.
Surfaced during PR 2 health-check work. The workaround is to set
OLLAMA_MODELenv var before invoking the CLI.Steps to reproduce
industrial-agents bench --suite all --provider ollama --model llama3.1:8bon a system where Ollama's default model is
llama3.2:1b"model": "llama3.1:8b"is loggedllama3.2:1bExpected behaviour
The
--modelflag should be propagated to the LLM provider so the agentuses the requested model. Result JSON's
modelfield should reflectground truth — the model actually used in inference, not the one
requested.
Actual behaviour
The
--modelflag is stored inBenchmarkResult.modelmetadata but isnot forwarded to agents'
llm.complete()calls. The result JSON'smodelfield is therefore misleading.Likely cause:
get_llm_provider("ollama")constructs anOllamaProviderinstance that ignores the model argument, or the agents instantiate
their own provider with default settings.
Suggested fix: Thread the resolved model name from the CLI through the
agent construction path. Update
OllamaProvider.complete()(and theother provider classes) so the
modelkwarg overrides the instancedefault.
Framework version
v0.1.0-pre (bench/iabench-retrieval-hallucination @ 2d2834f)
LLM provider
ollama
Environment
Windows 11, Python 3.12