--model CLI flag not forwarded to agent llm.complete() calls #14

@adris-misra

Description

The bench command accepts --model <name> and records it in
BenchmarkResult.model, but does not forward it to the agents under
test. The agents instead fall back to OllamaProvider._DEFAULT_MODEL (or
to whatever the OLLAMA_MODEL environment variable is set to).

This makes the --model flag misleading: the result JSON's model field
reflects what was requested on the CLI, not what was actually used in
inference.

Surfaced during PR 2 health-check work. The workaround is to set the
OLLAMA_MODEL environment variable before invoking the CLI.

Steps to reproduce

  1. Run industrial-agents bench --suite all --provider ollama --model llama3.1:8b
    on a system where Ollama's default model is llama3.2:1b
  2. Inspect the result JSON: "model": "llama3.1:8b" is logged
  3. The agent's inference calls nevertheless hit llama3.2:1b

Expected behaviour

The --model flag should be propagated to the LLM provider so the agent
uses the requested model. Result JSON's model field should reflect
ground truth: the model actually used in inference, not the one
requested.
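
One way to make the model field reflect ground truth is to fill it from what the provider observed rather than from the CLI argument. The sketch below is illustrative only: `BenchmarkResult.model` is the only field named in this issue, and `suite`, `requested_model`, `RecordingProvider`, `build_result` and `to_json` are hypothetical names introduced for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BenchmarkResult:
    # Only `model` is named in this issue; the other fields are illustrative.
    suite: str
    requested_model: str
    model: str  # the model inference actually ran against


class RecordingProvider:
    """Hypothetical stand-in that remembers which model each call used."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.models_used: set[str] = set()

    def complete(self, prompt: str) -> str:
        self.models_used.add(self.model)
        return ""  # a real provider would return the completion text


def build_result(
    suite: str, requested_model: str, provider: RecordingProvider
) -> BenchmarkResult:
    """Take `model` from the provider's record, not from the CLI flag."""
    if len(provider.models_used) != 1:
        raise RuntimeError(
            f"expected exactly one model, saw {sorted(provider.models_used)}"
        )
    (used,) = provider.models_used
    return BenchmarkResult(suite=suite, requested_model=requested_model, model=used)


def to_json(result: BenchmarkResult) -> str:
    return json.dumps(asdict(result), indent=2)
```

Recording both the requested and the observed model makes a mismatch like the one in this report visible in the result JSON instead of silently hiding it.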

Actual behaviour

The --model flag is stored in BenchmarkResult.model metadata but is
not forwarded to agents' llm.complete() calls. The result JSON's
model field is therefore misleading.

Likely cause: get_llm_provider("ollama") constructs an OllamaProvider
instance that ignores the model argument, or the agents instantiate
their own provider with default settings.

Suggested fix: Thread the resolved model name from the CLI through the
agent construction path. Update OllamaProvider.complete() (and the
other provider classes) so the model kwarg overrides the instance
default.

Framework version

v0.1.0-pre (bench/iabench-retrieval-hallucination @ 2d2834f)

LLM provider

ollama

Environment

Windows 11, Python 3.12

Labels

bug
