Description
The test test_bench_exits_nonzero invokes the full industrial-agents bench
CLI to verify that it exits non-zero on failure. Since IA-4 and IA-6 now
make real LLM calls, the test runs the entire benchmark pipeline whenever
Ollama is reachable locally, which takes minutes or appears to hang.
In CI (no Ollama), the bench fails fast with a connection error → exit
code 1 → the test's assertion passes. So CI is unaffected, but a local
pytest tests/unit -q run is slow or appears to hang.
Surfaced during PR #15 (IA-4/IA-6 implementation).
Steps to reproduce
- Ensure Ollama is running locally (e.g. llama3.2:1b pulled and serving)
- From the repo root: pytest tests/unit -q
- Observe the run stall at test_bench_exits_nonzero while it executes
the full IA-4 + IA-6 benchmark against the live model
Expected behaviour
A unit test should never invoke the real benchmark pipeline or make live
LLM calls. The test should mock the CLI's bench invocation so it verifies
the non-zero exit path deterministically, in milliseconds, regardless of
whether Ollama is running.
Actual behaviour
The test invokes the real industrial-agents bench command. With Ollama
running, this executes IA-4 and IA-6 (real LLM calls), taking minutes or
hanging. Without Ollama (CI), it fails fast with a connection error, so
the assertion coincidentally passes — masking the design flaw.
Suggested fix: mock the bench invocation in this test, OR mark it
@pytest.mark.integration and exclude it from the default unit run.
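For the second option, one possible conftest.py sketch is shown here. It registers an `integration` marker and skips tests carrying it unless an opt-in flag is passed. The flag name `--run-integration` is an assumption chosen for illustration; the hooks themselves are standard pytest hooks. The test would be decorated with `@pytest.mark.integration`.

```python
# conftest.py (sketch; the --run-integration flag name is an assumption)
import pytest


def pytest_addoption(parser):
    # Opt-in flag so integration tests only run when explicitly requested.
    parser.addoption(
        "--run-integration",
        action="store_true",
        default=False,
        help="run tests marked as integration",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "integration: test needs a live LLM backend"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-integration"):
        return  # opted in: leave integration tests enabled
    skip_integration = pytest.mark.skip(reason="needs --run-integration")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)
```

With this in place, pytest tests/unit -q would skip the marked test by default, and pytest tests/unit -q --run-integration would run it. Deselecting with -m "not integration" in the default options is an alternative that avoids the custom flag.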
Framework version
v0.1.0-pre (bench/iabench-synthesis-cost @ 66bc8fe)
LLM provider
ollama
Environment
Windows 11, Python 3.12