Summary
test_run_transformers[query_clarification] times out after 180s during torch.nn.Linear.forward (matrix multiplication) on CPU/MPS without GPU.
Context
Discovered during PR #1294 (aLoRA input_ids fix) while verifying the full test suite. This is a separate pre-existing issue that also exists on upstream/main without the fix.
Details
- Model:
ibm-granite/granite-4.1-3b
- Context: 4 conversation turns + 12 retrieved documents (~2000+ tokens)
- Generation:
max_completion_tokens: 512
- Hardware: Apple Silicon (MPS), no CUDA
The test gets stuck in F.linear() doing a forward pass on a ~2000-token context. This is raw compute, not a code bug.
Why it matters
The test is already gated on CI (gh_run == 1 xfails it), but it blocks local full-suite runs and any developer running pytest -m "not qualitative" without realizing this one test will take 3+ minutes on CPU.
Proposed fixes
- Add
@pytest.mark.slow to exclude it from the fast loop
- Add a CPU-only skip:
pytest.skip("query_clarification takes >180s on CPU")
- Increase the timeout for this specific test
Summary
test_run_transformers[query_clarification]times out after 180s duringtorch.nn.Linear.forward(matrix multiplication) on CPU/MPS without GPU.Context
Discovered during PR #1294 (aLoRA input_ids fix) while verifying the full test suite. This is a separate pre-existing issue that also exists on
upstream/mainwithout the fix.Details
ibm-granite/granite-4.1-3bmax_completion_tokens: 512The test gets stuck in
F.linear()doing a forward pass on a ~2000-token context. This is raw compute, not a code bug.Why it matters
The test is already gated on CI (
gh_run == 1xfails it), but it blocks local full-suite runs and any developer runningpytest -m "not qualitative"without realizing this one test will take 3+ minutes on CPU.Proposed fixes
@pytest.mark.slowto exclude it from the fast looppytest.skip("query_clarification takes >180s on CPU")