Skip to content

test: query_clarification times out on CPU during full test run #1296

@planetf1

Description

@planetf1

Summary

test_run_transformers[query_clarification] times out after 180s during torch.nn.Linear.forward (matrix multiplication) on CPU/MPS without GPU.

Context

Discovered during PR #1294 (aLoRA input_ids fix) while verifying the full test suite. This is a separate pre-existing issue that also exists on upstream/main without the fix.

Details

  • Model: ibm-granite/granite-4.1-3b
  • Context: 4 conversation turns + 12 retrieved documents (~2000+ tokens)
  • Generation: max_completion_tokens: 512
  • Hardware: Apple Silicon (MPS), no CUDA

The test gets stuck in F.linear() doing a forward pass on a ~2000-token context. This is raw compute, not a code bug.

Why it matters

The test is already gated on CI (gh_run == 1 xfails it), but it blocks local full-suite runs and any developer running pytest -m "not qualitative" without realizing this one test will take 3+ minutes on CPU.

Proposed fixes

  1. Add @pytest.mark.slow to exclude it from the fast loop
  2. Add a CPU-only skip: pytest.skip("query_clarification takes >180s on CPU")
  3. Increase the timeout for this specific test

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions