test: query_clarification times out on CPU during full test run

## Summary

`test_run_transformers[query_clarification]` times out after 180s during `torch.nn.Linear.forward` (matrix multiplication) on CPU/MPS without GPU.

## Context

Discovered during PR #1294 (aLoRA input_ids fix) while verifying the full test suite. This is a separate pre-existing issue that also exists on `upstream/main` without the fix.

## Details

- **Model:** `ibm-granite/granite-4.1-3b`
- **Context:** 4 conversation turns + 12 retrieved documents (~2000+ tokens)
- **Generation:** `max_completion_tokens: 512`
- **Hardware:** Apple Silicon (MPS), no CUDA

The test gets stuck in `F.linear()` doing a forward pass on a ~2000-token context. This is raw compute, not a code bug.

## Why it matters

The test is already gated on CI (`gh_run == 1` xfails it), but it blocks local full-suite runs and any developer running `pytest -m "not qualitative"` without realizing this one test will take 3+ minutes on CPU.

## Proposed fixes

1. Add `@pytest.mark.slow` to exclude it from the fast loop
2. Add a CPU-only skip: `pytest.skip("query_clarification takes >180s on CPU")`
3. Increase the timeout for this specific test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: query_clarification times out on CPU during full test run #1296

Summary

Context

Details

Why it matters

Proposed fixes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

test: query_clarification times out on CPU during full test run #1296

Description

Summary

Context

Details

Why it matters

Proposed fixes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions