test: context_relevance_alora and uncertainty_alora fail on CPU (peft cannot compute aLoRA offsets without input_ids) #1286

## Summary

`test_run_transformers[context_relevance_alora]` and `test_run_transformers[uncertainty_alora]` fail on CPU-only hardware (confirmed on macOS, `upstream/main` at `fe7ce3c7`). The other two aLoRA formatter tests (`answerability_answerable_alora`, `requirement_check_alora`) pass on the same hardware.

The test file carries an explicit comment: **"THIS TEST DOES NOT REQUIRE A GPU."** These failures are therefore a bug, not a deliberate skip.

## Observed failures

**`context_relevance_alora`**
```
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 48 (char 47)
```
The model output is truncated mid-JSON. Generation stops before the `response` string is closed: `{ "context_relevance": "relevant", "response": "As of my last update in October`
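The reported error position can be checked offline with the standard library alone. The sketch below feeds the truncated text quoted above to `json.loads`; `TRUNCATED_OUTPUT` and `describe_json_failure` are illustrative names and are not part of the test suite.

```python
import json

# Truncated model output, copied from the failure above.
TRUNCATED_OUTPUT = (
    '{ "context_relevance": "relevant", '
    '"response": "As of my last update in October'
)


def describe_json_failure(text):
    """Return (message, position) if text is not valid JSON, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.pos
    return None


failure = describe_json_failure(TRUNCATED_OUTPUT)
print(failure)
```

The position points at the opening quote of the `response` value, which is consistent with generation stopping early rather than with a malformed template.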

**`uncertainty_alora`**
```
AssertionError: assert {'certainty': 0.061...} == approx({'certainty': 0.829 ± 0.1})
Max absolute difference: 0.769
```
The model returns a score of `"0"`, and the resulting certainty (`0.061...`) is far outside the expected `0.829 ± 0.1`.

## Root cause

peft emits the following warning for all aLoRA tests on CPU:

```
UserWarning: Cannot calculate aLoRA offsets during generate as input_ids are not available. Disabling aLoRA.
```

When aLoRA offsets cannot be computed, peft falls back to standard LoRA, with the warning above as the only notice. For `answerability` and `requirement_check`, this degraded output is still close enough to pass the assertions. `context_relevance` and `uncertainty` are sensitive enough to the missing prefix optimisation that they produce wrong or truncated output.
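Because the fallback only surfaces as a warning, a test can pass while running in the degraded mode. A minimal sketch of how a test could detect this with the standard `warnings` module follows. `call_and_check_alora` and `_fake_generate` are hypothetical helpers; the stand-in function only re-emits the warning text quoted above and does not call peft.

```python
import warnings

# Substring of the peft warning quoted above.
ALORA_DISABLED_MARKER = "Cannot calculate aLoRA offsets"


def call_and_check_alora(fn, *args, **kwargs):
    """Run fn and report whether the aLoRA-disabled warning was emitted.

    Returns a (result, alora_disabled) tuple.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones the default filter would dedupe.
        warnings.simplefilter("always")
        result = fn(*args, **kwargs)
    disabled = any(ALORA_DISABLED_MARKER in str(w.message) for w in caught)
    return result, disabled


def _fake_generate():
    # Stand-in for the real generation call (assumption for illustration).
    warnings.warn(
        "Cannot calculate aLoRA offsets during generate as input_ids "
        "are not available. Disabling aLoRA.",
        UserWarning,
    )
    return "degraded output"


result, alora_disabled = call_and_check_alora(_fake_generate)
print(alora_disabled)
```

Wrapping the real generation call this way would let the passing tests (`answerability_answerable_alora`, `requirement_check_alora`) report that they also ran with aLoRA disabled.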

## Why input_ids are unavailable

`_generate_from_intrinsic` in `mellea/backends/huggingface.py` calls `generate_with_transformers` (from `granite_formatters`) with a `generate_input` dict that includes `input_tokens`. However, peft's aLoRA hook fires inside `model.generate()`, at the point where `forward` is called. If `generate_with_transformers` passes the inputs as `inputs_embeds` rather than `input_ids` at that stage, peft cannot recover the token IDs it needs to compute prefix offsets, even though those IDs were available earlier in the pipeline.

The `input_tokens` are already present on `generate_input` and are also stored on `output._process` / `output._post_process` via `input_ids=generate_input["input_tokens"]`. What remains unverified is whether they reach peft's hook inside `model.generate()`.
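To make the dependency on token IDs concrete, here is a toy model of the offset calculation. It is a simplified illustration and not peft's implementation: it assumes the offset is the distance from the end of the prompt back to the start of the last occurrence of an invocation token sequence. `toy_alora_offset` and the token values are made up for the example.

```python
import warnings


def toy_alora_offset(input_ids, invocation_tokens):
    """Toy offset: tokens from the start of the last invocation sequence
    to the end of the prompt, or None if it cannot be determined."""
    if input_ids is None:
        # Embeddings alone carry no token IDs to search through.
        warnings.warn("token IDs unavailable; activation point unknown")
        return None
    n, m = len(input_ids), len(invocation_tokens)
    # Scan backwards so the last occurrence wins.
    for start in range(n - m, -1, -1):
        if input_ids[start:start + m] == invocation_tokens:
            return n - start
    return None


prompt_ids = [5, 6, 7, 101, 102, 8]
invocation = [101, 102]

offset_with_ids = toy_alora_offset(prompt_ids, invocation)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Same prompt, but only embeddings were passed along.
    offset_without_ids = toy_alora_offset(None, invocation)

print(offset_with_ids, offset_without_ids)
```

The search needs the integer token sequence. Once only embeddings are forwarded, there is nothing to match against, which is the situation the peft warning describes.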

## Possible fix

Investigate whether `generate_with_transformers` (or the call site in `_generate_from_intrinsic`) can be modified to pass `input_ids` into the `model.generate()` call alongside any `inputs_embeds`, so peft's aLoRA hook has what it needs on CPU. Alternatively, if `generate_with_transformers` intentionally embeds first, the backend may need to retain `input_ids` and forward them separately via `generate()`'s `input_ids` kwarg.
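One way to start that investigation is to record which arguments actually reach `generate()`. The sketch below wraps a model's `generate` method and logs the positional argument count and keyword names for each call. `record_generate_kwargs` and `_StubModel` are hypothetical; the stub stands in for the real model, and how `generate_with_transformers` invokes `generate()` is not confirmed here.

```python
import functools


def record_generate_kwargs(model, seen):
    """Wrap model.generate so each call appends
    (number of positional args, sorted keyword names) to seen."""
    original = model.generate

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        seen.append((len(args), sorted(kwargs)))
        return original(*args, **kwargs)

    model.generate = wrapper
    return model


class _StubModel:
    """Stand-in for the real model, used only to exercise the wrapper."""

    def generate(self, *args, **kwargs):
        return "ok"


seen = []
model = record_generate_kwargs(_StubModel(), seen)
model.generate(inputs_embeds=[[0.1, 0.2]], max_new_tokens=4)
print(seen)
```

If the recorded calls on CPU show `inputs_embeds` without `input_ids` (and no positional inputs), that would support the hypothesis above and point at the call site to change.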

## Reproduction

```bash
uv run pytest \
  "test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[context_relevance_alora]" \
  "test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[uncertainty_alora]" \
  --tb=short -q
```

This fails on macOS (CPU) but is expected to pass per the test file's own annotation.
