test: context_relevance_alora and uncertainty_alora fail on CPU — peft cannot compute aLoRA offsets without input_ids #1286

Description

@planetf1

Summary

test_run_transformers[context_relevance_alora] and test_run_transformers[uncertainty_alora] fail on CPU-only hardware (confirmed on macOS, upstream/main at fe7ce3c7). The other two aLoRA formatter tests (answerability_answerable_alora, requirement_check_alora) pass on the same hardware.

The test file carries an explicit comment: "THIS TEST DOES NOT REQUIRE A GPU." These failures are therefore a bug, not a deliberate skip.

Observed failures

context_relevance_alora

json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 48 (char 47)

The model stops generating before it closes the string, so the output is truncated mid-JSON: { "context_relevance": "relevant", "response": "As of my last update in October
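The reported error position can be checked against the truncated output quoted above. The following is a small standalone sketch (the helper name decode_error_position is made up for illustration and is not part of the test suite); it feeds the truncated string to json.loads and reports where decoding fails, which should correspond to the opening quote of the unfinished "response" value.

```python
import json

# Truncated model output as quoted in this report
# (the closing quote and closing brace never arrive).
truncated = '{ "context_relevance": "relevant", "response": "As of my last update in October'


def decode_error_position(text):
    """Return the character offset of the JSON decode error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.pos
    return None


error_pos = decode_error_position(truncated)
# Show the offset and the part of the string the decoder could not finish.
print(error_pos, repr(truncated[error_pos:]))
```

If the offset lines up with the char index in the traceback, the failure is purely a truncation problem rather than malformed JSON earlier in the output.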

uncertainty_alora

AssertionError: assert {'certainty': 0.061...} == approx({'certainty': 0.829 ± 0.1})
Max absolute difference: 0.769

The model returns a score of "0" rather than the expected value: the reported certainty is about 0.06, against an expected 0.829 ± 0.1.

Root cause

peft emits the following warning for all aLoRA tests on CPU:

UserWarning: Cannot calculate aLoRA offsets during generate as input_ids are not available. Disabling aLoRA.

When aLoRA offsets cannot be computed, peft falls back to standard LoRA, and this warning is the only signal. For answerability and requirement_check the degraded output is still close enough to pass the assertions; context_relevance and uncertainty are sensitive enough to the missing prefix optimisation that they produce wrong or truncated output.
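One way to stop the fallback from going unnoticed in a test run is to escalate that specific warning to an error. This is a minimal sketch using only the standard warnings module; the context manager name fail_on_alora_fallback and the fake_generate stand-in are hypothetical, and the message prefix is copied from the warning text quoted above.

```python
import warnings
from contextlib import contextmanager

# Prefix of the warning text quoted in this report.
ALORA_FALLBACK_PREFIX = "Cannot calculate aLoRA offsets"


@contextmanager
def fail_on_alora_fallback():
    """Turn the aLoRA fallback warning into an exception inside the block."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "error", message=ALORA_FALLBACK_PREFIX, category=UserWarning
        )
        yield


def fake_generate():
    """Stand-in for a generate call that triggers the fallback (not real peft code)."""
    warnings.warn(
        "Cannot calculate aLoRA offsets during generate as input_ids "
        "are not available. Disabling aLoRA.",
        UserWarning,
    )
    return "degraded output"


try:
    with fail_on_alora_fallback():
        fake_generate()
    escalated = False
except UserWarning:
    escalated = True

print("fallback escalated to an error:", escalated)
```

Wrapping the real generation call this way would make all four aLoRA tests fail at the point of fallback instead of only the two whose assertions happen to be sensitive to it.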

Why input_ids are unavailable

_generate_from_intrinsic in mellea/backends/huggingface.py calls generate_with_transformers (from granite_formatters) with a generate_input dict that includes input_tokens. However, peft's aLoRA hook fires inside model.generate(), at the point where forward is called. If generate_with_transformers passes the inputs as inputs_embeds rather than input_ids at that stage, peft cannot recover the token IDs it needs to compute prefix offsets, even though those IDs were available earlier in the pipeline.

The input_tokens are already on generate_input and are also stored on output._process / output._post_process via input_ids=generate_input["input_tokens"]. The open question is whether they reach peft's hook inside model.generate().
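As a rough illustration of why the token IDs matter: aLoRA activates the adapter only from an invocation sequence onward, so that position has to be located in the token IDs, and embeddings alone do not allow that search. The sketch below is a hypothetical pure-Python stand-in, not peft's implementation; the function name alora_offset and the example token values are assumptions.

```python
from typing import Optional, Sequence


def alora_offset(
    input_ids: Optional[Sequence[int]],
    invocation_tokens: Sequence[int],
) -> Optional[int]:
    """Return how many trailing tokens start at the last invocation sequence.

    Returns None when the offset cannot be determined, which is the case
    that corresponds to the fallback described in this report.
    """
    if input_ids is None:
        # Only embeddings were supplied: there is nothing to search.
        return None
    n, m = len(input_ids), len(invocation_tokens)
    if m == 0:
        return None
    # Scan from the end so the last occurrence wins.
    for start in range(n - m, -1, -1):
        if list(input_ids[start:start + m]) == list(invocation_tokens):
            return n - start
    return None


# Made-up token values, for illustration only.
print(alora_offset([5, 6, 7, 8, 9], [7, 8]))  # invocation found in the IDs
print(alora_offset(None, [7, 8]))             # no IDs available
```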

Possible fix

Investigate whether generate_with_transformers (or the call site in _generate_from_intrinsic) can be modified to pass input_ids into the model.generate() call alongside any inputs_embeds, so peft's aLoRA hook has what it needs on CPU. Alternatively, if generate_with_transformers intentionally embeds first, the backend may need to retain input_ids and forward them separately via generate()'s input_ids kwarg.
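A first step in that investigation could be to record what actually reaches model.generate(). The sketch below wraps a model's generate method and logs whether input_ids or inputs_embeds were passed as keyword arguments; record_generate_inputs and FakeModel are hypothetical names used so the example runs on its own, and nothing here has been verified against generate_with_transformers or peft.

```python
import functools


def record_generate_inputs(model, log):
    """Wrap model.generate so each call appends a summary of its inputs to log."""
    original = model.generate

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        log.append(
            {
                "positional_args": len(args),
                "has_input_ids": kwargs.get("input_ids") is not None,
                "has_inputs_embeds": kwargs.get("inputs_embeds") is not None,
            }
        )
        return original(*args, **kwargs)

    model.generate = wrapper
    return model


class FakeModel:
    """Stand-in for a real model; only here so the sketch is self-contained."""

    def generate(self, *args, **kwargs):
        return "generated"


calls = []
model = record_generate_inputs(FakeModel(), calls)
model.generate(inputs_embeds=[[0.1, 0.2]])  # embeddings only
model.generate(input_ids=[[1, 2, 3]])       # token IDs passed by keyword
for call in calls:
    print(call)
```

Applied to the real model in the failing tests, a log entry with has_input_ids false (or with the IDs passed positionally) would support the hypothesis that the token IDs never reach the hook as an input_ids keyword argument.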

Reproduction

uv run pytest \
  "test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[context_relevance_alora]" \
  "test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[uncertainty_alora]" \
  --tb=short -q

Both tests fail on macOS (CPU), although the test file's own annotation says they should pass without a GPU.

Labels

area/adapter-functions (Granite adapter functions: framework and adapters including RAG, Guardian, Core)
bug (Something isn't working)
p2 (Medium/low: minor bugs, niche features, polish, docs, tests, cleanup. Scoped, lower urgency.)
