context_relevance aLoRA adapter: missing max_completion_tokens causes JSONDecodeError in test; consider deprecating

## Summary

`test_run_transformers[context_relevance_alora]` fails with `JSONDecodeError` on GPU hardware. The root cause is a missing `max_completion_tokens` in the adapter's `io.yaml` on HF Hub. The failure also raises the broader question of whether this adapter should remain in the codebase at all.

## Failure

```
mellea/formatters/granite/intrinsics/output.py:1305: in _transform_choice
    parsed_json = json.loads(content)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 48 (char 47)
```

Raw model output (truncated mid-JSON):
```
{ "context_relevance": "relevant", "response": "As of my last update in October
```

The model generates extra content beyond the required JSON (a `"response"` field), then hits the transformers default `max_length=153` and is cut off mid-string.
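The parse failure can be reproduced without a GPU or the model. This is a minimal sketch using only the standard library and the raw output quoted above; the variable names are illustrative and not taken from `output.py`.

```python
import json

# Raw model output copied from the failing run, cut off mid-string by the length limit.
truncated = '{ "context_relevance": "relevant", "response": "As of my last update in October'

try:
    json.loads(truncated)
    error = None
except json.JSONDecodeError as exc:
    # exc.pos is the offset of the opening quote of the string that never closes.
    error = exc

# For comparison: once the string and object are closed, the payload parses,
# even though it still carries the unexpected "response" field.
complete = json.loads(truncated + '"}')
```

The offset reported in `error.pos` points at the opening quote of the `"response"` value, which matches `char 47` in the traceback. The extra field alone would not break parsing; the truncation does.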

## Root cause

The `context_relevance/granite-4.0-micro/alora/io.yaml` on `ibm-granite/granitelib-rag-r1.0` has no `max_completion_tokens` parameter:

```yaml
parameters:
  temperature: 0.0
  # max_completion_tokens is absent — every other adapter sets this
```

Every other adapter (`uncertainty`, `requirement-check`, `answerability`, etc.) sets `max_completion_tokens: 15`. Without it, transformers falls back to `max_length=153`, which is enough for most runs but not when the model goes off-script and generates extra fields.
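A check along these lines could catch the same omission in other adapters. It is a rough sketch: `adapters_missing_token_cap` is a hypothetical helper, not part of mellea, and it assumes each `io.yaml` has already been parsed into a dict. The sample data only mirrors the values quoted in this issue.

```python
from typing import Any, Mapping


def adapters_missing_token_cap(io_configs: Mapping[str, Mapping[str, Any]]) -> list[str]:
    """Return the names of adapters whose parameters lack max_completion_tokens."""
    missing = []
    for name, config in io_configs.items():
        # Treat an absent or empty "parameters" section as having no cap.
        parameters = config.get("parameters") or {}
        if "max_completion_tokens" not in parameters:
            missing.append(name)
    return sorted(missing)


# Illustrative, already-parsed io.yaml contents (not fetched from HF Hub).
io_configs = {
    "answerability": {"parameters": {"max_completion_tokens": 15}},
    "context_relevance_alora": {"parameters": {"temperature": 0.0}},
}

uncapped = adapters_missing_token_cap(io_configs)
```

With the sample data, only `context_relevance_alora` is reported as uncapped.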

## Additional context from HF README

The top-level `context_relevance/README.md` contains this note:

> *We have consistently found that this adapter does not benchmark noticeably better than a well-crafted prompt on the current evaluation benchmarks. Going forward, for the currently described and benchmarked context relevance use case, we recommend a well-crafted prompt.*

The adapter is effectively deprecated upstream in favour of prompting.

## HPC test results (single run, granite-4.0-micro, p1-r11-n3)

| Test | Result |
|------|--------|
| `context_relevance` (LoRA, non-alora) | ✅ PASS (bare `continue` — no real assertion) |
| `context_relevance_alora` | ❌ FAIL — JSONDecodeError |

Note: the current PR (#1292) has a bare `continue` for `context_relevance` tests that skips the assertion comparison. This does not protect against the `JSONDecodeError`, which is raised earlier, during output parsing, before that branch is reached.

## Options

1. **Fix upstream**: Add `max_completion_tokens: 15` to the HF repo io.yaml — minimal fix, keeps the adapter
2. **Remove the adapter**: Given the upstream deprecation notice, remove `context_relevance_alora` (and possibly `context_relevance` LoRA) from the test combos and from `check_context_relevance()` in `mellea/stdlib/components/intrinsic/rag.py`
3. **xfail for now**: Mark `context_relevance_alora` as `xfail` while the upstream decision is made
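Option 3 might look roughly like the following. This is a sketch under assumptions: `COMBOS` and its shape are hypothetical, since the real parametrization in the test module is not shown in this issue. Only `pytest.param` and `pytest.mark.xfail` are actual pytest API.

```python
import json

import pytest

# Hypothetical parameter list; the real test combos may be structured differently.
COMBOS = [
    pytest.param("context_relevance", id="context_relevance"),
    pytest.param(
        "context_relevance_alora",
        id="context_relevance_alora",
        marks=pytest.mark.xfail(
            # Only the known failure mode counts as an expected failure.
            raises=json.JSONDecodeError,
            reason="io.yaml lacks max_completion_tokens; output is truncated mid-JSON",
            # Non-strict: the test may still pass on runs where the model stays on-script.
            strict=False,
        ),
    ),
]
```

The list would then be passed to `@pytest.mark.parametrize`. Restricting `raises` to `json.JSONDecodeError` keeps unrelated regressions visible as real failures.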

## Related

- Issue #1291 (broader intrinsic test flakiness investigation)
- PR #1292

