
context_relevance aLoRA adapter: missing max_completion_tokens causes JSONDecodeError in test; consider deprecating #1301

@planetf1


Summary

test_run_transformers[context_relevance_alora] fails with a JSONDecodeError on GPU hardware. Investigation shows the root cause is a missing max_completion_tokens parameter in the adapter's io.yaml on the HF Hub. The failure also raises a broader question of whether this adapter should remain in the codebase at all.

Failure

mellea/formatters/granite/intrinsics/output.py:1305: in _transform_choice
    parsed_json = json.loads(content)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 48 (char 47)

Raw model output (truncated mid-JSON):

{ "context_relevance": "relevant", "response": "As of my last update in October

The model emits an extra "response" field beyond the required JSON, then hits the transformers default max_length=153 and is cut off mid-string.
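The parse failure itself can be reproduced without a GPU by passing the truncated output shown above straight to json.loads, the same call that fails in _transform_choice. This is a minimal sketch; the string is copied from the raw model output in this issue, and nothing else from mellea is involved:

```python
import json

# Raw model output from the failing run, cut off mid-string by the
# generation length limit.
truncated = (
    '{ "context_relevance": "relevant", '
    '"response": "As of my last update in October'
)

caught = None
try:
    json.loads(truncated)
except json.JSONDecodeError as err:
    caught = err  # keep a reference; `err` is cleared when the block exits

# pos is the index of the opening quote of the unterminated "response" value
print(caught.msg, caught.lineno, caught.colno, caught.pos)
```

The position reported (line 1, column 48, char 47) matches the traceback, which supports truncation, rather than some other malformation, as the cause.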

Root cause

The context_relevance/granite-4.0-micro/alora/io.yaml on ibm-granite/granitelib-rag-r1.0 has no max_completion_tokens parameter:

parameters:
  temperature: 0.0
  # max_completion_tokens is absent — every other adapter sets this

Every other adapter (uncertainty, requirement-check, answerability, etc.) sets max_completion_tokens: 15. Without it, transformers falls back to max_length=153, which is enough for most runs but not when the model goes off-script and generates fields beyond the expected JSON.
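One way to catch the same omission in other adapters is to scan each adapter's io.yaml parameters for max_completion_tokens. The sketch below is hypothetical: it works on already-parsed parameter dicts (for example, the result of loading each io.yaml with a YAML library), and the adapter names and values mirror this issue rather than being read from the HF repo.

```python
from typing import Mapping

# Parameters as they might look after parsing each adapter's io.yaml.
# Illustrative values taken from this issue, not fetched from the Hub.
adapter_parameters: dict[str, dict] = {
    "uncertainty": {"temperature": 0.0, "max_completion_tokens": 15},
    "requirement-check": {"temperature": 0.0, "max_completion_tokens": 15},
    "answerability": {"temperature": 0.0, "max_completion_tokens": 15},
    "context_relevance (aLoRA)": {"temperature": 0.0},  # omission reported here
}


def missing_completion_limit(params_by_adapter: Mapping[str, Mapping]) -> list[str]:
    """Return the adapters whose parameters do not cap the completion length."""
    return sorted(
        name
        for name, params in params_by_adapter.items()
        if params.get("max_completion_tokens") is None
    )


print(missing_completion_limit(adapter_parameters))
```

Run against the dicts above, only the context_relevance aLoRA entry is reported; a check like this could guard against the limit being dropped again if Option 1 below is taken.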

Additional context from HF README

The top-level context_relevance/README.md contains this note:

We have consistently found that this adapter does not benchmark noticeably better than a well-crafted prompt on the current evaluation benchmarks. Going forward, for the currently described and benchmarked context relevance use case, we recommend a well-crafted prompt.

The adapter is effectively deprecated upstream in favour of prompting.

HPC test results (single run, granite-4.0-micro, p1-r11-n3)

| Test | Result |
| --- | --- |
| context_relevance (LoRA, non-alora) | ✅ PASS (bare continue, no real assertion) |
| context_relevance_alora | ❌ FAIL (JSONDecodeError) |

Note: the current PR (#1292) adds a bare continue for the context_relevance tests, which skips the assertion comparison. It does not protect against this failure, because the JSONDecodeError is raised earlier, during output parsing, before that branch is reached.
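The ordering problem can be illustrated with a simplified, hypothetical test loop. The function names and loop structure are invented for illustration and are not the actual test code in PR #1292; the point is only that parsing runs before the skip check, so a truncated output raises before the bare continue can take effect.

```python
import json


def transform_output(raw: str) -> dict:
    # Hypothetical stand-in for the output-parsing step; raises on invalid JSON.
    return json.loads(raw)


def run_cases(cases):
    outcomes = {}
    for name, raw_output, expected in cases:
        # Parsing happens first, so a truncated output raises here...
        result = transform_output(raw_output)
        if name.startswith("context_relevance"):
            outcomes[name] = "skipped"
            continue  # ...and this bare continue is never reached for it
        assert result == expected
        outcomes[name] = "passed"
    return outcomes


complete = '{"context_relevance": "relevant"}'
truncated = '{"context_relevance": "relevant", "response": "As of my'

print(run_cases([("context_relevance", complete, None)]))
try:
    run_cases([("context_relevance_alora", truncated, None)])
except json.JSONDecodeError as err:
    print("raised before the skip:", err.msg)
```

With well-formed output the case is silently skipped; with truncated output the exception escapes regardless of the skip, which is why Option 3 (xfail) or an upstream fix is needed rather than a skip inside the loop.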

Options

  1. Fix upstream: Add max_completion_tokens: 15 to the HF repo io.yaml — minimal fix, keeps the adapter
  2. Remove the adapter: Given the upstream deprecation notice, remove context_relevance_alora (and possibly context_relevance LoRA) from the test combos and from check_context_relevance() in mellea/stdlib/components/intrinsic/rag.py
  3. xfail for now: Mark context_relevance_alora as xfail while the upstream decision is made

Labels: bug (Something isn't working)

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions