Summary
test_run_transformers[context_relevance_alora] fails with JSONDecodeError on GPU hardware. Investigation reveals the root cause is a missing max_completion_tokens in the adapter's io.yaml on HF Hub, combined with broader questions about whether this adapter should remain in the codebase at all.
Failure
mellea/formatters/granite/intrinsics/output.py:1305: in _transform_choice
parsed_json = json.loads(content)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 48 (char 47)
Raw model output (truncated mid-JSON):
{ "context_relevance": "relevant", "response": "As of my last update in October
The model generates extra content beyond the required JSON (a "response" field), then hits the transformers default max_length=153 and is cut off mid-string.
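The parse failure itself can be reproduced without a GPU or the model, by feeding the truncated output above to `json.loads`. This is a minimal sketch: `try_parse` is a hypothetical helper written for illustration, not the code in `_transform_choice`.

```python
import json

# Raw model output from the failing run, cut off mid-string by the length limit.
truncated = '{ "context_relevance": "relevant", "response": "As of my last update in October'

# What a capped, on-script output would look like (illustrative).
well_formed = '{ "context_relevance": "relevant" }'


def try_parse(content: str):
    """Hypothetical helper: return (parsed, error) instead of raising."""
    try:
        return json.loads(content), None
    except json.JSONDecodeError as err:
        return None, err


parsed, err = try_parse(truncated)
print(err)  # reports the unterminated string and its offset

ok_parsed, ok_err = try_parse(well_formed)
print(ok_parsed)
```

The error offset points at the opening quote of the unfinished `"response"` value, which is consistent with the model being cut off mid-string rather than emitting malformed JSON by itself.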
Root cause
The context_relevance/granite-4.0-micro/alora/io.yaml on ibm-granite/granitelib-rag-r1.0 has no max_completion_tokens parameter:
parameters:
  temperature: 0.0
  # max_completion_tokens is absent; every other adapter sets this
Every other adapter (uncertainty, requirement-check, answerability, etc.) sets max_completion_tokens: 15. Without it, transformers falls back to max_length=153, which is enough for most runs but not when the model goes off-script and generates extra fields.
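A check like the following could flag adapters whose config lacks the cap. It is a sketch under stated assumptions: the `io.yaml` content is assumed to be already parsed into a dict (for example with a YAML loader), `has_token_cap` is a hypothetical helper, and the second config is illustrative rather than copied from any real adapter.

```python
EXPECTED_CAP = 15  # value the issue reports for the other adapters


def has_token_cap(io_config: dict) -> bool:
    """Hypothetical check: does the parsed io.yaml set max_completion_tokens?"""
    params = io_config.get("parameters") or {}
    return "max_completion_tokens" in params


# Parsed form of the context_relevance alora io.yaml parameters shown above.
alora_config = {"parameters": {"temperature": 0.0}}

# Illustrative config for an adapter that does set the cap.
capped_config = {
    "parameters": {"temperature": 0.0, "max_completion_tokens": EXPECTED_CAP}
}

print(has_token_cap(alora_config))
print(has_token_cap(capped_config))
```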
Additional context from HF README
The top-level context_relevance/README.md contains this note:
We have consistently found that this adapter does not benchmark noticeably better than a well-crafted prompt on the current evaluation benchmarks. Going forward, for the currently described and benchmarked context relevance use case, we recommend a well-crafted prompt.
The adapter is effectively deprecated upstream in favour of prompting.
HPC test results (single run, granite-4.0-micro, p1-r11-n3)
| Test | Result |
|---|---|
| context_relevance (LoRA, non-alora) | ✅ PASS (bare continue; no real assertion) |
| context_relevance_alora | ❌ FAIL: JSONDecodeError |
Note: the current PR (#1292) has a bare continue for context_relevance tests that suppresses the assertion comparison, but this does not protect against the upstream JSONDecodeError thrown before that branch is reached.
Options
- Fix upstream: Add max_completion_tokens: 15 to the HF repo io.yaml. This is the minimal fix and keeps the adapter.
- Remove the adapter: Given the upstream deprecation notice, remove context_relevance_alora (and possibly context_relevance LoRA) from the test combos and from check_context_relevance() in mellea/stdlib/components/intrinsic/rag.py.
- xfail for now: Mark context_relevance_alora as xfail while the upstream decision is made.
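The xfail option could look roughly like this with pytest's parametrization. This is only a sketch: the `COMBOS` list, the other ids in it, and the test body are hypothetical, since the real parametrization of `test_run_transformers` is not shown in this issue.

```python
import json

import pytest

# Hypothetical test combos; only context_relevance_alora is marked xfail.
COMBOS = [
    pytest.param("answerability", id="answerability"),
    pytest.param("context_relevance", id="context_relevance"),
    pytest.param(
        "context_relevance_alora",
        id="context_relevance_alora",
        marks=pytest.mark.xfail(
            raises=json.JSONDecodeError,
            reason="io.yaml on HF Hub lacks max_completion_tokens; output is truncated",
            strict=False,  # do not fail the suite if an upstream fix makes it pass
        ),
    ),
]


@pytest.mark.parametrize("intrinsic_name", COMBOS)
def test_run_transformers(intrinsic_name):
    # Placeholder body; the real test runs the adapter and compares outputs.
    assert isinstance(intrinsic_name, str)
```

Restricting the mark with `raises=json.JSONDecodeError` means any other exception would still be reported as a real failure, so the xfail does not hide unrelated regressions.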
Related