Commit ef493df

LEANDERANTONY and claude committed
fix(jd): bump output token cap to 6000 for dense JDs
Routine n8n-style JDs with 40+ hard skills + 10+ must-haves + 10+ nice-to-haves + a verbose benefits block pack the LLM's structured JSON output close to the previous 4000-token ceiling. The auto-retry safety net caught the long tail, but at the cost of doubled end-to-end parse latency on those JDs (one truncated call + one retry).

Bumped jd_llm_parser_service.parse's default max_completion_tokens from 4000 -> 6000 so dense JDs land in a single call. max_output_tokens is a CEILING, not a reservation: short JDs (the typical case) cost exactly the same.

Input side is unchanged: there's no explicit input cap at this layer; the only real bound is the model's context window (~128k tokens for gpt-5.4-mini). Even a 5000-word JD is ~7k input tokens, nowhere near that limit.

Companion to the next commit's frontend JD unification (paste / upload / load-from-search ALL route through this parser via the existing /workspace/job-description/upload endpoint), which makes dense JDs the routine path rather than an edge case worth burning a retry on.

Test ``test_jd_parser_requests_generous_budget_and_enables_retry`` updated from ``>= 4000`` to ``>= 6000`` to lock the new floor. 7/7 jd_llm_parser tests pass. Ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
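The budget reasoning in the commit message can be checked with a few lines of arithmetic. This is a rough sketch, not part of the commit: the tokens-per-word ratio is inferred from the message's "5000 words is ~7k tokens" figure, the context window is the approximate value quoted there, and the 5000-token dense-JD output size is a hypothetical number chosen only to sit between the old and new ceilings.

```python
# Rough token-budget arithmetic. Every constant is an assumption taken from,
# or inferred from, the commit message; none of them is a measured value.
TOKENS_PER_WORD = 1.4      # inferred: 5000 words ~= 7k input tokens
CONTEXT_WINDOW = 128_000   # approximate context size quoted in the commit
OUTPUT_CEILING = 6_000     # new max_completion_tokens default


def estimate_input_tokens(word_count: int) -> int:
    """Estimate input tokens for a JD of the given length in words."""
    return round(word_count * TOKENS_PER_WORD)


def fits_in_context(word_count: int, output_ceiling: int = OUTPUT_CEILING) -> bool:
    """True if the estimated input plus the output ceiling fit in the window."""
    return estimate_input_tokens(word_count) + output_ceiling <= CONTEXT_WINDOW


def calls_needed(output_tokens: int, ceiling: int) -> int:
    """1 if the output fits under the ceiling, else 2 (truncated call + retry)."""
    return 1 if output_tokens <= ceiling else 2


print(estimate_input_tokens(5000))   # estimated input tokens for a 5000-word JD
print(calls_needed(5000, 4000))      # hypothetical dense JD under the old ceiling
print(calls_needed(5000, 6000))      # same JD under the new ceiling
```

Under these assumptions a dense JD needs two calls at the old ceiling and one at the new ceiling, while the input side stays far below the context window either way.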
1 parent 2145824 commit ef493df

2 files changed

Lines changed: 23 additions & 5 deletions

src/services/jd_llm_parser_service.py

Lines changed: 14 additions & 3 deletions
@@ -198,10 +198,21 @@ def parse(
         # silently falls back to the lower-fidelity deterministic JD
         # parser. That degraded JD then feeds fit analysis, tailoring,
         # and the cover letter — so the truncation cascades through the
-        # whole workflow. max_output_tokens is a ceiling, not a
-        # reservation: raising it is free for ordinary JDs.
+        # whole workflow.
+        #
+        # Bumped from 4000 -> 6000 (2026-05-27) after Phase 2 of the
+        # JD unification: paste / upload / load-from-search ALL now
+        # route through this parser, so dense JDs (n8n-style with
+        # 40+ skills + 10+ must-haves + 10+ nice-to-haves + a verbose
+        # benefits block) are routine, not edge-case. 6000 absorbs
+        # those in the first call without hitting the retry path;
+        # short JDs still cost the same because this is a ceiling,
+        # not a reservation. Input is uncapped at this layer — the
+        # only real bound is the model's context window (~128k for
+        # gpt-5.4-mini), so a 5000-word JD (~7k input tokens) has
+        # plenty of headroom.
         *,
-        max_completion_tokens: int = 4000,
+        max_completion_tokens: int = 6000,
     ) -> dict[str, Any]:
         if not jd_text or not str(jd_text).strip():
             raise ValueError("Job description text must not be empty.")
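The comment's "ceiling, not a reservation" point and the cost of the retry path can be illustrated with a small stand-alone sketch. Nothing here is the project's actual retry logic, which this diff does not show: `complete_with_budget_retry`, `fake_model`, the doubling retry policy, and the 5200-token output size are all hypothetical. The one real convention borrowed is that a chat completion cut off by the token limit reports `finish_reason == "length"`.

```python
from typing import Callable, NamedTuple


class Completion(NamedTuple):
    text: str
    finish_reason: str  # "length" signals the output hit the token ceiling


def complete_with_budget_retry(
    call_model: Callable[[int], Completion],
    max_completion_tokens: int = 6000,
    retry_multiplier: int = 2,  # assumption: the real retry policy is not shown
) -> tuple[Completion, int]:
    """Return the completion and the number of model calls it took."""
    first = call_model(max_completion_tokens)
    if first.finish_reason != "length":
        return first, 1
    # Truncated: pay for a second full call with a larger ceiling.
    second = call_model(max_completion_tokens * retry_multiplier)
    return second, 2


def fake_model(tokens_needed: int) -> Callable[[int], Completion]:
    """Stand-in for the LLM: truncates when the ceiling is below tokens_needed."""

    def call(ceiling: int) -> Completion:
        if ceiling < tokens_needed:
            return Completion(text="{truncated", finish_reason="length")
        return Completion(text="{}", finish_reason="stop")

    return call


dense_jd = fake_model(tokens_needed=5200)  # hypothetical dense-JD output size
_, calls_at_4000 = complete_with_budget_retry(dense_jd, max_completion_tokens=4000)
_, calls_at_6000 = complete_with_budget_retry(dense_jd, max_completion_tokens=6000)
print(calls_at_4000, calls_at_6000)  # 2 1
```

A short JD whose output fits under either ceiling takes one call in both configurations, which is why raising the default only changes behaviour for outputs that previously overflowed.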

tests/test_jd_llm_parser_service.py

Lines changed: 9 additions & 2 deletions
@@ -71,14 +71,21 @@ def test_jd_parser_requests_generous_budget_and_enables_retry():
     back to the lower-fidelity deterministic JD parser. That degraded
     JD then feeds fit analysis, tailoring, and the cover letter — the
     truncation cascades. The parser must request a generous ceiling
-    AND keep the auto-retry safety net."""
+    AND keep the auto-retry safety net.
+
+    Bumped to >=6000 on 2026-05-27 alongside the JD path unification:
+    paste / upload / load-from-search now ALL route through this
+    parser, so dense JDs (n8n-style with 40+ skills + verbose
+    benefits block) are routine. 6000 absorbs those in one call
+    without firing the retry path.
+    """
     recorder = _RecordingOpenAIService()
     service = JobDescriptionLLMParserService(openai_service=recorder)
 
     service.parse("Senior AI Engineer — lots of requirements ...")
 
     assert recorder.kwargs is not None
-    assert recorder.kwargs["max_completion_tokens"] >= 4000
+    assert recorder.kwargs["max_completion_tokens"] >= 6000
     assert recorder.kwargs["allow_output_budget_retry"] is True
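The test relies on a recording test double whose definition falls outside this hunk. A minimal sketch of that pattern follows, assuming the double simply stores the keyword arguments of its last call. `RecordingLLMService`, `TinyParser`, and the `run_json_prompt` method name are hypothetical stand-ins, not the project's `_RecordingOpenAIService` or `JobDescriptionLLMParserService`.

```python
from typing import Any, Optional


class RecordingLLMService:
    """Test double that records the keyword arguments of its last call."""

    def __init__(self) -> None:
        self.kwargs: Optional[dict[str, Any]] = None

    def run_json_prompt(self, **kwargs: Any) -> dict[str, Any]:
        # Hypothetical method name; the real service's API is not shown here.
        self.kwargs = kwargs
        return {}  # canned empty payload, enough for asserting on arguments


class TinyParser:
    """Hypothetical parser that forwards its token budget to the LLM service."""

    def __init__(self, llm_service: RecordingLLMService) -> None:
        self._llm = llm_service

    def parse(self, jd_text: str, *, max_completion_tokens: int = 6000) -> dict[str, Any]:
        if not jd_text or not str(jd_text).strip():
            raise ValueError("Job description text must not be empty.")
        return self._llm.run_json_prompt(
            prompt=jd_text,
            max_completion_tokens=max_completion_tokens,
            allow_output_budget_retry=True,
        )


recorder = RecordingLLMService()
TinyParser(recorder).parse("Senior AI Engineer, many requirements")
assert recorder.kwargs is not None
assert recorder.kwargs["max_completion_tokens"] >= 6000
assert recorder.kwargs["allow_output_budget_retry"] is True
```

Asserting `>= 6000` rather than `== 6000` pins a floor: a later increase passes unchanged, while a regression below the floor fails the test.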