fix(jd): bump output token cap to 6000 for dense JDs

LEANDERANTONY · claude · LEANDERANTONY · commit ef493df4149c · 2026-05-27T21:34:42.000+05:30
Routine n8n-style JDs with 40+ hard skills + 10+ must-haves + 10+
nice-to-haves + a verbose benefits block push the LLM's structured
JSON output close to the previous 4000-token ceiling. The auto-retry
safety net caught the long tail, but at the cost of doubling
end-to-end parse latency on those JDs (one truncated call + one
retry).
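
A minimal sketch of the latency trade-off, counting LLM round-trips under a
simple truncation model. It assumes a call truncates whenever the JSON needs
more tokens than its cap and that the auto-retry re-issues the request with a
larger budget. `llm_calls_for_parse`, `TruncatedOutputError`, the 4500-token
JD and the 8000-token retry budget are all hypothetical; none of them come
from jd_llm_parser_service.

```python
class TruncatedOutputError(RuntimeError):
    """Hypothetical: the JSON was still cut off after the retry."""


def llm_calls_for_parse(output_tokens_needed: int, first_cap: int, retry_cap: int) -> int:
    """Count LLM round-trips for one JD parse under a simple truncation model.

    Assumptions (not taken from jd_llm_parser_service): a call truncates
    when the structured JSON needs more tokens than its cap, and the
    auto-retry re-issues the whole request with a larger ``retry_cap``.
    """
    if output_tokens_needed <= first_cap:
        return 1  # fits in a single call
    if output_tokens_needed <= retry_cap:
        return 2  # one truncated call + one retry, so ~2x latency
    # In the real service this is where parsing degrades to the
    # deterministic fallback; here we just signal it.
    raise TruncatedOutputError(output_tokens_needed)


# Hypothetical dense JD needing ~4500 output tokens; retry budget assumed to be 8000.
print(llm_calls_for_parse(4500, first_cap=4000, retry_cap=8000))  # 2
print(llm_calls_for_parse(4500, first_cap=6000, retry_cap=8000))  # 1
```

Under this model, raising the first-call cap moves a dense JD from the
two-call path to the one-call path without touching the retry logic.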

Bumped jd_llm_parser_service.parse's default max_completion_tokens
from 4000 -> 6000 so dense JDs land in a single call.
max_output_tokens is a CEILING, not a reservation: short JDs (the
typical case) cost exactly the same.
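
A minimal sketch of why the higher ceiling is free for short JDs, assuming
output is billed per generated token and generation stops on its own once
the JSON is complete (and ignoring any reasoning tokens that may also count
against the cap). `billed_output_tokens` and the token counts are
hypothetical illustrations, not measurements.

```python
def billed_output_tokens(tokens_needed: int, max_completion_tokens: int) -> int:
    """Tokens actually generated (and billed) for one call.

    Assumed model: the cap only bites when the response would exceed it
    (truncation). Unused headroom under the cap is never generated, so it
    is never billed.
    """
    return min(tokens_needed, max_completion_tokens)


short_jd = 1500  # hypothetical output size for a typical JD
dense_jd = 4800  # hypothetical output size for an n8n-style JD
for cap in (4000, 6000):
    print(cap, billed_output_tokens(short_jd, cap), billed_output_tokens(dense_jd, cap))
# 4000 1500 4000   <- dense JD truncated at the old cap (then retried)
# 6000 1500 4800   <- dense JD fits; short JD unchanged
```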

Input side is unchanged: there's no explicit input cap at this
layer; the only real bound is the model's context window (~128k
tokens for gpt-5.4-mini). Even a 5000-word JD is ~7k input tokens,
nowhere near that limit.
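
The headroom arithmetic above, as a back-of-the-envelope sketch. The
1.4 tokens/word ratio is only what "5000 words ~ 7k tokens" implies and
varies by tokenizer; the helper names are hypothetical. It assumes input and
output share one context window and ignores system-prompt and schema tokens,
so the result is an upper bound.

```python
TOKENS_PER_WORD = 1.4      # rough ratio implied by "5000 words ~ 7k tokens"; tokenizer-dependent
CONTEXT_WINDOW = 128_000   # ~128k for gpt-5.4-mini, as stated in this commit
OUTPUT_CAP = 6_000         # the new max_completion_tokens default


def estimated_input_tokens(word_count: int) -> int:
    """Back-of-the-envelope token estimate for a JD of ``word_count`` words."""
    return round(word_count * TOKENS_PER_WORD)


def context_headroom(word_count: int) -> int:
    """Tokens left after the JD and the full output cap (upper bound).

    Assumes input and output share one context window and ignores the
    system prompt / schema tokens.
    """
    return CONTEXT_WINDOW - estimated_input_tokens(word_count) - OUTPUT_CAP


print(estimated_input_tokens(5000))  # 7000
print(context_headroom(5000))        # 115000
```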

Companion to the next commit's frontend JD unification (paste /
upload / load-from-search ALL route through this parser via the
existing /workspace/job-description/upload endpoint), which makes
dense JDs the routine path rather than an edge case worth burning
a retry on.

Test ``test_jd_parser_requests_generous_budget_and_enables_retry``
updated from ``>= 4000`` to ``>= 6000`` to lock the new floor.

7/7 jd_llm_parser tests pass. Ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/src/services/jd_llm_parser_service.py b/src/services/jd_llm_parser_service.py
@@ -198,10 +198,21 @@ def parse(
         # silently falls back to the lower-fidelity deterministic JD
         # parser. That degraded JD then feeds fit analysis, tailoring,
         # and the cover letter — so the truncation cascades through the
-        # whole workflow. max_output_tokens is a ceiling, not a
-        # reservation: raising it is free for ordinary JDs.
+        # whole workflow.
+        #
+        # Bumped from 4000 -> 6000 (2026-05-27) after Phase 2 of the
+        # JD unification: paste / upload / load-from-search ALL now
+        # route through this parser, so dense JDs (n8n-style with
+        # 40+ skills + 10+ must-haves + 10+ nice-to-haves + a verbose
+        # benefits block) are routine, not edge-case. 6000 absorbs
+        # those in the first call without hitting the retry path;
+        # short JDs still cost the same because this is a ceiling,
+        # not a reservation. Input is uncapped at this layer — the
+        # only real bound is the model's context window (~128k for
+        # gpt-5.4-mini), so a 5000-word JD (~7k input tokens) has
+        # plenty of headroom.
         *,
-        max_completion_tokens: int = 4000,
+        max_completion_tokens: int = 6000,
     ) -> dict[str, Any]:
         if not jd_text or not str(jd_text).strip():
             raise ValueError("Job description text must not be empty.")
diff --git a/tests/test_jd_llm_parser_service.py b/tests/test_jd_llm_parser_service.py
@@ -71,14 +71,21 @@ def test_jd_parser_requests_generous_budget_and_enables_retry():
     back to the lower-fidelity deterministic JD parser. That degraded
     JD then feeds fit analysis, tailoring, and the cover letter — the
     truncation cascades. The parser must request a generous ceiling
-    AND keep the auto-retry safety net."""
+    AND keep the auto-retry safety net.
+
+    Bumped to >=6000 on 2026-05-27 alongside the JD path unification:
+    paste / upload / load-from-search now ALL route through this
+    parser, so dense JDs (n8n-style with 40+ skills + verbose
+    benefits block) are routine. 6000 absorbs those in one call
+    without firing the retry path.
+    """
     recorder = _RecordingOpenAIService()
     service = JobDescriptionLLMParserService(openai_service=recorder)
 
     service.parse("Senior AI Engineer — lots of requirements ...")
 
     assert recorder.kwargs is not None
-    assert recorder.kwargs["max_completion_tokens"] >= 4000
+    assert recorder.kwargs["max_completion_tokens"] >= 6000
     assert recorder.kwargs["allow_output_budget_retry"] is True