Commit d93ddc4

LEANDERANTONY and claude committed
fix(jd): unify paste / upload / load-from-search through LLM parser
The 2026-05-27 jobagent re-verify report caught the JD parser STILL leaking section headers ("REQUIREMENTS / MUST-HAVES") and misclassifying items across sections on Step 03, even though backend commit 2d75bae had already shipped the section-label artifact scrub + benefits keyword filter in src/services/jd_llm_parser_service.py. Root cause was a frontend architecture issue, not a regression in that backend fix:

* JDReview.tsx read its Must-Have / Nice-to-Have data from EITHER analysisState (post-Step-04 LLM pipeline) OR a frontend `buildJobReview` regex over the raw text. It NEVER read from `jobFileState`, the LLM-parsed response from the /workspace/job-description/upload endpoint, which already had the cleaned LLM data sitting unused in state after every file upload.
* Pasted text was an even bigger gap: pasting hit nothing but the frontend regex. No backend call at all until Step 04 fired, which a Step-03-only verifier with no résumé loaded can never trigger.

Two-phase fix, both in this commit because they only deliver value together (Phase 1 adds the display path; Phase 2 fills it for the paste / load-from-search paths that didn't populate jobFileState before).

Phase 1: JDReview reads jobFileState (frontend wiring only, no new backend calls):

* Introduced a unified precedence chain used everywhere derived JD fields are computed:
      analysisState (Step 04, !stale) > jobFileState > review (regex)
* Applies to: heroTitle, heroLocation, heroSource label, hardSkills, softSkills, summaryText, bodySections (Must-Have Themes / Nice-to-Have Signals / etc.), and the new pre-analysis Hard-skills / Years-required metric tiles.
* Removed the now-unused summaryHeadlineFromAnalysis helper.
* Existing Match-Score tile stays placeholder pre-analysis (jobFileState has no fit_analysis; only Step 04 produces that).

Phase 2: auto-parse pasted / loaded text via the existing upload endpoint (debounced + cached):

* New useEffect in WorkspaceShell watches manualJobText. After 1500ms of no changes AND text >=100 chars AND auth signed_in, fires uploadJobDescriptionFile with a synthetic ``pasted.txt`` Blob. The backend extracts text from .txt as a no-op and routes the same build_job_description_from_text_auto path the file-upload UI uses.
* Result lands in jobFileState (same state slot, same shape), so Phase 1's display wiring picks it up automatically.
* Refs (lastParsedTextRef / parseAbortRef / parseDebounceRef) handle: dedup (skip re-parsing identical text), abort (in-flight requests cancel if the user keeps typing), debounce (timer resets on every keystroke).
* Already-uploaded text via the explicit upload UI also gets short-circuited via the ``jobFileState.job_description_text === text`` check, which avoids a second redundant parse of the same content.
* Failures don't surface a toast (regex preview still renders); quota / auth errors come through the request wrapper's own interceptor.

Result: ALL three input paths (paste / upload / load-from-search) hit the LLM parser within ~1.5s of the user settling on text. The deterministic regex remains only as a true fallback for the brief debounce window, auth-loading states, and offline scenarios.

tsc + eslint clean on touched files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
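The precedence chain described in Phase 1 can be illustrated in isolation. The following is a minimal sketch, not the component's actual code: the types `ParsedJd`, `AnalysisState`, `JobFileState`, and `RegexReview` and the helpers `resolveJdSource` and `resolveHardSkills` are hypothetical, simplified stand-ins for the real response shapes.

```typescript
// Hypothetical, simplified shapes; the real response types carry more fields.
type ParsedJd = { title: string; hardSkills: string[] };
type AnalysisState = { jobDescription: ParsedJd };
type JobFileState = { jobDescription: ParsedJd };
type RegexReview = { title: string; hardSkills: string[] };

type JdSource = "analysis" | "jobFile" | "regex" | "none";

// Mirrors the chain: analysisState (fresh) > jobFileState > review (regex).
function resolveJdSource(
  analysisState: AnalysisState | null,
  analysisIsStale: boolean,
  jobFileState: JobFileState | null,
  review: RegexReview | null,
): JdSource {
  if (analysisState && !analysisIsStale) return "analysis";
  if (jobFileState) return "jobFile";
  if (review) return "regex";
  return "none";
}

// Same chain applied to one derived field. Because `??` only falls through
// on null/undefined, an empty LLM skill list still wins over the regex list.
function resolveHardSkills(
  analysisState: AnalysisState | null,
  analysisIsStale: boolean,
  jobFileState: JobFileState | null,
  review: RegexReview | null,
): string[] {
  const llm =
    analysisState && !analysisIsStale
      ? analysisState.jobDescription
      : jobFileState?.jobDescription ?? null;
  return llm?.hardSkills ?? review?.hardSkills ?? [];
}
```

One consequence of using `??` in this sketch is that the regex result is consulted only when no LLM parse exists at all, not when the LLM parse is merely empty.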
1 parent ef493df commit d93ddc4

2 files changed

Lines changed: 188 additions & 22 deletions

frontend/src/components/workspace/JDReview.tsx

Lines changed: 87 additions & 22 deletions
@@ -126,13 +126,6 @@ export type JDReviewProps = {
   onClearLoadedJobDescription: () => void;
 };
 
-function summaryHeadlineFromAnalysis(
-  analysis: WorkspaceAnalysisResponse | null,
-): string | null {
-  if (!analysis) return null;
-  return analysis.jd_summary_view?.summary || null;
-}
-
 export function JDReview({
   analysisState,
   analysisIsStale,
@@ -157,10 +150,41 @@ export function JDReview({
     event.target.value = "";
   }
 
+  // Data-source precedence for every derived field below:
+  //
+  // analysisState (Step 04 ran, !stale) > jobFileState (LLM parse
+  // from upload OR debounced paste-auto-parse) > review (frontend
+  // deterministic regex from buildJobReview)
+  //
+  // Before the 2026-05-27 unification, JDReview only used analysisState
+  // or review — jobFileState's LLM-parsed requirements + jd_summary_view
+  // were FETCHED but never displayed. So even after a clean upload, the
+  // Must-Have / Nice-to-Have panels showed the brittle frontend regex
+  // (which leaked section headers like "REQUIREMENTS / MUST-HAVES" and
+  // misclassified items across sections). Now the LLM parse is the
+  // primary source whenever it's available, with regex only as a true
+  // fallback when no LLM parse exists yet.
+  //
+  // The jobFileState slot is populated by three paths:
+  // 1. Explicit file upload (handleJobDescriptionUpload)
+  // 2. Debounced auto-parse of pasted text (Phase 2 in WorkspaceShell)
+  // 3. Future paths that hit the same /workspace/job-description/upload
+  // endpoint
+  // All three produce a WorkspaceJobDescriptionUploadResponse with the
+  // same { job_description, jd_summary_view } shape, so this component
+  // doesn't need to know which path populated it.
+  const llmJobDescription =
+    analysisState && !analysisIsStale
+      ? analysisState.job_description
+      : jobFileState?.job_description ?? null;
+  const llmSummaryView =
+    analysisState && !analysisIsStale
+      ? analysisState.jd_summary_view
+      : jobFileState?.jd_summary_view ?? null;
+
   const heroTitle =
     activeJob?.title ||
-    jobFileState?.job_description.title ||
-    analysisState?.job_description.title ||
+    llmJobDescription?.title ||
     review?.summaryCards.find((card) => card.label === "Target Role")?.value ||
     "Job description";
 
@@ -171,12 +195,13 @@ export function JDReview({
 
   const heroLocation =
     activeJob?.location ||
+    llmJobDescription?.location ||
     review?.summaryCards.find((card) => card.label === "Location")?.value ||
     "";
 
   const heroSource =
     activeJob?.source ||
-    (jobFileState ? "Uploaded file" : review ? "Pasted text" : "");
+    (jobFileState ? "Parsed JD" : review ? "Pasted text" : "");
 
   // Hero metrics: prefer the parsed analysisState numbers when fresh,
   // fall back to the JobReview computed by `buildJobReview` from the
@@ -197,6 +222,10 @@ export function JDReview({
     tone?: "muted";
   };
   const metrics = ((): HeroMetric[] => {
+    // Match Score tile only populates from a fresh analysisState (it's
+    // derived from fit_analysis which doesn't exist on jobFileState).
+    // jobFileState provides the requirement counts but never a score —
+    // that requires the full Step 04 pipeline.
     if (analysisState && !analysisIsStale) {
       const fit = analysisState.fit_analysis;
       return [
@@ -224,6 +253,37 @@ export function JDReview({
         },
       ];
     }
+    // Pre-analysis path: prefer jobFileState requirement counts (LLM-
+    // parsed, accurate) over the review regex counts when both exist.
+    // Match Score tile still placeholder until analysis runs.
+    if (llmJobDescription) {
+      const yrs = llmJobDescription.requirements.experience_requirement
+        ? llmJobDescription.requirements.experience_requirement
+            .replace(/[^0-9+]/g, "")
+            .slice(0, 4) || "—"
+        : "—";
+      return [
+        {
+          label: "Match score",
+          value: "—",
+          unit: "",
+          hint: analysisState && analysisIsStale
+            ? "Re-run analysis (inputs changed)"
+            : "Run analysis to compute",
+          tone: "muted",
+        },
+        {
+          label: "Hard skills",
+          value: String(llmJobDescription.requirements.hard_skills.length),
+          unit: "",
+        },
+        {
+          label: "Years required",
+          value: yrs,
+          unit: "",
+        },
+      ];
+    }
     if (review) {
       return [
         {
@@ -259,27 +319,32 @@ export function JDReview({
     return [];
   })();
 
+  // Summary headline: jd_summary_view from LLM is preferred; regex
+  // "Role Snapshot" is the last resort.
   const summaryText =
-    (analysisState && !analysisIsStale && summaryHeadlineFromAnalysis(analysisState)) ||
+    llmSummaryView?.summary ||
     review?.summarySections.find((section) => section.title === "Role Snapshot")
       ?.items?.[0] ||
     null;
 
+  // Skill arrays follow the same precedence — LLM-parsed wins.
   const hardSkills =
-    analysisState && !analysisIsStale
-      ? analysisState.job_description.requirements.hard_skills
-      : (review?.hardSkills ?? []);
+    llmJobDescription?.requirements.hard_skills ?? review?.hardSkills ?? [];
   const softSkills =
-    analysisState && !analysisIsStale
-      ? analysisState.job_description.requirements.soft_skills
-      : (review?.softSkills ?? []);
+    llmJobDescription?.requirements.soft_skills ?? review?.softSkills ?? [];
 
+  // Body sections (Must-Have Themes / Nice-to-Have Signals / etc.)
+  // prefer the LLM-built jd_summary_view.sections — that's the source
+  // that gets the section-header scrubbing + benefits-keyword filter
+  // applied in jd_llm_parser_service.py. Falls through to regex only
+  // when no LLM parse exists yet (e.g. text was just pasted and the
+  // debounce hasn't fired yet, or the user is offline).
   const bodySections =
-    analysisState && !analysisIsStale
-      ? analysisState.jd_summary_view.sections
-      : (review?.summarySections.filter(
-          (section) => section.title !== "Role Snapshot",
-        ) ?? []);
+    llmSummaryView?.sections ??
+    review?.summarySections.filter(
+      (section) => section.title !== "Role Snapshot",
+    ) ??
+    [];
 
   const inputBodyVisible = !jobInputCollapsed;
 
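To make the "Years required" tile logic easier to reason about, the digit-stripping expression from the diff can be pulled out into a standalone function. This is only a sketch: `yearsTileValue` and `PLACEHOLDER` are hypothetical names, and the input is assumed to be a free-text string or null.

```typescript
// Placeholder shown when no usable number is found (U+2014, the same
// character the diff uses for empty tiles).
const PLACEHOLDER = "\u2014";

// Keeps only digits and "+", then caps the result at four characters,
// following the expression used for the pre-analysis tile in the diff.
function yearsTileValue(
  experienceRequirement: string | null | undefined,
): string {
  if (!experienceRequirement) return PLACEHOLDER;
  const digits = experienceRequirement.replace(/[^0-9+]/g, "").slice(0, 4);
  return digits || PLACEHOLDER;
}

console.log(yearsTileValue("5+ years of experience")); // "5+"
console.log(yearsTileValue("3-5 years")); // "35" (the hyphen is stripped)
console.log(yearsTileValue("Several years") === PLACEHOLDER); // true
```

Under this expression a range such as "3-5 years" collapses to "35", because the hyphen is not in the allowed character set. Whether that matters depends on how the backend phrases `experience_requirement`, which the diff does not show.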

frontend/src/components/workspace/WorkspaceShell.tsx

Lines changed: 101 additions & 0 deletions
@@ -516,6 +516,107 @@ export function WorkspaceShell() {
     }
   }, [activeJob, jobFileState]);
 
+  // ── JD auto-parse via LLM ─────────────────────────────────────────
+  // Debounced effect that pipes pasted / loaded-from-search JD text
+  // through the same /workspace/job-description/upload endpoint a
+  // file upload uses. The endpoint returns the LLM-parsed
+  // jd_summary_view + requirements; that response lands in
+  // `jobFileState`, and JDReview's precedence chain
+  // (analysisState > jobFileState > review) automatically picks it
+  // up to render the Must-Have Themes / Nice-to-Have Signals panels
+  // from LLM output instead of the brittle frontend regex.
+  //
+  // Why we route paste through the SAME endpoint as upload (instead
+  // of a new /jd/parse-text route): zero new backend surface, zero
+  // new tests, and a single LLM-quality contract for ALL three input
+  // paths (paste / upload / load-from-search). The endpoint takes
+  // any file via UploadedFilePayloadModel — sending the pasted text
+  // as a synthetic ``pasted.txt`` blob skips the file-extraction
+  // step internally (it's already text) and falls straight through
+  // to build_job_description_from_text_auto.
+  //
+  // Guards (all four required before firing):
+  // 1. authStatus === "signed_in" — the endpoint requires auth.
+  // 2. text length >= 100 chars — under that, regex is fine and
+  // we don't want to burn token quota on placeholder text.
+  // 3. text hash differs from the last successfully-parsed text
+  // — avoids re-parsing the same content on every render or
+  // after the user pastes back the same JD they had before.
+  // 4. text differs from jobFileState.job_description_text — if a
+  // file upload (or earlier paste-parse) already set
+  // jobFileState from the same text, skip.
+  //
+  // Debounce: 1500 ms after the LAST keystroke. Cancels the prior
+  // timer + aborts the in-flight request, so a fast typist who pauses
+  // briefly + resumes never fires multiple parses.
+  const lastParsedTextRef = useRef<string>("");
+  const parseDebounceRef = useRef<number | null>(null);
+  const parseAbortRef = useRef<AbortController | null>(null);
+  useEffect(() => {
+    if (parseDebounceRef.current !== null) {
+      window.clearTimeout(parseDebounceRef.current);
+      parseDebounceRef.current = null;
+    }
+    if (parseAbortRef.current) {
+      parseAbortRef.current.abort();
+      parseAbortRef.current = null;
+    }
+    const text = manualJobText.trim();
+    if (!text || text.length < 100) return;
+    if (authStatus !== "signed_in") return;
+    if (text === lastParsedTextRef.current) return;
+    if (jobFileState?.job_description_text?.trim() === text) {
+      // Already parsed by an upload or earlier paste — sync the cache
+      // so future renders of the same text don't re-fire.
+      lastParsedTextRef.current = text;
+      return;
+    }
+
+    parseDebounceRef.current = window.setTimeout(async () => {
+      parseDebounceRef.current = null;
+      const abort = new AbortController();
+      parseAbortRef.current = abort;
+      setJobFileUploading(true);
+      try {
+        // Use a synthetic .txt blob to reuse the existing upload path.
+        // The backend extracts text from .txt files as a no-op and
+        // routes straight into build_job_description_from_text_auto
+        // (the same LLM path the file-upload UI uses).
+        const blob = new Blob([text], { type: "text/plain" });
+        const file = new File([blob], "pasted.txt", { type: "text/plain" });
+        const response = await uploadJobDescriptionFile(file);
+        if (abort.signal.aborted) return;
+        lastParsedTextRef.current = text;
+        setJobFileState(response);
+      } catch (error) {
+        if (abort.signal.aborted) return;
+        // Don't surface a toast for transient parse failures — the
+        // regex preview in JDReview is still rendering the user's
+        // text. A quota / auth error will surface from the request
+        // wrapper's own interceptor.
+        void error;
+      } finally {
+        if (parseAbortRef.current === abort) {
+          parseAbortRef.current = null;
+        }
+        setJobFileUploading(false);
+      }
+    }, 1500);
+
+    return () => {
+      if (parseDebounceRef.current !== null) {
+        window.clearTimeout(parseDebounceRef.current);
+        parseDebounceRef.current = null;
+      }
+    };
+    // Intentionally only re-run on manualJobText / authStatus changes.
+    // jobFileState IS read inside the effect but we don't want updates
+    // to it to re-trigger the effect (the effect SETS it, which would
+    // create a loop). The "already-parsed" check above handles the
+    // stale-jobFileState case safely on the next text change.
+    // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, [manualJobText, authStatus]);
+
   const {
     savedJobs,
     savedJobsLoading,
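The four guards in the effect above run in a fixed order before the debounce timer is scheduled. As a rough illustration, they can be expressed as a pure function that is testable without React or timers. This is a sketch under assumptions: `decideAutoParse`, `ParseDecision`, `MIN_CHARS`, and the members of the `AuthStatus` union other than `"signed_in"` are hypothetical and do not appear in the commit.

```typescript
// Hypothetical union; only "signed_in" is confirmed by the diff.
type AuthStatus = "signed_in" | "signed_out" | "loading";

// "skip": do nothing. "sync-cache": record the text as already parsed.
// "schedule": start the debounce timer for an LLM parse.
type ParseDecision = "skip" | "sync-cache" | "schedule";

const MIN_CHARS = 100; // threshold taken from the diff

function decideAutoParse(
  rawText: string,
  authStatus: AuthStatus,
  lastParsedText: string,
  uploadedText: string | null | undefined,
): ParseDecision {
  const text = rawText.trim();
  if (text.length < MIN_CHARS) return "skip"; // too short, regex is enough
  if (authStatus !== "signed_in") return "skip"; // endpoint requires auth
  if (text === lastParsedText) return "skip"; // identical to last parse
  if (uploadedText?.trim() === text) return "sync-cache"; // upload already parsed it
  return "schedule";
}

const sample = "x".repeat(120);
console.log(decideAutoParse(sample, "signed_in", "", null)); // "schedule"
console.log(decideAutoParse(sample, "signed_in", sample, null)); // "skip"
```

Separating the decision from the timer keeps the side effects (setting the timeout, updating the ref) in the effect body while the branching logic stays unit-testable.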
