Parsers are responsible for extracting structured data from job application pages. There are two types:
- Hardcoded parsers — for Greenhouse, Workday, Lever
- LLM fallback parser — for any other site
Each parser returns the same shape of data:
```js
{
  company: "Stripe",
  job_title: "Backend Engineer",
  job_description: "We are looking for...",
  portal: "greenhouse",
  resume_filename: "backend_v3.pdf", // or null; filename for display
  resume: {                          // full file for upload, or null
    filename: "backend_v3.pdf",
    mimetype: "application/pdf",
    size: 204800,
    base64: "JVBERi0x..."
  },
  url: "https://boards.greenhouse.io/stripe/jobs/123"
}
```

URL pattern: `*.greenhouse.io/*` or `boards.greenhouse.io/*`
Greenhouse pages share largely the same HTML structure across the companies that use it, so one set of selectors (with a fallback for each field) covers them.
```js
// parsers/greenhouse.js
export function parse() {
  // Prefer the on-page company element; fall back to the page title.
  const company = document.querySelector('.company-name')?.innerText?.trim()
    ?? extractFromTitle('greenhouse');
  // Try the primary selector first, then the alternate layout; null if neither matches.
  const job_title = document.querySelector('.app-title h1')?.innerText?.trim()
    ?? document.querySelector('h1.job-title')?.innerText?.trim()
    ?? null;
  const job_description = document.querySelector('#content')?.innerText?.trim()
    ?? document.querySelector('.job-description')?.innerText?.trim()
    ?? null;
  const resume_filename = getResumeFilename();

  return {
    company,
    job_title,
    job_description,
    portal: 'greenhouse',
    resume_filename,
    url: window.location.href
  };
}

function getResumeFilename() {
  // Name of the first file selected in the upload input, or null if none.
  const fileInput = document.querySelector('input[type="file"]');
  return fileInput?.files?.[0]?.name ?? null;
}

function extractFromTitle(portal) {
  // Fallback: parse company from page title
  // "Software Engineer at Stripe - Greenhouse" → "Stripe"
  const title = document.title;
  // \b stops words ending in "at" (e.g. "Great ") from matching as "at ".
  const match = title.match(/\bat (.+?) [-–|]/);
  return match?.[1] ?? 'Unknown Company';
}
```

Key selectors for Greenhouse:
| Field | Selector |
|---|---|
| Company | .company-name or page title |
| Job Title | .app-title h1 |
| Job Description | #content or .job-description |
| Resume Input | input[type="file"] |
| Submit Button | input[type="submit"], button[type="submit"] |
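
The page-title fallback listed for the Company field can be tried outside the browser. The sketch below is a pure-function variant that takes the title as an argument; the name `companyFromTitle` and the sample titles are illustrative assumptions, not part of the extension.

```javascript
// Hypothetical pure variant of the title fallback: takes the title string
// instead of reading document.title, so it can run in Node or in tests.
function companyFromTitle(title) {
  // Capture the text between a standalone "at" and the next separator
  // (hyphen, en dash, or pipe) that is preceded by a space.
  const match = title.match(/\bat (.+?) [-–|]/);
  return match?.[1] ?? 'Unknown Company';
}

console.log(companyFromTitle('Software Engineer at Stripe - Greenhouse')); // "Stripe"
console.log(companyFromTitle('Great Engineer at Acme | Careers'));         // "Acme"
console.log(companyFromTitle('Careers'));                                   // "Unknown Company"
```

Titles without a trailing separator after the company name fall through to `'Unknown Company'`, so the on-page selector should stay the primary source.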
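URL pattern: `*.myworkdayjobs.com/*`
Workday is more complex: it's a React SPA, so content renders after the initial page load. Its `data-automation-id` selectors are less stable than Greenhouse's class names but still consistent across companies.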
```js
// parsers/workday.js
export function parse() {
  // Workday exposes stable-ish data-automation-id attributes; null if absent.
  const job_title = document.querySelector('[data-automation-id="jobPostingHeader"]')
    ?.innerText?.trim() ?? null;
  const company = extractCompanyFromWorkdayUrl();
  const job_description = document.querySelector('[data-automation-id="jobPostingDescription"]')
    ?.innerText?.trim() ?? null;
  const resume_filename = getResumeFilename();

  return {
    company,
    job_title,
    job_description,
    portal: 'workday',
    resume_filename,
    url: window.location.href
  };
}

function extractCompanyFromWorkdayUrl() {
  // URL: amazon.myworkdayjobs.com → "Amazon"
  const hostname = window.location.hostname;
  const company = hostname.split('.')[0];
  return company.charAt(0).toUpperCase() + company.slice(1);
}

function getResumeFilename() {
  // Workday uses a custom file upload, so look for the filename display element
  const filenameEl = document.querySelector('[data-automation-id="file-upload-filename"]');
  return filenameEl?.innerText?.trim() ?? null;
}
```

Key selectors for Workday:
| Field | Selector |
|---|---|
| Job Title | [data-automation-id="jobPostingHeader"] |
| Company | Extracted from subdomain |
| Job Description | [data-automation-id="jobPostingDescription"] |
| Resume Filename | [data-automation-id="file-upload-filename"] |
| Submit Button | [data-automation-id="bottom-navigation-next-button"] |
Note: Workday uses multi-step forms. Hook into the final submit, not intermediate "Next" buttons.
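
One way to honor the note above is to inspect the clicked button's label before treating the click as a submission. This is a minimal sketch under an unverified assumption: that the final step's button reads "Submit" while earlier steps read something like "Next" or "Save and Continue". The function names are hypothetical, and the label check should be confirmed against a live Workday form before relying on it.

```javascript
// Assumption (not verified against Workday): only the last step's button
// label starts with "Submit"; earlier steps use labels such as "Next".
function isFinalSubmitLabel(label) {
  return /^\s*submit\b/i.test(label ?? '');
}

// Wraps a save callback so it only fires for the final step.
// Returns a handler that takes the clicked button's visible text.
function makeSubmitHandler(onFinalSubmit) {
  return function handleClick(buttonLabel) {
    if (!isFinalSubmitLabel(buttonLabel)) return false; // intermediate step
    onFinalSubmit();
    return true;
  };
}

// Usage sketch: in the content script this would be wired to a click
// listener on [data-automation-id="bottom-navigation-next-button"].
let saved = 0;
const handle = makeSubmitHandler(() => { saved += 1; });
handle('Next');
handle('Save and Continue');
handle('Submit');
console.log(saved); // 1
```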
URL pattern: `jobs.lever.co/*`
Lever is simpler than Workday: its HTML is clean and easy to parse.
```js
// parsers/lever.js
export function parse() {
  const job_title = document.querySelector('.posting-headline h2')?.innerText?.trim() ?? null;
  // Prefer the on-page element; fall back to the company slug in the URL path.
  const company = document.querySelector('.main-header-text .posting-categories .sort-by-team')
    ?.innerText?.trim()
    ?? extractFromLeverUrl();
  const job_description = document.querySelector('.section-wrapper')?.innerText?.trim() ?? null;
  const resume_filename = getResumeFilename();

  return {
    company,
    job_title,
    job_description,
    portal: 'lever',
    resume_filename,
    url: window.location.href
  };
}

function extractFromLeverUrl() {
  // URL: jobs.lever.co/stripe/abc123 → "Stripe"
  const company = window.location.pathname.split('/')[1];
  // Guard against a path with no company segment (e.g. "/").
  if (!company) return 'Unknown Company';
  return company.charAt(0).toUpperCase() + company.slice(1);
}

function getResumeFilename() {
  // Name of the first file selected in the upload input, or null if none.
  const fileInput = document.querySelector('input[type="file"]');
  return fileInput?.files?.[0]?.name ?? null;
}
```

Key selectors for Lever:
| Field | Selector |
|---|---|
| Job Title | .posting-headline h2 |
| Company | From URL path |
| Job Description | .section-wrapper |
| Resume Input | input[type="file"] |
| Submit Button | button[type="submit"] |
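
The "From URL path" rule for the Company field can be sketched as a pure function that takes the pathname as an argument. The helper name `companyFromLeverPath` is hypothetical, and treating hyphens in the slug as word separators is an assumption about how company slugs are written, not something Lever guarantees.

```javascript
// Hypothetical helper: derive a display name from a Lever posting path.
// Takes the pathname as an argument so it can run outside the browser.
function companyFromLeverPath(pathname) {
  // "/stripe/abc123" splits into ["", "stripe", "abc123"]; drop empty parts.
  const slug = pathname.split('/').filter(Boolean)[0];
  if (!slug) return null; // e.g. "/" has no company segment

  // Assumption: hyphens separate words, so "acme-corp" becomes "Acme Corp".
  return slug
    .split('-')
    .filter(Boolean)
    .map(word => word.charAt(0).toUpperCase() + word.slice(1))
    .join(' ');
}

console.log(companyFromLeverPath('/stripe/abc123')); // "Stripe"
console.log(companyFromLeverPath('/acme-corp/xyz')); // "Acme Corp"
console.log(companyFromLeverPath('/'));              // null
```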
The LLM fallback parser is used when `detectPortal()` returns `'unknown'`.
```
Extension detects unknown site
        ↓
Extracts sanitized DOM (removes scripts, styles, nav, footer)
        ↓
Sends DOM text to backend /api/parse-llm
        ↓
Backend sends to Claude API with structured prompt
        ↓
Claude returns JSON with extracted fields
        ↓
Extension uses the JSON to populate save popup
        ↓
User confirms or edits before saving
```
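
The "Claude returns JSON" step is the one most likely to produce unexpected input, since a model can add a preamble or return a field with the wrong type. The sketch below shows one defensive way to handle that; `parseLLMResponse` is a hypothetical helper, and defaulting an unrecognized confidence to `'low'` is an assumption about the desired behavior.

```javascript
const CONFIDENCE_LEVELS = ['high', 'medium', 'low'];

// Hypothetical helper: turn the model's raw text into a validated object.
// Returns null when the text cannot be used, so the caller can show an
// empty popup instead of throwing.
function parseLLMResponse(text) {
  // Take the span from the first "{" to the last "}" so that a preamble
  // or a surrounding Markdown fence does not break JSON.parse.
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) return null;

  let data;
  try {
    data = JSON.parse(text.slice(start, end + 1));
  } catch {
    return null;
  }

  // Anything that is not a non-blank string becomes null.
  const field = value =>
    typeof value === 'string' && value.trim() !== '' ? value.trim() : null;

  return {
    company: field(data.company),
    job_title: field(data.job_title),
    job_description: field(data.job_description),
    // Treat a missing or unrecognized confidence as "low".
    confidence: CONFIDENCE_LEVELS.includes(data.confidence) ? data.confidence : 'low'
  };
}

const parsed = parseLLMResponse(
  'Here is the result: {"company": "Stripe", "job_title": "Backend Engineer", "job_description": null, "confidence": "high"}'
);
console.log(parsed.company);    // "Stripe"
console.log(parsed.confidence); // "high"
```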
```js
export function extractCleanDOM() {
  const clone = document.body.cloneNode(true);

  // Remove noise
  ['script', 'style', 'nav', 'footer', 'header', 'iframe'].forEach(tag => {
    clone.querySelectorAll(tag).forEach(el => el.remove());
  });

  // Limit to 8000 characters to control token usage.
  // The clone is detached from the page, so innerText is not layout-aware here.
  return (clone.innerText ?? clone.textContent ?? '').slice(0, 8000);
}
```

```js
// backend/routes/parse-llm.js
import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment by default.
const anthropic = new Anthropic();

// parseWithLLM is an illustrative wrapper name; domText is the sanitized
// page text sent by the extension.
export async function parseWithLLM(domText) {
  const prompt = `
You are analyzing the text content of a job application page.
Extract the following fields and return ONLY valid JSON, nothing else.
Text content:
${domText}
Return this exact JSON structure:
{
"company": "company name or null",
"job_title": "job title or null",
"job_description": "full job description text or null",
"confidence": "high | medium | low"
}
Rules:
- If you cannot find a field, use null
- job_description should be the full requirements/responsibilities text
- Do not include application form fields in job_description
- company should be the hiring company, not the ATS platform name
`;

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1000,
    messages: [{ role: 'user', content: prompt }]
  });

  // JSON.parse throws if the model returns anything other than bare JSON.
  return JSON.parse(response.content[0].text);
}
```

When the LLM parser is used, the popup shows the extracted data in editable fields:
```
┌─────────────────────────────────┐
│ 🎯 JobTracker                   │
│ ⚡ Auto-detected (verify below) │
├─────────────────────────────────┤
│ Company: [Stripe          ]     │
│ Role:    [Backend Engineer]     │
│                                 │
│ Resume:  backend_v3.pdf ▼       │
│                                 │
│       [Save Application]        │
└─────────────────────────────────┘
```
The user can correct any field before saving, which keeps the saved record reliable even when the LLM makes a mistake.
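
The confirm-or-edit step can be sketched as a merge in which the user's edits take precedence over the extracted values. The helper name `buildApplicationRecord` and its argument shapes are assumptions for illustration; the returned keys mirror what the hardcoded parsers return, with `portal` set to `'unknown'`.

```javascript
// Hypothetical helper: combine LLM output with the user's popup edits
// into the same shape the hardcoded parsers return.
function buildApplicationRecord(extracted, edits, url) {
  const pick = key => {
    const edited = edits[key];
    // An edit counts only if the user typed something non-blank;
    // otherwise the extracted value (or null) is kept.
    if (typeof edited === 'string' && edited.trim() !== '') return edited.trim();
    return extracted[key] ?? null;
  };

  return {
    company: pick('company'),
    job_title: pick('job_title'),
    job_description: pick('job_description'),
    portal: 'unknown',
    resume_filename: edits.resume_filename ?? null,
    url
  };
}

const record = buildApplicationRecord(
  { company: 'Stripe Inc', job_title: 'Backend Engineer', job_description: 'We are looking for...' },
  { company: 'Stripe', resume_filename: 'backend_v3.pdf' },
  'https://example.com/careers/123'
);
console.log(record.company);   // "Stripe"
console.log(record.job_title); // "Backend Engineer"
```

In this sketch a field the user blanks out falls back to the extracted value. Whether a blank should instead clear the field is a product decision the source does not settle.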
To add support for a new portal (e.g. iCIMS):
- Create `extension/content/parsers/icims.js`
- Export a `parse()` function returning the standard data shape
- Add URL detection in `detect-portal.js`: `if (url.includes('icims.com')) return 'icims';`
- Import in `content.js` parser map
That's it. The rest of the system works automatically.
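
The detection and parser-map steps above can be sketched as follows. This is not the project's actual `detect-portal.js` or `content.js`: the function bodies, the `pickParser` helper, and the stub parsers are assumptions. It also matches on the hostname, not the full URL string, so a query parameter containing `icims.com` does not trigger a false match.

```javascript
// Hypothetical sketch of detect-portal.js: map a page URL to a portal key.
function detectPortal(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return 'unknown'; // not a parseable absolute URL
  }
  if (hostname.endsWith('.greenhouse.io')) return 'greenhouse';
  if (hostname.endsWith('.myworkdayjobs.com')) return 'workday';
  if (hostname === 'jobs.lever.co') return 'lever';
  if (hostname.includes('icims.com')) return 'icims'; // the new portal
  return 'unknown';
}

// Hypothetical parser map for content.js. The stubs stand in for the
// parse() functions that each parsers/*.js module would export.
const parsers = {
  greenhouse: () => ({ portal: 'greenhouse' }),
  workday: () => ({ portal: 'workday' }),
  lever: () => ({ portal: 'lever' }),
  icims: () => ({ portal: 'icims' })
};

function pickParser(url) {
  // Returns undefined for unknown portals, where the LLM fallback applies.
  return parsers[detectPortal(url)];
}

console.log(detectPortal('https://boards.greenhouse.io/stripe/jobs/123')); // "greenhouse"
console.log(detectPortal('https://careers-acme.icims.com/jobs/1'));        // "icims"
console.log(detectPortal('https://example.com/careers'));                  // "unknown"
```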