Overview
The resume and job listing systems have several consistency issues that prevent proper linking and tracking of tailored resumes. This issue tracks 8 critical inconsistencies that need to be resolved.
Current State
After implementing the URL-to-tailored-resume pipeline (commit 7422648), we now:
- ✅ Fetch job listings from URLs
- ✅ Build resumes from experience log
- ✅ Tailor resumes to jobs
- ✅ Auto-index tailored resumes
- ❌ But don't properly link resumes to jobs
- ❌ And have inconsistent data structures
Issues Identified
1. Resume-Job Linking (CRITICAL)
Problem: Resume metadata has a job_listing_id field, but tailor_from_url.py doesn't set it
- Resume index has: id, name, created_at, updated_at, job_listing_id, is_master, description
- When tailoring, job_listing_id is always null
- Web UI shows "Linked to job" badge but link is always null
Impact: Can't track which resumes were tailored for which jobs
Fix: Extract job listing ID from fetched job and pass to resume creation
2. Timestamp Format Inconsistency
Problem: Mixed timestamp formats across indices
- Resume index: ISO 8601 without timezone (e.g., 2025-10-26T12:36:01.645244)
- Job listing index: Mixed formats:
  - Some with Z suffix (UTC): 2025-10-26T15:24:10.294414Z
  - Some without: 2025-10-11T20:19:36.840810
Impact: Inconsistent data makes parsing and comparison difficult
Fix: Standardize to ISO 8601 with Z suffix across both systems
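A minimal sketch of the normalization, not the project's actual code. The helper name normalize_timestamp is hypothetical, and it assumes that existing timestamps without a timezone were written in UTC; if they were written in local time, they need to be converted rather than just relabeled.

```python
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    """Return an ISO 8601 timestamp in UTC with a trailing Z."""
    text = value.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so replace it with an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # ASSUMPTION: naive timestamps in the indices are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


print(normalize_timestamp("2025-10-11T20:19:36.840810"))
print(normalize_timestamp("2025-10-26T15:24:10.294414Z"))
```

Running the same helper over both indices is idempotent, so it can be applied as a one-off migration and again at write time.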
3. Missing Fields in Job Listing Index
Problem: Job listing index entries are inconsistent
- Some have: id, title, company, location, file, created_at, description
- Some have: id, title, company, location, created_at, updated_at (no file/description)
- Missing: updated_at (inconsistent), description (inconsistent), file (inconsistent)
Impact: Can't reliably query job listing metadata
Fix: Ensure all job listings have consistent metadata fields
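One way to enforce this is to pass every index entry through a single normalizer. The sketch below is illustrative only: REQUIRED_FIELDS is an assumed schema (the union of the fields listed in this issue, plus url), and the function names are hypothetical.

```python
# ASSUMPTION: the union of fields seen in the index, plus url; adjust to the real schema.
REQUIRED_FIELDS = (
    "id", "title", "company", "location", "url",
    "file", "description", "created_at", "updated_at",
)


def missing_fields(entry: dict) -> list:
    """List the required fields that an index entry does not define."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


def normalize_index_entry(entry: dict) -> dict:
    """Return a copy of the entry that has every required key.

    Unknown values become None instead of being dropped, so readers of the
    index can rely on the key being present.
    """
    normalized = {field: entry.get(field) for field in REQUIRED_FIELDS}
    if normalized["updated_at"] is None:
        # Fall back to the creation time when no update was ever recorded.
        normalized["updated_at"] = normalized["created_at"]
    return normalized


old_entry = {
    "id": "credibly-se-iii-20251011",
    "title": "SE III",  # placeholder values for illustration
    "company": "Credibly",
    "location": "Remote",
    "created_at": "2025-10-11T20:19:36.840810",
}
print(missing_fields(old_entry))
print(normalize_index_entry(old_entry))
```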
4. Job Listing Model vs fetch_job_listing.py Mismatch
Problem: Two different storage patterns for the same entity
- JobListing.create() stores full job data in {id}.json with fields: id, title, company, description, url, location, salary_range, keywords, created_at, updated_at
- fetch_job_listing() saves markdown files and only updates index with: id, title, company, location, file, created_at, description
- tailor_from_url.py uses fetch_job_listing() instead of JobListing model
Impact: Job data is stored in two different ways; can't use JobListing model consistently
Fix: Use JobListing model in tailor_from_url.py instead of fetch_job_listing()
5. Missing URL Storage in Job Listings
Problem: Job listing index doesn't include URL
- Resume index has job_listing_id to link back to job
- Job listing index missing url field (only in full JSON, not in index)
- Can't reconstruct the original job URL from index alone
Impact: Can't trace back to original job posting
Fix: Add url field to job listing index entries
6. Inconsistent ID Generation
Problem: Mixed ID formats across job listings
- Resume IDs: Always UUID (e.g., 136c188e-659d-49cf-ba0f-983c279e80e7)
- Job listing IDs: Mixed:
  - Manual IDs: credibly-se-iii-20251011
  - UUIDs: 644870ea-db70-49b3-9b1f-a1f4887c3b70
Impact: Inconsistent ID format makes it harder to process programmatically
Fix: Always use UUID for consistency
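Existing manual IDs would need to be detected and remapped, since resumes may already reference them through job_listing_id. A rough sketch using the standard uuid module; the helper names are hypothetical, and how the mapping is applied to the stored files is left open.

```python
import uuid


def is_uuid(value: str) -> bool:
    """True if value is a canonical, hyphenated UUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def build_id_migration(job_ids: list) -> dict:
    """Map every non-UUID job listing ID to a freshly generated UUID.

    The mapping can then be used to rewrite both the job listing index and
    any resume whose job_listing_id still points at an old manual ID.
    """
    return {old: str(uuid.uuid4()) for old in job_ids if not is_uuid(old)}


ids = ["credibly-se-iii-20251011", "644870ea-db70-49b3-9b1f-a1f4887c3b70"]
print(build_id_migration(ids))
```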
7. Missing Bidirectional Linking
Problem: Linking is one-directional only
- Resume → Job: Resume has job_listing_id ✓
- Job → Resumes: Job listing has no field to track which resumes were tailored from it ✗
- Can't query "which resumes were tailored for this job?"
Impact: Can't find all tailored resumes for a given job
Fix: Add tailored_resume_ids array to job listing metadata
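The two sides of the link could be updated together in one helper so they cannot drift apart. This is a sketch on plain dicts, not the real Resume or JobListing models; link_resume_to_job is a hypothetical name, and persisting both records afterwards is left to the caller.

```python
def link_resume_to_job(resume: dict, job: dict) -> None:
    """Record the resume-to-job link on both records, in place."""
    # Resume -> Job
    resume["job_listing_id"] = job["id"]
    # Job -> Resumes; setdefault also covers older entries without the field.
    linked = job.setdefault("tailored_resume_ids", [])
    if resume["id"] not in linked:
        # Guard against duplicates when the same resume is re-tailored.
        linked.append(resume["id"])


resume = {"id": "136c188e-659d-49cf-ba0f-983c279e80e7", "job_listing_id": None}
job = {"id": "644870ea-db70-49b3-9b1f-a1f4887c3b70"}
link_resume_to_job(resume, job)
print(resume["job_listing_id"], job["tailored_resume_ids"])
```

With the reverse list in place, "which resumes were tailored for this job?" becomes a lookup on the job entry instead of a scan of the resume index.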
8. Keyword Extraction Not Stored
Problem: Keywords are extracted but not persisted
- JobListing model supports keywords field
- fetch_job_listing() doesn't extract or store keywords
- tailor_from_url.py extracts keywords but doesn't save them to job listing
- Keywords are lost after tailoring; can't reuse them
Impact: Can't reuse keywords for future tailoring or analysis
Fix: Extract keywords in tailor_from_url.py and update job listing with them
Recommended Fix Strategy
Phase 1: Use JobListing Model Consistently
- Update tailor_from_url.py to use JobListing model instead of fetch_job_listing()
- Extract job title, company, location from fetched content
- Create job listing entry with full metadata (including URL and keywords)
- Return job listing ID for linking
Phase 2: Implement Bidirectional Linking
- Update Resume model to support job_listing_id (already exists)
- Update JobListing model to support tailored_resume_ids array
- When creating tailored resume, update both:
  - Resume: set job_listing_id
  - Job listing: append resume ID to tailored_resume_ids
Phase 3: Standardize Data Formats
- Standardize all timestamps to ISO 8601 with Z suffix
- Ensure all job listing index entries have consistent fields
- Always use UUID for job listing IDs
- Add URL field to job listing index
Phase 4: Extract and Store Keywords
- Extract keywords from job description in tailor_from_url.py
- Update job listing with extracted keywords
- Make keywords queryable for future use
Testing
After fixes, verify:
- Tailored resume has correct job_listing_id
- Job listing has tailored resume ID in tailored_resume_ids
- All timestamps use a consistent format
- All job listing index entries have required fields
- Keywords are stored and retrievable
- Web UI shows proper linking between resumes and jobs
Related Issues
Acceptance Criteria