Fix Resume-Job Linking & Data Consistency Issues #57

@BPMSoftwareSolutions

Overview

The resume and job listing systems have several consistency issues that prevent proper linking and tracking of tailored resumes. This issue tracks 8 critical inconsistencies that need to be resolved.

Current State

After implementing the URL-to-tailored-resume pipeline (commit 7422648), we now:

  • ✅ Fetch job listings from URLs
  • ✅ Build resumes from experience log
  • ✅ Tailor resumes to jobs
  • ✅ Auto-index tailored resumes
  • ✗ Do not properly link resumes to jobs
  • ✗ Store data in inconsistent structures

Issues Identified

1. Resume-Job Linking (CRITICAL)

Problem: Resume metadata has job_listing_id field, but tailor_from_url.py doesn't set it

  • Resume index has: id, name, created_at, updated_at, job_listing_id, is_master, description
  • When tailoring, job_listing_id is always null
  • Web UI shows "Linked to job" badge but link is always null

Impact: Can't track which resumes were tailored for which jobs

Fix: Extract job listing ID from fetched job and pass to resume creation
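
A minimal sketch of this fix, assuming the fetch step returns a dict with an `id` key and that resume index entries are plain dicts with the fields listed above. `build_resume_entry` is a hypothetical helper for illustration, not a function from the repository.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_resume_entry(name: str, job_listing_id: Optional[str] = None) -> dict:
    """Hypothetical helper: build a resume index entry with the known fields."""
    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "id": str(uuid.uuid4()),
        "name": name,
        "created_at": now,
        "updated_at": now,
        "job_listing_id": job_listing_id,  # currently always left as null
        "is_master": False,
        "description": "",
    }


# Assumed shape of the fetched job; the real return value may differ.
job = {"id": "644870ea-db70-49b3-9b1f-a1f4887c3b70", "title": "Example role"}
entry = build_resume_entry("Tailored resume", job_listing_id=job["id"])
```

The key point is only that the job's ID travels from the fetch step into resume creation; how the real `tailor_from_url.py` threads it through is left open.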


2. Timestamp Format Inconsistency

Problem: Mixed timestamp formats across indices

  • Resume index: ISO 8601 without timezone (e.g., 2025-10-26T12:36:01.645244)
  • Job listing index: Mixed formats:
    • Some with Z suffix (UTC): 2025-10-26T15:24:10.294414Z
    • Some without: 2025-10-11T20:19:36.840810

Impact: Inconsistent data makes parsing and comparison difficult

Fix: Standardize to ISO 8601 with Z suffix across both systems
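
One possible normalizer is sketched below. It assumes that timestamps stored without a timezone were recorded in UTC; if they were actually local time, the offset would need to be applied before migrating. `to_utc_z` is a hypothetical name.

```python
from datetime import datetime, timezone


def to_utc_z(timestamp: str) -> str:
    """Normalize an ISO 8601 string to UTC with a trailing Z.

    Assumption: values without a timezone are already in UTC.
    """
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so convert it to an explicit offset first.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


print(to_utc_z("2025-10-26T12:36:01.645244"))   # 2025-10-26T12:36:01.645244Z
print(to_utc_z("2025-10-26T15:24:10.294414Z"))  # 2025-10-26T15:24:10.294414Z
```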


3. Missing Fields in Job Listing Index

Problem: Job listing index entries are inconsistent

  • Some have: id, title, company, location, file, created_at, description
  • Some have: id, title, company, location, created_at, updated_at (no file/description)
  • Fields present in some entries but not others: updated_at, description, file

Impact: Can't reliably query job listing metadata

Fix: Ensure all job listings have consistent metadata fields
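
As a rough illustration, existing entries could be backfilled so that every one carries the same keys. The field list below is the union of the two shapes described above; whether `file` and `description` should both remain required is an open decision, and `normalize_entry` is a hypothetical helper.

```python
REQUIRED_FIELDS = (
    "id", "title", "company", "location",
    "file", "description", "created_at", "updated_at",
)


def missing_fields(entry: dict) -> list:
    """Return the required fields that an index entry lacks."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


def normalize_entry(entry: dict) -> dict:
    """Return a copy with every required field present (None when unknown)."""
    normalized = {field: None for field in REQUIRED_FIELDS}
    normalized.update(entry)  # keep existing values and any extra keys
    if normalized["updated_at"] is None:
        # Assumption: an entry never updated can reuse its creation time.
        normalized["updated_at"] = normalized["created_at"]
    return normalized
```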


4. Job Listing Model vs fetch_job_listing.py Mismatch

Problem: Two different storage patterns for the same entity

  • JobListing.create() stores full job data in {id}.json with fields: id, title, company, description, url, location, salary_range, keywords, created_at, updated_at
  • fetch_job_listing() saves markdown files and only updates index with: id, title, company, location, file, created_at, description
  • tailor_from_url.py uses fetch_job_listing() instead of JobListing model

Impact: Job data is stored in two different ways; can't use JobListing model consistently

Fix: Use JobListing model in tailor_from_url.py instead of fetch_job_listing()


5. Missing URL Storage in Job Listings

Problem: Job listing index doesn't include URL

  • Resume index has job_listing_id to link back to job
  • Job listing index missing url field (only in full JSON, not in index)
  • Can't reconstruct the original job URL from index alone

Impact: Can't trace back to original job posting

Fix: Add url field to job listing index entries
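
A sketch of deriving the index entry from the full job record so that `url` is carried along. The field names come from the lists in this issue; `index_entry_from_record` and the example URL are placeholders.

```python
INDEX_FIELDS = ("id", "title", "company", "location", "url", "created_at", "updated_at")


def index_entry_from_record(record: dict) -> dict:
    """Project a full job record onto the fields kept in the index."""
    return {field: record.get(field) for field in INDEX_FIELDS}


record = {
    "id": "644870ea-db70-49b3-9b1f-a1f4887c3b70",
    "title": "Example role",
    "company": "Example Co",
    "location": "Remote",
    "url": "https://example.com/jobs/123",  # placeholder URL
    "description": "Full description lives in the JSON file, not the index.",
    "created_at": "2025-10-26T15:24:10.294414Z",
    "updated_at": "2025-10-26T15:24:10.294414Z",
}
index_entry = index_entry_from_record(record)
```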


6. Inconsistent ID Generation

Problem: Mixed ID formats across job listings

  • Resume IDs: Always UUID (e.g., 136c188e-659d-49cf-ba0f-983c279e80e7)
  • Job listing IDs: Mixed:
    • Manual IDs: credibly-se-iii-20251011
    • UUIDs: 644870ea-db70-49b3-9b1f-a1f4887c3b70

Impact: Inconsistent ID format makes it harder to process programmatically

Fix: Always use UUID for consistency
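
Existing manual IDs would need migrating. The sketch below detects non-UUID IDs and assigns new ones, returning an old-to-new map so that any `job_listing_id` references can be rewritten. Keeping the old value under a `legacy_id` key is an assumption, not part of the current schema.

```python
import uuid


def is_uuid(value: str) -> bool:
    """True when value is a canonical hyphenated UUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def migrate_ids(entries: list) -> tuple:
    """Return (migrated_entries, old_to_new_id_map) without mutating the input."""
    migrated, id_map = [], {}
    for entry in entries:
        entry = dict(entry)
        if not is_uuid(entry["id"]):
            new_id = str(uuid.uuid4())
            id_map[entry["id"]] = new_id
            entry["legacy_id"] = entry["id"]  # hypothetical field for traceability
            entry["id"] = new_id
        migrated.append(entry)
    return migrated, id_map
```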


7. Missing Bidirectional Linking

Problem: Linking is one-directional only

  • Resume → Job: Resume has job_listing_id
  • Job → Resumes: Job listing has no field to track which resumes were tailored from it ✗
  • Can't query "which resumes were tailored for this job?"

Impact: Can't find all tailored resumes for a given job

Fix: Add tailored_resume_ids array to job listing metadata
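
The two-way update might look like the following, assuming both records are plain dicts that are persisted afterwards. `link_resume_to_job` and `resumes_for_job` are hypothetical helpers.

```python
def link_resume_to_job(resume: dict, job: dict) -> None:
    """Set the forward link on the resume and the back-link on the job."""
    resume["job_listing_id"] = job["id"]
    linked = job.setdefault("tailored_resume_ids", [])
    if resume["id"] not in linked:  # keep the operation idempotent
        linked.append(resume["id"])


def resumes_for_job(job: dict, resumes: list) -> list:
    """Answer "which resumes were tailored for this job?" from the back-links."""
    wanted = set(job.get("tailored_resume_ids", []))
    return [resume for resume in resumes if resume["id"] in wanted]
```

Writing both sides in one function reduces the chance of the two links drifting apart, although a crash between the two file writes could still leave them inconsistent.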


8. Keyword Extraction Not Stored

Problem: Keywords are extracted but not persisted

  • JobListing model supports keywords field
  • fetch_job_listing() doesn't extract or store keywords
  • tailor_from_url.py extracts keywords but doesn't save them to job listing
  • Keywords are lost after tailoring; can't reuse them

Impact: Can't reuse keywords for future tailoring or analysis

Fix: Extract keywords in tailor_from_url.py and update job listing with them
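
A sketch of persisting keywords on the job record. The frequency-based `extract_keywords` below is only a stand-in for whatever extraction `tailor_from_url.py` already performs; the part relevant to this issue is `store_keywords`, a hypothetical helper.

```python
import re
from collections import Counter
from typing import List

STOPWORDS = {"and", "the", "with", "for", "our", "you", "are"}


def extract_keywords(description: str, limit: int = 10) -> List[str]:
    """Naive stand-in extractor: most frequent non-stopword tokens."""
    words = re.findall(r"[a-z][a-z0-9+#]*", description.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]


def store_keywords(job: dict, keywords: List[str]) -> None:
    """Merge keywords into the job record, preserving order without duplicates."""
    merged = list(job.get("keywords") or [])
    for keyword in keywords:
        if keyword not in merged:
            merged.append(keyword)
    job["keywords"] = merged


job = {"id": "644870ea-db70-49b3-9b1f-a1f4887c3b70", "keywords": ["docker"]}
store_keywords(job, extract_keywords("Python and SQL. Python with Docker."))
```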


Recommended Fix Strategy

Phase 1: Use JobListing Model Consistently

  1. Update tailor_from_url.py to use JobListing model instead of fetch_job_listing()
  2. Extract job title, company, location from fetched content
  3. Create job listing entry with full metadata (including URL and keywords)
  4. Return job listing ID for linking

Phase 2: Implement Bidirectional Linking

  1. Confirm the Resume model's existing job_listing_id field is populated during tailoring (the field already exists)
  2. Update JobListing model to support tailored_resume_ids array
  3. When creating tailored resume, update both:
    • Resume: set job_listing_id
    • Job listing: append resume ID to tailored_resume_ids

Phase 3: Standardize Data Formats

  1. Standardize all timestamps to ISO 8601 with Z suffix
  2. Ensure all job listing index entries have consistent fields
  3. Always use UUID for job listing IDs
  4. Add URL field to job listing index

Phase 4: Extract and Store Keywords

  1. Extract keywords from job description in tailor_from_url.py
  2. Update job listing with extracted keywords
  3. Make keywords queryable for future use

Testing

After fixes, verify:

  1. Tailored resume has correct job_listing_id
  2. Job listing has tailored resume ID in tailored_resume_ids
  3. All timestamps use a consistent format (ISO 8601 with Z suffix)
  4. All job listing index entries have required fields
  5. Keywords are stored and retrievable
  6. Web UI shows proper linking between resumes and jobs
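
Several of these checks could be automated with a small consistency checker run against both indices. This is a sketch that assumes index entries are loaded as lists of dicts; `find_problems` is a hypothetical name and it covers only the linking and timestamp checks.

```python
import re

Z_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")


def find_problems(resumes: list, jobs: list) -> list:
    """Return human-readable descriptions of linking and timestamp problems."""
    problems = []
    jobs_by_id = {job["id"]: job for job in jobs}

    for resume in resumes:
        job_id = resume.get("job_listing_id")
        if job_id is None:
            continue  # unlinked resumes (e.g. master resumes) are allowed
        job = jobs_by_id.get(job_id)
        if job is None:
            problems.append(f"resume {resume['id']}: unknown job {job_id}")
        elif resume["id"] not in job.get("tailored_resume_ids", []):
            problems.append(f"job {job_id}: no back-link to resume {resume['id']}")

    for record in [*resumes, *jobs]:
        for field in ("created_at", "updated_at"):
            value = record.get(field)
            if value is not None and not Z_TIMESTAMP.match(value):
                problems.append(f"{record['id']}: {field} is not ISO 8601 with Z")

    return problems
```

An empty list means the linking and timestamp checks passed; the field-presence, keyword, and Web UI checks would still need their own tests.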

Acceptance Criteria

  • Resume-job linking works bidirectionally
  • All timestamps standardized to ISO 8601 with Z suffix
  • Job listing index has consistent fields
  • Keywords extracted and stored with job listings
  • Web UI correctly displays resume-job relationships
  • All tests pass
  • Documentation updated
