Fix Resume-Job Linking & Data Consistency Issues

## Overview

The resume and job listing systems have several consistency issues that prevent proper linking and tracking of tailored resumes. This issue tracks eight inconsistencies to resolve; the first (resume-job linking) is critical.

## Current State

After implementing the URL-to-tailored-resume pipeline (commit 7422648), we now:
- ✅ Fetch job listings from URLs
- ✅ Build resumes from experience log
- ✅ Tailor resumes to jobs
- ✅ Auto-index tailored resumes
- ❌ **But don't properly link resumes to jobs**
- ❌ **And have inconsistent data structures**

## Issues Identified

### 1. Resume-Job Linking (CRITICAL)
**Problem**: Resume metadata has a `job_listing_id` field, but `tailor_from_url.py` never sets it
- Resume index has: `id, name, created_at, updated_at, job_listing_id, is_master, description`
- When tailoring, `job_listing_id` is always `null`
- Web UI shows a "Linked to job" badge, but the link is always `null`

**Impact**: Can't track which resumes were tailored for which jobs

**Fix**: Extract the job listing ID from the fetched job and pass it to resume creation

---

### 2. Timestamp Format Inconsistency
**Problem**: Mixed timestamp formats across indices
- Resume index: ISO 8601 without timezone (e.g., `2025-10-26T12:36:01.645244`)
- Job listing index: Mixed formats:
  - Some with `Z` suffix (UTC): `2025-10-26T15:24:10.294414Z`
  - Some without: `2025-10-11T20:19:36.840810`

**Impact**: Parsers must handle both forms, and timestamps without a timezone are ambiguous when compared against UTC ones

**Fix**: Standardize to ISO 8601 with `Z` suffix across both systems
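
A minimal sketch of the normalization step, using only the standard library. The function name `normalize_timestamp` is hypothetical, and the code assumes that timestamps stored without a timezone were written in UTC; if they were written in local time, they need converting before the `Z` is appended.

```python
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    """Return `value` as ISO 8601 UTC with a trailing `Z`.

    Assumption: timestamps without a timezone were recorded in UTC.
    """
    text = value.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    else:
        parsed = parsed.astimezone(timezone.utc)
    return parsed.isoformat(timespec="microseconds").replace("+00:00", "Z")


if __name__ == "__main__":
    # Both formats seen in the indices end up in the same shape.
    print(normalize_timestamp("2025-10-26T12:36:01.645244"))
    print(normalize_timestamp("2025-10-26T15:24:10.294414Z"))
```

Running this over every `created_at` and `updated_at` value in both indices would be a one-off migration; new writes would then emit the `Z` form directly.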

---

### 3. Missing Fields in Job Listing Index
**Problem**: Job listing index entries are inconsistent
- Some have: `id, title, company, location, file, created_at, description`
- Some have: `id, title, company, location, created_at, updated_at` (no file/description)
- Present only in some entries: `updated_at`, `description`, `file`

**Impact**: Can't reliably query job listing metadata

**Fix**: Ensure all job listings have consistent metadata fields
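
One way to enforce this is to backfill every index entry against a fixed field list. The sketch below is illustrative only: `REQUIRED_FIELDS` is an assumed set (the union of the fields seen in the two entry shapes above), and `normalize_index_entry` and `missing_fields` are hypothetical helpers, not existing project functions.

```python
# Assumed required fields: the union of both entry shapes listed above.
REQUIRED_FIELDS = (
    "id", "title", "company", "location",
    "file", "description", "created_at", "updated_at",
)


def missing_fields(entry: dict) -> list:
    """List the required fields that are absent from an index entry."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


def normalize_index_entry(entry: dict) -> dict:
    """Return a copy of `entry` that has every required field.

    Absent fields become None, except `updated_at`, which falls back
    to `created_at`. Extra fields already on the entry are preserved.
    """
    normalized = {field: entry.get(field) for field in REQUIRED_FIELDS}
    for key, value in entry.items():
        normalized.setdefault(key, value)
    if normalized["updated_at"] is None:
        normalized["updated_at"] = normalized["created_at"]
    return normalized
```

Using `None` for unknown values keeps the schema uniform without inventing data; whether `file` should stay required depends on the outcome of issue 4.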

---

### 4. Job Listing Model vs fetch_job_listing.py Mismatch
**Problem**: Two different storage patterns for the same entity
- `JobListing.create()` stores full job data in `{id}.json` with fields: `id, title, company, description, url, location, salary_range, keywords, created_at, updated_at`
- `fetch_job_listing()` saves markdown files and only updates index with: `id, title, company, location, file, created_at, description`
- `tailor_from_url.py` uses `fetch_job_listing()` instead of `JobListing` model

**Impact**: Job data is stored in two different ways; can't use JobListing model consistently

**Fix**: Use `JobListing` model in `tailor_from_url.py` instead of `fetch_job_listing()`

---

### 5. Missing URL Storage in Job Listings
**Problem**: Job listing index doesn't include URL
- Resume index has `job_listing_id` to link back to job
- Job listing index missing `url` field (only in full JSON, not in index)
- Can't reconstruct the original job URL from index alone

**Impact**: Can't trace back to original job posting

**Fix**: Add `url` field to job listing index entries

---

### 6. Inconsistent ID Generation
**Problem**: Mixed ID formats across job listings
- Resume IDs: Always UUID (e.g., `136c188e-659d-49cf-ba0f-983c279e80e7`)
- Job listing IDs: Mixed:
  - Manual IDs: `credibly-se-iii-20251011`
  - UUIDs: `644870ea-db70-49b3-9b1f-a1f4887c3b70`

**Impact**: Mixed ID formats make job listings harder to process programmatically

**Fix**: Always use UUID for consistency
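
Existing manual IDs would need migrating. The following sketch detects non-UUID IDs and builds an old-to-new mapping. `is_uuid` and `build_id_mapping` are hypothetical names, and generating fresh random UUIDs (`uuid.uuid4`) for legacy entries is an assumption about the desired migration.

```python
import uuid


def is_uuid(value: str) -> bool:
    """True if `value` is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def build_id_mapping(job_ids) -> dict:
    """Map every job listing ID to its post-migration ID.

    UUIDs map to themselves; any other ID gets a new random UUID.
    """
    return {
        old: old if is_uuid(old) else str(uuid.uuid4())
        for old in job_ids
    }
```

The same mapping would also have to be applied to any `job_listing_id` values already stored on resumes, so that existing links survive the rename.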

---

### 7. Missing Bidirectional Linking
**Problem**: Linking is one-directional only
- Resume → Job: Resume has `job_listing_id` ✓
- Job → Resumes: Job listing has no field to track which resumes were tailored for it ✗
- Can't query "which resumes were tailored for this job?"

**Impact**: Can't find all tailored resumes for a given job

**Fix**: Add `tailored_resume_ids` array to job listing metadata
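
A sketch of the link update, operating on plain dicts that stand in for the resume and job listing metadata. `link_resume_to_job` is a hypothetical helper; the real change would live in the Resume and `JobListing` models and persist both records.

```python
def link_resume_to_job(resume: dict, job: dict) -> None:
    """Record the resume-job relationship on both sides, in place.

    Safe to call more than once: the resume ID is appended to
    `tailored_resume_ids` only if it is not already there.
    """
    resume["job_listing_id"] = job["id"]
    linked = job.setdefault("tailored_resume_ids", [])
    if resume["id"] not in linked:
        linked.append(resume["id"])
```

Keeping both writes in one function makes it harder for the two sides to drift apart. If the records are saved to separate files, a failure between the two saves can still leave them inconsistent, so a periodic consistency check is worth having.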

---

### 8. Keyword Extraction Not Stored
**Problem**: Keywords are extracted but not persisted
- `JobListing` model supports `keywords` field
- `fetch_job_listing()` doesn't extract or store keywords
- `tailor_from_url.py` extracts keywords but doesn't save them to job listing
- Keywords are lost after tailoring; can't reuse them

**Impact**: Can't reuse keywords for future tailoring or analysis

**Fix**: Extract keywords in `tailor_from_url.py` and update job listing with them
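
The persistence half of this fix could look like the sketch below. How keywords are extracted is left to the existing logic in `tailor_from_url.py`. `store_keywords` is a hypothetical helper, and merging case-insensitively with any keywords already on the listing is an assumption about the desired behavior.

```python
from typing import Iterable, List


def store_keywords(job: dict, extracted: Iterable[str]) -> List[str]:
    """Merge newly extracted keywords into `job["keywords"]`, in place.

    Blank entries are dropped and duplicates are removed
    case-insensitively; the first spelling seen is kept.
    """
    merged = {}
    for keyword in [*job.get("keywords", []), *extracted]:
        cleaned = keyword.strip()
        if cleaned:
            merged.setdefault(cleaned.lower(), cleaned)
    job["keywords"] = list(merged.values())
    return job["keywords"]
```

After this call the job listing record still has to be saved for the keywords to survive the tailoring run.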

---

## Recommended Fix Strategy

### Phase 1: Use JobListing Model Consistently
1. Update `tailor_from_url.py` to use `JobListing` model instead of `fetch_job_listing()`
2. Extract job title, company, location from fetched content
3. Create job listing entry with full metadata (including URL and keywords)
4. Return job listing ID for linking

### Phase 2: Implement Bidirectional Linking
1. Keep the existing `job_listing_id` field on the Resume model (it already exists, so no schema change is needed)
2. Update JobListing model to support `tailored_resume_ids` array
3. When creating tailored resume, update both:
   - Resume: set `job_listing_id`
   - Job listing: append resume ID to `tailored_resume_ids`

### Phase 3: Standardize Data Formats
1. Standardize all timestamps to ISO 8601 with `Z` suffix
2. Ensure all job listing index entries have consistent fields
3. Always use UUID for job listing IDs
4. Add URL field to job listing index

### Phase 4: Extract and Store Keywords
1. Extract keywords from job description in `tailor_from_url.py`
2. Update job listing with extracted keywords
3. Make keywords queryable for future use

---

## Testing

After fixes, verify:
1. Tailored resume has correct `job_listing_id`
2. Job listing has tailored resume ID in `tailored_resume_ids`
3. All timestamps use ISO 8601 with a `Z` suffix
4. All job listing index entries have required fields
5. Keywords are stored and retrievable
6. Web UI shows proper linking between resumes and jobs
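
Checks 1 and 2 can be automated with a small consistency pass over both indices. The sketch below assumes each index is available as a list of dicts; `find_link_problems` is a hypothetical helper rather than part of the current codebase.

```python
def find_link_problems(resumes: list, jobs: list) -> list:
    """Return human-readable descriptions of broken resume-job links."""
    jobs_by_id = {job["id"]: job for job in jobs}
    resume_ids = {resume["id"] for resume in resumes}
    problems = []

    for resume in resumes:
        job_id = resume.get("job_listing_id")
        if job_id is None:
            continue  # unlinked resumes (e.g. master resumes) are fine
        job = jobs_by_id.get(job_id)
        if job is None:
            problems.append(f"resume {resume['id']}: unknown job {job_id}")
        elif resume["id"] not in job.get("tailored_resume_ids", []):
            problems.append(
                f"job {job_id}: no back-reference to resume {resume['id']}"
            )

    for job in jobs:
        for resume_id in job.get("tailored_resume_ids", []):
            if resume_id not in resume_ids:
                problems.append(f"job {job['id']}: unknown resume {resume_id}")

    return problems
```

An empty result means every link resolves in both directions; a test can assert exactly that after running the pipeline against a fixture URL.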

---

## Related Issues
- #6 (Multi-Resume Support)
- #12 (Agent Integration)
- Recent commit: 7422648 (URL-to-tailored-resume pipeline)

## Acceptance Criteria
- [ ] Resume-job linking works bidirectionally
- [ ] All timestamps standardized to ISO 8601 with Z suffix
- [ ] Job listing index has consistent fields
- [ ] Keywords extracted and stored with job listings
- [ ] Web UI correctly displays resume-job relationships
- [ ] All tests pass
- [ ] Documentation updated
