🎯 Overview
This issue addresses a critical pattern of data quality and consistency issues that have emerged as we add new features. Each new feature (resume tailoring, job linking, RAG integration, etc.) introduces subtle inconsistencies that compound over time:
❌ Resume files saved to wrong directories (data/resumes/resumes/ instead of data/resumes/)
❌ Index files not updated with new entries
❌ Bidirectional linking broken due to path issues
❌ Data quality issues (skills bunched into single strings instead of arrays)
❌ Timestamp format inconsistencies (ISO 8601 with/without Z suffix)
❌ Missing fields in index entries
❌ Inconsistent ID generation (UUIDs vs manual IDs)
❌ No validation of generated resume JSON structure
Root Cause: No comprehensive validation framework or feature tracking system to catch regressions as we evolve the codebase.
📊 Problem Analysis
Pattern of Issues
Every time we add a new feature, we introduce data inconsistencies:
3. No Integration Tests for Data Consistency
Problem: Tests verify functionality but not data consistency
Example: Test passes:
def test_tailor_from_job_description():
    response = api.post('/api/tailor-from-job-description', {...})
    assert response.status_code == 201  # ✅ Passes
    # But doesn't verify:
    # - Resume file exists in correct location
    # - Resume is in index
    # - Job listing is in index
    # - Bidirectional linking works
    # - Data quality is good
Impact: Regressions aren't caught until production
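The checks that the test above skips could be gathered into a helper that a test calls after the API request. The following is a minimal sketch, assuming the layout described in this issue (data/resumes/{id}.json plus data/resumes/index.json). The helper name `resume_persistence_errors` and the index shape (a JSON object with a "resumes" list of entries carrying an "id" key) are assumptions for illustration, not the project's actual API or file format.

```python
import json
import tempfile
from pathlib import Path


def resume_persistence_errors(data_dir: Path, resume_id: str) -> list[str]:
    """Return a list of consistency problems for one resume (empty means OK)."""
    errors = []
    resumes_dir = data_dir / "resumes"

    if not (resumes_dir / f"{resume_id}.json").is_file():
        errors.append("resume file missing from data/resumes/")
    if (resumes_dir / "resumes" / f"{resume_id}.json").exists():
        errors.append("resume file found in nested data/resumes/resumes/")

    index_path = resumes_dir / "index.json"
    # Assumed index shape: {"resumes": [{"id": ...}, ...]}
    entries = json.loads(index_path.read_text())["resumes"] if index_path.is_file() else []
    if resume_id not in {entry.get("id") for entry in entries}:
        errors.append("resume missing from index.json")
    return errors


# Demonstration against a throwaway directory: the file exists but the index is empty
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "resumes").mkdir()
    (data_dir / "resumes" / "abc.json").write_text("{}")
    (data_dir / "resumes" / "index.json").write_text(json.dumps({"resumes": []}))
    demo_errors = resume_persistence_errors(data_dir, "abc")

print(demo_errors)  # ['resume missing from index.json']
```

A test could then end with `assert resume_persistence_errors(data_dir, resume_id) == []` in addition to the status-code assertion.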
4. Inconsistent Model Instantiation
Problem: Models are instantiated inconsistently across codebase
Example: In src/api/app.py line 1222:
# ❌ WRONG - passes data_dir/resumes instead of data_dir
resume_model = Resume(DATA_DIR / "resumes")
# Creates: data/resumes/resumes/

# ✅ CORRECT - pass data_dir, let model handle subdirectory
resume_model = Resume(DATA_DIR)
# Creates: data/resumes/
Impact: Files saved to wrong locations, indexes not updated
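One way to catch this mistake at the call site is a small guard that rejects a path already ending in the model's own subdirectory. This is only a sketch: `check_model_data_dir` is a hypothetical helper, not part of the existing codebase, and it would produce a false positive if the data root itself were legitimately named "resumes".

```python
from pathlib import Path


def check_model_data_dir(data_dir: Path, subdirectory: str = "resumes") -> Path:
    """Return data_dir unchanged, or raise if it already ends in the model's subdirectory."""
    if data_dir.name == subdirectory:
        raise ValueError(
            f"pass the data root, not {data_dir}: the model appends "
            f"'{subdirectory}/' itself, which would create {data_dir / subdirectory}"
        )
    return data_dir


# Hypothetical call site: Resume(check_model_data_dir(DATA_DIR))
print(check_model_data_dir(Path("data")))
```

Passing `Path("data") / "resumes"` raises `ValueError` instead of silently creating data/resumes/resumes/.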
5. No Guardrails on Core Models
Problem: Changes to Resume/JobListing models aren't validated against existing code
Example: If we change Resume model's create() signature, we need to update:
src/api/app.py (multiple places)
src/tailor.py
src/duplicate_resume.py
All CRUD scripts
All tests
But there's no automated check to catch these
Impact: Silent failures when models change
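A lightweight automated check for signature changes could compare the parameter names of `create()` against the list that call sites are known to rely on. The sketch below uses the standard library's `inspect.signature`. The `Resume` class shown here is a hypothetical stand-in and the expected parameter list is illustrative, since the real signature is not given in this issue.

```python
import inspect


class Resume:
    """Hypothetical stand-in for the real Resume model."""

    def create(self, name, data, job_listing_id=None, is_master=False):
        return {"name": name}


# Parameters that call sites are assumed to rely on (illustrative only)
EXPECTED_CREATE_PARAMS = ["self", "name", "data", "job_listing_id", "is_master"]


def signature_drift(func, expected_params):
    """Return a message if func's parameter names differ from the expected list."""
    actual = list(inspect.signature(func).parameters)
    if actual != expected_params:
        return f"{func.__qualname__} changed: expected {expected_params}, got {actual}"
    return None


drift = signature_drift(Resume.create, EXPECTED_CREATE_PARAMS)
print(drift)  # None while the signature matches
```

Run as a test, this fails as soon as the signature changes, which prompts a review of the call sites listed above before the change is merged.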
🛠️ Proposed Solution
Phase 1: Feature Registry & Documentation
Create features.json - Single source of truth for all features
{
  "features": {
    "multi_resume_support": {
      "status": "stable",
      "version": "1.0",
      "data_models": ["Resume", "JobListing"],
      "requirements": {
        "Resume": {
          "required_fields": ["id", "name", "created_at", "updated_at", "job_listing_id", "is_master", "description"],
          "index_fields": ["id", "name", "created_at", "updated_at", "job_listing_id", "is_master", "description"],
          "timestamp_format": "ISO 8601 with Z suffix",
          "id_format": "UUID"
        },
        "JobListing": {
          "required_fields": ["id", "title", "company", "description", "url", "location", "keywords", "tailored_resume_ids", "created_at", "updated_at"],
          "index_fields": ["id", "title", "company", "location", "url", "description", "created_at", "updated_at"],
          "timestamp_format": "ISO 8601 with Z suffix",
          "id_format": "UUID"
        }
      },
      "tests": ["test_multi_resume.py", "test_multi_resume_api.py"],
      "related_issues": ["#6", "#57"]
    },
    "resume_job_linking": {
      "status": "stable",
      "version": "1.0",
      "data_models": ["Resume", "JobListing"],
      "requirements": {
        "Resume": {
          "bidirectional_linking": "Resume.job_listing_id → JobListing.id"
        },
        "JobListing": {
          "bidirectional_linking": "JobListing.tailored_resume_ids[] ← Resume.id"
        }
      },
      "tests": ["test_multi_resume_api.py"],
      "related_issues": ["#57"]
    },
    "resume_tailoring_from_job_description": {
      "status": "stable",
      "version": "1.0",
      "data_models": ["Resume", "JobListing"],
      "requirements": {
        "Resume": {
          "technical_proficiencies": "Must be object with string values (comma-separated skills)"
        },
        "JobListing": {
          "keywords": "Must be array of strings"
        }
      },
      "tests": ["test_multi_resume_api.py"],
      "related_issues": ["#57"]
    }
  },
  "data_models": {
    "Resume": {
      "file_location": "data/resumes/{id}.json",
      "index_location": "data/resumes/index.json",
      "fields": {
        "id": {"type": "string (UUID)", "required": true},
        "name": {"type": "string", "required": true, "unique": true},
        "created_at": {"type": "string (ISO 8601 with Z)", "required": true},
        "updated_at": {"type": "string (ISO 8601 with Z)", "required": true},
        "job_listing_id": {"type": "string (UUID) or null", "required": false},
        "is_master": {"type": "boolean", "required": true},
        "description": {"type": "string", "required": false}
      }
    },
    "JobListing": {
      "file_location": "data/job_listings/{id}.json",
      "index_location": "data/job_listings/index.json",
      "fields": {
        "id": {"type": "string (UUID)", "required": true},
        "title": {"type": "string", "required": true},
        "company": {"type": "string", "required": true},
        "description": {"type": "string", "required": true},
        "url": {"type": "string (URL)", "required": false},
        "location": {"type": "string", "required": false},
        "keywords": {"type": "array of strings", "required": false},
        "tailored_resume_ids": {"type": "array of UUIDs", "required": true},
        "created_at": {"type": "string (ISO 8601 with Z)", "required": true},
        "updated_at": {"type": "string (ISO 8601 with Z)", "required": true}
      }
    }
  },
  "validation_rules": {
    "timestamp_format": "All timestamps must be ISO 8601 with Z suffix (e.g., 2025-10-26T12:36:01.645244Z)",
    "id_format": "All IDs must be UUIDs (e.g., 136c188e-659d-49cf-ba0f-983c279e80e7)",
    "index_consistency": "Every file in data/resumes/ must have entry in index.json",
    "bidirectional_linking": "If Resume.job_listing_id is set, JobListing.tailored_resume_ids must contain Resume.id",
    "unique_names": "Resume names must be unique across all resumes"
  }
}
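The bidirectional_linking rule in validation_rules can be expressed as a short function over already-loaded records. This is a sketch under the assumption that resumes and job listings are plain dicts with the field names from the registry. The function name `find_broken_links` is hypothetical. It only checks the Resume → JobListing direction, so a stale ID left in tailored_resume_ids would need a separate pass.

```python
def find_broken_links(resumes, job_listings):
    """Return (resume_id, job_listing_id) pairs whose link is not mirrored on the job listing."""
    jobs_by_id = {job["id"]: job for job in job_listings}
    broken = []
    for resume in resumes:
        job_id = resume.get("job_listing_id")
        if job_id is None:
            continue  # unlinked resumes (e.g. a master resume) are fine
        job = jobs_by_id.get(job_id)
        if job is None or resume["id"] not in job.get("tailored_resume_ids", []):
            broken.append((resume["id"], job_id))
    return broken


resumes = [
    {"id": "r1", "job_listing_id": "j1"},
    {"id": "r2", "job_listing_id": "j1"},  # not listed on the job
    {"id": "r3", "job_listing_id": None},
]
job_listings = [{"id": "j1", "tailored_resume_ids": ["r1"]}]

broken_links = find_broken_links(resumes, job_listings)
print(broken_links)  # [('r2', 'j1')]
```

The short IDs here are placeholders to keep the example readable; real records would carry UUIDs.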
Phase 2: Resume JSON Validator
Create src/validators/resume_validator.py
from typing import Dict, List, Tuple


class ResumeValidator:
    """Validates resume JSON structure and data quality."""

    def validate(self, resume_data: Dict) -> Tuple[bool, List[str]]:
        """Validate resume data against schema and rules."""
        errors = []
        # Check required fields
        # Check field types
        # Check technical_proficiencies structure
        # Check experience bullets format
        # Check timestamp formats
        # Check data quality
        return len(errors) == 0, errors
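To make the first few placeholder checks concrete, here is one possible sketch of the required-field, type, timestamp, and ID checks, written as a standalone function. The field list comes from the Resume entry in features.json. The function name, the regular expression, and the error wording are assumptions. The regex checks only the shape of the timestamp, not whether the date is a real calendar date.

```python
import re
import uuid
from typing import Dict, List, Tuple

# Required Resume fields and their types, per the data_models entry in features.json
REQUIRED_FIELDS = {"id": str, "name": str, "created_at": str, "updated_at": str, "is_master": bool}

# ISO 8601 UTC timestamp with a mandatory Z suffix and optional fractional seconds
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z")


def is_canonical_uuid(value: str) -> bool:
    try:
        # Round-trip so only the hyphenated lowercase form is accepted
        return str(uuid.UUID(value)) == value
    except ValueError:
        return False


def validate_resume_fields(resume_data: Dict) -> Tuple[bool, List[str]]:
    errors: List[str] = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in resume_data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(resume_data[field], expected_type):
            errors.append(f"{field} must be {expected_type.__name__}")
    for field in ("created_at", "updated_at"):
        value = resume_data.get(field)
        if isinstance(value, str) and not TIMESTAMP_RE.fullmatch(value):
            errors.append(f"{field} is not ISO 8601 with Z suffix: {value}")
    resume_id = resume_data.get("id")
    if isinstance(resume_id, str) and not is_canonical_uuid(resume_id):
        errors.append(f"id is not a UUID: {resume_id}")
    return len(errors) == 0, errors


ok, problems = validate_resume_fields({
    "id": "136c188e-659d-49cf-ba0f-983c279e80e7",
    "name": "Master Resume",
    "created_at": "2025-10-26T12:36:01.645244Z",
    "updated_at": "2025-10-26T12:36:01.645244",  # missing Z suffix
    "is_master": True,
})
print(ok, problems)
```

Returning the same `(bool, list)` pair as `ResumeValidator.validate` would let this logic drop into the class without changing callers.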
Phase 3: Data Consistency Tests
Create tests/test_data_consistency.py
class TestDataConsistency:
    """Tests for data consistency across the system."""

    def test_resume_file_location(self):
        """Verify resume files are saved to correct location."""
        # Create resume
        # Verify file exists at data/resumes/{id}.json
        # Verify file does NOT exist at data/resumes/resumes/{id}.json

    def test_resume_index_updated(self):
        """Verify resume index is updated when resume is created."""
        # Create resume
        # Verify entry exists in data/resumes/index.json

    def test_bidirectional_linking(self):
        """Verify resume-job linking is bidirectional."""
        # Create resume with job_listing_id
        # Verify Resume.job_listing_id is set
        # Verify JobListing.tailored_resume_ids contains Resume.id

    def test_timestamp_consistency(self):
        """Verify all timestamps use ISO 8601 with Z suffix."""
        # Check all resume timestamps
        # Check all job listing timestamps

    def test_technical_proficiencies_format(self):
        """Verify technical_proficiencies are properly formatted."""
        # Check that skills are arrays or comma-separated strings
        # NOT single concatenated strings
Phase 4: Model Instantiation Guardrails
Create src/validators/model_instantiation_validator.py
from pathlib import Path


class ModelInstantiationValidator:
    """Validates correct model instantiation patterns."""

    @staticmethod
    def validate_resume_instantiation(data_dir: Path) -> bool:
        """Verify Resume model is instantiated correctly."""
        # Check that data_dir is passed, not data_dir/resumes
        # Verify resumes_dir is created correctly
        # Verify index file is in correct location

    @staticmethod
    def validate_job_listing_instantiation(data_dir: Path) -> bool:
        """Verify JobListing model is instantiated correctly."""
        # Similar checks for JobListing
Phase 5: Pre-Commit Hooks
Create .git/hooks/pre-commit
#!/bin/bash
# Run data consistency tests before commit
python -m pytest tests/test_data_consistency.py -v
if [ $? -ne 0 ]; then
    echo "❌ Data consistency tests failed. Commit aborted."
    exit 1
fi

# Validate features.json
python -c "import json; json.load(open('features.json'))"
if [ $? -ne 0 ]; then
    echo "❌ features.json is invalid JSON. Commit aborted."
    exit 1
fi
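The hook above only confirms that features.json parses. A slightly stronger check, sketched here in Python, would also confirm that every feature references only models defined under the top-level data_models key. The function name `registry_errors` is hypothetical, and the sample registry is trimmed down for illustration.

```python
import json


def registry_errors(registry: dict) -> list[str]:
    """Check that every feature only references models defined in data_models."""
    known_models = set(registry.get("data_models", {}))
    errors = []
    for name, feature in registry.get("features", {}).items():
        for model in feature.get("data_models", []):
            if model not in known_models:
                errors.append(f"{name}: unknown data model {model!r}")
    return errors


# Trimmed-down sample in which JobListing is referenced but never defined
sample = json.loads("""
{
  "features": {"resume_job_linking": {"data_models": ["Resume", "JobListing"]}},
  "data_models": {"Resume": {}}
}
""")

sample_errors = registry_errors(sample)
print(sample_errors)  # ["resume_job_linking: unknown data model 'JobListing'"]
```

A script like this could be called from the hook in place of the one-line `json.load` check, exiting non-zero when the list is not empty.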
Why This Happens
No features.json tracking what features exist and their requirements
Impact
🔍 Root Causes Identified
1. No Feature Registry (features.json)
Problem: No single source of truth for what features exist and their data requirements
Example: When we added resume-job linking, we didn't document:
job_listing_id field
tailored_resume_ids array
Impact: New features don't know what constraints to follow
2. No Resume JSON Validator
Problem: Generated resume JSON isn't validated against a schema
Example: Skills are stored as:
{
  "technical_proficiencies": {
    "skills": ".NET, AES-256, AI, API Gateway, ..."  // ❌ Single string!
  }
}
Should be:
{
  "technical_proficiencies": {
    "skills": [".NET", "AES-256", "AI", "API Gateway"]  // ✅ Array
  }
}
Impact: Data quality issues go undetected
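Converting the single-string form into the array form is a small transformation. The sketch below shows one way to do it. `split_skills` is a hypothetical helper, and it assumes that no individual skill name contains a comma.

```python
def split_skills(value):
    """Turn a comma-separated skills string into a list; pass lists through unchanged."""
    if isinstance(value, str):
        return [skill.strip() for skill in value.split(",") if skill.strip()]
    return list(value)


normalized = split_skills(".NET, AES-256, AI, API Gateway")
print(normalized)  # ['.NET', 'AES-256', 'AI', 'API Gateway']
```

A validator could use the same `isinstance(value, str)` test to flag the string form as an error instead of silently repairing it.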
✅ Deliverables
Phase 1: Feature Registry
features.json with all features documented
docs/FEATURES.md explaining the registry
Phase 2: Resume Validator
src/validators/resume_validator.py
Phase 3: Data Consistency Tests
tests/test_data_consistency.py
Phase 4: Model Guardrails
src/validators/model_instantiation_validator.py
Phase 5: Pre-Commit Hooks
.git/hooks/pre-commit
🧪 Testing Strategy
Unit Tests
Integration Tests
Regression Tests
📋 Acceptance Criteria
features.json created and documents all features
🔗 Related Issues
📝 Notes
features.json with requirements
🎯 Success Metrics
features.json