Commit f8b89e4
pmayd and claude committed
Fix age calculation to match R pipeline behavior
Add automatic age correction from date of birth (DOB) to match R pipeline's fix_age() function. This ensures data quality by always calculating age from DOB rather than trusting potentially incorrect Excel values.

Changes:
- Add _fix_age_from_dob() function in clean/patient.py (step 5.5)
- Calculate age: tracker_year - birth_year - (1 if tracker_month < birth_month else 0)
- Log warnings and track errors via ErrorCollector for all age corrections
- Handle missing ages, mismatched ages, and negative ages (set to error value)

Validation:
- Tested with 2025_06_CDA tracker: 35 age errors properly corrected and tracked
- Results now match R output (e.g., patient KH_CD016: 18 years, not 21)
- Improvement over R: structured error tracking instead of logging only

Also adds:
- compare_r_vs_python.py: Comprehensive comparison tool for validation
- fastexcel dependency: Required for Excel reading in comparison scripts

Fixes critical data quality issue where incorrect ages from Excel were propagated to final datasets. Now matches R pipeline behavior while providing better error tracking and documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
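The age rule stated in the commit message (tracker_year - birth_year, minus one when the tracker month precedes the birth month) can be sketched roughly as below. This is an illustration only, not the actual `_fix_age_from_dob()` from clean/patient.py: the names `age_from_dob`, `fix_age`, and the `ERROR_AGE` sentinel are hypothetical, and the real function reports corrections through ErrorCollector, which is left out here.

```python
from datetime import date
from typing import Optional, Tuple

# Hypothetical sentinel; the commit only says negative ages are "set to error value".
ERROR_AGE = 999999


def age_from_dob(dob: date, tracker_year: int, tracker_month: int) -> int:
    """Age at the tracker month, using the month-level rule from the commit message."""
    age = tracker_year - dob.year - (1 if tracker_month < dob.month else 0)
    # A negative result means the DOB lies after the tracker date.
    return age if age >= 0 else ERROR_AGE


def fix_age(
    recorded_age: Optional[int],
    dob: Optional[date],
    tracker_year: int,
    tracker_month: int,
) -> Tuple[Optional[int], bool]:
    """Return (age, corrected). Without a DOB the recorded age is kept (an assumption)."""
    if dob is None:
        return recorded_age, False
    computed = age_from_dob(dob, tracker_year, tracker_month)
    # Missing or mismatched recorded ages both count as corrections.
    return computed, recorded_age != computed


# Usage: a recorded age of 21 is replaced by the DOB-derived value.
age, corrected = fix_age(21, date(2007, 3, 15), 2025, 6)
```

Because the rule compares months only, a birthday later in the tracker month is already counted as reached; the sketch follows the formula in the commit message in this respect.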
1 parent 27f4baf commit f8b89e4

4 files changed

Lines changed: 567 additions & 0 deletions

a4d-python/pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ dependencies = [
     "rich>=13.7.0",
     "tqdm>=4.66.0",
     "python-dateutil>=2.8.0",
+    "fastexcel>=0.16.0",
 ]
 
 