Los Ríos Region, Chile (2021–2025)
This project analyzes emergency department visits due to respiratory causes in the Los Ríos Region, Chile, covering the period from 2021 to 2025.
The core objective was to design a reproducible data engineering workflow capable of consolidating fragmented public datasets from DEIS (Departamento de Estadísticas e Información de Salud) into a consistent longitudinal dataset. This required auditing the raw data to recover records that appeared "lost" after government reporting standards changed during the pandemic.
- The Problem: Pandemic-era datasets (2021-2022) lacked regional fields and used a different ID format for hospitals (e.g., `22-100` instead of the modern `22100`).
- The Solution: I implemented a cleaning pipeline using string manipulation to standardize IDs and built a cross-reference dictionary from 2023-2025 data to "inject" missing geographic metadata into the historical series.
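
The ID reconciliation and metadata injection described above can be sketched roughly as follows. This is a minimal illustration and not the code in `BASE.py`: the column names (`facility_id`, `region`, `commune`, `visits`), the helper `normalize_id`, and all sample rows are assumptions made up for the example. Only the `22-100` / `22100` ID formats come from the description above.

```python
import pandas as pd


def normalize_id(series: pd.Series) -> pd.Series:
    """Strip hyphens and whitespace so '22-100' and '22100' compare equal."""
    return series.astype(str).str.replace("-", "", regex=False).str.strip()


# Hypothetical 2023-2025 rows that already carry geographic fields.
recent = pd.DataFrame({
    "facility_id": ["22100", "22100", "22200"],
    "region": ["Los Ríos", "Los Ríos", "Los Ríos"],
    "commune": ["commune_a", "commune_a", "commune_b"],  # placeholder names
})

# Hypothetical 2021-2022 rows: hyphenated IDs, no geographic fields.
historical = pd.DataFrame({
    "facility_id": ["22-100", "22-200", "99-999"],
    "visits": [120, 45, 7],
})

recent["facility_id"] = normalize_id(recent["facility_id"])
historical["facility_id"] = normalize_id(historical["facility_id"])

# Cross-reference table: one row per facility, indexed by the clean ID.
lookup = recent.drop_duplicates("facility_id").set_index("facility_id")

# Inject the missing metadata; facilities absent from the lookup stay NaN.
for col in ["region", "commune"]:
    historical[col] = historical["facility_id"].map(lookup[col])

print(historical)
```

Facilities that never appear in the recent data keep missing values rather than receiving a guessed region, which makes unmatched IDs easy to audit afterwards.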
- The Problem: Initial filters failed to capture the 2021-2022 peak because the taxonomy changed from text descriptions to ICD-10 codes (U07.1/U07.2).
- The Solution: I conducted a Semantic Audit of the raw CSVs, identifying these shifts. I then refactored the ETL pipeline using Regular Expressions (Regex) to capture all respiratory variations, effectively restoring the pandemic's true epidemiological curve.
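
A regex filter of the kind described above might look like the sketch below. The pattern and the sample labels are assumptions for illustration; the actual cause descriptions in the DEIS files, and the expression used in the project, may differ. The only detail taken from the text is that COVID-19 records are coded as U07.1 / U07.2.

```python
import re

# Assumed pattern: ICD-10 COVID-19 codes (U07.1, U07.2, with or without the
# dot) plus a few Spanish respiratory keywords. Adjust to the real labels.
RESPIRATORY = re.compile(
    r"\bU07\.?[12]\b|covid|respiratori|neumon[ií]a|influenza|bronqui",
    re.IGNORECASE,
)


def is_respiratory(label: str) -> bool:
    """Return True if a cause label looks respiratory under the assumed pattern."""
    return bool(RESPIRATORY.search(label))


# Hypothetical cause labels mixing text descriptions and ICD-10 codes.
labels = [
    "TOTAL CAUSAS SISTEMA RESPIRATORIO",
    "Covid-19, virus identificado U07.1",
    "U07.2",
    "CAUSAS SISTEMA CIRCULATORIO",
    "TRAUMATISMOS Y ENVENENAMIENTOS",
]

matches = [is_respiratory(label) for label in labels]
for label, hit in zip(labels, matches):
    print(f"{hit!s:5}  {label}")
```

The same pattern string can be passed to pandas' `Series.str.contains` to filter a whole column at once.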
| Analysis Type | Key Insight |
|---|---|
| Temporal Evolution | Identified a massive surge in 2022 (Omicron wave) with >3,200 monthly visits, previously hidden by data inconsistencies. |
| Demographics | A Welch's t-test confirmed that the shift in emergency demand toward the 15-64 age group during the pandemic was statistically significant (p < 0.001). |
| Heatmap Logistics | Modeling the "Endemic Baseline" (2023-2025) showed that High-Resolution Emergency Services (SAR) such as Barrios Bajos often absorb higher respiratory loads than the regional base hospital. |
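
For readers unfamiliar with the test named in the Demographics row, the sketch below shows how a Welch's t-test can be run with SciPy. The two samples are synthetic numbers invented for the example, not the project's data or results, and the variable names are assumptions. Passing `equal_var=False` to `scipy.stats.ttest_ind` is what selects Welch's variant, which does not assume equal variances in the two groups.

```python
from scipy import stats

# Synthetic monthly shares (%) of visits from the 15-64 age group.
# These values are made up purely to demonstrate the call.
pandemic = [48.2, 51.0, 49.5, 52.3, 50.1, 53.4]
baseline = [38.9, 40.2, 37.5, 39.8, 41.0, 38.1]

# equal_var=False switches from Student's t-test to Welch's t-test.
result = stats.ttest_ind(pandemic, baseline, equal_var=False)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}")
if result.pvalue < 0.001:
    print("Difference in means is significant at the 0.001 level.")
```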
- Language: Python 3.13
- Libraries: Pandas (ETL), Seaborn/Matplotlib (Visualization), SciPy (Inferential Statistics)
- Environment: VS Code on macOS (venv)
This project analyzes emergency department visits for respiratory causes in the Los Ríos Region between 2021 and 2025. It focused on building a data engineering pipeline capable of unifying public DEIS sources that were fragmented and had undergone critical structural changes.
- ID Reconciliation: Hospital identifiers were standardized through string cleaning (hyphen removal) so that historical data could be joined.
- Semantic Audit: The use of ICD-10 codes (U07) for COVID-19 in 2021 was detected, which recovered thousands of records that the traditional filters initially missed.
- Statistical Rigor: The significance of the demographic changes was validated through hypothesis testing (p-value < 0.001).
# Clone the repository
git clone https://github.com/CANOLIO/LOSRIOS_URGENCY2025
cd <repository_folder>
# Setup environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install pandas seaborn matplotlib scipy
# Run the ETL pipeline
python BASE.py

