LOSRIOS_URGENCY2025

📊 Emergency Respiratory Visits Analysis: A Data Engineering Challenge

Los Ríos Region, Chile (2021–2025)


🇬🇧 English Version

1. Project Overview

This project analyzes emergency department visits due to respiratory causes in the Los Ríos Region, Chile, covering the period from 2021 to 2025.

The core objective was to design a reproducible data engineering workflow capable of consolidating fragmented public datasets from DEIS (Departamento de Estadísticas e Información de Salud) into a consistent longitudinal dataset. This involved a deep dive into data auditing to recover information that appeared "lost" due to changes in government reporting standards during the pandemic.
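As a rough illustration of the consolidation step, the sketch below stacks per-year tables into one longitudinal frame. The function name `consolidate`, the column names, and the sample rows are hypothetical; the actual DEIS files and the logic in `BASE.py` may be organized differently.

```python
import pandas as pd


def consolidate(frames: dict[int, pd.DataFrame]) -> pd.DataFrame:
    """Stack yearly tables into one frame, tagging each row with its year.

    Column names are trimmed and lower-cased first so that cosmetic header
    differences between years do not create duplicate columns.
    """
    parts = []
    for year, df in sorted(frames.items()):
        part = df.copy()
        part.columns = part.columns.str.strip().str.lower()
        part["year"] = year
        parts.append(part)
    # Columns missing in a given year (e.g. region in 2021) become NaN.
    return pd.concat(parts, ignore_index=True, sort=False)


# Hypothetical sample data standing in for two yearly DEIS extracts.
frames = {
    2023: pd.DataFrame({"ID ": ["22100"], "Region": ["Los Ríos"], "Total": [10]}),
    2021: pd.DataFrame({"id": ["22-100"], "total": [7]}),
}
combined = consolidate(frames)
```

Rows from years that lack a field keep a missing value there, which makes the gaps explicit before any later reconciliation step fills them.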

2. Data Engineering & Investigative Challenges

2.1 The "ID Hyphen" & Geographic Reconciliation

  • The Problem: Pandemic-era datasets (2021–2022) lacked regional fields and used a hyphenated hospital ID format (e.g., 22-100 instead of the modern 22100), so historical records could not be joined to later years or filtered by region.
  • The Solution: I implemented a cleaning pipeline that strips the hyphens to standardize IDs, then built a cross-reference dictionary from the 2023–2025 data to backfill the missing geographic metadata in the historical series.
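A minimal sketch of this reconciliation, assuming hypothetical column names (`facility_id`, `region`, `commune`) and made-up sample rows rather than the real DEIS schema:

```python
import pandas as pd


def normalize_id(series: pd.Series) -> pd.Series:
    """Strip whitespace and hyphens so that '22-100' becomes '22100'."""
    return series.astype(str).str.strip().str.replace("-", "", regex=False)


# Hypothetical 2021-2022 rows: hyphenated IDs, no geographic fields.
historical = pd.DataFrame({
    "facility_id": ["22-100", "22-205", "22-100"],
    "visits": [120, 45, 98],
})

# Hypothetical 2023-2025 rows: modern IDs with geographic metadata.
modern = pd.DataFrame({
    "facility_id": ["22100", "22205", "22100"],
    "region": ["Los Ríos", "Los Ríos", "Los Ríos"],
    "commune": ["Valdivia", "La Unión", "Valdivia"],
})

historical["facility_id"] = normalize_id(historical["facility_id"])

# Cross-reference table: one row of metadata per facility ID.
lookup = (
    modern.assign(facility_id=normalize_id(modern["facility_id"]))
    .drop_duplicates("facility_id")
    .set_index("facility_id")[["region", "commune"]]
)

# A left join keeps every historical row; unmatched IDs would show up as NaN.
enriched = historical.join(lookup, on="facility_id", how="left")
```

The left join is a deliberate choice here: facilities that never appear in the later data stay in the series with empty metadata instead of being silently dropped, so they can be audited by hand.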

2.2 Semantic Shift & The "U07" Discovery

  • The Problem: Initial filters failed to capture the 2021–2022 peak because COVID-19 visits in those years were recorded under ICD-10 codes (U07.1/U07.2) rather than the text descriptions the filters matched.
  • The Solution: I conducted a semantic audit of the raw CSVs to identify these shifts, then refactored the ETL pipeline to use regular expressions (regex) that capture all respiratory variations, restoring the pandemic's true epidemiological curve.
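The regex filter could look roughly like the sketch below. The pattern and the sample cause labels are illustrative assumptions, not the expressions used in the project; only the U07.1/U07.2 codes come from the audit described above.

```python
import re

import pandas as pd

# Illustrative pattern: a few Spanish respiratory keywords plus the ICD-10
# COVID-19 codes U07, U07.1 and U07.2. Groups are non-capturing on purpose.
RESPIRATORY_PATTERN = (
    r"respiratori|neumon|bronqui|influenza|covid|\bU07(?:\.?[12])?\b"
)

# Hypothetical cause labels as they might appear across different years.
causes = pd.Series([
    "Causas sistema respiratorio",
    "COVID-19, virus identificado U07.1",
    "U07.2",
    "Traumatismos y envenenamientos",
    "Neumonía (J12-J18)",
    None,
])

# na=False treats missing labels as non-respiratory instead of propagating NaN.
mask = causes.str.contains(
    RESPIRATORY_PATTERN, flags=re.IGNORECASE, regex=True, na=False
)
respiratory = causes[mask]
```

Matching on both keywords and codes means a single filter works across the years in which the labelling changed.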

3. Key Findings & Statistical Rigor

| Analysis Type | Key Insight |
| --- | --- |
| Temporal Evolution | Identified a massive surge in 2022 (Omicron wave) with >3,200 monthly visits, previously hidden by data inconsistencies. |
| Demographics | Applied a Welch's t-test confirming that the shift in emergency demand toward the 15–64 age group during the pandemic was statistically significant (p < 0.001). |
| Heatmap Logistics | Modeled the "Endemic Baseline" (2023–2025) to identify that High-Resolution Emergency Services (SAR) like Barrios Bajos often absorb higher respiratory loads than the regional base hospital. |
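For reference, a Welch's t-test can be run with SciPy by passing `equal_var=False` to `ttest_ind`. The sketch below uses synthetic numbers, and it assumes the comparison is between the monthly share of visits from the 15–64 group in pandemic months and in endemic-baseline months; the README does not state exactly which samples were compared.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic placeholders, NOT project data: monthly share of respiratory
# emergency visits attributed to the 15-64 age group.
baseline_share = rng.normal(loc=0.40, scale=0.05, size=24)
pandemic_share = rng.normal(loc=0.55, scale=0.08, size=24)

# equal_var=False selects Welch's t-test, which does not assume that the
# two groups share the same variance.
result = stats.ttest_ind(pandemic_share, baseline_share, equal_var=False)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```

Welch's variant is the safer default when the two periods differ in spread, as pandemic-era months plausibly do.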

4. Technology Stack

  • Language: Python 3.13
  • Libraries: Pandas (ETL), Seaborn/Matplotlib (Visualization), SciPy (Inferential Statistics)
  • Environment: VS Code on macOS (venv)

🇪🇸 Spanish Version (translated)

1. Project Summary

This project analyzes emergency visits for respiratory causes in the Los Ríos Region between 2021 and 2025. It focused on building a data engineering pipeline capable of unifying public DEIS sources that were fragmented and had undergone critical structural changes.

2. Engineering and Audit Challenges

  • ID Reconciliation: Hospital identifiers were standardized through string cleaning (hyphen removal) so that historical data could be joined.
  • Semantic Audit: The use of ICD-10 codes (U07) for COVID-19 in 2021 was detected, which made it possible to recover thousands of records that the traditional filters initially missed.
  • Statistical Rigor: The significance of the demographic changes was validated through hypothesis testing (p-value < 0.001).

3. Visual Results


📬 Contact / Contacto


5. Reproducibility

# Clone the repository
git clone https://github.com/CANOLIO/LOSRIOS_URGENCY2025
cd LOSRIOS_URGENCY2025

# Setup environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install pandas seaborn matplotlib scipy

# Run the ETL pipeline
python BASE.py
