This repository contains the data processing and visualization workflows for a structural study of PM2.5 using partial least squares structural equation modeling (PLS-SEM). The model captures multi-day interactions among co-pollutants, meteorological factors, land-use characteristics, built-environment features, and transportation-related drivers. The study estimates the effect sizes of these variables, their direct and indirect effects, and their diffusion and decay patterns over time.
The study combines three years of observed PM2.5 concentrations in southern Arizona (January 2022 to December 2024), collected from Environmental Protection Agency (EPA) sensors, with a wide range of additional datasets.
Data sources include:
- Meteostat: Daily meteorological conditions aggregated using the Meteostat Python API.
- MODIS: Satellite-derived observations of aerosol loading and vegetation indices.
- Sentinel-5P: Satellite-based atmospheric trace gas measurements relevant to oxidation chemistry.
- NASA MERRA-2: Reanalysis-based atmospheric variables related to vertical mixing, pollutant dispersion, and broader meteorological conditions.
- FLDAS: Land-surface and environmental variables relevant to dispersion and surface conditions.
- OpenStreetMap (OSM): Transportation and built-environment indicators, along with geospatial measures of proximity to major infrastructure and industrial areas.
- American Transportation Research Institute (ATRI) via Sun Cloud: Freight activity data used to estimate freight intensity near monitoring locations through distance-based road network analysis.
Data/
│
├── Input_Files/
│ ├── Elevations.csv
│ └── Points.json
│
├── Output_Files/
│ ├── 8 Days Data Emission- new.xlsx
│ └── daily_sensor_info_2022_2024.csv
│
├── SmartPLS_Outputs/
│ ├── coefficients.xlsx
│ ├── Indirect_effects.xlsx
│ ├── Latent_Variables_Scores.xlsx
│ ├── Model_data_descriptives.xlsx
│ ├── Static_standardized_on_lag_0_pm.xlsx
│ └── Total_effects.xlsx
│
├── SmartPLS_Processed/
│ ├── coefficients_with_significance_levels.xlsx
│ └── Processed_data_standardized_on_lag_0_pm.xlsx
│
Figures/
│ ├── Figure_1.png
│ ├── Figure_2.png
│ ├── Figure_3.png
│ └── ...
│
Emission_Study.ipynb
requirements.txt
README.md
.gitignore
This repository contains the following notebook:
Emission_Study.ipynb: Contains part of the data collection, the processing of PLS-SEM outputs, and the visualization workflows. It produces:
- Weather and elevation data for EPA monitoring stations during the study period
- Processed PLS-SEM outputs, including significance tables and total, direct, and indirect effect sizes
- Visualizations of total, direct, and indirect effects
- Visualizations of effect decay patterns for PM2.5 on the final day
- Visualizations of effect decay rates for PM2.5 on the final day
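As an illustration of how a significance table can be built from PLS-SEM coefficients, the snippet below maps p-values to star labels. It is a minimal sketch, not the notebook's actual code: the function name `significance_stars`, the path names, and the coefficient and p-values are hypothetical, and the cutoffs (0.05, 0.01, 0.001) are conventional thresholds assumed here for illustration.

```python
def significance_stars(p_value):
    """Map a p-value to a star label using assumed conventional cutoffs."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "n.s."


# Hypothetical (path, coefficient, p-value) rows, for illustration only.
paths = [
    ("X1 -> Y", 0.32, 0.0004),
    ("X2 -> Y", -0.11, 0.03),
    ("X3 -> Y", 0.02, 0.41),
]

# Attach a significance label to every path coefficient.
table = [(name, coef, significance_stars(p)) for name, coef, p in paths]

for name, coef, stars in table:
    print(f"{name:10s} {coef:+.2f} {stars}")
```

The actual column names and cutoffs in coefficients_with_significance_levels.xlsx may differ; check the notebook before relying on this mapping.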
To use this project, you need to have Python installed on your system. You can download Python from the official Python website.
Once you have Python installed, you can install the required dependencies using pip. Run the following command in your terminal or command prompt:
pip install -r requirements.txt
Run Emission_Study.ipynb to process data and generate outputs.
This figure presents the total standardized effects of variables on final-day PM2.5 concentrations, incorporating both direct and indirect contributions.
This figure decomposes the total effects of variables into their direct and indirect components, illustrating how they influence PM2.5 concentrations through multiple, and in some cases opposing, pathways.
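The decomposition follows the standard path-model identity: an indirect effect along one pathway is the product of the path coefficients on that pathway, and the total effect is the direct effect plus the sum of all indirect effects. The sketch below shows how opposing pathways can partly cancel. The function names and all numbers are hypothetical and do not come from the study's results.

```python
from math import prod


def indirect_effect(path_coefficients):
    """Indirect effect along one pathway: product of its path coefficients."""
    return prod(path_coefficients)


def total_effect(direct, indirect_paths):
    """Total effect: direct effect plus the sum of all indirect effects."""
    return direct + sum(indirect_effect(p) for p in indirect_paths)


# Hypothetical example: a positive direct effect and one negative
# indirect pathway through a single mediator (X -> M -> Y).
direct = 0.30
pathways = [
    [0.5, -0.4],  # X -> M = 0.5, M -> Y = -0.4
]

indirect = sum(indirect_effect(p) for p in pathways)
total = total_effect(direct, pathways)

print(f"direct={direct:+.2f} indirect={indirect:+.2f} total={total:+.2f}")
```

In this made-up case the indirect pathway offsets two thirds of the direct effect, which is the kind of opposing influence the figure is meant to reveal.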
This figure summarizes diffusion length by showing how standardized effects of predictors attenuate with lag, compared to lag 0, and indicating the lags at which effects drop below diffusion thresholds (2.5%, 5%, and 10%).
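One way to read the diffusion thresholds is as fractions of the lag-0 effect: the diffusion length at a given threshold is the first lag at which the absolute effect falls below that fraction of its lag-0 magnitude. The sketch below implements this reading. It is an assumption about the definition, not the notebook's confirmed method, and the function names and effect values are hypothetical.

```python
def relative_effects(effects_by_lag):
    """Absolute effect at each lag, expressed as a fraction of the lag-0 effect."""
    base = abs(effects_by_lag[0])
    if base == 0:
        raise ValueError("lag-0 effect must be non-zero")
    return [abs(e) / base for e in effects_by_lag]


def diffusion_lag(effects_by_lag, threshold):
    """First lag (> 0) whose relative effect is below `threshold`, else None."""
    for lag, ratio in enumerate(relative_effects(effects_by_lag)):
        if lag > 0 and ratio < threshold:
            return lag
    return None


# Hypothetical standardized effects of one predictor at lags 0..7.
effects = [0.40, 0.20, 0.08, 0.03, 0.015, 0.008, 0.002, 0.001]

# Thresholds named in the figure description: 10%, 5%, and 2.5%.
thresholds = (0.10, 0.05, 0.025)
lags = {t: diffusion_lag(effects, t) for t in thresholds}

for t, lag in lags.items():
    print(f"below {t:.1%} of lag-0 effect at lag: {lag}")
```

Returning `None` when an effect never crosses a threshold keeps slowly decaying predictors distinguishable from those that attenuate within the observed lags.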
For questions or feedback, please email shamshiripour@arizona.edu or danialchekani@arizona.edu


