Refresh runbook, validation checks, and failure handling.
Expected total analyst time: ~20 minutes (pipeline ~3 min + manual review steps).
- Python 3.11+ is active (
python --version) - Virtual environment is activated (
.venv/bin/activate) - Dependencies are installed (
pip install -r requirements.txt) - DSM Plan PDFs are present in
data/raw/dsm_plans/ -
data/raw/nspower_rates_manual.csvexists and is current
python pipeline.py --refreshThis runs extract → transform → validate in sequence. Watch for [FAIL] lines in log output.
After --refresh, check log output for lines containing unmatched program names. Any __UNKNOWN__ program_ids must be resolved by:
- Adding a new row to
sql/program_mapping.csvwith the correctprogram_id,canonical_name, andraw_name_variants - Re-running
python pipeline.py --transform
Open data/interim/targets_reconciled.json and filter for "is_manually_entered": true. Verify the hand-entered values match the source PDF. Document any changes in docs/etl_design.md under "Manual overrides".
python pipeline.py --validateAll five checks must pass. See Validation checks below.
Open dashboard/dashboard.pbix in Power BI Desktop and click Refresh. Spot-check:
- Executive Summary totals match current Annual Report headline
- Plan vs. Actual variance for 2024 electric programs shows +10.4% GJ beat
- Equity page low-income share is ~11.2% of cumulative lifetime savings
Run with python pipeline.py --validate. Failure prints the offending rows and exits non-zero (suitable for CI).
| # | Check | Threshold | Current result |
|---|---|---|---|
| 1 | Total actual_gj per year reconciles to public headline |
±2% | Pass. Max deviation 0.7% (2021) |
| 2 | No nulls in measured columns of fact tables (excluding is_manually_entered = 1 rows) |
zero | Pass |
| 3 | Every fact row has a valid program_id in dim_program |
100% | Pass |
| 4 | Row counts per source per year are non-decreasing on refresh | strict | Pass on most recent refresh |
| 5 | Every active program has valid_to = NULL in program_mapping.csv |
strict | Pass |
- Identify the year with deviation > 2%.
- Re-read the corresponding Annual Report and compare total GJ at the program-rollup level.
- Common causes: unit conversion error (GWh vs GJ), restated figure not captured, new program not in mapping.
- Fix in source data or
program_mapping.csvand re-run--transform --validate.
- Run
SELECT * FROM fact_actuals WHERE actual_gj IS NULL AND is_manually_entered = 0in SQLite. - If the null is genuinely unreported, set
is_manually_entered = 1and populate with NULL explicitly. - If it's a parse failure, fix in the extractor and re-run
--refresh.
- The reconciler will have already logged the unmatched names.
- Add entries to
sql/program_mapping.csvand re-run--transform.
- Inspect which source/year lost rows.
- Check if the raw JSON file was accidentally overwritten or truncated.
- Restore from git (
git checkout data/raw/...) and re-run--extract.
- Open
sql/program_mapping.csvand find the row. - If the program is still active, clear
valid_to. - If the program genuinely ended, set
is_active = 0and ensurefact_actualshas no rows for years aftervalid_to.