Problem Summary
The inference example in examples/PrithviWxC_inference.py fails with "AssertionError: There doesn't seem to be any valid data." even after all Hugging Face downloads complete successfully.
Environment
- Python 3.12.0
- Fresh installation following repository instructions
- All dependencies installed via
pip install '.[examples]'
Steps to Reproduce
- Clone the repository and install the dependencies
- Run examples/PrithviWxC_inference.py without modifications
- All download steps complete successfully:
Fetching 1 files: 100%|████████| 1/1 [00:02<00:00, 2.43s/it]
Fetching 1 files: 100%|████████| 1/1 [00:02<00:00, 2.88s/it]
# ... all downloads succeed
- Dataset creation fails:
assert len(dataset) > 0, "There doesn't seem to be any valid data."
AssertionError: There doesn't seem to be any valid data.
Root Cause
The example uses an 18-hour lead time but only downloads one day of data (Jan 1st). This leaves the forecast targets without the data they depend on:
- Required for 18h forecast: Input data (Jan 1st) + Target data (Jan 2nd) + Day-2 climatology
- Actually downloaded: Input data (Jan 1st) + Day-1 climatology only
- Result: Merra2Dataset correctly rejects all samples due to missing dependencies
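To make the dependency concrete, here is a minimal sketch that steps through the sample times in a range, adds each lead time, and collects the calendar dates (for the MERRA-2 files) and days of year (for the climatology files) that must be present. It assumes samples are spaced 3 hours apart and ignores any input-time offsets; `required_days` is a hypothetical helper, not part of Merra2Dataset.

```python
from datetime import datetime, timedelta


def required_days(start: str, end: str, lead_hours: list[int], step_hours: int = 3):
    """Return (dates, days_of_year) touched by every sample time and its targets.

    Hypothetical helper: assumes samples are spaced `step_hours` apart and
    ignores input-time offsets (e.g. past inputs before each sample time).
    """
    t = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    times = set()
    while t <= stop:
        times.add(t)  # sample (initial) time
        for lead in lead_hours:
            times.add(t + timedelta(hours=lead))  # forecast target time
        t += timedelta(hours=step_hours)
    dates = sorted({ts.date() for ts in times})
    days_of_year = sorted({ts.timetuple().tm_yday for ts in times})
    return dates, days_of_year


dates, doys = required_days("2020-01-01T00:00:00", "2020-01-01T21:00:00", [18])
print([d.isoformat() for d in dates])  # ['2020-01-01', '2020-01-02']
print(doys)  # [1, 2]
```

For an 18-hour lead time over Jan 1st, the targets run from Jan 1st 18:00 to Jan 2nd 15:00, so both days of data and both days of climatology are needed.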
Missing Downloads
The example downloads:
allow_patterns="merra-2/MERRA2_sfc_2020010[1].nc" # Only Jan 1st
allow_patterns="climatology/climate_*_doy001_*.nc" # Only day-1 climatology
But needs:
allow_patterns="merra-2/MERRA2_sfc_2020010[12].nc" # Jan 1st AND 2nd
allow_patterns="climatology/climate_*_doy00[12]_*.nc" # Day-1 AND day-2 climatology
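The difference between the two sets of patterns is just the glob character class: [1] matches a single "1", while [12] matches "1" or "2". The sketch below checks this with Python's fnmatch, assuming huggingface_hub applies fnmatch-style matching to allow_patterns. The file names are illustrative; the climatology names in particular are assumed, not taken from the repository listing.

```python
from fnmatch import fnmatch

# Illustrative file names (the climatology names are assumed for this sketch).
files = [
    "merra-2/MERRA2_sfc_20200101.nc",
    "merra-2/MERRA2_sfc_20200102.nc",
    "climatology/climate_surface_doy001_hour00.nc",
    "climatology/climate_surface_doy002_hour00.nc",
]


def matches(pattern: str) -> list[str]:
    """Return the file names selected by an fnmatch-style pattern."""
    return [f for f in files if fnmatch(f, pattern)]


print(matches("merra-2/MERRA2_sfc_2020010[1].nc"))       # Jan 1st only
print(matches("merra-2/MERRA2_sfc_2020010[12].nc"))      # Jan 1st and 2nd
print(matches("climatology/climate_*_doy001_*.nc"))      # day-1 climatology only
print(matches("climatology/climate_*_doy00[12]_*.nc"))   # day-1 and day-2 climatology
```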
Quick Fix
Add these downloads to the example:
from huggingface_hub import snapshot_download

# Add Jan 2nd data (surface and pressure-level files)
snapshot_download(
repo_id="ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M",
allow_patterns="merra-2/MERRA2_sfc_20200102.nc",
local_dir="../data",
)
snapshot_download(
repo_id="ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M",
allow_patterns="merra-2/MERRA_pres_20200102.nc",
local_dir="../data",
)
# Add day-2 climatology
snapshot_download(
repo_id="ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M",
allow_patterns="climatology/climate_*_doy002_*.nc",
local_dir="../data",
)
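Because snapshot_download also accepts a list of patterns for allow_patterns, the separate calls could be collapsed into one. A minimal sketch, assuming the file-naming scheme shown above; `download_patterns` is a hypothetical helper and the download call itself is left commented out.

```python
from datetime import date, timedelta


def download_patterns(first: date, n_days: int) -> list[str]:
    """Build allow_patterns entries for n_days of MERRA-2 data plus climatology."""
    patterns = []
    for i in range(n_days):
        d = first + timedelta(days=i)
        stamp = d.strftime("%Y%m%d")
        doy = d.timetuple().tm_yday
        patterns += [
            f"merra-2/MERRA2_sfc_{stamp}.nc",            # surface variables
            f"merra-2/MERRA_pres_{stamp}.nc",            # pressure-level variables
            f"climatology/climate_*_doy{doy:03d}_*.nc",  # climatology for that day
        ]
    return patterns


patterns = download_patterns(date(2020, 1, 1), n_days=2)
# In the example script, a single call would then fetch everything:
# snapshot_download(
#     repo_id="ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M",
#     allow_patterns=patterns,
#     local_dir="../data",
# )
```

Generating the patterns from a start date and a day count keeps the downloads in sync with the lead time, instead of relying on hand-edited glob strings.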
Alternative Fix
Use a shorter lead time so that the forecast targets stay within a single day:
lead_times = [6]  # Instead of [18]
time_range = ("2020-01-01T06:00:00", "2020-01-01T21:00:00")  # Single day
Suggested Improvements
- Fix the example: add the missing downloads so it works out of the box.
- Better error messaging: instead of a generic assertion, Merra2Dataset could report exactly what is missing, e.g.:
  AssertionError: No valid samples found. Missing data for forecast targets:
  - Need data files: MERRA2_sfc_20200102.nc, MERRA_pres_20200102.nc
  - Need climatology: climate_*_doy002_*.nc
- Documentation: clarify the data dependency relationship, e.g.:
  "For N-hour forecasts, you need N/24 + 1 days of data (rounded up) plus matching climatology"
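The proposed error message could come from a pre-flight check like the one below. This is a hypothetical helper, not part of Merra2Dataset; it assumes the merra-2/ and climatology/ subdirectory layout and the file-name patterns used in the downloads above, and that the caller supplies the list of required dates.

```python
from datetime import date
from pathlib import Path


def check_required_files(data_dir: Path, dates: list[date]) -> None:
    """Raise an AssertionError listing any data or climatology files that are missing.

    Hypothetical helper: file-name patterns follow the naming used in this issue.
    """
    missing_data, missing_clim = [], []
    for d in dates:
        stamp = d.strftime("%Y%m%d")
        for name in (f"MERRA2_sfc_{stamp}.nc", f"MERRA_pres_{stamp}.nc"):
            if not (data_dir / "merra-2" / name).exists():
                missing_data.append(name)
        clim_pattern = f"climate_*_doy{d.timetuple().tm_yday:03d}_*.nc"
        # glob() on a missing directory simply yields nothing
        if not any((data_dir / "climatology").glob(clim_pattern)):
            missing_clim.append(clim_pattern)
    if missing_data or missing_clim:
        lines = ["No valid samples found. Missing data for forecast targets:"]
        if missing_data:
            lines.append("- Need data files: " + ", ".join(missing_data))
        if missing_clim:
            lines.append("- Need climatology: " + ", ".join(missing_clim))
        raise AssertionError("\n".join(lines))
```

Calling check_required_files(Path("../data"), [date(2020, 1, 1), date(2020, 1, 2)]) with only the Jan 1st files present would raise an error naming the Jan 2nd files and the doy002 climatology, instead of the generic "no valid data" message.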
Impact
This affects anyone trying the official examples for the first time. The issue is confusing because:
- All downloads appear to succeed ✅
- No clear error about what's missing ❌
- Requires deep understanding of the forecasting logic to diagnose ❌
Verification
After applying the quick fix above, the example runs successfully:
Dataset length: 2
🎉 SUCCESS! Full pipeline working
Thanks for this excellent model! This issue just needs a small documentation/example fix to improve the new user experience.