
Forcings

Manjaree Binjolkar edited this page Sep 5, 2025 · 6 revisions

Meteorological forcing data (e.g., precipitation, temperature) are supplied via NetCDF files and mapped to stream network locations using a lookup table. The forcing inputs are specified in the configuration file (config.yaml) under the forcings section. These inputs drive the model’s processes such as snow accumulation/melt, evapotranspiration, and runoff generation.

The model requires forcing data to be stored in NetCDF format as 3D variables with dimensions [time, lat, lon], while a separate CSV file links each stream to its corresponding grid cell using latitude and longitude indices.


Forcing Inputs

The model expects the following forcing variables:

  • pr (Precipitation)

    • Source File: pr_hourly_AORC_{year}.nc
    • Variable Name in NetCDF: pr
    • Resolution: Hourly (1h)
    • Role: Provides rainfall or snowfall depending on air temperature. Internally scaled from mm/hr to m/min.
  • t2m (Air Temperature)

    • Source File: t2m_daily_avg_AORC_{year}.nc
    • Variable Name in NetCDF: t2m
    • Resolution: Daily (24h)
    • Role: Controls snow accumulation and melt (compared against t_thres and 0°C).
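
The mm/hr to m/min scaling mentioned for pr is plain unit arithmetic: divide by 1000 (mm per m) and by 60 (min per hr). A minimal sketch follows; the helper name `mm_per_hr_to_m_per_min` is hypothetical and is not taken from the model source.

```python
# Unit conversion only; this does not reproduce the model's internal code.
MM_PER_M = 1000.0
MIN_PER_HR = 60.0

def mm_per_hr_to_m_per_min(rate_mm_hr: float) -> float:
    """Hypothetical helper: 1 mm/hr = 1 / (1000 * 60) m/min."""
    return rate_mm_hr / (MM_PER_M * MIN_PER_HR)

# 6 mm/hr corresponds to about 1e-4 m/min.
rate = mm_per_hr_to_m_per_min(6.0)
```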

Lookup Tables

Each forcing variable uses its own lookup file to map stream IDs to grid cells in the NetCDF datasets. These CSVs ensure that each stream link receives data from the correct AORC grid point.

  • Precipitation Mapping: Stony_Brook_pr_AORC_lookup.csv

  • Temperature Mapping: Stony_Brook_t2m_AORC_lookup.csv

CSV Columns:

  • stream — Stream ID
  • lat_index — Latitude grid index
  • lon_index — Longitude grid index
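
To illustrate how these columns pair with a [time, lat, lon] variable, the sketch below reads a lookup table and pulls one time series per stream. The array and the stream IDs are synthetic, and the model's actual loader may work differently.

```python
import csv
import io
import numpy as np

# Synthetic stand-in for a forcing variable with dimensions [time, lat, lon].
n_time, n_lat, n_lon = 3, 4, 5
forcing = np.arange(n_time * n_lat * n_lon, dtype=float).reshape(n_time, n_lat, n_lon)

# Synthetic lookup table with the documented columns (IDs and indices are made up).
lookup_csv = io.StringIO(
    "stream,lat_index,lon_index\n"
    "101,0,1\n"
    "102,3,4\n"
)
rows = list(csv.DictReader(lookup_csv))
stream_ids = [int(r["stream"]) for r in rows]
lat_idx = np.array([int(r["lat_index"]) for r in rows])
lon_idx = np.array([int(r["lon_index"]) for r in rows])

# Advanced indexing picks one grid cell per stream for every time step.
per_stream = forcing[:, lat_idx, lon_idx]  # shape: (n_time, n_streams)
```

Each column of `per_stream` is the forcing time series for the stream at the same position in `stream_ids`.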

Creating Lookup Tables

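To map each stream location to the nearest grid cell in the meteorological forcing data, build a k-d tree (SciPy's cKDTree) over the gridded latitude/longitude from the NetCDF file and query it with the hillslope centroid coordinates. The search measures distance in raw degrees rather than on the sphere.

Ensure your forcing NetCDF files expose latitude and longitude as coordinates or variables (the helper below accepts latitude, lat, or y and longitude, lon, or x), and that your spatial parameters CSV has stream, centroid_lat, and centroid_lon columns.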


Required Libraries

import os
import numpy as np
import pandas as pd
import xarray as xr
from scipy.spatial import cKDTree

Helper functions

def _find_var(ds, candidates):
    for name in candidates:
        if name in ds.coords:
            return ds[name]
        if name in ds.data_vars:
            return ds[name]
    return None

def get_latlon_mesh_raw(ds):
    lat = _find_var(ds, ("latitude","lat","y"))
    lon = _find_var(ds, ("longitude","lon","x"))
    if lat is None or lon is None:
        raise RuntimeError(f"Could not find lat/lon. Coords: {list(ds.coords)} Vars: {list(ds.data_vars)}")
    latv = np.asarray(lat.values)
    lonv = np.asarray(lon.values)
    if latv.ndim == 1 and lonv.ndim == 1:
        lon2d, lat2d = np.meshgrid(lonv, latv)
    elif latv.ndim == 2 and lonv.ndim == 2:
        lat2d, lon2d = latv, lonv
    else:
        raise RuntimeError(f"Unsupported lat/lon shapes: {latv.shape}, {lonv.shape}")
    return lat2d, lon2d

def build_lookup_like_reference(nc_path, params_csv, out_csv):
    if not os.path.isfile(nc_path):
        raise FileNotFoundError(f"Missing {nc_path}")
    if not os.path.isfile(params_csv):
        raise FileNotFoundError(f"Missing {params_csv}")

    # Open the forcing file only long enough to read its coordinate arrays
    with xr.open_dataset(nc_path, engine="netcdf4") as ds:
        lat2d, lon2d = get_latlon_mesh_raw(ds)

    centroids = pd.read_csv(params_csv)
    if not REQUIRED_COLS.issubset(centroids.columns):
        raise ValueError(f"{params_csv} must have columns {REQUIRED_COLS}. Got {centroids.columns.tolist()}")

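    # (lat, lon) pairs in the same column order as grid_points; longitudes are used
    # as-is, so the CSV and the NetCDF file must share one longitude convention
    target_points = centroids[["centroid_lat", "centroid_lon"]].to_numpy()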
    grid_points = np.column_stack((lat2d.ravel(), lon2d.ravel()))
    tree = cKDTree(grid_points)

    distances, flat_indexes = tree.query(target_points)
    lat_idx, lon_idx = np.unravel_index(flat_indexes, lat2d.shape)

    out = pd.DataFrame({
        "stream": centroids["stream"].to_numpy(),
        "lat_index": lat_idx,
        "lon_index": lon_idx
    })
    out.to_csv(out_csv, index=False)
    print(f"[OK] wrote {out_csv} (rows={len(out)})")
    return out
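
One way to sanity-check a generated lookup table is to compare it against a brute-force nearest-cell search on a small input. The sketch below uses a synthetic grid and made-up centroids, not AORC coordinates. It measures distance in degree space, as the k-d tree query above does, and recovers indices with the same np.unravel_index step.

```python
import numpy as np

# Small synthetic rectilinear grid (illustrative values only).
lat = np.array([40.0, 40.1, 40.2])
lon = np.array([-74.3, -74.2, -74.1, -74.0])
lon2d, lat2d = np.meshgrid(lon, lat)  # both have shape (3, 4)

# Hypothetical hillslope centroids as (lat, lon) pairs.
centroids = np.array([[40.04, -74.21], [40.19, -74.02]])

# Squared distance in degree space from every centroid to every grid cell.
grid = np.column_stack((lat2d.ravel(), lon2d.ravel()))
d2 = ((centroids[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)

# Flat index of the closest cell, converted back to (lat_index, lon_index).
flat = d2.argmin(axis=1)
lat_idx, lon_idx = np.unravel_index(flat, lat2d.shape)
```

Brute force costs O(streams × cells) and is only practical for spot checks; the k-d tree remains the better choice for full basins.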

Configuration

# Specify path to .nc files and spatial params file
PARAMS_CSV  = "./Stony_Brook/Stony_Brook_spatial_params.csv"
PR_NC_PATH  = "./Stony_Brook/pr_hourly_AORC_2017.nc"
T2M_NC_PATH = "./Stony_Brook/t2m_daily_avg_AORC_2017.nc"

# Specify the path to lookup .csv files
PR_OUT_CSV  = "./Stony_Brook/Stony_Brook_pr_AORC_lookup.csv"
T2M_OUT_CSV = "./Stony_Brook/Stony_Brook_t2m_AORC_lookup.csv"

# Columns that must be present in the spatial params CSV
# (define this before calling build_lookup_like_reference, which reads it)
REQUIRED_COLS = {"stream", "centroid_lat", "centroid_lon"}

Precipitation Lookup (Stony_Brook_pr_AORC_lookup.csv)

pr_file  = build_lookup_like_reference(PR_NC_PATH,  PARAMS_CSV, PR_OUT_CSV)

Temperature Lookup (Stony_Brook_t2m_AORC_lookup.csv)

t2m_file = build_lookup_like_reference(T2M_NC_PATH, PARAMS_CSV, T2M_OUT_CSV)

Example lookup file contents:

stream,lat_index,lon_index
420558772,110,215
420557886,112,216

Configuration in config.yaml

The configuration below enables automatic time-chunk sizing and reads the AORC files as follows:

forcings:
  type: "netcdf"                     
  path: "../data/Stony_Brook/"       
  time_chunking: true                # Automatic chunk size calculation

  variables:
    - name: "pr"
      file: "pr_hourly_AORC_{year}.nc"
      var_name: "pr"
      time_resolution: "1h"
      required: true

    - name: "t2m"
      file: "t2m_daily_avg_AORC_{year}.nc"
      var_name: "t2m"
      time_resolution: "24h"
      required: true
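
The `{year}` placeholder in each `file` entry suggests that one file per simulation year is resolved at run time. A minimal sketch of that substitution follows, assuming the placeholder is filled with a four-digit year and joined to `path`. The helper `resolve_forcing_files` is hypothetical and is not part of the model.

```python
import os

# Values copied from the forcings section of config.yaml.
forcing_path = "../data/Stony_Brook/"
templates = {
    "pr": "pr_hourly_AORC_{year}.nc",
    "t2m": "t2m_daily_avg_AORC_{year}.nc",
}

def resolve_forcing_files(year: int) -> dict:
    """Hypothetical helper: fill {year} and join with the configured path."""
    return {
        name: os.path.join(forcing_path, tmpl.format(year=year))
        for name, tmpl in templates.items()
    }

files_2017 = resolve_forcing_files(2017)
```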

The lookup mappings are configured under:

forcing_mappings:
  path: "."
  variables:
    - name: "pr"
      file: "Stony_Brook_pr_AORC_lookup.csv"
    - name: "t2m"
      file: "Stony_Brook_t2m_AORC_lookup.csv"

Example Workflow

  1. For each stream, the model uses lookup files to identify the correct grid cell from the forcing datasets.
  2. Forcing data is loaded chunk by chunk from the NetCDF files to minimize memory usage.
  3. Variables are transferred to the GPU and used by the solver kernels (d_forc_data, c_forc_dt, c_forc_nT) for time-stepped simulation.
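
Steps 1 and 2 can be sketched on the CPU as follows. This is an illustration with a synthetic array and a fixed chunk size of 24 steps. The model's automatic chunk sizing and its GPU transfer are not reproduced here, and `iter_time_chunks` is a hypothetical helper.

```python
import numpy as np

def iter_time_chunks(n_time: int, chunk_size: int):
    """Yield (start, stop) index pairs that cover [0, n_time)."""
    for start in range(0, n_time, chunk_size):
        yield start, min(start + chunk_size, n_time)

# Synthetic hourly field [time, lat, lon] standing in for a NetCDF variable.
pr = np.random.default_rng(0).random((50, 3, 4))
lat_idx = np.array([0, 2])  # made-up lookup indices for two streams
lon_idx = np.array([1, 3])

chunks = []
for start, stop in iter_time_chunks(pr.shape[0], chunk_size=24):
    # Only this time slice needs to be resident in memory at once.
    block = pr[start:stop]
    chunks.append(block[:, lat_idx, lon_idx])  # (chunk_len, n_streams)

per_stream = np.concatenate(chunks, axis=0)
```

With a real file opened through xarray, the slice `pr[start:stop]` would correspond to something like `ds["pr"].isel(time=slice(start, stop))`, which reads only that time window from disk.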
