
Lightweight Dask metrics#25

Closed
wietzesuijker wants to merge 1 commit into EOPF-Explorer:main from wietzesuijker:fix/geozarr-groups

wietzesuijker commented Aug 20, 2025

Add basic observability to compare runs and debug slowdowns.

What changed

  • Feat (CLI): Local Dask controls
    • --dask-mode {threads,processes,single-threaded}
    • --dask-workers, --dask-threads-per-worker
    • Optional --dask-perf-html <path> for Dask HTML performance report
  • Metrics: Always write to <output>/debug/ (on success and failure)
    • dask_run_summary.json (params + wall time)
    • dask_metrics.json and timestamped dask_metrics_<run_id>_attemptN.json
    • Fields: wall_clock_s, workers, threads_total, tasks_observed, tasks_per_sec, compute_time_s_sum, transfer_time_s_sum, memory_used_bytes, memory_limit_bytes, optional spilled_nbytes, dashboard_link.
    • Task timing parsed defensively across Dask versions.
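The "always write metrics, on success and failure" behavior can be sketched with a `try`/`finally` block. This is a minimal illustration, not the PR's actual implementation: the helper name `run_with_metrics`, the `work` callable returning a task count, and the `"error"` status value are all assumptions made for the example. Only the file names and the `wall_clock_s`, `tasks_observed`, and `tasks_per_sec` fields come from the description above.

```python
import json
import time
from pathlib import Path


def run_with_metrics(work, output_dir, run_id, attempt):
    """Run `work()` and always write metrics under <output_dir>/debug/.

    Hypothetical helper: `work` is assumed to return the number of
    tasks it observed. The "error" status value is also an assumption.
    """
    debug_dir = Path(output_dir) / "debug"
    debug_dir.mkdir(parents=True, exist_ok=True)

    # Start pessimistic; flip to "ok" only if `work` returns normally.
    metrics = {"status": "error", "run_id": run_id, "attempt": attempt}
    start = time.perf_counter()
    try:
        tasks_observed = work()
        metrics["status"] = "ok"
        metrics["tasks_observed"] = tasks_observed
        return tasks_observed
    finally:
        # Runs on both success and failure, so the files always exist.
        wall = time.perf_counter() - start
        tasks = metrics.get("tasks_observed", 0)
        metrics["wall_clock_s"] = wall
        metrics["tasks_per_sec"] = tasks / wall if wall > 0 else 0.0
        payload = json.dumps(metrics, indent=2)
        # Latest run, plus a per-attempt copy that later retries do not overwrite.
        (debug_dir / "dask_metrics.json").write_text(payload)
        (debug_dir / f"dask_metrics_{run_id}_attempt{attempt}.json").write_text(payload)
```

Writing both a fixed-name file and a per-attempt copy keeps the latest result easy to find while preserving the history of retries.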

Dependencies

  • Add: ipykernel, jupyter, bokeh (for testing, diagnostics, and the UI). I should probably ditch .ipynb support, although it was useful for testing.

Compat

  • Defaults mirror prior behavior (threads, 4×1)
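For illustration, the flags listed under "What changed" could be declared roughly as follows with `argparse`. This is a sketch, not the project's actual parser. Reading `4×1` as 4 workers × 1 thread per worker is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags named in this PR."""
    parser = argparse.ArgumentParser(prog="eopf-geozarr convert")
    parser.add_argument(
        "--dask-mode",
        choices=["threads", "processes", "single-threaded"],
        default="threads",  # default mode stated in the PR
    )
    # Assumption: "4×1" means 4 workers with 1 thread each.
    parser.add_argument("--dask-workers", type=int, default=4)
    parser.add_argument("--dask-threads-per-worker", type=int, default=1)
    # Optional path for the Dask HTML performance report.
    parser.add_argument("--dask-perf-html", default=None, metavar="PATH")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dask-mode", "processes"])
    print(args.dask_mode, args.dask_workers, args.dask_threads_per_worker)
```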

Example

uv run eopf-geozarr convert --dask-cluster --dask-perf-html out/debug/dask-report-threads.html \
 "https://objectstore.eodc.eu:2222/e05ab01a9d56408d82ac32d69a5aae2a:202505-s02msil2a/18/products/cpm_v256/S2B_MSIL2A_20250518T112119_N0511_R037_T29RLL_20250518T140519.zarr" \
  ./S2B_MSIL2A_20250518_T29RLL_geozarr.zarr --verbose \
  --groups measurements/reflectance/r10m measurements/reflectance/r20m measurements/reflectance/r60m
# Inspect OUT_DIR/debug/{dask_run_summary.json,dask_metrics.json}

Somewhat worryingly, this example took four attempts the last time I tried (each retry skips data that was already processed). The last attempt produced this dask_metrics.json:

{
  "status": "ok",
  "run_id": "20250820-145816",
  "attempt": 4,
  "dask_enabled": true,
  "mode": "threads",
  "wall_clock_s": 416.0143037919988,
  "workers": 0,
  "threads_total": 0,
  "tasks_observed": 260,
  "tasks_per_sec": 0.6249785106667781,
  "compute_time_s_sum": 1444.2037508487701,
  "transfer_time_s_sum": 0.023849010467529297,
  "memory_used_bytes": 0,
  "memory_limit_bytes": 0,
  "dashboard_link": "http://192.168.0.12:8787/status"
}
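A quick way to sanity-check a metrics file like this one is to recompute the derived values and flag fields that look unpopulated. The sketch below is an assumption-laden illustration: `summarize` and `avg_parallelism` are names invented here, not part of the PR. For this run, `tasks_per_sec` matches `tasks_observed / wall_clock_s`, and `compute_time_s_sum / wall_clock_s` comes out at roughly 3.47. That ratio suggests the work did run on several threads, so the zeros in `workers`, `threads_total`, and the memory fields may reflect a collection gap rather than the real cluster state.

```python
import json

# Values copied from the dask_metrics.json shown above.
METRICS_JSON = """
{
  "status": "ok",
  "run_id": "20250820-145816",
  "attempt": 4,
  "dask_enabled": true,
  "mode": "threads",
  "wall_clock_s": 416.0143037919988,
  "workers": 0,
  "threads_total": 0,
  "tasks_observed": 260,
  "tasks_per_sec": 0.6249785106667781,
  "compute_time_s_sum": 1444.2037508487701,
  "transfer_time_s_sum": 0.023849010467529297,
  "memory_used_bytes": 0,
  "memory_limit_bytes": 0,
  "dashboard_link": "http://192.168.0.12:8787/status"
}
"""


def summarize(metrics):
    """Recompute derived values and list suspicious zero fields.

    Hypothetical helper; `avg_parallelism` is not a field in the PR.
    """
    wall = metrics["wall_clock_s"]
    derived = {
        "tasks_per_sec": metrics["tasks_observed"] / wall,
        # Summed task compute time per second of wall-clock time.
        "avg_parallelism": metrics["compute_time_s_sum"] / wall,
    }
    warnings = []
    if metrics["tasks_observed"] > 0:
        # Tasks ran, so a zero here more likely means "not collected".
        for key in ("workers", "threads_total", "memory_limit_bytes"):
            if metrics.get(key, 0) == 0:
                warnings.append(f"{key} is 0 although tasks were observed")
    return derived, warnings


if __name__ == "__main__":
    derived, warnings = summarize(json.loads(METRICS_JSON))
    print(derived)
    for message in warnings:
        print("warning:", message)
```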

The uv.lock diff is gross; I haven't used uv much, sorry about that.

Edit: Deprecated for now. I dropped the earlier "fix" listed below (563e1f5), which may actually be more of a feature than a bug, i.e., one should specify the correct groups.

  • Prevent errors on optional/missing groups (563e1f5)
  • Fix: Only iterate over groups that actually exist in the DataTree (skips missing/empty) (563e1f5)

wietzesuijker marked this pull request as draft August 20, 2025 20:14
wietzesuijker changed the title from "Robust group handling + lightweight Dask metrics" to "Lightweight Dask metrics" Aug 21, 2025
wietzesuijker deleted the fix/geozarr-groups branch August 21, 2025 12:20
wietzesuijker restored the fix/geozarr-groups branch August 21, 2025 12:21
wietzesuijker reopened this Aug 21, 2025
wietzesuijker deleted the fix/geozarr-groups branch August 21, 2025 18:15