feat: log convert metrics to benchmark local runs #26

Closed
wietzesuijker wants to merge 7 commits into EOPF-Explorer:main from wietzesuijker:feat/convert-metrics

@wietzesuijker
Contributor

Add basic observability to compare runs and debug slowdowns.

What changed

  • Feat (CLI): Local Dask controls
    • --dask-mode {threads,processes,single-threaded}
    • --dask-workers, --dask-threads-per-worker
    • Optional --dask-perf-html <path> for Dask HTML performance report
  • Metrics: Always write to <output>/debug/ (on success and failure)
    • dask_run_summary.json (params + wall time)
    • dask_metrics.json and timestamped dask_metrics_<run_id>_attemptN.json
    • Fields: wall_clock_s, workers, threads_total, tasks_observed, tasks_per_sec, compute_time_s_sum, transfer_time_s_sum, memory_used_bytes, memory_limit_bytes, optional spilled_nbytes, dashboard_link.
    • Task timing parsed defensively across Dask versions.
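
A minimal sketch of the "always write, on success and failure" behavior described above, using `try`/`finally`. The helper name `run_with_metrics`, its parameters, and the `"error"` status value are assumptions made for illustration; only the file names and the `status`, `run_id`, `attempt`, and `wall_clock_s` fields come from this PR.

```python
import json
import time
from pathlib import Path


def run_with_metrics(output_dir, work, run_id, attempt=1):
    """Run `work()` and write metrics files whether it returns or raises.

    Hypothetical helper: the real implementation in this PR may differ.
    """
    debug_dir = Path(output_dir) / "debug"
    debug_dir.mkdir(parents=True, exist_ok=True)

    # Start pessimistic; flip to "ok" only once the work has finished.
    # The "error" value is an assumption, the PR only shows "ok".
    metrics = {"status": "error", "run_id": run_id, "attempt": attempt}
    start = time.perf_counter()
    try:
        result = work()
        metrics["status"] = "ok"
        return result
    finally:
        # Runs on both the success path and the exception path.
        metrics["wall_clock_s"] = time.perf_counter() - start
        payload = json.dumps(metrics, indent=2)
        (debug_dir / "dask_metrics.json").write_text(payload)
        per_attempt = f"dask_metrics_{run_id}_attempt{attempt}.json"
        (debug_dir / per_attempt).write_text(payload)
```

Writing in `finally` means a failed attempt still leaves a file behind, which is what makes retries comparable after the fact.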

Dependencies

  • Add: ipykernel, jupyter, bokeh (for testing, diagnostics, and UI). I should probably ditch .ipynb support, though it was useful for testing.

Compat

  • Defaults mirror prior behavior (threads, 4×1)
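
For reference, the flags and defaults above could be declared roughly like this. This is a sketch using `argparse`, which is an assumption: the PR does not say how the CLI is built. It also reads "4×1" as 4 workers with 1 thread each, which is my interpretation. The flag names and the three modes are taken from the list above; `build_parser` and the `prog` name are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the Dask flags listed in this PR."""
    parser = argparse.ArgumentParser(prog="convert-sketch")
    parser.add_argument(
        "--dask-mode",
        choices=["threads", "processes", "single-threaded"],
        default="threads",  # stated default
    )
    # Assumed reading of "4×1": 4 workers, 1 thread per worker.
    parser.add_argument("--dask-workers", type=int, default=4)
    parser.add_argument("--dask-threads-per-worker", type=int, default=1)
    # Optional path for the Dask HTML performance report.
    parser.add_argument("--dask-perf-html", metavar="PATH", default=None)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```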

Example

uv run eopf-geozarr convert --dask-cluster --dask-perf-html out/debug/dask-report-threads.html \
  "https://objectstore.eodc.eu:2222/e05ab01a9d56408d82ac32d69a5aae2a:202505-s02msil2a/18/products/cpm_v256/S2B_MSIL2A_20250518T112119_N0511_R037_T29RLL_20250518T140519.zarr" \
  ./S2B_MSIL2A_20250518_T29RLL_geozarr.zarr --verbose \
  --groups measurements/reflectance/r10m measurements/reflectance/r20m measurements/reflectance/r60m
# Inspect OUT_DIR/debug/{dask_run_summary.json,dask_metrics.json}

Somewhat worryingly, this example took four attempts the last time I tried it (each attempt skips data already processed). The last attempt produced this dask_metrics.json:

{
  "status": "ok",
  "run_id": "20250820-145816",
  "attempt": 4,
  "dask_enabled": true,
  "mode": "threads",
  "wall_clock_s": 416.0143037919988,
  "workers": 0,
  "threads_total": 0,
  "tasks_observed": 260,
  "tasks_per_sec": 0.6249785106667781,
  "compute_time_s_sum": 1444.2037508487701,
  "transfer_time_s_sum": 0.023849010467529297,
  "memory_used_bytes": 0,
  "memory_limit_bytes": 0,
  "dashboard_link": "http://192.168.0.12:8787/status"
}
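
A quick way to sanity-check a file like this is to recompute the derived values. The sketch below assumes `tasks_per_sec` is `tasks_observed / wall_clock_s`, which agrees with the numbers above. It also computes the ratio of summed compute time to wall time as a rough concurrency estimate and flags the zero worker counts, which may simply mean worker info was not captured (that reading is an assumption). The function name `summarize` is hypothetical.

```python
def summarize(metrics):
    """Recompute derived values from a dask_metrics.json-style dict."""
    wall = metrics["wall_clock_s"]
    return {
        # Assumed definition; it matches the reported value in this run.
        "tasks_per_sec": metrics["tasks_observed"] / wall,
        # Summed task compute time over wall time: rough average concurrency.
        "avg_concurrency": metrics["compute_time_s_sum"] / wall,
        # Zero workers and zero threads alongside observed tasks looks like
        # missing worker info rather than a real measurement.
        "worker_info_missing": (
            metrics["workers"] == 0 and metrics["threads_total"] == 0
        ),
    }


# Values copied from the dask_metrics.json in this PR.
sample = {
    "wall_clock_s": 416.0143037919988,
    "workers": 0,
    "threads_total": 0,
    "tasks_observed": 260,
    "tasks_per_sec": 0.6249785106667781,
    "compute_time_s_sum": 1444.2037508487701,
}

summary = summarize(sample)
print(summary)
```

For this run the concurrency estimate comes out at roughly 3.5, which is in line with a 4-worker default even though `workers` is reported as 0.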

The uv.lock diff looks kinda gross; I haven't used uv much, sorry about that.

wietzesuijker and others added 7 commits August 21, 2025 07:54
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@wietzesuijker
Contributor Author

Closing as stale
