Skip to content

Add asynchronous I/O for JLD2Writer, NetCDFWriter, ZarrWriter, and Checkpointer#5573

Open
glwagner wants to merge 20 commits into
mainfrom
glw/async-io
Open

Add asynchronous I/O for JLD2Writer, NetCDFWriter, ZarrWriter, and Checkpointer#5573
glwagner wants to merge 20 commits into
mainfrom
glw/async-io

Conversation

@glwagner
Copy link
Copy Markdown
Member

@glwagner glwagner commented May 11, 2026

This PR adds the ability to write output asynchronously while computation continues on the GPU. The goal is to hide the cost of writing from CPU to disk; if this works as intended, I/O cost should reduce to 1) the cost of computing diagnostics on the GPU and 2) the cost of the GPU-CPU transfer.

Description

AbstractOutputWriter{A} is now parameterized by a singleton tag (Synchronous or Asynchronous). Async writers run the disk-write phase on a background task via Threads.@spawn while the GPU continues computing; the GPU→CPU copy still happens synchronously on the main thread so output content is identical to the sync case. A single in-flight task per writer is enforced via wait_for_async_writes!, which is called automatically at the end of run!.

Each writer that supports async opts in by specializing prepare_async_write (fetch + GPU→CPU) and commit_async_write! (disk write). Sync write_output! runs both inline; async write_output! runs prepare inline and spawns commit.

Coverage

Enabled via asynchronous=true keyword on:

  • JLD2Writer
  • NetCDFWriter
  • ZarrWriter — supported only on non-distributed (serial) runs. Under MPI, asynchronous=true is downgraded to Synchronous with a warning so the MPI-collective write path stays on the main thread.
  • Checkpointer — supported only on GPU models. Prepare materializes the prognostic state to CPU via Adapt.adapt(Array, …). On CPU the snapshot copy would be pure overhead with no GPU→CPU work to hide, so asynchronous=true warns and falls back to Synchronous.

Default remains false for all four.

Tests

  • Byte-identity sync vs async for each writer over a short simulation.
  • Concurrency / non-blocking commit: a test-only commit_delay field on each writer makes commit_async_write! sleep on a known duration. The overlap test (JLD2 path; the spawn framework is shared) asserts that with commit_delay = 0.5s, write_output! returns in ~ms while the spawned task is still running, and wait_for_async_writes! blocks for ~commit_delay.
  • Exception propagation: a test-only commit_throws field makes commit_async_write! raise a user-supplied exception on the worker task. The test asserts the exception surfaces on the main thread at the next sync point — either an explicit wait_for_async_writes! or the implicit wait at the start of the next write_output! — and that writer.task is cleared so the writer remains usable.
  • CPU fallback for ZarrWriter+MPI and Checkpointer+CPU: warns and downgrades to Synchronous.

glwagner and others added 2 commits May 11, 2026 12:51
Parameterize `AbstractOutputWriter{A}` by a singleton tag (`Synchronous` or
`Asynchronous`). Async writers run the disk-write phase on a background task
via `Threads.@spawn` while the GPU continues computing; the GPU→CPU copy still
happens synchronously on the main thread so output content is identical to
the sync case. A single in-flight task per writer is enforced via
`wait_for_async_writes!`, which is called automatically at the end of `run!`.

Each writer that supports async opts in by specializing `prepare_async_write`
(fetch + GPU→CPU) and `commit_async_write!` (disk write). Sync `write_output!`
runs both inline; async `write_output!` runs prepare inline and spawns commit.

Enabled via `asynchronous=true` keyword on `JLD2Writer` and `NetCDFWriter`;
default remains `false`. `Checkpointer` stays synchronous (infrequent and
correctness-critical). Tests verify byte-identical output between sync and
async modes for both writers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@glwagner glwagner marked this pull request as draft May 11, 2026 18:52
glwagner and others added 2 commits May 12, 2026 22:37
Two NetCDFWriter file-size doctests were off by 0.1 KiB and one
BoundaryConditionOperation doctest expected mean=1.0842e-19 where the
operation now produces exactly 0.0. Both predate this branch but block
the docs job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `slope_limiter` field now shows the fully qualified
`Oceananigans.TurbulenceClosures.FluxTapering{Float64}` rather than the
unqualified `FluxTapering{Float64}`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
glwagner and others added 7 commits May 13, 2026 09:38
File sizes printed by `show(::NetCDFWriter)` and `show(::JLD2Writer)` drift
by ~0.1 KiB across HDF5/NCDatasets minor version updates, breaking doctests
that aren't actually testing anything load-bearing about file size. Filter
the `file size: X.X UNIT` line so the doctests stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My local run shows the type printed with `Oceananigans.TurbulenceClosures.`
prefix but CI may produce the unqualified form depending on namespace
state. Revert the doctest to the unqualified form (the original) and add
a doctestfilter that strips the qualifier so both forms match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local doctest run passes (modulo GPU-only) but CI keeps failing.
Temporarily disable checkdocs to determine whether a missing-docstring
warning is the cause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The :none bisection didn't change the docs build outcome, so checkdocs
wasn't the issue. Restore the standard setting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After many failed iterations adjusting doctests, restore the doctests to
their pre-PR state. If docs still fails, the failure is rooted in this
PR's core code changes (not the doctest expected outputs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@glwagner glwagner marked this pull request as ready for review May 21, 2026 21:07
@glwagner glwagner requested a review from vopikamm May 21, 2026 21:07
glwagner and others added 3 commits May 21, 2026 15:09
# Conflicts:
#	ext/OceananigansNCDatasetsExt/OceananigansNCDatasetsExt.jl
#	src/Oceananigans.jl
Parameterize `ZarrWriter` by the I/O mode tag (`Synchronous` /
`Asynchronous`), specialize `prepare_async_write` / `commit_async_write!`
on the serial path, and gate `Synchronous` for distributed-MPI runs so
the MPI-collective write path stays on the main thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@glwagner glwagner changed the title Add asynchronous I/O for JLD2Writer and NetCDFWriter Add asynchronous I/O for JLD2Writer, NetCDFWriter, and ZarrWriter May 21, 2026
glwagner and others added 5 commits May 21, 2026 15:51
Each output writer gets a `commit_delay::Float64` field (default 0.0) that
`commit_async_write!` sleeps on. This is a test-only knob for injecting
deterministic latency into the disk-write phase.

The new `test_jld2_async_overlap` uses it to prove the main thread doesn't
block on disk I/O: with `commit_delay = 0.5s`, `write_output!` returns in
~4ms, the spawned task is still running afterwards, and
`wait_for_async_writes!` blocks for ~500ms. Together these rule out the
"main thread did the work" failure mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each writer gets a `commit_throws::Union{Nothing,Exception}` field (default
`nothing`). When set, `commit_async_write!` raises it — letting tests assert
that worker-task failures surface on the main thread at the next sync point.

The new `test_jld2_async_exception_propagation` exercises both paths:
explicit `wait_for_async_writes!` and the implicit wait at the start of the
next `write_output!`. After either, `writer.task` is cleared, so the writer
remains usable once the error is observed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Checkpointer` is parameterized by `Mode` and accepts an `asynchronous` kwarg.
Prepare materializes the prognostic state to CPU via `Adapt.adapt(Array, …)`
— the same GPU→CPU copy that the synchronous path triggers implicitly
during JLD2 serialization — and commit writes the CPU snapshot.

Async is gated to GPU models. On CPU it would buy nothing (no GPU→CPU work
to overlap) but cost an extra full-state copy, so `asynchronous=true` on a
CPU model warns and falls back to `Synchronous`, matching the
ZarrWriter+MPI pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@glwagner glwagner changed the title Add asynchronous I/O for JLD2Writer, NetCDFWriter, and ZarrWriter Add asynchronous I/O for JLD2Writer, NetCDFWriter, ZarrWriter, and Checkpointer May 21, 2026
On CPU, `convert_output` returns the input array unchanged when its eltype
matches `eltype(array_type)` — so async writes would race with the next
`time_step!` mutating that array. We now refuse to build such a writer at
construction time with an `ArgumentError` listing the offending outputs.

Triggered for any output that's an `AbstractField` (or `WindowedTimeAverage`
of one) whose eltype matches the writer's `array_type`. GPU is unaffected:
`convert_output` always copies via the `array_type(undef, ...)` allocation +
`copyto!` path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants