Skip to content

Commit c2c4d1f

Browse files
tomchorclaudeglwagner
authored
Fix broken LagrangianAveraged DynamicSmagorinsky regression tests (#5506)
* Fix broken LagrangianAveraged DynamicSmagorinsky regression tests Two bugs prevented the regression test from exercising the Lagrangian- averaged dynamic Smagorinsky closure at all: 1. Duplicate nested `for` loop in `test_nonhydrostatic_regression.jl` caused the inner loop to shadow the outer one, running each closure 16 times instead of 4 (with no effect on correctness, since the inner variable always won). 2. `ocean_large_eddy_simulation_regression_test.jl` restores model state via direct `.=` assignment rather than `set!`, so `initialize_closure_fields!` was never called with the loaded velocities. For LagrangianAveraging this left 𝒥ᴹᴹ = 0, which permanently disables the closure (ϵ = 0 → 𝒥ᴹᴹ never accumulates). Now calls `initialize_closure_fields!` after state is loaded so the closure is properly bootstrapped from the checkpoint velocities. Note: the LagrangianDynamicSmagorinsky regression test will correctly fail until reference data is regenerated with the fixed code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * comment out tests we're not testing * better fix * Restore previous_compute_time Ref for bit-exact LagAvg DynamicSmag regression The regression test loader previously only restored `Field`-typed leaves in `model.closure_fields`, skipping the `previous_compute_time::RefValue` scalar. After `set!(model, ...)`, that Ref was left at `clock.time = 0` (because `model.clock.time` is updated only later). At the first time step from the loaded checkpoint, `Δt_lagrangian = clock.time - 0` was catastrophically large, making `ϵ ≈ 1` in `_lagrangian_average_LM_MM!` and effectively discarding the carefully-restored 𝒥 history. Extends the closure-restoration loop to also copy `Base.RefValue` leaves, so `previous_compute_time` is set to the value from the checkpoint. With this, the LagrangianAveragedDynamicSmagorinsky regression test reproduces the reference iteration to machine epsilon (~1e-15) over 10 time steps. Also: - Auto-detect new (`simulation/model/...`) vs legacy (root) checkpoint paths in `get_fields_from_checkpoint`. - Add `load_interior(data, target_size)` helper that auto-detects how many halo layers are saved and unwraps OffsetArrays — removes the brittle hardcoded `[2:end-1, 2:end-1, 2:end-1]` slicing. - Return `closure_fields` (and `pNHS`, harmlessly) from `get_fields_from_checkpoint`. - Fix the regression-data-generation block (which had a stale `simulation.stop_iteration = ...` referring to a `simulation` that was never constructed) so it can be uncommented to regenerate reference data. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * force redownload of the data if the files have changed sizes * make it work on GPU * uncomment temporarily commented out regions * remove unnused function * Make rayleigh_benard regression's get_fields_from_checkpoint destructure explicit `get_fields_from_checkpoint` now returns 5 values (`solution, Gⁿ, G⁻, closure_fields, pNHS`). Rayleigh-Bénard only needs the first three; Julia's destructuring iterates and silently discards extras, so the old `solution, Gⁿ, G⁻ = get_fields_from_checkpoint(...)` works, but it's not self-documenting. Bind the unused values to `_` so the call site explicitly shows that two extras are being ignored. Addresses Copilot review comment on #5506. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
1 parent 707cf9e commit c2c4d1f

4 files changed

Lines changed: 129 additions & 57 deletions

File tree

test/data_dependencies.jl

Lines changed: 20 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -46,11 +46,28 @@ dh = DataDep("regression_truth_data",
4646

4747
DataDeps.register(dh)
4848

49-
# Invalidate stale DataDeps cache if new files are missing
49+
# Invalidate stale DataDeps cache when any expected reference file is missing
50+
# or has an unexpected size. Persistent CI caches (e.g. buildkite agents)
51+
# may hold reference data from previous artifact uploads; expected sizes are
52+
# the authoritative way to detect that.
53+
#
54+
# Sizes refer to the current files in glwagner/OceananigansArtifacts.jl
55+
# (data_for_regression_tests/). When regenerating reference data, update
56+
# these alongside the new upload.
57+
const _expected_reference_sizes = Dict(
58+
"ocean_large_eddy_simulation_DynamicSmagorinsky_directional_iteration10000.jld2" => 713110,
59+
"ocean_large_eddy_simulation_DynamicSmagorinsky_directional_iteration10010.jld2" => 713110,
60+
"ocean_large_eddy_simulation_DynamicSmagorinsky_lagrangian_iteration10000.jld2" => 1244676,
61+
"ocean_large_eddy_simulation_DynamicSmagorinsky_lagrangian_iteration10010.jld2" => 1244676,
62+
)
63+
5064
dd_path = try; datadep"regression_truth_data"; catch; nothing; end
5165
if dd_path !== nothing
52-
expected = joinpath(dd_path, "ocean_large_eddy_simulation_DynamicSmagorinsky_directional_iteration10000.jld2")
53-
if !isfile(expected)
66+
cache_stale = any(_expected_reference_sizes) do (filename, expected_size)
67+
path = joinpath(dd_path, filename)
68+
!isfile(path) || filesize(path) != expected_size
69+
end
70+
if cache_stale
5471
@info "Regression truth data cache is stale, re-downloading..."
5572
rm(dd_path; recursive=true)
5673
datadep"regression_truth_data"

test/regression_tests/ocean_large_eddy_simulation_regression_test.jl

Lines changed: 77 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,22 @@ using Oceananigans.TurbulenceClosures: AnisotropicMinimumDissipation, Lagrangian
22
using Oceananigans.TimeSteppers: update_state!
33
using Oceananigans.DistributedComputations: cpu_architecture, partition
44

5+
# Extract a window matching `target_size` from the center of `data`,
6+
# auto-detecting and stripping halo layers. Works for any saved halo
7+
# count (e.g. data saved as `parent(field)` with halo=1 is 18³,
8+
# halo=2 is 20³; both reduce to the same 16³ interior). Also unwraps
9+
# OffsetArrays via `parent` (which is what new-format Checkpointer
10+
# files contain). `parent` keeps the underlying storage backend
11+
# (Array or CuArray) intact — important for GPU correctness.
12+
function load_interior(data, target_size)
13+
arr = data isa AbstractArray ? parent(data) : data
14+
sz = size(arr)
15+
Hx = (sz[1] - target_size[1]) ÷ 2
16+
Hy = (sz[2] - target_size[2]) ÷ 2
17+
Hz = (sz[3] - target_size[3]) ÷ 2
18+
return arr[Hx+1:Hx+target_size[1], Hy+1:Hy+target_size[2], Hz+1:Hz+target_size[3]]
19+
end
20+
521
function run_ocean_large_eddy_simulation_regression_test(arch, grid_type, closure)
622
if first(closure) isa SmagorinskyLilly
723
name = "ocean_large_eddy_simulation_SmagorinskyLilly"
@@ -73,7 +89,7 @@ function run_ocean_large_eddy_simulation_regression_test(arch, grid_type, closur
7389
u₀(x, y, z) = sqrt(abs(Qᵘ)) * 1e-3 * Ξ(z)
7490
set!(model, u=u₀, w=u₀, T=T₀, S=35)
7591
76-
simulation.stop_iteration = spinup_steps-test_steps
92+
simulation = Simulation(model, Δt=Δt, stop_iteration=spinup_steps-test_steps)
7793
run!(simulation)
7894
7995
checkpointer = Checkpointer(model, schedule = IterationInterval(test_steps), prefix = name,
@@ -93,35 +109,31 @@ function run_ocean_large_eddy_simulation_regression_test(arch, grid_type, closur
93109
datadep_path = "regression_truth_data/" * name * "_iteration$spinup_steps.jld2"
94110
initial_filename = @datadep_str datadep_path
95111

96-
solution₀, Gⁿ₀, G⁻₀ = get_fields_from_checkpoint(initial_filename)
112+
solution₀, Gⁿ₀, G⁻₀, closure₀, pNHS₀ = get_fields_from_checkpoint(initial_filename)
97113

98114
Nz = grid.Nz
99115

100116
cpu_arch = cpu_architecture(architecture(grid))
101117

102-
u₀ = partition(ArrayType(solution₀.u[2:end-1, 2:end-1, 2:end-1]), cpu_arch, size(u))
103-
v₀ = partition(ArrayType(solution₀.v[2:end-1, 2:end-1, 2:end-1]), cpu_arch, size(v))
104-
w₀ = partition(ArrayType(solution₀.w[2:end-1, 2:end-1, 2:end-1]), cpu_arch, size(w))
105-
T₀ = partition(ArrayType(solution₀.T[2:end-1, 2:end-1, 2:end-1]), cpu_arch, size(T))
106-
S₀ = partition(ArrayType(solution₀.S[2:end-1, 2:end-1, 2:end-1]), cpu_arch, size(S))
107-
108-
Gⁿu₀ = partition(ArrayType(Gⁿ₀.u)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(u))
109-
Gⁿv₀ = partition(ArrayType(Gⁿ₀.v)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(v))
110-
Gⁿw₀ = partition(ArrayType(Gⁿ₀.w)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(w))
111-
GⁿT₀ = partition(ArrayType(Gⁿ₀.T)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(T))
112-
GⁿS₀ = partition(ArrayType(Gⁿ₀.S)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(S))
113-
114-
G⁻u₀ = partition(ArrayType(G⁻₀.u)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(u))
115-
G⁻v₀ = partition(ArrayType(G⁻₀.v)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(v))
116-
G⁻w₀ = partition(ArrayType(G⁻₀.w)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(w))
117-
G⁻T₀ = partition(ArrayType(G⁻₀.T)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(T))
118-
G⁻S₀ = partition(ArrayType(G⁻₀.S)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(S))
119-
120-
interior(model.velocities.u) .= u₀
121-
interior(model.velocities.v) .= v₀
122-
interior(model.velocities.w) .= w₀
123-
interior(model.tracers.T) .= T₀
124-
interior(model.tracers.S) .= S₀
118+
u₀ = partition(ArrayType(load_interior(solution₀.u, size(u))), cpu_arch, size(u))
119+
v₀ = partition(ArrayType(load_interior(solution₀.v, size(v))), cpu_arch, size(v))
120+
w₀ = partition(ArrayType(load_interior(solution₀.w, size(w))), cpu_arch, size(w))
121+
T₀ = partition(ArrayType(load_interior(solution₀.T, size(T))), cpu_arch, size(T))
122+
S₀ = partition(ArrayType(load_interior(solution₀.S, size(S))), cpu_arch, size(S))
123+
124+
Gⁿu₀ = partition(load_interior(ArrayType(Gⁿ₀.u), size(u)), cpu_arch, size(u))
125+
Gⁿv₀ = partition(load_interior(ArrayType(Gⁿ₀.v), size(v)), cpu_arch, size(v))
126+
Gⁿw₀ = partition(load_interior(ArrayType(Gⁿ₀.w), size(w)), cpu_arch, size(w))
127+
GⁿT₀ = partition(load_interior(ArrayType(Gⁿ₀.T), size(T)), cpu_arch, size(T))
128+
GⁿS₀ = partition(load_interior(ArrayType(Gⁿ₀.S), size(S)), cpu_arch, size(S))
129+
130+
G⁻u₀ = partition(load_interior(ArrayType(G⁻₀.u), size(u)), cpu_arch, size(u))
131+
G⁻v₀ = partition(load_interior(ArrayType(G⁻₀.v), size(v)), cpu_arch, size(v))
132+
G⁻w₀ = partition(load_interior(ArrayType(G⁻₀.w), size(w)), cpu_arch, size(w))
133+
G⁻T₀ = partition(load_interior(ArrayType(G⁻₀.T), size(T)), cpu_arch, size(T))
134+
G⁻S₀ = partition(load_interior(ArrayType(G⁻₀.S), size(S)), cpu_arch, size(S))
135+
136+
set!(model, u=u₀, v=v₀, w=w₀, T=T₀, S=S₀)
125137

126138
interior(model.timestepper.Gⁿ.u) .= Gⁿu₀
127139
interior(model.timestepper.Gⁿ.v) .= Gⁿv₀
@@ -135,6 +147,40 @@ function run_ocean_large_eddy_simulation_regression_test(arch, grid_type, closur
135147
interior(model.timestepper.G⁻.T) .= G⁻T₀
136148
interior(model.timestepper.G⁻.S) .= G⁻S₀
137149

150+
# Restore closure prognostic state (𝒥ᴸᴹ, 𝒥ᴹᴹ, ..., previous_compute_time)
151+
# so the closure picks up where the spinup left off rather than bootstrapping
152+
# fresh. Saved structure: Tuple (one entry per closure) of NamedTuples whose
153+
# leaves are either (data = OffsetArray,) Field structs or scalar Refs.
154+
if closure₀ !== nothing
155+
for (saved, current) in zip(closure₀, model.closure_fields)
156+
saved isa NamedTuple || continue
157+
for name in keys(saved)
158+
hasproperty(current, name) || continue
159+
target = getproperty(current, name)
160+
leaf = saved[name]
161+
if target isa Oceananigans.Fields.Field
162+
arr = leaf isa NamedTuple && haskey(leaf, :data) ? leaf.data :
163+
leaf isa AbstractArray ? leaf : nothing
164+
arr === nothing && continue
165+
# ArrayType(...) transfers to the field's backend (CPU/GPU)
166+
# before broadcast-assigning to the GPU/CPU interior.
167+
interior(target) .= ArrayType(load_interior(arr, size(target)))
168+
elseif target isa Base.RefValue
169+
# e.g. previous_compute_time — restore the scalar so
170+
# Δt_lagrangian = clock.time - previous_compute_time
171+
# matches the running simulation on the very next step.
172+
target[] = leaf
173+
end
174+
end
175+
end
176+
end
177+
178+
# Restore non-hydrostatic pressure so the next step's velocity
179+
# correction starts from the same pressure field as the reference.
180+
if pNHS₀ !== nothing
181+
interior(model.pressures.pNHS) .= ArrayType(load_interior(pNHS₀, size(model.pressures.pNHS)))
182+
end
183+
138184
model.clock.time = spinup_steps * Δt
139185
model.clock.iteration = spinup_steps
140186

@@ -148,19 +194,19 @@ function run_ocean_large_eddy_simulation_regression_test(arch, grid_type, closur
148194
datadep_path = "regression_truth_data/" * name * "_iteration$(spinup_steps+test_steps).jld2"
149195
final_filename = @datadep_str datadep_path
150196

151-
solution₁, Gⁿ₁, G⁻₁ = get_fields_from_checkpoint(final_filename)
197+
solution₁, Gⁿ₁, G⁻₁, _, _ = get_fields_from_checkpoint(final_filename)
152198

153199
test_fields = @allowscalar (u = Array(interior(model.velocities.u)),
154200
v = Array(interior(model.velocities.v)),
155201
w = Array(interior(model.velocities.w)[:, :, 1:nz]),
156202
T = Array(interior(model.tracers.T)),
157203
S = Array(interior(model.tracers.S)))
158204

159-
u₁ = partition(Array(solution₁.u)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(u))
160-
v₁ = partition(Array(solution₁.v)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(v))
161-
w₁ = partition(Array(solution₁.w)[2:end-1, 2:end-1, 2:end-2], cpu_arch, size(test_fields.w))
162-
T₁ = partition(Array(solution₁.T)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(T))
163-
S₁ = partition(Array(solution₁.S)[2:end-1, 2:end-1, 2:end-1], cpu_arch, size(S))
205+
u₁ = partition(load_interior(solution₁.u, size(u)), cpu_arch, size(u))
206+
v₁ = partition(load_interior(solution₁.v, size(v)), cpu_arch, size(v))
207+
w₁ = partition(load_interior(solution₁.w, size(test_fields.w)), cpu_arch, size(test_fields.w))
208+
T₁ = partition(load_interior(solution₁.T, size(T)), cpu_arch, size(T))
209+
S₁ = partition(load_interior(solution₁.S, size(S)), cpu_arch, size(S))
164210

165211
@show size(test_fields.w), size(w₁)
166212

test/regression_tests/rayleigh_benard_regression_test.jl

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -104,7 +104,7 @@ function run_rayleigh_benard_regression_test(arch, grid_type)
104104
datadep_path = "regression_truth_data/" * prefix * "_iteration$spinup_steps.jld2"
105105
initial_filename = @datadep_str datadep_path
106106

107-
solution₀, Gⁿ₀, G⁻₀ = get_fields_from_checkpoint(initial_filename)
107+
solution₀, Gⁿ₀, G⁻₀, _, _ = get_fields_from_checkpoint(initial_filename)
108108

109109
cpu_arch = cpu_architecture(architecture(grid))
110110

@@ -156,7 +156,7 @@ function run_rayleigh_benard_regression_test(arch, grid_type)
156156
datadep_path = "regression_truth_data/" * prefix * "_iteration$(spinup_steps+test_steps).jld2"
157157
final_filename = @datadep_str datadep_path
158158

159-
solution₁, Gⁿ₁, G⁻₁ = get_fields_from_checkpoint(final_filename)
159+
solution₁, Gⁿ₁, G⁻₁, _, _ = get_fields_from_checkpoint(final_filename)
160160

161161
test_fields = @allowscalar (u = Array(interior(model.velocities.u)),
162162
v = Array(interior(model.velocities.v)),

test/test_nonhydrostatic_regression.jl

Lines changed: 30 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -8,37 +8,51 @@ using Oceananigans.TurbulenceClosures: LagrangianAveraging
88
function get_fields_from_checkpoint(filename)
99
file = jldopen(filename)
1010

11-
tracers = keys(file["tracers"])
11+
# Auto-detect format: new Checkpointer saves under "simulation/model/...",
12+
# legacy reference data has fields at the file root.
13+
base = haskey(file, "simulation") ? "simulation/model/" : ""
14+
15+
tracers = keys(file["$(base)tracers"])
1216
tracers = Tuple(Symbol(c) for c in tracers)
1317

14-
velocity_fields = (u = file["velocities/u/data"],
15-
v = file["velocities/v/data"],
16-
w = file["velocities/w/data"])
18+
velocity_fields = (u = file["$(base)velocities/u/data"],
19+
v = file["$(base)velocities/v/data"],
20+
w = file["$(base)velocities/w/data"])
1721

1822
tracer_fields =
19-
NamedTuple{tracers}(Tuple(file["tracers/$c/data"] for c in tracers))
23+
NamedTuple{tracers}(Tuple(file["$(base)tracers/$c/data"] for c in tracers))
2024

21-
current_tendency_velocity_fields = (u = file["timestepper/Gⁿ/u/data"],
22-
v = file["timestepper/Gⁿ/v/data"],
23-
w = file["timestepper/Gⁿ/w/data"])
25+
current_tendency_velocity_fields = (u = file["$(base)timestepper/Gⁿ/u/data"],
26+
v = file["$(base)timestepper/Gⁿ/v/data"],
27+
w = file["$(base)timestepper/Gⁿ/w/data"])
2428

2529
current_tendency_tracer_fields =
26-
NamedTuple{tracers}(Tuple(file["timestepper/Gⁿ/$c/data"] for c in tracers))
30+
NamedTuple{tracers}(Tuple(file["$(base)timestepper/Gⁿ/$c/data"] for c in tracers))
2731

28-
previous_tendency_velocity_fields = (u = file["timestepper/G⁻/u/data"],
29-
v = file["timestepper/G⁻/v/data"],
30-
w = file["timestepper/G⁻/w/data"])
32+
previous_tendency_velocity_fields = (u = file["$(base)timestepper/G⁻/u/data"],
33+
v = file["$(base)timestepper/G⁻/v/data"],
34+
w = file["$(base)timestepper/G⁻/w/data"])
3135

3236
previous_tendency_tracer_fields =
33-
NamedTuple{tracers}(Tuple(file["timestepper/G⁻/$c/data"] for c in tracers))
37+
NamedTuple{tracers}(Tuple(file["$(base)timestepper/G⁻/$c/data"] for c in tracers))
38+
39+
# Closure prognostic fields. Only present in new-format checkpoints.
40+
# JLD2 fully deserializes these into the original Tuple/NamedTuple
41+
# structure, so we just return the loaded object as-is.
42+
closure_fields = haskey(file, "$(base)closure_fields") ? file["$(base)closure_fields"] : nothing
43+
44+
# Non-hydrostatic pressure — without restoring this, the next step's
45+
# velocity correction (u -= Δt * ∇p) starts from zero pressure and
46+
# diverges from the running-state reference at O(Δt) per step.
47+
pNHS = haskey(file, "$(base)pressures/pNHS/data") ? file["$(base)pressures/pNHS/data"] : nothing
3448

3549
close(file)
3650

3751
solution = merge(velocity_fields, tracer_fields)
3852
Gⁿ = merge(current_tendency_velocity_fields, current_tendency_tracer_fields)
3953
G⁻ = merge(previous_tendency_velocity_fields, previous_tendency_tracer_fields)
4054

41-
return solution, Gⁿ, G⁻
55+
return solution, Gⁿ, G⁻, closure_fields, pNHS
4256
end
4357

4458
include("regression_tests/thermal_bubble_regression_test.jl")
@@ -70,17 +84,12 @@ include("regression_tests/ocean_large_eddy_simulation_regression_test.jl")
7084
dyn_smag_directional = (DynamicSmagorinsky(averaging=(1, 2)),)
7185
dyn_smag_lagrangian = (DynamicSmagorinsky(averaging=LagrangianAveraging()),)
7286

73-
for (closurename, closure) in [("AnisotropicMinimumDissipation", amd_closure),
74-
("SmagorinskyLilly", smag_closure),
75-
("DirectionalDynamicSmagorinsky", dyn_smag_directional),
76-
("LagrangianDynamicSmagorinsky", dyn_smag_lagrangian)]
7787
for (closurename, closure) in [("AnisotropicMinimumDissipation", amd_closure),
7888
("SmagorinskyLilly", smag_closure),
7989
("DirectionalDynamicSmagorinsky", dyn_smag_directional),
8090
("LagrangianDynamicSmagorinsky", dyn_smag_lagrangian)]
81-
@info " Testing oceanic large eddy simulation regression [$A, $closurename, $grid_type grid]"
82-
run_ocean_large_eddy_simulation_regression_test(arch, grid_type, closure)
83-
end
91+
@info " Testing oceanic large eddy simulation regression [$A, $closurename, $grid_type grid]"
92+
run_ocean_large_eddy_simulation_regression_test(arch, grid_type, closure)
8493
end
8594
end
8695
end

0 commit comments

Comments
 (0)