
Adaptive implicit vertical advection#5604

Open
simone-silvestri wants to merge 41 commits into main from ss/adaptive-implicit-vertical-advection

Collaborator

@simone-silvestri simone-silvestri commented May 18, 2026

This PR implements a new advection configuration that treats part of the vertical advection implicitly. It was first written as an AdaptiveImplicitVerticalAdvection that wraps around an advection scheme (for example, scheme = AdaptiveImplicitVerticalAdvection(explicit_scheme = WENO(), cfl = 0.7)). It is now constructed by adding a time_discretization kwarg to the advection schemes (ExplicitTimeDiscretization() by default), and the implicit treatment is enabled by passing, for example, AdaptiveImplicitTimeDiscretization(; maximum_explicit_cfl):

WENO(; time_discretization = AdaptiveImplicitTimeDiscretization(cfl = 0.5))

In particular, it splits the advection into an explicit part, which advects with the chosen scheme using a vertical velocity we = w * max(cfl / scheme.cfl), and an implicit part, which advects with a first-order upwind scheme using the remaining velocity wi = w - we.
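A minimal sketch of such a split, not the PR's implementation. It assumes the explicit part is capped so that its CFL number never exceeds a chosen maximum, which is only one plausible reading of the formula above. The function name `split_vertical_velocity` and its arguments are hypothetical.

```julia
# Hypothetical helper, not part of Oceananigans: split a vertical velocity w
# into an explicit part we and an implicit remainder wi = w - we.
# Assumption: the explicit part is limited so that |we| * Δt / Δz <= max_cfl.
function split_vertical_velocity(w, Δt, Δz, max_cfl)
    local_cfl = abs(w) * Δt / Δz
    # Fraction of w that can be advected explicitly without exceeding max_cfl
    fraction = local_cfl > max_cfl ? max_cfl / local_cfl : one(local_cfl)
    we = w * fraction
    wi = w - we
    return we, wi
end

# Numbers loosely based on the validation log: local CFL of 10, explicit limit of 0.5
we, wi = split_vertical_velocity(1.0, 2.4e-3, 2.4e-4, 0.5)

# Where the local CFL is already below the limit, nothing is treated implicitly
we_small, wi_small = split_vertical_velocity(1.0, 1.0e-5, 2.4e-4, 0.5)
```

With this kind of limiter the two parts always sum to `w`, so the split changes how the transport is discretized in time but not the total velocity that advects the field.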

We reuse the implicit tridiagonal solver to advect implicitly. A caveat (for the moment) is that the vertical velocity itself does not participate in this split. A comparison between a reference, a fully explicit scheme, and AIVA can be found in the validation/implicit_vertical_advection folder. It advects a Gaussian; these are the results:

julia> include("validation/implicit_vertical_advection/comparison_implicit_explicit_advection.jl"); fig
[ Info: Δz_min = 0.00023788883447073417, w₀ = 1.0
[ Info: AIVA Δt = 0.0023788883447073417 (CFL  10.0), 168 steps
[ Info: Ref  Δτ = 0.00011894441723536708 (CFL  0.5), 3363 steps

=== AIVA stability + correctness validation ===
grid: Nz=128, Δz_min=2.379e-04, Δz_max=2.355e-02
AIVA Δt = 2.3789e-03 (CFL  10.00), reference Δτ = 1.1894e-04 (CFL  0.50)
Final time = 3.9965e-01 (168 AIVA steps = 3360 reference steps)

--- Stability ---
Explicit UpwindBiased(order=5) at AIVA Δt: BLEW UP at step 12 (t = 2.855e-02)
AIVA at large Δt: completed 168 steps, max|c| = 9.965e-01 (initial 9.998e-01)

--- Correctness (vs analytical translation) ---
Analytical: max|c| = 9.988e-01, mass = 1.4122e-01, centroid = -0.1510 (target -0.1503)
Reference:  max|c| = 9.966e-01, mass = 1.4124e-01, centroid = -0.1510, L²-rel = 2.008e-03, L∞ = 2.209e-03
AIVA:       max|c| = 9.965e-01, mass = 1.5073e-01, centroid = -0.1447, L²-rel = 9.845e-02, L∞ = 1.267e-01
AIVA vs reference:= 3.104e-02

--- Mass conservation (∫c dz) ---
Initial:        AIVA = 1.418014e-01, reference = 1.418014e-01
Final:          AIVA = 1.507308e-01, reference = 1.412434e-01
Drift (final/initial - 1): AIVA = +6.297e-02, reference = -3.935e-03
Min/max during AIVA run: 1.418014e-01 / 1.507308e-01

and the figure it produces:
image
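For readers unfamiliar with the implicit part, the following is a standalone sketch of one backward-Euler step of first-order upwind advection on a uniform 1D grid, solved as a tridiagonal system. It uses Julia's `LinearAlgebra.Tridiagonal` rather than the Oceananigans solver. The function name, the uniform spacing, the constant velocity, and the zero-inflow boundary treatment are all assumptions made for illustration.

```julia
using LinearAlgebra

# Hypothetical standalone example, not the Oceananigans solver.
# One backward-Euler step of first-order upwind advection, ∂c/∂t + w ∂c/∂z = 0,
# with constant w on a uniform grid of spacing Δz and zero inflow at the boundary.
function implicit_upwind_step(c, w, Δt, Δz)
    N = length(c)
    sigma_up   = max(w, 0) * Δt / Δz    # upward flow couples cell k to cell k - 1
    sigma_down = max(-w, 0) * Δt / Δz   # downward flow couples cell k to cell k + 1
    lower = fill(-sigma_up, N - 1)
    diag  = fill(1 + sigma_up + sigma_down, N)
    upper = fill(-sigma_down, N - 1)
    A = Tridiagonal(lower, diag, upper)
    return A \ c
end

# A top-hat profile advected at CFL = 10, far beyond the explicit stability limit
c0 = [(20 <= k <= 30) ? 1.0 : 0.0 for k in 1:128]
c1 = implicit_upwind_step(c0, 1.0, 10.0, 1.0)
```

The step stays bounded at any CFL number, but first-order upwinding is diffusive, which is consistent with the AIVA run above being stable while showing larger errors than the small-step reference.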

Comment thread src/Advection/adaptive_implicit_vertical_advection.jl Outdated
@glwagner
Member

glwagner commented May 19, 2026

Let's avoid a wrapper if at all possible (instead you can add a property to the model). Using a wrapper makes the user interface hard; we can't make it a default option easily.

Generally wrappers should be a method of last resort, only if they are ABSOLUTELY the worst of many other evil options, in my opinion.

@navidcy navidcy added the numerics 🧮 So things don't blow up and boil the lobsters alive label May 19, 2026
@simone-silvestri
Collaborator Author

I would avoid a property in the model, it's an advection scheme property. Another option is to add an ExplicitTimeDiscretization or VerticallyImplicitTimeDiscretization directly to the advection scheme, even if the implicit component would be only first order upwind.

@glwagner
Member

> I would avoid a property in the model, it's an advection scheme property. Another option is to add an ExplicitTimeDiscretization or VerticallyImplicitTimeDiscretization directly to the advection scheme, even if the implicit component would be only first order upwind.

It's really a model property, because it has to be supported by changing the whole algorithm (ie this cannot be implemented for models that do not have tridiagonal solvers already)

@simone-silvestri
Collaborator Author

Making it a model property leads to the usual design problem of having

HydrostaticFreeSurfaceModel(; advection = nothing, implicit_advection = true)

which creates conflicting keyword arguments. I like the wrapper approach, which I feel is also quite easy and maintainable, but this is a personal preference. The implicit solver is a timestepper property, not a model property, so if we want to be completely clean in the design, then the time discretization kwarg should belong to the timestepper. However, this requires a redesign of the closures, which follow the design I am proposing here for the advection.

@simone-silvestri
Collaborator Author

So I would be OK with mimicking the closures' approach, where the "time discretization" belongs to the advection scheme.

@glwagner
Member

> Making it a model property leads to the usual design problem of having
>
> HydrostaticFreeSurfaceModel(; advection = nothing, implicit_advection = true)
>
> which creates conflicting keyword arguments. I like the wrapper approach, which I feel is also quite easy and maintainable, but this is a personal preference. The implicit solver is a timestepper property, not a model property, so if we want to be completely clean in the design, then the time discretization kwarg should belong to the timestepper. However, this requires a redesign of the closures, which follow the design I am proposing here for the advection.

The point is that this is a FAR lesser evil than the wrapper.

@glwagner
Member

Wrappers have created a lot of issues in the past (compared to a simple validation check) and also impact compile time

@simone-silvestri
Collaborator Author

Otherwise, I am also OK with taking the more principled route of redesigning the timestepper to include the vertical time discretization, but that will require a couple more PRs.

@glwagner
Member

> I would avoid a property in the model, it's an advection scheme property. Another option is to add an ExplicitTimeDiscretization or VerticallyImplicitTimeDiscretization directly to the advection scheme, even if the implicit component would be only first order upwind.

I'm ok with embedding in the advection scheme.

Note, validation is required for ANY approach, so the problem of "invalid combinations" is not an actual differentiator.
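To illustrate the kind of "simple validation check" discussed above, here is a sketch of how an unsupported combination could be rejected at construction time. The function `validate_implicit_advection` and the stand-in types are hypothetical and exist only for this example; they are not Oceananigans API.

```julia
# Stand-in types for illustration only (hypothetical, not Oceananigans API).
abstract type AbstractTimeDiscretization end
struct ExplicitTimeDiscretization <: AbstractTimeDiscretization end
struct AdaptiveImplicitTimeDiscretization <: AbstractTimeDiscretization
    cfl :: Float64
end

struct ToyScheme{TD <: AbstractTimeDiscretization}
    time_discretization :: TD
end

# No advection scheme: there is nothing to check.
validate_implicit_advection(::Nothing, has_tridiagonal_solver) = nothing

# Reject implicit vertical advection when the model has no tridiagonal solver.
function validate_implicit_advection(scheme::ToyScheme, has_tridiagonal_solver)
    implicit = scheme.time_discretization isa AdaptiveImplicitTimeDiscretization
    if implicit && !has_tridiagonal_solver
        throw(ArgumentError("implicit vertical advection requires a tridiagonal solver"))
    end
    return nothing
end
```

Because the time discretization lives inside the scheme in this sketch, `advection = nothing` has no implicit option to conflict with, and the only remaining check is whether the model can support the implicit solve.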

@glwagner
Member

glwagner commented May 19, 2026

> Otherwise, I am also OK with taking the more principled route of redesigning the timestepper to include the vertical time discretization, but that will require a couple more PRs.

I don't think you need to change the time-stepper here, or at least our discussion does not bear on whether or not to change the timestepper.

@github-actions
Contributor

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 58051491.205 57800437.820 +0.4%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 26427421.952 20318510.690 +30.1%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 60263892.743 60263840.425 +0%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 73317686.595 73336721.808 -0%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 101080235.96 101189344.41 -0.1%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 40931141.269 40882213.754 +0.1%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 12490825.024 12495084.490 -0%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 85682391.282 87057497.431 -1.6%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 65251253.789 65230590.323 +0%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 44540466.394 43184925.771 +3.1%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46219177.112 46219034.375 +0%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 52969904.663 52912412.764 +0.1%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 51784968.362 51730971.217 +0.1%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56417990.502 56400090.285 +0%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 59851618.925 59840902.466 +0%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 54164230.858 54131327.146 +0.1%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu_compute_hydrostatic_free_surface_Gu_ 2.454 2.454 +0.0% 318 2.461 2.447 2.676
gpu_compute_hydrostatic_free_surface_Gv_ 2.171 2.170 +0.0% 318 2.177 2.164 2.366
gpu__rk_substep_turbulent_kinetic_energy_ 1.990 1.990 +0.0% 315 1.994 1.986 2.172
gpu_compute_CATKE_closure_fields_ 1.590 1.590 +0.0% 318 1.594 1.582 1.724
gpu_compute_hydrostatic_free_surface_Gc_ 0.984 0.979 +0.6% 315 0.987 0.980 1.079
gpu_compute_hydrostatic_free_surface_Gc_ 0.979 0.979 +0.1% 315 0.982 0.976 1.073
gpu_compute_hydrostatic_free_surface_Gc_ 0.978 0.979 -0.0% 315 0.981 0.974 1.069
gpu__compute_w_from_continuity_ 0.340 0.339 +0.5% 633 0.340 0.335 0.350
gpu_compute_TKE_diffusivity_ 0.641 0.641 -0.0% 315 0.643 0.635 0.691
gpu__compute_split_explicit_transport_velocities_ 0.487 0.487 +0.0% 315 0.487 0.483 0.492

@simone-silvestri
Collaborator Author

@glwagner this PR should be ready for review. (I have also checked that including the time discretization in the advection schemes causes no performance regression.)

@github-actions
Contributor

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 57775227.818 55973325.271 +3.2%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 25546645.407 25107471.563 +1.7%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 60238189.470 60262283.425 -0%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 73198783.587 72789115.797 +0.6%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 101262897.50 100976485.78 +0.3%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 40848828.092 40777398.431 +0.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 12497678.867 12496401.683 +0%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 87074966.783 86460791.435 +0.7%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 65186814.804 65122910.231 +0.1%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 44525282.072 43146090.679 +3.2%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46220036.538 46210335.682 +0%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 52089518.662 53062667.053 -1.8%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 51800471.035 51781174.539 +0%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56433668.370 56409345.874 +0%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 59847051.992 59589003.848 +0.4%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 54158988.286 53754758.366 +0.8%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu_compute_hydrostatic_free_surface_Gu_ 2.454 2.454 +0.0% 318 2.463 2.449 2.679
gpu_compute_hydrostatic_free_surface_Gv_ 2.170 2.171 -0.0% 318 2.178 2.163 2.364
gpu__rk_substep_turbulent_kinetic_energy_ 1.989 1.990 -0.0% 315 1.996 1.985 2.171
gpu_compute_CATKE_closure_fields_ 1.591 1.590 +0.0% 318 1.595 1.584 1.726
gpu_compute_hydrostatic_free_surface_Gc_ 0.985 0.979 +0.6% 315 0.988 0.980 1.077
gpu_compute_hydrostatic_free_surface_Gc_ 0.980 0.979 +0.1% 315 0.983 0.976 1.072
gpu_compute_hydrostatic_free_surface_Gc_ 0.979 0.979 -0.0% 315 0.982 0.974 1.071
gpu__compute_w_from_continuity_ 0.338 0.338 -0.2% 633 0.338 0.333 0.346
gpu_compute_TKE_diffusivity_ 0.641 0.641 +0.0% 315 0.643 0.634 0.690
gpu__compute_split_explicit_transport_velocities_ 0.487 0.487 +0.0% 315 0.487 0.483 0.491

Comment thread src/Advection/time_discretization.jl Outdated

A fully-explicit time-discretization of a `TurbulenceClosure`.
"""
struct ExplicitTimeDiscretization <: AbstractTimeDiscretization end
Member

There is also an ETD in TurbulenceClosures? Should this go somewhere more general?

Collaborator Author

This is the same one that is in TurbulenceClosures, I moved it up to Advection.

Member

It's weird that "VerticallyImplicitTimeDiscretization" (used for closures) is defined in Advection

Member

I suggest putting this into the TimeSteppers module, which is loaded before Advection:

include("TimeSteppers/TimeSteppers.jl")
include("Advection/Advection.jl")

Collaborator Author

sounds good to me!
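The load-order argument in this thread can be sketched with two stand-in modules: types defined in the module that is loaded first are available to the one loaded afterwards. The module and type names ending in `Sketch` or starting with `Toy` are hypothetical, and the real modules live in separate files.

```julia
# Stand-in for the module that is included first.
module TimeSteppersSketch
    export AbstractTimeDiscretization, ExplicitTimeDiscretization, VerticallyImplicitTimeDiscretization
    abstract type AbstractTimeDiscretization end
    struct ExplicitTimeDiscretization <: AbstractTimeDiscretization end
    struct VerticallyImplicitTimeDiscretization <: AbstractTimeDiscretization end
end

# Stand-in for the module that is included afterwards, so it can import the types above.
module AdvectionSketch
    using ..TimeSteppersSketch: AbstractTimeDiscretization, ExplicitTimeDiscretization

    struct ToyUpwind{TD <: AbstractTimeDiscretization}
        time_discretization :: TD
    end

    # Keyword constructor with the explicit discretization as the default
    ToyUpwind(; time_discretization = ExplicitTimeDiscretization()) = ToyUpwind(time_discretization)
end

scheme = AdvectionSketch.ToyUpwind()
```

A closures module loaded later could import the same types in the same way, so neither the advection nor the closure code would own a definition that the other one needs.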

Comment thread src/Advection/time_discretization.jl Outdated
@inline bias(u::Number) = ifelse(u > 0, LeftBias, RightBias)

- @inline function advective_momentum_flux_Uu(i, j, k, grid, scheme::UpwindScheme, U, u)
+ @inline function advective_momentum_flux_Uu(i, j, k, grid, scheme::UpwindScheme, ::ETD, U, u)
Member

I'm slightly confused by this. The time discretization is also inside scheme, right?

Collaborator Author

yeah but this is to dispatch, so we unwrap the time_discretization(scheme) and these fluxes are the "Explicit" ones.

Member

another strategy is to define ExplicitUpwindScheme right? This seems a bit simpler. Curious what you think.

Collaborator Author

we could also do like the closures do and just reroute advective_momentum_flux_Uu(i, j, k, grid, scheme, ::ETD, U, u) to advective_momentum_flux_Uu(i, j, k, grid, scheme, U, u) and keep the same code in this file. This would allow us to avoid aliases

Member

Oh I see, is that the benefit of using the extra argument? I don't completely grasp the totality of this design

Collaborator Author

@simone-silvestri simone-silvestri May 21, 2026

The idea here is that _advective_flux (once the order is decided) reroutes to advective_flux(..., td, ...) so that we select what happens. If the scheme is adaptive vertically implicit, we have the extra step of computing the "explicit" vertical velocity; otherwise it is business as usual. It is true that ETD is business as usual, so we can remove the extra argument in the ETD case, which leaves the previous functions untouched. This would basically be the same design as

@inline diffusive_flux_x(i, j, k, grid, ::ATD, args...) = diffusive_flux_x(i, j, k, grid, args...)
@inline diffusive_flux_y(i, j, k, grid, ::ATD, args...) = diffusive_flux_y(i, j, k, grid, args...)
@inline diffusive_flux_z(i, j, k, grid, ::ATD, args...) = diffusive_flux_z(i, j, k, grid, args...)
@inline viscous_flux_ux(i, j, k, grid, ::ATD, args...) = viscous_flux_ux(i, j, k, grid, args...)
@inline viscous_flux_uy(i, j, k, grid, ::ATD, args...) = viscous_flux_uy(i, j, k, grid, args...)
@inline viscous_flux_uz(i, j, k, grid, ::ATD, args...) = viscous_flux_uz(i, j, k, grid, args...)
@inline viscous_flux_vx(i, j, k, grid, ::ATD, args...) = viscous_flux_vx(i, j, k, grid, args...)

In the current state the difference is that we are not removing the extra argument.

Member

So in the current state, if we want to define a new advective flux, we define it for the "ETD" case with explicit dispatch -- is that right?

Collaborator Author

Yes, that would be the route. Otherwise, if we add the rerouting that removes the extra argument, we do not need to explicitly specify ETD.
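The rerouting discussed in this thread can be sketched with toy types. This is an illustration of the dispatch pattern only: `ToyUpwind`, `toy_flux_z`, `_toy_flux_z`, and the placeholder velocity limiter are hypothetical and are not the PR's code.

```julia
# Toy types for illustration (hypothetical names, not the PR's code).
abstract type AbstractTimeDiscretization end
struct ExplicitTimeDiscretization <: AbstractTimeDiscretization end
struct AdaptiveImplicitTimeDiscretization <: AbstractTimeDiscretization
    cfl :: Float64
end
const ETD = ExplicitTimeDiscretization

struct ToyUpwind{TD}
    time_discretization :: TD
end
time_discretization(scheme) = scheme.time_discretization

# "Business as usual" flux with no time-discretization argument: first-order upwind.
@inline toy_flux_z(scheme::ToyUpwind, w::Number, c_below, c_above) = w * ifelse(w > 0, c_below, c_above)

# Entry point: unwrap the time discretization and dispatch on it.
@inline _toy_flux_z(scheme, args...) = toy_flux_z(scheme, time_discretization(scheme), args...)

# ETD reroutes to the untouched explicit method by dropping the extra argument.
@inline toy_flux_z(scheme, ::ETD, args...) = toy_flux_z(scheme, args...)

# The adaptive case first limits the velocity, then reuses the explicit flux.
@inline function toy_flux_z(scheme, td::AdaptiveImplicitTimeDiscretization, w, c_below, c_above)
    we = clamp(w, -td.cfl, td.cfl)   # placeholder limiter, assumes Δt / Δz = 1
    return toy_flux_z(scheme, we, c_below, c_above)
end

explicit_flux = _toy_flux_z(ToyUpwind(ETD()), 2.0, 1.0, 3.0)
adaptive_flux = _toy_flux_z(ToyUpwind(AdaptiveImplicitTimeDiscretization(0.5)), 2.0, 1.0, 3.0)
```

In this arrangement a new flux only needs the method without the time-discretization argument, since the `ETD` fallback forwards to it; only the adaptive case needs its own method.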

Comment thread src/Advection/time_discretization.jl Outdated
Comment thread src/TimeSteppers/runge_kutta_3.jl Outdated
@github-actions
Contributor

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 57788393.676 57834951.544 -0.1%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 25997835.956 22999766.598 +13%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 60269076.764 60263543.812 +0%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 66600567.398 69869673.831 -4.7%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 100943014.09 101215811.79 -0.3%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 40903828.745 40926178.673 -0.1%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 12503258.019 12508250.639 -0%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 87092770.150 87105374.069 -0%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 65243439.340 65213050.291 +0%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 44541198.724 43191923.795 +3.1%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46208915.872 46208825.236 +0%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 53258924.023 53254949.131 +0%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 51792956.341 51806339.870 -0%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56413517.017 56397535.747 +0%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 59819489.406 59704921.670 +0.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 54157874.491 54161916.752 -0%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu_compute_hydrostatic_free_surface_Gu_ 2.455 2.456 -0.0% 318 2.466 2.448 2.677
gpu_compute_hydrostatic_free_surface_Gv_ 2.172 2.172 -0.0% 318 2.180 2.164 2.369
gpu__rk_substep_turbulent_kinetic_energy_ 1.989 1.989 -0.0% 315 1.995 1.983 2.175
gpu_compute_CATKE_closure_fields_ 1.588 1.589 -0.0% 318 1.594 1.581 1.721
gpu_compute_hydrostatic_free_surface_Gc_ 0.983 0.978 +0.5% 315 0.987 0.979 1.076
gpu_compute_hydrostatic_free_surface_Gc_ 0.979 0.978 +0.1% 315 0.983 0.976 1.072
gpu_compute_hydrostatic_free_surface_Gc_ 0.978 0.978 +0.0% 315 0.982 0.974 1.071
gpu__compute_w_from_continuity_ 0.339 0.338 +0.1% 633 0.339 0.334 0.348
gpu_compute_TKE_diffusivity_ 0.642 0.643 -0.0% 315 0.644 0.637 0.693
gpu__compute_split_explicit_transport_velocities_ 0.485 0.485 -0.0% 315 0.485 0.480 0.489

@github-actions
Contributor

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 57838998.969 57865538.475 -0%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 26442867.822 26266912.756 +0.7%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 60265556.917 60206410.721 +0.1%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 73313802.666 73321082.962 -0%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 100585614.25 100606407.66 -0%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 40921885.839 40758296.041 +0.4%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 12502891.061 12503582.715 -0%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 87184039.839 87239560.941 -0.1%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 65138330.232 65237732.831 -0.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 44532335.214 43132129.368 +3.2%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46239315.022 45476339.071 +1.7%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 53262159.158 53266403.741 -0%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 51816303.658 51797475.416 +0%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56405053.319 56426105.532 -0%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 59794489.307 59826194.413 -0.1%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 54084047.204 54085899.505 -0%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu_compute_hydrostatic_free_surface_Gu_ 2.455 2.454 +0.0% 318 2.463 2.447 2.680
gpu_compute_hydrostatic_free_surface_Gv_ 2.170 2.170 +0.0% 318 2.177 2.164 2.366
gpu__rk_substep_turbulent_kinetic_energy_ 1.989 1.989 -0.0% 315 1.995 1.985 2.174
gpu_compute_CATKE_closure_fields_ 1.590 1.590 +0.0% 318 1.595 1.582 1.725
gpu_compute_hydrostatic_free_surface_Gc_ 0.984 0.978 +0.6% 315 0.987 0.979 1.078
gpu_compute_hydrostatic_free_surface_Gc_ 0.980 0.978 +0.1% 315 0.982 0.976 1.073
gpu_compute_hydrostatic_free_surface_Gc_ 0.979 0.978 +0.0% 315 0.981 0.975 1.073
gpu__compute_w_from_continuity_ 0.339 0.339 -0.2% 633 0.339 0.334 0.346
gpu_compute_TKE_diffusivity_ 0.641 0.641 -0.0% 315 0.643 0.636 0.691
gpu__compute_split_explicit_transport_velocities_ 0.487 0.487 +0.0% 315 0.487 0.483 0.491

Labels

benchmark performance: runs preconfigured benchmarks and spits out timing
numerics 🧮: So things don't blow up and boil the lobsters alive

3 participants