We have a 45x regression on the PlantBiophysics benchmarks from the paper. We are now at ~55 μs instead of the previous 1.2 μs, which is not acceptable.
I benchmarked the run! method that we use:
using BenchmarkTools
using PlantBiophysics, PlantMeteo, PlantSimEngine
meteo = Atmosphere(T=33.7, Wind=8.1, P=92.9, Rh=0.4, Ca=722.0)
constants = Constants()
leaf = ModelList(
    energy_balance=Monteith(),
    photosynthesis=Fvcb(),
    stomatal_conductance=Medlyn(0.48, 6.28),
    status=(Rₛ=401.0, sky_fraction=0.92, aPPFD=881.27, d=0.02,),
)
@benchmark run!($leaf, $meteo, $constants, nothing; tracked_outputs=nothing, check=false, executor=$SequentialEx())
This takes 55.923 μs ± 121.703 μs.
I wrote this function to investigate, adding code incrementally (starting from the final block and working upward); the time each line adds is noted in a comment:
function testfun(leaf, meteo_adjusted, constants)
    nsteps = 1 # Assumption: nsteps was not defined in the original snippet; 1 matches the single time step used below
    dep_graph = PlantSimEngine.dep(leaf, nsteps) # From 38.458 μs to 60.875 μs because of this line of code (adds ~22.4 μs)
    outputs_preallocated = PlantSimEngine.pre_allocate_outputs(leaf, nothing, nsteps) # From 9.959 μs to 38.458 μs because of this line of code (adds ~28.5 μs)
    status_flattened, vector_variables = PlantSimEngine.flatten_status(leaf.status) # From 2.370 μs to 9.959 μs because of this line of code (adds ~7.6 μs)
    # The following code takes 2.370 μs:
    roots = collect(dep_graph.roots)
    for (process, node) in roots
        PlantSimEngine.run_node!(leaf, node, 1, status_flattened, meteo_adjusted, constants, nothing)
    end
    PlantSimEngine.save_results!(status_flattened, outputs_preallocated, 1)
end
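For reference, the per-line increments quoted in the comments can be recomputed from the cumulative timings. This is only a small arithmetic sketch using the numbers reported above; the labels and variable names are illustrative and not part of PlantSimEngine.

```julia
# Cumulative timings (μs) reported in the comments of testfun,
# ordered from the final block outward.
cumulative = [
    "run_node! + save_results!" => 2.370,
    "+ flatten_status"          => 9.959,
    "+ pre_allocate_outputs"    => 38.458,
    "+ dep"                     => 60.875,
]

labels = first.(cumulative)
times = last.(cumulative)

# Cost added by each step: difference between consecutive cumulative timings.
increments = diff(vcat(0.0, times))

# Share of the total time spent in each step, in percent.
shares = 100 .* increments ./ times[end]

for (label, inc, share) in zip(labels, increments, shares)
    println(rpad(label, 28), round(inc; digits=3), " μs  (", round(share; digits=1), " %)")
end
```

With these numbers, dep and pre_allocate_outputs together account for more than 80 % of the time measured in testfun.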
Two lines account for most of the time: computing the dependency graph (dep) and pre-allocating the outputs (pre_allocate_outputs).
Improving the dependency graph is a complex task in itself, but speeding up the pre-allocation of the outputs should be a simple step.
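As a generic illustration of one way to keep setup cost out of a hot path (not necessarily the fix that will be chosen here), the setup can be done once and reused across calls. Everything in this sketch is hypothetical: PreparedRun, prepare and run_step! are stand-ins that do not exist in PlantSimEngine, and the "graph" and "outputs" are deliberately simplified.

```julia
# Hypothetical stand-ins; none of these names exist in PlantSimEngine.
struct PreparedRun
    graph::Vector{Symbol}                   # stand-in for the dependency graph
    outputs::Dict{Symbol,Vector{Float64}}   # stand-in for pre-allocated outputs
end

# Expensive setup, done once per model list.
function prepare(processes::Vector{Symbol}, nsteps::Int)
    outputs = Dict(p => zeros(nsteps) for p in processes)
    return PreparedRun(copy(processes), outputs)
end

# Cheap hot path: reuses the prepared graph and writes into existing buffers.
function run_step!(prep::PreparedRun, step::Int, value::Float64)
    for p in prep.graph
        prep.outputs[p][step] = value
    end
    return prep
end

prep = prepare([:energy_balance, :photosynthesis, :stomatal_conductance], 1)
buffer = prep.outputs[:photosynthesis]

run_step!(prep, 1, 1.0)
run_step!(prep, 1, 2.0) # second call reuses the same buffers
```

The trade-off of this pattern is that the caller has to keep the prepared object around and rebuild it whenever the models or the number of time steps change.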