WIP: reduced precision for lanczos_opencv #635

Open
bjarthur wants to merge 6 commits into JuliaMath:master from bjarthur:bja/lanczosT

Conversation

@bjarthur

using Float32 for Lanczos is ~1.7x faster (median 12.7 ms vs 21.3 ms) and uses half as much memory (30.53 MiB vs 61.05 MiB) as the current Float64; benchmarks are at the end of this comment.

this PR currently performs the internal calculations in whatever precision the input has. i could also imagine specifying the precision used for internal computations as a type parameter (e.g. struct Lanczos4OpenCV{T} <: AbstractLanczos end) to decouple it from the input.
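
The type-parameter idea could look roughly like the following. This is a minimal sketch under stated assumptions, not the package's definitions: `AbstractLanczosSketch`, `Lanczos4Sketch`, `internal_type` and `to_internal` are hypothetical names used only for illustration.

```julia
# Hypothetical sketch; these are NOT the types defined in Interpolations.jl.
abstract type AbstractLanczosSketch end

# `T` is the floating-point type used for the internal computations.
struct Lanczos4Sketch{T<:AbstractFloat} <: AbstractLanczosSketch end

# Default to Float64 so that `Lanczos4Sketch()` keeps the existing behaviour.
Lanczos4Sketch() = Lanczos4Sketch{Float64}()

# The internal precision is read from the type parameter, not from the input.
internal_type(::Lanczos4Sketch{T}) where {T} = T

# Convert the input offset to the internal precision before doing any work.
to_internal(::Lanczos4Sketch{T}, δx::Real) where {T} = convert(T, δx)
```

With this layout a caller could pass a Float64 offset to `Lanczos4Sketch{Float32}()` and still get Float32 arithmetic internally, at the cost of one conversion per call.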

i'm also curious whether there is a more clever way to cast l4_2d_cs at compile time so as not to incur runtime penalties.
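
One option for avoiding a per-call cast might be to convert the table once at load time and select the right copy by dispatch. A minimal sketch, assuming a placeholder table: `TABLE_F64`, `TABLE_F32` and `table` are hypothetical names, and the values below are not the contents of `l4_2d_cs`.

```julia
# Hypothetical stand-in for a constant coefficient table such as `l4_2d_cs`;
# the numbers are placeholders chosen to be exactly representable.
const TABLE_F64 = ((1.0, 0.0), (0.5, -0.5), (0.0, 1.0))

# Convert once, when the module is loaded, and keep the result as a typed constant.
const TABLE_F32 = map(row -> Float32.(row), TABLE_F64)

# Dispatch selects the table for the requested precision,
# so no conversion happens at call time.
table(::Type{Float64}) = TABLE_F64
table(::Type{Float32}) = TABLE_F32
```

Whether the compiler already folds a broadcast conversion of a constant tuple would need to be checked, for example with `@code_llvm` or a benchmark; the sketch simply sidesteps the question.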

let me know what you think and i'll add some tests and docs.

julia> using Interpolations, BenchmarkTools

julia> x=rand(1_000_000);

julia> @benchmark Interpolations._lanczos4_opencv.(x)
BenchmarkTools.Trial: 231 samples with 1 evaluation per sample.
 Range (min … max):  19.061 ms … 83.561 ms  ┊ GC (min … max): 5.34% … 74.95%
 Time  (median):     21.303 ms              ┊ GC (median):    2.90%
 Time  (mean ± σ):   21.654 ms ±  4.135 ms  ┊ GC (mean ± σ):  4.35% ±  5.10%

                       ▅█▄▃                                    
  ▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▃▅▇█████▄▄▄▃▃▃▃▂▂▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▂ ▃
  19.1 ms         Histogram: frequency by time        24.8 ms <

 Memory estimate: 61.05 MiB, allocs estimate: 4.

julia> x=rand(Float32, 1_000_000);

julia> @benchmark Interpolations._lanczos4_opencv.(x)
BenchmarkTools.Trial: 387 samples with 1 evaluation per sample.
 Range (min … max):  12.078 ms … 76.608 ms  ┊ GC (min … max): 0.00% … 83.87%
 Time  (median):     12.695 ms              ┊ GC (median):    3.05%
 Time  (mean ± σ):   12.928 ms ±  3.290 ms  ┊ GC (mean ± σ):  4.56% ±  4.80%

     ▂       ▄▅▇██▇▅▂                                          
  ▆▇▇██▄▆▆▄▇██████████▄█▄▇▄▆▁▄▁▇▄▄▁▄▄▄▇▄▆▁▁▄▁▁▁▁▆▁▁▁▁▁▁▄▄▁▁▁▄ ▇
  12.1 ms      Histogram: log(frequency) by time      14.5 ms <

 Memory estimate: 30.53 MiB, allocs estimate: 4.

@codecov

codecov Bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.15%. Comparing base (7a2d581) to head (9ba58a0).
⚠️ Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lanczos/lanczos_opencv.jl | 94.11% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #635      +/-   ##
==========================================
+ Coverage   88.10%   88.15%   +0.05%     
==========================================
  Files          29       29              
  Lines        1908     1925      +17     
==========================================
+ Hits         1681     1697      +16     
- Misses        227      228       +1     

@bjarthur
Author

second commit makes it slightly faster:

julia> x=rand(Float32, 1_000_000);

julia> @benchmark Interpolations.value_weights.(Ref(Lanczos4OpenCV()), x)
BenchmarkTools.Trial: 429 samples with 1 evaluation per sample.
 Range (min … max):  10.796 ms … 75.222 ms  ┊ GC (min … max): 0.00% … 85.30%
 Time  (median):     11.479 ms              ┊ GC (median):    3.84%
 Time  (mean ± σ):   11.664 ms ±  3.107 ms  ┊ GC (mean ± σ):  5.43% ±  4.66%

               ▂█▆▃▄▄▅▇▃▁                                      
  ▃▃▃▃▃▃▁▃▃▃▄▅▆██████████▆▃▃▃▄▃▁▁▂▃▁▁▃▃▂▁▁▃▁▂▁▃▁▂▂▁▁▁▁▁▁▁▁▂▂▂ ▃
  10.8 ms         Histogram: frequency by time          13 ms <

 Memory estimate: 30.53 MiB, allocs estimate: 5.

@bjarthur bjarthur changed the title from "reduced precision for lanczos_opencv" to "WIP: reduced precision for lanczos_opencv" Oct 27, 2025
@bjarthur bjarthur marked this pull request as draft October 27, 2025 22:09
@bjarthur
Author

bjarthur commented Oct 27, 2025

the output is qualitatively different for Float32 compared to Float64. :(

specifically, when δx is 0f0 (a 32-bit float) in _lanczos4_opencv, s0 and c0 are exactly identical, so cs[4] becomes 0f0 because its numerator is zero. this is not a problem when δx is 0.0 (a 64-bit float): at the higher precision s0 and c0 somehow end up off by 1 eps, so the numerator of cs[4] is not zero.

i don't know enough about lanczos resampling to fix this in the right way. what i do know though is that this is not a problem if opencv's lanczos implementation is followed more closely (see the last commit of this PR). specifically, iszero is replaced with abs(y) <= 1e-6.
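
The difference between an exact `iszero` test and a tolerance test can be sketched as follows. This is an illustration only, not the code in this PR or in OpenCV: `guarded_ratio` is a hypothetical helper, and the `num / y^2` form and the fallback value of one are assumptions made for the example.

```julia
# Hypothetical helper: evaluate num / y^2, but treat any |y| at or below `tol`
# as the singular point and return `fallback` instead of dividing.
function guarded_ratio(num::T, y::T; tol::T = T(1e-6), fallback::T = one(T)) where {T<:AbstractFloat}
    return abs(y) <= tol ? fallback : num / (y * y)
end

# An exact test misses values that are tiny but nonzero:
iszero(1f-7)                 # false, so an `iszero` guard would still divide
guarded_ratio(0f0, 1f-7)     # the tolerance guard takes the fallback branch
```

The tolerance is a keyword argument here so that it can be varied per element type, which is relevant to the question of how a fixed `1e-6` behaves at other precisions.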

so i'm putting this on the back burner for now. below is the output without the last commit.

@mkitti @timholy @mileslucas

julia> Interpolations.value_weights(Lanczos4OpenCV{Float64}(), 0.0)
(8.837979241208245e-19, -2.8122277740499295e-18, 7.95418131708742e-18, 1.0, -7.95418131708742e-18, 2.8122277740499295e-18, -8.837979241208245e-19, -7.80550006310146e-35)

julia> Interpolations.value_weights(Lanczos4OpenCV{Float32}(), 0f0)
(8.547569f6, -2.7198194f7, 7.692811f7, -0.0f0, -7.692811f7, 2.7198194f7, -8.547569f6, -0.0f0)

@bjarthur
Author

bjarthur commented May 8, 2026

latest commit preserves the original implementation of lanczos_opencv, and adds a new version which can use Float32 internally to match OpenCV exactly.

@bjarthur bjarthur marked this pull request as ready for review May 8, 2026 20:27
@bjarthur
Author

bjarthur commented May 8, 2026

@mkitti can you please review?

Collaborator

@mkitti mkitti left a comment


I have a few questions on how general this is and where the constants are coming from.

_lanczos4_opencv_faithful(float(T), float(T).(l4_2d_cs), δx)

# main differences with `_lanczos4_opencv` are (1) the criterion for preventing a
# division by zero is `< pi/4 * 1e-6` (instead of `iszero`) and (2) the resulting
Collaborator


Can you justify where these new constants are coming from? Is it going to work for Float16?

I wonder if we should use isapprox?
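
For reference on the `isapprox` and Float16 questions, two facts from Base seem relevant: comparing against zero with the default tolerances is an exact test (the default `atol` is 0), and `eps(Float16)` is far larger than `1e-6`. The sketch below ties an explicit absolute tolerance to the element type; `near_zero` is a hypothetical name, and `sqrt(eps(T))` is only an illustrative choice of threshold, not a recommendation.

```julia
# With default tolerances, `isapprox(y, 0)` is the same as `y == 0`,
# because `atol` defaults to zero and `rtol` scales with the operands.
isapprox(1f-7, 0f0)          # false

# A fixed 1e-6 is below the spacing of Float16 values near one.
eps(Float16) > 1e-6          # true

# Hypothetical type-aware check: the absolute tolerance follows the precision.
near_zero(y::T) where {T<:AbstractFloat} = isapprox(y, zero(T); atol = sqrt(eps(T)))
```

A threshold derived from `eps(T)` would change the Float32 behaviour relative to a literal `1e-6`, so matching OpenCV exactly and generalizing to other types may pull in different directions.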
