Skip to content

Add the TaylorFast SarBp ComputeType and documentation#1192

Merged
tbensonatl merged 2 commits into
mainfrom
tbenson/sarbp-add-taylor-based-variant
May 28, 2026
Merged

Add the TaylorFast SarBp ComputeType and documentation#1192
tbensonatl merged 2 commits into
mainfrom
tbenson/sarbp-add-taylor-based-variant

Conversation

@tbensonatl

Copy link
Copy Markdown
Collaborator

Adds a new SarBp ComputeType, TaylorFast. This variant leverages a Taylor expansion of a differential range function to compute per-pixel per-pulse differential ranges. After algebraic manipulation, the range calculation is low computational cost. TaylorFast currently requires that the PhaseLUTOptimization be set. TaylorFast has slightly lower accuracy relative to Double than FloatFloat, but it is the fastest option on most hardware. In many cases, TaylorFast is faster than the current Float at much higher accuracy.

The documentation contains the full derivation of the TaylorFast approximation along with accuracy considerations. The documentation also notes the inclusion of a property, PropSarBpTaylorFastAddThirdOrder, that makes the approximation more accurate for certain scenarios. The SAR BP example has also been updated to support TaylorFast, along with the optional --taylor-fast-third-order to use the PropSarBpTaylorFastAddThirdOrder property. The unit tests have also been updated to support TaylorFast.

Adds a new SarBp ComputeType, TaylorFast. This variant leverages a
Taylor expansion of a differential range function to compute per-pixel
per-pulse differential ranges. After algebraic manipulation, the range
calculation is low cost. TaylorFast currently requires that the
PhaseLUTOptimization be set. TaylorFast has slightly lower accuracy
relative to Double than FloatFloat, but it is the fastest option on
hardware with both full and reduced rate double-precision.

See the documentation added for TaylorFast for the derivation of the
approximation along with accuracy considerations and the optional
inclusion of a property to enable additional terms in the approximation.

Signed-off-by: Thomas Benson <tbenson@nvidia.com>
@tbensonatl tbensonatl self-assigned this May 27, 2026
@copy-pr-bot

copy-pr-bot Bot commented May 27, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds SarBpComputeType::TaylorFast, a new SAR backprojection mode that replaces the per-pixel sqrt with a local Taylor expansion of the range function centered on a per-CTA reference pixel. The approximation is mathematically derived in the new RST documentation (second-order by default; optionally third-order via PropSarBpTaylorFastAddThirdOrder). The PR also includes structural refactoring: SarBpSharedMemorySarBpPulseBlockCache, ComputeBinToPixelFloatFloatComputeBinWeightToPixelFloatFloat returning a SarBpBinAndWeight struct, and several cleanup items in the example and tests.

  • Adds TaylorFast kernel path with SarBpTaylorFastSharedMemory (per-pulse unit vector, half-inverse-range, and split integer/fractional reference bin) and SarBpThreadState<TaylorFast,true>; requires PhaseLUTOptimization and is guarded by a compile-time static_assert.
  • Adds PropSarBpTaylorFastAddThirdOrder as a compile-time operator property that instantiates the TaylorFastAddThirdOrder=true kernel variant.
  • Extends the PointTarget unit test to validate TaylorFast (second- and third-order) with a 132×132 image exercising partial-edge CTAs.

Confidence Score: 5/5

The change is safe to merge. The new TaylorFast path is additive, does not modify any existing compute type, and is guarded by a compile-time property and a runtime PhaseLUT check.

The Taylor approximation math is correct and consistent with the RST derivation. Shared memory synchronization is correct. The SarBpTaylorFastSharedMemory struct is 8-byte aligned. The bin decomposition correctly uses floor(n + x) = n + floor(x). No existing compute paths are altered.

No files require special attention.

Important Files Changed

Filename Overview
include/matx/kernels/sar_bp.cuh Core kernel changes: adds SarBpTaylorFastSharedMemory, SarBpThreadState, ComputeBinWeightToPixelTaylorFast, and FillPulseBlockCache lambda. Math, synchronization, and struct alignment are all correct.
include/matx/transforms/sar_bp.h Adds TaylorFast kernel dispatch (PhaseLUT=true only) and extends fp32_bin_path guard to TaylorFast via the existing !=Double check.
include/matx/operators/sar_bp.h Adds TaylorFast enum value and PropSarBpTaylorFastAddThirdOrder property struct; extends SarBpOp with variadic CurrentProps and props<>() method.
test/00_transform/SarBp.cu Adds TaylorFast second-order and third-order test cases to MixedPrecision and PointTarget tests. Image size changed to 132x132 to exercise partial edge CTAs.
examples/sarbp/sarbp.cu Adds --precision taylor_fast and --taylor-fast-third-order CLI options with correct validation and null-guards before cudaFreeHost calls.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[sar_bp_impl] --> B{compute_type?}
    B -->|TaylorFast| C{PhaseLUT on?}
    C -->|No| D[throw: PhaseLUT required]
    C -->|Yes| E[Fill phase LUT double-to-float]
    E --> F[Launch SarBp kernel TaylorFast]
    F --> G[tid==0 selects reference pixel at block center]
    G --> H[syncthreads]
    H --> I[All threads compute local float offset ref_dx/dy/dz]
    I --> J[Pulse block loop]
    J --> K[FillPulseBlockCache: unit vector u, half_inv_R, ref_bin int+frac]
    K --> L[ComputeBinWeightToPixelTaylorFast]
    L --> M[s = dot u d]
    M --> N[q = norm_d_sq minus s_sq]
    N --> O[dR_2nd = s + q times half_inv_R]
    O --> P{AddThirdOrder?}
    P -->|Yes| Q[dR = dR_2nd minus correction term]
    P -->|No| R[dR = dR_2nd]
    Q --> S[bin_loc = ref_bin_frac + dR times dr_inv_f32]
    R --> S
    S --> T[bin_int = ref_bin_int + floor bin_loc]
    T --> U{bin in valid range?}
    U -->|Yes| V[Interpolate range profile, Apply phase LUT, Accumulate]
    U -->|No| W[skip pulse]
Loading

Reviews (2): Last reviewed commit: "Add comments that TaylorFast does not su..." | Re-trigger Greptile

Comment thread include/matx/kernels/sar_bp.cuh
Comment thread include/matx/kernels/sar_bp.cuh
Comment thread test/00_transform/SarBp.cu
TaylorFast does not support cases where the antenna phase center is located at
a pixel that would be used as a reference pixel in the kernel. This is because the
calculated reference range would then be 0 and we divide by that reference range.
We do not test for and handle this condition at run-time as it is uncommon and
would be costly. Other ComputeTypes can support this case if it does occur in
practice.

Signed-off-by: Thomas Benson <tbenson@nvidia.com>
@tbensonatl

Copy link
Copy Markdown
Collaborator Author

/build

@tbensonatl tbensonatl requested a review from cliffburdick May 27, 2026 20:31
@coveralls

Copy link
Copy Markdown

Coverage Status

Coverage is 93.606%tbenson/sarbp-add-taylor-based-variant into main. No base build found for main.

@tbensonatl tbensonatl merged commit 7e427fa into main May 28, 2026
1 check passed
@tbensonatl tbensonatl deleted the tbenson/sarbp-add-taylor-based-variant branch May 28, 2026 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants