Add the TaylorFast SarBp ComputeType and documentation #1192
Adds a new SarBp ComputeType, TaylorFast. This variant leverages a Taylor expansion of a differential range function to compute per-pixel per-pulse differential ranges. After algebraic manipulation, the range calculation is low cost. TaylorFast currently requires that the PhaseLUTOptimization be set. Measured against Double, TaylorFast is slightly less accurate than FloatFloat, but it is the fastest option on hardware with either full-rate or reduced-rate double precision. See the documentation added for TaylorFast for the derivation of the approximation, accuracy considerations, and an optional property that enables additional terms in the approximation.

Signed-off-by: Thomas Benson <tbenson@nvidia.com>
Greptile Summary

This PR adds
Confidence Score: 5/5

The change is safe to merge. The new TaylorFast path is additive, does not modify any existing compute type, and is guarded by a compile-time property and a runtime PhaseLUT check.

- The Taylor approximation math is correct and consistent with the RST derivation.
- Shared memory synchronization is correct.
- The SarBpTaylorFastSharedMemory struct is 8-byte aligned.
- The bin decomposition correctly uses floor(n + x) = n + floor(x).
- No existing compute paths are altered.

No files require special attention.
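The floor identity mentioned above can be illustrated with a small host-side sketch. It is not taken from the PR: the names `RefBin`, `PixelBin`, `SplitRefBin`, and `PixelBinFromOffset` are hypothetical, and returning the fractional part as an interpolation weight is an assumption. The idea is that the reference bin is split once into an integer part and a fractional part, so the per-pixel float arithmetic only ever handles small values.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical types for illustration only; not the MatX kernel's types.
struct RefBin {
  std::int32_t whole;  // integer part of the reference range bin
  float frac;          // fractional part, in [0, 1)
};

struct PixelBin {
  std::int32_t index;  // integer range bin for the pixel
  float weight;        // fractional position inside that bin, in [0, 1)
};

// Split a double-precision reference bin location once per pulse.
inline RefBin SplitRefBin(double ref_bin) {
  const double whole = std::floor(ref_bin);
  return {static_cast<std::int32_t>(whole),
          static_cast<float>(ref_bin - whole)};
}

// Per-pixel step: floor(n + x) == n + floor(x) for integer n, so only the
// small value (frac + dR * dr_inv) has to be floored in single precision.
inline PixelBin PixelBinFromOffset(const RefBin& ref, float dR, float dr_inv) {
  const float bin_loc = ref.frac + dR * dr_inv;
  const float bin_floor = std::floor(bin_loc);
  return {ref.whole + static_cast<std::int32_t>(bin_floor),
          bin_loc - bin_floor};
}
```

Keeping the large integer part out of the float sum means the fractional bin position is not rounded away when the absolute bin index is in the thousands.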
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[sar_bp_impl] --> B{compute_type?}
B -->|TaylorFast| C{PhaseLUT on?}
C -->|No| D[throw: PhaseLUT required]
C -->|Yes| E[Fill phase LUT double-to-float]
E --> F[Launch SarBp kernel TaylorFast]
F --> G[tid==0 selects reference pixel at block center]
G --> H[syncthreads]
H --> I[All threads compute local float offset ref_dx/dy/dz]
I --> J[Pulse block loop]
J --> K[FillPulseBlockCache: unit vector u, half_inv_R, ref_bin int+frac]
K --> L[ComputeBinWeightToPixelTaylorFast]
L --> M[s = dot u d]
M --> N[q = norm_d_sq minus s_sq]
N --> O[dR_2nd = s + q times half_inv_R]
O --> P{AddThirdOrder?}
P -->|Yes| Q[dR = dR_2nd minus correction term]
P -->|No| R[dR = dR_2nd]
Q --> S[bin_loc = ref_bin_frac + dR times dr_inv_f32]
R --> S
S --> T[bin_int = ref_bin_int + floor bin_loc]
T --> U{bin in valid range?}
U -->|Yes| V[Interpolate range profile, Apply phase LUT, Accumulate]
U -->|No| W[skip pulse]
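The per-pixel arithmetic in the flowchart (s, q, dR_2nd, and the optional correction) can be sketched on the host as follows. This is a minimal illustration, not the kernel code, and it rests on several assumptions:

- All names (`Vec3`, `PulseCache`, `MakePulseCache`, `DiffRangeTaylor`, `DiffRangeExact`) are hypothetical.
- `u` is taken to point from the antenna phase center to the reference pixel.
- The third-order term is the one given by the standard expansion of the square root, which may differ from the exact form used in the PR.
- The sketch uses double throughout for clarity, whereas the flowchart indicates the kernel uses float for the local offsets.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration only.
struct Vec3 {
  double x, y, z;
};

inline Vec3 Sub(const Vec3& a, const Vec3& b) {
  return {a.x - b.x, a.y - b.y, a.z - b.z};
}

inline double Dot(const Vec3& a, const Vec3& b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Values computed once per pulse and shared by all pixels in a block.
struct PulseCache {
  Vec3 u;             // unit vector from antenna to reference pixel (assumed)
  double half_inv_R;  // 1 / (2 * R0), R0 = antenna-to-reference range
};

inline PulseCache MakePulseCache(const Vec3& antenna, const Vec3& ref_pixel) {
  const Vec3 r0 = Sub(ref_pixel, antenna);
  const double R0 = std::sqrt(Dot(r0, r0));
  // R0 == 0 (antenna located at the reference pixel) is not supported.
  assert(R0 > 0.0);
  return {{r0.x / R0, r0.y / R0, r0.z / R0}, 0.5 / R0};
}

// Differential range of a pixel at offset d from the reference pixel.
inline double DiffRangeTaylor(const PulseCache& c, const Vec3& d,
                              bool add_third_order) {
  const double s = Dot(c.u, d);        // component of d along u
  const double q = Dot(d, d) - s * s;  // squared component across u
  const double dR_2nd = s + q * c.half_inv_R;
  if (!add_third_order) {
    return dR_2nd;
  }
  // Assumed correction: s * q / (2 * R0^2) == 2 * s * q * half_inv_R^2.
  return dR_2nd - 2.0 * s * q * c.half_inv_R * c.half_inv_R;
}

// Reference computation: |ref_pixel + d - antenna| - |ref_pixel - antenna|.
inline double DiffRangeExact(const Vec3& antenna, const Vec3& ref_pixel,
                             const Vec3& d) {
  const Vec3 r0 = Sub(ref_pixel, antenna);
  const Vec3 r = {r0.x + d.x, r0.y + d.y, r0.z + d.z};
  return std::sqrt(Dot(r, r)) - std::sqrt(Dot(r0, r0));
}
```

Comparing `DiffRangeTaylor` against `DiffRangeExact` gives a feel for the accuracy trade-off. With an antenna 10 km from the reference pixel and a pixel offset of under 20 m, the second-order result differs from the exact value by roughly 1e-5 m, and the assumed third-order term removes most of that error.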
Reviews (2): Last reviewed commit: "Add comments that TaylorFast does not su..."
TaylorFast does not support cases where the antenna phase center is located at a pixel that would be used as a reference pixel in the kernel. In that case the calculated reference range would be 0, and the kernel divides by that reference range. We do not test for or handle this condition at run time because it is uncommon and the check would be costly. Other ComputeTypes can support this case if it does occur in practice.

Signed-off-by: Thomas Benson <tbenson@nvidia.com>
/build
Adds a new SarBp ComputeType, TaylorFast. This variant leverages a Taylor expansion of a differential range function to compute per-pixel per-pulse differential ranges. After algebraic manipulation, the range calculation has a low computational cost. TaylorFast currently requires that the PhaseLUTOptimization be set. Measured against Double, TaylorFast is slightly less accurate than FloatFloat, but it is the fastest option on most hardware. In many cases, TaylorFast is faster than the current Float while being much more accurate.
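As a rough illustration of the kind of expansion described above (a sketch under assumed notation, not the derivation from the added documentation): let $\mathbf{a}$ be the antenna phase center, $\mathbf{p}_0$ the reference pixel, and $\mathbf{d}$ the offset of a pixel from $\mathbf{p}_0$. Expanding the square root with $\sqrt{1+x} \approx 1 + x/2 - x^2/8 + x^3/16$ and collecting powers of $1/R_0$ gives the following. The symbols and the form of the last term are assumptions chosen for this sketch.

```latex
\begin{aligned}
R_0 &= \lVert \mathbf{p}_0 - \mathbf{a} \rVert, \qquad
\mathbf{u} = \frac{\mathbf{p}_0 - \mathbf{a}}{R_0}, \qquad
s = \mathbf{u} \cdot \mathbf{d}, \qquad
q = \lVert \mathbf{d} \rVert^2 - s^2 \\
\Delta R(\mathbf{d}) &= \lVert \mathbf{p}_0 + \mathbf{d} - \mathbf{a} \rVert - R_0
  = R_0 \left( \sqrt{1 + \frac{2s}{R_0}
  + \frac{\lVert \mathbf{d} \rVert^2}{R_0^2}} - 1 \right) \\
&\approx s + \frac{q}{2 R_0} - \frac{s\,q}{2 R_0^2}
\end{aligned}
```

The first two terms need only one dot product per pixel once $\mathbf{u}$ and $1/(2R_0)$ are cached per pulse, which is where the low cost comes from. The last term is the next correction in this expansion and is small when $\lVert \mathbf{d} \rVert \ll R_0$.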
The documentation contains the full derivation of the TaylorFast approximation along with accuracy considerations. It also describes a property, PropSarBpTaylorFastAddThirdOrder, that makes the approximation more accurate for certain scenarios. The SAR BP example has been updated to support TaylorFast, with an optional --taylor-fast-third-order flag that enables the PropSarBpTaylorFastAddThirdOrder property. The unit tests have also been updated to support TaylorFast.