Skip to content

Commit 7e427fa

Browse files
authored
Add the TaylorFast SarBp ComputeType and documentation (#1192)
* Add the TaylorFast SarBp ComputeType and documentation Adds a new SarBp ComputeType, TaylorFast. This variant leverages a Taylor expansion of a differential range function to compute per-pixel per-pulse differential ranges. After algebraic manipulation, the range calculation is low cost. TaylorFast currently requires that the PhaseLUTOptimization be set. TaylorFast has slightly lower accuracy relative to Double than FloatFloat, but it is the fastest option on hardware with both full and reduced rate double-precision. See the documentation added for TaylorFast for the derivation of the approximation along with accuracy considerations and the optional inclusion of a property to enable additional terms in the approximation. Signed-off-by: Thomas Benson <tbenson@nvidia.com> * Add comments that TaylorFast does not support a degenerate geometry TaylorFast does not support cases where the antenna phase center is located at a pixel that would be used as a reference pixel in the kernel. This is because the calculated reference range would then be 0 and we divide by that reference range. We do not test for and handle this condition at run-time as it is uncommon and would be costly. Other ComputeTypes can support this case if it does occur in practice. Signed-off-by: Thomas Benson <tbenson@nvidia.com>
1 parent 24c67a9 commit 7e427fa

7 files changed

Lines changed: 1030 additions & 302 deletions

File tree

docs_input/api/signalimage/radar/sar_bp.rst

Lines changed: 365 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,9 +11,361 @@ namespace as its API is subject to change.
1111
.. doxygenfunction:: sar_bp(const ImageType &initial_image, const RangeProfilesType &range_profiles, const PlatPosType &platform_positions, const VoxLocType &voxel_locations, const RangeToMcpType &range_to_mcp, const SarBpParams &params)
1212
.. doxygenenum:: matx::SarBpComputeType
1313
.. doxygenenum:: matx::SarBpFeature
14+
.. doxygenstruct:: matx::PropSarBpTaylorFastAddThirdOrder
1415
.. doxygenstruct:: matx::SarBpParams
1516
:members:
1617

18+
TaylorFast Range Approximation
19+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
20+
21+
``SarBpComputeType::TaylorFast`` approximates the range from each pulse to each
22+
pixel by expanding the range about a reference pixel for the current thread
23+
block. The goal is to replace most per-pixel range work with arithmetic in a
24+
local coordinate system, while computing the expensive reference quantities once
25+
per pulse and per thread block. By default, ``TaylorFast`` keeps terms through
26+
second order in the local pixel offset. Users can add the third-order term with
27+
the ``PropSarBpTaylorFastAddThirdOrder`` property. The third-order term is most
28+
useful for short stand-off ranges where the second-order approximation may not
29+
provide sufficient accuracy.
30+
31+
The derivation of the Taylor approximation follows.
32+
For pulse :math:`p`, let :math:`a_p` be the platform position and let :math:`m_p` be the
33+
range to the motion compensation point. For image pixel position :math:`x`, the
34+
range quantity used by backprojection is
35+
36+
.. math::
37+
38+
\Delta R_p(x) = \|x - a_p\| - m_p.
39+
40+
For a thread block, choose a reference pixel :math:`x_0` near the center of the
41+
block. Define
42+
43+
.. math::
44+
45+
r_0 = x_0 - a_p, \qquad R_0 = \|r_0\|, \qquad u = \frac{r_0}{R_0},
46+
47+
where :math:`u` is the unit vector pointing from the platform position to the reference pixel. Hereafter, we generally drop
48+
the :math:`p` subscript for brevity. For any pixel in the same thread block, write its local offset as
49+
50+
.. math::
51+
52+
d = x - x_0.
53+
54+
Then the exact range to the pixel is
55+
56+
.. math::
57+
58+
R(d) = \|r_0 + d\| = \|(x_0 - a_p) + (x - x_0)\| = \|x - a_p\|.
59+
60+
Decompose :math:`d` into the component along :math:`u` and the component perpendicular
61+
to :math:`u`:
62+
63+
.. math::
64+
65+
s = u \cdot d.
66+
67+
Here, :math:`s` is the signed scalar projection of :math:`d` along :math:`u` and :math:`u` is the
68+
along-range direction. The perpendicular, or cross-range, vector
69+
is then :math:`d - s u`. The squared norm of this perpendicular component is:
70+
71+
.. math::
72+
73+
\begin{aligned}
74+
\|d - s u\|^2
75+
&= d \cdot d - 2s(u \cdot d) + s^2(u \cdot u) \\
76+
&= \|d\|^2 - 2s(u \cdot d) + s^2 \\
77+
&= \|d\|^2 - 2s^2 + s^2 \\
78+
&= \|d\|^2 - s^2
79+
\end{aligned}
80+
81+
We denote this squared norm term as :math:`q`:
82+
83+
.. math::
84+
85+
q = \|d\|^2 - s^2.
86+
87+
Mathematically, :math:`q` is the squared perpendicular distance from the local pixel
88+
to the pulse-to-reference-pixel line. With these definitions, recall that
89+
:math:`R(d) = \|r_0 + d\|`. Thus,
90+
91+
.. math::
92+
93+
\begin{aligned}
94+
R(d)^2 &= \|r_0 + d\|^2 \\
95+
&= \|R_0 u + d\|^2 = (R_0 u + d) \cdot (R_0 u + d) \\
96+
&= R_0^2 (u \cdot u) + 2 R_0 (u \cdot d) + d \cdot d \\
97+
&= R_0^2 + 2 R_0 s + \|d\|^2 \\
98+
&= R_0^2 + 2 R_0 s + s^2 + q \\
99+
&= (R_0 + s)^2 + q
100+
\end{aligned}
101+
102+
where we used :math:`q = \|d\|^2 - s^2` and thus :math:`\|d\|^2 = s^2 + q`.
103+
104+
Finally,
105+
106+
.. math::
107+
108+
R(d) = \sqrt{(R_0 + s)^2 + q}.
109+
110+
:math:`R(d)` is the exact range to the pixel :math:`x` and the range difference relative to the block reference is
111+
112+
.. math::
113+
114+
\Delta R(d) = R(d) - R_0.
115+
116+
We started this section with the differential range from pixel :math:`x` to the motion compensation range
117+
:math:`m_p`. We can now write this as:
118+
119+
.. math::
120+
121+
\begin{aligned}
122+
\Delta R_p(x) &= \|x - a_p\| - m_p \\
123+
&= R_p(x) - m_p \\
124+
&= (R_0 + \Delta R(d)) - m_p \\
125+
&= \Delta R(d) + (R_0 - m_p)
126+
\end{aligned}
127+
128+
The bin coordinate for pixel :math:`x` is typically:
129+
130+
.. math::
131+
132+
b(x) = \frac{\Delta R_p(x)}{\Delta r} + b_{\mathrm{offset}}
133+
134+
Here :math:`\Delta r` is the range-bin spacing and :math:`b_{\mathrm{offset}}` is the centered
135+
range-bin offset used by the SAR backprojection operator. We can now reformulate this as:
136+
137+
.. math::
138+
139+
\begin{aligned}
140+
b(x) &= \frac{\Delta R_p(x)}{\Delta r} + b_{\mathrm{offset}} \\
141+
&= \frac{\Delta R(d) + (R_0 - m_p)}{\Delta r} + b_{\mathrm{offset}} \\
142+
&= \frac{\Delta R(d)}{\Delta r} + \frac{R_0 - m_p}{\Delta r} + b_{\mathrm{offset}}
143+
\end{aligned}
144+
145+
We can precompute the right-hand terms as:
146+
147+
.. math::
148+
149+
b_0 = \frac{R_0 - m_p}{\Delta r} + b_{\mathrm{offset}}
150+
151+
and thus:
152+
153+
.. math::
154+
155+
b(x) = b_0 + \frac{\Delta R(d)}{\Delta r}
156+
157+
The following sections derive an approximation of :math:`\Delta R(d)` using a Taylor series expansion.
158+
159+
Derivation of the Taylor Expansion
160+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
161+
162+
Starting from
163+
164+
.. math::
165+
166+
R(d) = \sqrt{R_0^2 + 2 R_0 s + \|d\|^2},
167+
168+
factor out :math:`R_0`:
169+
170+
.. math::
171+
172+
R(d)
173+
= R_0
174+
\sqrt{1 + \frac{2s}{R_0} + \frac{\|d\|^2}{R_0^2}}.
175+
176+
Let
177+
178+
.. math::
179+
180+
y = \frac{2s}{R_0} + \frac{\|d\|^2}{R_0^2}.
181+
182+
The scalar Taylor series is applied to the one-variable function
183+
:math:`f(y) = \sqrt{1 + y}` about :math:`y = 0`:
184+
185+
.. math::
186+
187+
f(y) = \sqrt{1 + y}
188+
= 1 + \frac{y}{2} - \frac{y^2}{8} + \frac{y^3}{16}
189+
+ O(y^4),
190+
191+
so
192+
193+
.. math::
194+
195+
R(d)
196+
= R_0
197+
\left(
198+
1 + \frac{y}{2} - \frac{y^2}{8} + \frac{y^3}{16}
199+
\right)
200+
+ O(R_0 y^4).
201+
202+
The order used by ``TaylorFast`` is not the number of scalar Taylor-series
203+
terms retained in :math:`f(y)`. It is the order in the local pixel offset
204+
:math:`d`. Recall that :math:`s = u \cdot d`, so :math:`s` is linear in :math:`d`.
205+
206+
The following table shows how the scalar Taylor-series terms contribute to the
207+
local-offset orders used by the ``TaylorFast`` approximation.
208+
209+
.. list-table::
210+
:header-rows: 1
211+
:widths: 24 42 34
212+
213+
* - Scalar term
214+
- Contribution
215+
- Local-offset order
216+
* - :math:`R_0`
217+
- :math:`R_0`
218+
- Zeroth order. This is the reference range and cancels in :math:`\Delta R(d) = R(d) - R_0`.
219+
* - :math:`R_0 y / 2`
220+
- :math:`s + \dfrac{s^2 + q}{2R_0}`
221+
- First-order term :math:`s`; second-order term :math:`(s^2 + q)/(2R_0)`.
222+
* - :math:`-R_0 y^2 / 8`
223+
- :math:`-\dfrac{s^2}{2R_0} - \dfrac{s(s^2 + q)}{2R_0^2} + O(d^4)`
224+
- Second-order term :math:`-s^2/(2R_0)`; third-order term :math:`-s(s^2 + q)/(2R_0^2)`.
225+
* - :math:`R_0 y^3 / 16`
226+
- :math:`\dfrac{s^3}{2R_0^2} + O(d^4)`
227+
- Third-order term :math:`s^3/(2R_0^2)`.
228+
229+
The first-order local-offset term is therefore
230+
231+
.. math::
232+
233+
s.
234+
235+
The second-order local-offset terms are
236+
237+
.. math::
238+
239+
R_0
240+
\left(
241+
\frac{s^2 + q}{2 R_0^2}
242+
- \frac{4s^2}{8 R_0^2}
243+
\right)
244+
=
245+
\frac{q}{2 R_0}.
246+
247+
The third-order local-offset terms are
248+
249+
.. math::
250+
251+
R_0
252+
\left(
253+
-\frac{4s(s^2 + q)}{8 R_0^3}
254+
+ \frac{8s^3}{16 R_0^3}
255+
\right)
256+
=
257+
-\frac{s q}{2 R_0^2}.
258+
259+
Combining terms gives
260+
261+
.. math::
262+
263+
R(d)
264+
=
265+
R_0
266+
+ s
267+
+ \frac{q}{2 R_0}
268+
- \frac{s q}{2 R_0^2}
269+
+ O\left(\frac{\|d\|^4}{R_0^3}\right).
270+
271+
Equivalently, the local range delta is
272+
273+
.. math::
274+
275+
\Delta R(d)
276+
=
277+
s
278+
+ \frac{q}{2 R_0}
279+
- \frac{s q}{2 R_0^2}
280+
+ O\left(\frac{\|d\|^4}{R_0^3}\right).
281+
282+
283+
Second-Order Approximation
284+
^^^^^^^^^^^^^^^^^^^^^^^^^^
285+
286+
The second-order approximation keeps the linear along-range term and the
287+
quadratic cross-range correction:
288+
289+
.. math::
290+
291+
\Delta R(d)
292+
=
293+
s + \frac{q}{2 R_0}.
294+
295+
We then compute the bin coordinate as
296+
297+
.. math::
298+
299+
b(d) \approx b_0 + \frac{1}{\Delta r}
300+
\left(s + \frac{q}{2R_0}\right).
301+
302+
The leading omitted term is
303+
304+
.. math::
305+
306+
\epsilon(d)
307+
\approx
308+
-\frac{s q}{2 R_0^2}.
309+
310+
This is the default ``TaylorFast`` implementation. It is most accurate when the pixel block
311+
is small relative to the platform stand-off range and when the along-range
312+
offset :math:`s` remains small.
313+
314+
Third-Order Approximation
315+
^^^^^^^^^^^^^^^^^^^^^^^^^
316+
317+
The ``PropSarBpTaylorFastAddThirdOrder`` property instantiates a ``TaylorFast``
318+
kernel variant that also keeps the cubic term:
319+
320+
.. math::
321+
322+
\Delta R(d)
323+
=
324+
s
325+
+ \frac{q}{2 R_0}
326+
- \frac{s q}{2 R_0^2}.
327+
328+
The corresponding bin coordinate is
329+
330+
.. math::
331+
332+
b(d) \approx b_0 + \frac{1}{\Delta r}
333+
\left(
334+
s
335+
+ \frac{q}{2R_0}
336+
- \frac{s q}{2R_0^2}
337+
\right).
338+
339+
With the third-order term included, the leading omitted fourth-order terms are
340+
341+
.. math::
342+
343+
\epsilon(d)
344+
\approx
345+
\frac{s^2 q}{2 R_0^3}
346+
- \frac{q^2}{8 R_0^3}.
347+
348+
Compared to the second-order form, this requires additional per-pixel arithmetic
349+
and will thus have correspondingly lower throughput than the second-order form.
350+
Keeping the third-order term avoids the short-range accuracy loss that occurs when
351+
the image block is large relative to :math:`R_0` or when the platform is close enough
352+
that the :math:`s q / R_0^2` term is no longer negligible.
353+
354+
Accuracy Considerations
355+
^^^^^^^^^^^^^^^^^^^^^^^
356+
357+
The approximation is local to a thread block. Its accuracy depends on the ratio
358+
between the maximum local pixel offset and :math:`R_0`. For satellite-scale data,
359+
:math:`R_0` is typically very large compared to the dimensions of a CUDA thread block, so
360+
the second-order approximation is likely sufficient. It is an assumption that adjacent
361+
pixels are spatially near one another and thus a compact pixel tile has a relatively small
362+
spatial extent.
363+
364+
For shorter stand-off ranges, larger image tiles, or geometries where the look direction varies rapidly across
365+
a thread block, the third-order term becomes more important. The third-order
366+
property uses a separate kernel instantiation, which avoids a run-time order
367+
dispatch inside the backprojection kernel.
368+
17369
Examples
18370
~~~~~~~~
19371

@@ -22,3 +374,16 @@ Examples
22374
:start-after: example-begin sar-bp-1
23375
:end-before: example-end sar-bp-1
24376
:dedent:
377+
378+
The ``PropSarBpTaylorFastAddThirdOrder`` property is a compile-time MatX operator property. The compute type is
379+
selected separately through ``SarBpParams``; the property only adds the
380+
third-order term to a ``TaylorFast`` launch:
381+
382+
.. literalinclude:: ../../../../test/00_transform/SarBp.cu
383+
:language: cpp
384+
:start-after: example-begin sar-bp-2
385+
:end-before: example-end sar-bp-2
386+
:dedent:
387+
388+
The property only affects ``SarBpComputeType::TaylorFast``. Other compute types
389+
continue to use their ordinary kernel instantiations.

examples/sarbp/README.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -106,14 +106,16 @@ real/imag, row-major), written to `output_image.raw` in this example.
106106
| `-w {hamming,none}` | Window for range compression (default: hamming) |
107107
| `-b {auto,all,0,N}` | Pulses per processing block. `auto` uses the GPU L2 cache size to choose a block size; `all` and `0` use all pulses (default: auto) |
108108
| `--image-tiles N` | Process the image as N x N tiles during backprojection (default: 1) |
109-
| `--precision {double,float,fltflt,mixed}` | Backprojection compute precision (default: mixed) |
109+
| `--taylor-fast-third-order` | Add the third-order range term when using `--precision taylor_fast` |
110+
| `--precision {double,float,fltflt,mixed,taylor_fast}` | Backprojection compute precision (default: mixed) |
110111
| `--warmup` | Warmup GPU kernels and FFT plans before timed run |
111112

112113
The `--precision` flag controls the arithmetic used by the `sar_bp` operator. For spaceborne SAR, `float` does not provide enough precision to store fractional wavelengths at the range-to-MCP magnitudes (hundreds of km), so pure `float` is not sufficient to produce focused images. The available modes are:
113114

114115
- `double` -- full double-precision arithmetic. Most accurate.
115116
- `mixed` -- double-precision for range computation, single-precision elsewhere. Default. Close to `double` in image quality with slightly higher throughput on GPUs with reduced double-precision throughput. Other than `float`, this is the fastest option on hardware with full-throughput double-precision (e.g., A100, H100/H200, B200).
116117
- `fltflt` -- float-float evaluation using two `float` values for the high-precision range math. Significantly higher throughput on GPUs where `double` throughput is reduced (e.g., RTX PROs, Jetson Orin/Thor, gaming GPUs).
118+
- `taylor_fast` -- local Taylor approximation of the pulse-to-pixel range about a centered per-thread-block reference point. Highest-throughput experimental mode for spaceborne SAR geometries where moderate approximation error is acceptable.
117119
- `float` -- single-precision throughout. Fastest but not accurate enough for most spaceborne data.
118120

119121
## 6. View the Result

0 commit comments

Comments
 (0)