Commit 6f7cdf2

Improve misc section
Expand and/or add sections on multithreading, large data, and multiple GPUs.
1 parent f050b9a commit 6f7cdf2

1 file changed

Lines changed: 63 additions & 18 deletions

docs/misc.rst

@@ -19,34 +19,78 @@ This will run quick tests of basic CPU/GPU functionality, and report on
1919
the results. As a part of this it will report if GPU functionality is
2020
available.
2121

22-
Setting GPU index
23-
-----------------
22+
23+
Working with large data
24+
-----------------------
25+
26+
If the data is too large to fit into GPU memory, ASTRA will
27+
automatically split and process the data in subsets. This functionality is only
28+
available for FP3D, BP3D and FDK algorithms.
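
As a rough illustration of what processing data in subsets means, the sketch below computes index ranges along one axis so that each subset fits a given memory budget. The function name ``split_into_slabs`` and the budget value are hypothetical and are not part of the ASTRA API; ASTRA's internal splitting strategy may differ.

```python
def split_into_slabs(n_slices, bytes_per_slice, budget_bytes):
    """Return (start, stop) index ranges so that each slab fits the budget.

    Hypothetical helper for illustration only; not an ASTRA function.
    """
    if bytes_per_slice > budget_bytes:
        raise ValueError("a single slice does not fit into the memory budget")
    per_slab = budget_bytes // bytes_per_slice  # whole slices per slab
    return [(start, min(start + per_slab, n_slices))
            for start in range(0, n_slices, per_slab)]


# 512 slices of 512 x 512 float32 values, with an assumed 256 MiB budget
slabs = split_into_slabs(512, 512 * 512 * 4, 256 * 1024**2)
print(slabs)
```

Each range could then be processed on its own and the partial results combined afterwards.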
29+
30+
**WARNING!** At the moment, if the input/output data is a `linked <data3d.html#link>`_
31+
GPU tensor, the automatic splitting will not work correctly.
32+
33+
**WARNING!** Other GPU libraries, such as PyTorch, often allocate more GPU memory than
34+
they actually need to speed up computations. This significantly reduces the memory
35+
available to ASTRA, and usually it's a good idea to shrink the memory pool on the
36+
external library side (e.g. with ``torch.cuda.empty_cache``) before calling ASTRA if you
37+
get out-of-memory errors.
38+
39+
40+
Choosing the GPU to use
41+
-----------------------
42+
43+
On systems equipped with several GPUs, you can specify which GPU will be used by ASTRA
44+
with:
2445

2546
.. tabs::
2647
.. group-tab:: Python
2748
.. code-block:: python
2849
2950
astra.set_gpu_index(index)
30-
astra.set_gpu_index([index1, index2, ...])
3151
3252
.. group-tab:: MATLAB
3353
.. code-block:: matlab
3454
3555
astra_mex('set_gpu_index', index);
56+
57+
**WARNING!** In `multithreading`_ contexts (Python), the GPU index will be set globally,
58+
so it can't be used to reliably restrict a thread to a given GPU. Instead, you can avoid
59+
calling ``astra.set_gpu_index`` altogether and set the desired GPU with an
60+
external library where the device context is thread-local, e.g. using
61+
``torch.cuda.set_device``.
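
The difference between a global and a thread-local setting can be shown with plain Python threads. This is only an analogy: ``global_device`` and ``local_device`` are hypothetical stand-ins and do not touch ASTRA or any GPU.

```python
import threading

global_device = {"index": 0}      # stand-in for a process-wide setting
local_device = threading.local()  # stand-in for a thread-local device context


def worker(index, barrier, seen):
    global_device["index"] = index  # every thread overwrites the same value
    local_device.index = index      # each thread keeps its own value
    barrier.wait()                  # wait until all threads have written
    seen[index] = (global_device["index"], local_device.index)


def run(n_threads=2):
    barrier = threading.Barrier(n_threads)
    seen = {}
    threads = [threading.Thread(target=worker, args=(i, barrier, seen))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


seen = run()
print(seen)
```

After the barrier, every thread reads the same global value (whichever thread wrote last), while the thread-local value still matches the thread's own index.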
62+
63+
64+
Using several GPUs cooperatively
65+
--------------------------------
66+
67+
ASTRA can utilize several GPUs simultaneously for a single algorithm to speed up the
68+
computation. To do that, you can define the desired set of GPUs to be used:
69+
70+
.. tabs::
71+
.. group-tab:: Python
72+
.. code-block:: python
73+
74+
astra.set_gpu_index([index1, index2, ...])
75+
76+
.. group-tab:: MATLAB
77+
.. code-block:: matlab
78+
3679
astra_mex('set_gpu_index', [index1 index2 ...]);
3780
38-
This lets ASTRA use the GPU with the specified index or indices. Not all ASTRA functionality supports
39-
using multiple GPUs. In that case the GPU specified first will be used.
81+
**WARNING!** At the moment, only FP3D, BP3D and FDK algorithms support this
82+
functionality. For the rest, the first GPU in the specified index list will be used as
83+
the fallback.
4084

4185

4286
Multithreading
4387
--------------
4488

45-
In Python, Astra supports concurrent execution of its algorithms in multiple threads. This
46-
functionality can be used in scenarios such as executing an algorithm with different inputs on
47-
different GPUs simultaneously, or even on the same GPU in case it's under-utilized by default. For
48-
instance, here is how one can compute a fan-parallel projection using multithreaded execution to
49-
accelerate sequential computation:
89+
In Python, ASTRA supports concurrent execution of its algorithms in multiple threads.
90+
This functionality is useful for processing different inputs on different GPUs simultaneously.
91+
Another use case is processing data blocks on the *same* GPU in parallel, which is useful
92+
when a single ASTRA call under-utilizes the GPU. For instance, here is how one can
93+
compute a fan-parallel projection using multithreading:
5094

5195
.. tabs::
5296
.. group-tab:: Python
@@ -57,13 +101,13 @@ accelerate sequential computation:
57101
from concurrent.futures import ThreadPoolExecutor
58102
59103
N = 512
60-
vol_geometry_full = astra.create_vol_geom(N, N, N)
104+
vol_geom_full = astra.create_vol_geom(N, N, N)
61105
vol_geom_slice = astra.create_vol_geom(N, N)
62106
angles = np.linspace(0, 2*np.pi, N, endpoint=False)
63107
proj_geom_slice = astra.create_proj_geom('fanflat', 1.0, N, angles, N, N)
64108
projector = astra.create_projector('cuda', proj_geom_slice, vol_geom_slice)
65109
66-
phantom_id, phantom_data = astra.data3d.shepp_logan(vol_geometry_full)
110+
phantom_id, phantom_data = astra.data3d.shepp_logan(vol_geom_full)
67111
68112
def forward_project_slice(vol_slice):
69113
slice_proj_id, slice_proj_data = astra.create_sino(vol_slice, projector)
@@ -81,9 +125,10 @@ accelerate sequential computation:
81125
**WARNING!** No special care is taken about race conditions, so the user has to ensure that the
82126
outputs of algorithms are not accessed simultaneously.
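
One way to avoid simultaneous access is to let each worker produce its own output and to read that output only after the worker has finished. The sketch below shows the pattern with a placeholder ``process_block`` function, which is hypothetical and stands in for an actual ASTRA call.

```python
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    # Placeholder for an ASTRA call: builds and returns a new output
    # object, so no two threads ever write to the same buffer.
    return [2 * x for x in block]


blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # map() yields results in input order; each result is read by the
    # main thread only after its worker has completed.
    results = list(pool.map(process_block, blocks))

print(results)
```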
83127

84-
MATLAB, on the other hand, doesn't support executing external libraries in multithreaded
85-
environments, so the only option is much less lightweight process-based concurrent execution. The
86-
simplest approach is to just start several copies of MATLAB in batch mode.
128+
**WARNING!** MATLAB doesn't support executing external libraries in multithreaded
129+
environments, so the only option is much less lightweight process-based concurrent
130+
execution. The simplest approach is to just start several copies of MATLAB in batch
131+
mode.
87132

88133
Masks
89134
-----
@@ -103,9 +148,9 @@ projection matrix entirely. In other words, it will iteratively try
103148
to match the projection of the non-masked voxels to the non-masked projection
104149
data elements.
105150

106-
NB: MinConstraint/MaxConstraint will affect even masked voxels.
151+
**WARNING!** MinConstraint/MaxConstraint will affect even masked voxels.
107152

108-
NB: FP and BP algorithms (CPU versions) overwrite the output, so the values
153+
**WARNING!** FP and BP algorithms (CPU versions) overwrite the output, so the values
109154
outside the sinogram/reconstruction masks, respectively, will be set to zero
110155
instead of being ignored.
111156

@@ -141,7 +186,7 @@ passed to astra functions such as
141186
id = astra_mex_projector('create', cfg);
142187
143188
The most common usage is for creating algorithm configuration structs. See the
144-
pages for `individual algorithms <algs/index.html>`_for the options they
189+
pages for `individual algorithms <algs/index.html>`_ for the options they
145190
support.
146191

147192
