@@ -19,34 +19,78 @@ This will run quick tests of basic CPU/GPU functionality, and report on
1919the results. As a part of this it will report if GPU functionality is
2020available.
2121
22- Setting GPU index
23- -----------------
22+
23+ Working with large data
24+ -----------------------
25+
26+ If the data does not fit into GPU memory, ASTRA will automatically split it and
27+ process it in subsets. This functionality is only available for the FP3D, BP3D and
28+ FDK algorithms.
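To illustrate the idea of processing data in subsets, the following minimal sketch estimates how a volume could be cut into contiguous slabs that each fit a given memory budget. It is not ASTRA's internal splitting strategy, which is not described here; the helper name ``plan_slabs``, the float32 voxel size and the memory budget are all assumptions made for this example.

```python
import math


def plan_slabs(n_slices, slice_shape, free_bytes, bytes_per_voxel=4):
    """Split n_slices into contiguous (start, stop) ranges that each fit free_bytes.

    Hypothetical helper for illustration only; not part of the ASTRA API.
    """
    rows, cols = slice_shape
    bytes_per_slice = rows * cols * bytes_per_voxel
    slices_per_slab = free_bytes // bytes_per_slice
    if slices_per_slab < 1:
        raise ValueError("a single slice does not fit into the memory budget")
    n_slabs = math.ceil(n_slices / slices_per_slab)
    return [
        (i * slices_per_slab, min((i + 1) * slices_per_slab, n_slices))
        for i in range(n_slabs)
    ]


# A 1024^3 float32 volume (4 GiB) with an assumed budget of 1 GiB.
slabs = plan_slabs(1024, (1024, 1024), free_bytes=1 << 30)
print(len(slabs), slabs[0], slabs[-1])  # 4 (0, 256) (768, 1024)
```

A real budget would also have to account for the projection data and any temporary buffers, which this sketch ignores.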
29+
30+ **WARNING!** At the moment, if the input/output data is a `linked <data3d.html#link>`_
31+ GPU tensor, the automatic splitting will not work correctly.
32+
33+ **WARNING!** Other GPU libraries, such as PyTorch, often allocate more GPU memory than
34+ they actually need in order to speed up computations. This significantly reduces the
35+ memory available to ASTRA. If you get out-of-memory errors, it is usually a good idea to
36+ shrink the memory pool on the external library side (e.g. with
37+ ``torch.cuda.empty_cache``) before calling ASTRA.
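As a minimal sketch of the advice above, the helper below releases PyTorch's cached GPU memory right before an ASTRA call. The helper name ``release_torch_cache`` is hypothetical. Note that ``torch.cuda.empty_cache`` only returns cached blocks that are no longer in use; memory held by live tensors stays allocated.

```python
def release_torch_cache():
    """Return PyTorch's unused cached GPU memory to the driver, if possible.

    Returns True when the cache was released, and False when PyTorch is not
    installed or no CUDA device is available. Hypothetical helper, not part
    of ASTRA or PyTorch.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True


released = release_torch_cache()
# ... call the ASTRA algorithm here ...
```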
38+
39+
40+ Choosing the GPU to use
41+ -----------------------
42+
43+ On systems equipped with several GPUs, you can specify which GPU will be used by ASTRA
44+ with:
2445
2546.. tabs::
2647 .. group-tab:: Python
2748 .. code-block:: python
2849
2950 astra.set_gpu_index(index)
30- astra.set_gpu_index([index1, index2, ...])
3151
3252 .. group-tab:: MATLAB
3353 .. code-block:: matlab
3454
3555 astra_mex('set_gpu_index', index);
56+
57+ **WARNING!** In `multithreading`_ contexts (Python), the GPU index is set globally,
58+ so it can't be used to reliably restrict a thread to a given GPU. Instead, you can avoid
59+ calling ``astra.set_gpu_index`` altogether and set the desired GPU with an external
60+ library where the device context is thread-local, e.g. using
61+ ``torch.cuda.set_device``.
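The pattern described in this warning can be sketched as follows: each worker thread first selects its own device and then runs its task on that same thread. To keep the example runnable without a GPU, a stand-in selector based on ``threading.local`` replaces ``torch.cuda.set_device``. The names ``run_on_device``, ``fake_select_device`` and ``report_device`` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_on_device(device_index, task, select_device):
    """Select a device for the current thread, then run task() on that thread."""
    select_device(device_index)
    return task()


# Stand-in for a thread-local selector such as torch.cuda.set_device.
# It only records the choice, so the sketch runs without a GPU.
_context = threading.local()


def fake_select_device(index):
    _context.device = index


def report_device():
    # A real task would call ASTRA here instead of reporting the device.
    return _context.device


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(run_on_device, i, report_device, fake_select_device)
        for i in range(2)
    ]
    devices = [f.result() for f in futures]

print(devices)  # [0, 1]
```

With PyTorch installed, one would pass ``torch.cuda.set_device`` as ``select_device`` and run the ASTRA algorithm inside ``task``.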
62+
63+
64+ Using several GPUs cooperatively
65+ --------------------------------
66+
67+ ASTRA can utilize several GPUs simultaneously for a single algorithm to speed up the
68+ computation. To do that, you can define the desired set of GPUs to be used:
69+
70+ .. tabs::
71+ .. group-tab:: Python
72+ .. code-block:: python
73+
74+ astra.set_gpu_index([index1, index2, ...])
75+
76+ .. group-tab:: MATLAB
77+ .. code-block:: matlab
78+
3679 astra_mex('set_gpu_index', [index1 index2 ...]);
3780
38- This lets ASTRA use the GPU with the specified index or indices. Not all ASTRA functionality supports
39- using multiple GPUs. In that case the GPU specified first will be used.
81+ **WARNING!** At the moment, only the FP3D, BP3D and FDK algorithms support this
82+ functionality. For all other algorithms, the first GPU in the specified index list is
83+ used as a fallback.
4084
4185
4286Multithreading
4387--------------
4488
45- In Python, Astra supports concurrent execution of its algorithms in multiple threads. This
46- functionality can be used in scenarios such as executing an algorithm with different inputs on
47- different GPUs simultaneously, or even on the same GPU in case it's under-utilized by default. For
48- instance, here is how one can compute a fan-parallel projection using multithreaded execution to
49- accelerate sequential computation:
89+ In Python, ASTRA supports concurrent execution of its algorithms in multiple threads.
90+ This functionality is useful for processing different inputs on different GPUs
91+ simultaneously. Another use case is processing data blocks on the *same* GPU in parallel,
92+ which helps when a single ASTRA call under-utilizes the GPU. For instance, here is how
93+ one can compute a fan-parallel projection using multithreading:
5094
5195.. tabs::
5296 .. group-tab:: Python
@@ -57,13 +101,13 @@ accelerate sequential computation:
57101 from concurrent.futures import ThreadPoolExecutor
58102
59103 N = 512
60- vol_geometry_full = astra.create_vol_geom(N, N, N)
104+ vol_geom_full = astra.create_vol_geom(N, N, N)
61105 vol_geom_slice = astra.create_vol_geom(N, N)
62106 angles = np.linspace(0, 2 * np.pi, N, endpoint=False)
63107 proj_geom_slice = astra.create_proj_geom('fanflat', 1.0, N, angles, N, N)
64108 projector = astra.create_projector('cuda', proj_geom_slice, vol_geom_slice)
65109
66- phantom_id, phantom_data = astra.data3d.shepp_logan(vol_geometry_full)
110+ phantom_id, phantom_data = astra.data3d.shepp_logan(vol_geom_full)
67111
68112 def forward_project_slice(vol_slice):
69113 slice_proj_id, slice_proj_data = astra.create_sino(vol_slice, projector)
@@ -81,9 +125,10 @@ accelerate sequential computation:
81125**WARNING!** No special care is taken to prevent race conditions, so the user has to ensure that the
82126outputs of algorithms are not accessed simultaneously.
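One simple way to respect this constraint is to let every task return its own result and to assemble the results on the main thread, so that no output object is shared between threads. The sketch below shows this pattern with a stand-in function in place of an ASTRA call; ``process_block`` and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    # Stand-in for an ASTRA call: it reads only its own input and returns
    # a new object, so no output is shared between threads.
    return [2 * value for value in block]


blocks = [[1, 2], [3, 4], [5, 6]]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in input order, regardless of completion order.
    results = list(pool.map(process_block, blocks))

print(results)  # [[2, 4], [6, 8], [10, 12]]
```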
83127
84- MATLAB, on the other hand, doesn't support executing external libraries in multithreaded
85- environments, so the only option is much less lightweight process-based concurrent execution. The
86- simplest approach is to just start several copies of MATLAB in batch mode.
128+ **WARNING!** MATLAB doesn't support executing external libraries in multithreaded
129+ environments, so the only option is process-based concurrent execution, which is much
130+ less lightweight. The simplest approach is to just start several copies of MATLAB in
131+ batch mode.
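Starting several MATLAB copies in batch mode can be scripted from any language; the sketch below does it from Python. It assumes that a ``matlab`` executable supporting the ``-batch`` option (MATLAB R2019a or later) is on the ``PATH``. The script names and both helper functions are hypothetical. The example only builds the command lines; ``launch_all`` is defined but not called.

```python
import subprocess


def matlab_batch_commands(script_names, matlab="matlab"):
    """Build one 'matlab -batch <script>' command line per script name."""
    return [[matlab, "-batch", name] for name in script_names]


def launch_all(commands):
    """Start all processes at once, then wait and collect their exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


# Hypothetical script names; each script would process its own share of the data.
commands = matlab_batch_commands(["reconstruct_part1", "reconstruct_part2"])
print(commands)
# To actually run them: exit_codes = launch_all(commands)
```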
87132
88133Masks
89134-----
@@ -103,9 +148,9 @@ projection matrix entirely. In other words, it will iteratively try
103148to match the projection of the non-masked voxels to the non-masked projection
104149data elements.
105150
106- NB: MinConstraint/MaxConstraint will affect even masked voxels.
151+ **WARNING!** MinConstraint/MaxConstraint will affect even masked voxels.
107152
108- NB: FP and BP algorithms (CPU versions) overwrite the output, so the values
153+ **WARNING!** FP and BP algorithms (CPU versions) overwrite the output, so the values
109154outside the sinogram/reconstruction masks, respectively, will be set to zero
110155instead of being ignored.
111156
@@ -141,7 +186,7 @@ passed to astra functions such as
141186 id = astra_mex_projector('create', cfg);
142187
143188 The most common usage is for creating algorithm configuration structs. See the
144- pages for `individual algorithms <algs/index.html>`_for the options they
189+ pages for `individual algorithms <algs/index.html>`_ for the options they
145190support.
146191
147192