while (future_image.wait_for(kSpinInterval) == std::future_status::timeout) {
+while (future_image.wait_for(kSpinInterval) ==
+std::future_status::timeout) {
 spinner.Spin();
 }
 }
@@ -190,7 +187,7 @@ Since C++17, many algorithms in the `<algorithm>` and `<numeric>` headers accept

 Imagine we want to apply a simple filter, e.g. color inversion, to every pixel of that "massive image" we've just loaded. To make it a complete example, we'll add some details to our `Image` struct from before, but we'll still keep it extremely simple.

-Our image now holds a vector of pixels, with each pixel holding an RGB value. A function for inverting the color of a pixel only needs that pixel as an input and so is completely independent of other pixels. Tasks like this are called [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel), which means we don't have to worry about any data collisions during parallel execution. More on that later.
+Our image now holds a vector of pixels, with each pixel holding an RGB value. A function for inverting the color of a pixel only needs that pixel as an input and so is completely independent of other pixels. Tasks like these are called [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel), which means we don't have to worry about any data collisions during parallel execution. More on that a bit later.
// A massive 100-megapixel image! Imagine it is filled with useful data.
@@ -310,7 +303,7 @@ int main() {
 }
 ```

-The code didn't change much at all! We only added the `std::execution::par` parameter to the `std::transform` algorithm. We also need to slightly change that compile command from before by adding `-ltbb` to it:
+The code didn't change much at all! We only added the `std::execution::par` parameter to the `std::transform` algorithm as well as the `<execution>` header needed for it. We also need to slightly change that compile command from before by adding `-ltbb` to it:

 ```
 c++ -std=c++17 -O3 main.cpp -ltbb
@@ -339,7 +332,7 @@ Now let's talk about that `std::execution::par` parameter. Similar to launch pol
 > [!NOTE]
 > This is a good time to talk about this `-ltbb` linker option! We also used it in the previous compilation command. The reason why we often need it to enable the parallel versions of the standard algorithms is that, under the hood, compilers often use **Intel Threading Building Blocks (oneTBB)** as the backend for these parallel algorithms. TBB is an industry-standard library for task-based parallelism but, again, if you're on Apple Clang you'll need to switch to Clang (non-Apple) or GCC to use it.

-This also then means that we are not confined to the limits of standard library when we want to write code that runs in parallel. If we need more control than the standard library algorithms provide, for example if we want to specify how many threads to use, we can drop down an abstraction level and use Intel TBB directly. It provides a rich set of algorithms like `tbb::parallel_for`, `tbb::parallel_reduce`, and concurrent data structures.
+This also then means that we are not confined to the limits of the standard library when we want to write code that runs in parallel. If we need more control than the standard library algorithms provide, we can drop down an abstraction level and use Intel TBB directly. It provides a rich set of algorithms like `tbb::parallel_for`, `tbb::parallel_reduce`, and concurrent data structures.

 Let's rewrite our color inversion example using `tbb::parallel_for`. This explicitly tells TBB to split our vector index range into chunks ("blocked ranges") and process them across available worker threads:

@@ -348,17 +341,18 @@ Let's rewrite our color inversion example using `tbb::parallel_for`. This explic
All in all, TBB gives us explicit control over the chunks, which is very useful for more complex loops where standard library algorithms might not fit perfectly. But, as for this example, we can compile it just as we did before and it should run in about the same time as the parallel version of the standard algorithms, in around 5ms on my machine.
+We can compile this example just as we compiled the previous one and it should run in about the same time as the parallel version of the standard algorithms, in around 5ms on my machine.

-If you want a challenge, go ahead and find a way to only use, say, 2 threads rather than all available ones with this version of the code!
+All in all, TBB is a very powerful library that gives us much more control over how our code runs in parallel. For example, there is no way to select how many threads to use with the standard library parallel algorithms, but TBB lets us change that. If you want a small challenge, go ahead and find a way to only use, say, 2 threads rather than all available ones with our TBB example!

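To build some intuition for what "splitting an index range into chunks" and "choosing how many threads to use" mean, here is a minimal hand-rolled sketch that uses only the standard library (`std::async`), not TBB. It is not how TBB works internally, and it does not solve the challenge above. The `Pixel` layout and the `Invert` and `InvertInChunks` names are assumptions made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical pixel type; the field names are assumptions for this sketch.
struct Pixel {
  std::uint8_t r{};
  std::uint8_t g{};
  std::uint8_t b{};
};

// Inverting one pixel needs no information about any other pixel.
inline Pixel Invert(const Pixel& p) {
  return Pixel{static_cast<std::uint8_t>(255 - p.r),
               static_cast<std::uint8_t>(255 - p.g),
               static_cast<std::uint8_t>(255 - p.b)};
}

// Splits [0, pixels.size()) into at most `task_count` contiguous chunks
// and inverts every chunk in its own asynchronous task.
inline void InvertInChunks(std::vector<Pixel>& pixels, std::size_t task_count) {
  assert(task_count > 0);
  // Round up so that the chunks cover the whole range.
  const std::size_t chunk_size = (pixels.size() + task_count - 1) / task_count;
  std::vector<std::future<void>> tasks;
  tasks.reserve(task_count);
  for (std::size_t begin = 0; begin < pixels.size(); begin += chunk_size) {
    const std::size_t end = std::min(begin + chunk_size, pixels.size());
    tasks.push_back(std::async(std::launch::async, [&pixels, begin, end] {
      const auto first = pixels.begin() + static_cast<std::ptrdiff_t>(begin);
      const auto last = pixels.begin() + static_cast<std::ptrdiff_t>(end);
      // Chunks never overlap, so no two tasks touch the same pixel.
      std::transform(first, last, first, Invert);
    }));
  }
  // Wait for all chunks to finish and propagate any exceptions.
  for (auto& task : tasks) task.get();
}
```

Calling `InvertInChunks(pixels, 2)` on a vector of pixels would process it in at most two chunks running concurrently. A library like TBB does this kind of bookkeeping for us, and does it with a reusable pool of worker threads rather than by launching fresh tasks for every call.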
 ### Worker Threads and Thread Pools
 So now we know how to kick off long-running tasks and how to use parallel algorithms to process many tiny tasks. Is that it? Not quite. Imagine we receive a stream of thousands of tiny images that all need their colors inverted before they can be displayed.