Commit 654dddf

Moving all section headers up on the hierarchy
1 parent 958d83d commit 654dddf

1 file changed

Lines changed: 21 additions & 21 deletions

README.md

@@ -2,7 +2,7 @@
22

33
StencilStream is a SYCL-based simulation framework to accelerate iterative 2D stencil codes with heterogeneous compute accelerators. With StencilStream, application developers and domain scientists can quickly define their 2D stencil code in a straightforward fashion and obtain a fully functional and highly performant application that utilizes the available compute accelerators.
44

5-
## 🎯 Design Goals
5+
# 🎯 Design Goals
66

77
There are many stencil acceleration frameworks available. However, many of them use customized toolchains to support domain-specific languages, which makes them both hard to use for real-world applications and hard to extend.
88

@@ -18,68 +18,68 @@ There are many stencil acceleration frameworks available. However, many of them
1818
You don’t need to be a GPU or FPGA expert to get high performance.
1919

2020

21-
## ⚙️ Hardware Platform Support
21+
# ⚙️ Hardware Platform Support
2222

2323
**StencilStream** is built to enable high-performance stencil computations across a diverse range of modern compute architectures. The framework abstracts away low-level hardware details, allowing developers to focus on algorithm design while targeting various platforms with minimal code changes.
2424

2525
To ensure portability and efficiency, StencilStream provides multiple backend implementations optimized for specific hardware. Switching between backends is as simple as linking against a different library in your CMake configuration and including a different header in your code.
2626

2727
StencilStream has been validated on the following accelerator classes:
2828

29-
### FPGAs
29+
## FPGAs
3030

31-
StencilStream was originally developed to bring high-performance computing and FPGAs closer together. As such, StencilStream's FPGA backends are one of the most optimized backends available. The two primary FPGA backends are:
31+
StencilStream was originally developed to bring high-performance computing and FPGAs closer together (see our [FPL'24 publication](https://doi.org/10.1109/FPL64840.2024.00023)). As such, StencilStream's FPGA backends are one of the most optimized backends available. The two primary FPGA backends are:
3232

3333
* **The Tiling backend**: The most versatile FPGA backend, which supports arbitrary grid sizes at the cost of a small performance penalty
3434
* **The Monotile backend**: The most performant FPGA backend, which achieves the highest available throughput by limiting its support to small to medium-size grids.
3535

3636
In addition to this, there is also an experimental multi-FPGA version of the Monotile backend, which utilizes the networking capabilities of high-end FPGAs to scale beyond what a single FPGA can achieve.
3737

38-
### GPUs
38+
## GPUs
3939

40-
With the 4.0.0 release, StencilStream also features a GPU backend that utilizes [Codeplay's oneAPI for NVIDIA GPUs plugin](https://developer.codeplay.com/products/oneapi/nvidia/home/index.html) to achieve high throughput on NVIDIA GPUs. Thanks to a transparent data layout transformation discussed in [our paper](https://doi.org/10.1145/3811257.3811259), the very same stencil code can achieve high performance both on GPUs and FPGAs.
40+
With the 4.0.0 release, StencilStream also features a GPU backend that utilizes [Codeplay's oneAPI for NVIDIA GPUs plugin](https://developer.codeplay.com/products/oneapi/nvidia/home/index.html) to achieve high throughput on NVIDIA GPUs. Thanks to a transparent data layout transformation discussed in [our publication](https://doi.org/10.1145/3811257.3811259), the very same stencil code can achieve high performance both on GPUs and FPGAs.
4141

42-
### CPUs
42+
## CPUs
4343

4444
StencilStream also features a fully functional CPU backend for functional evaluation. Optimizing this backend to reach the full potential of modern CPUs remains a direction for future work.
4545

46-
## Examples
46+
# Examples
4747

4848
We have implemented multiple example applications to show the capabilities of StencilStream in terms of simplicity, expressiveness, and performance. One is a simple sketch to show how to get started, two are common stencil code benchmarks, and two are proper applications that use StencilStream's advanced features. They are presented below:
4949

50-
### Conway's Game of Life
50+
## Conway's Game of Life
5151

5252
Our implementation of Conway's Game of Life is found in the subfolder [examples/conway](examples/conway/). It reads in the current state of a grid from standard-in, computes a requested number of iterations, and then writes it out again.
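To illustrate the kind of transition function this example is built around, here is a minimal, framework-independent sketch of one Game of Life update in plain C++. It does not use StencilStream's API; the names `Grid`, `live_neighbours`, `step`, and `run` are hypothetical, and treating cells outside the grid as dead is an assumption rather than the behavior of the code in `examples/conway`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid type: true means the cell is alive.
using Grid = std::vector<std::vector<bool>>;

// Count the live cells among the eight neighbours of (r, c).
// Assumption: everything outside the grid counts as dead.
int live_neighbours(const Grid &grid, long r, long c) {
    const long rows = static_cast<long>(grid.size());
    const long cols = static_cast<long>(grid[0].size());
    int count = 0;
    for (long dr = -1; dr <= 1; ++dr) {
        for (long dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue; // skip the cell itself
            const long nr = r + dr;
            const long nc = c + dc;
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (grid[nr][nc]) ++count;
        }
    }
    return count;
}

// Apply the Game of Life rules once to every cell.
Grid step(const Grid &grid) {
    assert(!grid.empty() && !grid[0].empty());
    Grid next(grid.size(), std::vector<bool>(grid[0].size(), false));
    for (std::size_t r = 0; r < grid.size(); ++r) {
        for (std::size_t c = 0; c < grid[r].size(); ++c) {
            const int n = live_neighbours(grid, static_cast<long>(r),
                                          static_cast<long>(c));
            // A live cell survives with 2 or 3 neighbours,
            // a dead cell is born with exactly 3.
            next[r][c] = grid[r][c] ? (n == 2 || n == 3) : (n == 3);
        }
    }
    return next;
}

// Compute the requested number of iterations.
Grid run(Grid grid, unsigned iterations) {
    for (unsigned i = 0; i < iterations; ++i) grid = step(grid);
    return grid;
}
```

Calling `run(grid, n)` on a grid parsed from standard input and printing the result would mirror the read, iterate, and write flow described above. Each new cell depends only on its 3x3 neighbourhood in the previous generation, which is what makes the rule a stencil.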
5353

54-
### Jacobi
54+
## Jacobi
5555

5656
Jacobi kernels are a very common class of stencil codes, often used for benchmarking. Our implementation contains multiple versions of it in order to scale the computational complexity of a single transition function.
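For readers unfamiliar with this class of stencils, the following is a generic five-point Jacobi averaging step in plain C++. It is a sketch under stated assumptions, not the kernel shipped in this repository: the names `Field` and `jacobi_step`, the equal weighting of the five cells, and the constant `halo` value for out-of-grid reads are all hypothetical, and the actual versions may use different coefficients or neighbourhoods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical field type holding one float per cell.
using Field = std::vector<std::vector<float>>;

// One Jacobi sweep: every cell becomes the mean of itself and its four
// direct neighbours. Assumption: cells outside the grid read as `halo`.
Field jacobi_step(const Field &in, float halo = 0.0f) {
    assert(!in.empty() && !in[0].empty());
    const long rows = static_cast<long>(in.size());
    const long cols = static_cast<long>(in[0].size());

    // Bounds-checked read that falls back to the halo value.
    auto at = [&](long r, long c) {
        const bool inside = r >= 0 && r < rows && c >= 0 && c < cols;
        return inside ? in[r][c] : halo;
    };

    Field out(in.size(), std::vector<float>(in[0].size(), 0.0f));
    for (long r = 0; r < rows; ++r) {
        for (long c = 0; c < cols; ++c) {
            const float sum = at(r, c) + at(r - 1, c) + at(r + 1, c) +
                              at(r, c - 1) + at(r, c + 1);
            out[r][c] = sum / 5.0f;
        }
    }
    return out;
}
```

The sweep reads only from `in` and writes only to `out`, so every cell of a generation is computed from the previous generation alone. Adding more terms to the per-cell sum is one way the work per transition function could be scaled.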
5757

58-
### HotSpot
58+
## HotSpot
5959

60-
This our implementation of the HotSpot benchmark from the [Rodinia Benchmark Suite](https://rodinia.cs.virginia.edu/doku.php?id=start0), found in the subfolder [examples/hotspot](examples/hotspot/). It is a very common benchmark that goes beyond the relatively simple structure of the Jacobi kernels.
60+
Our implementation of the HotSpot benchmark from the [Rodinia Benchmark Suite](https://rodinia.cs.virginia.edu/doku.php?id=start0) is found in the subfolder [examples/hotspot](examples/hotspot/). It is a very common benchmark that goes beyond the relatively simple structure of the Jacobi kernels.
6161

62-
### FDTD
62+
## FDTD
6363

6464
The FDTD application in [examples/fdtd](examples/fdtd/) is used to simulate the behavior of electromagnetic waves within micro-cavities. The computed experiment is highly configurable, using configuration files written in JSON. Computationally, it is interesting because it utilizes StencilStream's time-dependent value feature to precompute the source wave and the sub-iterations feature to alternate between an electric and a magnetic field update. Below, you find a rendering of the final magnetic field, computed for the ["Max Grid" experiment](examples/fdtd/experiments/max_grid.json):
6565

6666
![Magnetic field within a micro-cavity, computed by the FDTD app](docs/FDTD.png)
6767

68-
### Convection
68+
## Convection
6969

7070
The convection app, found in [examples/convection](examples/convection/), simulates the convection within Earth's mantle. It is a port of an example app for the [ParallelStencil.jl framework](https://github.com/omlins/ParallelStencil.jl) and can also be configured using a JSON file. Below, you find the animated output of the [default experiment](examples/convection/experiments/default.json).
7171

7272
![A video showing convection, computed by the Convection app](docs/convection-animation.mp4)
7373

74-
### Performance & FPGA Resource Usage
74+
## Performance & FPGA Resource Usage
7575

7676
![Line plots showing the cell throughput against the input grid size of the Jacobi, HotSpot, FDTD, and Convection examples, using the GPU backend, Monotile FPGA backend, and Tiling FPGA backend](docs/throughput_1x4.png)
7777

78-
A thorough evaluation of each backend's performance is found in [our latest paper on StencilStream 4.0.0](https://doi.org/10.1145/3811257.3811259). As you can see in the performance plot above from this paper (Tim Stöhr et al., CC BY 4.0), all backends achieve very high throughput rates for single-device execution. The highest measured throughput is 176.08 billion cell updates per second for the Jacobi benchmark, achieved by the Tiling FPGA backend on a BittWare 520N accelerator with an Intel Stratix 10 GX 2800 FPGA, which is equivalent to 1.58 TFLOPS. In terms of arithmetic throughput, the highest measured value is 1.84 TFLOPS achieved by the Monotile FPGA backend for the HotSpot benchmark, using the same accelerator.
78+
A thorough evaluation of each backend's performance is found in [our latest publication on StencilStream 4.0.0](https://doi.org/10.1145/3811257.3811259). As you can see in the performance plot above from this publication (Tim Stöhr et al., CC BY 4.0), all backends achieve very high throughput rates for single-device execution. The highest measured throughput is 176.08 billion cell updates per second for the Jacobi benchmark, achieved by the Tiling FPGA backend on a BittWare 520N accelerator with an Intel Stratix 10 GX 2800 FPGA, which is equivalent to 1.58 TFLOPS. In terms of arithmetic throughput, the highest measured value is 1.84 TFLOPS achieved by the Monotile FPGA backend for the HotSpot benchmark, using the same accelerator.
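The two Jacobi figures quoted above can be related by a simple division: arithmetic throughput divided by cell throughput gives the floating-point operations per cell update. The helper below is a hypothetical illustration of that arithmetic, and the resulting value of roughly 9 operations per cell update is inferred from the rounded numbers in this paragraph, not a figure stated in the publication.

```cpp
#include <cassert>

// Floating-point operations performed per cell update, derived from the
// arithmetic throughput (FLOP/s) and the cell throughput (cell updates/s).
double flops_per_cell_update(double flops_per_second,
                             double cell_updates_per_second) {
    assert(cell_updates_per_second > 0.0);
    return flops_per_second / cell_updates_per_second;
}
```

With the values quoted above, `flops_per_cell_update(1.58e12, 176.08e9)` evaluates to about 8.97, which is consistent with roughly 9 operations per cell update given the rounding of the TFLOPS figure.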
7979

80-
### Building and Running the Examples
80+
## Building and Running the Examples
8181

82-
#### Environment Setup on Noctua 2
82+
### Environment Setup on Noctua 2
8383

8484
Most of the development of StencilStream was done on the [Noctua 2 supercomputer at the Paderborn Center for Parallel Computing](https://pc2.uni-paderborn.de/systems-and-services/noctua-2). Loading the necessary software on this system is handled by one of two scripts. For building and running CPU and FPGA targets, source the following script with the base of the repository as the current working directory:
8585

@@ -95,7 +95,7 @@ source scripts/env_cuda.sh
9595

9696
This will load the necessary software modules and also instantiate the Julia project.
9797

98-
#### Building
98+
### Building
9999

100100
Configure the project from the repository root:
101101

@@ -137,7 +137,7 @@ The targets corresponding to the performance table above are:
137137

138138
Compiled binaries land in `build/examples/<example>/`.
139139

140-
#### Benchmarking
140+
### Benchmarking
141141

142142
Each example has a benchmark script at `examples/<example>/scripts/benchmark.jl`. Run it from the example directory. For Convection, HotSpot, and FDTD:
143143

@@ -164,7 +164,7 @@ cd examples/jacobi
164164

165165
Results are written to `metrics.<variant>.json` in the example directory (Jacobi writes `metrics.<executable-name>.json`).
166166

167-
## Licensing & Citing
167+
# Licensing & Citing
168168

169169
StencilStream is published under the MIT license, as found in [LICENSE.md](LICENSE.md). When using StencilStream for a scientific publication, please cite one of the following:
170170
