|
| 1 | +# FFT Benchmark for FPGA |
| 2 | + |
| 3 | +This repository contains the FFT Benchmark for FPGA and its OpenCL kernels. |
| 4 | +Currently, only the Intel FPGA SDK for OpenCL is supported. |
| 5 | + |
| 6 | +It is based on the FFT benchmark of the [HPC Challenge Benchmark](https://icl.utk.edu/hpcc/) suite. |
| 7 | +The FFT1D reference implementation is used for the kernel code. |
| 8 | + |
| 9 | +## Dependencies |
| 10 | + |
| 11 | +The benchmark comes with the following requirements for building and running: |
| 12 | + |
| 13 | +- CMake 2.8 |
| 14 | +- GCC 4.9 |
| 15 | +- Intel FPGA SDK for OpenCL 19.3 |
| 16 | + |
| 17 | +It also contains submodules that will be automatically updated when running cmake: |
| 18 | + |
| 19 | +- cxxopts: A header-only library for parsing command-line parameters |
| 20 | +- googletest: A C++ test framework |
| 21 | + |
| 22 | +## Build |
| 23 | + |
| 24 | +CMake is used as the build system. |
| 25 | +The targets below can be used to build the benchmark and its kernels: |
| 26 | + |
| 27 | + | Target | Description | |
| 28 | + | -------- | ---------------------------------------------- | |
| 29 | + | fFFT | Builds the host application | |
| 30 | + | Google_Tests_run| Compiles the tests and their dependencies | |
| 31 | + |
| 32 | + Moreover, there are additional targets to generate kernel reports and bitstreams. |
| 33 | + The provided kernel is optimized for Stratix 10 with 512-bit LSUs. |
| 34 | + The kernel targets are: |
| 35 | + |
| 36 | + | Target | Description | |
| 37 | + | -------- | ---------------------------------------------- | |
| 38 | + | fft1d_float_8 | Synthesizes the kernel (takes several hours!) | |
| 39 | + | fft1d_float_8_report | Creates an HTML report for the kernel | |
| 40 | + | fft1d_float_8_emulate | Creates an emulation kernel | |
| 41 | + |
| 42 | + |
| 43 | + For example, you can build the host application by running: |
| 44 | + |
| 45 | + mkdir build && cd build |
| 46 | + cmake .. |
| 47 | + make fFFT |
| 48 | + |
| 49 | +You will find all executables and kernel files in the `bin` |
| 50 | +folder of your build directory. |
| 51 | +You should always specify a target with make to reduce the build time! |
| 52 | +You can set the following predefined parameters before building: |
| 53 | + |
| 54 | +Name | Default | Description | |
| 55 | +---------------- |-------------|--------------------------------------| |
| 56 | +`DEFAULT_DEVICE` | -1 | Index of the default device (-1 = ask) | |
| 57 | +`DEFAULT_PLATFORM`| -1 | Index of the default platform (-1 = ask) | |
| 58 | +`FPGA_BOARD_NAME`| p520_hpc_sg280l | Name of the target board | |
| 59 | +`DEFAULT_REPETITIONS`| 10 | Number of times the kernel will be executed | |
| 60 | +`DEFAULT_ITERATIONS`| 100 | Default number of iterations performed in a single kernel execution | |
| 61 | +`LOG_FFT_SIZE` | 12 | Log2 of the FFT size to use, e.g. 3 leads to an FFT size of 2^3 = 8 | |
| 62 | +`AOC_FLAGS`| `-fpc -fp-relaxed` | Additional AOC compiler flags that are used for kernel compilation | |
| 63 | + |
| 64 | +Moreover, the environment variable `INTELFPGAOCLSDKROOT` must be set to the root |
| 65 | +of the Intel FPGA SDK installation. |
| 66 | + |
| 67 | +Additionally, the compiler and other build tools can be changed |
| 68 | +in the `CMakeCache.txt` located in the build directory after running cmake. |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | +## Execution |
| 73 | + |
| 74 | +To execute the benchmark, run: |
| 75 | + |
| 76 | + ./fFFT -f path_to_kernel.aocx |
| 77 | + |
| 78 | +For more information on available input parameters run |
| 79 | + |
| 80 | + ./fFFT -h |
| 81 | + |
| 82 | +To execute the unit and integration tests run |
| 83 | + |
| 84 | + ./Google_Tests_run |
| 85 | + |
| 86 | +in the `bin` folder within the build directory. |
| 87 | +It will run an emulation of the kernel and execute some functionality tests. |
| 88 | + |
| 89 | +## Output Interpretation |
| 90 | + |
| 91 | +The benchmark will print the following two tables to standard output after execution: |
| 92 | + |
| 93 | + res. error mach. eps |
| 94 | + 2.67000e-01 1.19209e-07 |
| 95 | + |
| 96 | + avg best |
| 97 | + Time in s: 7.56801e-03 7.07241e-03 |
| 98 | + GFLOPS: 3.24735e-02 3.47491e-02 |
| 99 | + |
| 100 | +The first table contains the maximum residual error of the calculation and the |
| 101 | +machine epsilon that was used to calculate the residual error. |
| 102 | +The benchmark performs an FFT with the FPGA kernel on random input data. |
| 103 | +In a second step, the resulting data is used as input for an iFFT using a CPU |
| 104 | +reference implementation in double precision. |
| 105 | +The residual error is then calculated with: |
| 106 | + |
| 107 | +$$\mathrm{res} = \frac{\lVert x - x' \rVert_{\infty}}{\epsilon \cdot \log(n)}$$ |
| 108 | + |
| 109 | +where `x` is the input data of the FFT, `x'` the resulting data from the iFFT, epsilon the machine epsilon and `n` the FFT size. |
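The residual described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's host code: the function name `residual_error` is hypothetical, the norm is assumed to be the maximum absolute difference (matching "maximum residual error"), and the logarithm is assumed to be base 2, in line with `LOG_FFT_SIZE`.

```python
import math

# Single-precision machine epsilon (2^-23 ~ 1.19209e-07), the value shown
# as "mach. eps" in the sample output.
FLOAT32_EPS = 2.0 ** -23


def residual_error(x, x_prime, eps=FLOAT32_EPS):
    """Maximum absolute difference between x and x', scaled by eps * log2(n).

    Assumptions (not confirmed by the README): max norm, base-2 logarithm.
    """
    if len(x) != len(x_prime) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    max_diff = max(abs(a - b) for a, b in zip(x, x_prime))
    return max_diff / (eps * math.log2(n))


# Constructed example: 8 complex samples, one of them perturbed by 3 * eps.
x = [complex(k, -k) for k in range(8)]
x_prime = list(x)
x_prime[3] += 3 * FLOAT32_EPS

print(residual_error(x, x_prime))  # 1.0 for this constructed input
```

With `n = 8` the scaling factor is `eps * 3`, so a single deviation of `3 * eps` yields a residual of exactly 1.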
| 110 | + |
| 111 | +In the second table, the measured execution times and the calculated GFLOPS are given. |
| 112 | +It gives the average and best values for both. |
| 113 | +In case of a batched execution (an execution with more than one iteration), the time is the averaged execution time for a single FFT. |
| 114 | +These times are also used to calculate the GFLOPS. |
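As a rough check of how time and GFLOPS relate, the sketch below assumes the conventional operation count of `5 * n * log2(n)` floating-point operations per FFT, as used by the HPC Challenge FFT benchmark. The README does not state this formula and the helper name `fft_gflops` is hypothetical, but with the default `LOG_FFT_SIZE` of 12 it reproduces the sample figures above.

```python
import math


def fft_gflops(log_fft_size, seconds_per_fft):
    """GFLOPS for one FFT of size 2^log_fft_size.

    Assumes 5 * n * log2(n) floating-point operations per FFT
    (HPC Challenge convention, not stated in this README).
    """
    n = 2 ** log_fft_size
    flop = 5.0 * n * math.log2(n)
    return flop / seconds_per_fft * 1.0e-9


# Times taken from the sample output; LOG_FFT_SIZE = 12 is the default.
print(f"best: {fft_gflops(12, 7.07241e-03):.5e}")
print(f"avg:  {fft_gflops(12, 7.56801e-03):.5e}")
```

For `n = 4096` this is 245760 operations per FFT, which at 7.07241e-03 s gives about 3.47e-02 GFLOPS, in agreement with the "best" column.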