Commit 84331d6

ASV benchmarks integration from a non-protected branch
1 parent 416e6f3 commit 84331d6

9 files changed

Lines changed: 1158 additions & 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -9,3 +9,7 @@ mkl_fft/_pydfti.c
 mkl_fft/_pydfti.cpython*.so
 mkl_fft/_pydfti.*-win_amd64.pyd
 mkl_fft/src/mklfft.c
+
+# ASV benchmark artifacts
+.asv/
+benchmarks/.asv/

benchmarks/README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# mkl_fft ASV Benchmarks

Performance benchmarks for [mkl_fft](https://github.com/IntelPython/mkl_fft) using
[Airspeed Velocity (ASV)](https://asv.readthedocs.io/en/stable/).

## Structure

```
benchmarks/
├── asv.conf.json           # ASV configuration (CI-only, no env/build settings)
└── benchmarks/
    ├── __init__.py         # Thread pinning (MKL_NUM_THREADS)
    ├── bench_fft1d.py      # mkl_fft root API: 1-D transforms
    ├── bench_fftnd.py      # mkl_fft root API: 2-D and N-D transforms
    ├── bench_numpy_fft.py  # mkl_fft.interfaces.numpy_fft: full coverage
    ├── bench_scipy_fft.py  # mkl_fft.interfaces.scipy_fft: full coverage
    └── bench_memory.py     # Peak RSS memory benchmarks
```

### Coverage

| File | API | Transforms |
|------|-----|-----------|
| `bench_fft1d.py` | `mkl_fft` | `fft`, `ifft`, `rfft`, `irfft` (power-of-two and non-power-of-two sizes) |
| `bench_fftnd.py` | `mkl_fft` | `fft2`, `ifft2`, `rfft2`, `irfft2`, `fftn`, `ifftn`, `rfftn`, `irfftn` |
| `bench_numpy_fft.py` | `mkl_fft.interfaces.numpy_fft` | All exported functions including Hermitian (`hfft`, `ihfft`) |
| `bench_scipy_fft.py` | `mkl_fft.interfaces.scipy_fft` | All exported functions including Hermitian 2-D/N-D (`hfft2`, `hfftn`) |
| `bench_memory.py` | `mkl_fft` | Peak RSS for 1-D, 2-D, and 3-D transforms |

Benchmarks cover float32, float64, complex64, complex128 dtypes, power-of-two
and non-power-of-two sizes, square and non-square/non-cubic shapes.
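
As a rough illustration of how this coverage adds up: ASV runs a parameterized benchmark once per combination of the entries in its `params` lists. The sketch below mimics that expansion with `itertools.product` for the `TimeFFT1D` grid in `bench_fft1d.py`. The helper name `expand_params` is made up for this example and is not part of ASV.

```python
from itertools import product

# Parameter grid copied from TimeFFT1D in bench_fft1d.py.
PARAMS = [
    [64, 256, 1024, 4096, 16384, 65536],
    ["float32", "float64", "complex64", "complex128"],
]
PARAM_NAMES = ["n", "dtype"]


def expand_params(params, names):
    """Hypothetical helper: list every parameter combination as a dict."""
    return [dict(zip(names, combo)) for combo in product(*params)]


combos = expand_params(PARAMS, PARAM_NAMES)
print(len(combos))  # number of (n, dtype) cases per timing method
print(combos[0])    # first combination in the grid
```

Each `time_*` method of the class is timed once for every entry in `combos`, so adding a size or dtype multiplies the run time of the whole class.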

## Threading

`__init__.py` pins `MKL_NUM_THREADS` to **4** when the machine has 4 or more
physical cores, and falls back to **1** (single-threaded) otherwise. This keeps
results comparable across CI machines in the shared pool regardless of their
total core count. Physical cores are read from `/proc/cpuinfo`; hyperthreads
are excluded per MKL recommendation. If `/proc/cpuinfo` cannot be read or
parsed (for example on non-Linux hosts), the count falls back to
`os.cpu_count()`, which reports logical CPUs. `OMP_NUM_THREADS` and
`OPENBLAS_NUM_THREADS` are set to the same value unless they are already defined.

Override by setting `MKL_NUM_THREADS` in the environment before running ASV.
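
The selection rule and its override can be sketched as a pure function. `choose_thread_count` and its arguments are hypothetical names used only for this illustration; the actual logic lives in `benchmarks/benchmarks/__init__.py` and reads `os.environ` directly.

```python
def choose_thread_count(physical_cores, env, min_threads=4):
    """Return the MKL_NUM_THREADS value as a string.

    An existing MKL_NUM_THREADS entry in *env* always wins; otherwise use
    *min_threads* when enough physical cores exist, else a single thread.
    """
    if "MKL_NUM_THREADS" in env:
        return env["MKL_NUM_THREADS"]
    return str(min_threads) if physical_cores >= min_threads else "1"


print(choose_thread_count(16, {}))                       # large machine, no override
print(choose_thread_count(2, {}))                        # small machine, no override
print(choose_thread_count(2, {"MKL_NUM_THREADS": "8"}))  # explicit override
```

Note that an override is taken as-is, even when it exceeds the number of physical cores.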

## Running Locally

> Benchmarks are designed for CI. Local runs require `mkl_fft` to be installed
> in the active Python environment. Benchmarks that exercise the SciPy interface
> (`bench_scipy_fft.py`) also require SciPy:
>
> ```bash
> python -m pip install -e ..
> python -m pip install scipy
> ```

```bash
cd benchmarks/

# Quick smoke-run against the current working tree (no env management)
asv run --python=same --quick --show-stderr HEAD^!

# Run a specific benchmark file
asv run --python=same --quick --bench bench_fft1d HEAD^!

# View and publish results
asv publish   # generates .asv/html/
asv preview   # serves at http://localhost:8080
```

## CI

Benchmarks run automatically in Jenkins on the `auto-bench` node via
`benchmarkHelper.performanceTest()` from the shared library. The pipeline uses:

```bash
asv run --environment existing:<python> --set-commit-hash $COMMIT_SHA
```

This bypasses ASV environment management entirely: mkl_fft is pre-installed
into a conda environment by the pipeline before ASV is invoked.

- **Nightly (prod):** results are published to the benchmark dashboard
- **PR (dev):** `asv compare` output is evaluated for regressions; a 30% slowdown
  triggers a failed GitHub commit status
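
The 30% rule can be pictured as a ratio test on two timings. This is only a sketch under assumptions: the real check is performed by the Jenkins shared library, whose code is not shown here, and `is_regression` and its arguments are hypothetical names.

```python
def is_regression(baseline_s, candidate_s, threshold=0.3):
    """Return True when *candidate_s* is slower than *baseline_s* by more
    than *threshold* (0.3 means 30%). Both timings are in seconds."""
    if baseline_s <= 0:
        raise ValueError("baseline timing must be positive")
    return (candidate_s - baseline_s) / baseline_s > threshold


print(is_regression(1.0, 1.25))  # 25% slower than baseline
print(is_regression(1.0, 1.40))  # 40% slower than baseline
```

`asv compare` expresses the same idea through its `--factor` option, which is a ratio rather than a fraction, so a 30% threshold corresponds to a factor of 1.3.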

Results are stored in the `mkl_fft-results` branch of
`intel-innersource/libraries.python.intel.infrastructure.benchmark-dashboards`.

benchmarks/asv.conf.json

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
{
    "version": 1,
    "project": "mkl_fft",
    "project_url": "https://github.com/IntelPython/mkl_fft",
    "show_commit_url": "https://github.com/IntelPython/mkl_fft/commit/",
    "repo": "..",
    "branches": [
        "master"
    ],
    "benchmark_dir": "benchmarks",
    "env_dir": ".asv/env",
    "results_dir": ".asv/results",
    "html_dir": ".asv/html",
    "build_cache_size": 2,
    "default_benchmark_timeout": 500,
    "regressions_thresholds": {
        ".*": 0.3
    }
}
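
To make the `regressions_thresholds` entry concrete: its keys are regular expressions matched against benchmark names and its values are relative-change thresholds, so `".*": 0.3` applies a 30% threshold to every benchmark. The sketch below looks up the threshold for a given name. `threshold_for` is a hypothetical helper, and taking the maximum over several matching patterns is an assumption of this sketch, not something the configuration itself states.

```python
import json
import re

# Fragment of benchmarks/asv.conf.json relevant to regression detection.
CONFIG_TEXT = """
{
    "regressions_thresholds": {
        ".*": 0.3
    }
}
"""


def threshold_for(benchmark_name, thresholds):
    """Hypothetical helper: return the largest threshold whose regex key
    matches *benchmark_name*, or None when nothing matches."""
    matches = [
        value
        for pattern, value in thresholds.items()
        if re.match(pattern, benchmark_name)
    ]
    return max(matches) if matches else None


config = json.loads(CONFIG_TEXT)
print(threshold_for("bench_fft1d.TimeFFT1D.time_fft", config["regressions_thresholds"]))
```

A noisier suite such as the memory benchmarks could be given its own, looser pattern alongside the catch-all entry.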

benchmarks/benchmarks/__init__.py

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
"""ASV benchmarks for mkl_fft.

Thread control: design rationale
--------------------------------
Since we do not have a dedicated CI benchmark machine, benchmarks run on a shared CI pool
whose machines vary in core count over time.
Using the full physical core count of each machine would make results
incomparable across runs on different machines.

Strategy:
  - Physical cores >= 4  →  fix MKL_NUM_THREADS = 4
      4 is the lowest common denominator that guarantees multi-threaded MKL
      behavior and is achievable on any modern CI machine. Results from
      different machines in the pool are therefore directly comparable.
  - Physical cores < 4   →  fall back to MKL_NUM_THREADS = 1 (single-threaded)
      Prevents over-subscription on under-resourced machines and avoids
      misleading comparisons against 4-thread baselines.

MKL recommendation: use physical cores, not logical (hyperthreaded) CPUs.
"""

import os
import re

_MIN_THREADS = 4  # minimum physical cores required for multi-threaded mode


def _physical_cores():
    """Return physical core count from /proc/cpuinfo; fall back to os.cpu_count()."""
    try:
        with open("/proc/cpuinfo") as f:
            content = f.read()
        # "cpu cores" is the per-socket core count; "physical id" identifies a socket.
        cpu_cores = int(re.search(r"cpu cores\s*:\s*(\d+)", content).group(1))
        sockets = max(
            len(set(re.findall(r"physical id\s*:\s*(\d+)", content))), 1
        )
        return cpu_cores * sockets
    except Exception:
        # No readable /proc/cpuinfo (e.g. non-Linux) or missing fields.
        # Note: os.cpu_count() reports logical CPUs, not physical cores.
        return os.cpu_count() or 1


def _thread_count():
    physical = _physical_cores()
    return str(_MIN_THREADS) if physical >= _MIN_THREADS else "1"


# An explicit MKL_NUM_THREADS in the environment takes precedence.
_THREADS = os.environ.get("MKL_NUM_THREADS", _thread_count())
os.environ["MKL_NUM_THREADS"] = _THREADS
# Keep other threading runtimes consistent unless they are already set.
os.environ.setdefault("OMP_NUM_THREADS", _THREADS)
os.environ.setdefault("OPENBLAS_NUM_THREADS", _THREADS)
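
The `/proc/cpuinfo` parsing can be exercised without a Linux host by feeding it text directly. The sketch below applies the same two regular expressions to a made-up excerpt for a two-socket machine with hyperthreading. `count_physical_cores` and `SAMPLE` are hypothetical and exist only for this illustration.

```python
import re

# Made-up /proc/cpuinfo excerpt: 2 sockets, 4 cores each, 2 threads per core.
# Only a few logical CPUs are listed to keep the sample short.
SAMPLE = """\
processor\t: 0
physical id\t: 0
siblings\t: 8
cpu cores\t: 4
processor\t: 1
physical id\t: 0
siblings\t: 8
cpu cores\t: 4
processor\t: 8
physical id\t: 1
siblings\t: 8
cpu cores\t: 4
"""


def count_physical_cores(cpuinfo_text):
    """Return cores-per-socket times the number of distinct sockets."""
    match = re.search(r"cpu cores\s*:\s*(\d+)", cpuinfo_text)
    if match is None:
        raise ValueError("no 'cpu cores' field found")
    sockets = len(set(re.findall(r"physical id\s*:\s*(\d+)", cpuinfo_text)))
    return int(match.group(1)) * max(sockets, 1)


print(count_physical_cores(SAMPLE))
```

The `siblings` field (logical CPUs per socket) is deliberately ignored, which is what keeps hyperthreads out of the count.
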
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
"""Benchmarks for 1-D FFT operations using the mkl_fft root API."""

import numpy as np

import mkl_fft

_RNG_SEED = 42


def _make_input(rng, n, dtype):
    """Return a 1-D array of length *n* with the given *dtype*.

    Complex dtypes are populated with non-zero imaginary parts so the
    benchmark exercises a genuine complex transform path.
    """
    dt = np.dtype(dtype)
    if dt.kind == "c":
        return (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(dt)
    return rng.standard_normal(n).astype(dt)


# ---------------------------------------------------------------------------
# Complex-to-complex 1-D (power-of-two sizes)
# ---------------------------------------------------------------------------


class TimeFFT1D:
    """Forward and inverse complex FFT, power-of-two sizes."""

    params = [
        [64, 256, 1024, 4096, 16384, 65536],
        ["float32", "float64", "complex64", "complex128"],
    ]
    param_names = ["n", "dtype"]

    def setup(self, n, dtype):
        rng = np.random.default_rng(_RNG_SEED)
        self.x = _make_input(rng, n, dtype)

    def time_fft(self, n, dtype):
        mkl_fft.fft(self.x)

    def time_ifft(self, n, dtype):
        mkl_fft.ifft(self.x)


# ---------------------------------------------------------------------------
# Real-to-complex / complex-to-real 1-D (power-of-two sizes)
# ---------------------------------------------------------------------------


class TimeRFFT1D:
    """Forward rfft and inverse irfft, power-of-two sizes."""

    params = [
        [64, 256, 1024, 4096, 16384, 65536],
        ["float32", "float64"],
    ]
    param_names = ["n", "dtype"]

    def setup(self, n, dtype):
        rng = np.random.default_rng(_RNG_SEED)
        cdtype = "complex64" if dtype == "float32" else "complex128"
        self.x_real = rng.standard_normal(n).astype(dtype)
        # irfft input: complex half-spectrum of length n//2+1
        self.x_complex = (
            rng.standard_normal(n // 2 + 1)
            + 1j * rng.standard_normal(n // 2 + 1)
        ).astype(cdtype)

    def time_rfft(self, n, dtype):
        mkl_fft.rfft(self.x_real)

    def time_irfft(self, n, dtype):
        mkl_fft.irfft(self.x_complex, n=n)


# ---------------------------------------------------------------------------
# Complex-to-complex 1-D (non-power-of-two sizes)
# ---------------------------------------------------------------------------


class TimeFFT1DNonPow2:
    """Forward and inverse complex FFT, non-power-of-two sizes.

    MKL uses a different code path for non-power-of-two transforms;
    this suite catches regressions in that path.
    """

    params = [
        [127, 509, 1000, 4001, 10007],
        ["float64", "complex128", "complex64"],
    ]
    param_names = ["n", "dtype"]

    def setup(self, n, dtype):
        rng = np.random.default_rng(_RNG_SEED)
        self.x = _make_input(rng, n, dtype)

    def time_fft(self, n, dtype):
        mkl_fft.fft(self.x)

    def time_ifft(self, n, dtype):
        mkl_fft.ifft(self.x)


# ---------------------------------------------------------------------------
# Real-to-complex / complex-to-real 1-D (non-power-of-two sizes)
# ---------------------------------------------------------------------------


class TimeRFFT1DNonPow2:
    """Forward rfft and inverse irfft, non-power-of-two sizes."""

    params = [
        [127, 509, 1000, 4001, 10007],
        ["float32", "float64"],
    ]
    param_names = ["n", "dtype"]

    def setup(self, n, dtype):
        rng = np.random.default_rng(_RNG_SEED)
        cdtype = "complex64" if dtype == "float32" else "complex128"
        self.x_real = rng.standard_normal(n).astype(dtype)
        self.x_complex = (
            rng.standard_normal(n // 2 + 1)
            + 1j * rng.standard_normal(n // 2 + 1)
        ).astype(cdtype)

    def time_rfft(self, n, dtype):
        mkl_fft.rfft(self.x_real)

    def time_irfft(self, n, dtype):
        mkl_fft.irfft(self.x_complex, n=n)
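
A note on why the `irfft` benchmarks pass `n=n` explicitly: a half-spectrum of length `n // 2 + 1` does not determine the output length on its own, because an even length and the next odd length share the same half-spectrum size. The sketch below shows this with `numpy.fft` as a stand-in, on the assumption that `mkl_fft.rfft` and `mkl_fft.irfft` follow the same length conventions. The helper `lengths` is hypothetical.

```python
import numpy as np


def lengths(n, seed=42):
    """Return (half-spectrum length, default irfft length, irfft length with n=n)."""
    x = np.random.default_rng(seed).standard_normal(n)
    spectrum = np.fft.rfft(x)  # n // 2 + 1 complex values
    default_len = np.fft.irfft(spectrum).shape[0]        # defaults to 2 * (m - 1)
    explicit_len = np.fft.irfft(spectrum, n=n).shape[0]  # caller states the length
    return spectrum.shape[0], default_len, explicit_len


for n in (126, 127):
    print(n, lengths(n))
```

Without `n=n`, the odd sizes in the non-power-of-two suite (127, 509, 4001, 10007) would be timed as inverse transforms of length `n - 1` instead of `n`.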
