Commit c6b07e5

Authored by FBumann, claude, and FabianHofmann
chore: benchmarks (#567)
* Add internal benchmark suite for performance tracking

  Adds a `benchmarks/` directory with pytest-benchmark for timing and pytest-memray for peak-memory measurement across problem sizes. Models: basic (dense N*N), knapsack (N binary vars), expression arithmetic (broadcasting/scaling), sparse network (ring topology), and pypsa_scigrid (real power system). Timing phases: build (test_build.py), LP write (test_lp_write.py), and matrix generation (test_matrices.py). Memory benchmarks (memory.py) measure the build phase only, because memray tracks all allocations within a test, including setup, so other phases would conflate build and phase-specific memory.

* Exclude benchmarks from codecov coverage reporting

  Benchmarks are not run in CI and should not affect coverage metrics.

* Allow 1% coverage threshold for codecov project check

  Prevents false failures from minor coverage fluctuations when adding non-library files like benchmarks or config changes.

* Revert codecov threshold change

  The codecov/project failure is a pre-existing repo-wide issue (multiple open PRs fail the same check), not caused by this PR.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Fabian Hofmann <fab.hof@gmx.de>
1 parent 472ecc9 commit c6b07e5

17 files changed (+688, -2 lines)

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -32,6 +32,10 @@ ENV/
 env.bak/
 venv.bak/

+# Benchmarks (new pytest-benchmark suite)
+.benchmarks/
+
+# Benchmarks (old Snakemake suite in benchmark/)
 benchmark/*.pdf
 benchmark/benchmarks
 benchmark/.snakemake
```

benchmarks/README.md

Lines changed: 94 additions & 0 deletions
# Internal Performance Benchmarks

Measures linopy's own performance (build time, LP write speed, memory usage) across problem sizes using [pytest-benchmark](https://pytest-benchmark.readthedocs.io/) and [pytest-memray](https://pytest-memray.readthedocs.io/). Use these to check whether a code change introduces a regression or improvement.

> **Note:** The `benchmark/` directory (singular) contains *external* benchmarks comparing linopy against other modeling frameworks. This directory (`benchmarks/`) is for *internal* performance tracking only.

## Setup

```bash
pip install -e ".[benchmarks]"
```

## Running benchmarks

```bash
# Quick smoke test (small sizes only)
pytest benchmarks/ --quick

# Full timing benchmarks
pytest benchmarks/test_build.py benchmarks/test_lp_write.py benchmarks/test_matrices.py

# Run a specific model
pytest benchmarks/test_build.py -k basic
```

## Comparing timing between branches

```bash
# Save baseline results on master
git checkout master
pytest benchmarks/test_build.py --benchmark-save=master

# Switch to feature branch and compare
git checkout my-feature
pytest benchmarks/test_build.py --benchmark-save=my-feature --benchmark-compare=0001_master

# Compare saved results without re-running
pytest-benchmark compare 0001_master 0002_my-feature --columns=median,iqr
```

Results are stored in `.benchmarks/` (gitignored).

## Memory benchmarks

`memory.py` runs each test in a separate process with pytest-memray to get accurate per-test peak memory (including C/numpy allocations). Results are saved as JSON and can be compared across branches.

By default, only the build phase (`test_build.py`) is measured. Unlike timing benchmarks where `benchmark()` isolates the measured function, memray tracks all allocations within a test — including model construction in setup. This means LP write and matrix tests would report build + phase memory combined, making the phase-specific contribution impossible to isolate. Since model construction dominates memory usage, measuring build alone gives the most actionable numbers.

```bash
# Save baseline on master
git checkout master
python benchmarks/memory.py save master

# Save feature branch
git checkout my-feature
python benchmarks/memory.py save my-feature

# Compare
python benchmarks/memory.py compare master my-feature

# Quick mode (smaller sizes, faster)
python benchmarks/memory.py save master --quick

# Measure a specific phase (includes build overhead)
python benchmarks/memory.py save master --test-path benchmarks/test_lp_write.py
```

Results are stored in `.benchmarks/memory/` (gitignored). Requires Linux or macOS (memray is not available on Windows).

> **Note:** Small tests (~5 MiB) are near the import-overhead floor and may show noise of ~1 MiB between runs. Focus on larger tests for meaningful memory comparisons. Do not combine `--memray` with timing benchmarks — memray adds ~2x overhead that invalidates timing results.

## Models

| Model | Description | Sizes |
|-------|-------------|-------|
| `basic` | Dense N*N model, 2*N^2 vars/cons | 10 — 1600 |
| `knapsack` | N binary variables, 1 constraint | 100 — 1M |
| `expression_arithmetic` | Broadcasting, scaling, summation across dims | 10 — 1000 |
| `sparse_network` | Ring network with mismatched bus/line coords | 10 — 1000 |
| `pypsa_scigrid` | Real power system (requires `pypsa`) | 10 — 200 snapshots |

## Phases

| Phase | File | What it measures |
|-------|------|------------------|
| Build | `test_build.py` | Model construction (add_variables, add_constraints, add_objective) |
| LP write | `test_lp_write.py` | Writing the model to an LP file |
| Matrices | `test_matrices.py` | Generating sparse matrices (A, b, c, bounds) from the model |

## Adding a new model

1. Create `benchmarks/models/my_model.py` with a `build_my_model(n)` function and a `SIZES` list
2. Add parametrized tests in the relevant `test_*.py` files
3. Add a quick threshold in `conftest.py`

benchmarks/__init__.py

Lines changed: 1 addition & 0 deletions
```python
"""Linopy benchmark suite — run with ``pytest benchmarks/`` (use ``--quick`` for smaller sizes)."""
```

benchmarks/conftest.py

Lines changed: 30 additions & 0 deletions
```python
"""Benchmark configuration and shared fixtures."""

from __future__ import annotations

import pytest

QUICK_THRESHOLD = {
    "basic": 100,
    "knapsack": 10_000,
    "pypsa_scigrid": 50,
    "expression_arithmetic": 100,
    "sparse_network": 100,
}


def pytest_addoption(parser):
    parser.addoption(
        "--quick",
        action="store_true",
        default=False,
        help="Use smaller problem sizes for quick benchmarking",
    )


def skip_if_quick(request, model: str, size: int):
    """Skip large sizes when --quick is passed."""
    if request.config.getoption("--quick"):
        threshold = QUICK_THRESHOLD.get(model, float("inf"))
        if size > threshold:
            pytest.skip(f"--quick: skipping {model} size {size}")
```
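The effect of `skip_if_quick` can be sketched standalone, without pytest (the helper `runs_under_quick` below is illustrative and not part of the suite; it mirrors the threshold rule: a case runs under `--quick` only when its size is at or below the model's threshold, and models without an entry are never skipped):

```python
# Illustrative stand-in for conftest.skip_if_quick: same thresholds, same rule,
# but returning a bool instead of calling pytest.skip.
QUICK_THRESHOLD = {
    "basic": 100,
    "knapsack": 10_000,
    "pypsa_scigrid": 50,
    "expression_arithmetic": 100,
    "sparse_network": 100,
}


def runs_under_quick(model: str, size: int) -> bool:
    """True if a (model, size) case would NOT be skipped under --quick."""
    # Unknown models get an infinite threshold, so they always run.
    return size <= QUICK_THRESHOLD.get(model, float("inf"))


# The basic model's SIZES list from benchmarks/models/basic.py:
sizes = [10, 50, 100, 250, 500, 1000, 1600]
kept = [n for n in sizes if runs_under_quick("basic", n)]
print(kept)  # [10, 50, 100]
```

So a `--quick` run of the `basic` model exercises only the three smallest sizes, which is what makes it usable as a fast smoke test.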

benchmarks/memory.py

Lines changed: 199 additions & 0 deletions
```python
#!/usr/bin/env python
"""
Measure and compare peak memory using pytest-memray.

Usage:
    # Save a baseline (on master)
    python benchmarks/memory.py save master

    # Save current branch
    python benchmarks/memory.py save my-feature

    # Compare two saved runs
    python benchmarks/memory.py compare master my-feature

    # Quick mode (smaller sizes)
    python benchmarks/memory.py save master --quick

Results are stored in .benchmarks/memory/.
"""

from __future__ import annotations

import argparse
import json
import platform
import re
import subprocess
import sys
from pathlib import Path

if platform.system() == "Windows":
    raise RuntimeError(
        "memory.py requires pytest-memray which is not available on Windows. "
        "Run memory benchmarks on Linux or macOS."
    )

RESULTS_DIR = Path(".benchmarks/memory")
MEMORY_RE = re.compile(
    r"Allocation results for (.+?) at the high watermark\s+"
    r"📦 Total memory allocated: ([\d.]+)(MiB|KiB|GiB|B)",
)
# Only the build phase is measured by default. Unlike timing benchmarks (where
# pytest-benchmark isolates the measured function), memray tracks all allocations
# within a test — including model construction in setup. This means LP write and
# matrix tests would report build + phase memory combined, making the phase-specific
# contribution hard to isolate. Since model construction dominates memory usage,
# measuring build alone gives the most accurate and actionable numbers.
DEFAULT_TEST_PATHS = [
    "benchmarks/test_build.py",
]


def _to_mib(value: float, unit: str) -> float:
    factors = {"B": 1 / 1048576, "KiB": 1 / 1024, "MiB": 1, "GiB": 1024}
    return value * factors[unit]


def _collect_test_ids(test_paths: list[str], quick: bool) -> list[str]:
    """Collect test IDs without running them."""
    cmd = [
        sys.executable,
        "-m",
        "pytest",
        *test_paths,
        "--collect-only",
        "-q",
    ]
    if quick:
        cmd.append("--quick")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if "::" in line and not line.startswith(("=", "-", " "))
    ]


def save(label: str, quick: bool = False, test_paths: list[str] | None = None) -> Path:
    """Run each benchmark in a separate process for accurate memory measurement."""
    if test_paths is None:
        test_paths = DEFAULT_TEST_PATHS
    test_ids = _collect_test_ids(test_paths, quick)
    if not test_ids:
        print("No tests collected.", file=sys.stderr)
        sys.exit(1)

    print(f"Running {len(test_ids)} tests (each in a separate process)...")
    entries = {}
    for i, test_id in enumerate(test_ids, 1):
        short = test_id.split("::")[-1]
        print(f"  [{i}/{len(test_ids)}] {short}...", end=" ", flush=True)

        cmd = [
            sys.executable,
            "-m",
            "pytest",
            test_id,
            "--memray",
            "--benchmark-disable",
            "-v",
            "--tb=short",
            "-q",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr

        match = MEMORY_RE.search(output)
        if match:
            value = float(match.group(2))
            unit = match.group(3)
            mib = round(_to_mib(value, unit), 3)
            entries[test_id] = mib
            print(f"{mib:.1f} MiB")
        elif "SKIPPED" in output or "skipped" in output:
            print("skipped")
        else:
            print(
                "WARNING: no memray data (pytest-memray output format may have changed)",
                file=sys.stderr,
            )

    if not entries:
        print("No memray results found. Is pytest-memray installed?", file=sys.stderr)
        sys.exit(1)

    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    out_path = RESULTS_DIR / f"{label}.json"
    out_path.write_text(json.dumps({"label": label, "peak_mib": entries}, indent=2))
    print(f"\nSaved {len(entries)} results to {out_path}")
    return out_path


def compare(label_a: str, label_b: str) -> None:
    """Compare two saved memory results."""
    path_a = RESULTS_DIR / f"{label_a}.json"
    path_b = RESULTS_DIR / f"{label_b}.json"
    for p in (path_a, path_b):
        if not p.exists():
            print(f"Not found: {p}. Run 'save {p.stem}' first.", file=sys.stderr)
            sys.exit(1)

    data_a = json.loads(path_a.read_text())["peak_mib"]
    data_b = json.loads(path_b.read_text())["peak_mib"]

    all_tests = sorted(set(data_a) | set(data_b))

    print(f"\n{'Test':<60} {label_a:>10} {label_b:>10} {'Change':>10}")
    print("-" * 94)

    for test in all_tests:
        a = data_a.get(test)
        b = data_b.get(test)
        a_str = f"{a:.1f}" if a is not None else "—"
        b_str = f"{b:.1f}" if b is not None else "—"
        if a is not None and b is not None and a > 0:
            pct = (b - a) / a * 100
            change = f"{pct:+.1f}%"
        else:
            change = "—"
        # Shorten test name for readability
        short = test.split("::")[-1] if "::" in test else test
        print(f"{short:<60} {a_str:>10} {b_str:>10} {change:>10}")

    print()


def main():
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    sub = parser.add_subparsers(dest="cmd", required=True)

    p_save = sub.add_parser("save", help="Run benchmarks and save memory results")
    p_save.add_argument("label", help="Label for this run (e.g. 'master', 'my-feature')")
    p_save.add_argument("--quick", action="store_true", help="Use smaller problem sizes")
    p_save.add_argument(
        "--test-path",
        nargs="+",
        default=None,
        help="Test file(s) to run (default: build phase only)",
    )

    p_cmp = sub.add_parser("compare", help="Compare two saved runs")
    p_cmp.add_argument("label_a", help="First run label (baseline)")
    p_cmp.add_argument("label_b", help="Second run label")

    args = parser.parse_args()
    if args.cmd == "save":
        save(args.label, quick=args.quick, test_paths=args.test_path)
    elif args.cmd == "compare":
        compare(args.label_a, args.label_b)


if __name__ == "__main__":
    main()
```
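The parsing path in `save()` can be exercised in isolation. The sketch below reuses the same regex and unit conversion as memory.py; note that the sample text is constructed here to match the pattern, not captured from a real pytest-memray run (whose surrounding report layout may differ):

```python
import re

# Same pattern and unit-conversion logic as benchmarks/memory.py.
MEMORY_RE = re.compile(
    r"Allocation results for (.+?) at the high watermark\s+"
    r"📦 Total memory allocated: ([\d.]+)(MiB|KiB|GiB|B)",
)


def to_mib(value: float, unit: str) -> float:
    """Normalize a memray-reported amount to MiB."""
    factors = {"B": 1 / 1048576, "KiB": 1 / 1024, "MiB": 1, "GiB": 1024}
    return value * factors[unit]


# Hypothetical report fragment, shaped to satisfy MEMORY_RE.
sample = (
    "Allocation results for test_build[basic-100] at the high watermark\n"
    "📦 Total memory allocated: 512.0KiB"
)
m = MEMORY_RE.search(sample)
assert m is not None
print(m.group(1))                             # test_build[basic-100]
print(to_mib(float(m.group(2)), m.group(3)))  # 0.5
```

Group 1 is the test ID, groups 2 and 3 the amount and unit, which is exactly how `save()` builds its `peak_mib` entries.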

benchmarks/models/__init__.py

Lines changed: 21 additions & 0 deletions
```python
"""Model builders for benchmarks."""

from benchmarks.models.basic import SIZES as BASIC_SIZES
from benchmarks.models.basic import build_basic
from benchmarks.models.expression_arithmetic import SIZES as EXPR_SIZES
from benchmarks.models.expression_arithmetic import build_expression_arithmetic
from benchmarks.models.knapsack import SIZES as KNAPSACK_SIZES
from benchmarks.models.knapsack import build_knapsack
from benchmarks.models.sparse_network import SIZES as SPARSE_SIZES
from benchmarks.models.sparse_network import build_sparse_network

__all__ = [
    "BASIC_SIZES",
    "EXPR_SIZES",
    "KNAPSACK_SIZES",
    "SPARSE_SIZES",
    "build_basic",
    "build_expression_arithmetic",
    "build_knapsack",
    "build_sparse_network",
]
```

benchmarks/models/basic.py

Lines changed: 18 additions & 0 deletions
```python
"""Basic benchmark model: 2*N^2 variables and constraints."""

from __future__ import annotations

import linopy

SIZES = [10, 50, 100, 250, 500, 1000, 1600]


def build_basic(n: int) -> linopy.Model:
    """Build a basic N*N model with 2*N^2 vars and 2*N^2 constraints."""
    m = linopy.Model()
    x = m.add_variables(coords=[range(n), range(n)], dims=["i", "j"], name="x")
    y = m.add_variables(coords=[range(n), range(n)], dims=["i", "j"], name="y")
    m.add_constraints(x + y <= 10, name="upper")
    m.add_constraints(x - y >= -5, name="lower")
    m.add_objective(x.sum() + 2 * y.sum())
    return m