Commit d5dc428

Add a benchmark where the JIT performs well
1 parent 08f2682 commit d5dc428

1 file changed

Lines changed: 33 additions & 3 deletions

posts/jit-reflections.md

@@ -75,13 +75,14 @@ Calling a spade a spade: CPython 3.13's JIT is slow. It hurts me to say this con
7575
I work on it, but I don't want to sugarcoat my words here.
7676

7777
The argument at the time was that it was a new feature and we needed to lay the foundations
78-
and test the waters. You might think that surely, CPython 3.14's JIT is a lot faster right? Nope.
78+
and test the waters. You might think that surely, CPython 3.14's JIT is a lot faster, right?
79+
In some ways, the JIT has become faster, but only in select scenarios.
7980
The answer is again... complicated. When using a modern compiler like Clang 20
8081
to build CPython 3.14, I often found the interpreter outperforms the JIT. The JIT only really starts reaching
8182
parity or outperforming the interpreter if we use an old compiler like GCC 11 to build the interpreter.
8283
However, IMO that's not entirely fair to the interpreter, as we're purposely limiting it by using a compiler
8384
we _know_ is worse for it. You can see this effect very clearly on Thomas Wouters's analysis
84-
[here](https://github.com/Yhg1s/python-benchmarking-public).
85+
[here](https://github.com/Yhg1s/python-benchmarking-public). Note that this is the geometric mean, so there are select workloads where the JIT does show a real speedup!
8586

8687
![Performance of JIT Compiler across different compilers, Credit Thomas Wouters](jit-reflections-perf.png)
8788
(Image credits to Thomas Wouters). Anything below 1.00x on the graph is a slowdown.
@@ -92,6 +93,33 @@ by random performance bugs on the side (which has happened many times now).
9293
**Note: this result only applies to our x64 benchmarks.**
9394
**I cannot conclude anything about AArch64, which has been improving over time.**
9495

96+
In some cases, we do see significant speedups (up to ~20%) in certain
97+
benchmarks, indicating that some progress has been made on 3.14, which is a
98+
good thing! What we're tackling is that the performance
99+
is a mixed bag and often not very predictable. In the
100+
[richards](https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_richards/run_benchmark.py) benchmark, we see a ~20% speedup,
101+
but on the
102+
[nbody](https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_nbody/run_benchmark.py)
103+
benchmark, we see a ~10% slowdown on my system, and a smaller slowdown for
104+
the
105+
[spectralnorm](https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_spectral_norm/run_benchmark.py) benchmark.
106+
All of these are known
107+
to be loop-heavy artificial benchmarks of the kind V8 has since
108+
[ditched](https://v8.dev/blog/real-world-performance), so in theory they all
109+
should see a speedup. They don't, which is strange.
110+
111+
```
112+
3.14 JIT Off:
113+
richards: Mean +- std dev: 44.5 ms +- 0.5 ms
114+
nbody: Mean +- std dev: 91.8 ms +- 3.5 ms
115+
spectral_norm: Mean +- std dev: 90.6 ms +- 0.7 ms
116+
117+
3.14 JIT On:
118+
richards: Mean +- std dev: 37.8 ms +- 2.4 ms
119+
nbody: Mean +- std dev: 104 ms +- 2 ms
120+
spectral_norm: Mean +- std dev: 96.0 ms +- 0.7 ms
121+
```
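To make the "mixed bag" concrete, here is a minimal sketch that turns the mean timings above into per-benchmark ratios and a geometric mean. The dictionary and function names are hypothetical, and this is only a rough illustration over three benchmarks, not how pyperformance or the linked analysis computes its figures.

```python
from statistics import geometric_mean

# Mean timings in milliseconds, copied from the results above.
jit_off = {"richards": 44.5, "nbody": 91.8, "spectral_norm": 90.6}
jit_on = {"richards": 37.8, "nbody": 104.0, "spectral_norm": 96.0}


def speedups(baseline, candidate):
    """Return baseline/candidate time ratios; above 1.0 means candidate is faster."""
    return {name: baseline[name] / candidate[name] for name in baseline}


# Hypothetical helper names; only the numbers come from the benchmark run.
ratios = speedups(jit_off, jit_on)
overall = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean: {overall:.2f}x")
```

On these three numbers, richards comes out at roughly 1.18x while nbody and spectral_norm land below 1.00x, and the geometric mean ends up just under 1.00x. That is the same pattern as the graph: a real win on one workload can be hidden by the aggregate.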
122+
95123
You might ask: why is the 3.14 JIT not much faster? The real answer, which
96124
again hurts me to say, is that the 3.14 JIT has almost no major _optimizer_*
97125
features over 3.13. In 3.14, we were mostly expanding the existing types
@@ -163,9 +191,11 @@ certain major features to enter the CPython JIT in 3.14, but missed them due
163191
to my own lack of time. So I'm not blaming anyone here other than
164192
myself.
165193
166-
Lastly, the (lack-of) performance gains for the JIT are for architectures that
194+
The (lack-of) performance gains for the JIT are for architectures that
167195
I observed (mostly a range of x64 processors). It is possible that some
168196
architectures have real gains that I'm not aware of.
169197
198+
I also added some benchmarks run on my system, where I show a speedup in some
199+
workloads, but a slowdown in others.
170200
171201
