README.md: 3 additions & 3 deletions
@@ -74,10 +74,10 @@ I think we can get some interesting insights from this data:
2. Numpy is really great when fully exploited, but it does _not_ solve all of our problems: While it's true that Python code written with Numpy can be extremely fast as long as vectorized math is used, there are simply some algorithms that cannot be vectorized. The Gauß-Seidel method used in `partdiff` is an excellent example of an algorithm that can _not_ be vectorized[^1]. To make Gauß-Seidel as fast as the C version, we need to use some fancier (but also more complex) tricks (see below). But who knows, maybe the Numpy developers will introduce an iterator supporting stencil calculations somehow in the future. And of course, I have to admit that when Numpy vectorized math _does_ work, the process is relatively pain-free since we're still just writing relatively simple Python code.
3. With numba, we can relatively easily get performance that is in the same ballpark as the C version. For `partdiff`, the only things it had problems with were the `dataclass` arguments and the `time()` method. As long as your algorithm only uses simple types (including enums), using numba is relatively easy. Therefore, **`numba` definitely has the highest ratio of performance gain per hour wasted**.
4. With Cython, we can get even faster than with numba, reaching performance nearly identical to C. This comes at the cost of being relatively annoying to write:
-
- All the `cdef` directives add a lot of clutter and you need to `cdef` _everything_
+
- All the `cdef` directives add a lot of clutter and you need to `cdef` _everything_.
- If you don't know what you're doing, Cython can just fall back to slow Python if it doesn't know how to handle something. This can be _very_ annoying.
- Tooling is a bit cumbersome. While `uv` makes the compilation process quite easy (if you know how), there is no formatter for Cython yet (but ruff might support it in the future [[1]](https://github.com/astral-sh/ruff/issues/10250)).
While I would argue that writing modern C is _less_ annoying than writing Cython, I still think it's great that we can have these little islands of native code in our Python applications and still get the nice bits of Python for the non-sensitive stuff. So that might ultimately be worth it.
-
5. Nuitka is a tool that might make deployment of Python applications slightly easier, but the claim that is boosts your performance is relatively hollow. Setting up a buggy, oversensitive and errorprone toolchain and waiting several minutes (for a fully optimized build) per build for a 2–5% performance boost is _not_ worth it if you ask me.
+
5. Nuitka is a tool that might make deployment of Python applications slightly easier, but the claim that it boosts your performance is relatively hollow. Setting up a buggy, oversensitive and error-prone toolchain and waiting several minutes (for a fully optimized build) per build for a 2–5% performance boost is _not_ worth it if you ask me.
-
[^1]: In general, it is not possible to parallelize the Gauß-Seidel without some form of synchronization if bitwise accuracy is needed. MPI can be used to parallelize Gauß-Seidel efficiently which works well for large problem sizes.
+
[^1]: In general, it is not possible to parallelize the Gauß-Seidel method without some form of synchronization while still retaining bitwise reproducibility. MPI can be used to parallelize Gauß-Seidel efficiently, which works well for large problem sizes.
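
To illustrate why Gauß-Seidel resists vectorization, here is a minimal sketch that compares it with a Jacobi sweep. It is not taken from `partdiff`: the function names (`make_grid`, `jacobi_sweep`, `gauss_seidel_sweep`), the 5x5 grid, and the plain four-point averaging stencil are assumptions made for this example. Jacobi reads only the old grid, so the whole sweep can be written as one slice expression. Gauß-Seidel reads values that were already updated earlier in the same sweep, so each point depends on the previous one and a naive slice expression would compute something else.

```python
import numpy as np


def make_grid(n):
    """Hypothetical test grid: top boundary row is 1.0, everything else 0.0."""
    grid = np.zeros((n, n))
    grid[0, :] = 1.0
    return grid


def jacobi_sweep(u):
    """One Jacobi sweep: all interior points are computed from the OLD grid,
    so the update can be expressed as a single vectorized slice expression."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new


def gauss_seidel_sweep(u):
    """One Gauß-Seidel sweep on a copy of u: each point reads the upper and
    left neighbors that were already updated in this same sweep, which
    creates a loop-carried dependency."""
    u = u.copy()
    rows, cols = u.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            u[i, j] = 0.25 * (
                u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1]
            )
    return u


if __name__ == "__main__":
    grid = make_grid(5)
    # The two sweeps produce different grids after a single iteration.
    print(jacobi_sweep(grid)[1, 1:-1])
    print(gauss_seidel_sweep(grid)[1, 1:-1])
```

In the first interior row, Jacobi yields the same value for every point, while the Gauß-Seidel values grow from left to right because each point already sees its updated left neighbor. That dependency is what the explicit loop preserves and what a single slice expression cannot express.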