
Commit ded05c1

Update documentation and perform benchmarking
1 parent c60d480 commit ded05c1

6 files changed

Lines changed: 109 additions & 5 deletions

File tree

README.md

Lines changed: 3 additions & 0 deletions
@@ -113,3 +113,6 @@ An exhaustive list of features supported by this tool
113113

114114
### [Advanced](docs/advanced.md)
115115
An architectural explanation of how this tool works, internally.
116+
117+
### [Optimizations](docs/optimization.md)
118+
A list of the optimizations used in this project's development.

docs/advanced.md

Lines changed: 10 additions & 1 deletion
@@ -95,9 +95,18 @@ $$
9595
9696
Where $|l(z)| = \frac{2}{\pi}\arctan(|z|)$
9797
98+
Currently, we also apply a gamma factor, $k$, which slows the lightness ramp from black to white; it was added at a user's request. The new definition is simply
99+
100+
$|l(z)| = \frac{2}{\pi}\arctan(|z|^k)$
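As a rough illustration, the formula above can be written as a small C++ function. This is only a sketch: the name `lightness` and the use of `std::complex<double>` are assumptions made for the example, not the project's shader code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Lightness in [0, 1) following |l(z)| = (2/pi) * atan(|z|^k).
// `k` is the gamma factor; k = 1 recovers the original definition.
double lightness(std::complex<double> z, double k) {
    assert(k > 0.0);  // assumption: only positive gamma values are meaningful
    const double pi = std::acos(-1.0);
    return (2.0 / pi) * std::atan(std::pow(std::abs(z), k));
}
```

For any $k$, the value at $|z| = 1$ stays at 0.5. A smaller $k$ pulls values on both sides of $|z| = 1$ toward 0.5, which spreads the transition from black to white over a wider range of $|z|$.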
101+
98102
## 3D mode
99103
3D mode is rendered by making an NxN grid mesh, handled by the `Mesh` struct and `create_grid_mesh` function, and shaping it around the function's "real" shape (picking points in set spaces and placing vertices there).
100104
Height maps directly to |f(z)|, so it carries no information beyond what the coloring already shows. Movement is done through a custom `Camera` struct.
101105
102106
## Picker
103-
Whenever you hover over a value and get a number back, this is done through a "picker" shader. It's a 1x1 grid rendered off-screen that calculates the value of f(z) precisely where you are hovering. The result is then shared as an `out vec4`, the first two floats being `z`, and second two, `f(z)`.
107+
Whenever you hover over a value and get a number back, this is done through a "picker" shader. It's a 1x1 grid rendered off-screen that calculates the value of f(z) precisely where you are hovering. The result is then shared as an `out vec4`, the first two floats being `z`, and second two, `f(z)`.
108+
109+
## Ultra-High Precision Mode
110+
Ultra-High Precision Mode, or Arbitrary Precision Mode, works by running this program's shaders on the CPU. During the build step, `math_transpiler.py` transpiles them from GLSL to C++ and builds a map from each GLSL function definition to its C++ counterpart.
111+
At runtime, the function stack is read and evaluated by multiple threads, each of which updates its own rows concurrently. This lets the user watch the plot as it is being drawn.
112+
A custom-made math library exists, but the final product uses Boost.Multiprecision for performance. Precision is set to 50 decimal digits by default.

docs/features.md

Lines changed: 4 additions & 0 deletions
@@ -65,3 +65,7 @@ Time is a supported variable, `t`. Modulo and trigonometric operators are suppor
6565
- `(1-sin(t))*z + sin(t)*z^2`
6666
- `tan(z)^(sin(t)) % sin(z)`
6767
![Animation of tan(z)^(sin(t)) % sin(z)](assets/gifs/Animation.gif)
68+
69+
## Ultra high precision mode
70+
Plots can be rendered in arbitrary precision through the "Arbitrary Precision" subheader. Functions are then evaluated to a user-defined number of decimal digits, which defaults to 50. This is particularly useful for viewing fractal-like functions at high zoom levels.
71+
![Render of the Mandelbrot set with UHPM versus standard mode](assets/high_precision_mandelbrot.png)

docs/optimization.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
# Optimizations
2+
Many small optimizations throughout this project allow it to run very fast.
3+
4+
## Benchmarking
5+
First, it is important to show that the optimizations used in this tool actually provide a substantial performance increase. To that end, four publicly available complex function plotters were used for comparison:
6+
- Samuel Li's [Complex Function Plotter](https://samuelj.li/complex-function-plotter/). The main inspiration for this project.
7+
- Peter E. Francis' [Complex Function Plot](https://peterefrancis.com/complex-function-plot/plotter.html). A CPU-based plotter.
8+
- David Bau's [Conformal Map Plotter](https://davidbau.com/conformal/). While this tool does not use regular domain coloring, it is _still_ a complex function plotter.
9+
- Fernando Theodoro & Mateus Bastazini's [Complex Functions](https://www2.fc.unesp.br/matematicaearte/plotter/). A project I contributed to previously.
10+
11+
The computer for benchmarking had the following specs:
12+
CPU: Intel Core Ultra 7 255HX
13+
RAM: 16 GB DDR5, 5600 MT/s
14+
Graphics Card: NVIDIA GeForce RTX 5060
15+
16+
All benchmarking measures the _parsing and drawing_ time for different complex functions.
17+
1. Polynomial: $z^{10} - z^9 - z^8 - z^7 - z^6 - z^5 - z^4 - z^3 - z^2 - z - 1$
18+
2. Trigonometric and exponential: $\sin(\cos(\tan(z))) \cdot e^z$
19+
3. Non-elementary: $\Gamma(\zeta(z))$
20+
21+
| Creator | Uses GPU | Polynomial | Trigonometric and exponential | Non-elementary |
22+
| :--- | :---: | :--- | :--- | :--- |
23+
| **Peter E. Francis** | No | 11336 ms | 7098 ms | *Not supported* |
24+
| **Samuel Li** | Yes | 279 ms | 459 ms | 11369 ms |
25+
| **David Bau** | Yes | 369 ms | 120 ms | *Not supported* |
26+
| **Theodoro & Bastazini** | Yes | 226 ms | 293 ms | 194 ms |
27+
| **This (Web)** | Yes | 4.14 ms | **2.42 ms** | **2.27 ms** |
28+
| **This (Desktop)**| Yes | **2.91 ms** | 3.93 ms | 3.14 ms |
29+
30+
Notably, the web version outperforms the desktop version on the trigonometric/exponential and non-elementary benchmarks, while the desktop version is faster on the polynomial. This is likely due to optimizations applied by Emscripten when compiling to WebAssembly, as well as the translation from OpenGL to WebGL.
31+
32+
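The table shows that this tool outperforms every other plotter tested on all three benchmarks: it is roughly 50 to 85 times faster than the fastest competitor in each column. The remaining sections describe the optimizations responsible.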
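For readers who want to reproduce measurements of this kind, the following is a minimal timing sketch built on `std::chrono`. It is not the harness used to produce the table; `mean_ms` and `speedup` are hypothetical helper names introduced for this example.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `runs` times and returns the mean wall-clock time in milliseconds.
double mean_ms(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) work();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / runs;
}

// Ratio between a baseline timing and this tool's timing.
double speedup(double baseline_ms, double this_ms) {
    assert(this_ms > 0.0);
    return baseline_ms / this_ms;
}
```

With the polynomial column, `speedup(226.0, 2.91)` gives about 77.7. When timing GPU work, the callable would need to block until the GPU has finished (for example by calling `glFinish`); otherwise only the submission time is measured.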
33+
34+
## Architectural Optimizations
35+
36+
### Compiled and Interpreted Modes
37+
One of the main bottlenecks of GPU-based renderers is generating shader code dynamically and compiling it at runtime, which introduces a short stutter. This tool uses **per-request** compilation, meaning that it only compiles the shader if the user explicitly requests it.
38+
Otherwise, the parsed expression is transformed into a sequence of bytecode instructions in Reverse-Polish Notation and sent to the shader as a texture. The shader then evaluates this as a stack. This requires no recompilations.
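The stack evaluation described above can be sketched on the CPU as follows. The opcode set and the `Instr` layout are hypothetical; the actual bytecode format and texture encoding are not shown in this document.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical opcodes, chosen only for this example.
enum class Op { PushZ, PushConst, Add, Mul, Sin };

struct Instr {
    Op op;
    cplx value;  // only read by PushConst
};

// Evaluates a Reverse-Polish program for one point z using an explicit stack.
cplx eval_rpn(const std::vector<Instr>& program, cplx z) {
    std::vector<cplx> stack;
    for (const Instr& in : program) {
        switch (in.op) {
            case Op::PushZ:     stack.push_back(z); break;
            case Op::PushConst: stack.push_back(in.value); break;
            case Op::Sin:
                assert(!stack.empty());
                stack.back() = std::sin(stack.back());
                break;
            case Op::Add:
            case Op::Mul: {
                assert(stack.size() >= 2);
                const cplx rhs = stack.back();
                stack.pop_back();
                cplx& lhs = stack.back();
                lhs = (in.op == Op::Add) ? lhs + rhs : lhs * rhs;
                break;
            }
        }
    }
    assert(stack.size() == 1);  // a well-formed program leaves exactly one value
    return stack.back();
}
```

For instance, `2z + 1` would be encoded as `z 2 * 1 +`. Changing the expression only changes the instruction list, so nothing needs to be recompiled.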
39+
For higher performance, the expression can also be converted into a GLSL string and the shader recompiled (compiled mode).
40+
41+
### WebAssembly and C++
42+
This tool is mainly desktop-focused, but the web version is compiled directly from C++ into WebAssembly using Emscripten. WebAssembly is a far lower-level, higher-performance alternative to JavaScript, which is a major factor in outperforming JavaScript-based plotters.
43+
44+
### GPU Rendering
45+
All simple calculations are done on the GPU rather than the CPU. This allows plots to be drawn far faster than with CPU-based plotters (as evidenced in the Benchmarking section).
46+
47+
### Threading
48+
For the high-precision CPU renders, the plot is drawn by multiple concurrent threads.
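A minimal sketch of row-based threading is shown below. The interleaved row assignment, the fixed `[-2, 2] x [-2, 2]` viewport, and the use of `long double` in place of an arbitrary-precision type are all assumptions made for the example; `render_rows` is a hypothetical name.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using cplx = std::complex<long double>;

// Fills a width x height image of f(z), splitting rows across worker threads.
// Thread t owns rows t, t + n_threads, t + 2 * n_threads, ...
std::vector<cplx> render_rows(int width, int height, unsigned n_threads,
                              const std::function<cplx(cplx)>& f) {
    assert(width > 1 && height > 1 && n_threads > 0);
    std::vector<cplx> image(static_cast<std::size_t>(width) * height);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int y = static_cast<int>(t); y < height; y += static_cast<int>(n_threads)) {
                for (int x = 0; x < width; ++x) {
                    const long double re = -2.0L + 4.0L * x / (width - 1);
                    const long double im = -2.0L + 4.0L * y / (height - 1);
                    // Rows are disjoint between threads, so no locking is needed.
                    image[static_cast<std::size_t>(y) * width + x] = f(cplx(re, im));
                }
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return image;
}
```

Because each thread writes only to its own rows, a viewer could display the buffer while it is still being filled, at the cost of showing partially updated images.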
49+
50+
## Numeric optimizations
51+
### Constant Folding
52+
Constant expressions are evaluated before the drawing and rendering logic runs. For instance, `(13 * 7)*z` is sent for rendering as `91 * z`.
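Constant folding of this kind can be sketched as a bottom-up pass over an expression tree. The `Expr` type below is hypothetical and uses real `double` constants for brevity; the project's actual parser types are not shown here.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Minimal expression tree, assumed for illustration only.
struct Expr {
    enum Kind { Const, Var, Add, Mul } kind;
    double value = 0.0;              // used when kind == Const
    std::unique_ptr<Expr> lhs, rhs;  // used by Add and Mul
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr constant(double v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Const;
    e->value = v;
    return e;
}

ExprPtr variable() {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Var;
    return e;
}

ExprPtr binary(Expr::Kind k, ExprPtr l, ExprPtr r) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Folds every subtree whose operands are all constants, bottom-up.
ExprPtr fold(ExprPtr e) {
    assert(e);
    if (e->kind != Expr::Add && e->kind != Expr::Mul) return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Const && e->rhs->kind == Expr::Const) {
        const double a = e->lhs->value, b = e->rhs->value;
        return constant(e->kind == Expr::Add ? a + b : a * b);
    }
    return e;
}
```

Applied to the tree for `(13 * 7)*z`, `fold` returns a `Mul` node whose left child is the constant `91` and whose right child is `z`.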
53+
54+
### Simplification
55+
Expressions that reduce to a constant or to a simpler expression are simplified at runtime. Examples include:
56+
1. Division by itself `z/z = 1` (with an added singularity at `z=0`)
57+
2. Composition of inverses `sin(arcsin(z)) = z`
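The two rules listed above can be expressed as rewrites on an expression tree. The sketch below uses a hypothetical `Node` type and does not track the singularity at `z=0` mentioned in the first rule.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical tree type covering only what the two rules need.
struct Node {
    enum Kind { Const, Var, Div, Sin, Asin } kind;
    double value = 0.0;          // used when kind == Const
    std::shared_ptr<Node> a, b;  // operands; b is only used by Div
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(Node::Kind k, NodePtr a = nullptr, NodePtr b = nullptr, double v = 0.0) {
    assert(k == Node::Const || k == Node::Var || a);  // operators need an operand
    auto n = std::make_shared<Node>();
    n->kind = k;
    n->a = std::move(a);
    n->b = std::move(b);
    n->value = v;
    return n;
}

// Structural equality of two trees.
bool same(const NodePtr& x, const NodePtr& y) {
    if (!x || !y) return x == y;
    return x->kind == y->kind && x->value == y->value &&
           same(x->a, y->a) && same(x->b, y->b);
}

// Applies u/u -> 1 and sin(arcsin(u)) -> u, bottom-up.
NodePtr simplify(const NodePtr& n) {
    if (!n || n->kind == Node::Const || n->kind == Node::Var) return n;
    NodePtr a = simplify(n->a);
    NodePtr b = simplify(n->b);
    if (n->kind == Node::Div && same(a, b)) return make(Node::Const, nullptr, nullptr, 1.0);
    if (n->kind == Node::Sin && a && a->kind == Node::Asin) return a->a;
    return make(n->kind, a, b, n->value);
}
```

Because the pass works bottom-up, `sin(arcsin(z)) / z` first becomes `z / z` and then the constant `1`.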
58+
59+
### Analytic Differentiation
60+
Derivatives are calculated analytically rather than numerically. An expression such as `d/dz(z^2 + z)` is transformed into `2z + 1` at parse time, rather than being approximated numerically at render time.
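Symbolic differentiation can be sketched with the sum and product rules applied recursively to an expression tree. The `Term` type below is hypothetical, supports only constants, `z`, `+` and `*`, and evaluates over `double` for brevity.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Term;
using TermPtr = std::shared_ptr<Term>;

// Hypothetical tree type: constants, the variable z, sums and products.
struct Term {
    enum Kind { Const, Var, Add, Mul } kind;
    double value;  // used when kind == Const
    TermPtr a, b;  // operands of Add and Mul
};

TermPtr num(double v) { return std::make_shared<Term>(Term{Term::Const, v, nullptr, nullptr}); }
TermPtr var() { return std::make_shared<Term>(Term{Term::Var, 0.0, nullptr, nullptr}); }
TermPtr add(TermPtr a, TermPtr b) {
    return std::make_shared<Term>(Term{Term::Add, 0.0, std::move(a), std::move(b)});
}
TermPtr mul(TermPtr a, TermPtr b) {
    return std::make_shared<Term>(Term{Term::Mul, 0.0, std::move(a), std::move(b)});
}

// d/dz using (u + v)' = u' + v' and (u * v)' = u' * v + u * v'.
TermPtr derive(const TermPtr& t) {
    assert(t);
    switch (t->kind) {
        case Term::Const: return num(0.0);
        case Term::Var:   return num(1.0);
        case Term::Add:   return add(derive(t->a), derive(t->b));
        case Term::Mul:   return add(mul(derive(t->a), t->b), mul(t->a, derive(t->b)));
    }
    return num(0.0);  // not reached; keeps compilers from warning
}

double eval(const TermPtr& t, double z) {
    assert(t);
    switch (t->kind) {
        case Term::Const: return t->value;
        case Term::Var:   return z;
        case Term::Add:   return eval(t->a, z) + eval(t->b, z);
        case Term::Mul:   return eval(t->a, z) * eval(t->b, z);
    }
    return 0.0;  // not reached
}
```

For `z*z + z`, `derive` produces a tree equivalent to `2z + 1` (before any simplification pass), so evaluating it at `z = 3` gives `7`.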
61+
62+
## Memory Optimizations
63+
64+
### Manual Register Management and Mutable Math
65+
In the now-deprecated arbitrary-precision math library for the GPU, registers were managed manually. Rather than using local variables, every complex function shared the same 16 registers of memory. This code was transpiled automatically from the low-precision GLSL.
66+
This library also uses mutable (non-functional) math: rather than returning a new copy of their result, functions write into a given register. One example is the `hp_add` function:
67+
```glsl
number R[16]; // shared scratch registers

// Writes a + b into `res` instead of returning a new copy.
void hp_add(in number a, in number b, out number res) {
    if (a.sign == b.sign) {
        // Same sign: add the magnitudes and keep the common sign.
        abs_sum(a, b, R[0]);
        R[0].sign = a.sign;
    } else {
        // Opposite signs: subtract the smaller magnitude from the larger.
        int cmp = 0;
        compare_abs(a, b, cmp);
        if (cmp >= 0) {
            abs_hp_sub(a, b, R[0]);
            R[0].sign = a.sign;
        } else {
            abs_hp_sub(b, a, R[0]);
            R[0].sign = b.sign;
        }
    }
    // Copy the scratch register into the output on every path.
    res = R[0];
}
```

src/main.cpp

Lines changed: 4 additions & 4 deletions
@@ -22,6 +22,7 @@
2222
#include <preprocessor/transpiler.h>
2323
#include <cpu_drawing/cpu_render.h>
2424
#include <glsl_generated/generated_math_mapper.h>
25+
#include <chrono>
2526

2627

2728
#include <preprocessor/string_builder.h>
@@ -140,7 +141,6 @@ void export_to_png(AppContext* ctx, int target_width, int target_height, const c
140141
if (ctx->function_state->is_3d) glEnable(GL_DEPTH_TEST);
141142
else glDisable(GL_DEPTH_TEST);
142143

143-
draw_scene(ctx, (float)target_width, (float)target_height);
144144

145145
glPixelStorei(GL_PACK_ALIGNMENT, 1);
146146
unsigned char* pixels = new unsigned char[target_width * target_height * 4];
@@ -199,7 +199,6 @@ void main_loop_step(AppContext* ctx) {
199199

200200
ctx->function_state->is_3d = ctx->view_state->is_3d;
201201
init_imgui_loop();
202-
203202
if(ctx->view_state->wants_high_precision){
204203
const int hp_width = ctx->view_state->hp_width;
205204
const int hp_height = ctx->view_state->hp_height;
@@ -245,6 +244,7 @@ void main_loop_step(AppContext* ctx) {
245244
render_and_update(*(ctx->function_state), *(ctx->view_state), stack_tbo_texture, constants_tbo_texture, *(ctx->shader_program), *(ctx->compiled_shader));
246245
}
247246

247+
248248
draw_scene(ctx, ctx->view_state->width, ctx->view_state->height);
249249

250250
if (is_3d != ctx->view_state->is_3d) {
@@ -338,10 +338,10 @@ int main() {
338338
const string frag_source_3d = get_source("shaders/plotter3d.frag");
339339
try {
340340
vert_source_3d = build_shader_string(vert_source_3d, frag_source);
341-
std::cout << vert_source_3d;
341+
//std::cout << vert_source_3d;
342342
}
343343
catch (std::runtime_error& er) {
344-
std::cout << er.what();
344+
//std::cout << er.what();
345345
}
346346
shader_3d.compile(vert_source_3d, frag_source_3d);
347347
