
Commit 5d96733

Back to original state of "works but crashes for big expressions", but this time, our zooming logic works
1 parent 6d39bca commit 5d96733

14 files changed

Lines changed: 2764 additions & 446 deletions

devlog.md

Lines changed: 9 additions & 31 deletions
@@ -1,36 +1,14 @@
-Rendering images of a selected area with arbitrarily large precision. This means allowing a user-controlled number of floats of precision for each complex number, a user-controlled iteration count for non-elementary functions, and possibly a different method of rendering colors to make sure it's more precise.
+Problem!
 
-UHPM should be used only to render static "images", so it shouldn't care about performance. It should be fine to wait a couple of seconds for a picture of a specific range in the graph to be rendered, if it is going to be used for specific analysis, or for an image to be used in, for instance, a scientific paper.
+Currently, our Ultra High Precision Mode math library is written in a functional manner. This means that all functions are pure, beautiful and return things. This allows us to transform operations like `x * (y + z)` nicely into `hp_mult(x, hp_add(y, z))`.
 
-Okay, let's attempt to do this.
+The problem is that GLSL really doesn't like it when we return arrays, and nesting functions _also_ introduces multiple temporary variables, which occupy registers. It would much rather have this expression turned into `number expr; hp_add(y, z, expr); hp_mult(expr, x, expr);`. It _also, also_ dislikes temporary variables being allocated inside functions.
 
-In order to get a functioning high precision library, there are some basics we have to cover.
-First, we need the four basic operations, exponentiation, logarithm and trigonometric functions. We can use whatever numerical method we want for this, so long as it runs in a non-absurd amount of time (a good limit to set is, at maximum, 3 seconds for a whole frame).
-Once we have the four operations down, all those other functions can be computed quite fast through Newton-Raphson and their Taylor series. Since precision is our main objective, we should run these numerical methods as many times as needed to give us an error smaller than our current precision.
-This also means making the number of iterations in our non-elementary functions user-defined.
+To fix this, we have to write our code like assembly! This means that we'll have to keep a list of global registers that are used whenever a function needs them, and we do composition on these registers!
 
-We don't want to be rewriting all our functions defined for regular precision in higher precision. So, once we have these basic functions down, and some other helper functions used by GLSL (like length, step, mix, etc.), we will need to write a transpiler that transforms functions' implementations to use our higher precision library.
-For instance, the function `cadd(vec2 a, vec2 b){return a+b;}` should become `cadd(hp_vec2 a, hp_vec2 b){return hp_vec2(hp_add(a.x,b.x),hp_add(a.y,b.y));}`. It might be useful to manually replace all occurrences of regular additions with a helper function `scalar_add(a,b)` in order to make this parsing easier (and avoid having to create yet another RPN interpreter).
+This means lots of work ahead:
 
-There will be lots of precomputing necessary on the CPU side, namely for constants like PI, E and lookup tables for non-elementary functions. This means that we will need a C++ way to generate these arbitrary precision `number` structs.
-
-Optionally, we could simply write a parser that transforms GLSL into C++ at build time, giving us the possibility of running our "shader" on the CPU if necessary. This might be the case if we ask for a very high limb count (for instance, 32 `uint` limbs, which equates to 1024 bits), which might cause the GPU to run for too long and end up losing context.
-This would give us a more stable, albeit far slower, method of drawing functions with any precision. Since this would all be transpiled from GLSL, we actually wouldn't need to rewrite any code, and it would give us some pretty nice luxuries, like arbitrary function constant folding.
-
-In short, our steps to implement UHPM are:
-1. Implementing functions
-1.1. Implementing the four operations
-1.2. Implementing ln, exp, sin, cos, tan
-1.3. Implementing other GLSL-specific functions used in other pieces of code
-
-2. Writing a transpiler from low-precision GLSL to high-precision GLSL
-
-3. Working with CPU-side arbitrary precision
-3.1. Sending arbitrary precision numbers to the shader
-3.2. Precomputing constants
-3.3. Precomputing non-elementary function lookup tables
-
-4. Writing a GLSL to C++ transpiler (optional)
-4.1. Hooking up GLSL function names to their respective C++ functions automatically
-4.1.1. Constant folding with C++ GLSL definitions
-4.2. Running shaders on the CPU
+1. Change the math library to use `inout` rather than `return`
+2. Change the transpiler to expand nested expressions
+3. Change `generate_glsl_string` to expand nested expressions
+4. Rewrite the transpiler to automatically use registers

src/glsl_transpiled/glsl_big_number.cpp

Lines changed: 70 additions & 64 deletions
@@ -13,6 +13,7 @@ number null_number() {
         res.limb[i] = 0u;
     }
     res.sign = 1;
+    res.is_infinite = false;
     return res;
 }
 
@@ -100,19 +101,29 @@ number abs_hp_sub(number a, number b) {
 
 number hp_add(number a, number b) {
     number c = null_number();
+
     if (a.sign == b.sign) {
-        c = abs_sum(a, b);
+        uint64_t carry = 0;
+        for (int i = 0; i < NUMBER_OF_LIMBS; ++i) {
+            uint64_t sum = (uint64_t)a.limb[i] + b.limb[i] + carry;
+            c.limb[i] = (uint32_t)sum;
+            carry = sum >> 32;
+        }
         c.sign = a.sign;
         return c;
     }
+
     int cmp = compare_abs(a, b);
-    if (cmp >= 0) {
-        c = abs_hp_sub(a, b);
-        c.sign = a.sign;
-        return c;
+    number max_val = (cmp >= 0) ? a : b;
+    number min_val = (cmp >= 0) ? b : a;
+
+    uint64_t borrow = 0;
+    for (int i = 0; i < NUMBER_OF_LIMBS; ++i) {
+        uint64_t diff = (uint64_t)max_val.limb[i] - min_val.limb[i] - borrow;
+        c.limb[i] = (uint32_t)diff;
+        borrow = (diff >> 32) & 1;
     }
-    c = abs_hp_sub(b, a);
-    c.sign = b.sign;
+    c.sign = max_val.sign;
     return c;
 }
 
@@ -145,88 +156,82 @@ number hp_mult(number a, number b) {
     number c = null_number();
     c.sign = a.sign * b.sign;
 
-    if (is_zero(a) || is_zero(b)) return c;
-    if (a.is_infinite || b.is_infinite) {
-        c.limb = infinite_number().limb;
+    if (is_zero(a) || is_zero(b)) {
+        c.sign = 1;
         return c;
     }
+    if (a.is_infinite || b.is_infinite) return infinite_number();
+
+    uint32_t ext_c[NUMBER_OF_LIMBS + FRACTIONAL_SIZE] = { 0 };
+
     for (int i = 0; i < NUMBER_OF_LIMBS; ++i) {
         if (a.limb[i] == 0u) continue;
-        uint carry = 0u;
+        uint64_t carry = 0;
 
         for (int j = 0; j < NUMBER_OF_LIMBS; ++j) {
-            int target = i + j - FRACTIONAL_SIZE;
+            int ext_target = i + j;
+            if (ext_target >= NUMBER_OF_LIMBS + FRACTIONAL_SIZE) break;
 
-            if (target >= NUMBER_OF_LIMBS) break;
+            uint64_t prod = (uint64_t)a.limb[i] * b.limb[j] + carry;
+            prod += ext_c[ext_target];
+            ext_c[ext_target] = (uint32_t)prod;
+            carry = prod >> 32;
+        }
 
-            uvec2 prod = product_with_remainder(a.limb[i], b.limb[j]);
-            if (target < 0) {
-                uvec2 hp_add2 = sum_with_carry(prod.x, carry);
-                carry = prod.y + hp_add2.y;
-            }
-            else {
-                uvec2 hp_add1 = sum_with_carry(c.limb[target], prod.x);
-                uvec2 hp_add2 = sum_with_carry(hp_add1.x, carry);
-                c.limb[target] = hp_add2.x;
-                carry = prod.y + hp_add1.y + hp_add2.y;
-            }
+        int leftover_target = i + NUMBER_OF_LIMBS;
+        while (carry > 0 && leftover_target < NUMBER_OF_LIMBS + FRACTIONAL_SIZE) {
+            uint64_t sum = (uint64_t)ext_c[leftover_target] + carry;
+            ext_c[leftover_target] = (uint32_t)sum;
+            carry = sum >> 32;
+            leftover_target++;
         }
     }
-    c.sign = c.sign * int(!is_zero(c));
+
+    for (int i = 0; i < NUMBER_OF_LIMBS; ++i) {
+        c.limb[i] = ext_c[i + FRACTIONAL_SIZE];
+    }
+
+    if (is_zero(c)) c.sign = 1;
     return c;
 }
 
 number shift_left(number a, int shift) {
     if (shift <= 0) return a;
-
     int limb_shift = shift / 32;
     int bit_shift = shift % 32;
-
     number c = null_number();
     c.sign = a.sign;
-
-    if (limb_shift >= NUMBER_OF_LIMBS) return null_number();
+    if (limb_shift >= NUMBER_OF_LIMBS) return c;
 
     for (int i = NUMBER_OF_LIMBS - 1; i >= limb_shift; --i) {
-        int target = i;
-        int source = i - limb_shift;
-
-        uint val = a.limb[source] << bit_shift;
-
-        if (source > 0 && bit_shift > 0) {
-            val |= a.limb[source - 1] >> (32 - bit_shift);
+        uint32_t val = a.limb[i - limb_shift] << bit_shift;
+        if (i - limb_shift > 0 && bit_shift > 0) {
+            uint64_t prev_limb = a.limb[i - limb_shift - 1];
+            val |= static_cast<uint32_t>(prev_limb >> (32 - bit_shift));
         }
-
-        c.limb[target] = val;
+        c.limb[i] = val;
     }
-
     return c;
 }
 
 number shift_right(number a, int shift) {
     if (shift <= 0) return a;
-
     int limb_shift = shift >> 5;
     int bit_shift = shift % 32;
-
     number c = null_number();
     c.sign = a.sign;
+    if (limb_shift >= NUMBER_OF_LIMBS) return c;
 
-    if (limb_shift >= NUMBER_OF_LIMBS) return null_number();
     for (int i = 0; i < NUMBER_OF_LIMBS - limb_shift; ++i) {
-        int target = i;
-        int source = i + limb_shift;
-
-        uint val = a.limb[source] >> bit_shift;
-
-        if (source < NUMBER_OF_LIMBS - 1 && bit_shift > 0) {
-            val |= a.limb[source + 1] << (32 - bit_shift);
+        uint32_t val = a.limb[i + limb_shift] >> bit_shift;
+        if (i + limb_shift < NUMBER_OF_LIMBS - 1 && bit_shift > 0) {
+            uint64_t next_limb = a.limb[i + limb_shift + 1];
+            val |= static_cast<uint32_t>(next_limb << (32 - bit_shift));
         }
-        c.limb[target] = val;
+        c.limb[i] = val;
     }
     return c;
 }
-
 int find_msb(number a) {
     for (int i = NUMBER_OF_LIMBS - 1; i >= 0; --i) {
         uint x = a.limb[i];
@@ -283,8 +288,6 @@ number hp_div(number n, number d) {
     number q = null_number();
     q.sign = n.sign * d.sign;
 
-    uint u_halves[NUMBER_OF_LIMBS * 2 + FRACTIONAL_SIZE * 2 + 1];
-
     if (is_zero(d)) {
         q = infinite_number();
         return q;
@@ -297,7 +300,10 @@ number hp_div(number n, number d) {
     int msb_n = find_msb(n);
     if (msb_n == -1) return q;
 
-    for (int i = 0; i < NUMBER_OF_LIMBS*2 + FRACTIONAL_SIZE * 2 + 1; ++i) {
+    const int MAX_HALVES = NUMBER_OF_LIMBS * 2 + FRACTIONAL_SIZE * 2 + 2;
+    uint u_halves[MAX_HALVES];
+
+    for (int i = 0; i < MAX_HALVES; ++i) {
         u_halves[i] = 0u;
     }
 
@@ -317,7 +323,7 @@ number hp_div(number n, number d) {
         uint dividend = (rem << 16) | u_halves[i];
         uint q_i = dividend / v0;
         rem = dividend % v0;
-
+
         if (i < NUMBER_OF_LIMBS * 2) {
             set_half(q, i, q_i);
         }
@@ -330,7 +336,6 @@ number hp_div(number n, number d) {
 
     if (shift > 0) {
         uint carry_shift = 0u;
-        const int MAX_HALVES = NUMBER_OF_LIMBS * 2 + FRACTIONAL_SIZE * 2 + 1;
         for (int i = 0; i < MAX_HALVES; ++i) {
             uint val = (u_halves[i] << shift) | carry_shift;
             carry_shift = u_halves[i] >> (16 - shift);
@@ -356,12 +361,13 @@ number hp_div(number n, number d) {
         if (u_jn == v_n1) {
             q_hat = 0xFFFFu;
             r_hat = u_jn1 + v_n1;
-        } else {
+        }
+        else {
             q_hat = dividend / v_n1;
             r_hat = dividend % v_n1;
         }
 
-        while (r_hat < 0x10000u && (q_hat * v_n2) > ((r_hat << 16) | u_jn2)) {
+        while (r_hat < 0x10000u && (q_hat * v_n2) >((r_hat << 16) | u_jn2)) {
             q_hat--;
             r_hat += v_n1;
         }
@@ -379,7 +385,7 @@ number hp_div(number n, number d) {
             u_halves[j + i] = uint(diff) & 0xFFFFu;
             borrow = (diff < 0) ? 1u : 0u;
         }
-
+
         int final_diff = int(u_halves[j + n_len]) - int(k) - int(borrow);
         u_halves[j + n_len] = uint(final_diff) & 0xFFFFu;
 
@@ -405,12 +411,12 @@ number hp_div(number n, number d) {
 number div_uint(number n, uint d) {
     number q = null_number();
     q.sign = n.sign;
-    uint rem = 0u;
-    for (int i = NUMBER_OF_LIMBS * 2 - 1; i >= 0; --i) {
-        uint dividend = (rem << 16) | get_half(n, i);
-        uint q_i = dividend / d;
+    uint64_t rem = 0;
+
+    for (int i = NUMBER_OF_LIMBS - 1; i >= 0; --i) {
+        uint64_t dividend = (rem << 32) | n.limb[i];
+        q.limb[i] = (uint32_t)(dividend / d);
         rem = dividend % d;
-        set_half(q, i, q_i);
     }
     return q;
 }

src/glsl_transpiled/glsl_big_number.h

Lines changed: 6 additions & 1 deletion
@@ -4,6 +4,7 @@
 #include <vector>
 #include <iostream>
 
+
 constexpr inline uint NUMBER_OF_LIMBS = 8;
 constexpr inline uint FRACTIONAL_SIZE = NUMBER_OF_LIMBS / 2;
 
@@ -13,10 +14,13 @@ struct number {
     bool is_infinite = false;
 };
 
+number float_to_number(float f);
+
 struct hp_vec2 {
     number x;
     number y;
     hp_vec2(number x,number y) : x(x), y(y){};
+    hp_vec2(float x, float y) : x(float_to_number(x)), y(float_to_number(y)){};
 };
 
 hp_vec2 initialize_hp_vec2(number x, number y);
@@ -72,5 +76,6 @@ number hp_sqrt(number x);
 
 number number_integer(const uint n);
 
-number float_to_number(float f);
+float number_to_float(number n);
+
0 commit comments