Commit e1c22c2

docs(numbersystems): fix errors + additional explanation
1 parent 8c50a6c commit e1c22c2

1 file changed

Lines changed: 46 additions & 3 deletions

topics/numbersystems.adoc

```diff
@@ -211,11 +211,11 @@ overflow 11111 111
 
 == Floating Point Numbers
 
-So far we have a look at whole numbers, let's have a look at numbers with a decimal part, aka floating point numbers.
+So far we have looked at whole numbers. Now let's have a look at numbers with a fractional part, aka floating point numbers.
 
 === Decimal System
 
-What you are use to in the decimal system.
+What you are used to in the decimal system.
 
 [source]
 ----
```

```diff
@@ -288,6 +288,14 @@ Formula:
 (-1)^sign x (1 + mantissa) x 2^(exponent - 127)
 ----
 
+The `1 +` in the formula reflects the *hidden (implicit) bit*.
+In normalised form the leading bit before the binary point is always `1`
+(e.g. `1.10010001 x 2^6`).
+Because it is always `1` it does not need to be stored: the 23 mantissa bits
+only store the *fractional part* after the `1.`.
+When interpreting the stored value you always add that implicit `1` back,
+which is what the `1 + mantissa` in the formula does.
+
 Example:
 
 [source]
```
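A quick way to check the hidden-bit explanation is to take a `float` apart and rebuild it with the formula. The sketch below assumes the `100.25f` value from the worked example; the class name `FloatDecode` is only illustrative. It reads the bit pattern with `Float.floatToIntBits`, masks out the sign, the 8-bit exponent and the 23 stored mantissa bits, and adds the implicit `1` back before scaling. It only covers normal numbers: zero, subnormals, infinity and NaN are encoded differently.

```java
public class FloatDecode {

    // Illustrative helper: rebuilds a normal (non-zero, non-subnormal) float
    // from its IEEE 754 bit fields using (-1)^sign x (1 + mantissa) x 2^(exponent - 127).
    static double decode(float value) {
        int bits = Float.floatToIntBits(value);
        int sign = (bits >>> 31) & 0x1;        // 1 bit
        int exponent = (bits >>> 23) & 0xFF;   // 8 bits, biased by 127
        int fraction = bits & 0x7FFFFF;        // 23 stored mantissa bits
        // Scale the stored bits to a fraction in [0, 1)
        double mantissa = fraction / (double) (1 << 23);
        // Add the implicit leading 1 back, then apply sign and unbiased exponent
        return (sign == 0 ? 1 : -1) * (1 + mantissa) * Math.pow(2, exponent - 127);
    }

    public static void main(String[] args) {
        float value = 100.25f;
        int bits = Float.floatToIntBits(value);
        System.out.println("exponent field: " + ((bits >>> 23) & 0xFF)); // 133 = 6 + 127
        System.out.println("stored mantissa: " + Integer.toBinaryString(bits & 0x7FFFFF));
        System.out.println("decoded: " + decode(value));                  // 100.25
    }
}
```

The printed mantissa starts with `10010001`, the bits after the `1.` in `1.10010001 x 2^6`; the leading `1` itself never appears in the stored bits.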

```diff
@@ -378,7 +386,7 @@ Final float binary:
 
 === Why 0.1 Can't Be Represented Exactly
 
-There are some problems with floating point numbers, on of the is that not every number can be represented correctly.
+One problem with floating point numbers is that not every decimal value can be represented exactly.
 
 [source]
 ----
```
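One way to see why 0.1 is a problem is to convert a decimal fraction to binary by repeated doubling: each doubling yields the next bit (1 if the result reached 1, else 0). For `0.25` this stops after two bits, but for `0.1` the pattern `0011` repeats forever, so a finite mantissa has to cut it off and round. The sketch below uses illustrative names (`FractionToBinary`, `binaryFraction`) and does the doubling in exact `BigDecimal` arithmetic so the conversion itself adds no error. The last line uses `new BigDecimal(double)`, which shows the exact value actually stored for the `double` literal `0.1`.

```java
import java.math.BigDecimal;

public class FractionToBinary {

    // Illustrative helper: returns up to maxBits binary digits after the point
    // for a decimal fraction in [0, 1), computed by repeated doubling.
    static String binaryFraction(String decimalFraction, int maxBits) {
        BigDecimal value = new BigDecimal(decimalFraction);
        StringBuilder bits = new StringBuilder("0.");
        for (int i = 0; i < maxBits && value.signum() != 0; i++) {
            value = value.multiply(BigDecimal.valueOf(2));
            if (value.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');                     // doubled value reached 1
                value = value.subtract(BigDecimal.ONE);
            } else {
                bits.append('0');
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(binaryFraction("0.25", 20)); // 0.01 (terminates)
        System.out.println(binaryFraction("0.1", 20));  // 0.00011001100110011001 (keeps repeating)
        // Exact decimal expansion of the double closest to 0.1
        System.out.println(new BigDecimal(0.1));
    }
}
```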

```diff
@@ -412,6 +420,41 @@ The double is defined as follows:
 * 11 bits exponent
 * 52 bits mantissa
 
+The formula is the same as for a float, but with a larger exponent bias of *1023*:
+
+[source]
+----
+(-1)^sign x (1 + mantissa) x 2^(exponent - 1023)
+----
+
+Using our earlier example `100.25`:
+
+[source]
+----
+Integer part: 1100100
+Fractional part: 01
+Combined: 1100100.01 x 2^0
+
+Normalise (move the binary point 6 places left): 1.10010001 x 2^6
+
+Exponent: 6 + 1023 = 1029
+1029 in binary: 10000000101 (11 bits)
+
+Mantissa (52 bits, fractional part only):
+1001000100000000000000000000000000000000000000000000
+
+Final double binary:
+0 10000000101 1001000100000000000000000000000000000000000000000000
+----
+
+Show in Java:
+
+[source,java]
+----
+var bits = Double.doubleToLongBits(100.25);
+System.out.println(Long.toBinaryString(bits)); // the leading sign bit 0 is not printed
+----
+
 == BigDecimal
 
 If we want to keep precision, Java supports `BigDecimal` and `BigInteger`, which provide arbitrary precision (limited only by available memory).
```
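The worked `100.25` double example can also be checked field by field. This sketch (the class name `DoubleFields` and the `pad` helper are illustrative, not library code) masks the 64-bit pattern from `Double.doubleToLongBits` into sign, 11-bit exponent and 52-bit mantissa, zero-pads each field because `Long.toBinaryString` drops leading zeros, and rebuilds the value with the bias of 1023. `String.repeat` needs Java 11 or later.

```java
public class DoubleFields {

    // Hypothetical helper: left-pads a binary string with zeros to a fixed width.
    static String pad(String binary, int width) {
        return "0".repeat(width - binary.length()) + binary;
    }

    public static void main(String[] args) {
        long bits = Double.doubleToLongBits(100.25);

        long sign = bits >>> 63;                    // 1 bit
        long exponent = (bits >>> 52) & 0x7FF;      // 11 bits, biased by 1023
        long mantissa = bits & ((1L << 52) - 1);    // 52 stored fraction bits

        System.out.println("sign:     " + sign);                                              // 0
        System.out.println("exponent: " + exponent + " = " + pad(Long.toBinaryString(exponent), 11)); // 1029 = 10000000101
        System.out.println("mantissa: " + pad(Long.toBinaryString(mantissa), 52));
        System.out.println("hex:      " + Long.toHexString(bits));                            // 4059100000000000

        // (-1)^sign x (1 + mantissa / 2^52) x 2^(exponent - 1023)
        double value = (sign == 0 ? 1 : -1) * (1 + mantissa / Math.pow(2, 52)) * Math.pow(2, exponent - 1023);
        System.out.println("value:    " + value);                                             // 100.25
    }
}
```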
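A short sketch of what keeping precision means in practice (the class name `BigDecimalSum` is illustrative): adding `0.1` and `0.2` as `double` values carries the binary representation error into the result, while `BigDecimal` values created from strings keep the decimal digits exactly. Creating them with `new BigDecimal(String)` or `BigDecimal.valueOf(double)` rather than `new BigDecimal(double)` matters, because the latter copies the binary approximation of the `double`.

```java
import java.math.BigDecimal;

public class BigDecimalSum {
    public static void main(String[] args) {
        double d = 0.1 + 0.2;
        System.out.println(d);          // 0.30000000000000004
        System.out.println(d == 0.3);   // false

        // String constructor: the decimal digits are stored exactly
        BigDecimal b = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        System.out.println(b);                                        // 0.3
        System.out.println(b.compareTo(new BigDecimal("0.3")) == 0);  // true
    }
}
```

`compareTo` is used instead of `equals` because `BigDecimal.equals` also compares the scale, so `0.3` and `0.30` are not `equals` even though they are numerically the same.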
