30 changes: 30 additions & 0 deletions benchmark/desk/gen/bench-unum-grid.hoon
@@ -0,0 +1,30 @@
:: bench-unum-grid: run the whole /lib/unum timing grid in one dojo call.
:: +bench-unum-grid n=@ud
:: For each (door, arm) it ~&-prints a [%cell door arm] label, then +cell's
:: ~>(%bout) slogs "took ..". Per-call = (arm took - base took) / n. Also
:: runs +fdp-cell per door (per-element = took / n). Returns the folded
:: results to force evaluation. Run once on the jetted build (hints on) and
:: once interpreted (hints commented) and diff the scraped times.
::
/+  uc=unum-cells
:-  %say
|=  [* [n=@ud ~] ~]
:-  %noun
=/  arms=(list @tas)
  :~  %base  %add  %sub  %mul  %div  %fma  %sqt  %neg
      %exp  %log  %sin  %cos  %atan  %pow  %lth
  ==
=/  doors=(list @tas)  ~[%rpb %rph %rps]
=/  rows
  %+  turn  doors
  |=  door=@tas
  %+  turn  arms
  |=  arm=@tas
  ~&  [%cell door arm]
  (cell:uc door arm n)
=/  fdps
  %+  turn  doors
  |=  door=@tas
  ~&  [%fdp door]
  (fdp-cell:uc door n)
[rows fdps]
9 changes: 9 additions & 0 deletions benchmark/desk/gen/bench-unum.hoon
@@ -0,0 +1,9 @@
:: bench-unum: run ONE /lib/unum timing cell. +bench-unum [door arm n]
:: door ?(%rpb %rph %rps) ; arm @tas (or %base) ; n @ud
:: +cell's ~>(%bout) prints elapsed; per-call = (cell(arm) - cell(%base)) / n.
::
/+  uc=unum-cells
:-  %say
|=  [* [door=?(%rpb %rph %rps) arm=@tas n=@ud ~] ~]
:-  %noun
(cell:uc door arm n)
31 changes: 31 additions & 0 deletions benchmark/desk/lib/bench-core.hoon
@@ -0,0 +1,31 @@
:: bench-core: shared timing loop for the numerics benchmark suite.
::
:: `time` runs a tight loop over a PRECOMPUTED list of inputs, wrapped in
:: ~>(%bout ...), which PRINTS the elapsed time ("took ms/..") as a slog and
:: RETURNS the folded accumulator. The host driver scrapes the printed line;
:: the returned value forces evaluation of every call (defeats dead-code
:: elimination).
::
:: CRITICAL: inputs are precomputed by the caller OUTSIDE this gate, so the
:: slow interpreted @ud->@rX conversions (sun:rd / san:rd, ~93 us/call) are
:: NOT charged to per-call cost. Inside the timed loop the only work is the
:: arm under test plus a jetted atom-add fold -- both the input list walk
:: (O(1) head access) and the fold are cheap, so the measured time reflects
:: the arm, isolated. Each input is a [x y] pair; y is unused (0) for the
:: single-argument arms and carries the second operand for atan2/pow/pow-n.
::
:: Per-call cost = (time(arm-list) - time(base-list)) / n, where the base list
:: uses the same inputs but the step skips the arm under test.
::
|%
:: +time: walk the precomputed input list, timed by %bout; return the acc.
::
++  time
  |=  [xs=(list [x=@ y=@]) step=$-([[x=@ y=@] acc=@] @)]
  ^-  @
  ~>  %bout
  =/  acc=@  `@`0
  |-  ^-  @
  ?~  xs  acc
  $(xs t.xs, acc (step i.xs acc))
--
59 changes: 59 additions & 0 deletions benchmark/desk/lib/unum-cells.hoon
@@ -0,0 +1,59 @@
:: unum-cells: per-(door,arm) timing cell for /lib/unum (posit) benchmarks,
:: mirroring lib/bench-cells (the /lib/math harness). +cell precomputes the
:: posit input list (sun/div kept OUT of the hot path) and folds the arm over
:: it via (time ..), whose ~>(%bout) slogs "took ..". Per-call cost =
:: (cell(arm) - cell(%base)) / n. +fdp-cell times one fused dot product over
:: length-n vectors (per-element = took / n).
::
:: The posit width is selected at runtime by %*-specializing the generic ++pp
:: core on bloq (3/4/5 = posit8/16/32); the unum jets read bloq from that
:: sample, so this one body benchmarks every width and both the jetted (hints
:: on) and interpreted (hints commented) builds.
::
/+  *bench-core, unum
|%
:: +dor: door @tas -> bloq
++  dor  |=(door=@tas ?:(=(door %rpb) 3 ?:(=(door %rph) 4 5)))
:: +cell: time `n` folds of `arm` at posit width `door`.
++  cell
  |=  [door=@tas arm=@tas n=@ud]
  ^-  @
  =/  d  %*(. pp:unum bloq (dor door))
  :: inputs in [0.25, 1.25]: positive, in-range for exp/log/sqt and all ops.
  =/  inp  |=(k=@ud ^-(@ (div:d (sun:d +((mod k 5))) (sun:d 4))))
  =/  xs=(list [x=@ y=@])
    %+  turn  (gulf 0 (dec n))
    |=  k=@ud  ^-([@ @] [(inp k) (inp +(k))])
  =/  step
    |=  [p=[x=@ y=@] acc=@]  ^-  @
    %+  add  acc
    ?+  arm  ~|([%bad-arm arm] !!)
      %base  x.p
      %neg   (neg:d x.p)
      %abs   (abs:d x.p)
      %sqt   (sqt:d x.p)
      %exp   (exp:d x.p)
      %log   (log:d x.p)
      %sin   (sin:d x.p)
      %cos   (cos:d x.p)
      %atan  (atan:d x.p)
      %add   (add:d x.p y.p)
      %sub   (sub:d x.p y.p)
      %mul   (mul:d x.p y.p)
      %div   (div:d x.p y.p)
      %fma   (fma:d x.p y.p x.p)
      %pow   (pow:d x.p y.p)
      %lth   ?:((lth:d x.p y.p) 1 0)
    ==
  (time xs step)
:: +fdp-cell: time one fused dot product over length-n posit vectors.
++  fdp-cell
  |=  [door=@tas n=@ud]
  ^-  @
  =/  d  %*(. pp:unum bloq (dor door))
  =/  inp  |=(k=@ud ^-(@ (div:d (sun:d +((mod k 5))) (sun:d 4))))
  =/  av=(list @)  (turn (gulf 0 (dec n)) inp)
  =/  bv=(list @)  (turn (gulf 0 (dec n)) |=(k=@ud (inp +(k))))
  ~>  %bout
  (fdp:d av bv)
--
47 changes: 47 additions & 0 deletions benchmark/results/2026-06-28/unum/README.md
@@ -0,0 +1,47 @@
# `/lib/unum` (posit) jet benchmark — 2026-06-28

Three-way per-call comparison of the `/lib/unum` arms: **interpreted** Hoon vs
**Python/SoftUnum** (the C lib via ctypes) vs **jetted** Hoon, at posit8/16/32
(`rpb`/`rph`/`rps`). Full table in [`table.txt`](table.txt).

## Method (mirrors the `/lib/math` `bench-math` protocol)

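- `gen/bench-unum-grid` (→ `lib/unum-cells`) precomputes a posit input list
*outside* the timed loop, then folds each arm `n` times inside `~>(%bout)`,
which slogs `took …`. Per-call = `(arm − base) / n`. A `%bout` reading has the
form `unit/value`, where `.` separates three-digit groups; stripping the dots
gives microseconds whatever the unit (`ms/347.461` = 347,461 µs,
`s/1.361.196` = 1,361,196 µs, `µs/204` = 204 µs).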
- **One jet binary, hint toggle.** Jetted = `/lib/unum` with its `~%`/`~/`
hints; interpreted = the same file with the hints commented out (so no jet
matches and the pure Hoon runs). Both on a hoon-135 fakezod (the 408k pill).
- Jetted measured at `n=100,000`; interpreted at `n=100` (interpreted posit
transcendentals are ~20–60 ms/call, so a larger `n` is impractical).
- Python column: `tools/bench_unum_report.py` times SoftUnum via ctypes
(200k calls/arm) — the raw C speed a Python user sees (call overhead included).
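As a sketch of the scraping step, the helpers below parse the raw grid layout used in `jetted-n100000.txt` and `interp-n100.txt`. Each label line is followed by a `took unit/value` line. Every reading is converted to microseconds by stripping the dots, following the rule in the first bullet above. `parse_grid` and `bout_to_us` are hypothetical names for illustration. This is not the code of `tools/bench_unum_report.py`.

```python
import re

# Hypothetical helpers (not the repo's bench_unum_report.py) for the raw grid
# files: a label line like "[%cell %rpb %add]" is followed by "took <unit>/<value>".
LABEL = re.compile(r"^\[%(\w+)((?: %\w+)+)\]$")


def bout_to_us(reading: str) -> int:
    """Convert a %bout reading such as 'ms/347.461' to whole microseconds.

    The dots only separate three-digit groups, so removing them yields
    microseconds regardless of the unit prefix (s, ms or µs).
    """
    _unit, _, value = reading.partition("/")
    return int(value.replace(".", ""))


def parse_grid(text: str) -> dict:
    """Map (kind, door[, arm]) -> elapsed microseconds for every labelled cell."""
    timings, label = {}, None
    for line in text.splitlines():
        line = line.strip()
        match = LABEL.match(line)
        if match:
            rest = tuple(tok.lstrip("%") for tok in match.group(2).split())
            label = (match.group(1),) + rest
        elif line.startswith("took ") and label is not None:
            timings[label] = bout_to_us(line[len("took "):])
            label = None
    return timings


sample = """[%cell %rpb %base]
took ms/201.210
[%cell %rpb %sqt]
took ms/343.556
[%fdp %rpb]
took ms/8.910"""
print(parse_grid(sample))
# {('cell', 'rpb', 'base'): 201210, ('cell', 'rpb', 'sqt'): 343556, ('fdp', 'rpb'): 8910}
```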

## Headline numbers (jetted vs interpreted)

| class | jetted (per call) | interpreted (per call) | speedup |
|---|---|---|---|
| arithmetic (`add`/`sub`/`mul`/`div`/`fma`) | ~1.5–2 µs | ~270–530 µs | **~160–295×** |
| `sqrt` posit8 | 1.4 µs | 338 µs | 238× |
| transcendentals (`exp`/`log`/`sin`/`cos`/`pow`) | ~12–60 µs | ~18–57 ms | **~760–2000×** |
| `fdp` (fused dot product, per element) | 0.09–0.27 µs | ~205 µs | **760–2300×** |
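The speedup column is the interpreted per-call cost divided by the jetted per-call cost. Below is a minimal sketch of that arithmetic, with illustrative helper names. It reproduces the posit8 `sqrt` row and the posit8 `fdp` figure from the raw microsecond totals in the two grid files. `fdp-cell` has no `%base` counterpart, so its per-element cost is simply `took / n`.

```python
# Illustrative helpers: per-call cost and speedup from raw %bout totals (in µs).


def per_call_us(arm_us: int, base_us: int, n: int) -> float:
    """(arm - base) / n: subtract the loop-only %base cell, divide by n."""
    return (arm_us - base_us) / n


def per_element_us(took_us: int, n: int) -> float:
    """fdp-cell times only the dot product, so there is no baseline."""
    return took_us / n


def speedup(interp_us: float, jet_us: float) -> float:
    return interp_us / jet_us


# posit8 `sqt`: interpreted at n=100, jetted at n=100,000.
sqt_interp = per_call_us(34_048, 204, 100)
sqt_jet = per_call_us(343_556, 201_210, 100_000)
print(f"sqt:rpb {sqt_interp:.1f} us vs {sqt_jet:.2f} us "
      f"-> {speedup(sqt_interp, sqt_jet):.0f}x")
# sqt:rpb 338.4 us vs 1.42 us -> 238x

# posit8 `fdp`, per element.
fdp_interp = per_element_us(20_458, 100)
fdp_jet = per_element_us(8_910, 100_000)
print(f"fdp:rpb {fdp_interp:.1f} us vs {fdp_jet:.4f} us "
      f"-> {speedup(fdp_interp, fdp_jet):.0f}x")
# fdp:rpb 204.6 us vs 0.0891 us -> 2296x
```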

The fused dot product shows the largest win: the jet runs the whole quire
accumulation in C (no per-element noun allocation), so posit8 `fdp` costs ~0.09
µs/element, ~2300× faster than the interpreted Hoon. At posit16/32 the jetted
`fdp` is ~0.26–0.27 µs/element (~760–780×).
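The quire is the posit standard's exact fixed-point accumulator. Products are added without rounding, and the sum is rounded once at the end. The sketch below is a language-neutral illustration, not the SoftUnum jet. It uses Python's `fractions.Fraction` as a stand-in for the quire and `float` as a stand-in for the posit format. It therefore shows the accuracy side of fused accumulation, not its speed.

```python
from fractions import Fraction


def dot_rounded_each_step(a, b):
    """Naive dot product: every multiply and add rounds to the working format."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_fused(a, b):
    """Quire-style: accumulate exact products, round once at the end."""
    acc = Fraction(0)
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)
    return float(acc)


a = [1e16, 1.0, -1e16]
b = [1.0, 1.0, 1.0]
print(dot_rounded_each_step(a, b), dot_fused(a, b))  # 0.0 1.0
```

The naive loop loses the `1.0` when it is added to `1e16`, because the spacing between adjacent floats at that magnitude is 2. The fused version keeps it until the single final rounding.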

## Caveat: posit16/32 `sqrt`/`atan` are slow even jetted

`sqt` and `atan` slow down sharply at posit16/32 because SoftUnum's wide path uses
a 512-bit fixed `wide_t` with a bit-by-bit integer sqrt. Jetted `sqt` goes from
1.4 µs (posit8) to ~9.4 µs (posit16) and ~17 µs (posit32). Jetted `atan` goes
from ~45 µs (posit8) to ~460 µs (`atan:rph`) and ~850 µs (`atan:rps`). Even
`atan:rps` is still ~75× faster than the interpreted Hoon, but it is far off
posit8. The 512-bit `isqt`/AGM is the bottleneck, and a faster wide sqrt is a
candidate fix if posit32 transcendental throughput matters.
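The textbook restoring (digit-by-digit) binary integer square root gives some intuition for this caveat. It is an assumption that SoftUnum's `isqt` follows this exact scheme. The point it illustrates is that a bit-by-bit sqrt over a fixed-width type does one iteration per result bit. A 512-bit operand therefore always costs 256 iterations, however small its value. `isqrt_bitwise` is a hypothetical name.

```python
def isqrt_bitwise(n: int, width: int = 512) -> tuple[int, int]:
    """Restoring binary square root of a `width`-bit unsigned integer.

    Returns (floor(sqrt(n)), iterations). The loop runs once per pair of
    input bits (width // 2 times), independent of the value of n.
    """
    assert 0 <= n < (1 << width) and width % 2 == 0
    root, rem, iterations = 0, 0, 0
    for shift in range(width - 2, -2, -2):
        # Bring down the next two bits of n into the partial remainder.
        rem = (rem << 2) | ((n >> shift) & 3)
        # Try appending a 1 bit to the root: (2r + 1)^2 - (2r)^2 = 4r + 1.
        trial = (root << 2) | 1
        root <<= 1
        if rem >= trial:
            rem -= trial
            root |= 1
        iterations += 1
    return root, iterations


print(isqrt_bitwise(200, width=8))    # (14, 4)
print(isqrt_bitwise(200, width=512))  # (14, 256)
```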

## Files

- `table.txt` — the full per-call table.
- `jetted-n100000.txt`, `interp-n100.txt` — raw scraped `%bout` grids.
- harness: `benchmark/desk/lib/unum-cells.hoon`, `benchmark/desk/gen/bench-unum{,-grid}.hoon`,
`benchmark/tools/bench_unum_report.py`.
96 changes: 96 additions & 0 deletions benchmark/results/2026-06-28/unum/interp-n100.txt
@@ -0,0 +1,96 @@
[%cell %rpb %base]
took µs/204
[%cell %rpb %add]
took ms/37.571
[%cell %rpb %sub]
took ms/33.376
[%cell %rpb %mul]
took ms/27.582
[%cell %rpb %div]
took ms/33.893
[%cell %rpb %fma]
took ms/52.949
[%cell %rpb %sqt]
took ms/34.048
[%cell %rpb %neg]
took ms/2.191
[%cell %rpb %exp]
took s/1.848.485
[%cell %rpb %log]
took s/3.599.430
[%cell %rpb %sin]
took s/3.023.956
[%cell %rpb %cos]
took s/3.026.054
[%cell %rpb %atan]
took s/6.166.146
[%cell %rpb %pow]
took s/5.304.743
[%cell %rpb %lth]
took ms/2.309
[%cell %rph %base]
took µs/189
[%cell %rph %add]
took ms/38.078
[%cell %rph %sub]
took ms/33.866
[%cell %rph %mul]
took ms/26.992
[%cell %rph %div]
took ms/34.390
[%cell %rph %fma]
took ms/49.550
[%cell %rph %sqt]
took ms/33.723
[%cell %rph %neg]
took ms/2.117
[%cell %rph %exp]
took s/1.916.618
[%cell %rph %log]
took s/3.740.115
[%cell %rph %sin]
took s/3.081.869
[%cell %rph %cos]
took s/3.109.865
[%cell %rph %atan]
took s/6.351.491
[%cell %rph %pow]
took s/5.535.635
[%cell %rph %lth]
took ms/2.283
[%cell %rps %base]
took µs/202
[%cell %rps %add]
took ms/37.676
[%cell %rps %sub]
took ms/34.177
[%cell %rps %mul]
took ms/27.852
[%cell %rps %div]
took ms/34.635
[%cell %rps %fma]
took ms/49.469
[%cell %rps %sqt]
took ms/34.359
[%cell %rps %neg]
took ms/2.170
[%cell %rps %exp]
took s/1.942.873
[%cell %rps %log]
took s/3.906.881
[%cell %rps %sin]
took s/3.229.494
[%cell %rps %cos]
took s/3.248.253
[%cell %rps %atan]
took s/6.371.033
[%cell %rps %pow]
took s/5.696.497
[%cell %rps %lth]
took ms/2.282
[%fdp %rpb]
took ms/20.458
[%fdp %rph]
took ms/20.457
[%fdp %rps]
took ms/20.549
96 changes: 96 additions & 0 deletions benchmark/results/2026-06-28/unum/jetted-n100000.txt
@@ -0,0 +1,96 @@
[%cell %rpb %base]
took ms/201.210
[%cell %rpb %add]
took ms/347.461
[%cell %rpb %sub]
took ms/350.724
[%cell %rpb %mul]
took ms/353.591
[%cell %rpb %div]
took ms/366.860
[%cell %rpb %fma]
took ms/379.836
[%cell %rpb %sqt]
took ms/343.556
[%cell %rpb %neg]
took ms/269.387
[%cell %rpb %exp]
took s/1.361.196
[%cell %rpb %log]
took s/1.992.308
[%cell %rpb %sin]
took s/1.734.641
[%cell %rpb %cos]
took s/1.725.140
[%cell %rpb %atan]
took s/4.711.162
[%cell %rpb %pow]
took s/2.921.758
[%cell %rpb %lth]
took ms/341.119
[%cell %rph %base]
took ms/201.493
[%cell %rph %add]
took ms/378.660
[%cell %rph %sub]
took ms/380.545
[%cell %rph %mul]
took ms/368.763
[%cell %rph %div]
took ms/360.478
[%cell %rph %fma]
took ms/398.496
[%cell %rph %sqt]
took s/1.140.286
[%cell %rph %neg]
took ms/275.133
[%cell %rph %exp]
took s/2.730.851
[%cell %rph %log]
took s/4.091.147
[%cell %rph %sin]
took s/3.613.529
[%cell %rph %cos]
took s/3.703.221
[%cell %rph %atan]
took s/46.093.061
[%cell %rph %pow]
took s/6.134.767
[%cell %rph %lth]
took ms/341.385
[%cell %rps %base]
took ms/213.612
[%cell %rps %add]
took ms/382.517
[%cell %rps %sub]
took ms/389.338
[%cell %rps %mul]
took ms/375.795
[%cell %rps %div]
took ms/357.904
[%cell %rps %fma]
took ms/403.311
[%cell %rps %sqt]
took s/1.944.009
[%cell %rps %neg]
took ms/287.158
[%cell %rps %exp]
took s/2.745.098
[%cell %rps %log]
took s/4.200.254
[%cell %rps %sin]
took s/3.688.782
[%cell %rps %cos]
took s/3.781.378
[%cell %rps %atan]
took s/85.240.487
[%cell %rps %pow]
took s/6.279.328
[%cell %rps %lth]
took ms/341.757
[%fdp %rpb]
took ms/8.910
[%fdp %rph]
took ms/26.396
[%fdp %rps]
took ms/26.986