-This file documents the coremark bench results to document performance improvements over time.
+This file documents the coremark bench results to keep track of performance improvements over time.
 - ac6b813: avg ≈ 500 (introduced benchmarking without criterion)
 - 5f4af0c: avg = 529.773949, n = 20 (improved leb128 handling with unsafe)
 - we try to avoid unsafe code in module/validator/instance directly
@@ -7,14 +7,27 @@ This file documents the coremark bench results to document performance improveme
 - current design: two-level indirection (one array covering all possible code indices and directing them to a densely packed side table)
 - we also try not to use nightly features; making error creation a cold path is a really elegant solution in this regard
 - repr(C) for the SideTableEntry struct caused mysterious improvements, not sure if it is a fluke
--current: avg = 855.71814, n = 20 (remove defensive malformed check in main loop)
+- cc02503: avg = 855.71814, n = 20 (remove defensive malformed check in main loop)
 - since the module has already been validated by the time it runs, there is no reason for the check to exist; it was a remnant of an early development phase that lacked proper handling for some malformed modules
+- current: no significant difference
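
The two-level indirection and the cold error path mentioned in the notes above could look roughly like the sketch below. This is not the project's code: the fields of `SideTableEntry`, the `MissingEntry` error type, the `NO_ENTRY` sentinel, and the method names are all assumptions made for illustration. Only the `#[cold]` and `#[repr(C)]` attributes are standard, stable Rust.

```rust
/// Hypothetical side-table entry; the field names are assumptions.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SideTableEntry {
    pub target_pc: u32,
    pub stack_height: u32,
}

/// Hypothetical error type for a failed lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MissingEntry {
    pub pc: usize,
}

/// Sentinel marking code indices that have no side-table entry.
const NO_ENTRY: u32 = u32::MAX;

pub struct SideTable {
    /// First level: one slot per code index (sparse, mostly NO_ENTRY).
    index: Vec<u32>,
    /// Second level: densely packed entries.
    entries: Vec<SideTableEntry>,
}

/// Error construction is kept out of line and marked cold, which hints
/// to the optimizer that the failing branch is unlikely to be taken.
#[cold]
#[inline(never)]
fn missing_entry(pc: usize) -> MissingEntry {
    MissingEntry { pc }
}

impl SideTable {
    pub fn new(code_len: usize) -> Self {
        SideTable {
            index: vec![NO_ENTRY; code_len],
            entries: Vec::new(),
        }
    }

    /// Panics if `pc` is outside the code range given to `new`.
    pub fn insert(&mut self, pc: usize, entry: SideTableEntry) {
        self.index[pc] = self.entries.len() as u32;
        self.entries.push(entry);
    }

    /// Two loads on the hot path: the sparse index, then the dense entry.
    pub fn get(&self, pc: usize) -> Result<SideTableEntry, MissingEntry> {
        match self.index.get(pc) {
            Some(&slot) if slot != NO_ENTRY => Ok(self.entries[slot as usize]),
            _ => Err(missing_entry(pc)),
        }
    }
}
```

The sparse first level costs one `u32` per code index, while the entries themselves stay contiguous, which is one plausible reason a layout like this is cache-friendly.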
 
+On nightly, the performance is slightly better (sometimes reaching 900)
+
+Next step: use direct threading to improve branch prediction
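
As a rough illustration of the threaded-dispatch idea, the sketch below pre-translates opcodes into handler function pointers so that the main loop performs an indirect call per instruction instead of decoding through a `match`. Strictly speaking this is closer to call threading: true direct threading needs computed goto or guaranteed tail calls, which stable Rust does not offer. The toy instruction set (`Op`), the `Vm` struct, and every function name here are hypothetical and unrelated to the project's actual interpreter.

```rust
/// Hypothetical toy VM state.
pub struct Vm {
    pub stack: Vec<i64>,
    pub pc: usize,
    pub halted: bool,
}

/// Every handler has the same signature so it can live in one table.
type Handler = fn(&mut Vm, i64);

/// Pre-decoded instruction: the handler address replaces the opcode.
#[derive(Clone, Copy)]
pub struct Inst {
    handler: Handler,
    operand: i64,
}

/// Hypothetical source-level opcodes.
#[derive(Clone, Copy)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

fn op_push(vm: &mut Vm, operand: i64) {
    vm.stack.push(operand);
    vm.pc += 1;
}

fn op_add(vm: &mut Vm, _operand: i64) {
    let b = vm.stack.pop().expect("stack underflow");
    let a = vm.stack.pop().expect("stack underflow");
    vm.stack.push(a.wrapping_add(b));
    vm.pc += 1;
}

fn op_mul(vm: &mut Vm, _operand: i64) {
    let b = vm.stack.pop().expect("stack underflow");
    let a = vm.stack.pop().expect("stack underflow");
    vm.stack.push(a.wrapping_mul(b));
    vm.pc += 1;
}

fn op_halt(vm: &mut Vm, _operand: i64) {
    vm.halted = true;
}

/// Decode once, ahead of execution: the `match` runs here, not in the loop.
pub fn compile(ops: &[Op]) -> Vec<Inst> {
    ops.iter()
        .map(|op| match *op {
            Op::Push(v) => Inst { handler: op_push, operand: v },
            Op::Add => Inst { handler: op_add, operand: 0 },
            Op::Mul => Inst { handler: op_mul, operand: 0 },
            Op::Halt => Inst { handler: op_halt, operand: 0 },
        })
        .collect()
}

/// Dispatch loop: fetch the pre-decoded instruction and call its handler.
pub fn run(code: &[Inst]) -> Option<i64> {
    let mut vm = Vm { stack: Vec::new(), pc: 0, halted: false };
    while !vm.halted && vm.pc < code.len() {
        let inst = code[vm.pc];
        (inst.handler)(&mut vm, inst.operand);
    }
    vm.stack.last().copied()
}
```

For example, `run(&compile(&[Op::Push(2), Op::Push(3), Op::Add, Op::Halt]))` yields `Some(5)`. Whether this form actually helps branch prediction over a well-compiled `match` would have to be measured with the benchmark.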
 
 Hardware Overview:
 - Model Name: MacBook Pro
 - Model Identifier: Mac16,8
 - Model Number: MX2H3LL/A
 - Chip: Apple M4 Pro
 - Total Number of Cores: 12 (8 performance and 4 efficiency)
-- Memory: 24 GB
+- Memory: 24 GB
+
+Performance of other Rust-based interpreters:
+wasmi: ~1700
+tinywasm: ~630
+
+Goal:
+We hope to reach ~1200 once threaded dispatch is implemented. Ben Titzer appears to have reached performance comparable to production-ready, optimizing interpreters only by hand-writing assembly code for hot paths.
+
+Higher performance may not be pursued after that point; instead, I might focus on adding more instructions to achieve Wasm 2.0 spec parity (should be easy with AI).