`web/wasm_bench.mjs` is an iteration-loop benchmark driver for
the WebAssembly build of the OpenHTJ2K decoder. It pays the WASM
startup cost once, runs N decodes of the same file in a loop, and
reports min/median/p95/mean wall-clock times together with Msamples/s
and fps. It is the right tool for measuring steady-state decode
throughput and for byte-exact regression testing during WASM-side perf
work.

For one-shot decode to a file, use `web/open_htj2k_dec.mjs`
(documented under the WebAssembly section of
building.md). The two scripts share the same WASM
loader but serve different purposes: `open_htj2k_dec.mjs` writes a
single PPM/PGM/PGX output, while `wasm_bench.mjs` times repeated
decodes and optionally dumps planar buffers for comparison.
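As a rough illustration of the reported figures, the sketch below summarises a list of per-iteration times and derives throughput from the mean. It is not taken from `wasm_bench.mjs`: the function names are hypothetical, the percentiles use the nearest-rank method (the driver may interpolate), and a "sample" is assumed to be one component value counted as W × H × C, ignoring chroma subsampling.

```javascript
// Hypothetical helpers; not the driver's actual implementation.

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedMs, p) {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  const clamped = Math.min(sortedMs.length, Math.max(1, rank));
  return sortedMs[clamped - 1];
}

// Summarise per-iteration wall-clock times given in milliseconds.
function summarize(timesMs) {
  if (timesMs.length === 0) throw new Error("need at least one sample");
  const sorted = [...timesMs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  return {
    min: sorted[0],
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    max: sorted[sorted.length - 1],
    mean,
  };
}

// Assumption: samples per frame = W * H * C (no subsampling correction).
function throughput(dims, meanMs) {
  const samples = dims.W * dims.H * dims.C;
  return {
    msamplesPerSec: samples / 1e6 / (meanMs / 1000),
    fps: 1000 / meanMs,
  };
}
```

With more warmup iterations the minimum and median usually converge, so a large gap between `p50` and `p95` is a hint that the host was busy during the run.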
- Node.js ≥ 18 to run the bench driver (uses modern ESM, top-level
  await, and the `node:test` shape consistent with the rest of the
  WASM tooling in this repo).
- Emscripten to build the WASM artefacts the driver loads; see
  building.md → Building for WebAssembly.

Build at least one WASM variant first. The bench driver looks in
`${WEB_DIR}/../build_wasm_prof/html/` by default; override with
`--build-dir` to point elsewhere.
```bash
emcmake cmake -S web -B build_wasm_prof -DCMAKE_BUILD_TYPE=Release
cmake --build build_wasm_prof \
  --target libopen_htj2k_simd libopen_htj2k_mt_simd \
  -j$(nproc)
```

Add `-DOPENHTJ2K_WASM_PROFILE=ON` at configure time if you also intend
to collect CPU profiles; see profiling below. The
`--profiling-funcs` linker flag preserves Wasm function names and adds
~15% to `.wasm` size with zero measured runtime cost.
```bash
node web/wasm_bench.mjs -i <codestream> [options...]
```

| Option | Default | Description |
|---|---|---|
| `-i`, `--input <file>` | (required) | Input codestream (`.j2c`, `.j2k`, or `.jph`). |
| `--variant scalar\|simd\|mt\|mt_simd` | `simd` | WASM build to load. `mt*` variants require `--threads` > 0. |
| `--threads N` | `1` | Number of decode threads. Ignored by `scalar` / `simd`. `0` = auto (uses `navigator.hardwareConcurrency`). |
| `--iters N` | `20` | Number of measured iterations. |
| `--warmup N` | `3` | Number of unmeasured iterations before measurement. |
| `--mode stream\|planar_u8` | `stream` | Which decoder entry point to call. `stream` uses `invoke_decoder_stream` (PPM/PGM path); `planar_u8` uses `invoke_decoder_planar_u8` (WASM RTP demo path). |
| `--reduce N` | `0` | Resolution reduction level (`0` = full resolution). |
| `--build-dir <path>` | `../build_wasm_prof/html` | Override WASM binary directory. |
| `--dump-planes <prefix>` | off | `planar_u8` only: on the final iteration, write each component plane as `<prefix>_{Y,Cb,Cr}.pgm`. Used for byte-exact diff checks. |
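
The way these options interact can be sketched as a small normaliser. This is a hypothetical illustration of the rules in the table, not code from `wasm_bench.mjs`: the names `resolveOptions`, `DEFAULTS`, and `hardwareThreads` are invented here, and both the choice to reset `threads` to 1 for non-threaded variants and the choice to reject `--dump-planes` outside `planar_u8` are assumptions about how a driver might reasonably behave.

```javascript
// Hypothetical option normaliser mirroring the option table.
const DEFAULTS = {
  variant: "simd", threads: 1, iters: 20, warmup: 3,
  mode: "stream", reduce: 0, dumpPlanes: null,
};
const VARIANTS = ["scalar", "simd", "mt", "mt_simd"];
const MODES = ["stream", "planar_u8"];

// `hardwareThreads` stands in for navigator.hardwareConcurrency.
function resolveOptions(user, hardwareThreads = 1) {
  const opts = { ...DEFAULTS, ...user };
  if (!opts.input) throw new Error("-i/--input is required");
  if (!VARIANTS.includes(opts.variant)) {
    throw new Error(`unknown variant: ${opts.variant}`);
  }
  if (!MODES.includes(opts.mode)) {
    throw new Error(`unknown mode: ${opts.mode}`);
  }

  if (!opts.variant.startsWith("mt")) {
    // Assumption: scalar / simd ignore the thread count, so pin it to 1.
    opts.threads = 1;
  } else {
    // 0 means "auto": use the host's reported concurrency.
    if (opts.threads === 0) opts.threads = hardwareThreads;
    if (!(opts.threads > 0)) {
      throw new Error("mt variants need --threads > 0");
    }
  }

  // Assumption: plane dumps only make sense for the planar entry point.
  if (opts.dumpPlanes && opts.mode !== "planar_u8") {
    throw new Error("--dump-planes is only valid with --mode planar_u8");
  }
  return opts;
}
```

For example, `resolveOptions({ input: "a.j2c", variant: "mt_simd", threads: 0 }, 8)` resolves to eight threads under these assumptions.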

The script prints a JSON object to stdout on completion:

```json
{
  "variant": "simd", "threads": 1,
  "input": "...", "dims": {"W": 3840, "H": 2160, "C": 3, "depth": 12},
  "iters": 20,
  "total_ms": { "min": ..., "p50": ..., "p95": ..., "max": ..., "mean": ... },
  "parse_ms": { ... },
  "decode_ms": { ... },
  "throughput_msamples_per_s_mean": ...,
  "fps_mean": ...
}
```

```bash
# Single-threaded SIMD throughput.
node web/wasm_bench.mjs \
  -i build-f32/bin/u05Q90.j2c \
  --variant simd --iters 20 --warmup 3
```

```bash
# Thread-scaling sweep with the multi-threaded SIMD build.
for t in 1 2 4 8; do
  echo "=== $t threads ==="
  node web/wasm_bench.mjs \
    -i build-f32/bin/u05Q90.j2c \
    --variant mt_simd --threads $t --iters 15 --warmup 3
done
```

`invoke_decoder_planar_u8` writes per-component u8 buffers at native
(per-component) resolution, which is the shape the browser RTP demo
(`web/rtp_demo.html`) uses to upload three R8 textures per
frame for GPU-side YCbCr→RGB.
```bash
# Time the planar_u8 entry point with two decode threads.
node web/wasm_bench.mjs \
  -i conformance_data/ATK_DFS_IRV.j2c \
  --variant mt_simd --threads 2 --iters 15 --warmup 3 \
  --mode planar_u8
```

`--mode planar_u8` has no conformance-test coverage, so when modifying
that path, record an explicit plane-level checksum:
```bash
# Capture baseline BEFORE your code change.
node web/wasm_bench.mjs \
  -i some_file.j2c --variant simd --iters 1 \
  --mode planar_u8 --dump-planes /tmp/before/myfile

# ...edit wrapper.cpp, rebuild WASM...

# Capture AFTER and diff.
node web/wasm_bench.mjs \
  -i some_file.j2c --variant simd --iters 1 \
  --mode planar_u8 --dump-planes /tmp/after/myfile
cmp /tmp/before/myfile_Y.pgm /tmp/after/myfile_Y.pgm
cmp /tmp/before/myfile_Cb.pgm /tmp/after/myfile_Cb.pgm
cmp /tmp/before/myfile_Cr.pgm /tmp/after/myfile_Cr.pgm
```

For good coverage of the wrapper paths, diff at least:

- 8-bit 4:4:4 (e.g. `conformance_data/p0_04.j2k`)
- 12-bit 4:4:4 (any 12 bpc YCbCr file)
- 12-bit 4:2:2 (e.g. `conformance_data/ATK_DFS_IRV.j2c`)
- Sub-8-bit grayscale (e.g. `conformance_data/ds0_ht_03_b14.j2k`)
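
When `cmp` reports a difference, it can help to know which pixel changed. The sketch below is one possible way to compare two plane dumps and report the first mismatching coordinate. It assumes the dumps are binary PGM (`P5`) files with one byte per sample and no `#` comment lines in the header; `parsePgm` and `comparePlanes` are hypothetical names, not part of the repository.

```javascript
// Hypothetical helpers for diffing two planar_u8 plane dumps.

// Parse a binary PGM (P5) header and return the raster bytes.
// Assumption: no '#' comment lines, maxval <= 255 (one byte per sample).
function parsePgm(buf) {
  const isSpace = (b) => b === 0x20 || (b >= 0x09 && b <= 0x0d);
  const tokens = [];
  let pos = 0;
  while (tokens.length < 4) {
    while (pos < buf.length && isSpace(buf[pos])) pos++;
    const start = pos;
    while (pos < buf.length && !isSpace(buf[pos])) pos++;
    if (pos === start) throw new Error("truncated PGM header");
    tokens.push(String.fromCharCode(...buf.subarray(start, pos)));
  }
  const [magic, width, height, maxval] = tokens;
  if (magic !== "P5") throw new Error(`not a binary PGM: ${magic}`);
  // Exactly one whitespace byte separates maxval from the raster.
  return {
    width: Number(width),
    height: Number(height),
    maxval: Number(maxval),
    pixels: buf.subarray(pos + 1),
  };
}

// Compare two dumps; on mismatch, report the first differing pixel.
function comparePlanes(beforeBuf, afterBuf) {
  const a = parsePgm(beforeBuf);
  const b = parsePgm(afterBuf);
  if (a.width !== b.width || a.height !== b.height || a.maxval !== b.maxval) {
    return { equal: false, reason: "header mismatch" };
  }
  if (a.pixels.length !== b.pixels.length) {
    return { equal: false, reason: "raster length mismatch" };
  }
  for (let i = 0; i < a.pixels.length; i++) {
    if (a.pixels[i] !== b.pixels[i]) {
      return {
        equal: false,
        reason: "pixel mismatch",
        x: i % a.width,
        y: Math.floor(i / a.width),
        before: a.pixels[i],
        after: b.pixels[i],
      };
    }
  }
  return { equal: true };
}
```

To use it, read both files with `fs.readFileSync` and pass the resulting buffers to `comparePlanes`, once per component plane.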
```bash
# Produces bench.cpuprofile in the current directory.
# Load it via Chrome DevTools → Performance → "Load profile".
node --cpu-prof --cpu-prof-name=bench.cpuprofile --cpu-prof-interval=100 \
  web/wasm_bench.mjs \
  -i build-f32/bin/u05Q90.j2c --variant simd --iters 30 --warmup 5
```

`node --cpu-prof` silently produces no output file for the `mt` /
`mt_simd` variants: Emscripten's pthread teardown bypasses Node's
CPU-profile finalizer. Use Linux `perf` instead, with Node's
`--perf-basic-prof-only-functions` so Wasm JIT code is mapped to
readable names:
```bash
perf record -F 499 -g -o bench.perf.data --call-graph dwarf -- \
  node --perf-basic-prof-only-functions \
  web/wasm_bench.mjs \
  -i build-f32/bin/u05Q90.j2c \
  --variant mt_simd --threads 2 --iters 20 --warmup 3
perf report -i bench.perf.data --stdio --no-children -g none | head -30
```

On most Linux hosts this needs `kernel.perf_event_paranoid` at 2 or
lower (`sudo sysctl kernel.perf_event_paranoid=2`).
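
For the single-threaded case, the `.cpuprofile` written by `node --cpu-prof` is plain JSON, so a quick self-time ranking can be produced without opening DevTools. The sketch below is a rough approximation and not part of the repository: `selfTimeByFunction` is a hypothetical name, and it charges each sample the time delta recorded at the same index, which is close enough for spotting hot functions but not an exact attribution.

```javascript
// Hypothetical helper: rank functions by approximate self time.
// Expects the parsed profile object with `nodes`, `samples`, `timeDeltas`
// (time deltas are in microseconds).
function selfTimeByFunction(profile) {
  const nameById = new Map();
  for (const node of profile.nodes) {
    nameById.set(node.id, node.callFrame.functionName || "(anonymous)");
  }

  const totalsUs = new Map();
  profile.samples.forEach((id, i) => {
    const name = nameById.get(id) ?? "(unknown)";
    const deltaUs = profile.timeDeltas[i] ?? 0;
    totalsUs.set(name, (totalsUs.get(name) ?? 0) + deltaUs);
  });

  // Largest self time first, converted to milliseconds.
  return [...totalsUs.entries()]
    .map(([name, us]) => ({ name, ms: us / 1000 }))
    .sort((a, b) => b.ms - a.ms);
}
```

A typical call would be `selfTimeByFunction(JSON.parse(fs.readFileSync("bench.cpuprofile", "utf8"))).slice(0, 10)` to list the ten hottest functions.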