[OTBN] Integrate compression option for imem and dmem by siemen11 · Pull Request #30163 · lowRISC/opentitan

siemen11 · 2026-05-20T20:36:12Z

Use LZ4 to add a compression option in otbn_binary (activated by setting
compress = True). If this is set, the imem and dmem of the otbn
application are compressed before they are stored.
By default, binaries are not compressed.

Change the otbn.c and otbn.h in the cryptolib to use LZ4 from lib/base
in order to decompress the loaded imem and dmem from an application.

The compression option is therefore mandatory in the cryptolib
(lib/crypto), however, the applications from silicon_creator (using
silicon_creator/lib/drivers/otbn.c/h) still use the uncompressed version
(and will not work with the compressed version).

Concerning the CRC: The CRC for the imem/dmem is calculated before
compression. For loading to OTBN, the OTBN is given the CRC of the
uncompressed payload before and checks itself the CRC of what it
receives. I.e., the CRC still provides end-2-end protection.

johannheyszl · 2026-05-21T07:23:28Z

interesting idea.

kindly, what is the code size overhead of the addition. I'd assume significantly smaller than the saving.

siemen11 · 2026-05-21T07:42:54Z

interesting idea.

kindly, what is the code size overhead of the addition. I'd assume significantly smaller than the saving.

We require an additional ~400 bytes for the decompression algorithm but save ~5400 bytes

siemen11 · 2026-05-21T09:58:49Z

Some quick overview for the choice of compression algorithms provided by Gemini:

Algorithm	RAM (KB)	Flash (KB)	Speed	Ratio %
LZ4	0.00	0.5	100	50%
Heatshrink	0.05	0.8	30	55%
FastLZ	0.20	1.0	90	52%
DEFLATE	4.00	3.0	40	65%
Zstandard	24.00	15.0	80	70%

Currently, LZ4 was chosen since it saves already quite a bit and it would be the least invasive in speed or RAM usage

johannheyszl · 2026-05-21T14:20:27Z

Thanks @siemen11 - have to say that I don't see any immediate issues with this approach, and it clearly helps save code size.

Any quick thoughts on this @cfrantz @moidx ?

nasahlpa · 2026-05-22T05:53:25Z

-attrs==25.3.0 \
-    --hash=sha256:427318ce031701fea540783410126f03899a97ffc6f61596ad581ac2e40e3bc3 \
-    --hash=sha256:75d7cefc7fb576747b2c81b4442d4d4a1ce0900973527c011d1030fd3bf4af1b
+argcomplete==3.6.3 \


Why is the update needed?

Unsure... hence why I left it as a completely separate commit, but using the exact right version of python (3.9) and uv, I got differences when remaking python-requirements :s

nasahlpa · 2026-05-22T05:59:21Z

 */
 status_t read_otbn_load_checksum(uint32_t *checksum) {
-  TRY(dif_otbn_get_load_checksum(&otbn, checksum));
+  *checksum = abs_mmio_read32(TOP_EARLGREY_OTBN_BASE_ADDR +


Can you please provide more details why we shouldn't use DIF functions in the pentest framework?

Actually you were completely correct, this was not needed... I removed the commit entirely. Should be cleaner now, thanks 🙏

nasahlpa · 2026-05-22T06:10:17Z

+    uint32_t uncomp_len = (uint32_t)(src[2] | (src[3] << 8));
+    src += 4;
+
+    if (LZ4_decompress((const char *)src, (char *)local_chunk_buf,


How much runtime overhead adds the decompression and do we actually care about the runtime penalty? I assume we anyways only load the cryptolib once?

I think the runtime penatly applies upon every load of an OTBN binary.
Would be good to understand time overhead - agree. Suppose significantly lower than asymm crypto exec.

So running ecdsa_p256_functest_fpga_cw340_sival_rom_ext for example with this commit gives:

Successfully finished test sign_kat in 1693294 cycles or 70553 us @ 24 MHz.

This is some fluff, a sha2, and a otcrypto_ecdsa_p256_sign_config_k

Before we had

Successfully finished test sign_kat in 1598898 cycles or 66620 us @ 24 MHz.

So looks like 1 ms of time, i.e., 94396 cycles or around 6% for p256.

For ed25519 for example I see

Successfully finished test sign_verify_test in 1600837 cycles or 66701 us @ 24 MHz.

Versus before

Successfully finished test sign_verify_test in 1419775 cycles or 59157 us @ 24 MHz.

which is a 181062 cycles or 1.8 ms or a 12.7% overhead, this incudes both a sign and a verify.

And then for p384

Successfully finished test sign_kat in 3558983 cycles or 148290 us @ 24 MHz.

versus before

Successfully finished test sign_kat in 3438653 cycles or 143277 us @ 24 MHz.

So that is 120330 cycles or 1.2ms or 3.5% overhead in latency.

We always clean and load the OTBN binary, we never re-use the binary between executions. Hence the addition of ~1ms is on every operation.

TL;DR: We have a rough 25% in flash savings and a 10% in latency overhead for LZ4

Did some more experiments and optimizations. We have around a 20-24% in flash savings of OTBN binaries.
The latency increase is 4-5% for p256/p384 (lower for RSA), but 14% for 25519. This is because 25519 is simply just fast...
Hence, we have 20% flash improvement and 4% latency increase

nasahlpa · 2026-05-22T06:11:49Z

        "//sw/device/lib/base:status",
        "//sw/device/lib/dif:otbn",
-        "//sw/otbn/crypto:run_rsa",
+        "//sw/otbn/crypto:test_run_rsa",


Why this change?

We now have two "types" of OTBN binaries: compressed and non-compressed.
The cryptolib will move to using compresssed binaries (run_rsa). But the otbn.c from dif, testutils, or silicon_creator will load applications as if uncompressed whereas otbn drivers loads them compressed (so decompresses first). Hence, we need to distinguish which one we are using

I guess it's possible to let otbn_build emit both compressed and non-compressed blob into the same archive (either put into different sub-sections, or put multiple objects to a archive), and relies on linker gc to keep the one used.

It was a bit scary to do because if we mess something up you have both binaries in there, but no, works pretty well and removes the ugly Bazel targets, thank you!

nasahlpa · 2026-05-22T06:18:24Z

  "libclang==16.0.0",
  "libcst == 1.1.0", # 1.2.0+ needs Python 3.9+
  "lxml ~= 5.0",
+  "lz4 == 4.4.5",


@luismarques do we have a policy regarding including new python packages? Do you know about license?

https://pypi.org/project/lz4/ shows BSD 3-Clause License

anchivuk · 2026-05-22T11:40:52Z

@siemen11 , you can also consider do this for the KATS (as discussed offline), provided the entire decompression algorithm is included inside the cryptolib

siemen11 · 2026-05-22T11:50:57Z

@siemen11 , you can also consider do this for the KATS (as discussed offline), provided the entire decompression algorithm is included inside the cryptolib

Indeed super nice idea! Though sadly trying it out since the vectors are full entropy, LZ4 seems to increase the size 😆 was worth the try!

johannheyszl

I am in favour due to the code size savings and not seeing downsides - thanks @siemen11 !

sasdf · 2026-06-16T14:53:50Z

+    // Write to IMEM. Always starts at zero on the OTBN side.
+    uint32_t words = uncomp_len / (uint32_t)sizeof(uint32_t);
+    for (uint32_t i = 0; launder32(i) < words; i++) {
+      abs_mmio_write32(mmio_addr + i * sizeof(uint32_t), local_chunk_buf[i]);


missing boundary check? (depends on whether the uncompressed size metadata is considered trusted or not)

Added them, best to be on the safe side, thank you!

sasdf · 2026-06-16T14:57:55Z

+    if (offset >= match_len) {
+      // Safe from overlap
+      if (match_len >= FAST_PATH_THRESH) {
+        memcpy(dst_ptr, match_ptr, match_len);


note that memcpy (not memmove) with overlapped dst/src is UB in C. How much speedup does these fast paths provide?

0.3% :)
So I simplified it back again, I got carried away optimizing it

sasdf · 2026-06-17T14:13:41Z

+
+  while (src < app.imem_compressed_end) {
+    // Parse the 4-byte chunk header as unsigned 32-bit integers
+    uint32_t comp_len = (uint32_t)(src[0] | (src[1] << 8));


Not sure if src is considered trusted, we doesn't check if src has enough bytes for this 4 byte header.

Similar situation for reading comp_len bytes from src below, and also when decompressing dmem.

Maybe we can extract the decompress routine as a common helper to share between imem/dmem loader?

Added a check on the header size, but in general I made the logic closer to what we had before and implemented your suggestion of a decompress routine

Add lz4 to the python dependencies. Signed-off-by: Siemen Dhooghe <sdhooghe@google.com>

Add the lz4 decompression algorithm. This algorithm costs around 400 bytes. Signed-off-by: Siemen Dhooghe <sdhooghe@google.com>

Use LZ4 to create a compressed OTBN binary (both imem and dmem). The OTBN build script creates both the uncompressed and compressed binary relying on garbage collection to filter out the unused binary. Change the otbn.c and otbn.h in the cryptolib to use LZ4 from lib/base in order to decompress the loaded imem and dmem from an application. The cryptolib (lib/crypto) uses the compressed binaries, however, the applications from silicon_creator (using silicon_creator/lib/drivers/otbn.c/h) still use the uncompressed version. This option saves the cryptolib ~5400bytes while increasing ~400bytes to load LZ4.c while providing backwards compatability for the boot. Concerning the CRC: The CRC for the imem/dmem is calculated before compression. For loading to OTBN, the OTBN is given the CRC of the uncompressed payload before and checks itself the CRC of what it receives. I.e., the CRC still provides end-2-end protection. Signed-off-by: Siemen Dhooghe <sdhooghe@google.com>

nasahlpa

Thanks Siemen. I had a look into the code - LGMT!

siemen11 requested a review from johannheyszl May 20, 2026 20:36

siemen11 force-pushed the otbn_code_saving branch from cd68be9 to 506e010 Compare May 21, 2026 09:37

siemen11 changed the title ~~[test] OTBN flash size saving test~~ [OTBN] Integrate compression option for imem and dmem May 21, 2026

siemen11 force-pushed the otbn_code_saving branch from 506e010 to 0f5b321 Compare May 21, 2026 09:54

siemen11 force-pushed the otbn_code_saving branch 8 times, most recently from 673988c to 7a55643 Compare May 21, 2026 14:20

siemen11 marked this pull request as ready for review May 21, 2026 15:36

siemen11 requested review from a team and cfrantz as code owners May 21, 2026 15:36

siemen11 requested review from andrea-caforio, engdoreis and nasahlpa and removed request for a team, cfrantz and engdoreis May 21, 2026 15:36

nasahlpa reviewed May 22, 2026

View reviewed changes

siemen11 force-pushed the otbn_code_saving branch 2 times, most recently from fbcf2b4 to 5953d59 Compare May 22, 2026 08:48

siemen11 force-pushed the otbn_code_saving branch 4 times, most recently from a83f2ef to 9f0cfe5 Compare May 31, 2026 10:34

siemen11 added the CI:Rerun Rerun failed CI jobs label May 31, 2026

github-actions Bot removed the CI:Rerun Rerun failed CI jobs label May 31, 2026

siemen11 force-pushed the otbn_code_saving branch from 9f0cfe5 to 923590d Compare May 31, 2026 11:14

johannheyszl approved these changes Jun 11, 2026

View reviewed changes

johannheyszl requested review from cfrantz and moidx June 11, 2026 09:33

sasdf reviewed Jun 16, 2026

View reviewed changes

siemen11 force-pushed the otbn_code_saving branch from 923590d to 0a17be0 Compare June 16, 2026 16:15

sasdf approved these changes Jun 17, 2026

View reviewed changes

siemen11 force-pushed the otbn_code_saving branch from 0a17be0 to a21369a Compare June 17, 2026 19:42

siemen11 added 2 commits June 17, 2026 23:59

[python] Add lz4 dependency

024685d

Add lz4 to the python dependencies. Signed-off-by: Siemen Dhooghe <sdhooghe@google.com>

[base] Add lz4 decompression

53ecd50

Add the lz4 decompression algorithm. This algorithm costs around 400 bytes. Signed-off-by: Siemen Dhooghe <sdhooghe@google.com>

siemen11 force-pushed the otbn_code_saving branch from a21369a to cea92c4 Compare June 17, 2026 22:05

siemen11 force-pushed the otbn_code_saving branch from cea92c4 to 14210ed Compare June 19, 2026 08:46

nasahlpa approved these changes Jun 19, 2026

View reviewed changes

nasahlpa merged commit e735550 into lowRISC:earlgrey_1.0.0 Jun 19, 2026
34 of 35 checks passed

siemen11 mentioned this pull request Jun 19, 2026

Manual chery-pick: [OTBN] Integrate compression option for imem and dmem #30472

Merged

Uh oh!

Conversation

siemen11 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

johannheyszl commented May 21, 2026

Uh oh!

siemen11 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

siemen11 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

johannheyszl commented May 21, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

siemen11 May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anchivuk commented May 22, 2026

Uh oh!

siemen11 commented May 22, 2026

Uh oh!

johannheyszl left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sasdf Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nasahlpa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

siemen11 commented May 20, 2026 •

edited

Loading

siemen11 commented May 21, 2026 •

edited

Loading

siemen11 commented May 21, 2026 •

edited

Loading

siemen11 May 31, 2026 •

edited

Loading

sasdf Jun 17, 2026 •

edited

Loading