wolfCrypt on TI C2000 C28x (LAUNCHXL-F28P55X)#10724

Draft
dgarske wants to merge 6 commits into wolfSSL:master from dgarske:ti_c25

@dgarske dgarske commented Jun 18, 2026

Summary

Adds WOLFSSL_WIDE_BYTE support so wolfCrypt builds and runs correctly on word-addressed targets where CHAR_BIT != 8 - specifically the TI C2000 C28x DSP family, where a C char/unsigned char (wolfSSL's byte) is 16 bits and is the smallest addressable unit. All changes are gated and are a no-op on normal 8-bit-byte targets.

The work was validated end-to-end on a TI LAUNCHXL-F28P55X (TMS320F28P550SJ, C28x, 150 MHz) using the bare-metal example added in the companion wolfssl-examples PR. Every algorithm below passes known-answer tests on hardware, and the standard host wolfcrypt_test continues to pass (no 8-bit regression).

Validated algorithms (on C28x hardware)

  • SHA-224/256, SHA-384/512, SHA-512/224, SHA-512/256
  • SHA3-224/256/384/512, SHAKE128/256 (with a 32-bit split Keccak permutation for WC_16BIT_CPU that emits native instructions instead of compiler 64-bit helper calls - ~53% faster SHAKE/SHA3 on this target)
  • ML-DSA-87 (Dilithium) verify and full keygen/sign/verify; ML-KEM-768 (FIPS 203)
  • AES-128/192/256 CBC/CTR/CFB/GCM; AES-CMAC, AES-CCM, AES-GMAC
  • HMAC + HKDF; ChaCha20-Poly1305; Poly1305
  • X25519 + Ed25519; ECDSA + ECDH (SECP256R1, SP math)
  • RSA-2048 PKCS#1 v1.5 verify (SP math)
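
The "32-bit split" idea mentioned for the Keccak permutation can be illustrated with a single lane operation. This is only a minimal sketch, not the PR's code: the `lane64` type and the `lane_rotl`/`lane_xor` helpers are hypothetical names, and it assumes `uint32_t` is available (on C28x that would be a 32-bit `unsigned long`). It shows a 64-bit rotate and XOR expressed purely as 32-bit operations, so no compiler-provided 64-bit helper routine is needed.

```c
#include <stdint.h>

/* A 64-bit Keccak lane held as two 32-bit halves (hypothetical type). */
typedef struct {
    uint32_t lo; /* bits 0..31  */
    uint32_t hi; /* bits 32..63 */
} lane64;

/* XOR two lanes half by half. */
static lane64 lane_xor(lane64 a, lane64 b)
{
    lane64 r;
    r.lo = a.lo ^ b.lo;
    r.hi = a.hi ^ b.hi;
    return r;
}

/* Rotate a lane left by n bits (any n, reduced mod 64) using only
 * 32-bit shifts and ORs. */
static lane64 lane_rotl(lane64 x, unsigned n)
{
    lane64 r;

    n &= 63u;
    if (n >= 32u) {
        /* A rotation by 32 is just a swap of the two halves. */
        uint32_t t = x.lo;
        x.lo = x.hi;
        x.hi = t;
        n -= 32u;
    }
    if (n == 0u) {
        r = x; /* avoid an undefined shift by 32 below */
    }
    else {
        r.lo = (x.lo << n) | (x.hi >> (32u - n));
        r.hi = (x.hi << n) | (x.lo >> (32u - n));
    }
    return r;
}
```

For example, rotating the lane `{ lo = 0x00000001, hi = 0x80000000 }` left by one bit yields `{ lo = 0x00000003, hi = 0x00000000 }`, the same result a native 64-bit rotate would give.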

What the CHAR_BIT != 8 fixes address

All fixes are gated behind WOLFSSL_WIDE_BYTE (auto-enabled for CHAR_BIT != 8 and for known 16-bit-char TI toolchain macros), and each is a no-op on 8-bit targets:

  • Byte/word aliasing. Serializing a word32/word64 by casting to byte* moves addressable cells, not octets. Replaced with explicit shift-based octet I/O via shared helpers in misc.c (WordsFromBytesBE32/BytesFromWordsBE32, BytesFromWordsLE32, the 64-bit variants, octet-correct readUnalignedWord32/readUnalignedWord64). sp_int.c sp_read_unsigned_bin uses an endian-/CHAR_BIT-agnostic shift loop for its leftover bytes (a 3-byte RSA exponent previously loaded as 1 instead of 65537).
  • (byte)x not truncating to an octet (it keeps 16 bits). Masked with WC_OCTET(x) = (byte)((x) & 0xFF). Used across the ML-KEM/ML-DSA encoders, the SP *_to_bin serializers, AES GETBYTE, base64, and the DRBG.
  • Integer promotion. 1U << n is 16-bit on C28x (use 1UL); a bit width written sizeof(t) * 8 is wrong when CHAR_BIT != 8 (use CHAR_BIT * sizeof(t)); byte operands promote to a 16-bit int.
  • sizeof counting cells, not octets. e.g. CHACHA_CHUNK_BYTES must be 16 * 4, not 16 * sizeof(word32) (= 32 on C28x, which halves the ChaCha block and desyncs the counter).
  • xorbuf word stride. WOLFSSL_WORD_SIZE_LOG2 vs sizeof(word) mismatch left half of each buffer un-XORed on a 16-bit-cell target; corrected for the WC_16BIT_CPU word16 path.
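
The first three fixes above can be sketched in portable C. This is a hedged illustration under stated assumptions, not the helpers from misc.c: `octet_cell`, `OCTET`, `BITS_OF`, `store_be32`, `load_be32`, and `load_be` are hypothetical names chosen for this example. The code behaves identically whether `unsigned char` is 8 or 16 bits wide, because it moves octets with shifts and masks rather than by casting a word pointer to a byte pointer.

```c
#include <limits.h>
#include <stddef.h>

/* Smallest addressable cell: 8 bits on typical hosts, 16 bits on C28x. */
typedef unsigned char octet_cell;

/* Keep only the low 8 bits; modeled on the WC_OCTET(x) macro described
 * above (hypothetical name here). */
#define OCTET(x) ((octet_cell)((x) & 0xFFu))

/* Bit width of a type without assuming 8-bit cells. */
#define BITS_OF(t) (CHAR_BIT * sizeof(t))

/* Store the low 32 bits of v as four big-endian octets, one per cell. */
static void store_be32(octet_cell* out, unsigned long v)
{
    out[0] = OCTET(v >> 24);
    out[1] = OCTET(v >> 16);
    out[2] = OCTET(v >> 8);
    out[3] = OCTET(v);
}

/* Load four big-endian octets into a 32-bit value. Each cell is masked
 * in case its upper bits are not zero on a wide-cell target. */
static unsigned long load_be32(const octet_cell* in)
{
    return ((unsigned long)(in[0] & 0xFFu) << 24) |
           ((unsigned long)(in[1] & 0xFFu) << 16) |
           ((unsigned long)(in[2] & 0xFFu) << 8)  |
            (unsigned long)(in[3] & 0xFFu);
}

/* Read a short big-endian octet string with a shift loop. The caller
 * must keep len small enough that the result fits in unsigned long. */
static unsigned long load_be(const octet_cell* in, size_t len)
{
    unsigned long v = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        v = (v << 8) | (unsigned long)(in[i] & 0xFFu);
    }
    return v;
}
```

With this approach the 3-octet RSA exponent `{0x01, 0x00, 0x01}` reads back as 65537 through `load_be`, regardless of cell width.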

It also adds WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM (streams the signature z vector per-row), which combined with WOLFSSL_MLDSA_ASSIGN_KEY brings ML-DSA-87 verify to ~10.7 KB RAM with zero heap.

Commit layout

  1. wolfcrypt: add WOLFSSL_WIDE_BYTE support for CHAR_BIT != 8 targets (TI C2000 C28x) - core types, misc octet helpers, base64, DRBG
  2. sha: octet-correct SHA-2 byte I/O and 32-bit split Keccak permutation for CHAR_BIT != 8
  3. aes/chacha: octet-correct block, key and keystream I/O for CHAR_BIT != 8
  4. mldsa/mlkem: correct ML-DSA and ML-KEM on CHAR_BIT != 8; add WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM
  5. ecc/25519/sp: octet-correct X25519/Ed25519 and SP byte<->mp conversion for CHAR_BIT != 8
  6. test/benchmark/ci: CHAR_BIT != 8 test vectors, NO_MALLOC benchmark, TI C2000 compile CI and docs

Testing

  • Host: ./configure --enable-dilithium --enable-experimental --enable-shake256 --enable-shake128 && make && ./wolfcrypt/test/testwolfcrypt - passes (RSA, ECC, ML-DSA, ML-KEM, SHA-2/3, all crypto). No behavior change on 8-bit-byte targets.
  • Hardware: on the LAUNCHXL-F28P55X, KATs for every algorithm listed above pass, and wolfcrypt_test crypto passes.
  • CI: IDE/C2000/compile.sh runs cl2000 --compile_only over the CHAR_BIT != 8 wolfCrypt subset; .github/workflows/ti-c2000-compile.yml runs it on PRs (fetches/caches the TI C2000 code generation tools).

Benchmarks (F28P55X @ 150 MHz)

| Primitive | Throughput |
| --- | --- |
| SHA-256 | ~284 KiB/s |
| SHA-384 / SHA-512 | ~166 KiB/s |
| SHA3-224 / 256 / 384 / 512 | ~279 / 264 / 206 / 146 KiB/s |
| SHAKE128 / SHAKE256 | ~319 / 264 KiB/s |
| RNG (Hash-DRBG) | ~122 KiB/s |

ML-DSA-87: verify ~225 ms/op (~10.7 KB RAM, zero heap); keygen and signing also run (SIGN=1).

Notes

  • wolfcrypt/src/sp_c32.c is generated. The & 0xFF octet masks added to its sp_*_to_bin_* serializers should be folded into the SP generator templates for a permanent fix; the in-tree edit is included here so the C28x build is correct today.
  • Documentation: IDE/C2000/README.md describes the support, the build options, and the benchmark results; the full bare-metal example (with KATs, benchmark, linker scripts, and per-algorithm make toggles) is in wolfssl-examples at embedded/ti-c2000-f28p55x/.

Companion PR

wolfssl-examples: "Add TI LAUNCHXL-F28P55X (C2000 C28x, CHAR_BIT==16) bare-metal wolfCrypt example".
wolfSSL/wolfssl-examples#576

@dgarske dgarske self-assigned this Jun 18, 2026
Copilot AI review requested due to automatic review settings June 18, 2026 00:15

Copilot AI left a comment

Pull request overview

This PR adds and CI-guards a bare-metal wolfCrypt port for TI C2000 C28x targets where CHAR_BIT == 16, introducing gated fixes so hashing, DRBG, ML-DSA verify, and SP-math ECC work correctly when a C “byte” is wider than 8 bits.

Changes:

  • Introduces WOLFSSL_NO_OCTET_BYTE detection and uses octet-wise load/store paths to avoid invalid byte/word aliasing on CHAR_BIT != 8 targets (SHA-256/512 family, SHA-3/SHAKE, Base64 CT decode, DRBG helpers, rotate helpers).
  • Adds “smallest memory” ML-DSA verify mode that streams z per polynomial to reduce pinned RAM in wc_MlDsaKey.
  • Adds TI C2000 compile-only guard scripts plus a GitHub Actions workflow that downloads the TI CGT and compiles a scoped subset.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| wolfssl/wolfcrypt/wc_port.h | Makes atomic arg type selection robust for 16-bit int by also checking UINT_MAX. |
| wolfssl/wolfcrypt/wc_mldsa.h | Adds WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM struct layout variant for reduced verify RAM. |
| wolfssl/wolfcrypt/types.h | Adds WOLFSSL_NO_OCTET_BYTE auto-detection; adjusts WC_16BIT_CPU 64-bit availability behavior. |
| wolfssl/wolfcrypt/sp_int.h | Adds support for unsigned char being 16-bit (no native 8-bit type). |
| wolfssl/wolfcrypt/settings.h | Requires explicit opt-in for SP math on 16-bit-int CPUs via WOLFSSL_SP_ALLOW_16BIT_CPU. |
| wolfssl/wolfcrypt/dilithium.h | Adds smallest-mem verify gating and defaults slow Montgomery reduction macros on WC_16BIT_CPU. |
| wolfcrypt/test/test.c | Switches large-digest constants from C strings to byte[] to avoid CHAR_BIT!=8 pitfalls. |
| wolfcrypt/src/wc_port.c | Fixes init-state static assert to use CHAR_BIT instead of hardcoded 8. |
| wolfcrypt/src/wc_mldsa.c | Adds octet-masking for packed bytes and fixes integer-promotion/sign issues on 16-bit int; adds streaming z verify path. |
| wolfcrypt/src/sha512.c | Adds octet-wise word load/store and corrects length carry/length placement for CHAR_BIT!=8. |
| wolfcrypt/src/sha3.c | Forces bytewise Keccak absorb/squeeze for WOLFSSL_NO_OCTET_BYTE and adds squeeze helper. |
| wolfcrypt/src/sha256.c | Adds octet-wise word load/store and corrects length carry/length placement for CHAR_BIT!=8. |
| wolfcrypt/src/random.c | Fixes DRBG serialization/addition helpers for non-8-bit “byte” targets. |
| wolfcrypt/src/misc.c | Fixes rotate helpers to use CHAR_BIT-based bit width when needed. |
| wolfcrypt/src/coding.c | Ensures Base64 CT decode returns 0xFF for invalid chars even when byte is wider than 8 bits. |
| wolfcrypt/benchmark/benchmark.c | Adds static buffers for WOLFSSL_NO_MALLOC benchmarking and adjusts frees/allocations accordingly. |
| scripts/ti-c2000/user_settings.h | Adds minimal CI-only config for cl2000 compile-guard. |
| scripts/ti-c2000/compile.sh | Adds compile-only script to build a scoped source set with TI cl2000. |
| .github/workflows/ti-c2000-compile.yml | Adds CI workflow to download/cache TI CGT and run the compile-only guard. |
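
As an illustration of the "CHAR_BIT-based bit width" point in the misc.c row, here is a minimal sketch of a rotate helper. It is not the wolfSSL rotate helper itself; `rotl_uint` is a hypothetical name. The point is that the width is computed as `CHAR_BIT * sizeof(unsigned int)`: on a target where `unsigned int` is a single 16-bit cell, `sizeof(unsigned int) * 8` would give 8 and the rotate would lose bits.

```c
#include <limits.h>

/* Rotate x left by n bits. The width comes from CHAR_BIT, so this stays
 * correct when unsigned int is one 16-bit cell (sizeof == 1). */
static unsigned int rotl_uint(unsigned int x, unsigned int n)
{
    const unsigned int bits = (unsigned int)(CHAR_BIT * sizeof(unsigned int));

    n %= bits;
    if (n == 0u) {
        return x; /* avoid an undefined shift by the full width */
    }
    return (x << n) | (x >> (bits - n));
}
```

The same pattern applies to fixed-width word types: derive the width from `CHAR_BIT` (or use a literal bit count) rather than multiplying `sizeof` by 8.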

Comment thread wolfssl/wolfcrypt/types.h Outdated
Comment thread wolfcrypt/benchmark/benchmark.c
@dgarske dgarske force-pushed the ti_c25 branch 3 times, most recently from 20e4053 to 39c343a on June 23, 2026 14:56

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.

Comment thread wolfssl/wolfcrypt/types.h
Comment on lines +429 to 432
#if !defined(MICROCHIP_PIC24) && \
    !(defined(SIZEOF_LONG_LONG) && (SIZEOF_LONG_LONG == 8))
    #undef WORD64_AVAILABLE
#endif
Comment thread wolfcrypt/src/sha3.c
Comment on lines +691 to +714
void BlockSha3(word64* s)
{
    word32* sp = (word32*)s;
    const word32* rc = (const word32*)hash_keccak_r;
    word32 sl[25], sh[25], nl[25], nh[25], bl[5], bh[5];
    word32 i, k;

    for (k = 0; k < 25; k++) {
        sl[k] = sp[2 * k];
        sh[k] = sp[2 * k + 1];
    }
    for (i = 0; i < 24; i += 2) {
        WC_SHA3_THETA(sl, sh);
        WC_SHA3_ROWMIX(nl, nh, sl, sh);
        nl[0] ^= rc[2 * i]; nh[0] ^= rc[2 * i + 1];
        WC_SHA3_THETA(nl, nh);
        WC_SHA3_ROWMIX(sl, sh, nl, nh);
        sl[0] ^= rc[2 * (i + 1)]; sh[0] ^= rc[2 * (i + 1) + 1];
    }
    for (k = 0; k < 25; k++) {
        sp[2 * k] = sl[k];
        sp[2 * k + 1] = sh[k];
    }
}