@@ -416,8 +416,6 @@ int wh_Client_Sha256Dma(whClientContext* ctx, wc_Sha256* sha, const uint8_t* in,
416416| **`requestSent` flag** | Adds a parameter to the API, but avoids unnecessary round-trips when input is absorbed entirely into the local buffer |
417417| **Snapshot/rollback on send failure** | Small CPU cost to copy the partial buffer, but guarantees SHA state consistency even on transport failures |
418418
419-
420-
421419## RNG: Single-Shot with Caller-Driven Chunking
422420
423421The RNG generate operation is the second algorithm to receive the async
@@ -540,6 +538,88 @@ int wh_Client_RngGenerate(whClientContext* ctx, uint8_t* out, uint32_t size);
540538int wh_Client_RngGenerateDma(whClientContext* ctx, uint8_t* out, uint32_t size);
541539```
542540
541+ ## AES: One-Shot with DMA Support
542+
543+ AES modes (CBC, CTR, ECB, GCM) are all **one-shot** operations — every call
544+ consumes a fixed buffer of input and returns a fixed buffer of output in a
545+ single round-trip. Unlike SHA's `Update` family, there is no client-side
546+ partial-block accumulation across Request/Response pairs, which makes the
547+ async split significantly simpler than for SHA. (CBC and CTR still carry
548+ inter-call IV / counter state on the `Aes` struct — see *Mutable state*
549+ below — but that state is updated atomically by the Response, not buffered
550+ across calls.)
551+
552+ - **No partial-block buffering** on the client. The entire plaintext or
553+ ciphertext is handed to one Request and the full result comes back in
554+ one Response.
555+ - **No `requestSent` flag.** Each call sends exactly one request and
556+ expects exactly one response. If the request's serialised size would
557+ exceed `WOLFHSM_CFG_COMM_DATA_LEN`, the inline Request returns
558+ `WH_ERROR_BADARGS` up front; DMA variants bypass the cap for payload
559+ data.
560+ - **No snapshot/rollback.** There is no local buffer to corrupt: the key
561+ lives on `aes->devKey` (or as a cached keyId), the IV on `aes->reg`, and
562+ both are read-only until the Response arrives, apart from the CBC-decrypt capture into `aes->reg` described under *Mutable state*.
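
The up-front size rejection mentioned in the list above can be sketched as follows. This is only an illustration: `DEMO_COMM_DATA_LEN`, `DEMO_HDR_LEN`, `DEMO_ERR_BADARGS`, and `demo_check_inline_size` are hypothetical stand-ins, not wolfHSM definitions. The real cap is `WOLFHSM_CFG_COMM_DATA_LEN` and the real header size depends on the wire format.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define DEMO_COMM_DATA_LEN 1280u /* stand-in for WOLFHSM_CFG_COMM_DATA_LEN */
#define DEMO_HDR_LEN       32u   /* assumed fixed request header size */
#define DEMO_ERR_BADARGS   (-1)  /* stand-in for WH_ERROR_BADARGS */

/* Returns 0 if header + key + IV + payload fit in one inline message,
 * DEMO_ERR_BADARGS otherwise. The sum is computed in 64 bits so that a
 * very large in_len cannot wrap around and pass the check. */
static int demo_check_inline_size(uint32_t key_len, uint32_t iv_len,
                                  uint32_t in_len)
{
    uint64_t total = (uint64_t)DEMO_HDR_LEN + key_len + iv_len + in_len;
    return (total > DEMO_COMM_DATA_LEN) ? DEMO_ERR_BADARGS : 0;
}
```

Rejecting before anything is sent means an oversized call leaves no request pending, so the caller can immediately retry with a smaller buffer or switch to a DMA variant.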
563+
564+ ### Mutable state: IV and counter
565+
566+ For **CBC** and **CTR**, the Response updates mutable state on the `Aes`
567+ struct so subsequent calls chain correctly:
568+
569+ - **CBC** — `aes->reg` is updated with the last ciphertext block. For
570+ decryption, the Request captures the last ciphertext block from the
571+ input buffer into `aes->reg` *before* sending, so in-place (input
572+ pointer == output pointer) operation still produces the right chaining
573+ state after the Response overwrites the ciphertext with plaintext.
574+ - **CTR** — `aes->reg`, `aes->tmp`, and `aes->left` are updated from the
575+ Response so the counter advances correctly for subsequent calls. CTR
576+ is symmetric: callers should use `AES_ENCRYPTION` for the key schedule
577+ and pass `enc = 1` in both directions.
578+
579+ **ECB** and **GCM** carry no inter-call state on the `Aes` struct. For
580+ GCM, the IV, AAD, and (on decrypt) the expected tag are passed as explicit
581+ arguments on each call.
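
The CBC-decrypt capture described above can be illustrated with a toy model. Everything here is hypothetical: `DemoAes`, `DemoMsg`, and the `demo_*` functions are not wolfHSM types or calls, and the "cipher" is a byte-wise XOR rather than AES. The sketch only shows why the last ciphertext block has to be saved before an in-place Response overwrites it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK   16
#define DEMO_MAX_LEN 64

typedef struct { uint8_t reg[DEMO_BLOCK]; } DemoAes; /* chaining state */
typedef struct {
    uint8_t iv[DEMO_BLOCK];     /* IV sent inline with the request */
    uint8_t data[DEMO_MAX_LEN]; /* ciphertext copied into the message */
    size_t  len;
} DemoMsg;

/* Toy CBC encrypt (XOR with 0x5A stands in for the block cipher). */
static void demo_cbc_encrypt(DemoAes *aes, const uint8_t *in, uint8_t *out,
                             size_t len)
{
    for (size_t off = 0; off < len; off += DEMO_BLOCK) {
        for (size_t i = 0; i < DEMO_BLOCK; i++) {
            out[off + i] = (uint8_t)((in[off + i] ^ aes->reg[i]) ^ 0x5A);
        }
        memcpy(aes->reg, out + off, DEMO_BLOCK);
    }
}

/* "Request": send the current IV and ciphertext, then capture the last
 * ciphertext block as the next IV while the input is still intact. */
static int demo_cbc_dec_request(DemoAes *aes, const uint8_t *in, size_t len,
                                DemoMsg *msg)
{
    if (len == 0 || len % DEMO_BLOCK != 0 || len > DEMO_MAX_LEN) {
        return -1;
    }
    memcpy(msg->iv, aes->reg, DEMO_BLOCK);
    memcpy(msg->data, in, len);
    msg->len = len;
    memcpy(aes->reg, in + len - DEMO_BLOCK, DEMO_BLOCK);
    return 0;
}

/* "Response": write plaintext to out, which may be the caller's input. */
static void demo_cbc_dec_response(const DemoMsg *msg, uint8_t *out)
{
    const uint8_t *prev = msg->iv;
    for (size_t off = 0; off < msg->len; off += DEMO_BLOCK) {
        for (size_t i = 0; i < DEMO_BLOCK; i++) {
            out[off + i] = (uint8_t)((msg->data[off + i] ^ 0x5A) ^ prev[i]);
        }
        prev = msg->data + off;
    }
}
```

If the capture were deferred to the Response, an in-place call would find plaintext where the last ciphertext block used to be, and the next chunk would be decrypted against the wrong chaining value.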
582+
583+ ### DMA variant contract
584+
585+ The DMA pairs follow the same pattern as SHA DMA:
586+
587+ 1. **Fail-fast** on `wh_CommClient_IsRequestPending()` before acquiring
588+ any DMA mapping. This prevents a Request from being issued while another
589+ call is still outstanding, and means no translated address can leak if
590+ `wh_Client_SendRequest` later rejects the call.
591+ 2. **PRE-translate** input, output, and (for GCM) AAD buffers. Non-DMA
592+ payload fields (key material, IV, auth tag) stay inline in the
593+ request message.
594+ 3. **Stash** the translated addresses in `ctx->dma.asyncCtx.aes` so the
595+ matching Response can issue POST cleanup.
596+ 4. **POST cleanup** runs on every non-`WH_ERROR_NOTREADY` return from the
597+ Response, so the caller's buffers are safe to read regardless of
598+ success or error.
599+ 5. The caller must keep the input, output, and AAD buffers valid until
600+ the Response returns something other than `WH_ERROR_NOTREADY`.
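
The ordering in the contract above can be sketched with a small state machine. This is a hedged model, not the wolfHSM implementation: `DemoCtx`, the `demo_*` helpers, and the `DEMO_*` codes are hypothetical, and the "mapping" is just a counter so that leaks are observable.

```c
#include <stdint.h>

#define DEMO_OK       0
#define DEMO_NOTREADY (-1) /* stand-in for WH_ERROR_NOTREADY */
#define DEMO_BUSY     (-2) /* stand-in for "request already pending" */

typedef struct {
    int       pending;       /* a request is outstanding */
    int       server_done;   /* test hook: server has replied */
    int       server_rc;     /* test hook: server's result code */
    int       live_mappings; /* PRE calls not yet matched by POST */
    uintptr_t in_addr;       /* stashed translated addresses */
    uintptr_t out_addr;
} DemoCtx;

static uintptr_t demo_dma_pre(DemoCtx *ctx, const void *p)
{
    ctx->live_mappings++;
    return (uintptr_t)p;
}

static void demo_dma_post(DemoCtx *ctx, uintptr_t addr)
{
    (void)addr;
    ctx->live_mappings--;
}

static int demo_aes_dma_request(DemoCtx *ctx, const uint8_t *in, uint8_t *out)
{
    /* Step 1: fail fast before any mapping exists. */
    if (ctx->pending) {
        return DEMO_BUSY;
    }
    /* Steps 2-3: PRE-translate and stash for the matching Response. */
    ctx->in_addr  = demo_dma_pre(ctx, in);
    ctx->out_addr = demo_dma_pre(ctx, out);
    ctx->pending  = 1;
    return DEMO_OK;
}

static int demo_aes_dma_response(DemoCtx *ctx)
{
    if (!ctx->server_done) {
        return DEMO_NOTREADY; /* buffers must stay valid; no cleanup yet */
    }
    /* Step 4: POST cleanup on every other return, success or error. */
    demo_dma_post(ctx, ctx->in_addr);
    demo_dma_post(ctx, ctx->out_addr);
    ctx->pending = 0;
    return ctx->server_rc;
}
```

Because the pending check runs first, a rejected second Request never touches `live_mappings`, which is the property step 1 is meant to guarantee.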
601+
602+ ### API Reference
603+
604+ Inline (non-DMA) pairs:
605+
606+ - `wh_Client_AesCbcRequest` / `wh_Client_AesCbcResponse`
607+ - `wh_Client_AesCtrRequest` / `wh_Client_AesCtrResponse`
608+ - `wh_Client_AesEcbRequest` / `wh_Client_AesEcbResponse`
609+ - `wh_Client_AesGcmRequest` / `wh_Client_AesGcmResponse`
610+
611+ DMA pairs (require `WOLFHSM_CFG_DMA`):
612+
613+ - `wh_Client_AesCbcDmaRequest` / `wh_Client_AesCbcDmaResponse`
614+ - `wh_Client_AesCtrDmaRequest` / `wh_Client_AesCtrDmaResponse`
615+ - `wh_Client_AesEcbDmaRequest` / `wh_Client_AesEcbDmaResponse`
616+ - `wh_Client_AesGcmDmaRequest` / `wh_Client_AesGcmDmaResponse`
617+
618+ The existing blocking wrappers (`wh_Client_AesCbc`, `wh_Client_AesCtr`,
619+ `wh_Client_AesEcb`, `wh_Client_AesGcm`, and their `*Dma` variants) are now
620+ thin shells that call the new async primitives in a poll loop, so blocking
621+ and async paths share identical wire behaviour.
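
A blocking wrapper built on a Request/Response pair might look roughly like the sketch below. The names (`PollClient`, `poll_request`, `poll_response`, `poll_blocking`, `POLL_NOTREADY`) are hypothetical and the transport is a stub that reports "not ready" a fixed number of times; the actual wolfHSM wrappers and their signatures are not shown in this document.

```c
#define POLL_OK       0
#define POLL_NOTREADY (-1) /* stand-in for WH_ERROR_NOTREADY */

typedef struct {
    int sent;              /* set once the request has gone out */
    int polls_until_ready; /* stub: how many polls return NOTREADY */
} PollClient;

/* Stub async primitives. */
static int poll_request(PollClient *c)
{
    c->sent = 1;
    return POLL_OK;
}

static int poll_response(PollClient *c)
{
    if (c->polls_until_ready > 0) {
        c->polls_until_ready--;
        return POLL_NOTREADY;
    }
    return POLL_OK;
}

/* Blocking wrapper: one Request, then poll the Response until it
 * returns anything other than "not ready". */
static int poll_blocking(PollClient *c)
{
    int rc = poll_request(c);
    if (rc != POLL_OK) {
        return rc;
    }
    do {
        rc = poll_response(c);
    } while (rc == POLL_NOTREADY);
    return rc;
}
```

Since the wrapper adds only the loop, any change to the Request or Response functions is picked up by both the blocking and async callers.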
622+
543623## Roadmap: Remaining Algorithms
544624
545625The async split pattern will be applied algorithm by algorithm to all crypto
@@ -555,15 +635,15 @@ the full set of operations and their planned async status.
555635| SHA-384 | Update/Final Request/Response | Shares SHA-512 wire format |
556636| SHA-512 | Update/Final Request/Response | Non-DMA and DMA variants |
557637| RNG Generate | `wh_Client_RngGenerate{Request,Response}` and DMA variants | Single-shot per call; non-DMA callers chunk against `WH_MESSAGE_CRYPTO_RNG_MAX_INLINE_SZ`, DMA has no per-call cap |
638+ | AES-CBC | `wh_Client_AesCbc{,Dma}{Request,Response}` | Non-DMA and DMA variants |
639+ | AES-CTR | `wh_Client_AesCtr{,Dma}{Request,Response}` | Non-DMA and DMA variants |
640+ | AES-ECB | `wh_Client_AesEcb{,Dma}{Request,Response}` | Non-DMA and DMA variants |
641+ | AES-GCM | `wh_Client_AesGcm{,Dma}{Request,Response}` | Non-DMA and DMA variants; AAD supports DMA |
558642
559643**Planned:**
560644
561645| Algorithm | Functions | Complexity | Notes |
562646|-------------------|--------------------------------------------|------------|-------|
563- | AES-CBC | `wh_Client_AesCbc{Request,Response}` | Low | Single-shot; straightforward split |
564- | AES-CTR | `wh_Client_AesCtr{Request,Response}` | Low | Single-shot |
565- | AES-ECB | `wh_Client_AesEcb{Request,Response}` | Low | Single-shot |
566- | AES-GCM | `wh_Client_AesGcm{Request,Response}` | Low | Single-shot; AAD + ciphertext in one message |
567647| RSA Sign/Verify | `wh_Client_RsaFunction{Request,Response}` | Low | Single-shot; may need auto-import removed from Request |
568648| RSA Get Size | `wh_Client_RsaGetSize{Request,Response}` | Low | Trivial query |
569649| ECDSA Sign | `wh_Client_EccSign{Request,Response}` | Low | Single-shot |
@@ -572,7 +652,7 @@ the full set of operations and their planned async status.
572652| Curve25519 | `wh_Client_Curve25519SharedSecret{Request,Response}` | Low | Single-shot |
573653| Ed25519 Sign | `wh_Client_Ed25519Sign{Request,Response}` | Low | Single-shot |
574654| Ed25519 Verify | `wh_Client_Ed25519Verify{Request,Response}`| Low | Single-shot |
575- | CMAC | `wh_Client_Cmac{Request,Response}` | Low | Already has partial split pattern |
655+ | CMAC | `wh_Client_Cmac{Request,Response}` | Medium | Streaming (Init/Update/Final), so follows SHA-style pattern rather than the one-shot AES pattern |
576656| ML-DSA Sign | `wh_Client_MlDsaSign{Request,Response}` | Low | Post-quantum; single-shot |
577657| ML-DSA Verify | `wh_Client_MlDsaVerify{Request,Response}` | Low | Post-quantum; single-shot |
578658