Commit 5d90121

widgetii and claude authored
agent: add membw command for bare-metal DDR bandwidth test (#100)
## Summary

Closes #99.

- New `CMD_MEMBW` (0x0D) / `RSP_MEMBW` (0x87): runs memset / read-scan / memcpy kernels via ARM `ldmia`/`stmia` (r4-r11, 32 B per loop iter) against a scratch DDR buffer with the MMU cache on, timed with the ARMv7 PMU cycle counter (CCNT).
- New `defib agent membw [--size 4MB] [--iters 8] [--addr 0] [--port ...] [--output human|json]` CLI command.
- Reports `cycles/byte` (CPU-clock-invariant, the metric that actually isolates DDR fabric from CPU clock variance) plus `MB/s` when the architectural generic timer's `CNTFRQ` is set by an earlier boot stage. If `CNTFRQ == 0` the host transparently falls back to cycles/byte.
- Agent v4: bumps `AGENT_VERSION`, advertises `CAP_MEMBW` in INFO so the host can check support before sending the command.
- ARMv7 (Cortex-A7 V4 / V5 / V6 family) only. ARMv5 (ARM926, `hi3516cv300`) cleanly rejects with `ACK_FLASH_ERROR` via `#ifdef CPU_ARM926`: it has a different PMU register layout and is out of scope for the motivating use case.

## Why

From the issue: when investigating an encoder fps gap between OpenIPC and vendor firmware on identical `gk7205v300` silicon, the key question, *"is the DDR fabric slow, or is Linux slow on top of it?"*, can't be cleanly answered from inside Linux. CMA reservations, cache attributes, libc memcpy variance and scheduler noise all muddy any userspace number.

defib already runs a bare-metal agent in DDR right after SPL brings memory up. That's the exact moment we want to measure raw DDR throughput, before any kernel/ISP/VENC traffic. `defib agent membw` gives a reproducible apples-to-apples bandwidth number per firmware.

## How

### Agent C (`agent/main.c`, `agent/protocol.h`)

Three inline-asm kernels with `ldmia`/`stmia` over r4-r11 (8 words = 32 B per memory operation), so OpenIPC vs vendor builds produce identical instruction streams. Cache is on (write-back / write-allocate per `startup.S` page-table fill); the buffer is sized well above L1+L2 so DDR is the actual bottleneck.

CCNT is calibrated against `CNTPCT` (architectural generic timer, fixed frequency from `CNTFRQ`) over a 10 ms window. If `CNTFRQ` was never written by the bootrom (and on the V4 family it isn't), the agent returns `timer_hz = 0` and the host falls back to the cycles/byte metric. That number alone already answers the original question because it normalises for CPU-clock differences across firmwares, which is the gotcha that bit the reporter in the original investigation.

### Agent footprint guard

The default scratch sits at `LOAD_ADDR + 8 MiB` (a new `AGENT_LOAD_ADDR` macro is passed in via Makefile `CFLAGS`). `handle_membw` rejects any user-supplied `addr` whose `[addr, addr + 2*size)` range overlaps `[LOAD_ADDR - 64 KB, LOAD_ADDR + 8 MiB)`; otherwise an 8 MiB memcpy on the default V4 layout would stomp the running agent's own code. The guard range is half-open, so the default scratch address at exactly `LOAD_ADDR + 8 MiB` is accepted. This was found during real-hardware testing; see the validation section.

### Python host (`src/defib/agent/client.py`, `cli/app.py`)

- `MembwResult` dataclass with `cycles_per_byte(ticks, write_amp=1)` and `mbps(ticks, write_amp=1)` helpers (returns `None` for `mbps` when `timer_hz == 0`).
- `FlashAgentClient.membw(size_bytes, iters, addr)` async method.
- `defib agent membw` Typer command with `human` and `json` output modes.
- `agent info` now lists `membw` in the capabilities line when reported by the agent.

### Tests

- **Agent C** (`agent/test_agent.c`): round-trip framing tests for the 12 B request and 32 B response packets.
- **Python** (`tests/test_agent_protocol.py::TestMembw`): four tests using `MockTransport`: field parsing, MB/s + cycles/byte math, `timer_hz == 0` graceful degradation, ARMv5 (`ACK_FLASH_ERROR`) rejection path.

## Validation

**Real hardware, 2026-05-14:**

| Test | hi3516ev300 (V4) | gk7205v300 (V4) |
|---|---|---|
| Agent v4 advertises `membw` | ✓ | ✓ |
| memset 4 MiB × 8 | 0.345 cyc/B | 0.345 cyc/B |
| read 4 MiB × 8 | 0.512 cyc/B | 0.513 cyc/B |
| memcpy 4 MiB × 8 (R+W) | 0.446 cyc/B | 0.446 cyc/B |
| 8 MiB × 16 + 16 MiB × 8 | - | flat to 0.2% (past cache) |

Both SoCs agree to 0.2%, as expected for the same V4 silicon family with the same DDR config. `CNTFRQ == 0` on both, so MB/s shows `n/a` and the cycles/byte fallback activates automatically.

**Tests / lint / cross-build (all green):**

- `make -C agent test HOST_CC=gcc`: 5412/5412 pass (includes 2 new framing tests)
- `uv run pytest tests/ -x --ignore=tests/fuzz`: 494 pass, 2 skip (includes 4 new TestMembw tests)
- `uv run ruff check src/ tests/`: clean
- `uv run mypy src/defib/ --ignore-missing-imports`: clean
- Cross-build verified: `gk7205v300`, `hi3516ev300`, `hi3516cv300` (ARMv5 reject path), `hi3516cv610`; `make all-socs` builds all four default targets.

## Test plan

- [x] Agent C unit tests pass (`make -C agent test HOST_CC=gcc`)
- [x] Python tests pass (`uv run pytest tests/`)
- [x] Ruff + mypy clean
- [x] Cross-compile every default SoC
- [x] Real-hardware smoke on hi3516ev300 (ARMv7)
- [x] Real-hardware smoke on gk7205v300 (ARMv7, the motivating SoC)
- [x] Real-hardware edge cases: 4 MiB×8, 8 MiB×16, 16 MiB×8 (cycles/byte stable)
- [ ] Real-hardware smoke on hi3516cv300 (ARMv5 reject path; needs an ARMv5 board)
- [ ] Run from OpenIPC U-Boot and vendor U-Boot on the same `gk7205v300` silicon, diff `cycles_per_byte`. That's the motivating measurement the issue was asking for.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
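The metric arithmetic described under "Python host" can be sketched roughly as follows. This is an illustration only, not the code in `src/defib/agent/client.py`: the class name `MembwSketch`, its field names, the reading of `write_amp` as a multiplier on bytes moved, and the use of decimal megabytes are all assumptions. Only the 32-byte little-endian response layout is taken from the commit.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class MembwSketch:
    """Hypothetical stand-in for MembwResult; field order follows RSP_MEMBW."""
    base: int
    size: int
    iters: int
    timer_hz: int      # CCNT ticks per second; 0 when CNTFRQ was unusable
    memset_ticks: int
    read_ticks: int
    memcpy_ticks: int
    cpu_arch: int

    @classmethod
    def from_payload(cls, payload: bytes) -> "MembwSketch":
        # Eight little-endian 32-bit words, 32 bytes in total.
        return cls(*struct.unpack("<8I", payload))

    def bytes_moved(self, write_amp: int = 1) -> int:
        # Assumption: write_amp scales the byte count (e.g. 2 for read + write).
        return self.size * self.iters * write_amp

    def cycles_per_byte(self, ticks: int, write_amp: int = 1) -> float:
        return ticks / self.bytes_moved(write_amp)

    def mbps(self, ticks: int, write_amp: int = 1) -> Optional[float]:
        # Without a calibrated timer there is no wall-clock time to divide by.
        if self.timer_hz == 0 or ticks == 0:
            return None
        seconds = ticks / self.timer_hz
        return self.bytes_moved(write_amp) / seconds / 1e6  # decimal MB (assumed)
```

As a cross-check against the validation table, a 4 MiB × 8 memset moves 33,554,432 bytes, so 0.345 cycles/byte corresponds to roughly 11.6 million CCNT ticks.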
1 parent 3430c6a commit 5d90121

8 files changed

Lines changed: 628 additions & 4 deletions

agent/Makefile

Lines changed: 1 addition & 0 deletions
@@ -167,6 +167,7 @@ CFLAGS = -mcpu=$(CPU_TYPE) -marm -O2 -ffreestanding -nostdlib \
     -DUART_BASE=$(UART_BASE) -DUART_CLOCK=$(UART_CLOCK) $(CPU_FLAG) \
     -DFLASH_MEM=$(FLASH_MEM) -DFMC_BASE=$(FMC_BASE) -DRAM_BASE=$(RAM_BASE) -DWDT_BASE=$(WDT_BASE) \
     -DCRG_BASE=$(CRG_BASE) -DSYSCTRL_REBOOT=$(SYSCTRL_REBOOT) \
+    -DAGENT_LOAD_ADDR=$(LOAD_ADDR) \
     $(if $(UART_CKSEL_REG),-DUART_CKSEL_REG=$(UART_CKSEL_REG) -DUART_CKSEL_BIT=$(UART_CKSEL_BIT)) \
     -mno-unaligned-access -Wall -Wextra

agent/main.c

Lines changed: 210 additions & 3 deletions
@@ -123,8 +123,10 @@ static int addr_readable(uint32_t addr, uint32_t size) {
  * v3 added: flash_mem at INFO bytes 24..27 (so the host knows which
  * memory-mapped flash window CMD_CRC32/CMD_READ should target on
  * SoCs where it isn't 0x14000000 — e.g. hi3520dv200 has it at
- * 0x58000000). */
-#define AGENT_VERSION 3
+ * 0x58000000).
+ * v4 added: CMD_MEMBW for bare-metal DDR bandwidth measurement
+ * (ARMv7 only; ACK_FLASH_ERROR on ARMv5). */
+#define AGENT_VERSION 4

 /* Capability flags — advertise supported features */
 #define CAP_FLASH_STREAM (1 << 0) /* CMD_FLASH_STREAM with double-buffer */
@@ -134,9 +136,15 @@ static int addr_readable(uint32_t addr, uint32_t size) {
 #define CAP_REBOOT (1 << 4) /* CMD_REBOOT */
 #define CAP_SELFUPDATE (1 << 5) /* CMD_SELFUPDATE */
 #define CAP_SCAN (1 << 6) /* CMD_SCAN */
+#ifndef CPU_ARM926
+#define CAP_MEMBW (1 << 7) /* CMD_MEMBW (ARMv7 PMU cycle counter) */
+#else
+#define CAP_MEMBW 0
+#endif

 #define AGENT_CAPS (CAP_FLASH_STREAM | CAP_SECTOR_BITMAP | CAP_PAGE_SKIP | \
-                    CAP_SET_BAUD | CAP_REBOOT | CAP_SELFUPDATE | CAP_SCAN)
+                    CAP_SET_BAUD | CAP_REBOOT | CAP_SELFUPDATE | CAP_SCAN | \
+                    CAP_MEMBW)

 static void handle_info(void) {
     uint8_t resp[28];
@@ -240,6 +248,202 @@ static void handle_crc32_cmd(const uint8_t *data, uint32_t len) {
     proto_send(RSP_CRC32, resp, 4);
 }

+/*
+ * CMD_MEMBW: DDR bandwidth test. ARMv7 (Cortex-A7) only.
+ *
+ * Request: [size:4LE][iters:4LE][addr:4LE]
+ *   size = 0 → 4 MiB default; otherwise must be 256B-aligned, ≤ 16 MiB
+ *   iters = 0 → 8 default; max 256
+ *   addr = 0 → RAM_BASE + MEMBW_SCRATCH_OFF (auto-pick)
+ *
+ * Response: [base:4LE][size:4LE][iters:4LE][timer_hz:4LE]
+ *           [memset_ticks:4LE][read_ticks:4LE][memcpy_ticks:4LE][cpu_arch:4LE]
+ *
+ * timer_hz = CCNT frequency in Hz, calibrated against the architectural
+ *            generic timer; 0 if CNTFRQ wasn't set up. Host can still
+ *            compute cycles/byte (CPU-clock-invariant) when timer_hz==0.
+ *
+ * Cache state: MMU is on with DDR mapped as write-back / write-allocate
+ * (see startup.S page-table fill). Test runs cached — apples-to-apples
+ * with userspace memcpy/memset, with the buffer sized well above L1+L2.
+ */
+#ifndef CPU_ARM926
+static inline void pmccntr_init(void) {
+    uint32_t v;
+    asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(v));
+    v |= (1u << 0); /* E: enable all counters */
+    v |= (1u << 2); /* C: reset CCNT */
+    asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(v));
+    asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x80000000u));
+    asm volatile("isb");
+}
+
+static inline uint32_t pmccntr_read(void) {
+    uint32_t v;
+    asm volatile("isb\n\t"
+                 "mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
+    return v;
+}
+
+/* Calibrate CCNT (CPU cycles) against CNTPCT (architectural timer, fixed
+ * frequency from CNTFRQ). Returns CCNT ticks per second, or 0 if CNTFRQ
+ * wasn't initialised by an earlier boot stage. */
+static uint32_t pmccntr_calibrate_hz(void) {
+    uint32_t cntfrq;
+    asm volatile("mrc p15, 0, %0, c14, c0, 0" : "=r"(cntfrq));
+    /* Sanity: most hi-silicon BL1 sets this to 24 MHz. Anything outside
+     * 1 MHz..100 MHz is almost certainly an uninitialised register. */
+    if (cntfrq < 1000000u || cntfrq > 100000000u) return 0;
+
+    uint32_t lo0, hi0, lo1, hi1;
+    asm volatile("mrrc p15, 0, %0, %1, c14" : "=r"(lo0), "=r"(hi0));
+    uint32_t target = cntfrq / 100; /* 10 ms window */
+    pmccntr_init();
+    uint32_t c0 = pmccntr_read();
+    do {
+        asm volatile("mrrc p15, 0, %0, %1, c14" : "=r"(lo1), "=r"(hi1));
+    } while ((lo1 - lo0) < target);
+    uint32_t c1 = pmccntr_read();
+    return (c1 - c0) * 100u;
+}
+
+/* Write 8 words per stm — 32 B per loop iteration. r4-r11 are AAPCS
+ * callee-saved; listing them as clobbers makes GCC push/pop them in
+ * the prologue. */
+static void __attribute__((noinline)) membw_memset(uint32_t addr, uint32_t bytes) {
+    asm volatile(
+        "mov r4, %[v]\n\t"
+        "mov r5, %[v]\n\t"
+        "mov r6, %[v]\n\t"
+        "mov r7, %[v]\n\t"
+        "mov r8, %[v]\n\t"
+        "mov r9, %[v]\n\t"
+        "mov r10, %[v]\n\t"
+        "mov r11, %[v]\n\t"
+        "1:\n\t"
+        "stmia %[p]!, {r4, r5, r6, r7, r8, r9, r10, r11}\n\t"
+        "cmp %[p], %[end]\n\t"
+        "blo 1b\n\t"
+        : [p] "+r"(addr)
+        : [end] "r"(addr + bytes), [v] "r"(0xA5A5A5A5u)
+        : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11",
+          "cc", "memory"
+    );
+}
+
+/* Read 8 words per ldm. No store — pure read bandwidth. */
+static void __attribute__((noinline)) membw_read(uint32_t addr, uint32_t bytes) {
+    asm volatile(
+        "1:\n\t"
+        "ldmia %[p]!, {r4, r5, r6, r7, r8, r9, r10, r11}\n\t"
+        "cmp %[p], %[end]\n\t"
+        "blo 1b\n\t"
+        : [p] "+r"(addr)
+        : [end] "r"(addr + bytes)
+        : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11",
+          "cc", "memory"
+    );
+}
+
+/* Copy 8 words per ldm/stm pair — 32 B in, 32 B out per iteration. */
+static void __attribute__((noinline)) membw_memcpy(uint32_t dst, uint32_t src, uint32_t bytes) {
+    asm volatile(
+        "1:\n\t"
+        "ldmia %[s]!, {r4, r5, r6, r7, r8, r9, r10, r11}\n\t"
+        "stmia %[d]!, {r4, r5, r6, r7, r8, r9, r10, r11}\n\t"
+        "cmp %[s], %[end]\n\t"
+        "blo 1b\n\t"
+        : [s] "+r"(src), [d] "+r"(dst)
+        : [end] "r"(src + bytes)
+        : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11",
+          "cc", "memory"
+    );
+}
+#endif /* !CPU_ARM926 */
+
+#define MAX_MEMBW_SIZE (16u * 1024u * 1024u)
+/* Agent footprint guard: protect [AGENT_LOAD_ADDR - 64 KB, AGENT_LOAD_ADDR
+ * + 8 MiB) from the test buffer. The lower margin covers the 16 KB stack
+ * that lives below _start; the upper margin (8 MiB) is generous head-room
+ * for .text/.data/.bss including the 16 KB-aligned page table. The default
+ * scratch sits at AGENT_LOAD_ADDR + 8 MiB so even 16 MiB × memcpy (32 MiB
+ * total span) fits inside the 128 MiB cached DDR window. */
+#define MEMBW_AGENT_GUARD_LO ((uint32_t)AGENT_LOAD_ADDR - 0x10000u)
+#define MEMBW_AGENT_GUARD_HI ((uint32_t)AGENT_LOAD_ADDR + 0x800000u)
+#define MEMBW_DEFAULT_ADDR ((uint32_t)AGENT_LOAD_ADDR + 0x800000u)
+
+static void handle_membw(const uint8_t *data, uint32_t len) {
+#ifdef CPU_ARM926
+    (void)data; (void)len;
+    /* ARMv5 (ARM926EJ-S) has a different PMU register layout. Out of
+     * scope — the motivating use case (gk7205v300 DDR fabric audit) is
+     * ARMv7. */
+    proto_send_ack(ACK_FLASH_ERROR);
+#else
+    if (len < 12) { proto_send_ack(ACK_CRC_ERROR); return; }
+
+    uint32_t size = read_le32(&data[0]);
+    uint32_t iters = read_le32(&data[4]);
+    uint32_t addr = read_le32(&data[8]);
+
+    if (size == 0) size = 4u * 1024u * 1024u;
+    if (iters == 0) iters = 8;
+    if (addr == 0) addr = MEMBW_DEFAULT_ADDR;
+
+    if (iters > 256 || size > MAX_MEMBW_SIZE || (size & 0xFFu) != 0) {
+        proto_send_ack(ACK_FLASH_ERROR); return;
+    }
+    /* Fit dst = addr + size and src = addr inside the cached DDR
+     * window (128 MiB from RAM_BASE per startup.S page-table fill). */
+    if (addr < RAM_BASE) { proto_send_ack(ACK_FLASH_ERROR); return; }
+    uint32_t off = addr - RAM_BASE;
+    if (off + 2u * size > 128u * 1024u * 1024u) {
+        proto_send_ack(ACK_FLASH_ERROR); return;
+    }
+    /* Reject scratch ranges that would overlap the agent's own footprint
+     * (its code, stack, page table). memcpy would otherwise overwrite
+     * the running agent and the device would hang. */
+    uint32_t scratch_end = addr + 2u * size;
+    if (scratch_end > MEMBW_AGENT_GUARD_LO &&
+        addr < MEMBW_AGENT_GUARD_HI) {
+        proto_send_ack(ACK_FLASH_ERROR); return;
+    }
+
+    uint32_t timer_hz = pmccntr_calibrate_hz();
+
+    uint32_t t0, t1;
+
+    pmccntr_init();
+    t0 = pmccntr_read();
+    for (uint32_t i = 0; i < iters; i++) membw_memset(addr, size);
+    t1 = pmccntr_read();
+    uint32_t memset_ticks = t1 - t0;
+
+    pmccntr_init();
+    t0 = pmccntr_read();
+    for (uint32_t i = 0; i < iters; i++) membw_read(addr, size);
+    t1 = pmccntr_read();
+    uint32_t read_ticks = t1 - t0;
+
+    pmccntr_init();
+    t0 = pmccntr_read();
+    for (uint32_t i = 0; i < iters; i++) membw_memcpy(addr + size, addr, size);
+    t1 = pmccntr_read();
+    uint32_t memcpy_ticks = t1 - t0;
+
+    uint8_t resp[32];
+    write_le32(&resp[0], addr);
+    write_le32(&resp[4], size);
+    write_le32(&resp[8], iters);
+    write_le32(&resp[12], timer_hz);
+    write_le32(&resp[16], memset_ticks);
+    write_le32(&resp[20], read_ticks);
+    write_le32(&resp[24], memcpy_ticks);
+    write_le32(&resp[28], 1); /* cpu_arch: 1 = ARMv7 Cortex-A */
+    proto_send(RSP_MEMBW, resp, sizeof(resp));
+#endif
+}
+
 /* Forward declaration */
 static void handle_flash_write(const uint8_t *data, uint32_t len);

@@ -1131,6 +1335,9 @@ int main(void) {
         case CMD_MARK_BAD:
             handle_mark_bad(cmd_buf, data_len);
             break;
+        case CMD_MEMBW:
+            handle_membw(cmd_buf, data_len);
+            break;
         case CMD_SET_BAUD:
             handle_set_baud(cmd_buf, data_len);
             break;
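The argument checks in `handle_membw` could be mirrored on the host to reject a bad request before it is sent. The sketch below restates the C conditions in Python under stated assumptions: `check_membw_args` is a hypothetical helper, not part of defib, and the `ram_base` / `load_addr` values a caller passes are per-SoC and come from the Makefile. Python integers do not wrap at 32 bits, so this models the intended arithmetic rather than the exact `uint32_t` behaviour.

```python
MIB = 1024 * 1024
MAX_MEMBW_SIZE = 16 * MIB
CACHED_WINDOW = 128 * MIB   # cached DDR window measured from RAM_BASE
GUARD_BELOW = 0x10000       # 64 KiB below the load address
GUARD_ABOVE = 0x800000      # 8 MiB above the load address


def check_membw_args(size, iters, addr, ram_base, load_addr):
    """Return the resolved (size, iters, addr), or None if the agent would reject."""
    # Zero means "use the default", as in the C handler.
    size = size or 4 * MIB
    iters = iters or 8
    addr = addr or load_addr + GUARD_ABOVE

    if iters > 256 or size > MAX_MEMBW_SIZE or size & 0xFF:
        return None
    # Source and destination together span 2 * size bytes.
    if addr < ram_base or (addr - ram_base) + 2 * size > CACHED_WINDOW:
        return None
    # Half-open overlap test: scratch is [addr, addr + 2*size),
    # guard is [load_addr - 64 KiB, load_addr + 8 MiB).
    guard_lo = load_addr - GUARD_BELOW
    guard_hi = load_addr + GUARD_ABOVE
    if addr + 2 * size > guard_lo and addr < guard_hi:
        return None
    return size, iters, addr
```

Because the guard is half-open, the default scratch address at exactly `load_addr + 8 MiB` passes, while any range that reaches into the 64 KiB below the load address is refused.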

agent/protocol.h

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,7 @@
 #define CMD_FLASH_PROGRAM 0x0A
 #define CMD_FLASH_STREAM 0x0B
 #define CMD_MARK_BAD 0x0C /* NAND only: write 0x00 to OOB[0] of page 0 of a block */
+#define CMD_MEMBW 0x0D /* DDR bandwidth test (ARMv7 only): see handle_membw */

 /* Responses (device → host) */
 #define RSP_INFO 0x81
@@ -28,6 +29,7 @@
 #define RSP_CRC32 0x84
 #define RSP_READY 0x85
 #define RSP_SCAN 0x86
+#define RSP_MEMBW 0x87

 /* ACK status codes */
 #define ACK_OK 0x00
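Since ARMv5 builds define `CAP_MEMBW` as 0, a host can tell from the INFO capability word whether `CMD_MEMBW` (0x0D) is worth sending. A minimal sketch of that check follows; `supports_membw` is a hypothetical name, and how the capability word is extracted from the INFO response is not shown in this diff, so the function simply takes it as an integer.

```python
CMD_MEMBW = 0x0D    # opcode from agent/protocol.h
RSP_MEMBW = 0x87    # response code from agent/protocol.h
CAP_MEMBW = 1 << 7  # capability bit from agent/main.c (0 on ARMv5 builds)


def supports_membw(caps: int) -> bool:
    """True when the capability word reported by INFO has the CAP_MEMBW bit set."""
    return bool(caps & CAP_MEMBW)
```

An agent built for ARM926 never sets bit 7, so the same check covers both older agent versions and ARMv5 targets.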

agent/test_agent.c

Lines changed: 75 additions & 0 deletions
@@ -448,6 +448,79 @@ static void test_cobs_roundtrip_all_crc_patterns(void) {
     }
 }

+/*
+ * CMD_MEMBW request framing: host sends [size:4LE][iters:4LE][addr:4LE].
+ * The handler runs on ARM hardware (CCNT register), but we can verify
+ * the request and response packets round-trip through proto_send/recv
+ * with the right shape — that's what catches wire-format mismatches
+ * between agent C and host Python.
+ */
+static void test_proto_membw_request_framing(void) {
+    mock_reset();
+
+    /* Host → device: 12-byte payload */
+    uint8_t req[12];
+    uint32_t size = 4u * 1024u * 1024u;
+    uint32_t iters = 8;
+    uint32_t addr = 0x40400000u;
+    req[0] = (size >> 0) & 0xFF; req[1] = (size >> 8) & 0xFF;
+    req[2] = (size >> 16) & 0xFF; req[3] = (size >> 24) & 0xFF;
+    req[4] = (iters >> 0) & 0xFF; req[5] = (iters >> 8) & 0xFF;
+    req[6] = (iters >> 16) & 0xFF; req[7] = (iters >> 24) & 0xFF;
+    req[8] = (addr >> 0) & 0xFF; req[9] = (addr >> 8) & 0xFF;
+    req[10] = (addr >> 16) & 0xFF; req[11] = (addr >> 24) & 0xFF;
+
+    proto_send(CMD_MEMBW, req, 12);
+
+    memcpy(mock_rx, mock_tx, mock_tx_len);
+    mock_rx_len = mock_tx_len;
+    mock_rx_pos = 0;
+
+    uint8_t buf[MAX_PAYLOAD + 16];
+    uint32_t len = 0;
+    uint8_t cmd = proto_recv(buf, &len, 1000);
+    ASSERT(cmd == CMD_MEMBW, "membw request: command opcode");
+    ASSERT(len == 12, "membw request: payload length");
+    ASSERT(memcmp(buf, req, 12) == 0, "membw request: payload bytes");
+}
+
+static void test_proto_membw_response_framing(void) {
+    mock_reset();
+
+    /* Device → host: 32-byte response. Build with synthetic values that
+     * exercise all 8 little-endian word fields. */
+    uint8_t resp[32];
+    uint32_t fields[8] = {
+        0x40400000u, /* base */
+        4u << 20, /* size = 4 MiB */
+        8u, /* iters */
+        24000000u, /* timer_hz */
+        123456u, /* memset_ticks */
+        654321u, /* read_ticks */
+        999999u, /* memcpy_ticks */
+        1u, /* cpu_arch */
+    };
+    for (int i = 0; i < 8; i++) {
+        resp[i*4 + 0] = (fields[i] >> 0) & 0xFF;
+        resp[i*4 + 1] = (fields[i] >> 8) & 0xFF;
+        resp[i*4 + 2] = (fields[i] >> 16) & 0xFF;
+        resp[i*4 + 3] = (fields[i] >> 24) & 0xFF;
+    }
+
+    proto_send(RSP_MEMBW, resp, 32);
+
+    memcpy(mock_rx, mock_tx, mock_tx_len);
+    mock_rx_len = mock_tx_len;
+    mock_rx_pos = 0;
+
+    uint8_t buf[MAX_PAYLOAD + 16];
+    uint32_t len = 0;
+    uint8_t cmd = proto_recv(buf, &len, 1000);
+    ASSERT(cmd == RSP_MEMBW, "membw response: command opcode");
+    ASSERT(len == 32, "membw response: payload length");
+    ASSERT(memcmp(buf, resp, 32) == 0, "membw response: payload bytes");
+}
+
 /*
  * page_is_ff helper: verify it correctly identifies all-0xFF pages
  * and rejects pages with even a single non-0xFF byte.
@@ -548,6 +621,8 @@ int main(void) {
     test_proto_recv_bad_crc();
     test_proto_max_payload();
     test_proto_multiple_packets();
+    test_proto_membw_request_framing();
+    test_proto_membw_response_framing();

     printf("Cross-compatibility:\n");
     test_cobs_matches_python();
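The byte-by-byte packing in the C framing tests has a compact host-side equivalent. The following sketch builds and parses the 12-byte request payload with Python's `struct` module; `pack_membw_request` and `unpack_membw_request` are hypothetical names rather than functions from the defib client, and the packet framing that `proto_send` wraps around the payload is deliberately left out.

```python
import struct

CMD_MEMBW = 0x0D  # opcode from agent/protocol.h


def pack_membw_request(size: int = 0, iters: int = 0, addr: int = 0) -> bytes:
    """Build the 12-byte payload: size, iters, addr as little-endian u32.

    Zero in any field asks the agent to apply its default.
    """
    return struct.pack("<III", size, iters, addr)


def unpack_membw_request(payload: bytes):
    """Parse a request payload the way the agent reads it (first 12 bytes)."""
    if len(payload) < 12:
        raise ValueError("membw request needs at least 12 bytes")
    return struct.unpack_from("<III", payload)
```

Packing the values used in `test_proto_membw_request_framing` (4 MiB, 8 iterations, address `0x40400000`) yields the same twelve bytes the C test assembles by shifting and masking.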
