From c519f8eaa9e4192bbc0db6f2447761938f7dbf73 Mon Sep 17 00:00:00 2001
From: Dmitry Ilyin
Date: Mon, 11 May 2026 23:41:38 +0300
Subject: [PATCH] agent: post-erase verify must use register-mode read past 1 MB
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

flash_verify_erased read sample bytes directly from FLASH_MEM (the
memory-mapped window), which on hi3516ev300 wraps at 1 MB. For any
sector at offset ≥ 0x100000 the verify read returned bytes from
sector (offset % 0x100000) instead of the actual just-erased sector,
so the smoke test saw non-0xFF data and reported ACK_FLASH_ERROR
even though the erase succeeded.

Effect: write_flash to any offset past 1 MB on hi3516ev300 (and any
other SoC where the boot-mode memory window wraps at 1 MB) failed
spuriously. Visible on W25Q128 (16 MB NOR): 12 sectors of a kernel
write completed, then sector 13 at flash offset 0x110000 failed with
ACK_FLASH_ERROR (0x02). The same chip programmed cleanly via U-Boot's
`sf write`, and the agent's CRC32-based higher-level path verified
that data, so the bug was localised to the post-erase smoke test.

Fix: route the verify reads through flash_read() (register-mode SPI
READ via FMC normal-mode), the same path flash_read_full has used
since the 1 MB-window workaround landed. The same wrap hazard applies
to the verify path, for identical reasons.

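For illustration only, the aliasing can be reproduced with a small
host-side model. This is a sketch, not agent code: the 2 MB model chip
and every name below (chip, window_read, direct_read, verify_erased,
setup_erased_sector) are hypothetical, and the window is assumed to
decode only the low 20 address bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a 2 MB "chip" seen through a 1 MB window. */
#define CHIP_SIZE   0x200000u
#define WINDOW_SIZE 0x100000u
#define SECTOR_SIZE 0x10000u

static uint8_t chip[CHIP_SIZE];

/* Memory-mapped style read: the offset wraps at WINDOW_SIZE. */
static uint8_t window_read(uint32_t offset) {
    return chip[offset % WINDOW_SIZE];
}

/* Register-mode style read: the full offset reaches the chip. */
static uint8_t direct_read(uint32_t offset) {
    return chip[offset % CHIP_SIZE];
}

/* Returns 0 if the first n bytes at offset read as 0xFF, -1 otherwise. */
static int verify_erased(uint8_t (*rd)(uint32_t), uint32_t offset, uint32_t n) {
    for (uint32_t i = 0; i < n; i++) {
        if (rd(offset + i) != 0xFF) return -1;
    }
    return 0;
}

/* Fill the model chip with non-FF data, then erase one sector. */
static void setup_erased_sector(uint32_t sector_offset) {
    assert(sector_offset + SECTOR_SIZE <= CHIP_SIZE);
    memset(chip, 0xA5, sizeof(chip));
    memset(chip + sector_offset, 0xFF, SECTOR_SIZE);
}
```

In this model, after setup_erased_sector(0x110000), verify_erased()
passes with direct_read but fails with window_read, because the
windowed read lands on the still-programmed sector at 0x010000.
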
Confirmed on hardware against rack pod 10.216.128.69 (hi3516ev300 +
W25Q128):

  Before fix:
    0x00050000: OK in 6.4s CRC match=True      ← <1 MB
    0x000C0000: OK in 6.8s CRC match=True      ← <1 MB
    0x00110000: FAIL in 6.1s                   ← =1 MB+0x10000
    0x00350000: FAIL in 6.1s                   ← 3.3 MB
    0x00F00000: FAIL in 6.1s                   ← 15 MB

  After fix:
    0x00050000: OK in 6.4s CRC match=True
    0x000C0000: OK in 6.3s CRC match=True
    0x00110000: OK in 6.2s CRC match=True  ✓
    0x00350000: OK in 6.3s CRC match=True  ✓
    0x00F00000: OK in 6.3s CRC match=True  ✓

Full nor-neo install through the agent (kernel 2.0 MB + rootfs 4.2 MB)
now completes end-to-end in 92 s at 81 KB/s sustained, Linux boots to
`openipc-hi3516ev300 login:`.

Suite: 480 passed / 2 skipped; agent C tests: 5406/5406 passed.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 agent/spi_flash.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/agent/spi_flash.c b/agent/spi_flash.c
index fda4a2c..96e1f34 100644
--- a/agent/spi_flash.c
+++ b/agent/spi_flash.c
@@ -649,16 +649,26 @@ uint8_t flash_read_status(void) {
  * Samples first/last 16 bytes of the range — fast (one register-mode
  * cycle each) and high-signal: real silent-erase leaves the original
  * data verbatim, which is essentially never all-FF in practice.
- * Returns 0 if all sampled bytes are 0xFF, -1 otherwise. */
+ * Returns 0 if all sampled bytes are 0xFF, -1 otherwise.
+ *
+ * Uses flash_read() (register-mode SPI READ) instead of direct
+ * memory-mapped access. The boot-mode memory window at FLASH_MEM
+ * wraps at 1 MB on some SoCs (hi3516ev300 confirmed), so for any
+ * sector at offset ≥ 0x100000 a direct memory-mapped read returns
+ * stale data from sector (addr % 0x100000) and falsely fails the
+ * verify even though the erase succeeded. flash_read() goes through
+ * the FMC's normal-mode SPI READ path which addresses the full chip.
+ */
 static int flash_verify_erased(uint32_t addr, uint32_t len) {
-    const uint8_t *p = (const uint8_t *)(FLASH_MEM + addr);
-    uint32_t head = len < 16 ? len : 16;
+    uint8_t buf[16];
+    uint32_t head = len < sizeof(buf) ? len : sizeof(buf);
+    flash_read(addr, buf, head);
     for (uint32_t i = 0; i < head; i++) {
-        if (p[i] != 0xFF) return -1;
+        if (buf[i] != 0xFF) return -1;
     }
     if (len > 32) {
-        for (uint32_t i = len - 16; i < len; i++) {
-            if (p[i] != 0xFF) return -1;
+        flash_read(addr + len - sizeof(buf), buf, sizeof(buf));
+        for (uint32_t i = 0; i < sizeof(buf); i++) {
+            if (buf[i] != 0xFF) return -1;
         }
     }
     return 0;