Skip to content

Commit b07ce50

Browse files
widgetiiclaude
andauthored
agent: add hi3520dv200 + HISFC350 SPI flash driver (#96)
## Summary - Adds support for **hi3520dv200** (V1-era HiSilicon DVR/NVR SoC, Cortex-A9, Macronix MX25L25635E 32 MiB NOR on CS1) to the bare-metal flash agent. - Introduces a **HISFC350** SPI flash driver (\`spi_flash_hisfc350.c\`) selectable via \`SPI_DRIVER=hisfc350\` in the per-SoC Makefile stanza. Default stays FMC100 for every other supported SoC. - Fixes a long-standing **selfupdate cache-coherency bug** that affected ARMv7-A in general (the trampoline copy never cleaned D-cache, so the BX target read stale memory; survived on Cortex-A7 by coincidence, deterministically broken on Cortex-A9). - Bumps INFO response to v3 with a \`flash_mem\` field so the host stops hardcoding \`0x14000000\` (hi3520dv200 has FLASH_MEM at \`0x58000000\`). ## What's new | File | Change | |---|---| | \`agent/spi_flash_hisfc350.c\` (new) | NOR-only HISFC350 driver. Probes both CS lines. EAR-based bank-switching for >16 MiB chips. | | \`agent/main.c\` | INFO v3 (+\`flash_mem\`); cache-aware selfupdate + trampoline; widened \`addr_readable\`/\`io_region\` for V1 peripheral block. | | \`agent/uart.c\` | Preserve bootrom IBRD/FBRD (works on chips where bootrom UART runs off a slow reference); divisor-scaling \`set_baud\`; bounded FIFO drain; IBRD safety check. | | \`agent/Makefile\` | \`SPI_DRIVER\` selector; hi3520dv200 SoC stanza. | | \`agent/README.md\` | Update SoC table; document HISFC350 vs FMC100. | | \`src/defib/agent/client.py\` | Parse \`flash_mem\` from INFO v3+; expose \`client.flash_mem\`; add hi3520dv200 to \`chip_to_agent\`. | | \`src/defib/cli/app.py\` | Use \`client.flash_mem\` instead of hardcoded \`0x14000000\` in \`agent install\`, \`agent read\`, flash-doctor. | ## Why HISFC350 instead of EN4B for 32 MiB chips EN4B (true 4-byte mode, 0xB7) was tried first on the MX25L25635E. The HISFC350's \`GLOBAL_CONFIG.ADDR_MODE_4B\` bit latched cleanly, but reads at offsets ≥16 MiB returned a fixed repeating pattern instead of real data — the chip never actually entered 4-byte mode, for reasons we haven't figured out. Switching to **WREAR (0xC5) + EAR-banking** (3-byte addressing with a 25th-bit selector) makes both halves accessible. The vendor U-Boot driver \`hisfc350_spi_mx25l25635e.c\` follows the same approach. ## Why selfupdate needed cache fixes The old trampoline did a byte-copy then \`BX r3\`. On ARMv7-A with write-back D-cache: 1. Byte writes go to D-cache, not memory. 2. I-fetch at the destination misses or hits stale lines. 3. Result: the BX jumps into garbage (or, when binaries happen to be byte-identical, into the right code "by accident"). This worked on the existing Cortex-A7 boards because we rarely deployed a binary with diverging early-startup bytes. On Cortex-A9 (hi3520dv200) it broke deterministically. Fix: - The trampoline now walks the destination range invalidating I-cache and cleaning D-cache per line (DCCMVAU + ICIMVAU + DSB + ISB + BX). - \`handle_selfupdate\` also cleans the trampoline location itself before calling the function pointer — without this, the trampoline bytes themselves are in D-cache and the I-fetch reads stale memory. ## Verified locally - \`make -C agent test HOST_CC=gcc\` — 5406 / 5406 unit tests pass. - \`uv run pytest tests/ -x -q --ignore=tests/fuzz\` — 490 pass. - \`uv run ruff check src/ tests/\` — clean. - \`uv run mypy src/defib/ --ignore-missing-imports\` — clean. - All 10 supported SoCs build: hi3516ev300, hi3516ev200, gk7205v200, gk7205v300, hi3516cv300, hi3516cv500, hi3518ev200, hi3516cv610, hi3519v101, hi3520dv200. ## Verified on real hardware (hi3520dv200 on /dev/ttyUSB3, manual power-cycle) - Agent uploads via \`defib agent upload\`, sends READY, reports \`JEDEC c22019 / 32768 KB / sector 64 KB / RAM 0x80000000 / FLASH_MEM 0x58000000 / agent v3\`. - Reads from the lower 16 MiB return real vendor U-Boot bytes; reads from the upper 16 MiB return DIFFERENT real data (bank-switching works end to end). - Selfupdate verified end-to-end (no power-cycle needed for subsequent agent updates). ## Known limitations / follow-ups Tracked separately in [#97] (issue opened in this PR sequence). Briefly: - High baud (\`set_baud\` >115200) is **rejected** by the IBRD safety check on hi3520dv200 because the bootrom hands UART running off a ~2 MHz reference. Vendor U-Boot \`board.c\` clears bit 13 of \`CRG+0xE4\` to switch UART onto APB; we tried that but the resulting UART rate didn't match any guess at the actual APB clock. Need to identify the real clock empirically (probably needs a power-cycle relay so we can iterate without manual intervention). - Stock-firmware dump and write/erase tests deferred until high-baud works (32 MiB at 115200 ≈ 54 min round-trip). ## Test plan - [ ] On a power-cycle relay-equipped rig: run \`defib agent upload -c hi3520dv200 -p \$PORT\`, confirm JEDEC + 32 MiB. - [ ] Dump full 32 MiB to a file, verify CRC, store as stock-firmware backup. - [ ] Erase a known-empty sector (e.g. \`0xFE0000\`), verify it reads as 0xFF. - [ ] Write a small known pattern, read back, verify CRC matches. - [ ] Re-flash the original sector contents, verify chip is back to stock. - [ ] Test scan over the full 32 MiB. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 89576b4 commit b07ce50

8 files changed

Lines changed: 692 additions & 45 deletions

File tree

agent/.gitignore

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,7 @@
11
*.o
22
*.elf
33
*-padded.bin
4+
# Host-built unit-test binaries (built by `make test HOST_CC=gcc`).
5+
test_agent
6+
test_cobs_detail
7+
test_firmware_data

agent/Makefile

Lines changed: 44 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -119,8 +119,39 @@ else ifeq ($(SOC),hi3519v101)
119119
WDT_BASE = 0x12080000
120120
CRG_BASE = 0x12010000
121121
SYSCTRL_REBOOT = 0x12020004
122+
else ifeq ($(SOC),hi3520dv200)
123+
# V1-era DVR/NVR generation — Cortex-A9 single-core, 0x2xxxxxxx peripheral
124+
# map, HISFC350 SPI controller (NOT the FMC100 used by V3+/V4+/V5/V6).
125+
# UART_CLOCK = CFG_CLK_BUS / 2 (per vendor Linux mach-godarm/core.c
126+
# uart_clk_init: busclk=396 MHz / 2 = 198 MHz). The bootrom hands us a
127+
# UART running off a slow ~2 MHz reference (IBRD=1, FBRD=5), so uart_init
128+
# must first clear UART_CKSEL bit 13 of CRG+0xE4 to switch UART to APB
129+
# clock — only then can we hit baud rates above 115200.
130+
# FLASH_MEM at 0x58000000 per vendor mach-godarm SPI_MEM_BASE;
131+
# FMC_BASE at 0x10010000 (HISFC350 register block).
132+
CPU_TYPE = cortex-a9
133+
UART_BASE = 0x20080000
134+
# The bootrom hands UART running off a slow ~2 MHz reference clock
135+
# (IBRD=1 / FBRD=5 → 115200 baud). uart_init preserves those divisors
136+
# so 115200 keeps working without any CRG poking. set_baud above
137+
# 115200 cannot succeed at this clock (would need IBRD<1) and is
138+
# rejected by the IBRD safety check. Switching to APB via vendor's
139+
# `CRG+0xE4 bit 13` produced an unidentified clock rate that we
140+
# couldn't decode at any standard baud — see GitHub issue tracking
141+
# the high-baud TODO. UART_CLOCK is the vendor U-Boot value
142+
# (CFG_CLK_BUS/4 per include/configs/hi3520d.h) but is unused while
143+
# the bootrom-divisor preservation path is active.
144+
UART_CLOCK = 99000000
145+
LOAD_ADDR = 0x81000000
146+
FLASH_MEM = 0x58000000
147+
FMC_BASE = 0x10010000
148+
RAM_BASE = 0x80000000
149+
WDT_BASE = 0x20040000
150+
CRG_BASE = 0x20030000
151+
SYSCTRL_REBOOT = 0x20050004
152+
SPI_DRIVER = hisfc350
122153
else
123-
$(error Unknown SOC: $(SOC). Supported: hi3516ev300 hi3516ev200 gk7205v200 gk7205v300 hi3516cv300 hi3516cv500 hi3518ev200 hi3516cv610 hi3519v101)
154+
$(error Unknown SOC: $(SOC). Supported: hi3516ev300 hi3516ev200 gk7205v200 gk7205v300 hi3516cv300 hi3516cv500 hi3518ev200 hi3516cv610 hi3519v101 hi3520dv200)
124155
endif
125156

126157
# Per-SoC CPU. V3 chips (cv300) are ARM926EJ-S (ARMv5TEJ);
@@ -136,6 +167,7 @@ CFLAGS = -mcpu=$(CPU_TYPE) -marm -O2 -ffreestanding -nostdlib \
136167
-DUART_BASE=$(UART_BASE) -DUART_CLOCK=$(UART_CLOCK) $(CPU_FLAG) \
137168
-DFLASH_MEM=$(FLASH_MEM) -DFMC_BASE=$(FMC_BASE) -DRAM_BASE=$(RAM_BASE) -DWDT_BASE=$(WDT_BASE) \
138169
-DCRG_BASE=$(CRG_BASE) -DSYSCTRL_REBOOT=$(SYSCTRL_REBOOT) \
170+
$(if $(UART_CKSEL_REG),-DUART_CKSEL_REG=$(UART_CKSEL_REG) -DUART_CKSEL_BIT=$(UART_CKSEL_BIT)) \
139171
-mno-unaligned-access -Wall -Wextra
140172

141173
LDFLAGS = -nostdlib -T link.ld -Ttext=$(LOAD_ADDR) --defsym=LOAD_ADDR=$(LOAD_ADDR)
@@ -145,7 +177,17 @@ LDFLAGS = -nostdlib -T link.ld -Ttext=$(LOAD_ADDR) --defsym=LOAD_ADDR=$(LOAD_ADD
145177
# but linking it costs nothing.
146178
LIBGCC := $(shell $(CC) -mcpu=$(CPU_TYPE) -print-libgcc-file-name)
147179

148-
SRCS_C = main.c uart.c protocol.c cobs.c spi_flash.c
180+
# SPI flash driver — default FMC100 (V3+/V4+/V5/V6 chips). V1-era chips
181+
# (hi3520dv200, hi3518ev100, hi3516cv100, hi3535) need HISFC350 instead.
182+
# Per-SoC stanza above sets SPI_DRIVER when needed.
183+
SPI_DRIVER ?= fmc100
184+
ifeq ($(SPI_DRIVER),fmc100)
185+
SPI_FLASH_SRC = spi_flash.c
186+
else
187+
SPI_FLASH_SRC = spi_flash_$(SPI_DRIVER).c
188+
endif
189+
190+
SRCS_C = main.c uart.c protocol.c cobs.c $(SPI_FLASH_SRC)
149191
SRCS_S = startup.S
150192
OBJS = $(SRCS_S:.S=.o) $(SRCS_C:.c=.o)
151193

agent/README.md

Lines changed: 18 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -78,15 +78,30 @@ Requires `arm-none-eabi-gcc` (Arch: `pacman -S arm-none-eabi-gcc arm-none-eabi-n
7878
| hi3516cv500 | 0x12100000 | 0x14000000 | 0x80000000 | 0x81000000 |
7979
| hi3518ev200 | 0x12100000 | 0x14000000 | 0x80000000 | 0x81000000 |
8080
| hi3516cv610 | 0x11040000 | 0x14000000 | 0x40000000 | 0x41000000 |
81+
| hi3519v101 | 0x12100000 | 0x14000000 | 0x80000000 | 0x81000000 |
82+
| hi3520dv200 | 0x20080000 | 0x58000000 | 0x80000000 | 0x81000000 |
8183

8284
Addresses from [qemu-hisilicon](https://github.com/OpenIPC/LoTool) hardware definitions.
8385

86+
`hi3520dv200` is a V1-era DVR/NVR SoC: Cortex-A9 (single core), `0x2xxxxxxx`
87+
peripheral map, and a HISFC350 SPI flash controller (NOT the FMC100 used by
88+
all other supported SoCs). The HISFC350 driver lives in
89+
`spi_flash_hisfc350.c` and is selected via `SPI_DRIVER=hisfc350` in the
90+
per-SoC Makefile stanza.
91+
8492
## Agent Binary Details
8593

86-
- **Size**: ~4 KB code + 1 KB CRC32 table = ~5 KB total
94+
- **Size**: ~4 KB code + 1 KB CRC32 table = ~5 KB total (HISFC350 build is
95+
somewhat larger because of the bank-switching machinery)
8796
- **Runs bare-metal**: no OS, no U-Boot, no libc
88-
- **UART**: ARM PrimeCell PL011 (polled I/O, 115200 8N1)
89-
- **Flash**: HiSilicon FMC controller (memory-mapped read, register write/erase)
97+
- **UART**: ARM PrimeCell PL011 (polled I/O, 115200 8N1). On hi3520dv200 the
98+
bootrom hands UART running off a slow ~2 MHz reference; we preserve those
99+
divisors at startup so 115200 keeps working without any CRG poking.
100+
- **Flash**: HiSilicon FMC controller (default) — memory-mapped read,
101+
register write/erase. V1-era chips (hi3520dv200) use the HISFC350
102+
controller instead, with the same external API (`flash_read`,
103+
`flash_erase_sector`, `flash_write_page`, ...) so `main.c` is
104+
controller-agnostic.
90105
- **Protocol**: COBS framing + CRC-32 per packet
91106

92107
### Startup sequence

agent/main.c

Lines changed: 90 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -100,18 +100,31 @@ static int addr_readable(uint32_t addr, uint32_t size) {
100100
/* RAM region: RAM_BASE to RAM_BASE + 128MB */
101101
if (addr >= RAM_BASE && (addr + size) <= (RAM_BASE + 128 * 1024 * 1024))
102102
return 1;
103-
/* I/O register regions (CRG, FMC controller, system controller) */
103+
/* I/O register regions — V3+/V4+ default whitelist. */
104104
if (addr >= 0x10000000 && (addr + size) <= 0x10001000) return 1; /* FMC regs */
105105
if (addr >= 0x12010000 && (addr + size) <= 0x12020000) return 1; /* CRG */
106106
if (addr >= 0x12020000 && (addr + size) <= 0x12030000) return 1; /* SYS_CTRL */
107+
/* Per-SoC controller regions (FMC + CRG + UART) — covers V1-era
108+
* chips where these don't sit in the default whitelist (e.g.
109+
* hi3520dv200 has FMC at 0x10010000, CRG at 0x20030000, UART at
110+
* 0x20080000). 4 KiB window per controller is enough for any
111+
* HiSilicon SPI-flash controller. UART access is needed to
112+
* verify what divisors the bootrom programmed. */
113+
if (addr >= FMC_BASE && (addr + size) <= (FMC_BASE + 0x1000)) return 1;
114+
if (addr >= CRG_BASE && (addr + size) <= (CRG_BASE + 0x1000)) return 1;
115+
if (addr >= UART_BASE && (addr + size) <= (UART_BASE + 0x1000)) return 1;
107116
/* Flash memory-mapped window — only after flash_init succeeds */
108117
if (flash_readable && addr >= FLASH_MEM && (addr + size) <= (FLASH_MEM + 32 * 1024 * 1024))
109118
return 1;
110119
return 0;
111120
}
112121

113-
/* Agent protocol version — increment on protocol changes */
114-
#define AGENT_VERSION 2
122+
/* Agent protocol version — increment on protocol changes.
123+
* v3 added: flash_mem at INFO bytes 24..27 (so the host knows which
124+
* memory-mapped flash window CMD_CRC32/CMD_READ should target on
125+
* SoCs where it isn't 0x14000000 — e.g. hi3520dv200 has it at
126+
* 0x58000000). */
127+
#define AGENT_VERSION 3
115128

116129
/* Capability flags — advertise supported features */
117130
#define CAP_FLASH_STREAM (1 << 0) /* CMD_FLASH_STREAM with double-buffer */
@@ -126,7 +139,7 @@ static int addr_readable(uint32_t addr, uint32_t size) {
126139
CAP_SET_BAUD | CAP_REBOOT | CAP_SELFUPDATE | CAP_SCAN)
127140

128141
static void handle_info(void) {
129-
uint8_t resp[24];
142+
uint8_t resp[28];
130143
/* JEDEC ID in first 4 bytes (3 bytes + padding) */
131144
resp[0] = flash_info.jedec_id[0];
132145
resp[1] = flash_info.jedec_id[1];
@@ -137,7 +150,8 @@ static void handle_info(void) {
137150
write_le32(&resp[12], flash_info.sector_size); /* NOR=64K, NAND=128K */
138151
write_le32(&resp[16], AGENT_VERSION);
139152
write_le32(&resp[20], AGENT_CAPS);
140-
proto_send(RSP_INFO, resp, 24);
153+
write_le32(&resp[24], FLASH_MEM);
154+
proto_send(RSP_INFO, resp, 28);
141155
}
142156

143157
static void handle_read(const uint8_t *data, uint32_t len) {
@@ -153,8 +167,13 @@ static void handle_read(const uint8_t *data, uint32_t len) {
153167
}
154168

155169
/* I/O registers require 32-bit word-aligned access (ldr not ldrb).
156-
* RAM and flash can use byte access. */
157-
int io_region = (addr >= 0x10000000 && addr < 0x13000000);
170+
* RAM and flash can use byte access. Cover the V3+/V4+/V5/V6
171+
* peripheral block (0x10000000..0x13000000) as well as the V1-era
172+
* regions actually used by V1 SoCs (FMC, CRG, UART). */
173+
int io_region = (addr >= 0x10000000 && addr < 0x13000000)
174+
|| (addr >= FMC_BASE && addr < FMC_BASE + 0x1000)
175+
|| (addr >= CRG_BASE && addr < CRG_BASE + 0x1000)
176+
|| (addr >= UART_BASE && addr < UART_BASE + 0x1000);
158177

159178
uint16_t seq = 0;
160179
uint32_t offset = 0;
@@ -693,19 +712,21 @@ static void handle_flash_stream(const uint8_t *data, uint32_t len) {
693712

694713
/*
695714
* ARM32 position-independent trampoline (machine code).
696-
* Copies r2 bytes from r1 to r0, then branches to r0-r2 (original dst).
715+
* Copies r2 bytes from r1 to r0, then branches to r0 (original dst).
697716
*
698-
* mov r3, r0 @ save dst
699-
* cmp r2, #0
700-
* beq done
701-
* loop:
702-
* ldrb r4, [r1], #1
703-
* strb r4, [r0], #1
704-
* subs r2, r2, #1
705-
* bne loop
706-
* done:
707-
* bx r3 @ jump to saved dst
717+
* ARMv7-A variant additionally cleans D-cache and invalidates I-cache
718+
* over the destination range before BX so the I-fetch unit pulls the
719+
* just-written bytes from memory instead of stale lines from when the
720+
* old agent occupied that address. Without this, selfupdate either
721+
* runs with stale instructions (silent crash, no UART) or — when both
722+
* binaries happen to be byte-identical at the cached lines — works by
723+
* coincidence.
724+
*
725+
* ARM926 (ARMv5TEJ) keeps the original short trampoline; if/when we
726+
* actually exercise selfupdate on ARM926 boards we'll need the same
727+
* treatment with V5 cache-op encodings.
708728
*/
729+
#ifdef CPU_ARM926
709730
static const uint32_t trampoline_arm[] = {
710731
0xe1a03000, /* mov r3, r0 ; save dst */
711732
0xe3520000, /* cmp r2, #0 */
@@ -718,6 +739,36 @@ static const uint32_t trampoline_arm[] = {
718739
/* done: */
719740
0xe12fff13, /* bx r3 ; jump to dst */
720741
};
742+
#else
743+
/* ARMv7-A: copy + per-line DCCMVAU + ICIMVAU + DSB + ISB + BX.
744+
* Source: agent/tramp_v7.S, assembled with arm-none-eabi-gcc -mcpu=cortex-a7.
745+
* Walks the destination in 32-byte cache-line steps (smallest line size
746+
* among supported ARMv7-A cores; over-walking is harmless). */
747+
static const uint32_t trampoline_arm[] = {
748+
0xe1a03000, /* mov r3, r0 ; save dst */
749+
0xe1a05002, /* mov r5, r2 ; save size */
750+
0xe3520000, /* cmp r2, #0 */
751+
0x0a000003, /* beq done */
752+
/* copy_loop: */
753+
0xe4d14001, /* ldrb r4, [r1], #1 */
754+
0xe4c04001, /* strb r4, [r0], #1 */
755+
0xe2522001, /* subs r2, r2, #1 */
756+
0x1afffffb, /* bne copy_loop */
757+
/* done: */
758+
0xe1a00003, /* mov r0, r3 ; restore dst start */
759+
0xe0836005, /* add r6, r3, r5 ; end = dst + size */
760+
0xe3c0001f, /* bic r0, r0, #31 ; round down to cache line */
761+
/* cache_loop: */
762+
0xee070f3b, /* mcr p15, 0, r0, c7, c11, 1 ; DCCMVAU */
763+
0xee070f35, /* mcr p15, 0, r0, c7, c5, 1 ; ICIMVAU */
764+
0xe2800020, /* add r0, r0, #32 */
765+
0xe1500006, /* cmp r0, r6 */
766+
0xbafffffa, /* blt cache_loop */
767+
0xf57ff04f, /* dsb sy */
768+
0xf57ff06f, /* isb sy */
769+
0xe12fff13, /* bx r3 */
770+
};
771+
#endif
721772

722773
/*
723774
* CMD_SELFUPDATE: receive new agent binary, verify, copy and jump.
@@ -779,6 +830,27 @@ static void handle_selfupdate(const uint8_t *data, uint32_t len) {
779830
for (uint32_t i = 0; i < sizeof(trampoline_arm) / sizeof(uint32_t); i++)
780831
tramp_dst[i] = trampoline_arm[i];
781832

833+
/* CRITICAL: writes above land in D-cache. Without explicit cache
834+
* maintenance the CPU's I-fetch at RAM_BASE+0x200 reads stale memory
835+
* contents (whatever was there before — typically zeros) instead of
836+
* the trampoline bytes, and execution branches into garbage.
837+
* Clean D-cache for the trampoline range to push writes to memory,
838+
* then invalidate the entire I-cache so new fetches hit memory. */
839+
#ifndef CPU_ARM926
840+
{
841+
uintptr_t start = (uintptr_t)tramp_dst & ~31u;
842+
uintptr_t end = (uintptr_t)tramp_dst + sizeof(trampoline_arm);
843+
for (uintptr_t a = start; a < end; a += 32) {
844+
asm volatile("mcr p15, 0, %0, c7, c11, 1" :: "r"(a) : "memory");
845+
}
846+
asm volatile("dsb" ::: "memory");
847+
asm volatile("mcr p15, 0, %0, c7, c5, 0" :: "r"(0) : "memory"); /* ICIALLU */
848+
asm volatile("mcr p15, 0, %0, c7, c5, 6" :: "r"(0) : "memory"); /* BPIALL */
849+
asm volatile("dsb" ::: "memory");
850+
asm volatile("isb" ::: "memory");
851+
}
852+
#endif
853+
782854
void (*tramp)(uint32_t, uint32_t, uint32_t) =
783855
(void (*)(uint32_t, uint32_t, uint32_t))(void *)(RAM_BASE + 0x200);
784856
tramp(addr, staging, size);

0 commit comments

Comments
 (0)