
agent: bump client.crc32 timeout for V1-era SoCs#98

Merged
widgetii merged 1 commit into master from agent-crc32-timeout-v1soc
May 13, 2026

Conversation

@widgetii
Member

Summary

client.crc32()'s timeout formula max(10, 5 + size_MiB) is too aggressive for V1-era HiSilicon SoCs (hi3520dv200 / HISFC350 controller) where the agent walks flash via the AHB STD READ window at ~150 KB/s.

A 4 MiB CRC32 needed ~27 s on hi3520dv200, but the host gave up at 10 s and reported "No packet received within 10s". That message is misleading: the device was still computing and its response was already on the way.

New formula: max(15, 5 + size / (100 * 1024)) — assumes 100 KB/s of effective throughput (safe under the slow V1 path) with a 15 s baseline. V3+/V4+ DMA reads at multi-MB/s still complete well under the ceiling.
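For reference, a minimal sketch of the two formulas side by side. The function names and the `estimated_read_seconds` helper are hypothetical and are not taken from defib's source. The sketch also assumes `size` is a byte count, that division is floating-point, and that "150 KB/s" means 150 * 1024 bytes per second.

```python
MIB = 1024 * 1024


def old_crc32_timeout(size: int) -> float:
    """Old formula: max(10, 5 + size_MiB), in seconds (hypothetical name)."""
    return max(10.0, 5.0 + size / MIB)


def new_crc32_timeout(size: int) -> float:
    """New formula: max(15, 5 + size / (100 * 1024)), in seconds (hypothetical name)."""
    return max(15.0, 5.0 + size / (100 * 1024))


def estimated_read_seconds(size: int, rate_bytes_per_s: float = 150 * 1024) -> float:
    """Rough time to walk `size` bytes at the slow V1 rate (assumed ~150 KiB/s)."""
    return size / rate_bytes_per_s


if __name__ == "__main__":
    size = 4 * MIB
    print(f"estimated V1 read time: {estimated_read_seconds(size):.1f} s")
    print(f"old timeout:            {old_crc32_timeout(size):.1f} s")
    print(f"new timeout:            {new_crc32_timeout(size):.1f} s")
```

For a 4 MiB region the old formula stays at its 10 s floor, below the roughly 27 s the slow path needs, while the new formula allows about 46 s.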

Why this matters

Caught after a real install on hi3520dv200: a relocation script crashed at the 10 s CRC timeout, forcing me to split the operation into two scripts. The second script's "erase old location" then wiped 2.74 MiB of the freshly-written-and-verified rootfs because it didn't subtract the new write's footprint — see OpenIPC/firmware#2089 (closed) for the full incident write-up.

The bug was operator-level (not in defib's blessed install flow), but the too-tight CRC timeout was the contributing factor that drove me to the staged-script approach where the overlap mistake became easy to make. Fixing this is the minimum-viable defib change that would have averted the chain.

Test plan

  • All 490 Python tests pass (uv run pytest tests/ -x -q --ignore=tests/fuzz)
  • ruff + mypy clean
  • Verified on real hi3520dv200 hardware: 3.74 MiB CRC completes in ~25 s and matches the source CRC32

🤖 Generated with Claude Code

The old timeout formula `max(10, 5 + size/MiB)` assumed a fast DMA read
path through the agent's flash_crc32. That's true on V3+/V4+/V5/V6 SoCs
(FMC100, multi-MB/s), but on V1-era HiSilicon parts the agent uses an
AHB STD READ via the memory-mapped window (HISFC350, hi3520dv200) which
tops out at ~150 KB/s when the agent walks each byte through the
controller. At that rate a 4 MiB CRC32 takes ~27 s — the old formula
bailed at 10 s and surfaced misleading "agent stopped responding" /
"No packet received within 10s" errors even though the device was
still computing the CRC and the response was already on its way.

Bumping to `max(15, 5 + size/100KB)` budgets for 100 KB/s of effective
throughput, comfortably below the ~150 KB/s the worst-case V1 path
actually achieves, plus a 15 s baseline for round-trip latency.
V3+/V4+ chips at multi-MB/s still complete well under the new ceiling.

Verified against hi3520dv200 (MX25L25635E 32 MiB NOR on CS1): full
3.74 MiB rootfs CRC32 now completes in ~25 s and matches every time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit 3430c6a into master May 13, 2026
13 checks passed
@widgetii widgetii deleted the agent-crc32-timeout-v1soc branch May 13, 2026 17:34