
Commit deabdc2

widgetii and claude authored
burn/install: route through pod-side fastboot when power=rack (#88)
## Summary

`defib burn` and `defib install` drove the HiSilicon SPL upload from the host even when the transport went through a rack pod's WiFi-bridged UART, where the per-frame ACK loop (150 ms × dozens of frames per upload) doesn't survive the round-trip latency — both commands failed at the very first PRESTEP0 frame.

Now, when `power_controller` is a `RackController`, the CLI calls the new `defib.recovery.rack_fastboot.run_rack_fastboot()` helper instead of `session.run()`. The helper packages profile + SPL + agent into the binary blob the pod's `POST /fastboot` expects, posts it, and turns the pod's phase-by-phase JSON into a `RecoveryResult`, so the rest of the CLI (terminal mode, download_process detection, TFTP scripting) stays unchanged. The pod takes exclusive UART access during the upload, so the host transport is opened only **after** fastboot returns.

## Live verification on the prototype

```
$ DEFIB_POWER_TYPE=rack DEFIB_RACK_HOST=10.216.128.69 \
    defib burn -c hi3516ev300 -p tcp://10.216.128.69:9000 \
    --power-cycle --break
Power: rack pod HTTP API
Pod-side fastboot in progress…
rack fastboot: spl=17408 agent=236195 spl_addr=0x4010500 ddr_addr=0x4013000 uboot_addr=0x41000000
Done! (25678ms)
```

```
$ # camera halted at the freshly-uploaded U-Boot prompt
> version
U-Boot 2016.11-g131d3f2 (May 08 2026 - 11:58:25 +0000) hi3516ev300
OpenIPC #
```

Build `g131d3f2` is **distinct from** the in-flash build (`g6d2ed0c-dirty`, Mar 2023) — this proves the burn landed in RAM and the chip jumped to the new image rather than falling through to flash.

## Install + restore scope

- **`install`'s Phase 1** (burn-to-RAM) now uses the same fastboot path; Phase 2 (U-Boot `tftp` + `sf write` scripting) goes over the bridge as ordinary text commands and is already known to work — TFTP-through-pod-NAPT was verified during the earlier manual kernel restore at 167 KB/s.
- **`restore`** has its own shape (frame-blast started before power-on, then power-on triggers the catch) that doesn't map cleanly onto fastboot's all-in-one semantics. Left out of scope for this PR; can be a follow-up if needed.

## Architecture note

The SPL-boundary detection (`HiSiliconStandard._detect_spl_size`) and the 0xFF-run zeroing (`_zero_long_ff_runs`) stay on the host; the pod gets ready-to-send bytes. This keeps the pod firmware minimal and ensures the two paths (host-driven and pod-driven) stay byte-identical for any chip we test.

## Test plan

- [ ] `uv run pytest tests/ -x -v --ignore=tests/fuzz` (**461 passed / 2 skipped**)
- [ ] `uv run ruff check src/defib/ tests/`
- [ ] `uv run mypy src/defib/cli/app.py src/defib/recovery/rack_fastboot.py --ignore-missing-imports`
- [ ] 4 new `TestRunRackFastboot` cases cover the success path, PRESTEP0 failure attribution, profile-address packing, and the `agent_payload` override used by agent-flash.
- [ ] Regression: existing local-UART burn / install paths unchanged — both still go through `session.run` when the power controller is RouterOS / Vectis / None.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
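As a rough sanity check of the latency argument: the 150 ms per-frame timeout and the `spl=17408` size come from the PR itself, while the frame payload and WiFi round-trip time below are illustrative assumptions, not measurements.

```python
import math

FRAME_TIMEOUT_S = 0.150   # per-frame ACK timeout from the SPL protocol (per the PR)
spl_size = 17408          # from the live run: "rack fastboot: spl=17408"

frame_payload = 1024      # hypothetical bytes per frame
wifi_rtt_s = 0.200        # hypothetical WiFi-bridged UART round trip

# Host-driven upload waits a full round trip for every frame's ACK.
frames = math.ceil(spl_size / frame_payload)

# If the bridge RTT exceeds the 150 ms timeout, even the very first
# PRESTEP0 frame times out, which matches the failure mode above.
first_frame_times_out = wifi_rtt_s > FRAME_TIMEOUT_S
```

Pod-side fastboot sidesteps the whole product (frames × RTT) because each ACK round trip happens on the pod's local UART instead.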
1 parent 004c09d commit deabdc2

3 files changed

Lines changed: 330 additions & 34 deletions

File tree

src/defib/cli/app.py

Lines changed: 96 additions & 34 deletions
```diff
@@ -145,6 +145,13 @@ def _dl_progress(done: int, total: int) -> None:
     if output == "human":
         console.print(f"Power: [cyan]{power_controller.name()}[/cyan]")
 
+    # Detect rack-pod power early: the pod runs the entire HiSilicon SPL
+    # upload locally over its UART, so we MUST NOT have a TCP client on
+    # tcp://<pod>:9000 during the upload. Defer transport-open until
+    # after fastboot returns.
+    from defib.power.rack import RackController
+    use_rack_fastboot = isinstance(power_controller, RackController)
+
     try:
         session = RecoverySession(
             chip=chip, firmware_path=firmware_path,
@@ -163,21 +170,24 @@ def _dl_progress(done: int, total: int) -> None:
     console.print(f"Protocol: [cyan]{session.protocol_name}[/cyan]")
     console.print(f"Port: [cyan]{port}[/cyan]")
 
-    # Use platform-aware transport factory
-    try:
-        from defib.transport.serial_platform import create_transport, normalize_port_name
-        transport = await create_transport(normalize_port_name(port))
-    except Exception as e:
-        if output == "json":
-            print(json_mod.dumps({"event": "error", "message": f"Serial port error: {e}"}))
-        else:
-            console.print(f"[red]Failed to open serial port:[/red] {e}")
-        raise typer.Exit(2)
+    # Use platform-aware transport factory. Skip on rack-fastboot path —
+    # the pod needs exclusive UART access during the upload.
+    from defib.transport.serial_platform import create_transport, normalize_port_name
+    transport = None
+    if not use_rack_fastboot:
+        try:
+            transport = await create_transport(normalize_port_name(port))
+        except Exception as e:
+            if output == "json":
+                print(json_mod.dumps({"event": "error", "message": f"Serial port error: {e}"}))
+            else:
+                console.print(f"[red]Failed to open serial port:[/red] {e}")
+            raise typer.Exit(2)
 
     # Vectis: hand the live RFC 2217 transport (or legacy raw TCP) to
     # the controller so RTS/DTR toggles ride the same connection that
     # the UART data uses — Vectis only allows one client at a time.
-    if power_controller is not None:
+    if power_controller is not None and transport is not None:
         from defib.power.vectis import VectisController
         from defib.transport.rfc2217 import Rfc2217Transport
         from defib.transport.socket import SocketTransport
@@ -238,12 +248,42 @@ def on_log(event: LogEvent) -> None:
             console.print(f"[{style}]{event.message}[/{style}]")
 
     try:
-        result = await session.run(
-            transport,
-            on_progress=on_progress,
-            on_log=on_log,
-            send_break=send_break,
-        )
+        if use_rack_fastboot:
+            # Pod-side fastboot: pod runs handshake + DDR + SPL + U-Boot
+            # locally on its UART (microsecond ACK latency). No host
+            # transport during the upload.
+            from pathlib import Path
+            from defib.recovery.rack_fastboot import run_rack_fastboot
+            assert isinstance(power_controller, RackController)
+            on_log(LogEvent(level="info", message="Pod-side fastboot in progress…"))
+            firmware_bytes = Path(firmware_path).read_bytes()
+            result = await run_rack_fastboot(
+                power_controller, chip, firmware_bytes,
+            )
+            # Now open the transport for the post-burn flow (terminal /
+            # download_process detection / etc.).
+            if result.success:
+                transport = await create_transport(normalize_port_name(port))
+                if send_break:
+                    # Spam Ctrl-C briefly to break U-Boot autoboot — the
+                    # host-side path does this inside send_firmware, but
+                    # we skipped that.
+                    import asyncio as _aio
+                    end = _aio.get_event_loop().time() + 2.0
+                    while _aio.get_event_loop().time() < end:
+                        try:
+                            await transport.write(b"\x03")
+                        except Exception:
+                            break
+                        await _aio.sleep(0.05)
+        else:
+            assert transport is not None  # opened above when not use_rack_fastboot
+            result = await session.run(
+                transport,
+                on_progress=on_progress,
+                on_log=on_log,
+                send_break=send_break,
+            )
     finally:
         if progress_ctx is not None:
             progress_ctx.stop()
@@ -264,8 +304,10 @@ def on_log(event: LogEvent) -> None:
         console.print(f"\n[red bold]Failed:[/red bold] {result.error}")
 
     if not result.success:
-        await transport.close()
+        if transport is not None:
+            await transport.close()
         raise typer.Exit(1)
+    assert transport is not None  # success ⇒ transport opened
 
     # Terminal mode: stream serial output until Ctrl-C
     # Auto-detects download_process mode and bridges XHEAD/XCMD framing.
@@ -1954,6 +1996,11 @@ async def _install_async(
     if output == "human":
         console.print(f"  Power: [cyan]{power_controller.name()}[/cyan]")
 
+    # Detect rack-pod power: the pod runs the SPL/DDR/U-Boot upload
+    # locally, requires exclusive UART, so we open the transport AFTER.
+    from defib.power.rack import RackController
+    use_rack_fastboot = isinstance(power_controller, RackController)
+
     session = RecoverySession(
         chip=chip, firmware_path=str(cached),
         power_controller=power_controller, poe_port=poe_port,
@@ -1964,16 +2011,18 @@ async def _install_async(
     if not power_cycle:
         console.print("  [yellow]Power-cycle the camera now![/yellow]")
 
-    transport = await create_transport(normalize_port_name(port))
+    transport = None
+    if not use_rack_fastboot:
+        transport = await create_transport(normalize_port_name(port))
 
-    # Vectis: share the TCP transport for Ctrl+P delivery (see burn).
-    if power_controller is not None:
-        from defib.power.vectis import VectisController
-        from defib.transport.socket import SocketTransport
-        if isinstance(power_controller, VectisController) and isinstance(
-            transport, SocketTransport
-        ):
-            power_controller.attach_transport(transport)
+        # Vectis: share the TCP transport for Ctrl+P delivery (see burn).
+        if power_controller is not None:
+            from defib.power.vectis import VectisController
+            from defib.transport.socket import SocketTransport
+            if isinstance(power_controller, VectisController) and isinstance(
+                transport, SocketTransport
+            ):
+                power_controller.attach_transport(transport)
 
     def on_log(event: LogEvent) -> None:
         if output == "human":
@@ -1984,19 +2033,32 @@ def on_progress(event: ProgressEvent) -> None:
         if output == "human" and event.message:
             console.print(f"  {event.message}")
 
-    result = await session.run(
-        transport,
-        on_progress=on_progress,
-        on_log=on_log,
-        send_break=False,
-    )
+    if use_rack_fastboot:
+        from defib.recovery.rack_fastboot import run_rack_fastboot
+        assert isinstance(power_controller, RackController)
+        on_log(LogEvent(level="info", message="Pod-side fastboot in progress…"))
+        result = await run_rack_fastboot(
+            power_controller, chip, cached.read_bytes(),
+        )
+        if result.success:
+            transport = await create_transport(normalize_port_name(port))
+    else:
+        assert transport is not None  # opened above when not use_rack_fastboot
+        result = await session.run(
+            transport,
+            on_progress=on_progress,
+            on_log=on_log,
+            send_break=False,
+        )
 
     if not result.success:
         console.print(f"[red]Burn failed:[/red] {result.error}")
-        await transport.close()
+        if transport is not None:
+            await transport.close()
        if power_controller:
             await power_controller.close()
         raise typer.Exit(1)
+    assert transport is not None  # success ⇒ transport opened
 
     if output == "human":
         console.print(f"  [green]U-Boot loaded in {result.elapsed_ms:.0f}ms[/green]")
```
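The inline Ctrl-C autoboot-break loop in the burn hunk above can be lifted into a standalone, testable helper. This is a sketch, not defib code: `spam_break` and `_FakeTransport` are hypothetical names, and the only assumption about the transport is an async `write(bytes)` method, matching how the loop in the diff uses it.

```python
import asyncio


async def spam_break(transport, duration_s: float = 2.0, interval_s: float = 0.05) -> int:
    """Send Ctrl-C (0x03) repeatedly to interrupt U-Boot autoboot.

    Mirrors the inline loop in the diff: write until the deadline,
    stopping early if the transport raises. Returns the number of
    successful writes.
    """
    loop = asyncio.get_running_loop()
    end = loop.time() + duration_s
    sent = 0
    while loop.time() < end:
        try:
            await transport.write(b"\x03")
        except Exception:
            break
        sent += 1
        await asyncio.sleep(interval_s)
    return sent


class _FakeTransport:
    """Records writes so the loop can be exercised without hardware."""

    def __init__(self) -> None:
        self.writes: list[bytes] = []

    async def write(self, data: bytes) -> None:
        self.writes.append(data)


fake = _FakeTransport()
sent = asyncio.run(spam_break(fake, duration_s=0.2, interval_s=0.05))
```

Extracting the loop this way would also let the burn and any future agent-flash path share one implementation instead of re-inlining it.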
src/defib/recovery/rack_fastboot.py (new file)

Lines changed: 129 additions & 0 deletions

```python
"""Pod-side fastboot bring-up for rack-pod-controlled cameras.

When a rack pod is the power controller, the host can't reliably drive
the HiSilicon SPL boot protocol over its WiFi-bridged UART — the
per-frame ACK loop (150 ms timeout × dozens of frames per upload)
doesn't survive WiFi RTT. Instead the pod runs the entire upload
sequence locally on its UART and returns a phase-by-phase JSON
summary; this module is the host-side adapter that:

1. Loads the SoC profile (PRESTEP0 / DDRSTEP0 / optional PRESTEP1 +
   load addresses).
2. Detects the SPL boundary in the firmware blob with the same logic
   `HiSiliconStandard._send_spl` uses on the host path, then zeroes
   long 0xFF runs (cv500-family bootrom RX bug).
3. Calls :meth:`RackController.fastboot` with the assembled bundle.
4. Returns a :class:`RecoveryResult`-shaped object so callers can drop
   it into the same post-burn flow they use for `session.run()`.

The CLI's burn / install / agent-upload paths use this when
``power_controller`` is a :class:`RackController`. Note that the pod
takes exclusive UART access during the upload, so callers MUST NOT
have a TCP client connected to ``tcp://<pod>:9000`` when this runs —
open the transport only after this function returns.
"""

from __future__ import annotations

import logging
import time
from pathlib import Path

from defib.power.rack import RackController
from defib.profiles.loader import load_profile
from defib.protocol.hisilicon_standard import HiSiliconStandard
from defib.recovery.events import RecoveryResult, Stage

logger = logging.getLogger(__name__)


async def run_rack_fastboot(
    rack: RackController,
    chip: str,
    firmware: bytes | str | Path,
    agent_payload: bytes | None = None,
    timeout: float = 180.0,
) -> RecoveryResult:
    """Run the pod-side SPL/DDR/U-Boot upload via ``POST /fastboot``.

    Args:
        rack: configured :class:`RackController`.
        chip: SoC name (e.g. ``"hi3516ev300"``) for profile lookup.
        firmware: u-boot bytes (or path) used to derive the SPL portion
            and — if ``agent_payload`` is omitted — as the U-Boot blob
            loaded at ``profile.uboot_address``. Matches the host-path
            behaviour of ``defib burn``: the same binary contains both
            the SPL boundary and the U-Boot to run.
        agent_payload: optional override for the blob loaded at
            ``profile.uboot_address``. Used by the agent-flash path,
            which sends ``u-boot.bin`` as SPL and the flash agent as
            U-Boot. ``None`` (the default) uses ``firmware`` for both.
        timeout: HTTP timeout for the fastboot POST.

    Returns:
        :class:`RecoveryResult` with ``success`` / ``elapsed_ms`` /
        ``error`` populated. ``stages_completed`` reflects the phases
        the pod reported reaching.
    """
    firmware_bytes = (
        firmware if isinstance(firmware, (bytes, bytearray))
        else Path(firmware).read_bytes()
    )
    profile = load_profile(chip)

    # Same SPL-boundary detection the host path uses — keep both paths
    # byte-identical so a chip that works on one works on the other.
    scan_buf = agent_payload if agent_payload is not None else firmware_bytes
    spl_size = HiSiliconStandard._detect_spl_size(
        scan_buf, profile.spl_max_size, sram_limit=profile.spl_sram_limit,
    )
    spl_bytes = firmware_bytes[:spl_size].ljust(spl_size, b"\x00")
    spl_bytes = HiSiliconStandard._zero_long_ff_runs(spl_bytes)
    uboot_bytes = agent_payload if agent_payload is not None else firmware_bytes

    logger.info(
        "rack fastboot: spl=%d agent=%d profile=%s spl_addr=0x%x ddr_addr=0x%x uboot_addr=0x%x",
        len(spl_bytes), len(uboot_bytes), profile.name,
        profile.spl_address, profile.ddr_step_address, profile.uboot_address,
    )

    t0 = time.monotonic()
    response = await rack.fastboot(
        spl_address=profile.spl_address,
        ddr_step_address=profile.ddr_step_address,
        uboot_address=profile.uboot_address,
        prestep0=profile.prestep_data or b"",
        ddrstep0=profile.ddr_step_data,
        prestep1=profile.prestep1_data,
        spl=spl_bytes,
        agent=uboot_bytes,
        timeout=timeout,
    )
    elapsed_ms = (time.monotonic() - t0) * 1000.0

    stages: list[Stage] = []
    last_phase = str(response.get("last_phase", ""))
    # Pod's phase names map onto defib's Stage enum where they exist.
    if last_phase in ("frame_for_start", "prestep0", "ddrstep0", "prestep1", "spl", "agent", "done"):
        stages.append(Stage.HANDSHAKE)
    if last_phase in ("prestep0", "ddrstep0", "prestep1", "spl", "agent", "done"):
        stages.append(Stage.DDR_INIT)
    if last_phase in ("spl", "agent", "done"):
        stages.append(Stage.SPL)
    if last_phase in ("agent", "done"):
        stages.append(Stage.UBOOT)
    if last_phase == "done":
        stages.append(Stage.COMPLETE)

    if response.get("success"):
        return RecoveryResult(
            success=True, stages_completed=stages, elapsed_ms=elapsed_ms,
        )

    failed = response.get("failed_phase", "unknown")
    err = response.get("error", "unknown")
    return RecoveryResult(
        success=False, stages_completed=stages,
        error=f"rack fastboot failed at {failed}: {err}",
        elapsed_ms=elapsed_ms,
    )
```
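The `last_phase` → `Stage` if-chain in `run_rack_fastboot` can equivalently be written table-driven, which makes the pod's phase ordering explicit. This sketch uses a local `Stage` stand-in (the real enum lives in `defib.recovery.events`) and hypothetical names (`_PHASE_ORDER`, `stages_for`); it is an illustration of the mapping, not defib code.

```python
from enum import Enum


class Stage(Enum):
    # Local stand-in for defib.recovery.events.Stage.
    HANDSHAKE = "handshake"
    DDR_INIT = "ddr_init"
    SPL = "spl"
    UBOOT = "uboot"
    COMPLETE = "complete"


# Pod phases in upload order. Reaching a phase proves every stage
# attached to it and to all earlier phases; phases without an entry
# (ddrstep0, prestep1) don't add a new stage of their own.
_PHASE_ORDER = ["frame_for_start", "prestep0", "ddrstep0", "prestep1", "spl", "agent", "done"]
_STAGE_AT = {
    "frame_for_start": Stage.HANDSHAKE,
    "prestep0": Stage.DDR_INIT,
    "spl": Stage.SPL,
    "agent": Stage.UBOOT,
    "done": Stage.COMPLETE,
}


def stages_for(last_phase: str) -> list[Stage]:
    """Table-driven equivalent of the if-chain in run_rack_fastboot."""
    if last_phase not in _PHASE_ORDER:
        return []
    idx = _PHASE_ORDER.index(last_phase)
    return [_STAGE_AT[p] for p in _PHASE_ORDER[: idx + 1] if p in _STAGE_AT]
```

One table row per phase keeps the mapping in a single place if the pod firmware ever grows extra phases.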
