
RP2040: avoid XIP hangs during flash operations with scheduler=cores #5411

Open
rdon-key wants to merge 7 commits into tinygo-org:dev from rdon-key:rp2040-flashsafe-section

rdon-key (Contributor) commented May 20, 2026

Fixes #5288

RP2040 flash program / erase / command operations temporarily disable XIP.
With scheduler=cores, the other core may continue executing instructions from XIP flash during that window, which can cause a system hang.

What was done

This change adds an RP2040-specific flash-safe section.

Runtime changes:

  • For scheduler=cores, send a SIO FIFO command to the other core before starting a flash operation.
  • The interrupted core enters a RAM-resident flash-safe handler and waits there until the flash operation is complete.
  • The core performing the flash operation waits until the other core has entered the handler before disabling XIP.
  • For non-cores schedulers, keep the existing local interrupt disable / restore behavior.
  • Add a build-only RP2350 stub handler so shared RP2 runtime code continues to compile without changing RP2350 behavior.
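
The handshake described in the list above can be sketched off-target. This is only a simulation under stated assumptions: a goroutine stands in for the other core, a buffered channel stands in for the SIO FIFO, and every name (`sim`, `otherCore`, `flashOp`, `runRounds`) is hypothetical rather than taken from the PR.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// sim holds the shared state. All names are hypothetical stand-ins.
type sim struct {
	fifo     chan struct{} // stands in for the SIO FIFO command
	parked   uint32        // 1 while the other core sits in its handler
	release  uint32        // 1 once the flash operation is complete
	xipOff   uint32        // 1 while "XIP" is disabled
	violated uint32        // 1 if the other core ran "from flash" with XIP off
}

// otherCore "executes from flash" until a FIFO command arrives, then parks
// in the handler until it is released.
func (s *sim) otherCore(stop, done chan struct{}) {
	defer close(done)
	for {
		select {
		case <-s.fifo:
			atomic.StoreUint32(&s.parked, 1)
			for atomic.LoadUint32(&s.release) == 0 {
				runtime.Gosched() // real hardware would spin in RAM
			}
			atomic.StoreUint32(&s.parked, 0)
		case <-stop:
			return
		default:
			if atomic.LoadUint32(&s.xipOff) == 1 {
				atomic.StoreUint32(&s.violated, 1)
			}
			runtime.Gosched()
		}
	}
}

// flashOp models the core that performs the flash operation.
func (s *sim) flashOp() {
	atomic.StoreUint32(&s.release, 0)
	s.fifo <- struct{}{} // 1. ask the other core to park
	for atomic.LoadUint32(&s.parked) == 0 {
		runtime.Gosched() // 2. wait until it has entered the handler
	}
	atomic.StoreUint32(&s.xipOff, 1) // 3. only now disable XIP
	atomic.StoreUint32(&s.xipOff, 0) //    erase/program, then re-enable XIP
	atomic.StoreUint32(&s.release, 1) // 4. let the other core resume
	for atomic.LoadUint32(&s.parked) == 1 {
		runtime.Gosched() // wait for it to leave before the next round
	}
}

// runRounds reports whether the other core ever ran while XIP was off.
func runRounds(n int) bool {
	s := &sim{fifo: make(chan struct{}, 1)}
	stop, done := make(chan struct{}), make(chan struct{})
	go s.otherCore(stop, done)
	for i := 0; i < n; i++ {
		s.flashOp()
	}
	close(stop)
	<-done
	return atomic.LoadUint32(&s.violated) == 1
}

func main() {
	fmt.Println("violated:", runRounds(100))
}
```

The point of the sketch is the ordering: XIP is switched off only after the other core has acknowledged that it is parked, which is why the simulated violation flag stays clear.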

Machine changes:

  • Wrap flash_range_write
  • Wrap flash_erase_blocks
  • Wrap flash_do_cmd

These flash operations now run inside rp2040EnterFlashSafeSection / rp2040ExitFlashSafeSection.
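
As a rough illustration of the bracketing, the sketch below wraps an operation between an enter and an exit call. The helper `withFlashSafeSection` and the stub bodies are assumptions made for this example; the real runtime hooks may have different signatures and may not use `defer` at all.

```go
package main

import (
	"errors"
	"fmt"
)

// trace records call order so the bracketing can be checked off-target.
var trace []string

// Hypothetical stand-ins for the runtime hooks named in the PR.
func enterFlashSafeSection() { trace = append(trace, "enter") }
func exitFlashSafeSection()  { trace = append(trace, "exit") }

// withFlashSafeSection runs op between enter and exit. The deferred exit
// also runs when op returns an error.
func withFlashSafeSection(op func() error) error {
	enterFlashSafeSection()
	defer exitFlashSafeSection()
	return op()
}

func main() {
	err := withFlashSafeSection(func() error {
		trace = append(trace, "flash_range_write")
		return errors.New("simulated failure")
	})
	fmt.Println(trace, err)
}
```

Whatever form the real wrappers take, the property worth preserving is that every enter is matched by an exit on all return paths, otherwise the other core would stay parked.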

Notes

This is intentionally limited to RP2040 flash operations that temporarily disable XIP.
It is not intended to be a general multicore lock.

Other shared peripherals should be protected by their own ownership or locking rules.

RP2350 behavior is intentionally left unchanged.
The RP2350 handler added here is only a build-only stub for shared RP2 runtime code.

If the monitor output becomes corrupted under scheduler=cores, for example:

sta100
multi-core scheduler:
slected flash : 0
seedflashs0
run0oe: at
p e
ease re

please also test this reproducer with pull request #5391:

https://github.com/tinygo-org/tinygo/pull/5391

That output corruption appears to be a separate USB CDC multicore output issue, and it can make the flash test result difficult to read.

Reproducer

Warning: Flash memory has a limited number of program/erase cycles. This reproducer repeatedly erases and writes flash, so run it only when needed.

main.go
//go:build tinygo && rp2040

package main

import (
	"machine"
	"runtime"
	"sync/atomic"
	"time"
	"unsafe"
)

const maxRounds uint32 = 100
const writeSize = 4096

const sioCPUID = uintptr(0xd0000000)

func coreID() uint32 {
	return *(*uint32)(unsafe.Pointer(sioCPUID))
}

func alignDown(v uintptr, align uintptr) uintptr {
	return v &^ (align - 1)
}

func roundUp(v int64, align int64) int64 {
	return (v + align - 1) &^ (align - 1)
}

func safeFlashOffset(round uint32) int64 {
	start := uintptr(machine.FlashDataStart())
	end := uintptr(machine.FlashDataEnd())
	eraseSize := uintptr(machine.Flash.EraseBlockSize())
	testSpan := uintptr(roundUp(writeSize, int64(eraseSize)))

	if eraseSize == 0 || eraseSize&(eraseSize-1) != 0 {
		println("invalid erase block size")
		for {
		}
	}

	if end <= start || end-start < testSpan {
		println("not enough writable flash data area")
		for {
		}
	}

	// Compute the number of usable erase-aligned regions and rotate the offset
	// per round so that we don't repeatedly erase/program the same sector.
	// main rotates DOWN from the top of the data region.
	lastAligned := alignDown(end-testSpan, eraseSize)
	available := (lastAligned-start)/eraseSize + 1

	rotated := uintptr(round) % available
	absOffset := lastAligned - rotated*eraseSize
	relOffset := absOffset - start

	println("selected flash abs offset:", absOffset)
	println("selected flash rel offset:", relOffset)

	return int64(relOffset)
}

var writeBuf [writeSize]byte
var readBuf [writeSize]byte

// Worker observability.
//   workerDone:   0 = continue, 1 = stop requested, 2 = stopped
//   workerCore:   99 = not started, otherwise the core id worker is running on
//   workerWrites: number of successful flash writes performed by the worker
var (
	workerDone   uint32
	workerCore   uint32 = 99
	workerWrites uint32
)

// Worker writes to a fixed low offset, well separated from main's rotating
// range (main covers the top sectors of the data region). The two never write
// to the same sector during a single test run.
const workerWriteOffset = int64(0)

var workerWriteBuf [writeSize]byte

// Exactly 256 bytes: 4 lines x 64 chars.
const chunk256 = "" +
	"0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef" +
	"fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210" +
	"00112233445566778899aabbccddeeffffeeddccbbaa99887766554433221100" +
	"rp2040xipcacheflashworkerrandomaccesstestdataAAAAAAAAAAAAAAAAAAA"

// 4096 bytes (4 KB).
const block4k = chunk256 + chunk256 + chunk256 + chunk256 +
	chunk256 + chunk256 + chunk256 + chunk256 +
	chunk256 + chunk256 + chunk256 + chunk256 +
	chunk256 + chunk256 + chunk256 + chunk256

// 65536 bytes (64 KB). Large enough to overflow the 16 KB XIP cache on RP2040.
const flashData = block4k + block4k + block4k + block4k +
	block4k + block4k + block4k + block4k +
	block4k + block4k + block4k + block4k +
	block4k + block4k + block4k + block4k

func fillWriteBuffer(round uint32) {
	seed := byte(0x5a ^ byte(round*37))
	for i := range writeBuf {
		writeBuf[i] = byte(i) ^ seed
	}
}

func readFlash(label string, off int64) {
	for i := range readBuf {
		readBuf[i] = 0
	}
	n, err := machine.Flash.ReadAt(readBuf[:], off)
	if err != nil {
		println(label, "readback error:", err.Error())
		for {
		}
	}
	if n != len(readBuf) {
		println(label, "readback length mismatch:", n)
		for {
		}
	}
}

func verifyWrite(round uint32) {
	for i := range readBuf {
		if readBuf[i] != writeBuf[i] {
			println("write readback mismatch round:", round, "index:", i, "got:", readBuf[i], "want:", writeBuf[i])
			for {
			}
		}
	}
	println("write readback ok")
}

func verifyErase(round uint32) {
	for i := range readBuf {
		if readBuf[i] != 0xff {
			println("erase readback mismatch round:", round, "index:", i, "got:", readBuf[i])
			for {
			}
		}
	}
	println("erase readback ok")
}

func eraseFlash(label string, off int64) {
	eraseSize := machine.Flash.EraseBlockSize()
	eraseStart := off / eraseSize
	eraseBlocks := roundUp(writeSize, eraseSize) / eraseSize

	println(label, "erase start")
	err := machine.Flash.EraseBlocks(eraseStart, eraseBlocks)
	if err != nil {
		println(label, "erase error:", err.Error())
		for {
		}
	}
	println(label, "erase done")
}

// worker exercises the system from the second core:
//   1. Continuous random reads from a 64 KB flash-resident const string.
//      This defeats the 16 KB XIP cache and produces sustained flash
//      instruction/data traffic, which is required to make the bug
//      observable on stock TinyGo.
//   2. Periodic flash writes to a fixed low offset. This forces the
//      worker and main to race for rp2040EnterFlashSafeSection, which
//      is what the flash-safe spinlock fix is supposed to serialize.
func worker() {
	core := coreID()
	atomic.StoreUint32(&workerCore, core)
	println("worker started on core:", core)

	led := machine.LED
	led.Configure(machine.PinConfig{Mode: machine.PinOutput})

	// Distinct pattern so the worker's flash content differs from main's.
	for i := range workerWriteBuf {
		workerWriteBuf[i] = byte(i) ^ 0xa5
	}

	eraseSize := machine.Flash.EraseBlockSize()

	x := uint32(0x12345678)
	sum := uint32(0)
	writes := uint32(0)

	for atomic.LoadUint32(&workerDone) == 0 {
		// Hammer the XIP cache with random reads.
		for n := 0; n < 8192; n++ {
			x = x*1664525 + 1013904223
			i := int(x % uint32(len(flashData)))
			sum += uint32(flashData[i])
		}
		led.Set((sum & 0x800000) != 0)

		// Periodically write to flash. (sum & 0xff) == 0 fires roughly
		// once every couple hundred outer iterations -> enough overlap
		// chance with main's flash ops without burning the sector too
		// hard.
		if (sum & 0xff) == 0 {
			if err := machine.Flash.EraseBlocks(workerWriteOffset/eraseSize, 1); err == nil {
				if _, err := machine.Flash.WriteAt(workerWriteBuf[:], workerWriteOffset); err == nil {
					writes++
					atomic.StoreUint32(&workerWrites, writes)
				}
			}
		}
	}

	led.Low()
	atomic.StoreUint32(&workerDone, 2)
}

func main() {
	time.Sleep(2 * time.Second)

	numCPU := runtime.NumCPU()
	workerStarted := numCPU > 1

	println("start")
	println("NumCPU:", numCPU)
	println("main core:", coreID())
	println("flashData size:", len(flashData))
	println("max rounds:", maxRounds)

	if workerStarted {
		println("multi-core scheduler: worker may run concurrently (with concurrent flash writes)")
	} else {
		println("single-core scheduler: worker will not be started")
	}

	// Sanity-check sizes at runtime as well.
	if len(chunk256) != 256 {
		println("WARN: chunk256 size unexpected:", len(chunk256))
	}
	if len(flashData) != 65536 {
		println("WARN: flashData size unexpected:", len(flashData))
	}

	for round := uint32(0); round < maxRounds; round++ {
		fillWriteBuffer(round)
	}

	if workerStarted {
		go worker()
	}

	for round := uint32(0); round < maxRounds; round++ {
		flashOffset := safeFlashOffset(round)
		fillWriteBuffer(round)

		println("round:", round,
			"main core:", coreID(),
			"worker core:", atomic.LoadUint32(&workerCore),
			"worker writes:", atomic.LoadUint32(&workerWrites),
		)

		eraseFlash("prepare", flashOffset)
		readFlash("prepare", flashOffset)
		verifyErase(round)

		println("write start")
		n, err := machine.Flash.WriteAt(writeBuf[:], flashOffset)
		if err != nil {
			println("write error:", err.Error())
			for {
			}
		}
		println("write done:", n)
		if n != len(writeBuf) {
			println("write length mismatch:", n)
			for {
			}
		}

		readFlash("write", flashOffset)
		verifyWrite(round)

		eraseFlash("final", flashOffset)
		readFlash("final", flashOffset)
		verifyErase(round)
	}

	if workerStarted {
		println("requesting worker stop")
		atomic.StoreUint32(&workerDone, 1)

		const maxSpin = 100_000_000
		spun := 0
		for atomic.LoadUint32(&workerDone) != 2 && spun < maxSpin {
			spun++
		}

		if atomic.LoadUint32(&workerDone) == 2 {
			println("worker stopped cleanly")
		} else {
			println("worker did not stop within bound")
		}

		println("worker core:", atomic.LoadUint32(&workerCore))
		println("worker writes:", atomic.LoadUint32(&workerWrites))
	} else {
		println("worker was not started")
	}

	println("test finished")

	// Pure busy loop: keep main pinned to its current core.
	for {
	}
}
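
The offset rotation in safeFlashOffset can be checked off-target with plain arithmetic. The sketch below is a worked example under assumptions: the erase block size is taken to be 4096 bytes (so one write spans one block), the flash is assumed to end at 0x10200000, and the data-region start 0x10014000 is inferred from the scheduler=cores log in the test results rather than stated anywhere in the PR. The helper name `offsets` is hypothetical.

```go
package main

import "fmt"

func alignDown(v, align uintptr) uintptr { return v &^ (align - 1) }

// offsets mirrors the arithmetic of safeFlashOffset for a given round.
// testSpan is one erase block because writeSize == eraseSize is assumed.
func offsets(start, end, eraseSize uintptr, round uint32) (abs, rel uintptr) {
	testSpan := eraseSize
	lastAligned := alignDown(end-testSpan, eraseSize)
	available := (lastAligned-start)/eraseSize + 1
	abs = lastAligned - (uintptr(round)%available)*eraseSize
	return abs, abs - start
}

func main() {
	// Assumed values, see the paragraph above.
	const start, end, eraseSize = 0x10014000, 0x10200000, 4096
	for _, round := range []uint32{0, 1, 99} {
		abs, rel := offsets(start, end, eraseSize, round)
		fmt.Printf("round %d: abs=%#x rel=%#x\n", round, abs, rel)
	}
}
```

With these inputs, round 0 gives abs=0x101ff000 and rel=0x1eb000, and each later round moves down by one 4 KB block, so 100 rounds touch 100 different sectors.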

Test results

Tested on RP2040/Pico.

With -scheduler=cores (the failing case before this PR)

Reproducer: a worker goroutine on core 1 performs continuous random reads
from a 64 KB flash-resident const string (which defeats the 16 KB XIP cache)
and periodic flash writes to a fixed low offset. Main on core 0 runs
100 rounds of erase / write / read-back / verify on the top sectors of the
data region.
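
To see why the worker's reads cannot be served from a 16 KB cache, the following off-target sketch replays the reproducer's pseudo-random index sequence and counts distinct byte offsets. The function name `coverage` is hypothetical; the seed, multiplier, and increment are copied from the worker loop.

```go
package main

import "fmt"

const regionSize = 65536 // len(flashData) in the reproducer

// coverage runs the worker's index generator for the given number of reads
// and returns how many distinct byte offsets of the region were touched.
func coverage(steps int) int {
	seen := make([]bool, regionSize)
	distinct := 0
	x := uint32(0x12345678) // same seed as the reproducer
	for n := 0; n < steps; n++ {
		x = x*1664525 + 1013904223
		i := int(x % regionSize)
		if !seen[i] {
			seen[i] = true
			distinct++
		}
	}
	return distinct
}

func main() {
	fmt.Println("distinct offsets after 8192 reads: ", coverage(8192))
	fmt.Println("distinct offsets after 65536 reads:", coverage(65536))
}
```

Because the low 16 bits of this generator cycle through all 65536 values before repeating, one inner loop of 8192 reads touches 8192 different offsets, and eight consecutive loops sweep the whole 64 KB string, four times the size of the XIP cache.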

Before this PR: hangs at the first write start on core 0.

After this PR:

$ tinygo flash -target=pico -scheduler=cores -monitor main.go
start
NumCPU: 2
main core: 0
flashData size: 65536
max rounds: 100
multi-core scheduler: worker may run concurrently (with concurrent flash writes)
selected flash abs offset: 0x101ff000
worker started on core: 1
selected flash rel offset: 0x001eb000
round: 0 main core: 0 worker core: 1 worker writes: 0
prepare erase start
prepare erase done
erase readback ok
write start
write done: 4096
write readback ok
final erase start
final erase done
erase readback ok
...
round: 50 main core: 0 worker core: 1 worker writes: 79
prepare erase start
prepare erase done
erase readback ok
write start
write done: 4096
write readback ok
final erase start
final erase done
erase readback ok
...
round: 99 main core: 0 worker core: 1 worker writes: 155
prepare erase start
prepare erase done
erase readback ok
write start
write done: 4096
write readback ok
final erase start
final erase done
erase readback ok
requesting worker stop
worker stopped cleanly
worker core: 1
worker writes: 156
test finished

Evidence:

  • worker started on core: 1 and worker core: 1 in every round header
    confirm the test condition (worker is actually running on core 1, not 0).
  • worker writes: N grows from 0 to 156, meaning the worker successfully
    performed 156 flash writes while main was running its own flash ops. Each
    of those writes is an opportunity for the two cores to contend for the
    flash-safe section, exercising the spinlock added in this PR.
  • All 100 rounds of main's erase / write / read-back complete without
    any mismatch or error, so the cross-core lockout protocol did not
    corrupt either core's data.
  • worker stopped cleanly confirms no deadlock at shutdown.

With -scheduler=tasks (non-regression check)

$ tinygo flash -target=pico -scheduler=tasks -monitor main.go
start
NumCPU: 1
main core: 0
flashData size: 65536
max rounds: 100
single-core scheduler: worker will not be started
selected flash abs offset: 0x101ff000
selected flash rel offset: 0x001fb000
round: 0 main core: 0 worker core: 99 worker writes: 0
prepare erase start
prepare erase done
erase readback ok
write start
write done: 4096
write readback ok
final erase start
final erase done
erase readback ok
...
round: 99 main core: 0 worker core: 99 worker writes: 0
prepare erase start
prepare erase done
erase readback ok
write start
write done: 4096
write readback ok
final erase start
final erase done
erase readback ok
worker was not started
test finished

Single-core builds go through runtime_rp2040_flashsafe_single.go, which
is a plain interrupt.Disable() / interrupt.Restore() pair. 100 rounds
complete with no regression. worker core: 99 is the sentinel value
meaning the worker goroutine was never started (NumCPU == 1).
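
The disable / restore pairing can be illustrated off-target. The `State`, `Disable`, and `Restore` definitions below are stand-ins modelled on TinyGo's runtime/interrupt package and only track a boolean; `flashOp` is a hypothetical helper, not the contents of runtime_rp2040_flashsafe_single.go.

```go
package main

import "fmt"

// State, Disable and Restore are off-target stand-ins that only track
// whether interrupts are "enabled".
type State bool

var irqEnabled = true

// Disable turns interrupts off and returns the previous state.
func Disable() State {
	prev := State(irqEnabled)
	irqEnabled = false
	return prev
}

// Restore puts back whatever state Disable returned.
func Restore(s State) { irqEnabled = bool(s) }

// flashOp brackets a simulated flash operation.
func flashOp(op func()) {
	state := Disable()
	defer Restore(state)
	op()
}

func main() {
	flashOp(func() {
		fmt.Println("inside outer op, irq enabled:", irqEnabled)
		flashOp(func() {}) // nested call
		fmt.Println("after nested op, irq enabled:", irqEnabled)
	})
	fmt.Println("after outer op, irq enabled:", irqEnabled)
}
```

Restoring the saved state, rather than unconditionally re-enabling interrupts, is what keeps a nested call from turning interrupts back on while the outer operation is still in progress.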

Symbol placement (//go:section .ramfuncs)

$ llvm-nm test.elf | grep rp2FlashSafe
10003314 t __Thumbv6MABSLongThunk_runtime.rp2FlashSafeInterruptHandler
20001184 t runtime.rp2FlashSafeInterruptHandler

rp2FlashSafeInterruptHandler is placed at 0x20001184 (RP2040 SRAM),
not in the XIP-mapped flash region. The flash-side symbol is the long-branch
thunk that the linker (LLD) auto-generates for Thumbv6-M (Cortex-M0+) targets;
it is fetched while XIP is still enabled, so it is safe.
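
A small helper can make this kind of check mechanical. The sketch below classifies an address against the RP2040 address map; the region bounds are taken from the RP2040 datasheet as I understand it (XIP window from 0x10000000 up to the XIP control registers at 0x14000000, 264 KB of SRAM from 0x20000000), and the function name `region` is hypothetical.

```go
package main

import "fmt"

// RP2040 address-map bounds (assumed from the datasheet).
const (
	xipBase  = 0x10000000 // start of the XIP flash window
	xipEnd   = 0x14000000 // XIP control registers begin here
	sramBase = 0x20000000 // start of SRAM
	sramEnd  = 0x20042000 // 264 KB of SRAM
)

// region names the memory region an address falls into.
func region(addr uint32) string {
	switch {
	case addr >= xipBase && addr < xipEnd:
		return "XIP flash"
	case addr >= sramBase && addr < sramEnd:
		return "SRAM"
	default:
		return "other"
	}
}

func main() {
	// Addresses copied from the llvm-nm output above.
	for _, addr := range []uint32{0x10003314, 0x20001184} {
		fmt.Printf("%#x: %s\n", addr, region(addr))
	}
}
```

Feeding every symbol that must stay reachable during a flash operation through a check like this would flag any that accidentally ends up in the XIP window.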

Development

Successfully merging this pull request may close these issues.

RP2040: is cross-core synchronization unnecessary in existing flash operations?
