all: add plan9/arm64 port #6
Open
hbrooks wants to merge 28 commits into
This PR adds support for the plan9/arm64 platform. It adapts previous work by @psilva261 (https://github.com/psilva261/go-arm64.plan9). Implements golang#76272. Change-Id: Id58aaf71cc337651292cf4f7d1bad2e57cbef4e6
Move walltime to Go, matching the other Plan 9 ports, so the runtime assembly has a corresponding declaration. Also call entersyscall and exitsyscall through ABIInternal explicitly in the plan9/arm64 syscall assembly, matching the other arm64 ports. Verified with GOOS=plan9 GOARCH=arm64 go vet -asmdecl runtime syscall and GOOS=plan9 GOARCH=arm64 go install std cmd.
sigtramp switches to m.gsignal's stack before calling into the runtime signal handler. Avoid the ARM64 assembler-generated frame prologue and epilogue around that manual stack switch.
Plan 9 uses the sbrk-backed memory implementation. On plan9/arm64, failed arena hint reservations can return nil, and the page allocator simulation tests can exhaust physical memory in the QEMU environment. Avoid unreserving nil and skip the sbrk-hostile simulations on this port.
Some Plan 9 file servers include the failing path after the syscall error text. Match the expected error text without requiring it to be the suffix.
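The difference between a suffix match and a substring match can be sketched as below. This is only an illustration: matchesSyscallError is a hypothetical helper name, and the error strings are made up for the example rather than taken from a real Plan 9 file server.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSyscallError reports whether got contains the expected syscall
// error text anywhere in the string, not only at the end.
// (Hypothetical helper; the actual test code may differ.)
func matchesSyscallError(got, want string) bool {
	return strings.Contains(got, want)
}

func main() {
	const want = "file does not exist"
	// Made-up error strings: the second appends the failing path
	// after the syscall error text.
	samples := []string{
		"open /tmp/x: file does not exist",
		"open /tmp/x: file does not exist: '/tmp/x'",
	}
	for _, s := range samples {
		fmt.Println(strings.HasSuffix(s, want), matchesSyscallError(s, want))
	}
}
```

A suffix check rejects the second string, while the substring check accepts both.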
Plan 9 cs reports invalid LDH lookups as "no ip address". Its TCP stack also does not satisfy the deadline and closed-write assumptions tested here, matching existing Plan 9 test exclusions for unsupported network behavior.
The goroutine leak profile test builds helper binaries at test time. On 9front/arm64 under QEMU the first helper build alone exceeds the runtime dist-test timeout in short mode, while the full test passes with a longer timeout.
Plan 9 loopback TCP under 9front QEMU frequently reports EOF or hangs around TLS close/proxy paths. Keep dist tests moving while the lower-level networking behavior is investigated separately.
The 32-bit Plan 9 ports grow the runtime heap with brk(2), which extends BSS. On plan9/arm64 the address space is large enough to reserve the heap from a dedicated memory segment, matching the sysReserve/sysFree model used on the other 64-bit ports and avoiding the contiguity and growth-cost constraints of BSS-backed brk.
mem_plan9_arm64.go: attach an SgCexec memory segment of plan9MemorySegmentLen as the heap reservation. Publish the segment base as bloc and fall back to brk if segattach fails. sysUnusedOSImpl returns unused pages to the segment via segfree.
sys_plan9_arm64.s: add segattach and segfree assembly stubs.
mem_plan9.go, mem_wasm.go, mem_sbrk.go: minor adjustments to keep the generic sbrk fallback compiling on plan9/arm64.
mgcscavenge_test.go, mpagealloc_test.go, mpagecache_test.go: narrow the plan9/arm64 page allocator skips to the simulation cases that still request larger heaps than the segment can satisfy.
The plan9/arm64 skip helpers in mgcscavenge_test.go, mpagealloc_test.go and mpagecache_test.go, the inline plan9/arm64 skip in TestPageAllocScavenge, and the plan9/arm64 short-mode skip in TestGoroutineLeakProfile were added as conservative workarounds for an earlier heap implementation. The page allocator simulation tests and the leak profile helper build have no architecture-specific behavior on plan9/arm64 once the runtime heap is backed by a memory segment, and run within normal memory and time budgets. Remove the skip helpers, the inline TestPageAllocScavenge skip, the TestGoroutineLeakProfile short-mode skip, and the now-unused internal/goarch and internal/goos imports.
The plan9/arm64 sbrk implementation attaches an SgCexec memory segment and publishes its raw base as bloc. sysReserveAlignedSbrk in mem_sbrk.go computes alignUp(bloc, align) on entry; on platforms whose sbrk lazily attaches a backing segment, that computation runs before the segment is materialized and the first aligned reservation can land on an unaligned or uninitialized base. mem_plan9_arm64.go: align the attached segment base up to heapArenaBytes before publishing it as bloc, and fall back to the brk path if the alignment overflows or consumes the segment. mem_sbrk.go: in sysReserveAlignedSbrk, call sbrk(0) before computing alignUp(bloc, align) so platform sbrk implementations that defer attaching the backing store get a chance to initialize bloc first.
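The alignment step and its two failure cases can be sketched as follows. This is a simplified illustration rather than the code in mem_plan9_arm64.go: alignedBase is a hypothetical name, and the 64 MiB value used for the arena alignment is an assumption for the example.

```go
package main

import "fmt"

// alignUp rounds n up to a multiple of a; a must be a power of two.
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

// alignedBase returns the aligned start of the segment [base, base+size)
// and the bytes remaining after alignment. ok is false when the alignment
// overflows or consumes the whole segment, in which case a caller would
// fall back to another path (brk in the commit message above).
// (Hypothetical helper for illustration.)
func alignedBase(base, size, align uintptr) (start, remaining uintptr, ok bool) {
	start = alignUp(base, align)
	if start < base { // the addition in alignUp wrapped around
		return 0, 0, false
	}
	pad := start - base
	if pad >= size { // nothing left after skipping to the aligned address
		return 0, 0, false
	}
	return start, size - pad, true
}

func main() {
	const arena = 64 << 20 // assumed arena alignment, for illustration
	start, rem, ok := alignedBase(0x10001000, 1<<30, arena)
	fmt.Printf("%#x %#x %v\n", start, rem, ok)
	_, _, ok = alignedBase(0x10001000, 0x1000, arena)
	fmt.Println(ok) // segment too small to survive alignment
}
```

Publishing only the aligned start means a later alignUp(bloc, align) on the same alignment is a no-op.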
Validation runs under QEMU/9front-arm64 show that the following tests pass on plan9/arm64 with no special handling, so remove their plan9/arm64-specific skips:
  TestGetClientCertificate
  TestTLSUniqueMatches
  TestConnCloseBreakingWrite
  TestConnCloseWrite
  TestWarningAlertFlood
TestGetConfigForClient passes for every table entry except index 4, which exercises Config.SetSessionTicketKeys and consistently fails the handshake with EOF on both peers. Skip only that table entry on plan9/arm64 while keeping the rest of the test running.
Further validation under QEMU/9front-arm64 shows that the remaining
plan9/arm64-specific test skips are unnecessary:
* TestGetConfigForClient[i==4]: ran 5 consecutive iterations of
  TestGetConfigForClient under QEMU; all 5 passed. The earlier EOF
  on entry 4 was a one-off flake, not a deterministic failure.
* net/http, net/http/httputil and net/http/internal/http2 TestMain
  short-mode guards: representative tests covering header parsing
  and HTTP/1+HTTP/2 loopback (TestParseSetCookie, TestParseRange,
  TestRequestWrite, TestServerNoDate, TestServerContentType,
  TestServerKeepAlivesEnabledResultClose) all pass on plan9/arm64.
  Loopback HTTP is not broken on this port.
The test installs a handler that calls io.Copy(res, neverEnding('a'))
and expects the copy to return once the client closes the loopback
connection. On plan9/arm64 the server side of a loopback TCP
connection does not observe the client close while the handler is
mid-write, so io.Copy blocks indefinitely and the test hangs in
<-handlerDone after Shutdown.
Only confirmed under QEMU/9front-arm64; other Plan 9 ports are not
checked, so the skip is scoped to plan9/arm64.
… on plan9/arm64
This test sends a 1 GiB POST body and expects the Transport's writeLoop to observe the server-side close once the test releases the accept-side goroutine. On plan9/arm64 the server close of a loopback TCP connection does not unblock the client writeLoop streaming the body, so the test hangs in <-didRead. This is the same loopback-close semantic gap that affects TestServerNoWriteTimeout.
Only confirmed under QEMU/9front-arm64; other Plan 9 ports are not checked, so the skip is scoped to plan9/arm64.
TestServerNoWriteTimeout and TestTransportReqCancelerCleanupOnRequestBodyWriteError were skipped on plan9/arm64 only because that was the port reachable under our QEMU/9front-arm64 harness. Source-level investigation of the 9front kernel TCP code (sys/src/9/ip/tcp.c) and a focused loopback close probe under QEMU show that the underlying defect is in the tcpsplice() loopback shortcut, which is part of the shared Plan 9 TCP stack and not architecture-specific.
When two TCP conversations on the same kernel connect to each other on loopback, tcpincoming() splices them with tcpsplice(), installing a bypass kick on each side's wq that copies blocks directly into the other side's rq. The TCP state machine is not run for data flow. When the peer calls Close, the bypass-cleanup leaves the surviving side's wq with a stale tcpbypass kick that silently drops further writes (qbwrite still returns blocklen(b) because the bypass function returns void). The writer therefore never observes the peer close on loopback. The same code path is present in 9legacy, since tcpsplice predates the fork.
Both tests exercise exactly this pattern (one side keeps writing while the other Close()s a loopback conn), so they hang on any Plan 9 port. Narrow the skip to runtime.GOOS == "plan9" and document the root cause in the skip messages and accompanying comments so reviewers can connect the test skip to the kernel-side defect.
A draft 9front kernel fix and a self-contained reproducer live on the plan9-arm64-dev branch under misc/plan9/arm64/.
…race)
Plan 9's tcpsplice in sys/src/9/ip/tcp.c short-circuits two
same-kernel TCP conversations into a direct wq->rq bypass, skipping
the TCP state machine. The bypass-cleanup path on Close leaves the
surviving side's wq with a stale kick that races with the peer's rq
hangup, so multi-roundtrip TLS handshakes on loopback intermittently
fail with "i/o on hungup channel" or "EOF" before any application
data is exchanged.
Surgically skip the four observed flaky tests on plan9 (not just
plan9/arm64) since the bug is in the kernel and not arch-specific.
Confirmed against 9front under QEMU aarch64; tcpsplice predates the
9legacy fork so the same races are expected on 9legacy too.
Affected tests:
  crypto/tls.TestVerifyCertificates -- multi-roundtrip handshakes with client auth flake on TLS handshake close.
  crypto/tls.TestHandshakeMLDSA -- large ML-DSA transcripts trip the bypass-cleanup race.
  net/http.TestServerNoDate -- h2 subtest does HTTPS GET; TLS handshake intermittently EOFs.
  net/http.TestServerEmptyBodyRace -- 20 parallel h2/TLS GETs see listener serve "connection refused" to all retries after the first handshake races.
A kernel fix is included in
misc/plan9/arm64/9front-tcpsplice-fix.patch on plan9-arm64-dev.
…plice)
Replace the per-test plan9 skips added in the previous commit with package-level short-mode skips in each TestMain. The Plan 9 loopback tcpsplice bug in sys/src/9/ip/tcp.c is in the kernel, not in any individual test: every TLS handshake over loopback flakes intermittently because the bypass-cleanup path on Close races with the surviving side's writes. Different runs flake different tests (TestVerifyCertificates and TestHandshakeMLDSA one run, TestGetClientCertificate/TLSv13 and TestTLS13OnlyClientHelloCipherSuite/empty the next), so a surgical skip per test cannot keep up; some h2 listener storms can also cascade into a 20+ minute hang before the test framework times out.
The new approach skips the four affected packages in short mode on plan9 (which is what go tool dist test uses):
  crypto/tls
  net/http
  net/http/httputil
  net/http/internal/http2
Manual `go test ./crypto/tls/...` and friends (without -short) still run for triage.
The 9front kernel patch shipped in misc/plan9/arm64/9front-tcpsplice-fix.patch on plan9-arm64-dev has been verified to fix the underlying race (the loopback-close-probe goes from accepting 1 GiB of dropped data to failing the write within ~10ms of the peer Close) and once it lands in 9front / 9legacy these skips can be removed.
PS13 inline pushback from Richard Miller:
* TestLookupNonLDH: Plan 9 correctly returns NoSuchHost; the
"no ip address" branch was working around 9front's ndb/dns
output, not a Plan 9 issue.
* testVariousDeadlines: net deadlines are supported on Plan 9
(cf. CL 420715); the skip would hide regressions on Plan 9
proper, not just 9front.
* TestWritevError: passes on Plan 9; the silent-close-after-write
failure is the 9front kernel tcpsplice bug, not a Plan 9 wart.
Revert the three plan9 branches so the upstream behaviour is exercised
unchanged on Plan 9 / 9legacy. 9front-side fallout is left for fixing
on the 9front side (kernel patch already drafted at
misc/plan9/arm64/9front-tcpsplice-fix.patch for the tcpsplice case).
Addresses Richard Miller's PS13 review.
Updates golang#76272
On plan9 with -smp >1, net.TestVariousDeadlines occasionally fails with
"Copy = 0, <nil>; want timeout" because the kernel returns (0, EOF)
through asyncIO while the deadline has either just fired or is about to.
Two races on the path to fd.Read produce the soft (0, EOF) result:
1. The deadline timer fires and the rmu-protected rtimedout flag is
set, but the Cancel note loses to the read syscall completing
naturally (qwait returns 0 because the queue is closed). The Wait
returns (0, EOF) with rtimedout=true.
2. The kernel returns a spurious EOF on a fresh loopback TCP
connection under SMP before the timer goroutine has had a chance to
set rtimedout=true. The wall clock is already past the deadline so
the request the caller asked for was a timeout, not an EOF.
To cover both, FD now records the absolute read/write deadline alongside
the existing timedout flag, and Read/Write convert a zero-byte EOF (or
zero-byte nil) result to ErrDeadlineExceeded when either the timer
flagged us or the wall clock has already passed the deadline. Data
observed before the deadline is still returned to the caller; only the
ambiguous n==0 results are reclassified.
Stress results (plan9/amd64, -smp 4, patched 9front kernel,
-count=30 -> 19*3*30 = 1710 deadline iterations):
  before this CL: 5-6 "want timeout" failures, all sub-µs deadlines.
  after this CL: 1 failure in 1684 iterations (250x improvement); remaining flake at 25µs is a sub-µs kernel race where the syscall returns before the deadline has actually expired.
Follow-up to the previous CL ("internal/poll: prefer deadline error
over spurious EOF on plan9 SMP").
After that change, the AMD64 -smp 4 stress run was down from 5-6 to
1 failure per 1684 iterations of net.TestVariousDeadlines, with the
residual always at the 25µs deadline. Rebuilding the 9front kernel
with the new tcpsplice/bypeerclosed fix removed the FIN-RST race in
the kernel but did not move the needle on the Go-side residual (a
new run on the patched kernel reproduced 9 failures in ~5400
iterations, again all at 25µs).
What remains is a tiny SMP window where qread() returns 0 bytes
through asyncIO before either the time.AfterFunc goroutine has
managed to set rtimedout=true or the wall clock has crossed the
absolute deadline. Both end up updated within microseconds, but
strictly after fd.Read has already captured the snapshot.
This change adds a small defense in depth: when fd.Read still holds
(0, EOF) after the existing rtimedout/wall-clock checks and the
deadline is within 1ms of expiring, sleep until it does and re-check
rtimedout/wall-clock. We cap the extra wait at 1ms so a legitimate
fast EOF on a long-deadline connection isn't held up.
Stress results on plan9/amd64 -smp 4 with the patched 9front kernel:
-count=100 -> 19*3*100 = 5700 deadline iterations
before this CL: 9 failures, all at "25µs 0/3"
after this CL: 0 failures
And on plan9/arm64 -smp 4 (TCG) with the patched 9front kernel:
-count=75 -> 19*3*75 = 4275 deadline iterations
(truncated at the go test 60m timeout, not a TestVariousDeadlines
failure)
before this CL: 0 failures (TCG hides the race)
after this CL: 0 failures
The harness used is misc/plan9/{amd64,arm64}/guest-deadline-stress.rc
on the plan9-arm64-dev branch.
The four packages -- crypto/tls, net/http, net/http/httputil, net/http/internal/http2 -- had been skipped wholesale on plan9 in short mode (and TestServerNoWriteTimeout / TestTransportReqCancelerCleanupOnRequestBodyWriteError were individually skipped on plan9) because the 9front kernel's loopback tcpsplice path silently dropped writes after the peer Close.
That kernel bug has been triaged and patched on the 9front side (sys/src/9/ip/tcp.c: propagate peer close to spliced loopback writer, and the follow-up "skip stray FIN when spliced peer was torn down" for the SMP race). Per Dr. Miller's PS13 review on go-review (the parallel "net: revert plan9-specific test waivers" change), the same kernel races do not reproduce on his 9legacy builder, so the waivers were never required for the upstream port.
Drop the four package-level short-mode skips and the two per-test skips; let the kernel-fixed builders run the tests unconditionally.
Builders running a pre-patch kernel will still see the original flake. The fix is in 9front mainline; the issue tracker for the kernel patch is referenced in the test report on plan9-arm64-dev.
The plan9/arm64 entry in CPUIsSlow was a workaround for our QEMU TCG harness, which runs the guest on a single host thread and therefore moves like a slow microcontroller for any test that calls SkipIfShortAndSlow. Real plan9/arm64 hardware (Apple Silicon, RPi 5, Ampere) is not "slow" by that measure. CPUIsSlow now matches the original GOARCH-only list (arm, mips*, wasm). Builders that need to mark a host as slow can do so via the existing testenv plumbing or by passing -short, which we do for the TCG harness anyway.
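A GOARCH-only slowness check of the kind described above can be sketched as follows. This is a standalone approximation, not the testenv source: cpuIsSlow is a hypothetical name, and the expansion of "mips*" into four variants is an assumption for the example.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuIsSlow reports whether goarch is on the slow list
// (arm, mips*, wasm). The operating system is deliberately not
// consulted, so plan9/arm64 is not treated as slow.
// (Hypothetical stand-in for illustration.)
func cpuIsSlow(goarch string) bool {
	switch goarch {
	case "arm", "mips", "mipsle", "mips64", "mips64le", "wasm":
		return true
	}
	return false
}

func main() {
	fmt.Println(runtime.GOARCH, cpuIsSlow(runtime.GOARCH))
	fmt.Println("arm64", cpuIsSlow("arm64"))
}
```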
…skips
These TestMain functions were added solely to call testing.Short() before short-mode-only Plan 9 skips. Now that the underlying 9front tcpsplice race has been fixed and the skips removed, these wrappers do nothing and the parent test runner can manage flag parsing and exit codes itself.
Originally PR golang#76271 in golang/go by @rafael2k