[UNMAINTAINED] feat(vnc): add VNC server with H.264#1447
Conversation
Replace the WebRTC-specific VideoStart/VideoStop calls with a named- consumer refcount. The first acquire starts the native pipeline and pauses the HDMI sleep ticker; the last release stops it. WebRTC is the only consumer today, so behavior is unchanged. This sets up the infrastructure for additional consumers (RTSP, VNC) to share the capture pipeline without each duplicating its own start/stop logic or fighting over the sleep timer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure-Go RFB 3.8 protocol library, no JetKVM dependencies. Includes the version handshake, security negotiation, VNCAuth (DES challenge), ClientInit/ServerInit, all client-message decoders, and Raw + OpenH264 (pseudo-encoding 50) + DesktopSize encoding writers. OpenH264 follows the wire format from rfbproto PR jetkvm#39 and TigerVNC PR jetkvm#1194 — payload is U32 length, U32 flags, then an Annex-B H.264 bytestream. Compatible with TigerVNC ≥ 1.13.0 built with FFmpeg. Tests cover the handshake byte sequence, VNCAuth match against a DES-derived reference, message decoder round-trips, and encoding writers byte-for-byte. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Translates X11 keysyms (carried by RFB KeyEvent) to USB HID Usage IDs
(consumed by JetKVM's USB gadget). Covers ASCII printable, F1–F24, all
modifiers, navigation cluster, full keypad, lock keys, Print/Pause/Menu.
Shifted ASCII keysyms ('!', '@', etc.) map to the same scancode as
their unshifted counterpart; clients send Shift_L/R presses
separately, and the host applies the modifier itself. This matches
TigerVNC and PiKVM behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renders a "Please use TigerVNC >= 1.13.0 with H.264 support" message on a dark blue background, sized to the framebuffer dimensions. Used as a Raw rectangle (encoding 0) when a client connects but doesn't advertise OpenH264 (encoding 50) in its SetEncodings list. Uses x/image/font/basicfont (Face7x13) — adds golang.org/x/image to go.mod, pinned to v0.18.0 for Go 1.24 compatibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add VncEnabled, VncPort (default 5900), VncPassword, VncAllowOverWAN fields to Config plus JSON-RPC getters/setters and a vncLogger. The new vnc.go exposes Start/StopVNCServer entry points that subsequent commits fill in; today they just log. The VncPassword setter accepts a "********" sentinel meaning "leave unchanged", so the frontend can submit the form without re-typing the password and never round-trips the plaintext value back from the server. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fills out the VNC scaffold from the previous commit into a working server. Clients connect, authenticate (VNCAuth, or none if no password is set), receive H.264 frames as RFB encoding-50 rectangles, and drive the host's USB HID keyboard and mouse. Wiring: - native.go's frame callback now also calls vncServer.WriteFrame(), which fans the encoded frame out to per-client bounded channels (cap 2; non-blocking sends drop on full so slow clients can't back-pressure WebRTC). - OnVideoStateChange notifies VNC clients of resolution changes; the dispatcher emits a DesktopSize pseudo-encoding update plus the encoding-50 reset-decoder bit on the next outgoing rect. - main.go starts the VNC server when config.VncEnabled. - webrtc.go's resolveCodec forces H.264 when VNC is enabled, so a Chrome session can't renegotiate to H.265 mid-stream and desync any TigerVNC clients (encoding 50 is H.264-only). Protocol fidelity: - Cached SPS/PPS are prepended to the first encoding-50 rectangle a new client sees, so its decoder is primed before the next IDR arrives. We can't ask the encoder for an immediate keyframe, so non-IDR frames are dropped at the dispatcher until the natural IDR arrives (typically <= 1-2 s). - Clients that don't advertise OpenH264 (encoding 50) get a Raw rectangle with the "please use TigerVNC >= 1.13.0" placeholder. - VncAllowOverWAN gates non-loopback binding so a casually-deployed device doesn't expose VNCAuth on the public internet. splitAnnexB() is duplicated from PR jetkvm#1329's rtsp.go; if that PR lands first, follow up with a consolidation commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a /settings/vnc route with toggle, port, password, and allow-over-WAN switch, wired to the JSON-RPC getVNCConfig / setVNCConfig methods registered in the previous commit. Includes a help banner pointing users to TigerVNC >= 1.13.0 with H.264 support for video. Localization keys are added to all 14 language files; only the English values are localized, the rest hold the same English text and need translation in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flag's failure mode was opaque: if a user enabled VNC without also flipping VncAllowOverWAN, the JSON-RPC call returned an error that surfaced as a generic "internal error" toast in the frontend with no actionable hint. Two settings to enable one feature is also a poor experience. Drop the field everywhere. VNCAuth's well-known weaknesses are now documented in the VncPassword godoc and the help banner on the settings page already points users to TigerVNC + tunneling for untrusted networks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When VNC was the first consumer of the video stream (no browser open when the first VNC client connected), acquireVideoStream() called VideoStart() without first calling VideoSetCodecType, so the codec stayed at whatever the previous WebRTC session had left it. If a browser had negotiated H.265 before disconnecting, VNC then shipped H.265 NALs over OpenH264 (encoding 50), which the spec defines as H.264-only — clients silently fail to decode. Split acquireVideoStream into a thin wrapper plus acquireVideoStreamWithCodec(consumer, codecType). VNC always passes 0 (H.264). WebRTC passes the codec it negotiated. VideoSetCodecType runs only when the caller is the first consumer (i.e. about to call VideoStart), preserving the "must not be called mid-stream" invariant. Also log the client's full SetEncodings list (info-level) when a client connects, so it is obvious from the logs whether TigerVNC is actually advertising encoding 50. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If a browser tab negotiated H.265 before VNC was toggled on, the capture pipeline keeps emitting H.265 NALs. resolveCodec only forces H.264 for *new* WebRTC sessions; the in-flight one is not disturbed. So when a TigerVNC client subsequently connected, it received H.265 NALs marked as OpenH264 (encoding 50), which the spec defines as H.264-only — the client's decoder failed silently and the user kept seeing the (already-rendered) placeholder. After enabling VNC via JSON-RPC, check the running codec; if it is not H.264, cycle the pipeline (VideoStop → SetCodecType(0) → VideoStart). The active WebRTC session will see frames briefly disappear and then need to refresh — the warning log says so. Also adds trace-level diagnostics so the user can see what is happening with `JETKVM_LOG_TRACE=vnc`: - codec value at the moment a VNC client connects - dropped non-IDR frames during the keyframe wait - rect emissions (OpenH264 size/flags or Raw placeholder) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues from Cursor Bugbot's review on commit 1d41410, plus the golangci-lint failures from the same CI run. 1. Deadlock in writeUpdate (HIGH severity). Conn.LockWrite acquired writeMu, then Conn.Flush at the end re-acquired the same non-reentrant mutex. Every VNC client hung on its first FramebufferUpdate; this is most likely the actual cause of the placeholder-forever symptom seen on-device. The intended pattern was a multi-call composite write with a private flushLocked variant — but in practice we have a strict single-writer model: serveClient drives the handshake on its own goroutine, then the dispatcher goroutine is the only writer afterward. The reader goroutine never writes. So the mutex adds no safety, only a deadlock. Strip writeMu, LockWrite, UnlockWrite, and flushLocked. Document the single-writer requirement on Conn. 2. SetEncodings allocated on the wire-supplied count (LOW severity). make([]EncodingType, count) was allocating up to 65 535 * 4 bytes regardless of the maxEncodings cap (256). Allocate based on the capped n; the loop still drains the full count from the wire so framing for the next message stays aligned. 3. Save button hidden when VNC was disabled in the UI (MEDIUM severity). The button lived inside the {settings.enabled && (...)} block, so unchecking the toggle hid the button — the disabled state could not be persisted. Move the button outside the conditional. Linter follow-ups (golangci-lint v2.4 in CI): - goimports re-aligned const blocks in keysym.go and placeholder.go. - Removed the unused acquireVideoStream wrapper now that all callers pass an explicit codec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The kvm-package-level vnc_h264.go and vnc_h264_test.go were the only tests in package kvm, and that package transitively imports internal/native — which links libjknative.a, only built for ARM. Running 'go test ./...' on the CI's x86_64 Linux host therefore failed at link time as soon as a test file made package kvm participate in the test build. Move the H.264 Annex-B splitter (and its tests) into internal/rfb, where it logically belongs: the encoding-50 path is the only consumer, and the rfb package has no platform-specific imports so it tests cleanly on any host. Export it as rfb.SplitAnnexB. Drop the trivial scaleCoord/rising helpers' tests with this; the functions are five lines each and exercised end-to-end on the device. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c88a6e1 to
a816f53
Compare
Two more bugs from Cursor Bugbot's review on a816f53. 1. Right-click became middle-click on the host. RFB (RFC 6143 §7.5.5) and USB HID disagree on the order of bits 1 and 2 in the mouse-button byte: RFB layout: bit 0 = left, bit 1 = MIDDLE, bit 2 = RIGHT USB HID: bit 0 = left, bit 1 = RIGHT, bit 2 = MIDDLE handlePointerEvent was passing the RFB mask straight to rpcAbsMouseReport with only the wheel pseudo-bits stripped, so a VNC client's right-click reached the host as a middle-click and vice versa. Add rfb.PointerMaskToHIDButtons that does the bit 1<->2 swap and drops wheel bits, with a unit test for the obvious combinations. 2. Data race on the vncServer global. StartVNCServer / StopVNCServer wrote vncServer under vncServerMu, but the OnVideoStateChange and OnVideoFrameReceived callbacks in native.go read it without taking the lock. Even though the nil-receiver methods were safe, the read itself is undefined behaviour under Go's memory model and the race detector would flag it on every frame. Convert the global to atomic.Pointer[VNCServer]: hot-path readers use Load() (no locking, no allocations); the start/stop pair keeps a serialising mutex so a config change can't half-create or half-destroy the server, and uses Store / Swap to publish the pointer atomically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Logs the raw keysym, down flag, mapped HID code, and whether the keysym was in our table — under TRC level so it only fires with JETKVM_LOG_TRACE=vnc. Motivated by an on-device report from a macOS TightVNC user where left Cmd reached the host as Option, left Option was not received, and Caps Lock didn't toggle. Mac VNC clients vary widely in how they map modifiers to X11 keysyms; without seeing the actual keysyms on the wire it is impossible to tell whether the client mapping is wrong, our table is wrong, or macOS itself is intercepting the key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror what we did for KeyEvent: log the raw RFB button-mask, the pointer position, and the resulting HID button byte at trace level on every PointerEvent. Lets a user grep the device log to discover exactly what their client sends when they click the side buttons on a Logitech-style mouse. Mouse buttons 4 and 5 (back/forward) have no canonical RFB representation — the 8-bit button-mask reserves bits 0..2 for primary buttons and 3..6 for wheel emulation. Some clients send side buttons on bit 7 or via the Extended PointerEvent pseudo- encoding (not implemented here). Wire up the actual bits once we see what a real client emits. The JetKVM USB gadget descriptor already advertises buttons 1..5 (Usage Maximum 5), so the host side will accept buttons 4 and 5 as soon as we forward them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…logging) User reported visual artifacts during scrolling on the VNC stream that don't appear in the WebRTC stream. Most likely cause: my non-blocking per-client frame channel (cap 2) drops H.264 frames under brief TCP backpressure when the encoder produces large P-frames. Each dropped P-frame breaks the decoder's prediction chain and stays broken until the next IDR (1–2 s later). WebRTC doesn't show this because pion's RTP path runs over UDP and doesn't queue at the application layer the same way. Three changes, all aimed at making the TCP path keep up: 1. SetNoDelay(true) on accepted connections. The frame is sent as a composite of small writes (FBU header + rect header + length + flags + payload); Nagle would coalesce them with up to 40 ms of added latency, which is enough to overflow my channel during high-motion frames. 2. Per-client frame buffer cap raised from 2 to 8 — enough headroom for ~130 ms of TCP write stalls at 60 fps without dropping. 3. A WARN log every ~60th drop so the user can confirm the channel is overflowing if artifacts persist, plus the cumulative dropped count on disconnect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Firing on every FramebufferUpdate is one log line per frame at 60 fps per VNC client; with JETKVM_LOG_TRACE=vnc enabled, that floods the device log so badly that the actually-useful traces (key event, pointer event, dropped frames) get pushed off. The information was only ever useful for the original "is VNC even sending rects?" diagnosis, which is no longer in question. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trace logs from a user on macOS TightVNC showed the client emits non-standard keysyms for Cmd / Option: Mac key keysym meaning on Linux ---------- ---------------------- -------------------- Left Cmd Alt_L (e9) Left Alt Right Cmd Super_L (eb) Left Super Left Option Mode_switch (7e) AltGr / unmapped Right Option ISO_Level3_Shift (03) AltGr (works) So in the default Linux-convention table: - Left Cmd reaches the host as Left Option, - Right Cmd reaches the host as Left Cmd, - Left Option doesn't reach the host at all. Globally remapping these keysyms would break Linux/Windows clients that send Alt_L for the actual Alt key. Add a per-server keymap config option instead: vnc_keymap = "default" (or empty) -- current behaviour vnc_keymap = "macos" -- TightVNC-on-Mac overrides The macos overrides are applied before the default table lookup, covering only the three keysyms we have evidence for. Other Mac VNC clients (Apple Screen Sharing, Chicken, TigerVNC's Mac viewer) may need additional entries — extend the override map as those traces come in. UI: dropdown on the VNC settings panel; localization keys added in English to all 14 locale files (translations are a follow-up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TigerVNC's pseudo-encoding -316 widens the PointerEvent button-mask
from 7 bits to 15 bits, which is what carries Logitech-style back /
forward side buttons (X buttons 8 and 9 → HID buttons 4 and 5).
Source of truth for the wire format: TigerVNC's CMsgWriter.cxx /
SMsgReader.cxx / SMsgWriter.cxx. The extension works as follows:
1. Client advertises -316 in SetEncodings.
2. Server confirms by sending a fake FramebufferUpdate with a single
0x0 rectangle at (0,0) encoded as -316.
3. The PointerEvent message gains a marker bit at bit 7 of the byte
that follows the message type. Marker = 0 keeps the legacy 6-byte
format; marker = 1 selects the extended 7-byte form, where the
last byte holds RFB mask bits 7..14.
Server-side changes:
- rfb.PointerEventMessage.ButtonMask widened to uint16; parser
reads the trailing high byte when SetExtendedMouseButtons(true)
has been called and the marker is set. Without negotiation, bit 7
is treated as reserved (ignored), matching TigerVNC's writer
which always clears it on legacy frames.
- rfb.PointerMaskToHIDButtons routes RFB bit 7 (back) → HID bit 3
and RFB bit 8 (forward) → HID bit 4. The JetKVM USB gadget
descriptor already advertises Buttons 1..5, so the host sees
them as soon as we forward.
- vnc_conn detects the encoding in SetEncodings, queues an
announce rect on the next FramebufferUpdate, and flips the
rfb.Conn into extended mode so subsequent PointerEvents are
parsed in the new format.
Tests cover the legacy/extended parser switch, marker-bit
boundary handling, the announce rect's wire bytes, and the
back/forward → HID translation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| // wire format. Set by SetExtendedMouseButtons after the server | ||
| // has sent the announce rectangle. Single-writer model also | ||
| // applies to this flag. | ||
| extendedMouseButtons bool |
There was a problem hiding this comment.
Data race on extendedMouseButtons between goroutines
Medium Severity
The extendedMouseButtons field on rfb.Conn is a plain bool written by the dispatch goroutine via SetExtendedMouseButtons(true) and read concurrently by the read goroutine in ReadClientMessage. This is a data race under Go's memory model — no synchronization (mutex, atomic, or channel) ensures the reader sees the updated value. If the reader sees a stale false, it will parse an extended 7-byte PointerEvent as the legacy 6-byte format, consuming one byte too few and permanently misaligning all subsequent message parsing on that connection.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit ba5cd0a. Configure here.
Three small tweaks aimed at the latency / artifact gap a user is
seeing between VNC and WebRTC. None of them is a silver bullet —
RFB is TCP-only and that's the architectural ceiling — but each one
removes a low-grade source of pain:
- rfb.NewConn now sizes its bufio.Writer at 64 KB instead of the
default 4 KB. With NoDelay enabled the dispatcher's flush
boundary becomes the kernel's TCP write boundary, so a 50 KB
P-frame previously turned into ~12 syscalls / TCP segments.
Most P-frames now go out in one Write.
- serveClient bumps SO_SNDBUF to 4 MB on the accepted TCP conn so
a bursty I-frame doesn't immediately backpressure the dispatcher
against a default ~64 KB kernel buffer.
- The first-frame SPS/PPS prepend now only fires when the IDR
*doesn't* already include parameter sets. Most modern H.264
encoders emit SPS+PPS+IDR in a single access unit, so the
previous unconditional prepend put SPS/PPS on the wire twice.
Most decoders tolerate it but it's still an unusual stream
shape — easier to debug if we don't generate it. Includes a
helper frameHasParameterSets that walks the NALs once.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When my non-blocking channel send drops an H.264 P-frame, the
client's decoder loses the prediction chain and produces blocky
artifacts until the next IDR — typically ~1 s of GOP. The user's
recording confirmed this: smear/blocks during scroll that heal
exactly at GOP boundaries.
Today, after a drop the dispatcher kept feeding the client every
subsequent P-frame, all of which the decoder was already incapable
of decoding. That extends the visible glitch unnecessarily and
keeps the TCP send buffer full while doing nothing useful.
Convert waitingForIDR from a stateMu-guarded bool into an
atomic.Bool so WriteFrame can flip it from the producer goroutine
on every drop. Combined with the existing dispatcher logic that
already drops non-IDR frames in this state, the effect is:
- First drop -> resync mode on for the connection.
- WriteFrame stops queuing P-frames into a doomed channel
(also relieves TCP backpressure faster).
- The dispatcher pulls and discards anything still queued until
the next IDR arrives.
- When the IDR arrives, we reset c.primed (so SPS/PPS get
re-prepended if the IDR doesn't already include them), set
needsResetCtx so the rect carries OpenH264FlagResetContext,
and the decoder re-initialises cleanly on a single keyframe.
This doesn't reduce the drop count — TCP backpressure under WiFi /
Tailscale is what it is — but each drop now produces a brief freeze
followed by a clean keyframe, instead of a full GOP of corruption.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 83ca5c2. Configure here.
| // client regardless of whether we prepended or not — once a | ||
| // decodable frame has reached the client, the subsequent | ||
| // stream is self-contained. | ||
| c.primed = true |
There was a problem hiding this comment.
Data race on primed field between goroutines
Medium Severity
The primed field is read (line 308) and written (line 330) in writeUpdate without holding stateMu, but WriteFrame sets c.primed = false under stateMu from the producer goroutine. This is a data race between the dispatcher and producer goroutines. The field needs to be read/written under the same lock in both locations, or moved into the existing stateMu-guarded block at the top of writeUpdate.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 83ca5c2. Configure here.
This doesn't seem correct to me. The native binary is open-source and in this repo, and it captures raw YUV data: kvm/internal/native/cgo/video.c Line 452 in 79c23ec That YUV data is currently being given to the RK_MPI video encoder, but it could be processed directly by the VNC code. (IDK if VNC supports YUV data, so you might need to ask for RGB data instead.) You could probably even selectively feed portions of a frame to the hardware H.264 or JPEG encoders after calculating damage. |
| // waitingForIDR signals that the client's H.264 decoder needs an | ||
| // IDR keyframe to (re)synchronise — set on initial connect and | ||
| // also flipped back to true by WriteFrame whenever a drop occurs, | ||
| // so the dispatcher can skip every subsequent P-frame instead of | ||
| // shoveling broken references at the decoder for the rest of the | ||
| // GOP. Cleared once the IDR has been emitted with the OpenH264 | ||
| // reset-decoder flag. |
There was a problem hiding this comment.
Without closing the feedback loop to the encoder, this is going to cause significant latency to build up if the client isn't able to request frames at the same rate as the encoder is producing them, until eventually the queue is full and it (re)starts with a new I-frame.
The update request should be plumbed all the way through to the encoder loop so it only encodes the next frame when there is a client that needs it:
kvm/internal/native/cgo/video.c
Line 621 in 79c23ec
…eVideo Introduces acquireVideoStream / releaseVideoStream / videoStreamHasConsumers helpers in video.go that gate the native encoder's VideoStart / VideoStop and the HDMI sleep ticker on a set of named consumers. The encoder runs iff at least one consumer holds it; the 0→1 transition runs VideoStart so the first delivered frame is an IDR. Each WebRTC Session gets a stable id (uuid in newSession) and its own consumer key "webrtc:<id>". The slot is acquired when ICE reaches Connected and released when it reaches Closed. pauseVideo / resumeVideo JSON-RPC notifications toggle the calling session's own slot, handled inline in onRPCMessage so the dispatcher's session reference is in scope (the generic reflection-based dispatch doesn't pass *Session). This data model fixes by construction the two race classes Cursor Bugbot flagged on earlier drafts of this PR: - Handover-while-paused: the new session's own acquire is what starts the encoder; no special-case "restart if the outgoing session was paused" code at the handover site. - Stale-session pause: a pauseVideo from the soon-to-close session releases only its own slot, not the new session's; no need for a session != currentSession gate. The 1s handover overlap can briefly deliver frames to a paused session's track if the other session is keeping the encoder alive; bounded by the existing peer-connection close delay and acceptable for a feature aimed at sustained idle bandwidth saving. The refcount helpers are byte-identical to those in PR jetkvm#1447 (commit dec25e1), so that PR will merge cleanly on top. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
I've decided to no longer pursue this feature. My original reasons for adding VNC support, was because I wanted a better client experience. After realizing that this was going to become very difficult by going the VNC route, I've decide to instead write a custom client, leveraging WebRTC as protocol. This solves all my needs and thus I an no longer interested in adding VNC support. Having said that, there is still value in this work so I won't close this PR (unless the maintainers want me to). Someone can start from this and maybe implement the suggested (delta) JPEG encoding to
|
…eVideo Introduces acquireVideoStream / releaseVideoStream / videoStreamHasConsumers helpers in video.go that gate the native encoder's VideoStart / VideoStop and the HDMI sleep ticker on a set of named consumers. The encoder runs iff at least one consumer holds it; the 0→1 transition runs VideoStart so the first delivered frame is an IDR. Each WebRTC Session gets a stable id (uuid in newSession) and its own consumer key "webrtc:<id>". The slot is acquired when ICE reaches Connected and released when it reaches Closed. pauseVideo / resumeVideo JSON-RPC notifications toggle the calling session's own slot, handled inline in onRPCMessage so the dispatcher's session reference is in scope (the generic reflection-based dispatch doesn't pass *Session). This data model fixes by construction the two race classes Cursor Bugbot flagged on earlier drafts of this PR: - Handover-while-paused: the new session's own acquire is what starts the encoder; no special-case "restart if the outgoing session was paused" code at the handover site. - Stale-session pause: a pauseVideo from the soon-to-close session releases only its own slot, not the new session's; no need for a session != currentSession gate. The 1s handover overlap can briefly deliver frames to a paused session's track if the other session is keeping the encoder alive; bounded by the existing peer-connection close delay and acceptable for a feature aimed at sustained idle bandwidth saving. The refcount helpers are byte-identical to those in PR jetkvm#1447 (commit dec25e1), so that PR will merge cleanly on top. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eVideo Introduces acquireVideoStream / releaseVideoStream / videoStreamHasConsumers helpers in video.go that gate the native encoder's VideoStart / VideoStop and the HDMI sleep ticker on a set of named consumers. The encoder runs iff at least one consumer holds it; the 0→1 transition runs VideoStart so the first delivered frame is an IDR. Each WebRTC Session gets a stable id (uuid in newSession) and its own consumer key "webrtc:<id>". The slot is acquired when ICE reaches Connected and released when it reaches Closed. pauseVideo / resumeVideo JSON-RPC notifications toggle the calling session's own slot, handled inline in onRPCMessage so the dispatcher's session reference is in scope (the generic reflection-based dispatch doesn't pass *Session). This data model fixes by construction the two race classes Cursor Bugbot flagged on earlier drafts of this PR: - Handover-while-paused: the new session's own acquire is what starts the encoder; no special-case "restart if the outgoing session was paused" code at the handover site. - Stale-session pause: a pauseVideo from the soon-to-close session releases only its own slot, not the new session's; no need for a session != currentSession gate. The 1s handover overlap can briefly deliver frames to a paused session's track if the other session is keeping the encoder alive; bounded by the existing peer-connection close delay and acceptable for a feature aimed at sustained idle bandwidth saving. The refcount helpers are byte-identical to those in PR jetkvm#1447 (commit dec25e1), so that PR will merge cleanly on top. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


Closes #117
Summary
Adds a VNC server that an H.264-capable VNC client (TigerVNC ≥ 1.13.0 built with FFmpeg, or PiKVM's mdevaev/tigervnc build) can connect to as an alternative to the WebRTC web UI. Frames are shipped using RFB pseudo-encoding 50 (rfbproto PR #39, TigerVNC PR #1194) — the JetKVM device emits H.264 NALs, and the server forwards them to the client unchanged. No software decode on the device.
Why this approach
The "obvious" path — implement a normal RFB server that ships raw RGB pixels — is not viable here. JetKVM's video pipeline only emits already-encoded H.264/H.265 NALs from an opaque native binary; there is no path to raw frames or JPEG snapshots. Software-decoding H.264 just to re-encode as Tight/Raw would burn most of the CPU budget on the embedded ARM SoC.
PiKVM solved the same problem with RFB pseudo-encoding 50 ("Open H.264 Encoding"), which carries an Annex-B H.264 bytestream straight to the client. JetKVM already produces exactly the right bytes. Compatible with TigerVNC ≥ 1.13.0 built with FFmpeg.
RDP was considered (and the user asked) but rejected: the RDP stack (TPKT/X.224 → MCS → TLS/CredSSP/NLA → RDPGFX for H.264) is roughly an order of magnitude more code than RFB, and there is no mature pure-Go RDP server.
What's in the PR
Layered bottom-up across 9 commits, each independently reviewable and not breaking the build:
video: refcount the capture pipeline— replaces the WebRTC-specificVideoStart/VideoStopwith named-consumer refcount (acquireVideoStream/releaseVideoStream). Behaviour-preserving prerequisite.internal/rfb: protocol primitives— pure-Go RFB 3.8 server library (handshake, VNCAuth viacrypto/des, message decoders, Raw + OpenH264 + DesktopSize encoding writers). 38 unit tests, no JetKVM imports.internal/rfb: keysym to HID Usage ID mapping— full ~150-entry table: ASCII printable, F1–F24, modifiers, navigation, full keypad, lock keys, Latin-1.internal/rfb: placeholder framebuffer for non-H.264 clients— "Please use TigerVNC ≥ 1.13.0 with H.264 support" rendered viagolang.org/x/image/font/basicfont. Pinned to v0.18.0 for Go 1.24 compatibility.vnc: scaffold (config, JSON-RPC, logger)— addsVncEnabled/VncPort/VncPasswordconfig fields, JSON-RPC getters/setters,vncLogger, and stubStart/StopVNCServer.vnc: server, frame fan-out, HID forwarding— fills out the scaffold: TCP listener, per-client bounded channels (cap=2, non-blocking sends drop on full so slow clients can't backpressure WebRTC), SPS/PPS caching for late-joiners, IDR-frame filtering, HID forwarding (PointerEvent→rpcAbsMouseReport+ wheel translation;KeyEvent→ keysym lookup →rpcKeypressReport).resolveCodecforces H.264 when VNC is enabled so a Mac/Windows browser can't renegotiate H.265 mid-VNC-session.ui: VNC settings panel—/settings/vncroute with toggle, port, password. Localization keys added to all 14 language files (English values only — translations are a follow-up).vnc: drop VncAllowOverWAN— removed an opaque non-loopback-bind guard that surfaced as a generic "internal error" in the frontend. Operators are responsible for tunneling VNCAuth via SSH/Tailscale on untrusted networks; godoc onVncPasswordsays so.vnc: pin H.264 codec when VNC starts the capture pipeline— fixes a subtle bug uncovered by on-device testing: when VNC was the first consumer of the capture pipeline (no browser open),VideoSetCodecTypewas never called, so the codec stayed at whatever the previous WebRTC session left it. With H.265 sessions on Mac/Windows browsers, VNC then shipped H.265 NALs over OpenH264 (which is H.264-only) and clients silently failed to decode.acquireVideoStreamWithCodec(consumer, codecType)now sets the codec only when the caller is the first consumer, preserving the "must not be called mid-stream" invariant.Tests
internal/rfbcovering handshake byte sequences, VNCAuth (DES), message round-trips, Raw + OpenH264 + DesktopSize encoding writers, keysym→HID lookup, and placeholder rendering.kvmpackage for the H.264 Annex-B splitter, mouse-coord scaling, and wheel-button rising-edge detection.Known limitations / follow-ups
ForceKeyFrameRPC exists in the native binary. New clients wait for the next natural IDR (typically ≤ 1–2 s). Documented; the right fix is a native-binary change.LocalLoopbackOnly. The repo already has TLS infrastructure (web_tls.go) that VeNCrypt can reuse later.splitAnnexBhelper duplicated here can be consolidated. Per its review status I chose not to base on its branch — the conflict is one line innative.goeither way.Logging
The default log level is WARN. To see VNC diagnostics (server listening, client connections, advertised encodings):
JETKVM_LOG_INFO=vnc /userdata/jetkvm/bin/jetkvm_app # or full trace: JETKVM_LOG_TRACE=vncSet persistently via a systemd unit override at
/etc/systemd/system/jetkvm.service.d/log.conf.Checklist
make test_e2elocally and passedCloses #117)🤖 Generated with Claude Code
Note
High Risk
High risk because it introduces a new TCP-exposed remote-access service with authentication handling and changes capture-pipeline start/stop/codec coordination shared with WebRTC, which can impact availability and video behavior.
Overview
Adds a configurable VNC server (RFB 3.8) that listens on a user-set port, supports
Noneor classicVNCAuth, and streams video using OpenH264 pseudo-encoding50(with a rendered Raw placeholder for non-H.264 clients), including resolution-change signaling and an extended mouse-buttons negotiation path.Integrates VNC into the runtime: new config fields + JSON-RPC
getVNCConfig/setVNCConfig, newvnclogger, VNC start on boot when enabled, and per-frame fan-out from the native video callback with SPS/PPS caching, IDR resync logic, and HID input forwarding.Refactors capture-pipeline lifecycle to a named-consumer refcount (
acquireVideoStreamWithCodec/releaseVideoStream) used by both WebRTC and VNC, updates HDMI sleep behavior to depend on active consumers, and forces H.264 codec selection whenever VNC is enabled to avoid H.265/VNC decoder desync.Adds a new
/settings/vncUI route (with localization keys) to manage enable/port/password/keymap, and introducesgolang.org/x/imagedependency for placeholder rendering.Reviewed by Cursor Bugbot for commit 83ca5c2. Bugbot is set up for automated code reviews on this repo. Configure here.