Skip to content

Add mdata-client: Rust rewrite of SmartOS metadata client#19

Draft
nwilkens wants to merge 8 commits intomainfrom
mdata-client
Draft

Add mdata-client: Rust rewrite of SmartOS metadata client#19
nwilkens wants to merge 8 commits intomainfrom
mdata-client

Conversation

@nwilkens
Copy link
Copy Markdown
Member

Summary

Rust implementation of the SmartOS metadata client (mdata-get, mdata-put, mdata-list, mdata-delete), replacing the existing C implementation in triton/mdata-client.

  • Implements both V1 and V2 metadata protocols with automatic negotiation
  • Supports Unix domain sockets (SmartOS zones) and serial ports (KVM/HVM guests)
  • Platform detection tries zone socket paths first, then serial ports — works on illumos, Linux, and BSD
  • ~1,100 lines of Rust; uses base64, crc32fast, and libc crates
  • Uses poll() for I/O timeouts across all platforms (vs platform-specific event ports/epoll/kqueue in the C version)
  • Integrates into the monitor-reef workspace

Test plan

Tested on two live instances with full CRUD operations:

SmartOS zone (smartos-build1, base-64-lts@24.4.1 — Unix socket transport):

# mdata-list → empty (exit 0) ✓
# mdata-put test_key "hello from rust" → exit 0 ✓
# mdata-get test_key → "hello from rust" (exit 0) ✓
# mdata-list → "test_key" (exit 0) ✓
# mdata-delete test_key → exit 0 ✓
# mdata-get test_key → "No metadata for 'test_key'" (exit 1) ✓
# mdata-get sdc:nics → full NIC JSON (exit 0) ✓

Ubuntu 24.04 bhyve VM (edge_foundry — serial port transport):

# mdata-list → "root_authorized_keys\nroot_pw" (exit 0) ✓
# mdata-get sdc:nics → full NIC JSON (exit 0) ✓
# mdata-put test_key "hello from ubuntu rust" → exit 0 ✓
# mdata-get test_key → "hello from ubuntu rust" (exit 0) ✓
# mdata-delete test_key → exit 0 ✓
# mdata-get test_key → "No metadata for 'test_key'" (exit 1) ✓

Cross-tool interoperability (SmartOS): Native C mdata-put → Rust mdata-get and vice versa — both directions work correctly.

🤖 Generated with Claude Code

Implements the V1/V2 metadata protocol with support for Unix domain
sockets (SmartOS zones) and serial ports (KVM/HVM guests). Provides
four binaries matching the original C tools: mdata-get, mdata-put,
mdata-list, mdata-delete.

Tested on SmartOS (base-64-lts zone) and Ubuntu 24.04 (bhyve VM).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nwilkens nwilkens marked this pull request as draft March 19, 2026 15:10
nwilkens and others added 7 commits March 19, 2026 11:30
Ran /simplify code review which identified four issues worth fixing:

- Bound retry loops: execute() gives up after 3 timeout retries,
  V2 stale frame handling stops after 5 mismatched frames
- Typed error for request ID mismatch: replace string matching on
  error messages with FrameError::ReqIdMismatch enum variant
- Protocol encapsulation: move base64 encoding and V2 version checks
  into Protocol::put() and Protocol::delete() methods so binaries
  don't depend on protocol internals
- FD leak fix: close serial fd if configure_serial_raw fails

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split transport module into platform-specific files:
- transport/unix.rs: Unix domain sockets + serial (poll-based I/O)
- transport/windows.rs: COM2 serial port (Win32 API)
- transport/mod.rs: shared types (TransportError, TransportConfig)

Windows transport uses CreateFileW/SetCommState/SetCommTimeouts/
ReadFile/WriteFile with inline FFI definitions (no additional
dependencies). The original mdata-get.exe in sdc-vmtools was a
V1-only GET client from Visual Studio 2010; this provides all
four commands with V2 protocol support.

Verified: compiles for x86_64-pc-windows-msvc target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Type safety:
- Add Command enum (Get/Put/Delete/Keys) replacing bare strings
- Add ProtocolVersion enum (V1/V2) replacing magic u8 comparisons
- Add #[must_use] and Debug derives on Response

Correctness:
- Replace std::mem::forget with into_raw_fd for fd ownership
- Handle EINTR in send() (was only handled in recv_line)
- Loop on EINTR in poll_readable instead of returning Ok(false)
- Replace File::open("/dev/urandom") with getrandom crate
  (fixes weak request IDs on Windows where /dev/urandom doesn't exist)

Testability:
- Extract MetadataTransport trait from concrete Transport
- Make Protocol generic over MetadataTransport
- Add MockTransport and protocol-level tests (V1 get, negotiation)

Documentation:
- Document PUT's intentional double base64 encoding (wire format)
- Remove unused version() getter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move negotiate() from free function into Protocol impl (issue 16)
- Extract parse_v2_body() from inner closure in parse_v2_frame (issue 17)
- Add manual Debug impl for Protocol<T> (issue 14)
- Document error handling boundary in module doc (issue 15)
- Document why Windows FFI is hand-rolled vs windows crate (issue 13)
- Remove conditional no-op test from unix.rs (issue 18)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tracing + tracing-subscriber for debug logging, controlled by
the MDATA_DEBUG=1 env var. Normal operation produces no extra output
(matching the original C tools). Debug mode logs transport detection,
protocol negotiation, and timeout retries to stderr.

- Add init_logging() helper in lib.rs
- Convert retry eprintln to tracing::warn
- Add tracing::debug for transport detection and negotiation
- Call init_logging() from all 4 binaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw libc/Win32 FFI with std::fs::File and UnixStream for all
read/write/open/close operations. The socket transport (zones, the
common case) now uses zero unsafe.

Unsafe is kept only where no safe Rust equivalent exists:
- termios configuration (tcgetattr/tcsetattr)
- exclusive file locking (fcntl F_SETLK)
- poll-based serial timeouts (File has no set_read_timeout)
- Windows serial config (GetCommState/SetCommState/SetCommTimeouts)

All remaining unsafe blocks have SAFETY comments.

Removed entirely:
- libc::open/close/read/write (replaced by File/UnixStream)
- Custom Drop impls (Rust ownership handles fd/handle cleanup)
- Win32 CreateFileW/CloseHandle/ReadFile/WriteFile (replaced by File)

Unsafe blocks: 29 -> 15 (socket path: 0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Error handling (err-context-chain):
- Use .context() instead of map_err(|e| anyhow!("...: {e}")) in
  decode_b64_payload and generate_request_id to preserve error chains

Correctness:
- Check return values of fcntl(F_SETFL) and tcflush, log on failure

API quality (api-common-traits):
- Add Hash to Command enum
- Add Clone, PartialEq to Response enum
- Use f.write_str() instead of write!() for string literals in
  Command::Display (mem-avoid-format)

Tests:
- Add serial negotiation test (verifies \n reset sent before NEGOTIATE)
- Add PUT double-encoding roundtrip test
- Add stale V2 frame discard test

Test count: 10 -> 13

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant