Skip to content

Commit 94ee495

Browse files
danbugsludfjigandreiltd
authored
feat: i686 page tables, snapshot compaction, and CoW (standalone) (#1385)
* refactor: replace nanvix-unstable with i686-guest and guest-counter features Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: danbugs <danilochiarlone@gmail.com> * feat: i686 protected-mode boot and unified restore path Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: danbugs <danilochiarlone@gmail.com> * feat: i686 page tables, snapshot compaction, and CoW support Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: danbugs <danilochiarlone@gmail.com> * fixup: address PR #1381 review comments - sandbox/snapshot.rs: decode i686 PTEs with `u32::from_le_bytes` instead of `from_ne_bytes`. Page-table entries are defined as little-endian by arch spec; `from_ne_bytes` is incorrect on big-endian hosts and inconsistent with surrounding helpers that already use `from_le_bytes`. - hyperlight_common/vmem.rs: guard the `i686-guest` feature with a `compile_error!` on targets that are neither `x86` (the guest itself) nor `x86_64` (the host that runs the guest). Previously enabling the feature on e.g. aarch64 would silently compile but downstream crates would hit confusing missing-item errors when they reached for `vmem::i686_guest`. - mem/mgr.rs, sandbox/snapshot.rs: persist the per-process PD-roots count in `Snapshot` and rewrite the scratch bookkeeping area (`SCRATCH_TOP_PD_ROOTS_{COUNT,ARRAY}_OFFSET`) during `restore_snapshot`. Scratch is zeroed on restore, so without this a subsequent `snapshot()` call would read count=0 through `read_pd_roots_from_scratch` and fail. Root `i`'s compacted GPA is deterministic (`layout.get_pt_base_gpa() + i * PAGE_SIZE` — same layout `compact_i686_snapshot` used when building the rebuilt PDs), so we only need to store the count. Signed-off-by: danbugs <danilochiarlone@gmail.com> * fix: replace all the remaining nanvix-unstable with i686-guest Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> * fix: add i686-guest snapshot tests Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> * refactor: address review feedback Refactor i686 page table code to follow the same architecture-independent API as amd64/aarch64, per reviewer feedback. hyperlight_common changes: - Extract shared page table iterators (modify_ptes, read_pte_if_present, require_pte_exist, write_entry_updating) to vmem.rs with PTE_SHIFT const generic for reuse across architectures. - Implement i686/vmem.rs with proper map()/virt_to_phys() using the shared iterators, replacing stubs and the host-side i686_guest submodule that was in amd64/vmem.rs. - Remove PAGE_USER from pte_for_table (PDEs are supervisor-only by default); user-space PDEs get PAGE_USER via post-processing. - Remove SCRATCH_TOP_PD_ROOTS_* and MAX_PD_ROOTS constants (replaced by set_pt_root_finder callback). hyperlight_host changes: - Unify GuestPageTableBuffer<PTE_BYTES> for both architectures, removing i686_pt::Builder, build_initial_i686_page_tables, and compact_i686_snapshot (~500 lines removed from snapshot.rs). - Add root_offset to GuestPageTableBuffer for targeting per-process PDs, with finalize_multi_root() to replicate kernel PDEs, map scratch, and set PAGE_USER across all roots. - Snapshot::new uses filtered_mappings with root-index tagging for per-process PD isolation in a single pass (no double PT walk). - Add set_pt_root_finder callback on MultiUseSandbox, replacing the scratch-based PD roots bookkeeping. - Add CR0 named constants (CR0_PE, CR0_ET, CR0_WP, CR0_PG). - Add compaction_kind() helper to deduplicate kind conversion. - Add 5 i686 unit tests for map/virt_to_phys. - Detailed comments on separate_pt_bytes explaining the map_file_cow GPA overlap constraint. Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> * Remove separate_pt_bytes, keep PT tail in snapshot blob but exclude from guest memory slot Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> * Remove pub mod i686_guest, use arch-conditional mod arch for vmem, add user_accessible to Mapping Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> * Remove cow_map, PT walk already follows CoW'd pointers via resolve_gpa Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> * Clean up shared PT iterators: restore sealed TableMovability, export arch constants, remove PTE_SHIFT, restore comments Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> * Clean up snapshot and page table code Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> * feat: multi-space page-table walk + link primitives Adds SpaceAwareMapping, SpaceReferenceMapping, SpaceId, walk_va_spaces, and space_aware_map in hyperlight_common::vmem. These primitives let a caller walk several PT roots together, detect aliased intermediate tables (the 'kernel-half shared' pattern where multiple process PDs point at the same PT pages), and rebuild those aliases on the write side by linking into already-built tables instead of copying. i686 implements the real walker (depth-1 PT sharing, which is what Nanvix needs) plus space_aware_map that composes a PDE from the owning space's rebuilt table. amd64 gets a non-aliasing single-root walker so the architecture-independent re-export compiles; no current amd64 embedder exercises aliasing. aarch64 remains a TODO with stubs. Signed-off-by: danbugs <danilochiarlone@gmail.com> Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> * feat: use multi-space walker in snapshot relocation Replaces the old filtered_mappings / per-root dedup-by-phys_base compaction loop in Snapshot::new with a single walk_va_spaces pass that emits SpaceAwareMapping entries. For ThisSpace leaves the code still compacts into the dense snapshot blob via the phys_seen dedup map. For AnotherSpace entries (produced when a later root reuses an earlier root's intermediate table) it calls space_aware_map to link the rebuilt table in-place. This fixes Nanvix-on-Hyperlight with multiple process PDs: the kernel-half PTs that process PDs share by pointing at the same PT page would previously each get cloned into independent copies, so a post-boot write into kernel memory via one PD would not be visible via another. The linked-table rebuild preserves the shared-PT invariant, and Hello 5/5 now runs cleanly under NANVIX_REPEAT=4. Signed-off-by: danbugs <danilochiarlone@gmail.com> Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> * feat: add snapshot generation counter to scratch bookkeeping Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> * fix: run cargo fmt Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> * fix: make clippy happy Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> * review: arch-neutral naming and doc/comment cleanup Addresses a batch of open review comments on #1385: - Rename `pml4_addr` -> `root_pt_addr` in `HyperlightVm::new` and the `standard_{32bit_paging,64bit}_defaults` helpers to reflect that the same path supports both amd64 (PML4) and i686 (PD) guests. - Tighten the `PtRootFinder` docstring to describe the 3rd parameter as the root PT GPA of the currently-executing address space rather than naming CR3 (arch-specific). - Reword `vmem::Mapping::user_accessible` to be phrased in privilege-level terms rather than naming x86 ring-3/PAGE_USER. - Simplify `space_aware_map` safety doc to describe the invariant ("roots must contain populated PTs for any space referenced by the mapping") instead of how the typical caller obtains it. - Drop the Nanvix-specific example from the `walk_va_spaces` doc and keep only the generic topological-ordering invariant. - Add a TODO to amd64's `walk_va_spaces` noting that aliased intermediate-table detection is not yet implemented (follow-up work). - Restore the explanatory comment above `#[cfg(not(feature = "i686-guest"))] let regions = Vec::new()` in `Snapshot::new` — on i686 embedders use `map_file_cow` regions for ramfs images which must survive into the snapshot identity. - Update `mgr.rs` comments on `SCRATCH_TOP_SNAPSHOT_PT_GPA_BASE` and `copy_pt_to_scratch` to explain the real reason for the PT-tail arrangement (avoid overlap with `map_file_cow` regions that follow the snapshot in guest PA space). Signed-off-by: danbugs <danilochiarlone@gmail.com> * review: track snapshot generation per-snapshot, not per-restore Addresses review: the guest-visible counter in scratch should reflect 'how many snapshots have happened in my execution path from init to here', not 'how many restores have occurred into this (possibly cross-tenant) hypervisor partition'. - Rename `SandboxMemoryManager::restore_count` -> `snapshot_count`; increment it in `snapshot()` rather than in `restore_snapshot()`. - Store the generation number on each `Snapshot` as `snapshot_generation`, assigned at snapshot time. - On restore, adopt the restored snapshot's own generation rather than bumping a local restore counter. That way the guest's `SCRATCH_TOP_SNAPSHOT_GENERATION` always tracks 'which snapshot am I currently a clone of', which is stable under snap/restore cycles and independent of other tenants' activity. The scratch-side offset (`SCRATCH_TOP_SNAPSHOT_GENERATION_OFFSET`) already had the correct name and stays put; only the host-side accounting moves. Signed-off-by: danbugs <danilochiarlone@gmail.com> * review: inline copy_pt_to_scratch into update_scratch_bookkeeping The two were always called together, so merging them gives one scratch-setup entry point instead of two. No behavioural change. Signed-off-by: danbugs <danilochiarlone@gmail.com> * review: move read_pte_if_present / require_pte_exist into arch modules The 'present' check is arch-specific — amd64 and i686 use a single bit, but on aarch64 the lower two bits of the entry need to be checked (and the decision also interacts with hugepage encoding). Move the helpers out of the common module so each arch can define what 'present' means locally; crate::vmem no longer depends on PAGE_PRESENT being a single-bit mask. Signed-off-by: danbugs <danilochiarlone@gmail.com> * review: move UpdateParentRoot and UpdateParentTable into amd64 module The impls for both structs (`impl UpdateParent` + `impl TableMovability` via `MayMoveTable`) live entirely in the amd64 module, and i686 statically doesn't support `TableOps` implementations that relocate tables, so the types were never used on i686. Move them to amd64 and leave only the genuinely arch-independent `UpdateParentNone` in the common module. Signed-off-by: danbugs <danilochiarlone@gmail.com> * review: GuestPageTableBuffer takes TableAddr, saves root addrs from alloc_table Two related review points: - `set_root_offset(offset: usize)` back-doored the buffer's internal layout; replace it with `set_root(addr: u64)` that takes a `TableAddr` directly, matching what the rest of the vmem API operates on. Callers create new roots by calling `alloc_table` and feeding the returned address straight in. - The snapshot-rebuild loop pre-allocated N-1 roots for the non-primary spaces and then *computed* each root's GPA from `(base_gpa + idx * PAGE_SIZE)`, which duplicated the buffer's internal layout at call site. Save the `alloc_table` return values in a `root_addrs: Vec<u64>` and index that instead. Also rename the internal `root_offset: Cell<usize>` -> `root: Cell<u64>` and add `initial_root()` so callers never need to name a byte offset inside the buffer. Signed-off-by: danbugs <danilochiarlone@gmail.com> * review: narrow vmem helpers from pub(crate) to pub(in crate::vmem) The helpers factored out of the arch-specific modules (`bits`, `write_entry_updating`, `modify_ptes`, `ModifyPteIterator`, `MapRequest`, `MapResponse`) exist only so the arch implementations can share code — nothing outside `hyperlight_common::vmem` has any business reaching them. Narrow them from `pub(crate)` to `pub(in crate::vmem)` so they are reachable from the `arch` submodules only. `read_pte_if_present` / `require_pte_exist` were moved per-arch earlier in this PR and already use `pub(super)`. Signed-off-by: danbugs <danilochiarlone@gmail.com> * review: drop embedder-region preservation across snapshot/restore Previously `Snapshot::new` preserved the embedder-provided regions into the snapshot (with a `#[cfg(not(feature = "i686-guest"))]` gate that zeroed them only on amd64). This was unnecessary — every VA that used to map into a `map_file_cow` region has already been rewritten during compaction to point at the new copy inside the snapshot blob (see the `guest_page` walk in `Snapshot::new`), so the Hyperlight-core restore path does not need the originals re-mapped for translation to work. Keeping them was also actively unsound: if the new snapshot PAs happen to overlap the old region PAs, re-mapping the originals on restore would corrupt the snapshot data the guest is reading from. Drop the preservation entirely — always emit an empty `regions` list from `Snapshot::new`. Embedders that were relying on the old behaviour (notably Nanvix's ramfs path) need to ensure any data they still care about is reachable via the guest's relocated VAs, which is already the contract the rest of the relocation machinery assumes. Signed-off-by: danbugs <danilochiarlone@gmail.com> * fix: cargo fmt Two sites where nightly fmt wraps lines that exceeded the width after earlier commits in this PR: the import block in `arch/amd64/vmem.rs` (now that the `UpdateParentRoot` / `UpdateParentTable` re-exports were removed) and the `standard_64bit_defaults` call in `hyperlight_vm/x86_64.rs` (grew past the limit with `_root_pt_addr`). Signed-off-by: danbugs <danilochiarlone@gmail.com> --------- Signed-off-by: danbugs <danilochiarlone@gmail.com> Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com> Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: Tomasz Andrzejak <andreiltd@gmail.com>
1 parent 271e69b commit 94ee495

30 files changed

Lines changed: 1663 additions & 595 deletions

File tree

Justfile

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -268,7 +268,7 @@ test-rust-tracing target=default-target features="":
268268
check-i686 target=default-target:
269269
cargo check -p hyperlight-common --target i686-unknown-linux-gnu --profile={{ if target == "debug" { "dev" } else { target } }}
270270
cargo check -p hyperlight-guest --target i686-unknown-linux-gnu --profile={{ if target == "debug" { "dev" } else { target } }}
271-
cargo check -p hyperlight-common --target i686-unknown-linux-gnu --features nanvix-unstable --profile={{ if target == "debug" { "dev" } else { target } }}
271+
cargo check -p hyperlight-common --target i686-unknown-linux-gnu --features i686-guest --profile={{ if target == "debug" { "dev" } else { target } }}
272272
# Verify that trace_guest correctly fails on i686 (compile_error should trigger)
273273
! cargo check -p hyperlight-guest --target i686-unknown-linux-gnu --features trace_guest --profile={{ if target == "debug" { "dev" } else { target } }} 2>/dev/null
274274

@@ -291,8 +291,8 @@ check:
291291
{{ cargo-cmd }} check -p hyperlight-host --features print_debug {{ target-triple-flag }}
292292
{{ cargo-cmd }} check -p hyperlight-host --features gdb {{ target-triple-flag }}
293293
{{ cargo-cmd }} check -p hyperlight-host --features trace_guest,mem_profile {{ target-triple-flag }}
294-
{{ cargo-cmd }} check -p hyperlight-host --features nanvix-unstable {{ target-triple-flag }}
295-
{{ cargo-cmd }} check -p hyperlight-host --features nanvix-unstable,executable_heap {{ target-triple-flag }}
294+
{{ cargo-cmd }} check -p hyperlight-host --features i686-guest {{ target-triple-flag }}
295+
{{ cargo-cmd }} check -p hyperlight-host --features i686-guest,executable_heap {{ target-triple-flag }}
296296
{{ cargo-cmd }} check -p hyperlight-host --features hw-interrupts {{ target-triple-flag }}
297297

298298
fmt-check: (ensure-nightly-fmt)

src/hyperlight_common/Cargo.toml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,9 @@ fuzzing = ["dep:arbitrary"]
3131
trace_guest = []
3232
mem_profile = []
3333
std = ["thiserror/std", "log/std", "tracing/std"]
34-
nanvix-unstable = []
34+
i686-guest = []
35+
nanvix-unstable = ["i686-guest"]
36+
guest-counter = []
3537

3638
[lib]
3739
bench = false # see https://bheisler.github.io/criterion.rs/book/faq.html#cargo-bench-gives-unrecognized-option-errors-for-valid-command-line-options

src/hyperlight_common/src/arch/aarch64/vmem.rs

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@ use crate::vmem::{Mapping, TableOps, TableReadOps, Void};
2020

2121
pub const PAGE_SIZE: usize = 4096;
2222
pub const PAGE_TABLE_SIZE: usize = 4096;
23+
pub const PAGE_PRESENT: u64 = 1; // AArch64: bit 0 is the "valid" bit
24+
pub const PTE_ADDR_MASK: u64 = 0x0000_FFFF_FFFF_F000; // bits [47:12]
2325
pub type PageTableEntry = u64;
2426
pub type VirtAddr = u64;
2527
pub type PhysAddr = u64;
@@ -44,6 +46,29 @@ pub unsafe fn virt_to_phys<'a, Op: TableReadOps + 'a>(
4446
core::iter::empty()
4547
}
4648

49+
/// Stub — see [`crate::vmem::walk_va_spaces`].
50+
#[allow(clippy::missing_safety_doc)]
51+
pub unsafe fn walk_va_spaces<Op: TableReadOps>(
52+
_op: &Op,
53+
_roots: &[Op::TableAddr],
54+
_address: u64,
55+
_len: u64,
56+
) -> ::alloc::vec::Vec<(
57+
crate::vmem::SpaceId,
58+
::alloc::vec::Vec<crate::vmem::SpaceAwareMapping>,
59+
)> {
60+
::alloc::vec::Vec::new()
61+
}
62+
63+
/// Stub — see [`crate::vmem::space_aware_map`].
64+
#[allow(clippy::missing_safety_doc)]
65+
pub unsafe fn space_aware_map<Op: TableOps>(
66+
_op: &Op,
67+
_ref_map: crate::vmem::SpaceReferenceMapping,
68+
_built_roots: &::alloc::collections::BTreeMap<crate::vmem::SpaceId, Op::TableAddr>,
69+
) {
70+
}
71+
4772
pub trait TableMovability<Op: TableReadOps + ?Sized, TableMoveInfo> {}
4873
impl<Op: TableOps<TableMovability = crate::vmem::MayMoveTable>> TableMovability<Op, Op::TableAddr>
4974
for crate::vmem::MayMoveTable

0 commit comments

Comments
 (0)