Commit 94ee495
feat: i686 page tables, snapshot compaction, and CoW (standalone) (#1385)
* refactor: replace nanvix-unstable with i686-guest and guest-counter features
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Co-authored-by: danbugs <danilochiarlone@gmail.com>
* feat: i686 protected-mode boot and unified restore path
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Co-authored-by: danbugs <danilochiarlone@gmail.com>
* feat: i686 page tables, snapshot compaction, and CoW support
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Co-authored-by: danbugs <danilochiarlone@gmail.com>
* fixup: address PR #1381 review comments
- sandbox/snapshot.rs: decode i686 PTEs with `u32::from_le_bytes`
instead of `from_ne_bytes`. Page-table entries are defined as
little-endian by arch spec; `from_ne_bytes` is incorrect on
big-endian hosts and inconsistent with surrounding helpers that
already use `from_le_bytes`.
- hyperlight_common/vmem.rs: guard the `i686-guest` feature with a
`compile_error!` on targets that are neither `x86` (the guest
itself) nor `x86_64` (the host that runs the guest). Previously
enabling the feature on e.g. aarch64 would silently compile but
downstream crates would hit confusing missing-item errors when
they reached for `vmem::i686_guest`.
- mem/mgr.rs, sandbox/snapshot.rs: persist the per-process
PD-roots count in `Snapshot` and rewrite the scratch bookkeeping
area (`SCRATCH_TOP_PD_ROOTS_{COUNT,ARRAY}_OFFSET`) during
`restore_snapshot`. Scratch is zeroed on restore, so without
this a subsequent `snapshot()` call would read count=0 through
`read_pd_roots_from_scratch` and fail. Root `i`'s compacted
GPA is deterministic (`layout.get_pt_base_gpa() + i * PAGE_SIZE`
— same layout `compact_i686_snapshot` used when building the
rebuilt PDs), so we only need to store the count.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* fix: replace all the remaining nanvix-unstable with i686-guest
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
* fix: add i686-guest snapshot tests
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
* refactor: address review feedback
Refactor i686 page table code to follow the same architecture-independent
API as amd64/aarch64, per reviewer feedback.
hyperlight_common changes:
- Extract shared page table iterators (modify_ptes, read_pte_if_present,
require_pte_exist, write_entry_updating) to vmem.rs with PTE_SHIFT
const generic for reuse across architectures.
- Implement i686/vmem.rs with proper map()/virt_to_phys() using the
shared iterators, replacing stubs and the host-side i686_guest
submodule that was in amd64/vmem.rs.
- Remove PAGE_USER from pte_for_table (PDEs are supervisor-only by
default); user-space PDEs get PAGE_USER via post-processing.
- Remove SCRATCH_TOP_PD_ROOTS_* and MAX_PD_ROOTS constants (replaced
by set_pt_root_finder callback).
hyperlight_host changes:
- Unify GuestPageTableBuffer<PTE_BYTES> for both architectures,
removing i686_pt::Builder, build_initial_i686_page_tables, and
compact_i686_snapshot (~500 lines removed from snapshot.rs).
- Add root_offset to GuestPageTableBuffer for targeting per-process
PDs, with finalize_multi_root() to replicate kernel PDEs, map
scratch, and set PAGE_USER across all roots.
- Snapshot::new uses filtered_mappings with root-index tagging for
per-process PD isolation in a single pass (no double PT walk).
- Add set_pt_root_finder callback on MultiUseSandbox, replacing the
scratch-based PD roots bookkeeping.
- Add CR0 named constants (CR0_PE, CR0_ET, CR0_WP, CR0_PG).
- Add compaction_kind() helper to deduplicate kind conversion.
- Add 5 i686 unit tests for map/virt_to_phys.
- Detailed comments on separate_pt_bytes explaining the map_file_cow
GPA overlap constraint.
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
* Remove separate_pt_bytes, keep PT tail in snapshot blob but exclude from guest memory slot
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
* Remove pub mod i686_guest, use arch-conditional mod arch for vmem, add user_accessible to Mapping
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
* Remove cow_map, PT walk already follows CoW'd pointers via resolve_gpa
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
* Clean up shared PT iterators: restore sealed TableMovability, export arch constants, remove PTE_SHIFT, restore comments
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
* Clean up snapshot and page table code
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
* feat: multi-space page-table walk + link primitives
Adds SpaceAwareMapping, SpaceReferenceMapping, SpaceId, walk_va_spaces,
and space_aware_map in hyperlight_common::vmem. These primitives let a
caller walk several PT roots together, detect aliased intermediate
tables (the 'kernel-half shared' pattern where multiple process PDs
point at the same PT pages), and rebuild those aliases on the write
side by linking into already-built tables instead of copying.
i686 implements the real walker (depth-1 PT sharing, which is what
Nanvix needs) plus space_aware_map that composes a PDE from the owning
space's rebuilt table. amd64 gets a non-aliasing single-root walker so
the architecture-independent re-export compiles; no current amd64
embedder exercises aliasing. aarch64 remains a TODO with stubs.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
* feat: use multi-space walker in snapshot relocation
Replaces the old filtered_mappings / per-root dedup-by-phys_base
compaction loop in Snapshot::new with a single walk_va_spaces pass
that emits SpaceAwareMapping entries. For ThisSpace leaves the code
still compacts into the dense snapshot blob via the phys_seen dedup
map. For AnotherSpace entries (produced when a later root reuses an
earlier root's intermediate table) it calls space_aware_map to link
the rebuilt table in-place.
This fixes Nanvix-on-Hyperlight with multiple process PDs: the
kernel-half PTs that process PDs share by pointing at the same PT
page would previously each get cloned into independent copies, so a
post-boot write into kernel memory via one PD would not be visible
via another. The linked-table rebuild preserves the shared-PT
invariant, and Hello 5/5 now runs cleanly under NANVIX_REPEAT=4.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
* feat: add snapshot generation counter to scratch bookkeeping
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
* fix: run cargo fmt
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
* fix: make clippy happy
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
* review: arch-neutral naming and doc/comment cleanup
Addresses a batch of open review comments on #1385:
- Rename `pml4_addr` -> `root_pt_addr` in `HyperlightVm::new` and the
`standard_{32bit_paging,64bit}_defaults` helpers to reflect that the
same path supports both amd64 (PML4) and i686 (PD) guests.
- Tighten the `PtRootFinder` docstring to describe the 3rd parameter
as the root PT GPA of the currently-executing address space rather
than naming CR3 (arch-specific).
- Reword `vmem::Mapping::user_accessible` to be phrased in
privilege-level terms rather than naming x86 ring-3/PAGE_USER.
- Simplify `space_aware_map` safety doc to describe the invariant
("roots must contain populated PTs for any space referenced by the
mapping") instead of how the typical caller obtains it.
- Drop the Nanvix-specific example from the `walk_va_spaces` doc and
keep only the generic topological-ordering invariant.
- Add a TODO to amd64's `walk_va_spaces` noting that aliased
intermediate-table detection is not yet implemented (follow-up work).
- Restore the explanatory comment above `#[cfg(not(feature =
"i686-guest"))] let regions = Vec::new()` in `Snapshot::new` — on
i686 embedders use `map_file_cow` regions for ramfs images which
must survive into the snapshot identity.
- Update `mgr.rs` comments on `SCRATCH_TOP_SNAPSHOT_PT_GPA_BASE` and
`copy_pt_to_scratch` to explain the real reason for the PT-tail
arrangement (avoid overlap with `map_file_cow` regions that follow
the snapshot in guest PA space).
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* review: track snapshot generation per-snapshot, not per-restore
Addresses review: the guest-visible counter in scratch should reflect
'how many snapshots have happened in my execution path from init to
here', not 'how many restores have occurred into this (possibly
cross-tenant) hypervisor partition'.
- Rename `SandboxMemoryManager::restore_count` -> `snapshot_count`;
increment it in `snapshot()` rather than in `restore_snapshot()`.
- Store the generation number on each `Snapshot` as
`snapshot_generation`, assigned at snapshot time.
- On restore, adopt the restored snapshot's own generation rather
than bumping a local restore counter. That way the guest's
`SCRATCH_TOP_SNAPSHOT_GENERATION` always tracks 'which snapshot
am I currently a clone of', which is stable under snap/restore
cycles and independent of other tenants' activity.
The scratch-side offset (`SCRATCH_TOP_SNAPSHOT_GENERATION_OFFSET`)
already had the correct name and stays put; only the host-side
accounting moves.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* review: inline copy_pt_to_scratch into update_scratch_bookkeeping
The two were always called together, so merging them gives one
scratch-setup entry point instead of two. No behavioural change.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* review: move read_pte_if_present / require_pte_exist into arch modules
The 'present' check is arch-specific — amd64 and i686 use a single
bit, but on aarch64 the lower two bits of the entry need to be
checked (and the decision also interacts with hugepage encoding).
Move the helpers out of the common module so each arch can define
what 'present' means locally; crate::vmem no longer depends on
PAGE_PRESENT being a single-bit mask.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* review: move UpdateParentRoot and UpdateParentTable into amd64 module
The impls for both structs (`impl UpdateParent` + `impl TableMovability`
via `MayMoveTable`) live entirely in the amd64 module, and i686
statically doesn't support `TableOps` implementations that relocate
tables, so the types were never used on i686. Move them to amd64
and leave only the genuinely arch-independent `UpdateParentNone`
in the common module.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* review: GuestPageTableBuffer takes TableAddr, saves root addrs from alloc_table
Two related review points:
- `set_root_offset(offset: usize)` back-doored the buffer's internal
layout; replace it with `set_root(addr: u64)` that takes a
`TableAddr` directly, matching what the rest of the vmem API
operates on. Callers create new roots by calling `alloc_table` and
feeding the returned address straight in.
- The snapshot-rebuild loop pre-allocated N-1 roots for the
non-primary spaces and then *computed* each root's GPA from
`(base_gpa + idx * PAGE_SIZE)`, which duplicated the buffer's
internal layout at call site. Save the `alloc_table` return values
in a `root_addrs: Vec<u64>` and index that instead.
Also rename the internal `root_offset: Cell<usize>` -> `root:
Cell<u64>` and add `initial_root()` so callers never need to name a
byte offset inside the buffer.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* review: narrow vmem helpers from pub(crate) to pub(in crate::vmem)
The helpers factored out of the arch-specific modules (`bits`,
`write_entry_updating`, `modify_ptes`, `ModifyPteIterator`,
`MapRequest`, `MapResponse`) exist only so the arch implementations
can share code — nothing outside `hyperlight_common::vmem` has any
business reaching them. Narrow them from `pub(crate)` to
`pub(in crate::vmem)` so they are reachable from the `arch`
submodules only. `read_pte_if_present` / `require_pte_exist` were
moved per-arch earlier in this PR and already use `pub(super)`.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* review: drop embedder-region preservation across snapshot/restore
Previously `Snapshot::new` preserved the embedder-provided regions
into the snapshot (with a `#[cfg(not(feature = "i686-guest"))]`
gate that zeroed them only on amd64). This was unnecessary — every
VA that used to map into a `map_file_cow` region has already been
rewritten during compaction to point at the new copy inside the
snapshot blob (see the `guest_page` walk in `Snapshot::new`), so
the Hyperlight-core restore path does not need the originals
re-mapped for translation to work. Keeping them was also actively
unsound: if the new snapshot PAs happen to overlap the old region
PAs, re-mapping the originals on restore would corrupt the snapshot
data the guest is reading from.
Drop the preservation entirely — always emit an empty `regions`
list from `Snapshot::new`. Embedders that were relying on the old
behaviour (notably Nanvix's ramfs path) need to ensure any data they
still care about is reachable via the guest's relocated VAs, which
is already the contract the rest of the relocation machinery
assumes.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
* fix: cargo fmt
Two sites where nightly fmt wraps lines that exceeded the width
after earlier commits in this PR: the import block in
`arch/amd64/vmem.rs` (now that the `UpdateParentRoot` /
`UpdateParentTable` re-exports were removed) and the
`standard_64bit_defaults` call in `hyperlight_vm/x86_64.rs` (grew
past the limit with `_root_pt_addr`).
Signed-off-by: danbugs <danilochiarlone@gmail.com>
---------
Signed-off-by: danbugs <danilochiarlone@gmail.com>
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Co-authored-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Co-authored-by: Tomasz Andrzejak <andreiltd@gmail.com>1 parent 271e69b commit 94ee495
30 files changed
Lines changed: 1663 additions & 595 deletions
File tree
- src
- hyperlight_common
- src
- arch
- aarch64
- amd64
- i686
- hyperlight_guest_bin/src
- hyperlight_guest
- src
- hyperlight_host
- src
- hypervisor
- hyperlight_vm
- regs/x86_64
- virtual_machine
- kvm
- mshv
- mem
- sandbox
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
268 | 268 | | |
269 | 269 | | |
270 | 270 | | |
271 | | - | |
| 271 | + | |
272 | 272 | | |
273 | 273 | | |
274 | 274 | | |
| |||
291 | 291 | | |
292 | 292 | | |
293 | 293 | | |
294 | | - | |
295 | | - | |
| 294 | + | |
| 295 | + | |
296 | 296 | | |
297 | 297 | | |
298 | 298 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
34 | | - | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
35 | 37 | | |
36 | 38 | | |
37 | 39 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
| 24 | + | |
23 | 25 | | |
24 | 26 | | |
25 | 27 | | |
| |||
44 | 46 | | |
45 | 47 | | |
46 | 48 | | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
47 | 72 | | |
48 | 73 | | |
49 | 74 | | |
| |||
0 commit comments