Skip to content

Commit 31ee14b

Browse files
dreamliner787-9midnightveil
authored andcommitted
Add x86 vCPU on/off API
Add microkit_vcpu_x86_on() and microkit_vcpu_x86_off(). Signed-off-by: Bill Nguyen <bill.nguyen@unsw.edu.au>
1 parent ab5c4a5 commit 31ee14b

10 files changed

Lines changed: 412 additions & 18 deletions

File tree

.linkcheck-ignore.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,3 @@
11
urls:
22
- "https://developer.arm.com/downloads/*"
3+
- "https://cdrdv2.intel.com/*"

docs/manual.md

Lines changed: 61 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -637,7 +637,10 @@ as to what fault occurred.
637637
The `reply_msginfo` argument is given by libmicrokit and can be used to reply to the fault.
638638

639639
The returned `seL4_Bool` is whether or not to reply to the fault with the message `reply_msginfo`.
640-
Returning `seL4_True` will reply to the fault. Returning `seL4_False` will not reply to the fault.
640+
Returning `seL4_True` will reply to the fault and resume the child PD or vCPU at the fault restart program counter.
641+
For more details, please see the "Faults" section of the [seL4 Manual](https://sel4.systems/Info/Docs/seL4-manual-latest.pdf).
642+
643+
Returning `seL4_False` will not reply to the fault and causes the child PD or vCPU to no longer run.
641644

642645
You can use `microkit_msginfo_get_label` on `msginfo` to deduce what kind of fault happened
643646
(for example, whether it was a user exception or a virtual memory fault).
@@ -649,6 +652,43 @@ To find the full list of possible faults that could occur and details regarding
649652
kind of fault, please see the 'Faults' section of the
650653
[seL4 reference manual](https://sel4.systems/Info/Docs/seL4-manual-latest.pdf).
651654

655+
### x86 VCPU fault
656+
Please see the 'VMX BASIC EXIT REASONS' section of the
657+
[Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4]
658+
(https://cdrdv2.intel.com/v1/dl/getContent/671200) for a list of possible VM Exit reasons.
659+
660+
These message registers contain data relating to the VM Exit:
661+
- `SEL4_VMENTER_CALL_EIP_MR`: Instruction Pointer,
662+
- `SEL4_VMENTER_CALL_CONTROL_PPC_MR`: Primary Processor Based VM Execution Controls,
663+
- `SEL4_VMENTER_CALL_INTERRUPT_INFO_MR`: VM Entry Interruption-Information,
664+
- `SEL4_VMENTER_FAULT_REASON_MR`: VM Exit reason,
665+
- `SEL4_VMENTER_FAULT_QUALIFICATION_MR`: VM Exit qualification,
666+
- `SEL4_VMENTER_FAULT_INSTRUCTION_LEN_MR`: Length of instruction that caused the VM Exit,
667+
- `SEL4_VMENTER_FAULT_GUEST_PHYSICAL_MR`: Guest Physical Address of the VM Exit,
668+
- `SEL4_VMENTER_FAULT_RFLAGS_MR`: Guest FLAGS register,
669+
- `SEL4_VMENTER_FAULT_GUEST_INT_MR`: Guest interruptability,
670+
- `SEL4_VMENTER_FAULT_CR3_MR`: Guest CR3.
671+
672+
Some of these message registers may not contain valid data depending on the VM Exit reason,
673+
please consult the Intel SDM for more details.
674+
675+
These message registers contain the guest general purpose registers at the time of VM Exit:
676+
- `SEL4_VMENTER_FAULT_EAX`
677+
- `SEL4_VMENTER_FAULT_EBX`
678+
- `SEL4_VMENTER_FAULT_ECX`
679+
- `SEL4_VMENTER_FAULT_EDX`
680+
- `SEL4_VMENTER_FAULT_ESI`
681+
- `SEL4_VMENTER_FAULT_EDI`
682+
- `SEL4_VMENTER_FAULT_EBP`
683+
- `SEL4_VMENTER_FAULT_R8`
684+
- `SEL4_VMENTER_FAULT_R9`
685+
- `SEL4_VMENTER_FAULT_R10`
686+
- `SEL4_VMENTER_FAULT_R11`
687+
- `SEL4_VMENTER_FAULT_R12`
688+
- `SEL4_VMENTER_FAULT_R13`
689+
- `SEL4_VMENTER_FAULT_R14`
690+
- `SEL4_VMENTER_FAULT_R15`
691+
652692
## `microkit_msginfo microkit_ppcall(microkit_channel ch, microkit_msginfo msginfo)`
653693

654694
Performs a call to a protected procedure in a different PD.
@@ -826,6 +866,22 @@ virtual CPU with ID `vcpu`.
826866
Write the registers of a given virtual CPU with ID `vcpu`. The `regs` argument is the pointer to
827867
the struct of registers `seL4_VCPUContext` that are written from.
828868

869+
## `void microkit_vcpu_x86_on(void)`
870+
871+
Start running the vCPU bound to this PD.
872+
873+
The vCPU runs after every return from a Microkit entrypoint; it does not run concurrently with the native thread.
874+
875+
The kernel performs minimal setup of the vCPU's architectural state, configuring only the hardware-mandated
876+
fixed bits and setting VMCS fields required to safely context-switch between VMX root and non-root operations.
877+
The caller is responsible for initialising all other architectural state such as the instruction pointer using
878+
`microkit_vcpu_x86_write_vmcs()`, in accordance with the
879+
[Intel SDM](https://cdrdv2.intel.com/v1/dl/getContent/671200).
880+
881+
## `void microkit_vcpu_x86_off(void)`
882+
883+
Stop the PD from switching to guest execution mode when a Microkit entrypoint returns.
884+
829885
## `seL4_CPtr microkit_cspace_root_slot_to_cptr(seL4_Word slot)` {#libmicrokit_cspace_root_slot_to_cptr}
830886

831887
Converts the slot identifier of the `<cspace>`'s capability element into an
@@ -938,6 +994,8 @@ The `protection_domain` element has the same attributes as any other protection
938994
* `id`: The ID of the child for the parent to refer to.
939995
* `setvar_id`: (optional) Specifies a symbol in the parent program image. This symbol will be rewritten with the ID of the child.
940996

997+
On x86-64, a PD with a VCPU cannot have child PDs.
998+
941999
The `virtual_machine` element has the following attributes:
9421000

9431001
* `name`: A unique name for the virtual machine
@@ -948,6 +1006,7 @@ The `virtual_machine` element has the following attributes:
9481006
Additionally, it supports the following child elements:
9491007

9501008
* `vcpu`: (one or more) Describes the virtual CPU that will be tied to the virtual machine.
1009+
* On x86-64, there is a limit of one VCPU per PD.
9511010
* `map`: (zero or more) Describes mapping of memory regions into the virtual machine.
9521011

9531012
The `vcpu` element has the following attributes:
@@ -1027,6 +1086,7 @@ The `end` element has the following attributes:
10271086
* `id`: Channel identifier in the context of the named protection domain. Must be at least 0 and less than 63.
10281087
* `pp`: (optional) Indicates that the protection domain for this end can perform a protected procedure call to the other end; defaults to false.
10291088
Protected procedure calls can only be to PDs of strictly higher priority.
1089+
On x86-64, PDs with virtual machines cannot receive protected procedure calls.
10301090
* `notify`: (optional) Indicates that the protection domain for this end can send a notification to the other end; defaults to true.
10311091
* `setvar_id`: (optional) Specifies a symbol in the program image. This symbol will be rewritten with the channel identifier.
10321092

libmicrokit/include/microkit.h

Lines changed: 97 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,9 @@
1010
#pragma once
1111

1212
#include <sel4/sel4.h>
13+
#ifdef CONFIG_VTX
14+
#include <sel4/arch/vmenter.h>
15+
#endif /* CONFIG_VTX */
1316

1417
typedef unsigned int microkit_channel;
1518
typedef unsigned int microkit_child;
@@ -47,6 +50,15 @@ extern char microkit_name[MICROKIT_PD_NAME_LENGTH];
4750
extern seL4_Bool microkit_have_signal;
4851
extern seL4_CPtr microkit_signal_cap;
4952
extern seL4_MessageInfo_t microkit_signal_msg;
53+
#if defined(CONFIG_VTX)
54+
struct microkit_x86_vcpu_state {
55+
seL4_Bool is_on;
56+
seL4_Word rip;
57+
seL4_Word prim_proc_ctl;
58+
seL4_Word irq_info;
59+
};
60+
extern struct microkit_x86_vcpu_state microkit_x86_vcpu_state;
61+
#endif
5062

5163
/* Symbols for error checking libmicrokit API calls. Patched by the Microkit tool
5264
* to set bits corresponding to valid channels for this PD. */
@@ -386,28 +398,86 @@ static inline seL4_Uint32 microkit_x86_ioport_read_32(microkit_ioport ioport_id,
386398

387399
return ret.result;
388400
}
389-
#endif
390401

391-
#if defined(CONFIG_ARCH_X86_64) && defined(CONFIG_VTX)
402+
#if defined(CONFIG_VTX)
403+
/* Architecturally defined identifiers for a x86 VCPU's VMCS fields,
404+
* see seL4 source: `include/arch/x86/arch/object/vcpu.h`
405+
* or Intel® 64 and IA-32 Architectures Software Developer’s Manual
406+
* Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4
407+
* Order Number: 325462-084US June 2024
408+
* "Table B-8. Encodings for 32-Bit Control Fields (0100_00xx_xxxx_xxx0B)" and
409+
* "Table B-14. Encodings for Natural-Width Guest-State Fields (0110_10xx_xxxx_xxx0B) (Contd.)".
410+
* */
411+
#define MICROKIT_VMX_GUEST_RIP 0x0000681E
412+
#define MICROKIT_VMX_CONTROL_PRIMARY_PROCESSOR_CONTROLS 0x00004002
413+
#define MICROKIT_VMX_CONTROL_ENTRY_INTERRUPTION_INFO 0x00004016
414+
392415
static inline seL4_Word microkit_vcpu_x86_read_vmcs(microkit_child vcpu, seL4_Word field)
393416
{
394-
seL4_X86_VCPU_ReadVMCS_t ret;
395-
ret = seL4_X86_VCPU_ReadVMCS(BASE_VCPU_CAP + vcpu, field);
396-
if (ret.error != seL4_NoError) {
397-
microkit_dbg_puts("microkit_x86_read_vmcs: error reading data\n");
398-
microkit_internal_crash(ret.error);
417+
/* This function assumes that a PD would only have access to 1 VCPU object.
418+
* For these fields:
419+
* - Guest RIP,
420+
* - Primary Processor Based VM Execution Controls, and
421+
* - VM Entry Interruption-Information
422+
* they will be read from local variables that were written
423+
* to by `microkit_vcpu_x86_write_vmcs()`. The kernel will service
424+
* reading from other fields.
425+
*/
426+
427+
seL4_Word value;
428+
switch (field) {
429+
case MICROKIT_VMX_GUEST_RIP:
430+
value = microkit_x86_vcpu_state.rip;
431+
break;
432+
case MICROKIT_VMX_CONTROL_PRIMARY_PROCESSOR_CONTROLS:
433+
value = microkit_x86_vcpu_state.prim_proc_ctl;
434+
break;
435+
case MICROKIT_VMX_CONTROL_ENTRY_INTERRUPTION_INFO:
436+
value = microkit_x86_vcpu_state.irq_info;
437+
break;
438+
default: {
439+
seL4_X86_VCPU_ReadVMCS_t ret = seL4_X86_VCPU_ReadVMCS(BASE_VCPU_CAP + vcpu, field);
440+
if (ret.error != seL4_NoError) {
441+
microkit_dbg_puts("microkit_x86_read_vmcs: error reading data\n");
442+
microkit_internal_crash(ret.error);
443+
}
444+
value = ret.value;
445+
break;
446+
}
399447
}
400448

401-
return ret.value;
449+
return value;
402450
}
403451

404452
static inline void microkit_vcpu_x86_write_vmcs(microkit_child vcpu, seL4_Word field, seL4_Word value)
405453
{
406-
seL4_X86_VCPU_WriteVMCS_t ret;
407-
ret = seL4_X86_VCPU_WriteVMCS(BASE_VCPU_CAP + vcpu, field, value);
408-
if (ret.error != seL4_NoError) {
409-
microkit_dbg_puts("microkit_x86_write_vmcs: error writing data\n");
410-
microkit_internal_crash(ret.error);
454+
/* Assumes that a PD would only have access to 1 VCPU object.
455+
* These fields will be written to local variables rather than the actual VMCS:
456+
* - RIP,
457+
* - Primary Processor Based VM Execution Controls, and
458+
* - VM Entry Interruption-Information.
459+
* Because they will be passed to the kernel on `seL4_VMEnter()`,
460+
* then the kernel will write them to the VMCS for us.
461+
* Writing other VMCS fields will go to the real VMCS immediately via a syscall.
462+
*/
463+
switch (field) {
464+
case MICROKIT_VMX_GUEST_RIP:
465+
microkit_x86_vcpu_state.rip = value;
466+
break;
467+
case MICROKIT_VMX_CONTROL_PRIMARY_PROCESSOR_CONTROLS:
468+
microkit_x86_vcpu_state.prim_proc_ctl = value;
469+
break;
470+
case MICROKIT_VMX_CONTROL_ENTRY_INTERRUPTION_INFO:
471+
microkit_x86_vcpu_state.irq_info = value;
472+
break;
473+
default: {
474+
seL4_X86_VCPU_WriteVMCS_t ret = seL4_X86_VCPU_WriteVMCS(BASE_VCPU_CAP + vcpu, field, value);
475+
if (ret.error != seL4_NoError) {
476+
microkit_dbg_puts("microkit_x86_write_vmcs: error writing data\n");
477+
microkit_internal_crash(ret.error);
478+
}
479+
break;
480+
}
411481
}
412482
}
413483

@@ -472,7 +542,20 @@ static inline void microkit_vcpu_x86_write_regs(microkit_child vcpu, seL4_VCPUCo
472542
}
473543
}
474544

475-
#endif
545+
static inline void microkit_vcpu_x86_on(void)
546+
{
547+
/* On x86, a TCB can only have one bound VCPU at any given time.
548+
* So we don't take a `microkit_child vcpu` here. */
549+
microkit_x86_vcpu_state.is_on = seL4_True;
550+
}
551+
552+
static inline void microkit_vcpu_x86_off(void)
553+
{
554+
microkit_x86_vcpu_state.is_on = seL4_False;
555+
}
556+
557+
#endif /* CONFIG_VTX */
558+
#endif /* CONFIG_ARCH_X86_64 */
476559

477560
static inline void microkit_deferred_notify(microkit_channel ch)
478561
{

libmicrokit/src/main.c

Lines changed: 66 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,9 @@
1717
#define PD_MASK 0xff
1818
#define CHANNEL_MASK 0x3f
1919

20+
#define BADGE_FAULT_BIT 62
21+
#define BADGE_ENDPOINT_BIT 63
22+
2023
/* All globals are prefixed with microkit_* to avoid clashes with user defined globals. */
2124

2225
bool microkit_passive;
@@ -27,6 +30,10 @@ seL4_Bool microkit_have_signal = seL4_False;
2730
seL4_CPtr microkit_signal_cap;
2831
seL4_MessageInfo_t microkit_signal_msg;
2932

33+
#if defined(CONFIG_VTX)
34+
struct microkit_x86_vcpu_state microkit_x86_vcpu_state;
35+
#endif /* CONFIG_VTX */
36+
3037
seL4_Word microkit_irqs;
3138
seL4_Word microkit_notifications;
3239
seL4_Word microkit_pps;
@@ -71,10 +78,49 @@ static void deferred_flush(void)
7178
}
7279
}
7380

81+
#if defined(CONFIG_VTX)
82+
static seL4_MessageInfo_t x86_vcpu_resume(seL4_Word *badge)
83+
{
84+
/* There is no seL4 invocation which combines a VMEnter and a non-blocking send.
85+
* Thus, we must perform any deferred signals from `microkit_deferred_notify()` /
86+
* `microkit_deferred_irq_ack()` before invoking VMEnter. */
87+
deferred_flush();
88+
89+
seL4_Word is_fault, fault_reason;
90+
struct microkit_x86_vcpu_state *s = &microkit_x86_vcpu_state;
91+
92+
x64_sys_send_recv(seL4_SysVMEnter, 0, badge, 0, &is_fault,
93+
&s->rip, &s->prim_proc_ctl, &s->irq_info, &fault_reason, 0);
94+
/* We want to follow the documented kernel behaviour where these 4 values are
95+
* always populated in the message registers. However, due to `setMR()`'s behaviour
96+
* in include/object/tcb.h of the kernel, these 4 values will only be set in
97+
* CPU registers. So we need to copy them to the IPC buffer to conform to the
98+
* documented kernel behaviour.
99+
*
100+
* The other reason why we want to manage these values is so that the user does not
101+
* have to worry about saving/updating them when servicing a notification. */
102+
microkit_mr_set(SEL4_VMENTER_CALL_EIP_MR, s->rip);
103+
microkit_mr_set(SEL4_VMENTER_CALL_CONTROL_PPC_MR, s->prim_proc_ctl);
104+
microkit_mr_set(SEL4_VMENTER_CALL_INTERRUPT_INFO_MR, s->irq_info);
105+
/* The 4th value (SEL4_VMENTER_FAULT_REASON_MR) is only valid when is_fault is true. */
106+
107+
/* Create a dummy msgInfo so that we can call `fault()`, as on x86 a VMExit is
108+
* not an IPC as on other architectures. */
109+
if (is_fault) {
110+
microkit_mr_set(SEL4_VMENTER_FAULT_REASON_MR, fault_reason);
111+
*badge |= 1ull << BADGE_FAULT_BIT;
112+
return seL4_MessageInfo_new(0, 0, 0, SEL4_VMENTER_NUM_FAULT_MSGS);
113+
} else {
114+
/* VCPU got interrupted due to a notification, no msgInfo. */
115+
return seL4_MessageInfo_new(0, 0, 0, 0);
116+
}
117+
}
118+
#endif
119+
74120
static void handler_loop(void)
75121
{
76122
bool have_reply = false;
77-
seL4_MessageInfo_t reply_tag;
123+
seL4_MessageInfo_t reply_tag = seL4_MessageInfo_new(0, 0, 0, 0);
78124

79125
/**
80126
* Because of https://github.com/seL4/seL4/issues/1536
@@ -97,7 +143,16 @@ static void handler_loop(void)
97143
seL4_Word badge;
98144
seL4_MessageInfo_t tag;
99145

146+
#if defined(CONFIG_VTX)
147+
if (microkit_x86_vcpu_state.is_on) {
148+
/* We should never have a reply message from the `protected()` endpoint,
149+
* as on x86 a PD with a bound vCPU cannot receive PPCs.*/
150+
// assert(!have_reply);
151+
tag = x86_vcpu_resume(&badge);
152+
} else if (have_reply) {
153+
#else
100154
if (have_reply) {
155+
#endif
101156
deferred_flush();
102157
tag = seL4_ReplyRecv(INPUT_CAP, reply_tag, &badge, REPLY_CAP);
103158
} else if (microkit_have_signal) {
@@ -107,13 +162,21 @@ static void handler_loop(void)
107162
tag = seL4_Recv(INPUT_CAP, &badge, REPLY_CAP);
108163
}
109164

110-
uint64_t is_endpoint = badge >> 63;
111-
uint64_t is_fault = (badge >> 62) & 1;
165+
uint64_t is_endpoint = badge >> BADGE_ENDPOINT_BIT;
166+
uint64_t is_fault = (badge >> BADGE_FAULT_BIT) & 1;
112167

113168
have_reply = false;
114169

115170
if (is_fault) {
116171
seL4_Bool reply_to_fault = fault(badge & PD_MASK, tag, &reply_tag);
172+
#if defined(CONFIG_VTX)
173+
/* If fault() returns false then we shouldn't resume the VCPU. */
174+
if (!reply_to_fault) {
175+
microkit_x86_vcpu_state.is_on = seL4_False;
176+
}
177+
/* There won't be anything to reply to for a VCPU fault. */
178+
reply_to_fault = seL4_False;
179+
#endif
117180
if (reply_to_fault) {
118181
have_reply = true;
119182
}

0 commit comments

Comments
 (0)