Skip to content

Commit d2cfea6

Browse files
khushit-shah-nxSasha Levin
authored andcommitted
KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
[ Upstream commit 6517dfb ] Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated by userspace), which KVM completely mishandles. When x2APIC support was first added, KVM incorrectly advertised and "enabled" Suppress EOI Broadcast, without fully supporting the I/O APIC side of the equation, i.e. without adding directed EOI to KVM's in-kernel I/O APIC. That flaw was carried over to split IRQCHIP support, i.e. KVM advertised support for Suppress EOI Broadcasts irrespective of whether or not the userspace I/O APIC implementation supported directed EOIs. Even worse, KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without support for directed EOI came to rely on the "spurious" broadcasts. KVM "fixed" the in-kernel I/O APIC implementation by completely disabling support for Suppress EOI Broadcasts in commit 0bcc3fb ("KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but didn't do anything to remedy userspace I/O APIC implementations. KVM's bogus handling of Suppress EOI Broadcast is problematic when the guest relies on interrupts being masked in the I/O APIC until well after the initial local APIC EOI. E.g. Windows with Credential Guard enabled handles interrupts in the following order: 1. Interrupt for L2 arrives. 2. L1 APIC EOIs the interrupt. 3. L1 resumes L2 and injects the interrupt. 4. L2 EOIs after servicing. 5. L1 performs the I/O APIC EOI. Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt storm, e.g. if the IRQ line is still asserted and userspace reacts to the EOI by re-injecting the IRQ, because the guest doesn't de-assert the line until step #4, and doesn't expect the interrupt to be re-enabled until step #5. Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way of knowing if the userspace I/O APIC supports directed EOIs, i.e. suppressing EOI broadcasts would result in interrupts being stuck masked in the userspace I/O APIC due to step #5 being ignored by userspace. And fully disabling support for Suppress EOI Broadcast is also undesirable, as picking up the fix would require a guest reboot, *and* more importantly would change the virtual CPU model exposed to the guest without any buy-in from userspace. Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to explicitly enable or disable support for Suppress EOI Broadcasts. This gives userspace control over the virtual CPU model exposed to the guest, as KVM should never have enabled support for Suppress EOI Broadcast without userspace opt-in. Not setting either flag will result in legacy quirky behavior for backward compatibility. Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear that commit c806a6a ("KVM: x86: call irq notifiers with directed EOI") was entirely correct, i.e. it may have simply papered over the lack of Directed EOI emulation in the I/O APIC. Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI is to support Directed EOI (KVM's name) irrespective of guest CPU vendor. Fixes: 7543a63 ("KVM: x86: Add KVM exit for IOAPIC EOIs") Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com Cc: stable@vger.kernel.org Suggested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com> Link: https://patch.msgid.link/20260123125657.3384063-1-khushit.shah@nutanix.com [sean: clean up minor formatting goofs and fix a comment typo] Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent c6db0f4 commit d2cfea6

7 files changed

Lines changed: 127 additions & 15 deletions

File tree

Documentation/virt/kvm/api.rst

Lines changed: 26 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7800,8 +7800,10 @@ Will return -EBUSY if a VCPU has already been created.
78007800

78017801
Valid feature flags in args[0] are::
78027802

7803-
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
7804-
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
7803+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
7804+
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
7805+
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
7806+
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
78057807

78067808
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
78077809
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7814,6 +7816,28 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
78147816
without interrupt remapping. This is undesirable in logical mode,
78157817
where 0xff represents CPUs 0-7 in cluster 0.
78167818

7819+
Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
7820+
Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI
7821+
Broadcast to the guest and suppress LAPIC EOI broadcasts when the guest
7822+
sets the Suppress EOI Broadcast bit in the SPIV register. This flag is
7823+
supported only when using a split IRQCHIP.
7824+
7825+
Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
7826+
Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise
7827+
support to the guest.
7828+
7829+
Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
7830+
or KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky
7831+
behavior will be used by KVM: in split IRQCHIP mode, KVM will advertise
7832+
support for Suppress EOI Broadcasts but not actually suppress EOI
7833+
broadcasts; for in-kernel IRQCHIP mode, KVM will not advertise support for
7834+
Suppress EOI Broadcasts.
7835+
7836+
Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
7837+
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error,
7838+
as will setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST without a split
7839+
IRCHIP.
7840+
78177841
7.8 KVM_CAP_S390_USER_INSTR0
78187842
----------------------------
78197843

arch/x86/include/asm/kvm_host.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1229,6 +1229,12 @@ enum kvm_irqchip_mode {
12291229
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
12301230
};
12311231

1232+
enum kvm_suppress_eoi_broadcast_mode {
1233+
KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
1234+
KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
1235+
KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
1236+
};
1237+
12321238
struct kvm_x86_msr_filter {
12331239
u8 count;
12341240
bool default_allow:1;
@@ -1480,6 +1486,7 @@ struct kvm_arch {
14801486

14811487
bool x2apic_format;
14821488
bool x2apic_broadcast_quirk_disabled;
1489+
enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
14831490

14841491
bool has_mapped_host_mmio;
14851492
bool guest_can_read_msr_platform_info;

arch/x86/include/uapi/asm/kvm.h

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -913,8 +913,10 @@ struct kvm_sev_snp_launch_finish {
913913
__u64 pad1[4];
914914
};
915915

916-
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
917-
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
916+
#define KVM_X2APIC_API_USE_32BIT_IDS _BITULL(0)
917+
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK _BITULL(1)
918+
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST _BITULL(2)
919+
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST _BITULL(3)
918920

919921
struct kvm_hyperv_eventfd {
920922
__u32 conn_id;

arch/x86/kvm/ioapic.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -561,7 +561,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
561561
spin_lock(&ioapic->lock);
562562

563563
if (trigger_mode != IOAPIC_LEVEL_TRIG ||
564-
kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
564+
kvm_lapic_suppress_eoi_broadcast(apic))
565565
return;
566566

567567
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);

arch/x86/kvm/lapic.c

Lines changed: 68 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,63 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
105105
apic_test_vector(vector, apic->regs + APIC_IRR);
106106
}
107107

108+
static bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
109+
{
110+
switch (kvm->arch.suppress_eoi_broadcast_mode) {
111+
case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
112+
return true;
113+
case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
114+
return false;
115+
case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
116+
/*
117+
* The default in-kernel I/O APIC emulates the 82093AA and does not
118+
* implement an EOI register. Some guests (e.g. Windows with the
119+
* Hyper-V role enabled) disable LAPIC EOI broadcast without
120+
* checking the I/O APIC version, which can cause level-triggered
121+
* interrupts to never be EOI'd.
122+
*
123+
* To avoid this, KVM doesn't advertise Suppress EOI Broadcast
124+
* support when using the default in-kernel I/O APIC.
125+
*
126+
* Historically, in split IRQCHIP mode, KVM always advertised
127+
* Suppress EOI Broadcast support but did not actually suppress
128+
* EOIs, resulting in quirky behavior.
129+
*/
130+
return !ioapic_in_kernel(kvm);
131+
default:
132+
WARN_ON_ONCE(1);
133+
return false;
134+
}
135+
}
136+
137+
bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic)
138+
{
139+
struct kvm *kvm = apic->vcpu->kvm;
140+
141+
if (!(kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
142+
return false;
143+
144+
switch (kvm->arch.suppress_eoi_broadcast_mode) {
145+
case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
146+
return true;
147+
case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
148+
return false;
149+
case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
150+
/*
151+
* Historically, in split IRQCHIP mode, KVM ignored the suppress
152+
* EOI broadcast bit set by the guest and broadcasts EOIs to the
153+
* userspace I/O APIC. For In-kernel I/O APIC, the support itself
154+
* is not advertised, can only be enabled via KVM_SET_APIC_STATE,
155+
* and KVM's I/O APIC doesn't emulate Directed EOIs; but if the
156+
* feature is enabled, it is respected (with odd behavior).
157+
*/
158+
return ioapic_in_kernel(kvm);
159+
default:
160+
WARN_ON_ONCE(1);
161+
return false;
162+
}
163+
}
164+
108165
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
109166
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
110167

@@ -554,15 +611,9 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
554611

555612
v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
556613

557-
/*
558-
* KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
559-
* which doesn't have EOI register; Some buggy OSes (e.g. Windows with
560-
* Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
561-
* version first and level-triggered interrupts never get EOIed in
562-
* IOAPIC.
563-
*/
614+
564615
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
565-
!ioapic_in_kernel(vcpu->kvm))
616+
kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm))
566617
v |= APIC_LVR_DIRECTED_EOI;
567618
kvm_lapic_set_reg(apic, APIC_LVR, v);
568619
}
@@ -1517,6 +1568,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
15171568

15181569
/* Request a KVM exit to inform the userspace IOAPIC. */
15191570
if (irqchip_split(apic->vcpu->kvm)) {
1571+
/*
1572+
* Don't exit to userspace if the guest has enabled Directed
1573+
* EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
1574+
* APIC doesn't broadcast EOIs (the guest must EOI the target
1575+
* I/O APIC(s) directly).
1576+
*/
1577+
if (kvm_lapic_suppress_eoi_broadcast(apic))
1578+
return;
1579+
15201580
apic->vcpu->arch.pending_ioapic_eoi = vector;
15211581
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
15221582
return;

arch/x86/kvm/lapic.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -231,6 +231,8 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
231231

232232
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
233233

234+
bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
235+
234236
void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
235237

236238
void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,

arch/x86/kvm/x86.c

Lines changed: 19 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
121121

122122
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
123123

124-
#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
125-
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
124+
#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
125+
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
126+
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
127+
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
126128

127129
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
128130
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -4966,6 +4968,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
49664968
break;
49674969
case KVM_CAP_X2APIC_API:
49684970
r = KVM_X2APIC_API_VALID_FLAGS;
4971+
if (kvm && !irqchip_split(kvm))
4972+
r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
49694973
break;
49704974
case KVM_CAP_NESTED_STATE:
49714975
r = kvm_x86_ops.nested_ops->get_state ?
@@ -6783,11 +6787,24 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
67836787
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
67846788
break;
67856789

6790+
if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
6791+
(cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
6792+
break;
6793+
6794+
if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
6795+
!irqchip_split(kvm))
6796+
break;
6797+
67866798
if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
67876799
kvm->arch.x2apic_format = true;
67886800
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
67896801
kvm->arch.x2apic_broadcast_quirk_disabled = true;
67906802

6803+
if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
6804+
kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
6805+
if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
6806+
kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
6807+
67916808
r = 0;
67926809
break;
67936810
case KVM_CAP_X86_DISABLE_EXITS:

0 commit comments

Comments
 (0)