Commit b95a5bb

Merge pull request #651 from NotRequiem/main
Time debt resampling detection
2 parents: f90ac38 + 87a7145

File tree

1 file changed: +62 −7 lines changed


src/vmaware.hpp

Lines changed: 62 additions & 7 deletions
@@ -5141,9 +5141,27 @@ struct VM {
     * The hypervisor must return the time debt between the first loop and the second loop: always after step 3 and always before step 6
     * To counter this, VMAware simply keeps track of latency between iterations, so no matter when the hypervisor restores the time debt, it sees the hidden latency
     *
+    * - Statistical Check -
+    * Now, they might try to downscale the TSC only sometimes and pay the debt much later, so that when VMAware runs cpuid it sees, for example:
+    * fast fast fast fast fast fast fast fast fast fast fast fast fast fast slow (the time debt of all previous iterations is paid here, in "slow")
+    * VMAware sees 94% fast and 6% slow, assumes the 6% slow samples are normal kernel noise, discards them, and gets bypassed...
+    *
+    * Thus, when VMAware detects a spike that looks like a time debt payment, it redistributes it into the previous samples, so the data becomes:
+    * before redistribution:
+    * 100 100 100 100 100 100 100 100 100 100 100 100 100 100 1000
+    *
+    * after redistribution:
+    * 166 166 166 166 166 166 166 166 166 166 166 166 166 166 166
+    *
+    * After the hypervisor has paid all the time debt, ALL the hidden latency will have been redistributed across the samples and will mathematically always cross the latency threshold
+    * They might try to keep playing with the time debt repayment, e.g. never paying back 20-30% of the debt, but the thresholds in VMAware's code (and the excessive number of iterations)
+    * are calculated on purpose so that it's mathematically impossible (even with a patch that only downscales by 1 cycle) to contaminate samples selectively without triggering any check
+    * If a hypervisor sometimes skips paying the time debt (enough that the redistributed samples stay below the latency threshold), it trips the local or global ratio check,
+    * and if it does pay the time debt, at whatever interval (let's say at loop iteration 15, or 32, or 47...), it trips the cpuid latency check
+    *
     * So, what can they do now?
     * - Low Latency Check -
-    * If they cant do technique 1 (hiding the latency), they might attempt technique 2 (making the VM as fast as possible)
+    * If they can't do technique 1 (hiding the latency), they might attempt technique 2 (making the VM as fast as possible)
     * Remember that we use the cpuid instruction, which always has high latency in a VM and returns results containing info about the CPU
     *
     * To do this, they might cache results and give them back instantly when cpuid is executed, or recode the whole kernel to make it handle the cpuid quickly
@@ -5516,12 +5534,20 @@ struct VM {
     u64 acc = 0;
     size_t idx = 0;

+    // Track sparse latency spikes that may represent "time debt repayment".
+    // Hypervisors hiding VMEXIT cost often subtract cycles on most CPUID calls
+    // and restore them periodically, producing rare but large spikes.
+    u64 last_spike_idx = 0;
+    u64 spike_count = 0;
+
     // for each leaf do CPUID_ITER samples, then repeat
     while (state.load(std::memory_order_acquire) != 2) {
         for (size_t li = 0; li < n_leaves; ++li) {
+
             const unsigned int leaf = leaves[li];

             for (unsigned i = 0; i < CPUID_ITER; ++i) {
+
                 // read rdtsc and accumulate delta
                 const u64 now = __rdtsc();

@@ -5535,10 +5561,11 @@ struct VM {

                 // store latency if buffer has space
                 if (idx < samples.size()) {
+
                     u64 lat = cpuid(leaf);

-                    // If a VMX Preemption Timer is delayed (firing after cpuid returns to bypass t3 inside our cpuid lambda),
-                    // OR if the hypervisor intercepts RDTSC to restore the time debt instead of CPUID,
+                    // If a VMX Preemption Timer is delayed (firing after cpuid returns to bypass t3 inside our cpuid lambda),
+                    // OR if the hypervisor intercepts RDTSC to restore the time debt instead of CPUID,
                     // this outer total_overhead will include that hidden latency because it's at the end of this for loop
                     // if the hypervisor delays it even more, it will still be caught, so the total_overhead check detects it anyway
                     // this means that no matter where the time debt is restored, VMAware will always be able to see the hidden latency
@@ -5551,21 +5578,49 @@ struct VM {
                        lat = total_overhead;
                    }

+                   // If the hypervisor hides VMEXIT latency statistically, most CPUID calls will appear
+                   // very fast while occasional spikes repay the hidden cycles. The MAD filtering in calculate_latency()
+                   // would later discard these spikes as outliers, so we detect and redistribute them here
+                   if (lat > cycle_threshold) {
+
+                       const size_t gap = idx - last_spike_idx;
+                       last_spike_idx = idx;
+                       spike_count++;
+
+                       // If spikes appear periodically relative to the CPUID frequency,
+                       // treat them as time debt repayment instead of real interrupts
+                       if (gap > 1 && gap < 64 && spike_count > 2) {
+                           const u64 repay = lat / gap;
+
+                           // distribute the hidden cycles backwards across the recent samples, exactly the ones where spikes didn't occur
+                           for (size_t j = 1; j < gap && (idx >= j); ++j) {
+                               samples[idx - j] += repay;
+                           }
+
+                           lat = repay;
+                       }
+                   }
+
                    samples[idx] = lat;
                }
+
                ++idx;

                // if thread 1 finished
-               if (state.load(std::memory_order_acquire) == 2) break;
+               if (state.load(std::memory_order_acquire) == 2)
+                   break;
            }

-           if (state.load(std::memory_order_acquire) == 2) break;
+           if (state.load(std::memory_order_acquire) == 2)
+               break;
        }
    }

    // final rdtsc after detecting finish
    const u64 final_now = __rdtsc();
-   if (final_now >= last) acc += (final_now - last);
+   if (final_now >= last)
+       acc += (final_now - last);
+
    t2_end.store(acc, std::memory_order_release);
});

@@ -5614,7 +5669,7 @@ struct VM {
        // this logic can be bypassed if the hypervisor downscales the TSC on both cores, and that's precisely why we now also do a Global Ratio check
        const double local_ratio = double(t2_delta) / double(t1_delta);

-       if (local_ratio < 0.95 || local_ratio > 1.05) {
+       if (local_ratio < 0.95) {
            debug("TIMER: Detected a hypervisor intercepting TSC locally: ", local_ratio, "");
            return true;
        }
