|
7 | 7 |
|
8 | 8 | ## Overview |
9 | 9 |
|
10 | | -Stage 5 is the PE emulation engine -- a full CPU emulator embedded within mpengine.dll that executes PE files in a sandboxed virtual environment. The emulator interprets x86, x64, and ARM instructions, provides 198 emulated Windows API handlers, loads 973 virtual DLLs (VDLLs) into a synthetic address space, and records behavioral telemetry (FOP opcode traces and API call logs) for signature matching. |
| 10 | +Stage 5 is the PE emulation engine -- a full CPU emulator embedded within mpengine.dll that executes PE files in a sandboxed virtual environment. The emulator interprets x86, x64, and ARM instructions, provides 198 emulated Windows API handlers (including CryptAPI and BCrypt), loads 973 virtual DLLs (VDLLs) into a synthetic address space, and records behavioral telemetry (FOP opcode traces and API call logs) for signature matching. The emulator runs with a default budget of **5 million instructions** (configurable), processing in batches of ~1,000. |
11 | 11 |
|
12 | | -The emulator's primary purpose is **dynamic unpacking**: many malware samples encrypt or compress their payloads and only reveal the real code at runtime. By emulating execution, Defender can observe the decrypted payload and scan it through the full pipeline recursively (Stage 6). |
| 12 | +The emulator's primary purpose is **dynamic unpacking**: many malware samples encrypt or compress their payloads and only reveal the real code at runtime. By emulating execution, Defender can observe the decrypted payload and scan it through the full pipeline recursively (Stage 6). Full exception handling support (VEH, SEH chain walking, x64 table-driven RUNTIME_FUNCTION dispatch) ensures packed malware that uses SEH-based control flow transfer is handled correctly. |
13 | 13 |
|
14 | 14 | ### Key RTTI Classes from the Binary |
15 | 15 |
|
@@ -234,68 +234,197 @@ Emulated code continues at return address |
234 | 234 |
|
235 | 235 | ### Instruction Processing Loop |
236 | 236 |
|
237 | | -The core emulation loop fetches, decodes, and executes one instruction at a time: |
| 237 | +The core emulation loop processes instructions in batches of approximately 1,000 and checks |
| 238 | +control conditions after each batch: |
238 | 239 |
|
239 | 240 | ``` |
240 | 241 | Pseudocode: |
241 | 242 | ───────────────────────────────────────────────────────────────────────── |
242 | 243 |
|
243 | | -fn emulate_main_loop(ctx: &mut EmuContext) -> ScanResult { |
244 | | - let mut insn_count: u32 = 0; |
245 | | - let max_instructions: u32 = 500_000; // Hard limit |
| 244 | +emulate_main_loop(ctx): |
| 245 | + insn_count = 0 |
| 246 | + max_instructions = 5,000,000 // Default budget (configurable via DBVAR) |
| 247 | + batch_size = 1,000 |
246 | 248 |
|
247 | | - loop { |
248 | | - // Fetch instruction at current EIP |
249 | | - let eip = ctx.regs.eip; |
| 249 | + loop: |
| 250 | + // Execute a batch of instructions |
| 251 | + execute_batch(ctx, batch_size) |
| 252 | + insn_count += batch_size |
250 | 253 |
|
251 | 254 | // Check stop sentinel |
252 | | - if eip == 0xDEADBEEF { |
253 | | - break; // Normal termination |
254 | | - } |
| 255 | + if EIP == 0xDEADBEEF: |
| 256 | + break // Normal termination (return address sentinel) |
255 | 257 |
|
256 | | - // Decode instruction |
257 | | - let insn = decode_instruction(ctx.memory, eip); |
258 | | -
|
259 | | - // Check instruction limit |
260 | | - insn_count += 1; |
261 | | - if insn_count >= max_instructions { |
| 258 | + // Check instruction budget |
| 259 | + if insn_count >= max_instructions: |
262 | 260 | // "abort: execution limit met (%u instructions)" |
263 | 261 | // @ 0x109334D8 |
264 | | - break; |
265 | | - } |
266 | | -
|
267 | | - // Execute instruction |
268 | | - match insn.opcode_type { |
269 | | - DASM_OPTYPE_FPU_RM => { |
270 | | - // Route to FPU_* export function |
271 | | - // String: "DASM_OPTYPE_FPU_RM" @ 0x109815DC |
272 | | - execute_fpu_instruction(ctx, &insn); |
273 | | - } |
274 | | - _ => execute_general_instruction(ctx, &insn), |
275 | | - } |
276 | | -
|
277 | | - // Check for API trampoline hit |
278 | | - if eip >= 0x7FFE0000 && eip < 0x7FFF0000 { |
279 | | - let api_index = (eip - 0x7FFE0000) / TRAMPOLINE_STRIDE; |
280 | | - handle_api_call(ctx, api_index); |
281 | | - } |
282 | | -
|
283 | | - // Update EIP |
284 | | - ctx.regs.eip = insn.next_eip; |
285 | | - } |
286 | | -
|
287 | | - return ctx.scan_result; |
288 | | -} |
| 262 | + break |
| 263 | +
|
| 264 | + // Check for API trampoline hit (0F FF F0 opcode at current IP) |
| 265 | + if [EIP] == 0x0F 0xFF 0xF0: |
| 266 | + api_id = EAX |
| 267 | + dispatch_api_handler(ctx, api_id) |
| 268 | +
|
| 269 | + // Check for direct syscall (0F 05 = SYSCALL, 0F 34 = SYSENTER) |
| 270 | + if [EIP] == 0x0F 0x05 or [EIP] == 0x0F 0x34: |
| 271 | + dispatch_syscall(ctx, EAX) |
| 272 | +
|
| 273 | + // Self-modifying code: flush translation cache if code regions were written |
| 274 | + if code_region_written: |
| 275 | + flush_translation_cache() |
| 276 | +
|
| 277 | + // FPU instruction: route to exported FPU_* handler |
| 278 | + if opcode_type == DASM_OPTYPE_FPU_RM: |
| 279 | + // "DASM_OPTYPE_FPU_RM" @ 0x109815DC |
| 280 | + execute_fpu_instruction(ctx, insn) |
289 | 281 | ``` |
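
The batch structure in the pseudocode can be sketched in Python as below. This is an illustrative model only, not the engine's code: `run_batched`, `ToyCpu`, and the `execute_batch` callback signature are hypothetical names, and the sketch assumes a batch stops early when the instruction pointer reaches the stop sentinel.

```python
STOP_SENTINEL = 0xDEADBEEF  # return-address sentinel named in the pseudocode


def run_batched(execute_batch, get_ip, max_instructions=5_000_000, batch_size=1_000):
    """Run batches until the sentinel is reached or the budget is spent.

    execute_batch(n) is a hypothetical callback that runs up to n instructions
    and returns how many actually executed; get_ip() returns the current IP.
    """
    executed = 0
    while True:
        executed += execute_batch(batch_size)
        # Control checks happen only between batches.
        if get_ip() == STOP_SENTINEL:
            return executed, "sentinel"
        if executed >= max_instructions:
            return executed, "budget"


class ToyCpu:
    """Hypothetical stand-in CPU: jumps to the sentinel after `halt_after` steps."""

    def __init__(self, halt_after):
        self.ip = 0x1000
        self.remaining = halt_after

    def execute_batch(self, n):
        ran = 0
        while ran < n and self.ip != STOP_SENTINEL:
            ran += 1
            self.remaining -= 1
            self.ip = STOP_SENTINEL if self.remaining == 0 else self.ip + 1
        return ran
```

Because the budget is only compared between batches, a run can overshoot the configured limit by up to one batch; the sketch counts instructions actually executed rather than adding the nominal batch size.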
290 | 282 |
|
291 | 283 | ### Execution Limits |
292 | 284 |
|
293 | 285 | | Limit | Value | String/Source | |
294 | 286 | |-------|-------|---------------| |
295 | | -| Max instructions per run | 500,000 | `"abort: execution limit met (%u instructions)"` @ `0x109334D8` | |
296 | | -| Infinite loop detection | configurable | `"Infinite loop detected (more that %d instructions executed)"` @ `0x10983320` | |
| 287 | +| Max instructions per run | 5,000,000 | `"abort: execution limit met (%u instructions)"` @ `0x109334D8` | |
| 288 | +| Instruction batch size | ~1,000 | Between-batch control checks | |
| 289 | +| Fopclog max entries | 8,192 | First-opcode-byte recording cap | |
| 290 | +| Max SEH dispatches | 64 | Prevents infinite exception loops | |
| 291 | +| TLS callback budget | 50,000 per callback | Budget before main entry point | |
| 292 | +| DllMain budget | 10,000 per VDLL | Budget for VDLL initialization | |
| 293 | +| Tight loop detection | 50,000 insns without API call | Anti-analysis delay loop detection | |
| 294 | +| Consecutive error limit | 3 | Unhandled exception termination | |
| 295 | + |
| 296 | +*(from RE of mpengine.dll -- execution limit strings and emulator control flow)* |
| 297 | + |
| 298 | +--- |
| 299 | + |
| 300 | +## Exception Handling |
| 301 | + |
| 302 | +The emulator supports three exception handling mechanisms, checked in priority order: |
| 303 | + |
| 304 | +### VEH (Vectored Exception Handlers) |
| 305 | + |
| 306 | +VEH handlers registered via `AddVectoredExceptionHandler` are checked **before** the SEH chain |
| 307 | +on x86. Dispatch builds `EXCEPTION_POINTERS { ExceptionRecord*, ContextRecord* }` on the emulated |
| 308 | +stack and calls the handler. Return value `0xFFFFFFFF` (`EXCEPTION_CONTINUE_EXECUTION`) resumes |
| 309 | +execution; `0` (`EXCEPTION_CONTINUE_SEARCH`) tries the next handler. |
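
A minimal sketch of the VEH dispatch decision described above, assuming each registered handler can be modelled as a Python callable that receives the `EXCEPTION_POINTERS` address and returns a 32-bit status. The function name `dispatch_veh` is hypothetical.

```python
EXCEPTION_CONTINUE_EXECUTION = 0xFFFFFFFF  # -1 as an unsigned 32-bit value
EXCEPTION_CONTINUE_SEARCH = 0


def dispatch_veh(handlers, exception_pointers):
    """Call handlers in registration order; return True if execution should resume.

    A False result means no vectored handler claimed the exception, so the
    caller would fall through to the SEH chain.
    """
    for handler in handlers:
        status = handler(exception_pointers) & 0xFFFFFFFF
        if status == EXCEPTION_CONTINUE_EXECUTION:
            return True
        # Any other status is treated as "continue search" in this sketch.
    return False
```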
| 310 | + |
| 311 | +### SEH (x86 Structured Exception Handling) |
| 312 | + |
| 313 | +The SEH chain is walked from `TEB[0x00]` (FS:[0]), visiting at most 32 frames. For each handler: |
| 314 | +1. Builds `EXCEPTION_RECORD` (80 bytes) and `CONTEXT` (716 bytes) on the emulated stack |
| 315 | +2. Calls handler with arguments: `(ExceptionRecord*, EstablisherFrame*, ContextRecord*, DispatcherContext*)` |
| 316 | +3. Sets return address to SEH return sentinel (`0xDEADC0DE`) |
| 317 | +4. Handler return value `0` = continue execution; `1` = continue search |
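
The chain walk itself can be sketched as follows. The sketch assumes the standard x86 registration record layout (`Next` at offset 0, `Handler` at offset 4) and the usual `0xFFFFFFFF` end-of-chain marker; `walk_seh_chain` and the `read_u32` callback are hypothetical names standing in for the emulator's memory accessors.

```python
SEH_CHAIN_END = 0xFFFFFFFF  # conventional end-of-chain marker on x86
MAX_SEH_FRAMES = 32         # frame cap stated in the text


def walk_seh_chain(read_u32, teb_base, max_frames=MAX_SEH_FRAMES):
    """Return (record_address, handler_address) pairs from the SEH chain.

    read_u32(addr) is a hypothetical accessor for emulated memory.
    """
    frames = []
    record = read_u32(teb_base)  # TEB[0x00] holds the head of the chain
    while record != SEH_CHAIN_END and len(frames) < max_frames:
        next_record = read_u32(record)      # EXCEPTION_REGISTRATION_RECORD.Next
        handler = read_u32(record + 4)      # EXCEPTION_REGISTRATION_RECORD.Handler
        frames.append((record, handler))
        record = next_record
    return frames
```

The frame cap matters for hostile input: a record whose `Next` field points back at itself would otherwise loop forever.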
| 318 | + |
| 319 | +### x64 Table-Driven Exception Handling |
| 320 | + |
| 321 | +x64 uses `RUNTIME_FUNCTION` entries parsed from the PE's exception directory (data directory 3): |
| 322 | +1. Binary-searches the sorted `RUNTIME_FUNCTION` table for the faulting RIP |
| 323 | +2. Reads `UNWIND_INFO` at the entry's `UnwindInfoAddress` |
| 324 | +3. Checks for `UNW_FLAG_EHANDLER` (1) or `UNW_FLAG_UHANDLER` (2) flags |
| 325 | +4. Reads handler RVA from after the unwind codes array |
| 326 | +5. Sets up the handler call using the x64 calling convention: RCX=ExceptionRecord*, RDX=EstablisherFrame, R8=ContextRecord* |
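
Steps 1 to 4 can be sketched in Python under the documented PE layout: 12-byte `RUNTIME_FUNCTION` entries of three little-endian RVAs, and an `UNWIND_INFO` header whose first byte packs a 3-bit version and 5-bit flags. The function names are hypothetical, and chained unwind info is not handled.

```python
import bisect
import struct

UNW_FLAG_EHANDLER = 0x1
UNW_FLAG_UHANDLER = 0x2


def parse_runtime_functions(data):
    """Split the exception directory into (begin, end, unwind_info) RVA tuples."""
    return [struct.unpack_from("<III", data, off) for off in range(0, len(data) - 11, 12)]


def find_runtime_function(table, rva):
    """Binary-search the sorted table for the entry covering `rva`."""
    begins = [entry[0] for entry in table]
    i = bisect.bisect_right(begins, rva) - 1
    if i >= 0 and table[i][0] <= rva < table[i][1]:
        return table[i]
    return None


def handler_rva_from_unwind_info(unwind):
    """Return the handler RVA stored after the unwind codes, or None."""
    ver_flags, _prolog, code_count, _frame = struct.unpack_from("<BBBB", unwind, 0)
    flags = ver_flags >> 3
    if not flags & (UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER):
        return None
    padded = (code_count + 1) & ~1  # code array is padded to an even slot count
    (handler,) = struct.unpack_from("<I", unwind, 4 + padded * 2)
    return handler
```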
| 327 | + |
| 328 | +--- |
| 329 | + |
| 330 | +## TEB/PEB Environment Setup |
| 331 | + |
| 332 | +The emulator constructs a realistic Windows process environment that defeats common sandbox |
| 333 | +detection techniques used by malware. |
| 334 | + |
| 335 | +### Segment Configuration |
| 336 | + |
| 337 | +- **x86**: FS segment base → TEB at `0x00020000` |
| 338 | +- **x64**: GS segment base → TEB at `0x00020000` |
| 339 | + |
| 340 | +### Process Parameters (Fake Environment) |
| 341 | + |
| 342 | +``` |
| 343 | +Key TEB/PEB fields: |
| 344 | + FS:[0x18] / GS:[0x30] Self-pointer (TEB address) |
| 345 | + FS:[0x30] / GS:[0x60] PEB pointer |
| 346 | + PEB.BeingDebugged = 0 (anti-debug) |
| 347 | + PEB.NtGlobalFlag = 0 (anti-debug) |
| 348 | + PEB.ImageBaseAddress = loaded PE base |
| 349 | + PEB.Ldr = PEB_LDR_DATA (module list) |
| 350 | + PEB.ProcessParameters = RTL_USER_PROCESS_PARAMETERS |
| 351 | +
|
| 352 | +Process Parameters: |
| 353 | + ComputerName: HAL9TH (not "DESKTOP-...", matches mpengine default) |
| 354 | + UserName: JohnDoe (not "admin" or "malware") |
| 355 | + ImagePath: C:\Users\JohnDoe\Desktop\target.exe |
| 356 | + CurrentDir: C:\Windows\System32\ |
| 357 | + SystemRoot: C:\Windows |
| 358 | + TEMP: C:\Windows\Temp |
| 359 | +``` |
| 360 | + |
| 361 | +The PEB_LDR_DATA maintains three doubly-linked module lists (`InLoadOrderModuleList`, |
| 362 | +`InMemoryOrderModuleList`, `InInitializationOrderModuleList`) populated with the target PE |
| 363 | +and loaded VDLLs. Malware that walks these lists for DLL enumeration sees a realistic module chain. |
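
As a rough illustration of the x86 fields listed above, the sketch below fills in a TEB and PEB image using the standard 32-bit structure offsets. `PEB_BASE` is a hypothetical address (the text only gives the TEB base), and `build_x86_teb_peb` is an illustrative name.

```python
import struct

TEB_BASE = 0x00020000  # from the segment configuration above
PEB_BASE = 0x00030000  # hypothetical; the PEB address is not stated in the text


def build_x86_teb_peb(image_base, ldr_addr, params_addr):
    """Return (teb, peb) byte buffers with the key 32-bit fields populated."""
    teb = bytearray(0x1000)
    peb = bytearray(0x1000)
    struct.pack_into("<I", teb, 0x18, TEB_BASE)     # FS:[0x18] self-pointer
    struct.pack_into("<I", teb, 0x30, PEB_BASE)     # FS:[0x30] PEB pointer
    peb[0x02] = 0                                   # PEB.BeingDebugged
    struct.pack_into("<I", peb, 0x08, image_base)   # PEB.ImageBaseAddress
    struct.pack_into("<I", peb, 0x0C, ldr_addr)     # PEB.Ldr
    struct.pack_into("<I", peb, 0x10, params_addr)  # PEB.ProcessParameters
    struct.pack_into("<I", peb, 0x68, 0)            # PEB.NtGlobalFlag
    return teb, peb
```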
| 364 | + |
| 365 | +--- |
| 366 | + |
| 367 | +## Cryptographic API Emulation |
| 368 | + |
| 369 | +### CryptAPI (ADVAPI32.DLL) |
| 370 | + |
| 371 | +The emulator tracks cryptographic state (hash objects, key objects) for operations including: |
| 372 | +- `CryptAcquireContext` / `CryptReleaseContext` -- provider management |
| 373 | +- `CryptCreateHash` / `CryptHashData` / `CryptGetHashParam` -- MD5, SHA-1, SHA-256 hashing |
| 374 | +- `CryptDeriveKey` / `CryptGenKey` / `CryptImportKey` -- key management |
| 375 | +- `CryptDecrypt` / `CryptEncrypt` -- RC4 stream cipher, AES-CBC/ECB block cipher |
| 376 | +- `CryptSetKeyParam` -- IV and cipher mode configuration |
| 377 | + |
| 378 | +### BCrypt (BCRYPT.DLL) |
| 379 | + |
| 380 | +Modern CNG API support: |
| 381 | +- `BCryptOpenAlgorithmProvider` -- AES, RC4, SHA-256, etc. |
| 382 | +- `BCryptGenerateSymmetricKey` -- key import/generation |
| 383 | +- `BCryptDecrypt` / `BCryptEncrypt` -- block/stream cipher operations |
| 384 | + |
| 385 | +This enables the emulator to observe malware that decrypts its payload using Windows crypto APIs |
| 386 | +before executing it. |
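
The state tracking for hash objects can be pictured as a handle table. The sketch below is an assumption about the general shape, not the engine's implementation: `HashObjectTable` and its methods are hypothetical, the `CALG_*` values are the documented Windows algorithm identifiers, and Python's `hashlib` stands in for the emulator's own hash code.

```python
import hashlib
import itertools

CALG_MD5 = 0x8003
CALG_SHA1 = 0x8004
CALG_SHA_256 = 0x800C
_ALGORITHMS = {CALG_MD5: "md5", CALG_SHA1: "sha1", CALG_SHA_256: "sha256"}


class HashObjectTable:
    """Hypothetical handle table backing CryptCreateHash-style calls."""

    def __init__(self):
        self._next_handle = itertools.count(0x1000)
        self._objects = {}

    def create_hash(self, alg_id):
        """Model CryptCreateHash: return a new handle, or 0 if unsupported."""
        name = _ALGORITHMS.get(alg_id)
        if name is None:
            return 0
        handle = next(self._next_handle)
        self._objects[handle] = hashlib.new(name)
        return handle

    def hash_data(self, handle, data):
        """Model CryptHashData: feed bytes into the tracked hash object."""
        obj = self._objects.get(handle)
        if obj is None:
            return False
        obj.update(data)
        return True

    def get_hash_value(self, handle):
        """Model CryptGetHashParam(HP_HASHVAL): return the digest so far."""
        obj = self._objects.get(handle)
        return obj.digest() if obj is not None else None
```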
| 387 | + |
| 388 | +--- |
| 389 | + |
| 390 | +## Memory Tracking and Content Extraction |
| 391 | + |
| 392 | +### Dirty Page Tracking |
| 393 | + |
| 394 | +A memory write hook records every written page address (page-aligned) during emulation. This |
| 395 | +identifies which memory regions were modified by the emulated code. |
| 396 | + |
| 397 | +### Self-Modifying Code Detection |
| 398 | + |
| 399 | +PE section address ranges are registered as "code regions." When a write targets any of these |
| 400 | +ranges, the translation block cache is invalidated at the next batch boundary, ensuring |
| 401 | +self-modified code executes correctly. |
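
Dirty page tracking and the code-region check can share one write hook. A minimal sketch, assuming 4 KiB pages, half-open `(start, end)` code regions, and writes of at least one byte; `WriteTracker` and its method names are hypothetical.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


class WriteTracker:
    """Hypothetical memory-write hook: dirty pages plus a code-write flag."""

    def __init__(self, code_regions):
        self.code_regions = list(code_regions)  # half-open (start, end) ranges
        self.dirty_pages = set()
        self.code_region_written = False

    def on_write(self, address, size):
        """Record every page touched by a write of `size` >= 1 bytes."""
        first = address & ~(PAGE_SIZE - 1)
        last = (address + size - 1) & ~(PAGE_SIZE - 1)
        for page in range(first, last + PAGE_SIZE, PAGE_SIZE):
            self.dirty_pages.add(page)
        if any(address < end and address + size > start
               for start, end in self.code_regions):
            self.code_region_written = True

    def at_batch_boundary(self, flush_translation_cache):
        """Invalidate cached translations once per batch if code was written."""
        if self.code_region_written:
            flush_translation_cache()
            self.code_region_written = False
```

Deferring the flush to the batch boundary keeps the write hook cheap, at the cost of stale translations surviving until the current batch ends.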
| 402 | + |
| 403 | +### Unpacked Content Extraction |
| 404 | + |
| 405 | +After emulation completes, modified memory is collected: |
| 406 | +1. **PE sections**: All sections are read back; sections with >16 non-zero bytes are included |
| 407 | +2. **Dirty pages outside PE**: Pages not in PE sections, stack, TEB, or trampoline regions |
| 408 | + are coalesced into contiguous regions (capped at 1MB per region) |
| 409 | +3. **Embedded PE scan**: Extracted data is scanned for `MZ` + `PE\0\0` signatures to find |
| 410 | + unpacked PE payloads |
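
Steps 2 and 3 can be sketched as two small helpers. This is an illustrative model: the function names are hypothetical, pages are assumed to be 4 KiB, and the embedded PE check follows the standard header layout (`e_lfanew` at offset `0x3C` of the DOS header pointing at the `PE\0\0` signature).

```python
import struct

PAGE_SIZE = 0x1000      # assumed 4 KiB pages
MAX_REGION = 0x100000   # 1 MB cap per region, as stated above


def coalesce_pages(pages, max_region=MAX_REGION):
    """Merge adjacent dirty pages into (start, length) regions under the cap."""
    regions = []
    for page in sorted(pages):
        if (regions and regions[-1][0] + regions[-1][1] == page
                and regions[-1][1] + PAGE_SIZE <= max_region):
            regions[-1][1] += PAGE_SIZE
        else:
            regions.append([page, PAGE_SIZE])
    return [(start, length) for start, length in regions]


def find_embedded_pe(data):
    """Return offsets where an MZ header points at a valid PE signature."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            if data[pos + e_lfanew:pos + e_lfanew + 4] == b"PE\0\0":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits
```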
| 411 | + |
| 412 | +### Dropped File Collection |
| 413 | + |
| 414 | +Files created during emulation are collected from two sources: |
| 415 | +1. **VFS write tracking**: Files added via `CreateFileW` / `WriteFile` during emulation |
| 416 | +2. **Object manager**: Writable file handles with non-empty data |
| 417 | + |
| 418 | +All extracted content is fed back through the full scan pipeline at Stage 6 (Unpacked Content). |
| 419 | + |
| 420 | +--- |
| 421 | + |
| 422 | +## APC Draining |
297 | 423 |
|
298 | | -*(from RE of mpengine.dll -- execution limit strings)* |
| 424 | +When `NtQueueApcThread` is called during emulation, APC routines are queued. When the main |
| 425 | +emulation loop reaches the stop sentinel or instruction budget, any pending APCs are drained |
| 426 | +(each queued routine is called with its arguments) before termination. This handles malware |
| 427 | +that uses APC injection to execute unpacking code. |
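
The queue-then-drain behaviour can be modelled with a simple FIFO. In this sketch `ApcQueue` is a hypothetical name, and `call` stands in for whatever mechanism the emulator uses to run a routine at a given address with its arguments.

```python
from collections import deque


class ApcQueue:
    """Hypothetical pending-APC list filled by NtQueueApcThread calls."""

    def __init__(self):
        self._pending = deque()

    def queue(self, routine, *args):
        """Record an APC routine address and its arguments."""
        self._pending.append((routine, args))

    def drain(self, call):
        """Run each pending APC in FIFO order; return how many were run.

        APCs queued by a running APC are also drained, so the loop checks
        the queue on every iteration instead of snapshotting it.
        """
        drained = 0
        while self._pending:
            routine, args = self._pending.popleft()
            call(routine, args)
            drained += 1
        return drained
```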
299 | 428 |
|
300 | 429 | --- |
301 | 430 |
|
@@ -461,13 +590,19 @@ VFS-dropped files are extracted after emulation and fed back through the scan pi |
461 | 590 | | FPU export functions | 67 | |
462 | 591 | | SSE export functions | 1 (SSE_convert) | |
463 | 592 | | Emulated WinAPI handlers | 198 | |
464 | | -| Virtual DLLs (VDLLs) | 973 | |
465 | | -| Max instructions per run | 500,000 | |
| 593 | +| Virtual DLLs (VDLLs) | 973 (750 x86 + 195 x64 + 18 ARM + 10 MSIL) | |
| 594 | +| Max instructions per run | 5,000,000 (configurable via DBVAR) | |
| 595 | +| Instruction batch size | ~1,000 | |
| 596 | +| Fopclog max entries | 8,192 | |
| 597 | +| Max SEH dispatches | 64 | |
| 598 | +| TLS callback budget | 50,000 per callback | |
| 599 | +| DllMain budget | 10,000 per VDLL | |
466 | 600 | | FOP behavioral rules | 4,601 | |
467 | 601 | | TUNNEL signature variants | 4 (x86, x64, ARM, ARM64) | |
468 | 602 | | THREAD signature variants | 4 (x86, x64, ARM, ARM64) | |
469 | 603 | | PE analysis attributes | 302 (`pea_*`) | |
470 | 604 | | Emulator RTTI classes | 3 (x86, base, ARM) | |
| 605 | +| Crypto support | CryptAPI (MD5/SHA/AES/RC4) + BCrypt | |
471 | 606 |
|
472 | 607 | --- |
473 | 608 |
|
|