All notable changes to the Sub-Microsecond Execution Engine will be documented in this file.
- Jitter Profiler & Stall Detector: New `JitterProfiler` class to detect micro-stalls in the busy-wait loop caused by system interruptions (SMIs, context switches).
  - Why it helps: Mean-latency metrics hide tail events, and HFT requires low, consistent jitter (latency variance). This tool flags whether the isolated core is truly isolated or is still being interrupted.
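A minimal sketch of the stall-detection idea, not the project's `JitterProfiler` implementation: read a monotonic clock back to back and record every gap above a threshold. The names `StallEvent` and `detect_stalls` are hypothetical, and `std::chrono::steady_clock` stands in for whatever timestamp source (for example the TSC) the engine actually uses.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical record of one detected stall.
struct StallEvent {
    std::int64_t gap_ns;      // gap between two consecutive clock reads
    std::uint64_t iteration;  // loop iteration at which the gap was observed
};

// Spins for `iterations` back-to-back clock reads and records every gap
// larger than `threshold_ns`. On an undisturbed core consecutive reads are
// typically well under a microsecond apart, so a much larger gap means the
// loop was not running for that interval.
inline std::vector<StallEvent> detect_stalls(std::uint64_t iterations,
                                             std::int64_t threshold_ns) {
    using Clock = std::chrono::steady_clock;
    std::vector<StallEvent> stalls;
    stalls.reserve(1024);  // avoids reallocation for the first 1024 events
    auto prev = Clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        const auto now = Clock::now();
        const std::int64_t gap =
            std::chrono::duration_cast<std::chrono::nanoseconds>(now - prev).count();
        if (gap > threshold_ns) {
            stalls.push_back(StallEvent{gap, i});
        }
        prev = now;
    }
    return stalls;
}
```

One way to use this would be to run it pinned to the isolated core with a threshold of a few microseconds. An empty result suggests the core was not interrupted during the run.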
- Explicit L1 Cache Prefetching: Added `prefetch_L1` and `prefetch_next_line` utilities.
  - Why it helps: Allows hot-path loops to pull the next 64-byte cache line into L1 cache before it is needed, hiding the ~80ns DRAM latency.
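A sketch of what such prefetch helpers could look like, assuming GCC or Clang. The names `prefetch_L1` and `prefetch_next_line` come from the entry above, but the signatures, the `kCacheLine` constant, and the `sum_with_prefetch` loop are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes

// Hints the CPU to load the line containing `p` into L1
// (read access, highest temporal locality).
inline void prefetch_L1(const void* p) noexcept {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, /*rw=*/0, /*locality=*/3);
#else
    (void)p;  // no-op on compilers without the builtin
#endif
}

// Prefetches the cache line that follows the one containing `p`.
// The caller must ensure p + kCacheLine is still inside the same object.
inline void prefetch_next_line(const void* p) noexcept {
    prefetch_L1(static_cast<const char*>(p) + kCacheLine);
}

// Example hot loop: sums values while requesting the next line
// once per cache line of input.
inline std::int64_t sum_with_prefetch(const std::int64_t* data, std::size_t n) {
    constexpr std::size_t kPerLine = kCacheLine / sizeof(std::int64_t);  // 8 values
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Only prefetch when the next line still lies within the array.
        if (i % kPerLine == 0 && i + kPerLine < n) {
            prefetch_next_line(data + i);
        }
        total += data[i];
    }
    return total;
}
```

Hardware prefetchers usually handle a linear scan like this one on their own. Explicit prefetching tends to pay off when the next address is known to the code but not predictable from the access pattern, such as a pointer chase or an index lookup.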
- Cache Warming & Hot-Path Optimization: Implemented a 50,000-iteration "Warm-Up Phase" before market data ingestion starts.
  - Why it helps: Pre-faults code pages, warms the instruction cache, populates the TLB, and trains the CPU branch predictors. This ensures the very first market tick is processed with a warm CPU state, eliminating "cold start" latency spikes.
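A minimal sketch of a warm-up phase, assuming the hot path can be driven with synthetic input. The `Tick` type, `on_tick` handler, and `warm_up` function are hypothetical stand-ins rather than the engine's own types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tick type with top-of-book quantities.
struct Tick {
    std::int64_t bid_qty;
    std::int64_t ask_qty;
};

// Hypothetical hot-path handler: returns a signed quantity imbalance.
inline std::int64_t on_tick(const Tick& t) noexcept {
    return t.bid_qty - t.ask_qty;
}

// Runs the real handler on synthetic ticks so that its code pages, TLB
// entries, and branch history are populated before live data arrives.
inline std::int64_t warm_up(std::size_t iterations = 50000) {
    // The volatile accumulator keeps the compiler from removing the loop.
    volatile std::int64_t sink = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const Tick synthetic{static_cast<std::int64_t>(i % 7),
                             static_cast<std::int64_t>(i % 5)};
        sink = sink + on_tick(synthetic);
    }
    return sink;
}
```

The key design point is that the warm-up calls the same function the live path will call. Warming a copy or a simplified variant would leave the real code cold.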
- SIMD-Accelerated Alpha Extraction: Replaced scalar feature extraction loops with hand-optimized AVX2 (x86) and NEON (ARM) intrinsics within `simd_features.hpp`.
  - Why it helps: Shifts the "Order Flow Imbalance" (OFI) and volume imbalance calculations from sequential $O(Depth)$ to effectively constant hardware-parallel time. This mimics FPGA-like pipelining in software, directly reducing the critical path latency for signal generation.
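A sketch of how a volume imbalance over book depth could be vectorized, not the contents of `simd_features.hpp`. It assumes the common definition (bid volume minus ask volume, divided by their sum) and float quantities. The functions `sum_levels` and `volume_imbalance` are hypothetical. The 8-wide path is compiled only when AVX2 is enabled and falls back to a scalar loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Sums n floats. With AVX2 enabled, eight levels are added per instruction;
// any remaining levels (or all of them without AVX2) go through the scalar tail.
inline float sum_levels(const float* v, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__AVX2__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(v + i));  // unaligned load
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);  // spill the eight partial sums
    for (float x : lanes) total += x;
#endif
    for (; i < n; ++i) total += v[i];
    return total;
}

// Volume imbalance in [-1, 1]; returns 0 when both sides are empty.
inline float volume_imbalance(const float* bid_qty, const float* ask_qty,
                              std::size_t depth) {
    const float b = sum_levels(bid_qty, depth);
    const float a = sum_levels(ask_qty, depth);
    const float denom = b + a;
    return denom > 0.0f ? (b - a) / denom : 0.0f;
}
```

For a fixed, small depth (say 8 or 16 levels) the vector loop runs one or two iterations, which is the sense in which the cost becomes effectively constant.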
- Vectorized Multi-Kernel Hawkes Engine: Implemented a SIMD-accelerated engine supporting 4 simultaneous kernels.
  - Why it helps: Markets react at different speeds; this captures both microsecond liquidity bursts and millisecond price trends simultaneously, significantly improving signal accuracy without increasing latency.
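A scalar sketch of a four-kernel Hawkes intensity, assuming exponential kernels so that each kernel's excitation can be updated recursively. The struct `MultiKernelHawkes` and its fields are hypothetical, and the fixed-width arrays only suggest a SIMD-friendly layout; the engine's actual kernels and vectorization are not shown in this changelog.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Intensity model: lambda(t) = mu + sum_k state_k(t), where each state_k
// jumps by alpha_k on an event and decays at rate beta_k in between.
struct MultiKernelHawkes {
    static constexpr int K = 4;
    double mu;                      // baseline intensity
    std::array<double, K> alpha;    // excitation added per event, per kernel
    std::array<double, K> beta;     // decay rate per kernel (1 / time unit)
    std::array<double, K> state{};  // current decayed excitation per kernel
    double last_t = 0.0;            // time of the last state update

    // Registers an event at time t (times must be non-decreasing).
    void on_event(double t) {
        const double dt = t - last_t;
        for (int k = 0; k < K; ++k) {
            state[k] = state[k] * std::exp(-beta[k] * dt) + alpha[k];
        }
        last_t = t;
    }

    // Evaluates the intensity at time t >= last_t without mutating state.
    double intensity(double t) const {
        const double dt = t - last_t;
        double lambda = mu;
        for (int k = 0; k < K; ++k) {
            lambda += state[k] * std::exp(-beta[k] * dt);
        }
        return lambda;
    }
};
```

Giving each kernel its own `beta` is what lets fast-decaying kernels track short bursts while slow-decaying ones track longer trends. The four independent lanes map naturally onto one 256-bit register of doubles.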
- Engagement Framework: Added `docs/ENGAGEMENTS.md` and GitHub issue templates.
  - Why it helps: Provides a structured pathway for institutional partners and research labs to request custom hardware (FPGA/NIC) integrations or collaborative research without public disclosure.
- Build System: Updated `scripts/build_all.sh` with automatic macOS Homebrew path detection.
  - Why it helps: Streamlines developer onboarding and ensures seamless local testing on modern Apple Silicon (ARM64) research environments.
- Documentation: Simplified README and organized commercial support sections.
- 890ns End-to-End Latency: Optimized decision pipeline down to 890ns median.
- Institutional Logging: Implemented 7-layer professional logging (NIC, TSC, PTP, etc.).
- Deterministic Replay: Bit-identical verification with SHA-256 manifests.
- Custom NIC Drivers: Support for Intel X710 and Mellanox ConnectX hardware mapping.
- Solarflare ef_vi: Integration for X2522/X2542 series.
- Kernel Bypass: Lock-free DPDK-style interface.
- AVX-512 SIMD: Vectorized OFI computation (40ns).
- Vectorized ML: FPGA-simulated inference pipeline (400ns fixed latency).
- Initial High-Frequency Trading skeleton.
- Hawkes Process Engine (Single-Kernel).
- Avellaneda-Stoikov Market Making strategy.
- Lock-Free SPSC Queues and Atomic Risk Controls.