| 1 | +# **IEX Pipeline: From Exchange to Analysis** |
| 2 | + |
| 3 | +**Project Goal** |
| 4 | + |
| 5 | +The **IEX Pipeline** is a two-stage, high-performance data pipeline that bridges the gap between free exchange data and quantitative research. It combines a **Rust downloader** that fetches terabytes of market data with a **C++23 converter** that transforms raw PCAP packets into compressed, query-efficient HDF5 arrays — ready for Julia, Python, MATLAB, or C++ analysis. |
| 6 | + |
| 7 | +Together, these tools demonstrate what VargaLabs engineering looks like in practice: **fast, reliable, and built for researchers who need raw, ground-truth data without vendor lock-in.** |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## **The Pipeline** |
| 12 | + |
| 13 | +``` |
| 14 | +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ |
| 15 | +│ iex-download │────▶│ iex2h5 │────▶│ Researcher │ |
| 16 | +│ (Rust fetcher) │ │ (C++ converter) │ │ (Julia/Python) │ |
| 17 | +│ ~340 LOC │ │ ~4,500 LOC │ │ │ |
| 18 | +└─────────────────┘ └─────────────────┘ └─────────────────┘ |
| 19 | + │ │ |
| 20 | + ▼ ▼ |
| 21 | + IEX HTTPS API HDF5 / CSV / JSON |
| 22 | + .pcap.gz files / REDIS output |
| 23 | +``` |
| 24 | + |
| 25 | +**Stage 1 — Download:** `iex-download` fetches IEX historical datasets (TOPS, DEEP, DEEP+) using a PEG-based date parser and resilient HTTP transfers. |
| 26 | + |
| 27 | +**Stage 2 — Convert:** `iex2h5` parses raw IEX-TP packet captures at wire speed and writes structured, compressed HDF5 datasets with nanosecond precision. |
| 28 | + |
| 29 | +**Stage 3 — Analyze:** Open the resulting `.h5` file in Julia, Python, MATLAB, or R and query billions of ticks with sub-second latency. |
| 30 | + |
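The date-range argument passed to `iex-download` (written as `2025-01-01..2025-01-31` in the usage examples on this page) can be illustrated with a short Python sketch. This is not the tool's PEG parser, which is written in Rust; `expand_date_range` is a hypothetical helper, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date, timedelta


def expand_date_range(spec: str) -> list[date]:
    """Expand 'YYYY-MM-DD..YYYY-MM-DD' into a list of calendar dates.

    Hypothetical helper for illustration only; endpoints are assumed
    inclusive, and a bare 'YYYY-MM-DD' is treated as a one-day range.
    """
    start_text, sep, end_text = spec.partition("..")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text) if sep else start
    if end < start:
        raise ValueError(f"range end {end} precedes start {start}")
    # One entry per calendar day, both endpoints included.
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


days = expand_date_range("2025-01-01..2025-01-31")
print(len(days), days[0], days[-1])  # 31 2025-01-01 2025-01-31
```

Not every calendar date in such a range will have a capture file, since weekends and market holidays have no trading session.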
| 31 | +--- |
| 32 | + |
| 33 | +## **Why This Matters** |
| 34 | + |
| 35 | +IEX provides **17.5 TB of free historical tick data** spanning 2016–2025. For quantitative researchers, this is a goldmine — but only if you can actually get it into a usable format. |
| 36 | + |
| 37 | +Traditional approaches are fragile: |
| 38 | + |
| 39 | +* :material-cloud-download:{.icon} **Manual downloads** force you to click through thousands of files |
| 40 | +* :material-file-question:{.icon} **Raw PCAPs** are binary, uncompressed, and hard to query |
| 41 | +* :material-puzzle:{.icon} **Glue scripts** break when the feed format changes |
| 42 | + |
| 43 | +The IEX Pipeline solves this end-to-end: **one command to fetch, one command to convert, one file to analyze.** |
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +## **Performance** |
| 48 | + |
| 49 | +| Stage | Metric | Value | |
| 50 | +|-------|--------|-------| |
| 51 | +| **Download** | Total dataset | **17.5 TB** across 4,984 files | |
| 52 | +| **Download** | Binary size | **3.5 MB** single ELF (Rust, zero runtime deps) | |
| 53 | +| **Convert** | Ingest speed | **65M events/sec** (HDF5 backend) | |
| 54 | +| **Convert** | Compression | 40 GiB raw PCAP → **<600 MiB** HDF5 | |
| 55 | +| **Convert** | Latency | **0.017 µs/tick** (HDF5 → HDF5) | |
| 56 | + |
| 57 | +> **Platform:** Linux Mint 22.1, g++ 14.2.0, ThinkPad X1 Carbon Gen 12 (Intel Core Ultra 5 125U) |
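
The figures in the table can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; reading TB as 10^12 bytes is an assumption, and the variable names are illustrative.

```python
GIB = 1024**3
MIB = 1024**2
TB = 10**12  # assumption: decimal terabytes

# Compression row: 40 GiB of raw PCAP versus the "<600 MiB" upper bound.
compression_ratio = (40 * GIB) / (600 * MIB)

# Ingest row: 65M events/sec expressed as time per event.
ns_per_event = 1e9 / 65e6

# Download rows: average size of one of the 4,984 files.
avg_file_gb = 17.5 * TB / 4984 / 1e9

print(f"compression: at least {compression_ratio:.1f}x")  # 68.3x
print(f"time per event: {ns_per_event:.1f} ns")           # 15.4 ns
print(f"average file: {avg_file_gb:.2f} GB")              # 3.51 GB
```

The roughly 15 ns per event implied by the ingest rate is in the same range as the 0.017 µs/tick latency row; the two rows may have been measured under different conditions.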
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## **Tech Stack** |
| 62 | + |
| 63 | +| Component | Technology | Purpose | |
| 64 | +|-----------|-----------|---------| |
| 65 | +| **Downloader** | Rust (edition 2021) | Fast, safe, portable HTTP fetching | |
| 66 | +| **Converter** | C++23 | Zero-copy parsing, SIMD-friendly | |
| 67 | +| **Storage** | HDF5 + h5cpp | Compressed, hierarchical, portable arrays | |
| 68 | +| **Protocol** | IEX-TP | Native packet-level parsing (no libpcap dependency) | |
| 69 | +| **Build** | CMake + Ninja | Cross-platform, reproducible builds | |
| 70 | +| **CI** | GitHub Actions | Matrix testing across GCC/Clang × Ubuntu | |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +## **In Action** |
| 75 | + |
| 76 | +<div id="asciicast-iex-download-demo" class="hidden md:block asciicast-player border border-solid border-gray-3 rounded-lg shadow-lg w-1/2 overflow-hidden mb-2 ml-4 float-right text-[9px]"></div> |
| 77 | +--8<-- "iex-download-demo.script" |
| 78 | + |
| 79 | +Watch the downloader in action — fetching terabytes with a progress bar that actually means something. |
| 80 | + |
| 81 | +??? example ":fontawesome-solid-terminal:{.example} Step 1 — Download TOPS for January 2025" |
| 82 | + ```bash |
| 83 | + iex-download --tops --directory ./data 2025-01-01..2025-01-31 |
| 84 | + ``` |
| 85 | + |
| 86 | +??? example ":fontawesome-solid-terminal:{.example} Step 2 — Convert to HDF5" |
| 87 | + ```bash |
| 88 | + iex2h5 --convert all --output market-2025.h5 ./data/*.pcap.gz |
| 89 | + ``` |
| 90 | + |
| 91 | +??? example ":fontawesome-solid-terminal:{.example} Step 3 — Analyze in Python" |
| 92 | + ```python |
| 93 | + import h5py |
| 94 | + with h5py.File('market-2025.h5', 'r') as f: |
| 95 | + print(f['/time'][:10]) # first 10 timestamps |
| 96 | + ``` |
| 97 | + |
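Once timestamps are loaded, they usually need to be turned into wall-clock times. The following is a minimal sketch, assuming the `/time` dataset holds integer nanoseconds since the Unix epoch in UTC. That is the convention of the IEX-TP wire format, but whether `iex2h5` stores the values unchanged is an assumption. `split_ns` is a hypothetical helper; it returns the sub-microsecond remainder separately because Python's `datetime` only resolves microseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_ns(ns_since_epoch: int) -> tuple[datetime, int]:
    """Return (UTC datetime at microsecond resolution, leftover nanoseconds)."""
    # int() also accepts NumPy integer scalars such as those h5py returns.
    micros, leftover_ns = divmod(int(ns_since_epoch), 1_000)
    return EPOCH + timedelta(microseconds=micros), leftover_ns


stamp, extra_ns = split_ns(1_735_828_200_000_000_123)
print(stamp.isoformat(), extra_ns)  # 2025-01-02T14:30:00+00:00 123
```

Integer arithmetic is used throughout so that no precision is lost, which a conversion through floating-point seconds would not guarantee at nanosecond scale.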
| 98 | +--- |
| 99 | + |
| 100 | +## **From the Blog** |
| 101 | + |
| 102 | +* :material-newspaper:{.icon} [**I Analyzed 6TB of Raw Stock Market Data**](/blog/longest-active-stocks-from-iex-pcap/) — uncovering the 30 most consistently traded stocks on IEX (2025-09-05) |
| 103 | +* :material-newspaper:{.icon} [**A Week of Market History Vanished**](/blog/iex-timestamp-glitch-wireshark/) — debugging an undocumented IEX timestamp sentinel (2025-09-12) |
| 104 | +* :material-newspaper:{.icon} [**IEX-DOWNLOAD: Rust, Tick Data, and 13TB of Fun**](/blog/iex-download-rust/) — the story behind the Rust rewrite (2025-09-25) |
| 105 | + |
| 106 | +--- |
| 107 | + |
| 108 | +## **Links** |
| 109 | + |
| 110 | +> :fontawesome-brands-github:{.icon} **iex-download** — GitHub: [vargalabs/iex-download](https://github.com/vargalabs/iex-download) | Docs: [vargalabs.github.io/iex-download](https://vargalabs.github.io/iex-download) | DOI: [10.5281/zenodo.17188420](https://doi.org/10.5281/zenodo.17188420) |
| 111 | + |
| 112 | +> :fontawesome-brands-github:{.icon} **iex2h5** — GitHub: [vargalabs/iex2h5](https://github.com/vargalabs/iex2h5) | Docs: [vargalabs.github.io/iex2h5](https://vargalabs.github.io/iex2h5) | DOI: [10.5281/zenodo.15677290](https://doi.org/10.5281/zenodo.15677290) |
| 113 | + |
| 114 | +> :material-cube-outline:{.icon} **Powered by** [h5cpp](/site/h5cpp/) — the C++17 header-only HDF5 library that makes this possible. |