Commit caa9c4d

[#40]:paige:site, add unified IEX Pipeline showcase page (#41)
1 parent d948a95 commit caa9c4d

2 files changed

Lines changed: 115 additions & 0 deletions

docs/site/iex-pipeline.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# **IEX Pipeline: From Exchange to Analysis**

**Project Goal**

The **IEX Pipeline** is a high-performance, two-tool data pipeline that bridges the gap between free exchange data and quantitative research. It combines a **Rust downloader** that fetches terabytes of market data with a **C++23 converter** that transforms raw PCAP packets into compressed, query-efficient HDF5 arrays, ready for analysis in Julia, Python, MATLAB, or C++.

Together, these tools demonstrate what VargaLabs engineering looks like in practice: **fast, reliable, and built for researchers who need raw ground-truth data without vendor lock-in.**

---

## **The Pipeline**

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  iex-download   │────▶│     iex2h5      │────▶│   Researcher    │
│ (Rust fetcher)  │     │ (C++ converter) │     │ (Julia/Python)  │
│    ~340 LOC     │     │   ~4,500 LOC    │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │
         ▼                       ▼
   IEX HTTPS API         HDF5 / CSV / JSON
  .pcap.gz files          / REDIS output
```

**Stage 1: Download.** `iex-download` fetches IEX historical datasets (TOPS, DEEP, DEEP+) using a PEG-based date parser and resilient HTTP transfers.

**Stage 2: Convert.** `iex2h5` parses raw IEX-TP packet captures at wire speed and writes structured, compressed HDF5 datasets with nanosecond precision.

**Stage 3: Analyze.** Open the resulting `.h5` file in Julia, Python, MATLAB, or R and query billions of ticks with sub-second latency.

---

## **Why This Matters**

IEX provides **17.5 TB of free historical tick data** spanning 2016–2025. For quantitative researchers, this is a goldmine, but only if you can actually get it into a usable format.

Traditional approaches are fragile:

* :material-cloud-download:{.icon} **Manual downloads** force you to click through thousands of files
* :material-file-question:{.icon} **Raw PCAPs** are binary, uncompressed, and hard to query
* :material-puzzle:{.icon} **Glue scripts** break when the feed format changes

The IEX Pipeline solves this end-to-end: **one command to fetch, one command to convert, one file to analyze.**

---

## **Performance**

| Stage | Metric | Value |
|-------|--------|-------|
| **Download** | Total dataset | **17.5 TB** across 4,984 files |
| **Download** | Binary size | **3.5 MB** single ELF (Rust, zero runtime deps) |
| **Convert** | Ingest speed | **65M events/sec** (HDF5 backend) |
| **Convert** | Compression | 40 GiB raw PCAP → **<600 MiB** HDF5 |
| **Convert** | Latency | **0.017 µs/tick** (HDF5 → HDF5) |

> **Platform:** Linux Mint 22.1, g++ 14.2.0, ThinkPad X1 Carbon Gen 12 (Intel Core Ultra 5 125U)
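
The figures in the table can be related to each other with a little arithmetic. The sketch below is only a back-of-the-envelope check on the published numbers, not a benchmark. The helper names are hypothetical, and it assumes "TB" is decimal (10^12 bytes) while GiB and MiB are binary.

```python
GIB = 1024**3
MIB = 1024**2
TB = 10**12  # ASSUMPTION: the table's "TB" is decimal


def compression_ratio(raw_bytes: float, packed_bytes: float) -> float:
    """How many times smaller the packed output is than the raw input."""
    return raw_bytes / packed_bytes


def microseconds_per_event(events_per_second: float) -> float:
    """Convert a throughput figure into an average per-event cost."""
    return 1e6 / events_per_second


def events_per_second(microseconds_each: float) -> float:
    """Convert a per-event cost back into a throughput figure."""
    return 1e6 / microseconds_each


# 40 GiB of PCAP shrinking to under 600 MiB is at least this ratio
min_ratio = compression_ratio(40 * GIB, 600 * MIB)

# 65M events/sec expressed as an average cost per event
ingest_cost_us = microseconds_per_event(65e6)

# 0.017 us/tick expressed as ticks per second
tick_rate = events_per_second(0.017)

# Average archive size across the 4,984 files
avg_file_gb = 17.5 * TB / 4984 / 1e9

print(f"compression >= {min_ratio:.1f}:1")
print(f"ingest ~ {ingest_cost_us:.4f} us/event")
print(f"latency row ~ {tick_rate / 1e6:.1f}M ticks/sec")
print(f"average file ~ {avg_file_gb:.2f} GB")
```

The compression row works out to better than 68:1. The ingest and latency rows describe different measurements (the latency row is labeled HDF5 → HDF5), so they are not expected to be exact reciprocals of each other.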

---

## **Tech Stack**

| Component | Technology | Purpose |
|-----------|-----------|---------|
| **Downloader** | Rust (edition 2021) | Fast, safe, portable HTTP fetching |
| **Converter** | C++23 | Zero-copy parsing, SIMD-friendly |
| **Storage** | HDF5 + h5cpp | Compressed, hierarchical, portable arrays |
| **Protocol** | IEX-TP | Native packet-level parsing (no libpcap dependency) |
| **Build** | CMake + Ninja | Cross-platform, reproducible builds |
| **CI** | GitHub Actions | Matrix testing across GCC/Clang × Ubuntu |

---

## **In Action**

<div id="asciicast-iex-download-demo" class="hidden md:block asciicast-player border border-solid border-gray-3 rounded-lg shadow-lg w-1/2 overflow-hidden mb-2 ml-4 float-right text-[9px]"></div>
--8<-- "iex-download-demo.script"

Watch the downloader in action, fetching terabytes with a progress bar that actually means something.

??? example ":fontawesome-solid-terminal:{.example} Step 1: Download TOPS for January 2025"
    ```bash
    iex-download --tops --directory ./data 2025-01-01..2025-01-31
    ```
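
The `START..END` argument in the command above names a span of calendar dates. As an illustration only, the sketch below expands such a span in Python. It is not the PEG-based parser that `iex-download` uses, and treating both endpoints as inclusive is an assumption. The helper names `expand_date_range` and `weekdays_only` are hypothetical.

```python
from datetime import date, timedelta


def expand_date_range(spec: str) -> list[date]:
    """Expand 'YYYY-MM-DD..YYYY-MM-DD' or a single 'YYYY-MM-DD' into dates.

    ASSUMPTION: both endpoints are inclusive.
    """
    start_text, separator, end_text = spec.partition("..")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text) if separator else start
    if end < start:
        raise ValueError(f"range ends before it starts: {spec!r}")
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def weekdays_only(days: list[date]) -> list[date]:
    """Drop Saturdays and Sundays; market holidays are not handled here."""
    return [d for d in days if d.weekday() < 5]


january = expand_date_range("2025-01-01..2025-01-31")
candidates = weekdays_only(january)
print(len(january), "calendar days,", len(candidates), "weekdays")
```

Filtering to weekdays gives an upper bound on the number of trading days to expect in the span, since exchange holidays still fall on some of those dates.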

??? example ":fontawesome-solid-terminal:{.example} Step 2: Convert to HDF5"
    ```bash
    iex2h5 --convert all --output market-2025.h5 ./data/*.pcap.gz
    ```

??? example ":fontawesome-solid-terminal:{.example} Step 3: Analyze in Python"
    ```python
    import h5py

    # Open the converted file read-only and peek at the timestamp dataset
    with h5py.File('market-2025.h5', 'r') as f:
        print(f['/time'][:10])  # first 10 timestamps
    ```
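
Python's `datetime` only resolves microseconds, so printing nanosecond timestamps needs a little care. The sketch below assumes the values read from `/time` are integer nanoseconds since the POSIX epoch in UTC. That matches the nanosecond precision described earlier but is not confirmed by this page, and `format_ns_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def format_ns_timestamp(ns_since_epoch: int) -> str:
    """Render integer nanoseconds since the epoch as an ISO-8601 UTC string.

    The whole seconds go through datetime; the nanosecond remainder is
    appended by hand so no precision is lost.
    """
    seconds, nanos = divmod(int(ns_since_epoch), NS_PER_SECOND)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{whole:%Y-%m-%dT%H:%M:%S}.{nanos:09d}Z"


print(format_ns_timestamp(1_700_000_000_123_456_789))
```

Each value from the slice in Step 3 could be passed through this function. The `int(...)` call lets it accept NumPy integer scalars as well as plain Python integers.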

---

## **From the Blog**

* :material-newspaper:{.icon} [**I Analyzed 6TB of Raw Stock Market Data**](/blog/longest-active-stocks-from-iex-pcap/): uncovering the 30 most consistently traded stocks on IEX (2025-09-05)
* :material-newspaper:{.icon} [**A Week of Market History Vanished**](/blog/iex-timestamp-glitch-wireshark/): debugging an undocumented IEX timestamp sentinel (2025-09-12)
* :material-newspaper:{.icon} [**IEX-DOWNLOAD: Rust, Tick Data, and 13TB of Fun**](/blog/iex-download-rust/): the story behind the Rust rewrite (2025-09-25)

---

## **Links**

> :fontawesome-brands-github:{.icon} **iex-download** | GitHub: [vargalabs/iex-download](https://github.com/vargalabs/iex-download) | Docs: [vargalabs.github.io/iex-download](https://vargalabs.github.io/iex-download) | DOI: [10.5281/zenodo.17188420](https://doi.org/10.5281/zenodo.17188420)

> :fontawesome-brands-github:{.icon} **iex2h5** | GitHub: [vargalabs/iex2h5](https://github.com/vargalabs/iex2h5) | Docs: [vargalabs.github.io/iex2h5](https://vargalabs.github.io/iex2h5) | DOI: [10.5281/zenodo.15677290](https://doi.org/10.5281/zenodo.15677290)

> :material-cube-outline:{.icon} **Powered by** [h5cpp](/site/h5cpp/), the C++17 header-only HDF5 library that makes this possible.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ nav:
   - Presentations: site/talks.md
   - Projects:
     - IEX Download: site/iexdownload.md
+    - IEX Pipeline: site/iex-pipeline.md
     - IEX2H5: site/iex2h5.md
     - Radix64: site/radix64.md
     - TinyCrypto: site/tinycrypto.md
