⚠️ Historical snapshot — January 13, 2026. This document captured the research trajectory as of early 2026. For current state see:
- Paper 1: ✅ Published at IEEE BigData 2025 (Dec 2025) —
docs/papers/paper1/- Paper 2: ✅ Accepted at AIAI 2026 (camera-ready May 2026), 🔄 under review at JRFM (MDPI), rejected at Digital Finance —
docs/papers/paper2/- Forward-looking research directions: consolidated in
docs/papers/extensions/(supersedes the Paper 3 / Paper 4 planning referenced below)- GitHub issues referenced below: all closed as of April 2026
The content below is preserved as the trajectory-at-the-time, not current state.
Last Updated: January 13, 2026 Status (at time of writing): Paper #1 published (arXiv:2512.17923), Paper #2 ready for submission (figures polished Jan 2026)
Related Issues (all since closed): #118 (Dissertation Structure), #135 (Paper #3 Planning), #136 (Paper #4 Planning)
This document outlines the multi-paper research trajectory for validating LLM understanding of market microstructure constraints through obfuscation testing.
Core Methodology: Obfuscation testing framework (strip temporal context, force reasoning from structure) Test Domain: Options market dealer constraints (gamma exposure hedging) Key Innovation: Rigorous validation that distinguishes understanding from memorization
| Paper | Status | Timeline | Contribution |
|---|---|---|---|
| Paper #1 | ✅ Published | Dec 2025 | Baseline obfuscation methodology (single-day, SPY) |
| Paper #2 | ✅ Ready for Submission | Q1 2026 | 30-day regime detection + 0DTE hypothesis confirmed |
| Paper #3 | 🔀 Migrating | Q2 2026 | Cross-asset generalization → separate repository |
| Paper #4+ | 🔀 Migrating | 2026+ | Causal networks, pattern discovery → separate repository |
Title: "Validating Large Language Model Understanding of Market Microstructure Through Obfuscation Testing"
Venue: LLM-Finance 2025 Workshop @ IEEE BigData 2025
Contribution:
- Novel obfuscation testing framework for LLM validation
- Proof that LLMs can detect structural dealer constraints without temporal context
- Multi-pattern validation (3 patterns, 242 days, 726 tests)
Key Results:
- Detection: 71.5% average (unbiased prompts across 3 patterns)
- Accuracy: 91.2% (predictions materialize)
- Validation: Full 2024 (242 trading days per pattern)
GitHub Issues:
- #88: Paper #1 status tracking
- #90: Prompt bias investigation (resolved)
- #91-93: Core figures (complete)
Documentation: docs/papers/paper1/
GitHub Issues: #89 (30-Day Regime Detection), #107 (Validation Strategy)
Original Plan: 5-day trajectory analysis (accumulation/relief/reversal)
- Result: 98-100% detection across all conditions (2020 weak GEX vs 2024 strong GEX)
- Finding: Detects universal daily hedging (trivial), not distinctive patterns (interesting)
- Decision: Pivot to 30-day regime windows for meaningful selectivity (30-50% expected detection)
Current Approach: 30-Day Regime Detection
Title: TBD - "LLM Detection of Persistent Dealer Gamma Regimes: 0DTE Evolution and Regime Persistence"
Target: Journal submission (6-8 pages)
Research Questions:
- Can LLMs identify persistent market regimes from dealer gamma positioning?
- Did 0DTE proliferation (2020→2024) increase regime persistence?
- How do LLMs discriminate persistent regimes from transitional periods?
Methodology:
- 30-day regime windows (not 5-day trajectories)
- Regime classification:
- Persistent Positive: >70% days (21+/30) positive GEX, >$5B avg, ≤5 flips
- Persistent Negative: >70% days (21+/30) negative GEX, >$5B avg, ≤5 flips
- Transitional: Frequent flips, no dominant direction (REJECT)
- Low Conviction: Consistent but weak magnitude <$5B (REJECT)
- Expected selectivity: 30-50% detection (vs 98-100% for 5-day)
- 0DTE comparison: 2024 vs 2020 regime persistence
Expected Contributions:
- Regime detection with meaningful selectivity (30-50%, not universal)
- 0DTE proliferation effect on regime stability
- LLM discrimination of structural vs transitional periods
- Temporal extension of obfuscation framework (30-day, not 5-day)
Validation Status (Nov 20, 2025):
- ✅ Phase 1 (Q1 2024 Baseline): 71.2% detection (37/52 windows) - Borderline high
- ✅ Phase 2 (Negative Controls): All tests PASSED
- Phase 2a (Shuffle): 61.1% Q1 2024 vs 12.1% 2020 (5x FP difference proves selectivity)
- Phase 2b (Transitional): 0% FP (perfect rejection)
- Phase 2c (Low-Magnitude): 0% FP (perfect rejection)
- ✅ Phase 3 (Full 2024): 81.2% detection (181/223 windows) - Extreme year confirmed
- ✅ Phase 4 (2020 Comparison): 12.1% detection (27/223 windows) - Normal baseline
KEY FINDING: 69.1pp difference (2024: 81.2%, 2020: 12.1%) confirms:
- ✅ Framework IS selective (5.7x discrimination)
- ✅ 2024 was genuinely extreme (not overdetection)
- ✅ 0DTE hypothesis CONFIRMED (p < 0.001, φ = 0.672)
Implementation Complete (Nov 6-19, 2025):
- ✅ RegimeClassifier module (332 lines)
- ✅ SequentialGEXFetcher updated (window_size=30 parameter)
- ✅ Regime detection prompt v1
- ✅ OpenAI Batch API integration (50% cost reduction)
- ✅ Phase 2 negative control generators (shuffle, transitional, low-magnitude)
Documentation:
docs/papers/paper2/- Complete Paper #2 documentationdocs/papers/paper2/methodology.md- Regime criteria and frameworkdocs/papers/paper2/validation_strategy.md- 4-phase validation roadmapdocs/papers/paper2/results/phase1_results.md- Phase 1 detailed resultsdocs/papers/paper2/execution_plan.md- Current work and next steps
Section VI.K Added (December 2025):
- "Sensitivity to GEX Formulation" subsection added to Discussion
- Explains absolute dollar-scaling (S²) vs practitioner normalized approaches
- Justifies why magnitude preservation is essential for obfuscation testing
- Introduces Formula Agreement Test as future validation (Issue #186)
- Cross-project coordination: AutoTrader-AgentEdge Issue #502
5-Day Work Value: Valuable negative result (98-100% detection too universal), documented in archived sessions
Note: Paper #3 development is being moved to a dedicated repository to keep this repo focused on Papers #1-2 (SPY-based obfuscation validation). The cross-asset work requires different data infrastructure and will be tracked separately.
GitHub Issue: #135 (Per-Strike GEX & Intraday Dynamics)
Title: TBD - "Cross-Asset Validation of LLM Market Microstructure Understanding"
Target: Journal submission (8-10 pages)
Research Questions:
- Does obfuscation testing generalize beyond SPY index options?
- Do dealer constraints differ between index and single-name options?
- Can LLMs detect stock-specific vs market-wide patterns?
Methodology:
- Test on 10-20 individual stocks (high liquidity: AAPL, MSFT, NVDA, TSLA, etc.)
- Use sequential analysis if Paper #2 validates it
- Compare dealer dynamics: Index (SPY) vs single-name (individual stocks)
- Pattern persistence: Test if patterns hold across asset classes
Key Differences (Index vs Single-Name):
- Index options: Broader dealer base, market-making focus
- Single-name options: Concentrated positions, hedging focus
- Gamma dynamics: SPY has constant 0DTE volume, stocks vary
- Liquidity: SPY ultra-liquid, individual stocks more fragmented
Expected Contributions:
- Full generalization proof (methodology works beyond single asset)
- Cross-asset comparison (index vs single-name dealer dynamics)
- Pattern persistence analysis (universal vs asset-specific constraints)
- Combined temporal + cross-asset validation (if Paper #2 successful)
Dataset Requirements:
- Individual stock options data (2024)
- ~10-20 stocks × 242 days = ~2,420-4,840 tests
- Higher data collection effort than Paper #2
Estimated Effort:
- 1-2 weeks data collection (individual stocks)
- 1 week validation runs
- 2-3 weeks analysis/writing
GitHub Issue: #6 (Cross-asset validation) - relates to Paper #3
Dependencies:
- Paper #1 acceptance
- Paper #2 submission (determine if sequential method is validated)
Note: Paper #4+ work (causal constraint networks, pattern discovery, comparative LLMs) is being moved to a dedicated repository alongside Paper #3. This repo will remain focused on the completed Papers #1-2.
GitHub Issue: #136 (Causal Constraint Networks - Graph-Theoretic Framework)
Research Question: Can LLMs discover novel patterns (not just validate known ones)?
Methodology:
- Unsupervised pattern mining with LLMs
- Move from validation → discovery
- Different evaluation framework (data mining risks)
Challenges:
- Requires different validation methodology (how to verify discovered patterns?)
- Higher risk of false positives (data mining concerns)
- Need expert validation for novel patterns
Status: Deferred to Paper #4 or beyond (fundamentally different problem class)
Research Question: How do different LLM architectures perform on constraint detection?
Methodology:
- Test multiple LLMs: GPT-4, o3-mini, Claude, open-source models
- Reasoning capabilities comparison
- Structured output quality assessment
Key Comparison: Reasoning models (o3-mini) vs standard models (GPT-4)
- Hypothesis: Explicit reasoning improves causal identification
Status: Medium-term (requires o3-mini availability)
Research Question: Are LLM confidence scores well-calibrated to empirical accuracy?
Methodology:
- Compare stated confidence to prediction materialization rates
- Develop post-processing calibration adjustments if needed
- Test across sequential and cross-asset contexts
Status: Analysis component (fold into Paper #2 or #3, not standalone)
Research Question: Can we combine formal verification + LLM reasoning?
Methodology:
- Formal methods: Prove constraint properties mathematically
- LLM reasoning: Assess practical materialization from context
- Complementary strengths → robust validation
Status: Long-term vision (2026+)
Research Question: Can obfuscation-validated LLMs monitor markets in real-time?
Application:
- Automated constraint detection
- Explainable alerts (WHO→WHOM→WHAT)
- Regulatory reporting (market structure surveillance)
Status: Long-term (requires production infrastructure)
These ideas were proposed earlier but have been superseded by the current roadmap:
Original proposal: Explain why profitability declined Q1→Q4 2024 despite stable detection
Status: SUPERSEDED - Fold into Paper #2 discussion section Rationale: Interesting but not core methodology contribution. Sequential analysis may naturally explain regime changes.
Original proposal: Paper #3 focused on unsupervised pattern mining
Status: DEFERRED to Paper #4+ Rationale: Advisor sequence ("before going to individual stocks") prioritizes cross-asset generalization. Pattern discovery is fundamentally different problem requiring different validation framework.
- #118 (OPEN): Dissertation Structure & Paper Sequencing
- #88 (OPEN): Paper #1 status tracking (submitted Oct 26)
- #90 (CLOSED): Prompt bias resolved
- #91-93 (CLOSED): Core figures complete
- #95 (CLOSED): Presentation diagrams
- #96-97 (CLOSED): DataObfuscator optimization, performance benchmarks
- #89 (OPEN): 30-Day Regime Detection Framework (primary methodology)
- #107 (OPEN): 4-Phase Validation Strategy
- #112 (CLOSED): OpenAI Batch API implementation (50% cost reduction)
- #137 (CLOSED): JSON parsing fixes (88.5% → 100% completion)
- #139 (CLOSED): Documentation consolidation (48 → 10 files)
- #135 (OPEN): Per-Strike GEX Analysis & Intraday Dynamics
- #87 (OPEN): Cross-asset validation (5-7 individual equities)
- #136 (OPEN): Causal Constraint Networks (Graph-Theoretic Framework)
- #29 (OPEN): Database optimization
- #16 (OPEN): Performance improvements
- #45 (OPEN): Error handling enhancements
- #13 (OPEN): Pattern consolidation (defer)
Decision: Proceed with Paper #2 (Sequential GEX) implementation
- Timeline: Start immediately after acceptance notification
- Effort: 5 days implementation + 2-3 weeks writing
- Risk: Low (uses existing data)
Decision 1: Include sequential in Paper #2 or defer?
- If accuracy improves: Paper #2 focuses on sequential methodology
- If neutral/worse: Fold into Paper #1 discussion, proceed to Paper #3 without sequential
Decision 2: Timeline for Paper #3
- If Paper #2 quick: Start Paper #3 data collection in parallel with Paper #2 writing
- If Paper #2 delayed: Sequential start (finish Paper #2, then start Paper #3)
Decision: After Papers #2-3 complete
- Assess which long-term direction has most impact:
- Pattern discovery (high risk, high reward)
- Comparative LLMs (medium risk, clear contribution)
- Hybrid systems (long-term vision)
- Real-time applications (practical impact)
Paper #1 (Workshop): ✅ PUBLISHED
- LLM-Finance 2025 Workshop @ IEEE BigData 2025
- Published: December 2025 (arXiv:2512.17923)
- Format: 4-6 pages workshop paper
Paper #2 (Journal): Ready for Submission
- Target: Journal of Financial Markets, Journal of Finance, or similar
- Format: 6-8 pages journal article
- Timeline: Q1 2026 submission
- Status: Figures polished Jan 2026, validation complete
Paper #3 & #4+: Migrating to Separate Repository
- Cross-asset generalization and causal networks work
- Will be tracked in dedicated repository
- Timeline: 2026+
Consider presenting at:
- AFA (American Finance Association)
- WFA (Western Finance Association)
- MFA (Midwest Finance Association)
- NeurIPS (ML track)
- ICML (Finance + ML)
Throughout all papers, maintain:
- Obfuscation rigor: Always strip temporal context
- WHO→WHOM→WHAT: Explicit causal identification
- Academic honesty: Report failures and limitations
- Reproducibility: All code/data documented
- Generalization: Prove methodology scales beyond cherry-picked examples
| Date | Milestone |
|---|---|
| ✅ Oct 26, 2025 | Paper #1 submitted |
| ✅ Dec 2025 | Paper #1 published (arXiv:2512.17923) |
| ✅ Jan 2026 | Paper #2 figures polished |
| 🔜 Q1 2026 | Paper #2 submission |
| 🔀 Q2 2026+ | Papers #3-4 → separate repository |
Repository Focus: This repository is now complete for Papers #1-2 (SPY obfuscation validation). Papers #3-4 (cross-asset, causal networks) are being migrated to a dedicated repository.
Repository: https://github.com/iAmGiG/gex-llm-patterns
Primary Issues: #88 (Paper #1), #89 (Paper #2), #6 (Paper #3)
Documentation: docs/papers/paper1/, docs/papers/research_roadmap.md
Status: Updated January 13, 2026. Paper #1 published, Paper #2 ready for submission. Papers #3-4 migrating to separate repository.