|
1 | 1 | # Thesis Backtester — AI-Powered Investment Analysis Framework |
2 | 2 |
|
3 | | -> Strategy config → Quantitative screening → LLM multi-chapter deep analysis → Multi-baseline backtest validation |
| 3 | +> Markdown operators + DAG orchestration + LLM step-by-step reasoning + multi-baseline backtest validation |
4 | 4 |
|
5 | | -**Thesis Backtester** is an open-source engine that backtests *qualitative* investment ideas using LLM-powered blind analysis. Unlike traditional quant backtesting (which only works with numeric rules like "buy when PE < 10"), this tool validates the kind of judgment calls real investors make: |
| 5 | +Encode an investment analysis methodology as executable operators, orchestrate them into a dependency DAG, and let the LLM execute it chapter by chapter, with each step building on the conclusions of the previous one. This is not free-form AI chat but **structured analysis that follows your methodology**. |
6 | 6 |
|
7 | | -- "Is this high dividend sustainable or a trap?" |
8 | | -- "Is this low PE genuinely cheap or a value trap?" |
9 | | -- "Does management have integrity?" |
10 | | -- "Can this business model survive a downturn?" |
| 7 | +## Backtest Results: +7.1pp Alpha |
11 | 8 |
|
12 | | -## Backtest Results: 5-Year Blind Test on 120 Stocks |
| 9 | +120 stocks analyzed across 12 half-year cross-sections (2020-2025), compared against 5 baselines: |
13 | 10 |
|
14 | | -Validated a value investing strategy (low PE + low PB + high dividend + AI deep analysis) across **12 half-year cross-sections from 2020-2025, screening 600 candidates and analyzing 120 stocks**. |
15 | | - |
16 | | -### 5-Baseline Performance Comparison (6-Month Forward Return) |
17 | | - |
18 | | -| Baseline | Samples | Avg Return | Win Rate | vs CSI300 | |
19 | | -|----------|---------|-----------|----------|-----------| |
20 | | -| CSI300 Index | 12 | +0.9% | 42% | — | |
21 | | -| Screen Pool (equal-weight) | 600 | +4.0% | 53% | +3.0pp | |
22 | | -| Screen Top (Gold Tier) | 56 | +4.0% | 57% | +3.0pp | |
| 11 | +| Baseline | Samples | 6M Return | Win Rate | vs CSI300 | |
| 12 | +|----------|---------|----------|----------|-----------| |
| 13 | +| CSI300 | 12 | +0.9% | 42% | — | |
| 14 | +| Screen Pool | 600 | +4.0% | 53% | +3.0pp | |
23 | 15 | | **Agent Buy** | **43** | **+8.1%** | **65%** | **+7.1pp** | |
24 | | -| Agent Top5 | 60 | +6.7% | 65% | +5.7pp | |
25 | | - |
26 | | -### Cumulative Return Curve |
27 | 16 |
|
28 | 17 |  |
29 | 18 |
|
30 | | -### Alpha Decomposition |
31 | | - |
32 | 19 | ``` |
33 | | -CSI300 Index +0.9% ← Market baseline |
34 | | - │ +3.0pp ← Screening alpha |
| 20 | +CSI300 +0.9% |
| 21 | + │ +3.0pp screening alpha |
35 | 22 | Screen Pool +4.0% |
36 | | - │ +4.1pp ← Agent incremental alpha |
37 | | -Agent Buy +8.1% ← End-to-end alpha: +7.1pp |
| 23 | + │ +4.1pp Agent incremental alpha |
| 24 | +Agent Buy +8.1% end-to-end alpha: +7.1pp |
38 | 25 | ``` |
39 | 26 |
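The decomposition above is plain subtraction between baselines. The following is a minimal sketch, not the project's evaluation code; the function name `decompose_alpha` and its return keys are hypothetical. Note that feeding in the rounded table values gives 3.1pp and 7.2pp rather than the reported 3.0pp and 7.1pp, which presumably reflects rounding of the underlying returns.

```python
def decompose_alpha(index_ret: float, pool_ret: float, agent_ret: float) -> dict:
    """Split end-to-end alpha (in percentage points) into its two layers.

    Hypothetical helper: inputs are average 6M returns in percent.
    """
    screening = pool_ret - index_ret      # what quantitative screening adds
    incremental = agent_ret - pool_ret    # what the Agent adds on top of screening
    end_to_end = agent_ret - index_ret    # equals screening + incremental
    return {
        "screening": round(screening, 1),
        "agent_incremental": round(incremental, 1),
        "end_to_end": round(end_to_end, 1),
    }


# Rounded values taken from the table: CSI300, Screen Pool, Agent Buy
parts = decompose_alpha(0.9, 4.0, 8.1)
print(parts)
```

Because the two layers always sum to the end-to-end figure, any one of the three numbers can be checked against the other two.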
|
40 | | -- **Quantitative screening works**: Low valuation + high dividend beats CSI300 by 3.0pp, 53% win rate |
41 | | -- **Agent adds incremental value**: +4.1pp on top of screening, win rate from 53% → 65% |
42 | | -- **Stronger at longer horizon**: Agent Buy 12M avg return +13.9% (vs CSI300 +1.1%, alpha +12.8pp) |
43 | | -- **Avoid signals are effective**: Stocks the Agent avoided performed worse |
| 27 | +**Avoid signals are even stronger**: 73% of the stocks the Agent flagged "avoid" subsequently declined. Risk-avoidance alpha (-14.8pp) is 2.3x the magnitude of stock-selection alpha (+6.4pp). |
44 | 28 |
|
45 | | -> Full report: [backtest_report](strategies/v6_value/backtest/backtest_report_20260316_1448.md) | Structured data: [backtest_summary](strategies/v6_value/backtest/backtest_summary_20260316_1448.json) |
| 29 | +> [Full report](strategies/v6_value/backtest/backtest_report_20260316_1448.md) · [Structured data](strategies/v6_value/backtest/backtest_summary_20260316_1448.json) · [120 analysis reports](strategies/v6_value/backtest/agent_reports/) |
46 | 30 |
|
47 | | -## How It Works |
| 31 | +## Live Analysis Workbench |
48 | 32 |
|
49 | | -``` |
50 | | -Traditional backtest: numeric rule → historical prices → P&L |
51 | | -Thesis backtest: investment philosophy → AI blind analysis → compare with actual outcomes |
| 33 | +```bash |
| 34 | +# Single stock real-time analysis (free data, no Tushare needed) |
| 35 | +python -m src.engine.launcher strategies/v6_enhanced/strategy.yaml live-analyze 601288.SH |
| 36 | + |
| 37 | +# Web workbench |
| 38 | +streamlit run src/web/app.py |
52 | 39 | ``` |
53 | 40 |
|
54 | | -### 3-Step Independent Pipeline |
| 41 | + |
55 | 42 |
|
56 | | -```bash |
57 | | -# Step 1: Generate cross-section dates + screen + save CSV (seconds) |
58 | | -python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-screen |
| 43 | +<details> |
| 44 | +<summary>View analysis process screenshots</summary> |
59 | 45 |
|
60 | | -# Step 2: Concurrent agent analysis + progress/retry/incremental (hours) |
61 | | -python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-agent |
62 | | -python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-agent --dry-run |
| 46 | + |
| 47 | + |
| 48 | + |
| 49 | + |
63 | 50 |
|
64 | | -# Step 3: Collect forward returns + 5-baseline evaluation + return chart (minutes) |
65 | | -python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-eval |
66 | | -``` |
| 51 | +</details> |
67 | 52 |
|
68 | | -Each step is independent — can be interrupted and resumed. Agent automatically skips completed analyses. |
| 53 | +4 preset frameworks: |
69 | 54 |
|
70 | | -### Single Analysis |
| 55 | +| Framework | Chapters | Focus | |
| 56 | +|-----------|----------|-------| |
| 57 | +| V6 Value Investing | 6 | Backtest-validated (+7.1pp alpha) | |
| 58 | +| **V6 Enhanced** | **8** | **Deep analysis + forward risk + consistency ruling** | |
| 59 | +| Quick Scan | 3 | 10-15 min fast assessment | |
| 60 | +| Income Focus | 5 | Dividend sustainability | |
71 | 61 |
|
72 | | -```bash |
73 | | -# Quantitative screening |
74 | | -python -m src.engine.launcher strategies/v6_value/strategy.yaml screen 2024-06-30 |
| 62 | +## Core Design |
75 | 63 |
|
76 | | -# Single stock agent analysis (requires LLM_API_KEY + LLM_BASE_URL) |
77 | | -python -m src.engine.launcher strategies/v6_value/strategy.yaml agent-analyze 601288.SH 2024-06-30 |
| 64 | +**Operator DAG orchestration > single prompt**: each step's conclusion flows into the next, producing measurably better results through chained reasoning. |
78 | 65 |
|
79 | | -# Batch: screen + agent analysis |
80 | | -python -m src.engine.launcher strategies/v6_value/strategy.yaml batch-analyze 2024-06-30 |
| 66 | +``` |
| 67 | +strategy.yaml All-in-one config: screening + framework + scoring + LLM |
| 68 | + │ |
| 69 | + ▼ |
| 70 | +┌─── Engine ──────────────────────────────────────────────────────┐ |
| 71 | +│ StrategyConfig · Launcher · OperatorRegistry · FactorRegistry │ |
| 72 | +└──────┬──────────────┬───────────────────┬───────────────────────┘ |
| 73 | + │ │ │ |
| 74 | + ┌────▼────┐ ┌─────▼──────┐ ┌───────▼────────┐ |
| 75 | + │Screener │ │ Agent │ │ Backtest │ |
| 76 | + │ │ │ 26 ops DAG │ │ Pipeline │ |
| 77 | + │ │ │ 3-layer │ │ screen → agent │ |
| 78 | + └────┬────┘ │ scoring │ │ → eval │ |
| 79 | + │ └─────┬──────┘ └───────┬────────┘ |
| 80 | +┌──────▼──────────────▼─────────────────▼───────────────────────┐ |
| 81 | +│ Data Layer: Provider abstraction · Parquet · Snapshot · API │ |
| 82 | +└───────────────────────────────────────────────────────────────┘ |
81 | 83 | ``` |
82 | 84 |
|
83 | | -## Architecture |
84 | | - |
| 85 | +| Design | Approach | |
| 86 | +|--------|----------| |
| 87 | +| **Operator-driven** | 26 `.md` operators, strategies compose via YAML, no code needed | |
| 88 | +| **Blind testing** | Company names hidden to eliminate AI brand bias | |
| 89 | +| **Time boundary** | Data layer filtering + prompt injection + tool sandbox | |
| 90 | +| **3-layer scoring** | Thinking steps → scoring rubric → decision thresholds | |
| 91 | +| **5-baseline comparison** | CSI300 / screen pool / top tier / Agent buy / Agent top5 | |
| 92 | + |
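The "time boundary" row lists data-layer filtering as the first line of defense against look-ahead. A minimal sketch of that idea, assuming each record carries a publication date; the field names and the sample records below are made up for illustration and are not the project's schema:

```python
from datetime import date


def visible_records(records, as_of):
    """Keep only records published on or before the analysis date."""
    return [r for r in records if r["published"] <= as_of]


# Illustrative records only; the dates are invented for the example.
records = [
    {"item": "annual report", "published": date(2024, 3, 28)},
    {"item": "interim report", "published": date(2024, 8, 30)},
]

# Analysing the 2024-06-30 cross-section must not see the later interim report.
visible = visible_records(records, as_of=date(2024, 6, 30))
print([r["item"] for r in visible])
```

Filtering before the data reaches the prompt means the other two layers (prompt injection and the tool sandbox) only have to catch what the model recalls or requests on its own.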
| 93 | +<details> |
| 94 | +<summary>Agent analysis flow (DAG dependency graph)</summary> |
| 95 | + |
| 96 | +```mermaid |
| 97 | +graph LR |
| 98 | + CH1[Ch1 Data Verification] |
| 99 | + CH2[Ch2 Fundamentals] |
| 100 | + CH3[Ch3 Cash Flow] |
| 101 | + CH4[Ch4 Valuation] |
| 102 | + CH5[Ch5 Stress Test] |
| 103 | + CH6[Ch6 Decision] |
| 104 | + SYN[Synthesis] |
| 105 | +
|
| 106 | + CH1 --> CH2 & CH3 |
| 107 | + CH2 --> CH3 & CH4 |
| 108 | + CH3 --> CH4 & CH5 |
| 109 | + CH4 --> CH5 & CH6 |
| 110 | + CH5 --> CH6 |
| 111 | + CH6 --> SYN |
| 112 | +
|
| 113 | + style SYN fill:#ff6b35,color:#fff |
| 114 | + style CH1 fill:#4a90d9,color:#fff |
| 115 | + style CH2 fill:#4a90d9,color:#fff |
| 116 | + style CH3 fill:#4a90d9,color:#fff |
| 117 | + style CH4 fill:#4a90d9,color:#fff |
| 118 | + style CH5 fill:#4a90d9,color:#fff |
| 119 | + style CH6 fill:#4a90d9,color:#fff |
85 | 120 | ``` |
86 | | -┌─────────────────────────────────────────────────────────────────┐ |
87 | | -│ Strategy Instance │ |
88 | | -│ strategy.yaml (screening + chapters + operators + scoring + LLM)│ |
89 | | -└─────────────────────────┬───────────────────────────────────────┘ |
90 | | - │ |
91 | | -┌─────────────────────────▼───────────────────────────────────────┐ |
92 | | -│ Engine Layer (src/engine/) │ |
93 | | -│ StrategyConfig · Launcher · FactorRegistry · OperatorRegistry │ |
94 | | -└──────┬──────────┬──────────┬────────────────────────────────────┘ |
95 | | - │ │ │ |
96 | | -┌──────▼───┐ ┌───▼────┐ ┌──▼────────────────────────────────────┐ |
97 | | -│ Screener │ │ Agent │ │ Backtest Pipeline │ |
98 | | -│ │ │ (Blind)│ │ screen → agent → eval (3 steps) │ |
99 | | -└──────┬───┘ └───┬────┘ └──┬────────────────────────────────────┘ |
100 | | - │ │ │ |
101 | | -┌──────▼─────────▼─────────▼─────────────────────────────────────┐ |
102 | | -│ Data Layer (src/data/) │ |
103 | | -│ Provider (abstract) · Parquet Storage · Snapshot · API │ |
104 | | -└─────────────────────────────────────────────────────────────────┘ |
| 121 | + |
| 122 | +</details> |
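The dependency graph above can be turned into an execution order with a topological sort. The sketch below uses the standard-library `graphlib` (Python 3.9+) and copies the edges from the diagram; `run_chapters` and the `analyze` callback are hypothetical stand-ins for the real scheduler and LLM call, which this README does not show.

```python
from graphlib import TopologicalSorter

# Each chapter maps to the set of chapters it depends on (edges from the diagram).
deps = {
    "CH1": set(),
    "CH2": {"CH1"},
    "CH3": {"CH1", "CH2"},
    "CH4": {"CH2", "CH3"},
    "CH5": {"CH3", "CH4"},
    "CH6": {"CH4", "CH5"},
    "SYN": {"CH6"},
}


def run_chapters(deps, analyze):
    """Run chapters in dependency order, passing upstream conclusions forward."""
    conclusions = {}
    for chapter in TopologicalSorter(deps).static_order():
        upstream = {d: conclusions[d] for d in deps[chapter]}
        conclusions[chapter] = analyze(chapter, upstream)
    return conclusions


# Stand-in for the LLM call: record which upstream conclusions each chapter saw.
results = run_chapters(deps, lambda ch, upstream: sorted(upstream))
order = list(results)
print(order)
```

With these edges the order is fully determined, since every chapter depends on the one before it; a sparser graph would leave room to run independent chapters concurrently.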
| 123 | + |
| 124 | +<details> |
| 125 | +<summary>Backtest pipeline (3 independent steps)</summary> |
| 126 | + |
| 127 | +```bash |
| 128 | +python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-screen # ① Screen (seconds) |
| 129 | +python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-agent # ② Agent (hours) |
| 130 | +python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-eval # ③ Evaluate (minutes) |
105 | 131 | ``` |
106 | 132 |
|
107 | | -### Key Design Decisions |
| 133 | +Each step is independent and can be interrupted and resumed. The Agent automatically skips completed analyses. |
108 | 134 |
|
109 | | -- **Operator-driven**: 21 analysis operators (`.md` files), strategies compose via YAML, output schema auto-generated |
110 | | -- **6-chapter analysis framework**: Data verification → Fundamentals → Cash flow → Valuation → Stress test → Decision |
111 | | -- **Blind testing**: Company names hidden to eliminate AI brand bias and memory contamination |
112 | | -- **3-layer scoring**: Thinking steps guide reasoning + scoring rubric calibrates + decision thresholds enforce consistency |
113 | | -- **5-baseline comparison**: CSI300 + Screen pool + Gold tier + Agent buy + Agent top5 |
114 | | -- **Time-boundary enforcement**: Data layer hard filtering + prompt injection + agent tool sandbox |
115 | | -- **Strategy-as-config**: `strategy.yaml` defines everything, no code needed |
| 135 | +</details> |
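One common way to get the resume behaviour described for the agent step is to treat an existing report file as proof of completion. This is a sketch of that pattern under assumptions: the helper names and the `<code>_<date>.md` file naming are hypothetical, and `600000.SH` is just an arbitrary second ticker for the demo.

```python
import tempfile
from pathlib import Path


def report_path(report_dir, ts_code, as_of):
    """Hypothetical naming scheme: one Markdown report per (stock, cross-section)."""
    return Path(report_dir) / f"{ts_code}_{as_of}.md"


def pending_tasks(tasks, report_dir):
    """Return only the tasks that do not have a saved report yet."""
    return [t for t in tasks if not report_path(report_dir, *t).exists()]


with tempfile.TemporaryDirectory() as tmp:
    tasks = [("601288.SH", "2024-06-30"), ("600000.SH", "2024-06-30")]
    # Simulate an interrupted run in which only the first analysis finished.
    report_path(tmp, *tasks[0]).write_text("done")
    remaining = pending_tasks(tasks, tmp)

print(remaining)
```

Writing each report only after its analysis succeeds keeps a half-finished run from being mistaken for a completed one.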
116 | 136 |
|
117 | 137 | ## Quick Start |
118 | 138 |
|
119 | | -### Prerequisites |
120 | | - |
121 | 139 | ```bash |
122 | 140 | pip install -e . |
123 | | -export TUSHARE_TOKEN="your_token_here" # Tushare Pro account |
124 | | -export LLM_API_KEY="your_key_here" # OpenAI-compatible API |
125 | | -export LLM_BASE_URL="https://api.deepseek.com" # Recommended: DeepSeek |
| 141 | +export LLM_API_KEY="your_key" |
| 142 | +export LLM_BASE_URL="https://api.deepseek.com" |
| 143 | + |
| 144 | +# Live analysis (free data, no Tushare needed) |
| 145 | +python -m src.engine.launcher strategies/v6_enhanced/strategy.yaml live-analyze 601288.SH |
| 146 | + |
| 147 | +# Or launch web workbench |
| 148 | +streamlit run src/web/app.py |
126 | 149 | ``` |
127 | 150 |
|
128 | | -### Data Initialization |
| 151 | +<details> |
| 152 | +<summary>Backtest mode (requires Tushare)</summary> |
129 | 153 |
|
130 | 154 | ```bash |
131 | | -python -m src.engine.launcher data init-basic # Stock list + trade calendar |
132 | | -python -m src.engine.launcher data init-market 2020-01-01 # Daily quotes + indicators + factors |
133 | | -python -m src.engine.launcher data daily-update # Daily incremental update |
| 155 | +export TUSHARE_TOKEN="your_token" |
| 156 | + |
| 157 | +python -m src.engine.launcher data init-basic |
| 158 | +python -m src.engine.launcher data init-market 2020-01-01 |
| 159 | +python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-screen |
| 160 | +python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-agent |
| 161 | +python -m src.engine.launcher strategies/v6_value/strategy.yaml backtest-eval |
134 | 162 | ``` |
135 | 163 |
|
136 | | -### Creating Your Own Strategy |
| 164 | +</details> |
137 | 165 |
|
138 | | -1. Create `strategies/<name>/strategy.yaml` (reference the [fully annotated v6_value config](strategies/v6_value/strategy.yaml)) |
139 | | -2. Define quantitative screening in `screening` section |
140 | | -3. Compose operators in `framework.chapters` (or create new operators in `operators/`) |
| 166 | +<details> |
| 167 | +<summary>Create your own strategy</summary> |
| 168 | + |
| 169 | +1. Create `strategies/<name>/strategy.yaml` (use [v6_value](strategies/v6_value/strategy.yaml) as a reference) |
| 170 | +2. Define screening conditions (`screening`) |
| 171 | +3. Compose operators into chapters (`framework.chapters`) |
141 | 172 | 4. Run `backtest-screen` → `backtest-agent` → `backtest-eval` |
142 | 173 |
|
143 | 174 | No code required. Output schema auto-generated from operator `outputs` definitions. |
144 | 175 |
|
145 | | -## Project Structure |
| 176 | +</details> |
| 177 | + |
| 178 | +<details> |
| 179 | +<summary>Project structure</summary> |
146 | 180 |
|
147 | 181 | ``` |
148 | 182 | src/ |
149 | 183 | ├── engine/ # Engine: config + launcher + registries |
150 | | -├── data/ # Data: Provider + Parquet + snapshot |
151 | | -│ └── tushare/ # Tushare Provider implementation |
152 | | -├── agent/ # Agent: LLM blind analysis (DAG + tool_use) |
| 184 | +├── data/ # Data: Provider + Parquet + snapshot + free crawler |
| 185 | +├── agent/ # Agent: LLM analysis (DAG scheduling + tool_use) |
153 | 186 | ├── screener/ # Screener: declarative quantitative filtering |
154 | 187 | ├── backtest/ # Backtest: 3-step pipeline + 5-baseline eval |
155 | | -└── web/ # Web: Streamlit workbench |
156 | | -
|
157 | | -factors/ # Quantitative factor definitions (.py) |
158 | | -operators/ # Qualitative analysis operators (.md, 26 total) |
159 | | -strategies/ # Strategy instances |
160 | | -└── v6_value/ # V6 Value Investing (with full backtest data) |
161 | | - ├── strategy.yaml # Config (fully annotated) |
162 | | - └── backtest/ # Backtest results |
163 | | - ├── agent_reports/ # 120 agent analysis reports |
164 | | - ├── screen_results/ # 12 cross-section screening CSVs |
165 | | - └── backtest_chart_*.png # Return curve chart |
166 | | -``` |
| 188 | +└── web/ # Web: Streamlit analysis workbench |
167 | 189 |
|
168 | | -## Documentation |
| 190 | +operators/v1/ # Operator library v1 (21, frozen, tied to backtest results) |
| 191 | +operators/v2/ # Operator library v2 (26, including forward risk operators) |
| 192 | +strategies/ # Strategy instances (4 presets) |
| 193 | +``` |
169 | 194 |
|
170 | | -- [Architecture](docs/design/architecture.md) — System layers and module responsibilities |
171 | | -- [Agent Runtime](docs/design/agent.md) — DAG scheduling, prompt assembly, tool sandbox |
172 | | -- [Data Layer](docs/design/data_layer.md) — Provider abstraction, Parquet storage, snapshots |
173 | | -- [Operators & Factors](docs/design/operators.md) — 21 operator catalog, auto-schema, industry gates |
174 | | -- [Screener](docs/design/screener.md) — Declarative quantitative screening engine |
175 | | -- [Backtest](docs/design/backtest.md) — 3-step pipeline, 5-baseline evaluation |
176 | | -- [Scoring Design](docs/design/scoring.md) — 3-layer scoring philosophy |
177 | | -- [Scaling Plan](docs/scaling_plan.md) — Roadmap from 120 to 600+ samples |
| 195 | +</details> |
178 | 196 |
|
179 | | -## Tech Stack |
| 197 | +## Roadmap |
180 | 198 |
|
181 | | -| Component | Technology | |
182 | | -|-----------|-----------| |
183 | | -| Language | Python 3.9+ | |
184 | | -| Storage | Parquet (zstd compression) | |
185 | | -| LLM Interface | OpenAI-compatible API (async, tool_use) | |
186 | | -| Data Source | Tushare Pro API (Provider abstraction) | |
187 | | -| Web | Streamlit | |
| 199 | +| Timeline | Plan | |
| 200 | +|----------|------| |
| 201 | +| **2026 Q2** | Mock portfolio: full CSI300 Agent evaluation → Top 15 holdings → public release → year-end accountability | |
| 202 | +| **2026 H2** | 3-layer production: earnings-driven analysis (quarterly) + price monitoring (daily) + news verification (on-trigger) | |
| 203 | +| **Ongoing** | Operator refinement · sample expansion (120 → 500+) · multi-strategy comparison | |
188 | 204 |
|
189 | | -## Contributing |
| 205 | +Tech directions: |
| 206 | +- Engine-level gate enforcement (currently declarative only) |
| 207 | +- Same-day result caching |
| 208 | +- Multi-LLM comparison (DeepSeek / GPT / Claude) |
| 209 | +- More free data sources (full announcements, research report summaries) |
190 | 210 |
|
191 | | -Early stage project. Contributions welcome: |
| 211 | +## Docs |
192 | 212 |
|
193 | | -- **New strategy instances** — bring your own investment thesis |
194 | | -- **New analysis operators** — add `.md` files to `operators/` |
195 | | -- **Data source adapters** — implement `DataProvider` Protocol for US/HK markets |
196 | | -- **Multi-model comparison** — DeepSeek / GPT / Gemini benchmarks |
| 213 | +- [Architecture](docs/design/architecture.md) · [Agent](docs/design/agent.md) · [Data Layer](docs/design/data_layer.md) · [Operators](docs/design/operators.md) · [Screener](docs/design/screener.md) · [Backtest](docs/design/backtest.md) · [Scoring](docs/design/scoring.md) · [Live Analysis](docs/design/live_analysis.md) |
197 | 214 |
|
198 | 215 | ## License |
199 | 216 |
|
200 | 217 | Apache License 2.0 |
201 | 218 |
|
202 | 219 | ## Disclaimer |
203 | 220 |
|
204 | | -This tool is for **investment methodology research and validation only**. It does not constitute investment advice. Past backtest results do not guarantee future performance. Always do your own due diligence. |
| 221 | +This tool is for **investment methodology research and validation only**. It does not constitute investment advice. Past backtest results do not guarantee future performance. |
205 | 222 |
|
206 | 223 | --- |
207 | 224 |
|
|