| test-slow | 3.12 |`-m "slow and not credentialed"`| — |
-| test-credentialed | 3.12 |`-m "credentialed and not smoke"`| Maintainer approval |
-| coverage | 3.12 | Default suite (fast) with coverage report | — |
+Jobs in `.github/workflows/test.yml`. Each test job collects coverage data (from Python 3.12 only); the final coverage job merges them into one combined report.
@@ -113,6 +125,7 @@ Tests that work without downloaded data or network access:
**Tier 2: Real data tests (`benchmark` + `live` markers)**

Tests that download and use actual benchmark data:
+
- Environment/tool tests: create real environments, execute tools on real databases
- Data loading pipeline: `load_tasks`, `load_domain_config`, etc.
- Data integrity validation (also marked `slow`): schema checks, minimum record counts, field structure
@@ -122,6 +135,7 @@ Tests that download and use actual benchmark data:
Benchmarks use `ensure_data_exists()` to download data to the **package's default data directory** (not temp dirs). This function caches: it skips the download if the files already exist. A session-scoped pytest fixture (e.g., `ensure_tau2_data`, `ensure_macs_templates`) triggers the download once per test session.

Tests that need real data should:
+
1. Depend on the download fixture (`ensure_tau2_data`, `ensure_macs_templates`, etc.)
2. Be marked `@pytest.mark.live`
3. Use simple constructors — e.g., `Tau2Environment({"domain": "retail"})` — since data is already in the default location
@@ -131,6 +145,7 @@ Tests that don't need data (structural, mock-based) should NOT depend on the dow
#### How to decide: mock or real data?

This is a judgment call. As a guideline:
+
- If the test validates **structure, types, or error handling** → Tier 1 (offline)
- If the test operates on **real database records, files, or network resources** → Tier 2 (`live`)
- Don't force synthetic fixtures where they add complexity without value. If something needs real data, test it with real data.