Add require_artifacts flag and LLM validation error (#62)
* Add require_artifacts flag and LLM validation error
Bump version to 0.15.0. Introduce a require_artifacts parameter to CyteType.run (default True) so callers can choose whether artifact build/upload failures should abort the run; when False the run continues without uploaded_files and logs a warning. Add LLMValidationError and map the API error code LLM_VALIDATION_FAILED to this exception, and export it from the api package. Add tests covering artifact-failure behavior (raising by default and continuing when require_artifacts=False).
* docs update
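The commit message says the API error code `LLM_VALIDATION_FAILED` is mapped to a new `LLMValidationError`. A minimal sketch of that kind of mapping follows. Only the error code and the `LLMValidationError` name come from the commit message; the base class `CyteTypeAPIError`, the lookup table, and the helper `exception_for` are hypothetical names chosen for illustration and may not match the package's actual code.

```python
class CyteTypeAPIError(Exception):
    """Hypothetical base class for errors reported by the API."""


class LLMValidationError(CyteTypeAPIError):
    """Raised when the API reports the LLM_VALIDATION_FAILED error code."""


# Hypothetical lookup table from API error codes to exception classes.
ERROR_CODE_TO_EXCEPTION = {
    "LLM_VALIDATION_FAILED": LLMValidationError,
}


def exception_for(error_code: str, message: str) -> CyteTypeAPIError:
    """Build the exception matching an API error code.

    Unknown codes fall back to the generic base class.
    """
    exc_class = ERROR_CODE_TO_EXCEPTION.get(error_code, CyteTypeAPIError)
    return exc_class(message)
```

With a table like this, callers can catch the specific exception for LLM configuration problems while still handling every other API error through the base class.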
🚀 [Try it in Google Colab](https://colab.research.google.com/drive/1aRLsI3mx8JR8u5BKHs48YUbLsqRsh2N7?usp=sharing)
-> **Note:** No API keys required for default configuration. See [custom LLM configuration](docs/configuration.md#llm-configuration) for advanced options.
->
-> `run()` now handles artifact packaging and upload automatically (`vars.h5` + `obs.duckdb`) before annotation.
-> Generated artifact files are kept on disk by default; use `cleanup_artifacts=True` to remove them after run completion/failure.
+> **Note:** No API keys required for default configuration. See [Configuration](docs/configuration.md) for LLM setup, artifact handling, and advanced options.
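The artifact behaviour described in the commit message (abort by default, continue with a warning and without `uploaded_files` when `require_artifacts=False`) could be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: `prepare_artifacts`, `build_payload`, the `build_and_upload` callback, and the `"uploaded_files"` payload key are hypothetical stand-ins.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("artifact_sketch")


def prepare_artifacts(
    build_and_upload: Callable[[], dict],
    require_artifacts: bool = True,
) -> Optional[dict]:
    """Run the (hypothetical) build/upload step.

    Re-raises any failure when require_artifacts is True; otherwise logs a
    warning and returns None so the caller can continue without artifacts.
    """
    try:
        return build_and_upload()
    except Exception as exc:
        if require_artifacts:
            raise
        logger.warning(
            "Artifact build/upload failed (%s); continuing without uploaded files.",
            exc,
        )
        return None


def build_payload(query: dict, uploaded_files: Optional[dict]) -> dict:
    """Attach uploaded file references to the payload only when they exist."""
    payload = dict(query)
    if uploaded_files is not None:
        payload["uploaded_files"] = uploaded_files  # assumed key name
    return payload
```

Returning `None` rather than an empty dict keeps "artifacts were skipped" distinguishable from "artifacts uploaded but empty" when the payload is assembled.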
File: docs/configuration.md (41 additions, 16 deletions)
@@ -29,14 +29,6 @@ adata = annotator.run(
 )
 ```

-`run()` now performs the full upload pipeline internally:
-- Creates `vars.h5` from `adata.X`
-- Creates `obs.duckdb` from `adata.obs`
-- Uploads both artifacts to the CyteType API
-- Calls `/annotate` with uploaded file references
-
-If artifact creation or upload fails, `run()` fails fast.
-
 ## LLM Configuration
 You can provide your own LLM providers/models:
 ```python
@@ -64,20 +56,42 @@ adata = annotator.run(
 )
 ```

-## Advanced
+## Artifacts
+
+`run()` automatically builds and uploads two artifact files before submitting an annotation job:
+
+- **`vars.h5`**: a compressed HDF5 file containing the normalized expression matrix (`adata.X`) and variable metadata (`adata.var`). Used by the server for on-demand gene expression lookups during annotation and in the interactive report.
+- **`obs.duckdb`**: a DuckDB database containing the observation metadata (`adata.obs`). Used by the server to power metadata queries and filtering in the interactive report.
+
+Both files are created locally and then uploaded to the CyteType API. The uploaded references are attached to the `/annotate` payload so the server can link them to the job.
+
+### Artifact Parameters
+
 ```python
 adata = annotator.run(
     ...
-    poll_interval_seconds=30,  # How often to poll (default)
-    timeout_seconds=7200,  # Max wait time (default: 2 hours)
-    api_url="https://custom-api.example.com",  # Custom API endpoint if needed
-    vars_h5_path="vars.h5",  # Local artifact output path
-    obs_duckdb_path="obs.duckdb",  # Local artifact output path
-    cleanup_artifacts=False,  # Keep artifacts by default
+    vars_h5_path="vars.h5",  # Local output path for vars artifact
+    obs_duckdb_path="obs.duckdb",  # Local output path for obs artifact
+    upload_timeout_seconds=3600,  # Socket read timeout per upload (seconds)
+    cleanup_artifacts=False,  # Set True to delete local artifact files after the run
+    require_artifacts=True,  # Raise on artifact failure (set False to skip)
 )
 ```

+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `vars_h5_path` | `"vars.h5"` | Local path where the vars HDF5 file is written |
+| `obs_duckdb_path` | `"obs.duckdb"` | Local path where the obs DuckDB file is written |
+| `upload_timeout_seconds` | `3600` | Socket read timeout, in seconds, for each artifact upload |
+| `cleanup_artifacts` | `False` | When `True`, delete local artifact files after the run completes or fails |
+| `require_artifacts` | `True` | Raise on artifact build/upload failure. Set to `False` to skip artifacts and continue with annotation only |
+
+### Error Handling
+
+By default (`require_artifacts=True`), any failure during artifact building or uploading stops the run and surfaces the full error. The error message includes a link to report the issue on GitHub.
+
+If you want the annotation to proceed even when artifacts fail (e.g., due to disk space or network issues), set `require_artifacts=False`. The job is then submitted without artifacts: annotation still works, but the interactive report will not have expression lookups or metadata filtering.
+
 ### Memory Recommendation for Large Datasets

 For large datasets, open your AnnData object in backed mode to reduce memory usage while building `vars.h5`:
@@ -86,4 +100,15 @@ For large datasets, open your AnnData object in backed mode to reduce memory usa
 import scanpy as sc

 adata = sc.read_h5ad("input.h5ad", backed="r")
+```
+
+## Advanced
+
+```python
+adata = annotator.run(
+    ...
+    poll_interval_seconds=30,  # How often to poll (default)
+    timeout_seconds=7200,  # Max wait time (default: 2 hours)
+    api_url="https://custom-api.example.com",  # Custom API endpoint if needed
File: docs/troubleshooting.md (2 additions, 1 deletion)
@@ -5,4 +5,5 @@
 - Make sure you have valid gene symbols in the AnnData object and are passing the correct gene symbols column name to parameter `gene_symbols_column`.
 - If you are using a custom LLM, make sure you have the correct API key and base URL.
 - For large datasets, load AnnData in backed mode (`sc.read_h5ad(..., backed="r")`) to reduce memory use during artifact generation.
-- `run()` creates `vars.h5` and `obs.duckdb` before annotation. Use `cleanup_artifacts=True` if you do not want to keep these local files.
+- `run()` creates `vars.h5` and `obs.duckdb` before annotation. Use `cleanup_artifacts=True` if you do not want to keep these local files.
+- If artifact building or uploading fails, `run()` will raise an error by default. Set `require_artifacts=False` to skip artifacts and continue with annotation only.