README.md: 23 additions & 14 deletions
@@ -11,6 +11,7 @@ A Snakemake storage plugin for downloading files via HTTP with local caching, ch
 - **zenodo.org** - Zenodo data repository (checksum from API)
 - **data.pypsa.org** - PyPSA data repository (checksum from manifest.yaml)
 - **storage.googleapis.com** - Google Cloud Storage (checksum from GCS JSON API)
+- **any http(s) URL** - Generic fallback with size/mtime from HTTP headers

 ## Features

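The host list in this hunk implies a dispatch on the URL's host name, with everything else falling through to the generic handler. A minimal sketch of that idea follows; the names `METADATA_SOURCES` and `classify_url` are hypothetical and are not taken from the plugin's code.

```python
from urllib.parse import urlparse

# Hypothetical lookup table: host name -> where metadata comes from,
# using the wording of the README list.
METADATA_SOURCES = {
    "zenodo.org": "Zenodo API",
    "data.pypsa.org": "manifest.yaml",
    "storage.googleapis.com": "GCS JSON API",
}


def classify_url(url: str) -> str:
    """Return the metadata source for a URL, or 'HTTP headers' for the generic fallback."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    # hostname can be None for malformed URLs; treat that as an unknown host.
    host = (parsed.hostname or "").lower()
    return METADATA_SOURCES.get(host, "HTTP headers")
```

Matching on the exact host name keeps the special cases explicit; whether the plugin also matches subdomains is not stated in the README, so this sketch does not.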
@@ -19,7 +20,7 @@ A Snakemake storage plugin for downloading files via HTTP with local caching, ch
 - **Rate limit handling**: Automatically respects Zenodo's rate limits using `X-RateLimit-*` headers with exponential backoff retry
 - **Concurrent download control**: Limits simultaneous downloads to prevent overwhelming servers
 - **Progress bars**: Shows download progress with tqdm
-- **Immutable URLs**: Returns mtime=0 for Zenodo and data.pypsa.org (persistent URLs); uses actual mtime for GCS
+- **Immutable URLs**: Returns mtime=0 for Zenodo and data.pypsa.org (persistent URLs); uses actual mtime for GCS and generic HTTP
 - **Environment variable support**: Configure via environment variables for CI/CD workflows

 ## Installation
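To illustrate the rate limit handling listed under Features, here is one way a retry delay could combine exponential backoff with `X-RateLimit-*` headers. This is a sketch under stated assumptions, not the plugin's implementation: the function name `retry_delay`, its parameters, and the reading of `X-RateLimit-Reset` as a Unix timestamp are all assumptions.

```python
import time


def retry_delay(headers, attempt, now=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    `headers` is a plain dict of response headers. Assumed (not confirmed by
    the README): X-RateLimit-Remaining is a count and X-RateLimit-Reset is a
    Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    backoff = min(cap, base * 2 ** attempt)

    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining == "0" and reset is not None:
        try:
            # Quota exhausted: wait at least until the advertised reset time.
            return max(backoff, float(reset) - now)
        except ValueError:
            pass  # Unparseable reset value: fall back to plain backoff.
    return backoff
```

A caller would sleep for `retry_delay(response_headers, attempt)` seconds before retrying. Taking the larger of the two waits avoids retrying before the quota resets.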
@@ -67,7 +68,7 @@ If you don't explicitly configure it, the plugin will use default settings autom
 ## Usage

-Use Zenodo, data.pypsa.org, or Google Cloud Storage URLs directly in your rules. Snakemake automatically detects supported URLs and routes them to this plugin:
+Use any HTTP(S) URL directly in your rules. Snakemake automatically routes all HTTP(S) URLs to this plugin:

 ```python
 rule download_zenodo:
@@ -93,6 +94,14 @@ rule download_gcs:
         "resources/cba_projects.zip"
     shell:
         "cp {input} {output}"
+
+rule download_generic:
+    input:
+        storage("https://example.com/data/dataset.csv"),
+    output:
+        "resources/dataset.csv"
+    shell:
+        "cp {input} {output}"
 ```

 Or if you configured a tagged storage entity:
@@ -116,7 +125,7 @@ The plugin will:
 - Progress bar showing download status
 - Automatic rate limit handling with exponential backoff retry
 If both plugins are installed, supported URLs would be ambiguous - both plugins accept them.
-Typically snakemake would raise an error: **"Multiple suitable storage providers found"** if you try to use `storage()` without specifying which plugin to use, ie. one needs to explicitly call the Cached HTTP provider using `storage.cached_http(url)` instead of `storage(url)`,
-but we monkey-patch the http plugin to refuse zenodo.org, data.pypsa.org, and storage.googleapis.com URLs.
+Generic HTTP URLs are treated as mutable: size and mtime are read from `Content-Length` and
+`Last-Modified` response headers. Servers that do not support `HEAD` requests are handled
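The added README text says that generic HTTP URLs take their size and mtime from the `Content-Length` and `Last-Modified` response headers. A minimal sketch of that parsing step, using only the standard library, might look as follows. The function name `metadata_from_headers` is hypothetical, and returning `None` for a missing or malformed header is an assumption rather than documented plugin behaviour.

```python
from email.utils import parsedate_to_datetime


def metadata_from_headers(headers):
    """Return (size, mtime) parsed from a dict of response headers.

    size is an int number of bytes; mtime is a Unix timestamp (float).
    Either value is None when the header is missing or malformed.
    """
    size = None
    raw_size = headers.get("Content-Length")
    if raw_size is not None and raw_size.strip().isdigit():
        size = int(raw_size)

    mtime = None
    raw_mtime = headers.get("Last-Modified")
    if raw_mtime:
        try:
            # HTTP dates look like "Wed, 21 Oct 2015 07:28:00 GMT".
            mtime = parsedate_to_datetime(raw_mtime).timestamp()
        except (TypeError, ValueError):
            mtime = None  # Unparseable date: report mtime as unknown.
    return size, mtime
```

With a client such as `requests`, the headers of a `HEAD` response could be passed in directly, since its header mapping also supports `.get()`.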