docs(data): host assets on existing-repo releases; KEGG redistribution permitted
Reflect the chosen distribution model: GitHub release assets live outside the git tree, so a
separate data repository is optional — attach assets to dedicated tags (e.g. kegg-kegg116,
diamond-2.1.9) on an existing RAVEN repo and reuse the same URLs across raven-python and
MATLAB RAVEN. Use Zenodo only for DOIs or files >2 GB. KEGG artefacts are redistributed with
permission, so the prior 'confirm rights' caveat is removed. Example/schema URLs repointed
from a hypothetical raven-data repo to raven-python.
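The manifest schema states that every file carries a SHA256 so consumers can verify integrity after download. A minimal sketch of such a check is shown here; the helper name `verify_sha256` and the chunk size are illustrative assumptions and are not taken from `raven_python`.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA256 digest matches the manifest value.

    Hypothetical helper: the real resolvers may structure this differently.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Stream in chunks so multi-GB assets never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Normalise the expected value so case and stray whitespace do not matter.
    return digest.hexdigest() == expected_hex.strip().lower()
```

Because the check depends only on the bytes and the recorded digest, it behaves the same whether the `url` points at a GitHub release or a Zenodo record.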
"description": "Language-agnostic registry of downloadable raven-python / RAVEN data artefacts and external binary bundles. Consumed by the Python resolvers (raven_python.data / raven_python.binaries) and by MATLAB RAVEN. Every file carries a SHA256 so consumers verify integrity after download.",
Both are just URLs in the manifest, so consumers don't care — choose per asset:
+Release **assets are stored separately from the git tree** (GitHub keeps them in a blob
+store), so attaching them to a release does **not** bloat the repository. A dedicated assets
+repository is therefore **optional** — attach the assets to releases on an existing RAVEN
+repo (this one, or MATLAB [RAVEN](https://github.com/SysBioChalmers/RAVEN)) and have **both
+packages reuse the same release-asset URLs** via this manifest.

-- **GitHub Releases** — simplest, free, language-agnostic, up to ~2 GB per file. Good default,
-  and you're already on GitHub for the code.
-- **Zenodo** — adds a citable **DOI**, long-term archival, and handles files larger than 2 GB
-  (up to 50 GB/record). Right for the KEGG HMM bundle and anything you want citable.
+Use **dedicated tags** for the assets — e.g. `kegg-kegg116`, `diamond-2.1.9` — rather than
+attaching them to code-milestone releases like `v0.1.0a1`. KEGG data updates roughly yearly
+while the code changes often; dedicated tags keep the two cadences decoupled while still
+living in one repository. The manifest's per-dataset `version` does the rest (it namespaces
+the download cache).

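To make the dedicated-tag idea concrete, the sketch below builds a release-asset URL from a tag and derives a version-namespaced cache location. The URL pattern is GitHub's standard release download path; the cache layout, the function names, and the file name `kegg116_hmms.zip` are hypothetical and only illustrate how a per-dataset `version` could namespace the cache.

```python
from pathlib import Path
from urllib.parse import quote


def release_asset_url(owner: str, repo: str, tag: str, filename: str) -> str:
    """Build the GitHub download URL for an asset attached to a release tag."""
    return (
        f"https://github.com/{owner}/{repo}/releases/download/"
        f"{quote(tag, safe='')}/{quote(filename, safe='')}"
    )


def cache_path(cache_root: Path, dataset: str, version: str, filename: str) -> Path:
    """Namespace cached files by dataset and version (assumed layout)."""
    return cache_root / dataset / version / filename


# Illustrative values: the tag appears in the text above, the file name is made up.
url = release_asset_url("SysBioChalmers", "RAVEN", "kegg-kegg116", "kegg116_hmms.zip")
local = cache_path(Path("cache"), "kegg", "kegg116", "kegg116_hmms.zip")
```

Under this layout a new KEGG release gets its own directory, so a consumer pinned to an older `version` would keep reading the files it already downloaded.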
-### Auto-publishing to Zenodo from GitHub
+Both GitHub Releases and Zenodo are just URLs in the manifest, so consumers don't care —
+mix them per file:
+
+- **GitHub Releases** — simplest, free, language-agnostic, up to **~2 GB per file**. The
+  default home for the manifest and most assets.
+- **Zenodo** — adds a citable **DOI**, long-term archival, and handles files **larger than
+  2 GB** (up to 50 GB/record). Use it for individual large HMM libraries or anything you want
+  citable; point just that file's `url` at the Zenodo record.
+
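The per-file choice between the two hosts can be written as a small rule. This is only a sketch of the guidance above: the `Asset` type and `choose_host` function are hypothetical, and reading "~2 GB" as 2 GiB and "50 GB" as 50 x 10^9 bytes are assumptions.

```python
from dataclasses import dataclass

GITHUB_MAX_BYTES = 2 * 1024**3  # "~2 GB per file", assumed here to mean 2 GiB
ZENODO_MAX_BYTES = 50 * 10**9   # "up to 50 GB/record", assumed decimal GB


@dataclass(frozen=True)
class Asset:
    name: str
    size_bytes: int
    needs_doi: bool = False


def choose_host(asset: Asset) -> str:
    """Return "github" or "zenodo" for one manifest file."""
    if asset.size_bytes > ZENODO_MAX_BYTES:
        raise ValueError(f"{asset.name} exceeds the assumed Zenodo record limit")
    # Zenodo only when a DOI is wanted or the file is too large for a release asset.
    if asset.needs_doi or asset.size_bytes >= GITHUB_MAX_BYTES:
        return "zenodo"
    return "github"
```

Since the manifest only stores a `url` per file, switching one asset from one host to the other would change that single entry and nothing else.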
+### Auto-publishing to Zenodo from GitHub (only if you need DOIs / >2 GB files)

:::{important}
The **native GitHub↔Zenodo integration** (flip a switch, publish a Release → DOI) archives
@@ -103,10 +117,11 @@ Release. So it only works for assets *committed into the repo*, which defeats th
multi-GB binaries. Use it for a *software* DOI, not for the data assets.
:::

-For the data assets, keep everything GitHub-driven with a small **GitHub Action** that, on
-release, uploads the assets to Zenodo via its REST API (e.g. [`zenodraft`](https://github.com/zenodraft/zenodraft)).
-You cut a normal GitHub Release with the files attached; the Action mirrors them to Zenodo and
-mints a new version DOI. Drop this in the data repo as `.github/workflows/zenodo.yml`:
+If you do want Zenodo DOIs (or need to host files >2 GB), keep it GitHub-driven with a small
+**GitHub Action** that, on release, uploads the assets to Zenodo via its REST API (e.g.
+[`zenodraft`](https://github.com/zenodraft/zenodraft)). You cut a normal GitHub Release with
+the files attached; the Action mirrors them to Zenodo and mints a new version DOI. Drop this
+into whichever repo hosts the asset releases as `.github/workflows/zenodo.yml`:
```yaml
name: Mirror release assets to Zenodo
@@ -136,5 +151,5 @@ ever interact with GitHub Releases; Zenodo archiving + DOIs happen automatically
| Asset | Home | Notes |
| --- | --- | --- |
| **Software binaries** (BLAST / DIAMOND / HMMER) | **bioconda** preferred; or release ZIPs via the resolver | DIAMOND is **GPL-3.0** — ship its license text in the ZIP; keep it as a separate asset, never bundled into the MIT wheel. |
-| **KEGG HMMs / tables** | **Zenodo** (DOI, >2 GB, archival) | ⚠️ Derived from the subscription-licensed KEGG dump — **confirm redistribution rights with KEGG before publishing publicly**. If not permitted, keep access-gated and have users build from their own dump (the resolver supports a local dir). |
+| **KEGG HMMs / tables** | GitHub release (dedicated `kegg-*` tag); Zenodo for libraries >2 GB | Derived from the KEGG dump and **redistributed with permission from KEGG**. Note the provenance in the release notes / manifest `license`. |
| **Template models** (Human-GEM, yeast-GEM) | **Don't re-host** | Fetch from their canonical repos by pinned release tag — respects their licenses and avoids stale copies. |