doc(downloading): document direct S3 access via Quilt and assets.yaml
Expand the "Using S3 Directly" section to explain the public dandiarchive
S3 bucket layout (blobs/, zarr/, dandisets/), how to browse it through
the Quilt web UI, and how to locate a specific asset by reading the
contentUrl entries in per-Dandiset assets.yaml / assets.jsonld manifests.
Each asset lists both an API download URL (for embargoed data) and a
direct anonymous S3 URL. Includes a worked example and notes that
DataLad already encodes this Dandiset-to-S3 mapping inside git-annex.
Also drop two stale TODO placeholders in the WebDAV and S3 sections.
Co-Authored-By: Claude Code 2.1.152 / Claude Opus 4.7 <noreply@anthropic.com>
Each Dandiset is represented as a separate DataLad dataset.
91
91
<https://github.com/dandi/dandisets/> is a [DataLad superdataset](https://handbook.datalad.org/en/latest/glossary.html#term-DataLad-superdataset) that includes all individual Dandiset datasets as subdatasets ([git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)).
92
-
Where present, individual [Zarr](https://zarr.dev/) files are included as subdatasets ([git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)) hosted in the GitHub organization <https://github.com/dandizarrs/>.
92
+
Where present, individual [Zarr](https://zarr.dev/) assets are included as subdatasets ([git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)) hosted in the GitHub organization <https://github.com/dandizarrs/>.
93
93
94
94
The Git revision histories of each dataset reflect the Dandiset's draft state as of each execution of the mirroring job.
95
95
Published Dandiset versions are tagged with Git tags.
@@ -113,7 +113,8 @@ Learn more about DataLad from its handbook at <https://handbook.datalad.org/>.
113
113
## Using WebDAV
114
114
115
115
DANDI provides a [WebDAV](https://en.wikipedia.org/wiki/WebDAV) service at https://webdav.dandiarchive.org/ for accessing the data in the DANDI archive.
116
-
You can use any WebDAV client or even a web browser to access the data - any dandiset, any version, any file or collection of files.
116
+
You can use any WebDAV client or even a web browser to access the data - any Dandiset, any version, any file or collection of files, and navigate inside Zarr assets (including their versions).
117
+
117
118
You can use any web download tool to download the data from the DANDI archive, e.g.
118
119
119
120
````commandline
@@ -126,3 +127,67 @@ for a download of a specific release `0.210831.2033` of the `000027` dandiset.
126
127
You might need to configure your WebDAV client to follow redirects; e.g., for the [davfs2](https://savannah.nongnu.org/projects/davfs2) WebDAV client, set `follow_redirect` to `1` in `/etc/davfs2/davfs2.conf`.
127
128
128
129
**Developers' note:** The WebDAV service's code is available at https://github.com/dandi/dandidav/ and can also be used for independent DANDI deployments.
130
+
131
+
132
+
## Using S3 Directly
133
+
134
+
DANDI stores all asset content in the public AWS S3 bucket [`dandiarchive`](https://dandiarchive.s3.amazonaws.com/).
135
+
Any S3 client (e.g. `aws s3`, `rclone`) works, and individual objects are also reachable over plain HTTPS using URLs of the form `https://dandiarchive.s3.amazonaws.com/<key>` — no AWS account or authentication is required for non-embargoed data.
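As a minimal sketch of the plain-HTTPS access described above, the following Python snippet builds an object URL from a key and reads only the first bytes of the object. The helper names `object_url` and `read_first_bytes` are hypothetical (not part of any DANDI tooling), and the snippet assumes the object is public and that the bucket honors HTTP `Range` requests, as S3 normally does.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base URL of the public bucket, as given in the text above.
BUCKET_URL = "https://dandiarchive.s3.amazonaws.com/"


def object_url(key: str) -> str:
    """Build the anonymous HTTPS URL for an object key (hypothetical helper)."""
    # Percent-encode unusual characters but keep "/" as the key separator.
    return BUCKET_URL + quote(key.lstrip("/"), safe="/")


def read_first_bytes(key: str, count: int = 1024) -> bytes:
    """Fetch only the first `count` bytes of a public object.

    Assumes the object is not embargoed; no credentials are sent.
    """
    request = Request(object_url(key), headers={"Range": f"bytes=0-{count - 1}"})
    with urlopen(request, timeout=30) as response:
        return response.read()
```

For example, `object_url("dandisets/000003/0.260218.2052/assets.yaml")` gives the URL of one manifest file, and `read_first_bytes(...)` with the same key would download just its beginning.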
136
+
137
+
The bucket has three top-level folders relevant to data access:
138
+
139
+
- **`blobs/`** — content of individual (non-Zarr) assets, named by a hash-derived key (e.g. `blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9`).
140
+
- **`zarr/`** — content of [Zarr](https://zarr.dev/) assets, each laid out as the original Zarr directory tree under `zarr/<zarr-id>/`.
141
+
- **`dandisets/`** — per-Dandiset manifests that map each Dandiset version to the specific entries it contains under `blobs/` and `zarr/`.
142
+
143
+
Because `blobs/` and `zarr/` are keyed by content/identifier rather than by Dandiset path, you locate files by starting from `dandisets/`.
144
+
145
+
### Browsing the bucket via Quilt
146
+
147
+
[Quilt](https://open.quiltdata.com/b/dandiarchive/tree/README.md) offers a web UI for browsing the bucket — convenient for exploration without installing an S3 client.
148
+
149
+
<img
150
+
src="../../../img/web_quiltdata-topdir.jpg"
151
+
alt="Quilt browser showing top-level folders in the dandiarchive bucket"
### Locating files for a specific Dandiset version
155
+
156
+
Each Dandiset version lives at `dandisets/<dandiset-id>/<version>/`, where `<version>` is either a published version (e.g. `0.260218.2052`) or `draft` for the current draft state. For example, see <https://open.quiltdata.com/b/dandiarchive/tree/dandisets/000003/0.260218.2052/>.
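The path layout above can be sketched in a few lines of Python. The function names below are hypothetical helpers for illustration only; the URL prefixes are the ones already used in this section, and the optional `filename` argument stands for any file stored inside the version folder.

```python
S3_BASE = "https://dandiarchive.s3.amazonaws.com/"
QUILT_BASE = "https://open.quiltdata.com/b/dandiarchive/tree/"


def version_prefix(dandiset_id: str, version: str = "draft") -> str:
    """Return the bucket key prefix for one Dandiset version."""
    # Tolerate "draft/" as well as "draft" so the result never contains "//".
    return f"dandisets/{dandiset_id}/{version.strip('/')}/"


def s3_url(dandiset_id: str, version: str = "draft", filename: str = "") -> str:
    """Direct HTTPS URL of the version folder, or of one file inside it."""
    return S3_BASE + version_prefix(dandiset_id, version) + filename


def quilt_url(dandiset_id: str, version: str = "draft", filename: str = "") -> str:
    """URL of the same location in the Quilt web UI."""
    return QUILT_BASE + version_prefix(dandiset_id, version) + filename
```

Calling `quilt_url("000003", "0.260218.2052")` reproduces the example link above, and omitting the version falls back to the draft folder.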
157
+
158
+
<img
159
+
src="../../../img/web_quiltdata-dandiset-1.jpg"
160
+
alt="Files inside a published Dandiset version on Quilt"
- **`assets.yaml`** / **`assets.jsonld`** — every asset in this Dandiset version, with its original `path` and one or more `contentUrl` entries (e.g. <https://open.quiltdata.com/b/dandiarchive/tree/dandisets/000003/0.260218.2052/assets.yaml>).
167
+
168
+
Each asset's `contentUrl` lists two URLs:
169
+
170
+
1. A DANDI API download URL (`https://api.dandiarchive.org/api/assets/<id>/download/`) — use this for **embargoed** Dandisets where authenticated access is required.
171
+
2. A direct S3 URL (`https://dandiarchive.s3.amazonaws.com/blobs/...` or `.../zarr/<zarr-id>/`) — anonymous and fastest for public data.
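Choosing between the two `contentUrl` entries could look roughly like the sketch below. It assumes an asset record has already been loaded from a manifest into a Python dictionary with `path` and `contentUrl` keys; the sample record, its all-zero identifier, and the helper names are made up for illustration and do not refer to a real asset.

```python
from urllib.parse import urlparse

API_HOST = "api.dandiarchive.org"
S3_HOST = "dandiarchive.s3.amazonaws.com"


def split_content_urls(content_urls):
    """Return (api_url, s3_url); either is None if the manifest lacks it."""
    api_url = s3_url = None
    for url in content_urls:
        host = urlparse(url).netloc
        if host == API_HOST:
            api_url = url
        elif host == S3_HOST:
            s3_url = url
    return api_url, s3_url


def download_url(asset, embargoed=False):
    """Prefer the direct S3 URL for public data, the API URL when embargoed."""
    api_url, s3_url = split_content_urls(asset.get("contentUrl", []))
    if embargoed:
        return api_url  # authenticated access goes through the DANDI API
    return s3_url or api_url


# Hypothetical record shaped like one manifest entry (placeholder identifier).
sample_asset = {
    "path": "sub-01/sub-01_example.nwb",
    "contentUrl": [
        "https://api.dandiarchive.org/api/assets/00000000-0000-0000-0000-000000000000/download/",
        "https://dandiarchive.s3.amazonaws.com/blobs/000/000/00000000-0000-0000-0000-000000000000",
    ],
}
```

Matching on the host name rather than on list position keeps the sketch independent of the order in which the two URLs happen to be listed.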
**Note:** [DataLad](#using-datalad) already encodes this Dandiset-version-to-S3-location mapping inside git-annex, so `datalad get` resolves files transparently without you having to read `assets.yaml` yourself.