
Commit 79f2da1

yarikoptic and claude committed
doc(downloading): document direct S3 access via Quilt and assets.yaml
Expand the "Using S3 Directly" section to explain the public dandiarchive S3 bucket layout (blobs/, zarr/, dandisets/), how to browse it through the Quilt web UI, and how to locate a specific asset by reading the contentUrl entries in per-Dandiset assets.yaml / assets.jsonld manifests.

Each asset lists both an API download URL (for embargoed data) and a direct anonymous S3 URL. Includes a worked example and notes that DataLad already encodes this Dandiset-to-S3 mapping inside git-annex.

Also drop two stale TODO placeholders in the WebDAV and S3 sections.

Co-Authored-By: Claude Code 2.1.152 / Claude Opus 4.7 <noreply@anthropic.com>
1 parent b8a3755 commit 79f2da1

4 files changed

Lines changed: 67 additions & 2 deletions


docs/img/web_quiltdata-topdir.jpg

74.3 KB

docs/user-guide-using/accessing-data/downloading.md

Lines changed: 67 additions & 2 deletions
@@ -89,7 +89,7 @@ style="width: 60%; height: auto; display: block; margin-left: auto; margin-righ
8989

9090
Each Dandiset is represented as a separate DataLad dataset.
9191
<https://github.com/dandi/dandisets/> is a [DataLad superdataset](https://handbook.datalad.org/en/latest/glossary.html#term-DataLad-superdataset) that includes all individual Dandiset datasets as subdatasets ([git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)).
92-
Where present, individual [Zarr](https://zarr.dev/) files are included as subdatasets ([git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)) hosted in the GitHub organization <https://github.com/dandizarrs/>.
92+
Where present, individual [Zarr](https://zarr.dev/) assets are included as subdatasets ([git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)) hosted in the GitHub organization <https://github.com/dandizarrs/>.
9393

9494
The Git revision histories of each dataset reflect the Dandiset's draft state as of each execution of the mirroring job.
9595
Published Dandiset versions are tagged with Git tags.
@@ -113,7 +113,8 @@ Learn more about DataLad from its handbook at <https://handbook.datalad.org/>.
113113
## Using WebDAV
114114

115115
DANDI provides a [WebDAV](https://en.wikipedia.org/wiki/WebDAV) service at https://webdav.dandiarchive.org/ for accessing the data in the DANDI archive.
116-
You can use any WebDAV client or even a web browser to access the data - any dandiset, any version, any file or collection of files.
116+
You can use any WebDAV client or even a web browser to access the data - any Dandiset, any version, any file or collection of files, and navigate inside Zarr assets (including their versions).
117+
117118
You can use any web download tool to download the data from the DANDI archive, e.g.
118119

119120
````commandline
@@ -126,3 +127,67 @@ for a download of a specific release `0.210831.2033` of the `000027` dandiset.
126127
You might need to configure your WebDAV client to follow redirects; e.g., for the [davfs2](https://savannah.nongnu.org/projects/davfs2) WebDAV client, set `follow_redirect` to `1` in `/etc/davfs2/davfs2.conf`.
127128

128129
**Developers' note:** The WebDAV service's code is available at https://github.com/dandi/dandidav/ and can also be used for independent DANDI deployments.
130+
131+
132+
## Using S3 Directly
133+
134+
DANDI stores all asset content in the public AWS S3 bucket [`dandiarchive`](https://dandiarchive.s3.amazonaws.com/).
135+
Any S3 client (e.g. `aws s3`, `rclone`) works, and individual objects are also reachable over plain HTTPS using URLs of the form `https://dandiarchive.s3.amazonaws.com/<key>` — no AWS account or authentication is required for non-embargoed data.
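
As an illustration of the URL form above, the following minimal Python sketch builds the anonymous HTTPS URL for an object key using only the standard library. The helper name `s3_https_url` is hypothetical (it is not part of any DANDI tool), and the key shown is the example blob key used elsewhere in this section.

```python
from urllib.parse import quote

# Public bucket endpoint, as given in the text above.
BUCKET_URL = "https://dandiarchive.s3.amazonaws.com"


def s3_https_url(key: str) -> str:
    """Build the anonymous HTTPS URL for an object key (hypothetical helper)."""
    # Percent-encode unsafe characters but keep "/" as the key separator.
    return f"{BUCKET_URL}/{quote(key.lstrip('/'), safe='/')}"


url = s3_https_url("blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9")
print(url)
```

The resulting URL can be passed to any HTTP client; no credentials are involved for non-embargoed data.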
136+
137+
The bucket has three top-level folders relevant to data access:
138+
139+
- **`blobs/`** — content of individual (non-Zarr) assets, named by a hash-derived key (e.g. `blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9`).
140+
- **`zarr/`** — content of [Zarr](https://zarr.dev/) assets, each laid out as the original Zarr directory tree under `zarr/<zarr-id>/`.
141+
- **`dandisets/`** — per-Dandiset manifests that map each Dandiset version to the specific entries it contains under `blobs/` and `zarr/`.
142+
143+
Because `blobs/` and `zarr/` are keyed by content/identifier rather than by Dandiset path, you locate files by starting from `dandisets/`.
144+
145+
### Browsing the bucket via Quilt
146+
147+
[Quilt](https://open.quiltdata.com/b/dandiarchive/tree/README.md) offers a web UI for browsing the bucket — convenient for exploration without installing an S3 client.
148+
149+
<img
150+
src="../../../img/web_quiltdata-topdir.jpg"
151+
alt="Quilt browser showing top-level folders in the dandiarchive bucket"
152+
style="width: 60%; height: auto; display: block; margin-left: auto; margin-right: auto;"/>
153+
154+
### Locating files for a specific Dandiset version
155+
156+
Each Dandiset version lives at `dandisets/<dandiset-id>/<version>/`, where `<version>` is either a published version (e.g. `0.260218.2052`) or `draft` for the current draft state. For example, see <https://open.quiltdata.com/b/dandiarchive/tree/dandisets/000003/0.260218.2052/>.
157+
158+
<img
159+
src="../../../img/web_quiltdata-dandiset-1.jpg"
160+
alt="Files inside a published Dandiset version on Quilt"
161+
style="width: 60%; height: auto; display: block; margin-left: auto; margin-right: auto;"/>
162+
163+
Inside this folder you will find:
164+
165+
- **`dandiset.yaml`** / **`dandiset.jsonld`** — Dandiset-level metadata.
166+
- **`assets.yaml`** / **`assets.jsonld`** — every asset in this Dandiset version, with its original `path` and one or more `contentUrl` entries (e.g. <https://open.quiltdata.com/b/dandiarchive/tree/dandisets/000003/0.260218.2052/assets.yaml>).
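
The manifest locations described above can be sketched as a small Python helper. This is only an illustration: the function name `manifest_url` is hypothetical, and it assumes the manifests are served over plain HTTPS from the bucket in the same way as other public objects.

```python
BUCKET_URL = "https://dandiarchive.s3.amazonaws.com"

# Manifest file names listed in this section.
MANIFEST_NAMES = {"dandiset.yaml", "dandiset.jsonld", "assets.yaml", "assets.jsonld"}


def manifest_url(dandiset_id: str, version: str, name: str = "assets.yaml") -> str:
    """Return the HTTPS URL of one manifest file (hypothetical helper).

    `version` is a published version string or "draft".
    """
    if name not in MANIFEST_NAMES:
        raise ValueError(f"unexpected manifest name: {name}")
    return f"{BUCKET_URL}/dandisets/{dandiset_id}/{version}/{name}"


print(manifest_url("000003", "0.260218.2052"))
print(manifest_url("000003", "draft", "dandiset.yaml"))
```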
167+
168+
Each asset's `contentUrl` typically lists two URLs:
169+
170+
1. A DANDI API download URL (`https://api.dandiarchive.org/api/assets/<id>/download/`) — use this for **embargoed** Dandisets where authenticated access is required.
171+
2. A direct S3 URL (`https://dandiarchive.s3.amazonaws.com/blobs/...` or `.../zarr/<zarr-id>/`) — anonymous and fastest for public data.
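
One way to choose between the two kinds of URL is sketched below in Python. The function names are hypothetical, and the sketch assumes the URLs can be told apart by host name (`api.dandiarchive.org` versus `dandiarchive.s3.amazonaws.com`), as in the URL patterns listed above.

```python
from urllib.parse import urlparse

API_HOST = "api.dandiarchive.org"
S3_HOST = "dandiarchive.s3.amazonaws.com"


def split_content_urls(content_urls):
    """Return (api_url, s3_url); either is None if not present."""
    api_url = s3_url = None
    for url in content_urls:
        host = urlparse(url).netloc
        if host == S3_HOST and s3_url is None:
            s3_url = url
        elif host == API_HOST and api_url is None:
            api_url = url
    return api_url, s3_url


def choose_url(content_urls, embargoed=False):
    """Prefer the direct S3 URL for public data, the API URL for embargoed data."""
    api_url, s3_url = split_content_urls(content_urls)
    if embargoed:
        return api_url
    return s3_url or api_url


urls = [
    "https://api.dandiarchive.org/api/assets/5e9e92e1-f044-4aa0-ab47-1cfcb8899348/download/",
    "https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9",
]
print(choose_url(urls))
print(choose_url(urls, embargoed=True))
```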
172+
173+
<img
174+
src="../../../img/web_quiltdata-dandiset-contentUrl.jpg"
175+
alt="contentUrl entries in assets.yaml"
176+
style="width: 60%; height: auto; display: block; margin-left: auto; margin-right: auto;"/>
177+
178+
For example, an entry might read:
179+
180+
````yaml
181+
path: sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140327_behavior+ecephys.nwb
182+
contentUrl:
183+
- https://api.dandiarchive.org/api/assets/5e9e92e1-f044-4aa0-ab47-1cfcb8899348/download/
184+
- https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9
185+
````
186+
187+
The S3 URL can then be fetched with any tool — for example:
188+
189+
````commandline
190+
wget https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9 -O sub-YutaMouse20_..._behavior+ecephys.nwb
191+
````
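
For more than a handful of files, the same lookup can be scripted. The Python sketch below assumes the manifest has already been parsed (with a YAML or JSON parser of your choice) into a list of records that each carry `path` and `contentUrl`; that record shape and the helper name `s3_urls_by_path` are assumptions for illustration. The sample record is the entry shown above.

```python
import posixpath

S3_PREFIX = "https://dandiarchive.s3.amazonaws.com/"


def s3_urls_by_path(assets):
    """Map each asset's Dandiset path to its direct S3 URL (hypothetical helper)."""
    index = {}
    for asset in assets:
        for url in asset.get("contentUrl", []):
            if url.startswith(S3_PREFIX):
                index[asset["path"]] = url
                break  # keep the first S3 URL only
    return index


# Already-parsed manifest records; only the fields used here are shown.
assets = [
    {
        "path": "sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140327_behavior+ecephys.nwb",
        "contentUrl": [
            "https://api.dandiarchive.org/api/assets/5e9e92e1-f044-4aa0-ab47-1cfcb8899348/download/",
            "https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9",
        ],
    }
]

index = s3_urls_by_path(assets)
for path, url in index.items():
    # The blob key carries no file name, so take the output name from `path`.
    print(posixpath.basename(path), url)
```

Taking the output file name from `path` restores the original name, which the blob key itself does not contain.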
192+
193+
**Note:** [DataLad](#using-datalad) already encodes this Dandiset-version-to-S3-location mapping inside git-annex, so `datalad get` resolves files transparently without you having to read `assets.yaml` yourself.
