dandi/access-summaries

Dandiset access summaries

Summaries of access stats (full downloads & streaming) for each Dandiset on the DANDI archive.

This is the underlying data for the DANDI usage dashboard.

Getting the data

A gzip-compressed tar archive (content.tar.gz) of the content/ directory is published daily to the dist branch by a scheduled GitHub Actions workflow.

Using curl

curl -fsSL https://raw.githubusercontent.com/dandi/access-summaries/dist/content.tar.gz | tar -xz

Using Python standard library

import io
import tarfile
import urllib.request

url = "https://raw.githubusercontent.com/dandi/access-summaries/dist/content.tar.gz"
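with urllib.request.urlopen(url) as response:
    with tarfile.open(fileobj=io.BytesIO(response.read()), mode="r:gz") as tar:
        # The "data" filter rejects unsafe members (e.g. absolute paths); the
        # filter argument needs Python 3.12+ or a security release that backports it.
        tar.extractall(filter="data")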

Using git to stay up to date

If you want to keep a local copy current and fetch only the latest changes, clone the repository and pull regularly to get the newest content/:

git clone https://github.com/dandi/access-summaries.git
cd access-summaries

# Keep up-to-date over time with:
git pull

Data layout

After extracting the archive, the content/ directory has the following structure:

content/
├── archive_totals.json                 # Archive-wide totals
├── totals.json                         # Per-dandiset totals
├── region_codes_to_coordinates.yaml    # Lat/lon for each region code
└── summaries/
    └── {dandiset_id}/
        ├── by_day.tsv                  # bytes_sent per day
        ├── by_region.tsv               # bytes_sent per region
        ├── by_asset.tsv                # bytes_sent per asset (all-time)
        ├── by_asset_per_week.tsv       # bytes_sent per asset per week
        └── by_asset_type_per_week.tsv  # bytes_sent per asset type per week
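
To check which summary tables are available locally, the layout above can be walked with the standard library. This is a minimal sketch, not part of the repository: the helper name find_summary_tables is hypothetical, and it assumes only the directory and file names shown in the tree (it does not read the files or rely on their columns).

```python
from pathlib import Path

# File names taken from the directory tree in this README.
EXPECTED_TABLES = (
    "by_day.tsv",
    "by_region.tsv",
    "by_asset.tsv",
    "by_asset_per_week.tsv",
    "by_asset_type_per_week.tsv",
)


def find_summary_tables(content_dir="content"):
    """Map each dandiset ID to the expected TSV files present in its folder.

    Hypothetical helper: returns an empty dict if content/summaries is missing.
    """
    summaries_dir = Path(content_dir) / "summaries"
    if not summaries_dir.is_dir():
        return {}
    found = {}
    for dandiset_dir in sorted(p for p in summaries_dir.iterdir() if p.is_dir()):
        found[dandiset_dir.name] = [
            name for name in EXPECTED_TABLES if (dandiset_dir / name).is_file()
        ]
    return found


# Prints nothing if the archive has not been extracted into ./content yet.
for dandiset_id, tables in find_summary_tables().items():
    print(dandiset_id, ", ".join(tables))
```

A dandiset whose list is shorter than EXPECTED_TABLES is simply missing some tables in the local copy; the sketch does not attempt to explain why.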

Usage examples

The example below assumes you have already extracted the archive into the current directory (so that content/ exists locally). It uses matplotlib, which you can install with:

pip install matplotlib

Plot bytes sent over time for a single dandiset

import csv
from datetime import datetime
import matplotlib.pyplot as plt

dandiset_id = "000003"
dates, bytes_sent = [], []
with open(f"content/summaries/{dandiset_id}/by_day.tsv", newline="") as f:
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for row in sorted(csv.DictReader(f, delimiter="\t"), key=lambda r: r["date"]):
        dates.append(datetime.strptime(row["date"], "%Y-%m-%d"))
        bytes_sent.append(int(row["bytes_sent"]) / 1e9)  # bytes -> GB (decimal)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(dates, bytes_sent)
ax.set_xlabel("Date")
ax.set_ylabel("Data transferred (GB)")
ax.set_title(f"Daily data transfer for dandiset {dandiset_id}")
plt.tight_layout()
plt.show()
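
Aggregate daily transfer into monthly totals

Daily values can be noisy, so it may help to sum them per calendar month before plotting or reporting. The sketch below uses only the standard library and assumes the same two by_day.tsv columns as the plotting example (date in YYYY-MM-DD form and bytes_sent as an integer); the function name monthly_totals is hypothetical.

```python
import csv
import os
from collections import defaultdict


def monthly_totals(tsv_path):
    """Sum bytes_sent per calendar month ("YYYY-MM") from a by_day.tsv file."""
    totals = defaultdict(int)
    with open(tsv_path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            month = row["date"][:7]  # e.g. "2024-03-15" -> "2024-03"
            totals[month] += int(row["bytes_sent"])
    # Sort by month so the result is in chronological order.
    return dict(sorted(totals.items()))


# Runs only if the archive has been extracted into ./content.
path = "content/summaries/000003/by_day.tsv"
if os.path.exists(path):
    for month, total in monthly_totals(path).items():
        print(f"{month}: {total / 1e9:.2f} GB")
```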
