Package downloads, dependencies, and repository mappings for the Rust ecosystem.
DB dump: static.crates.io/db-dump.tar.gz -- crate/version name mappings and dependency graph. No authentication required.
Download archives: static.crates.io/archive/version-downloads/ -- daily per-version download counts, aggregated into monthly files. No authentication required. Supports parallel byte-range requests.
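As a rough illustration of how a file can be split for parallel byte-range requests, the sketch below computes inclusive ranges and the matching HTTP `Range` header. The function names `split_byte_ranges` and `range_header` are hypothetical and are not taken from the fetch scripts; the total size would normally come from a `Content-Length` response header.

```python
def split_byte_ranges(total_size: int, chunks: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into contiguous, inclusive (start, end) ranges.

    Hypothetical helper: the real fetch scripts may chunk differently.
    """
    if total_size <= 0 or chunks <= 0:
        raise ValueError("total_size and chunks must be positive")
    chunks = min(chunks, total_size)  # never create empty ranges
    base, extra = divmod(total_size, chunks)
    ranges = []
    start = 0
    for i in range(chunks):
        # The first `extra` chunks take one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the HTTP Range header for one inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    for start, end in split_byte_ranges(10, 3):
        print(range_header(start, end))
```

Each range can then be fetched independently and the parts written back at their offsets, which is presumably what an option like `--chunks 16` controls.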
In data/sources/crates/:
- db-dump/crates.csv -- crate ID, name, repository URL
- db-dump/versions.csv -- version ID, crate ID
- db-dump/default_versions.csv -- current (non-yanked) version per crate
- db-dump/dependencies.csv -- version-level dependency edges
- version-downloads/YYYY-MM.csv -- monthly per-version download totals
| Script | Purpose |
|---|---|
| src/sources/crates/fetch_db_dump.py | Download + extract DB dump (skips if done) |
| src/sources/crates/fetch_version_downloads.py | Download monthly archives (skips complete months) |
| src/sources/crates/process_data.py | Build outputs (~20s) |
uv run src/sources/crates/fetch_db_dump.py [--chunks 16]
uv run src/sources/crates/fetch_version_downloads.py --years 2021 2022 2023 2024 2025 [--concurrency 128]
uv run python -m src.sources.crates.process_data [--min-avg N] [--alpha F]

- Load mappings from db-dump (crates, versions, default_versions, dependencies)
- Aggregate downloads -- monthly version-downloads into per-crate annual totals
- top-packages.csv -- crates covering 95% of ecosystem downloads
- dependency-tree.csv -- transitive deps of the top crates, followed only through each crate's default (non-yanked) version
- github-repos.csv -- parse repo URLs from crates.io metadata
- results.csv -- download-weighted PageRank, value classes A/B/C/D
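Two of the steps above, the 95% coverage cut and the transitive dependency walk, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `process_data.py`: the function names, the in-memory dict inputs, and the tie-breaking by crate name are all hypothetical.

```python
from collections import deque


def top_by_coverage(downloads: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Return the most-downloaded crates whose combined share reaches `coverage`.

    Assumption: crates are ranked by total downloads, ties broken by name.
    """
    total = sum(downloads.values())
    if total <= 0:
        return []
    selected = []
    running = 0.0
    for name, count in sorted(downloads.items(), key=lambda kv: (-kv[1], kv[0])):
        if running >= coverage * total:
            break  # the target share is already covered
        selected.append(name)
        running += count
    return selected


def transitive_edges(roots: list[str], deps: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Breadth-first walk from `roots`, collecting (crate, dependency) edges.

    Assumption: `deps` already holds only the dependencies of each crate's
    default version, so yanked versions never enter the walk.
    """
    seen = set(roots)
    queue = deque(roots)
    edges = set()
    while queue:
        crate = queue.popleft()
        for dep in deps.get(crate, ()):
            edges.add((crate, dep))
            if dep not in seen:  # guards against cycles and repeat visits
                seen.add(dep)
                queue.append(dep)
    return edges


if __name__ == "__main__":
    top = top_by_coverage({"a": 60, "b": 30, "c": 6, "d": 4})
    print(top)
    print(sorted(transitive_edges(top, {"a": ["b"], "b": ["c"], "c": ["e"]})))
```

Tracking visited crates in `seen` keeps the walk linear in the number of edges even when the dependency graph contains cycles.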
In data/sources/crates/:
| File | Rows | Description |
|---|---|---|
| top-packages.csv | ~3.7K | Crates covering 95% of downloads (+ avg_downloads_share) |
| dependency-tree.csv | ~48K edges | Transitive deps from top crates |
| github-repos.csv | ~6.0K | Package-to-GitHub-repo mappings |
| results.csv | ~6.2K | All dep-tree crates with pagerank + value_class |
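For readers unfamiliar with the `pagerank` column, here is one plausible reading of "download-weighted PageRank": a power iteration in which the teleport distribution is proportional to downloads and rank flows from a crate to its dependencies. Everything here is an assumption made for illustration, including the function name, the `damping` default, and the handling of crates with no dependencies. The actual formula, the meaning of `--alpha`, and the A/B/C/D class thresholds are defined in `process_data.py` and are not reproduced here.

```python
def weighted_pagerank(
    edges: list[tuple[str, str]],
    weights: dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-12,
) -> dict[str, float]:
    """Power-iteration PageRank with a download-proportional teleport vector.

    `edges` are (dependent, dependency) pairs; `weights` maps crate -> downloads.
    Hypothetical sketch, not the project's implementation.
    """
    nodes = sorted(set(weights) | {n for edge in edges for n in edge})
    if not nodes:
        return {}
    total = sum(weights.get(n, 0.0) for n in nodes)
    if total > 0:
        teleport = {n: weights.get(n, 0.0) / total for n in nodes}
    else:
        teleport = {n: 1.0 / len(nodes) for n in nodes}  # fall back to uniform

    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = dict(teleport)
    for _ in range(iterations):
        # Rank held by crates without dependencies is redistributed via teleport.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping + damping * dangling) * teleport[n] for n in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    ranks = weighted_pagerank([("app", "lib")], {"app": 1.0, "lib": 1.0})
    print(ranks)
```

With equal downloads, a crate that others depend on ends up ranked above its dependents, which is the intuition behind scoring dependencies this way.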