The npm slice of the Value pipeline: how npm download and
dependency data becomes a download-weighted PageRank and an A/B/C/D value class
for every JavaScript/TypeScript package. This page covers the pipeline
assembly; for raw-fetch mechanics (endpoints, rate limits, fetch scripts) see
the source reference sources/npm.md.
| Source | Data collected | Raw file (data/sources/npm/) |
|---|---|---|
| npm downloads API | per-package annual downloads (2021–2025) | raw/downloads.csv |
| npm downloads API | ecosystem-wide annual totals (the 95% denominator) | raw/npm-stats.csv |
| npm registry | declared runtime dependencies (package → dep) | raw/dependencies.csv |
| nice-registry | package → repo_url mapping (~2M packages) | nice-registry/packages.csv |
No authentication required; the downloads API is rate-limited to ~5 req/s.
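A minimal sketch of client-side pacing for the ~5 req/s limit. The `Throttle` class and its parameters are hypothetical names used for illustration, not part of the fetch scripts documented in sources/npm.md.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` calls start per second.

    Hypothetical helper: the real fetch scripts may pace requests differently.
    """

    def __init__(self, rate=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            # Too early: sleep until the next slot opens.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Calling `throttle.wait()` before each downloads-API request keeps consecutive requests at least 0.2 s apart at the default rate.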
npm data flows through the shared Value mechanics (full description in
value.md):
- Top packages: sort by avg annual downloads, keep packages covering 95% of the ecosystem-wide total (from npm-stats.csv).
- Dependency tree: follow transitive runtime deps from the top set, fetching any missing deps from the registry.
- package → repo: match every dep-tree package against nice-registry.
- PageRank: download-weighted personalized PageRank (α = 0.85) over the directed dep graph (A → B means A depends on B).
- Value class: sort by PageRank desc; cumulative-share cutoffs assign A (≤50%) / B (≤75%) / C (≤90%) / D (rest).
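The PageRank and value-class steps can be sketched in plain Python. This is an illustration under stated assumptions, not the code in src.value.npm_pipeline: the personalization vector is taken to be proportional to average downloads, rank held by packages without dependencies is redistributed along that same vector, and a package's class is decided by the cumulative share including the package itself. The function names and the toy graph are hypothetical.

```python
def personalized_pagerank(edges, weights, alpha=0.85, tol=1e-10, max_iter=200):
    """Power iteration; edges are (A, B) pairs meaning "A depends on B"."""
    nodes = sorted(set(weights) | {n for edge in edges for n in edge})
    total = sum(weights.get(n, 0.0) for n in nodes)
    # Assumption: teleport probability is proportional to downloads.
    p = {n: weights.get(n, 0.0) / total for n in nodes}
    out = {n: [] for n in nodes}
    for a, b in edges:
        out[a].append(b)

    rank = dict(p)
    for _ in range(max_iter):
        # Assumption: rank on nodes with no deps is respread along p.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - alpha) * p[n] + alpha * dangling * p[n] for n in nodes}
        for a in nodes:
            if out[a]:
                share = alpha * rank[a] / len(out[a])
                for b in out[a]:
                    new[b] += share
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def value_classes(rank, cutoffs=(("A", 0.50), ("B", 0.75), ("C", 0.90))):
    """Sort by PageRank desc and assign classes by cumulative share."""
    total = sum(rank.values())
    classes, cum = {}, 0.0
    for pkg, score in sorted(rank.items(), key=lambda kv: kv[1], reverse=True):
        cum += score / total  # assumption: share includes this package
        classes[pkg] = next((c for c, limit in cutoffs if cum <= limit), "D")
    return classes


# Toy graph with made-up download weights.
edges = [("app", "lib"), ("app", "util"), ("lib", "core"), ("util", "core")]
weights = {"app": 100.0, "lib": 50.0, "util": 50.0, "core": 10.0}
rank = personalized_pagerank(edges, weights)
classes = value_classes(rank)
```

In the toy graph, `core` has the fewest downloads but ends up with the highest rank because every other package depends on it directly or transitively, which is the effect the download-weighted graph step is meant to capture.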
Orchestrated by src.value.npm_pipeline (fetch-data → fetch-stats → fetch-repos → process).

Metric lineage (← = data source, […] = period):
JavaScript / TypeScript (npm)
├── downloads_2021..2025 ← api.npmjs.org/downloads [2021–2025]
├── avg_downloads ← derived (mean over populated years) [2021–2025]
├── avg_downloads_share ← derived (pkg / ecosystem total) [2021–2025]
├── top ← derived (95% cum-download cutoff) [2021–2025]
├── dep edges (package→dep) ← registry.npmjs.org [most recent]
├── pagerank ← derived (DL-weighted PR, α=0.85) [2021–2025]
├── value_class ← derived (A/B/C/D, cum-PR share) [2021–2025]
└── package→repo ← nice-registry [most recent]
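The three derived download metrics in the lineage above (avg_downloads, avg_downloads_share, top) can be illustrated as follows. This is a sketch with assumptions: missing years are represented as `None`, the ecosystem total is passed in as a single number, and the package that crosses the 95% line is itself kept in the top set. The function name and sample figures are hypothetical.

```python
def top_packages(downloads, ecosystem_total, coverage=0.95):
    """downloads maps package -> {year: count or None}."""
    avg = {}
    for pkg, by_year in downloads.items():
        # Mean over populated years only; empty years are skipped.
        populated = [v for v in by_year.values() if v is not None]
        if populated:
            avg[pkg] = sum(populated) / len(populated)

    rows, cum = [], 0.0
    for pkg, a in sorted(avg.items(), key=lambda kv: kv[1], reverse=True):
        share = a / ecosystem_total
        # Assumption: keep packages until the running share reaches coverage,
        # including the one that crosses the threshold.
        is_top = cum < coverage
        cum += share
        rows.append({"package": pkg, "avg_downloads": a,
                     "avg_downloads_share": share, "top": is_top})
    return rows


# Made-up numbers for illustration only.
sample = {
    "a": {2021: 400, 2022: 800},
    "b": {2021: None, 2022: 300},
    "c": {2022: 80},
    "d": {2022: 20},
}
rows = top_packages(sample, ecosystem_total=1000)
```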
- Value: each package's value_class is grouped by repo into data/value/value.csv as the class_npm column; the strongest class across ecosystems becomes class.
- Risk: A/B-class npm repos enter src.risk.run_risk_pipeline (scope set by risk_input.value_classes in src/settings.json).
- Eligibility: A/B repos that also pass the OSI-license and non-EOL gates reach data/eligibility/eligibility.csv.
results.csv (data/sources/npm/) — one row per dep-tree package:
| Column | Description |
|---|---|
| package | Package name |
| github_repo | owner/repo slug |
| avg_downloads, 2021–2025 | Downloads |
| top | True if in the 95% cumulative set |
| pagerank | Download-weighted PageRank score |
| value_class | A/B/C/D |
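As an illustration of consuming this file, the sketch below selects the repos that A/B-class packages map to. The inline sample stands in for results.csv: its rows are made up, the per-year columns are omitted, and the exact header spelling and serialization of `top` are assumptions.

```python
import csv
import io

# Made-up sample in the assumed column layout (per-year columns omitted).
SAMPLE = """package,github_repo,avg_downloads,top,pagerank,value_class
pkg-a,org/alpha,1000,True,0.40,A
pkg-b,org/beta,500,True,0.20,B
pkg-c,,50,False,0.01,D
"""


def repos_in_classes(csv_text, classes=("A", "B")):
    """Return sorted unique repo slugs for packages in the given classes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({
        row["github_repo"]
        for row in reader
        if row["value_class"] in classes and row["github_repo"]
    })


selected = repos_in_classes(SAMPLE)
```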
Carried from the cross-ecosystem tables in value.md:
| Stage | Count |
|---|---|
| Top packages (95% downloads) | 5,765 |
| After dep tree | 6,370 |
| Results | 6,370 |
| With GitHub repo | 6,281 (99%) |
| Class | A | B | C | D | Total |
|---|---|---|---|---|---|
| Packages | 331 | 748 | 1,183 | 4,108 | 6,370 |
| Repos (value.csv) | 144 | 430 | 769 | 3,087 | — |
A+B repos with a GitHub repo: 100% — npm has the cleanest upstream identity of the four ecosystems, so essentially all load-bearing npm packages reach Risk and Eligibility.
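The figures in the tables above can be cross-checked with a few lines of arithmetic. The snippet uses only the counts shown on this page; the variable names are illustrative.

```python
# Package counts per value class, copied from the table above.
packages = {"A": 331, "B": 748, "C": 1183, "D": 4108}

total = sum(packages.values())

# Share of packages in each class, in percent (one decimal place).
shares = {c: round(100 * n / total, 1) for c, n in packages.items()}

# Packages with a GitHub repo (6,281) as a percentage of all results (6,370).
repo_coverage = round(100 * 6281 / 6370)
```

The class counts sum to the 6,370 results, and class A accounts for about 5% of packages while, by construction of the cutoffs, carrying up to half of the cumulative PageRank.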