
JavaScript / TypeScript (npm)

The npm slice of the Value pipeline: how npm download and dependency data becomes a download-weighted PageRank and an A/B/C/D value class for every JavaScript/TypeScript package. This page covers the pipeline assembly; for raw-fetch mechanics (endpoints, rate limits, fetch scripts) see the source reference sources/npm.md.

Sources & data collected

| Source | Data collected | Raw file (data/sources/npm/) |
|---|---|---|
| npm downloads API | per-package annual downloads (2021–2025) | raw/downloads.csv |
| npm downloads API | ecosystem-wide annual totals (the 95% denominator) | raw/npm-stats.csv |
| npm registry | declared runtime dependencies (package → dep) | raw/dependencies.csv |
| nice-registry | package → repo_url mapping (~2M packages) | nice-registry/packages.csv |

No authentication required; the downloads API is rate-limited to ~5 req/s.
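To stay under a limit of roughly 5 req/s, a client can space its calls with a small throttle. The sketch below is illustrative only: the `Throttle` class and its parameters are hypothetical names, not part of the fetch scripts described in sources/npm.md.

```python
import time


class Throttle:
    """Spaces calls so that at most `rate_per_sec` happen per second.

    Hypothetical helper; the real fetch scripts may throttle differently.
    """

    def __init__(self, rate_per_sec=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self.clock = clock      # injectable so the throttle can be tested
        self.sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the slot after it."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Calling `throttle.wait()` before each request keeps consecutive requests at least 0.2 s apart when `rate_per_sec=5.0`.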

Value pipeline

npm data flows through the shared Value mechanics (full description in value.md):

  1. Top packages — sort by avg annual downloads, keep packages covering 95% of the ecosystem-wide total (from npm-stats.csv).
  2. Dependency tree — follow transitive runtime deps from the top set, fetching any missing deps from the registry.
  3. package → repo — match every dep-tree package against nice-registry.
  4. PageRank — download-weighted personalized PageRank (α = 0.85) over the directed dep graph (A → B means A depends on B).
  5. Value class — sort by PageRank desc; cumulative-share cutoffs assign A (≤50%) / B (≤75%) / C (≤90%) / D (rest).
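Steps 4 and 5 can be sketched in plain Python. This is a minimal illustration, not the code in src.value.npm_pipeline, and it rests on assumptions the text does not confirm: the teleport (personalization) vector is proportional to average downloads, packages with no dependencies redistribute their rank through that same vector, and a package's class is decided by the cumulative share that includes the package itself. The function names are hypothetical.

```python
def personalized_pagerank(edges, downloads, alpha=0.85, iters=100, tol=1e-12):
    """edges: (package, dep) pairs, meaning package depends on dep.
    downloads: dict mapping package -> average annual downloads."""
    edges = list(edges)
    nodes = sorted(set(downloads) | {n for edge in edges for n in edge})
    total = sum(downloads.get(n, 0.0) for n in nodes)
    # Assumption: teleport probability is proportional to downloads.
    teleport = {n: downloads.get(n, 0.0) / total for n in nodes}
    out = {n: [] for n in nodes}
    for pkg, dep in edges:
        out[pkg].append(dep)  # rank flows from the dependent to its dependency

    rank = dict(teleport)
    for _ in range(iters):
        # Assumption: rank held by packages without deps re-enters via teleport.
        dangling = sum(rank[n] for n in nodes if not out[n])
        nxt = {n: (1 - alpha + alpha * dangling) * teleport[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = alpha * rank[n] / len(out[n])
                for dep in out[n]:
                    nxt[dep] += share
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


def value_classes(rank, cutoffs=(("A", 0.50), ("B", 0.75), ("C", 0.90))):
    """Sort by score descending and classify by cumulative share of total score."""
    total = sum(rank.values())
    classes, cum = {}, 0.0
    for pkg, score in sorted(rank.items(), key=lambda kv: kv[1], reverse=True):
        cum += score / total
        classes[pkg] = next((c for c, limit in cutoffs if cum <= limit + 1e-12), "D")
    return classes


# Toy graph: app depends on lib and util, both of which depend on core.
edges = [("app", "lib"), ("app", "util"), ("lib", "core"), ("util", "core")]
downloads = {"app": 100, "lib": 50, "util": 50, "core": 10}
rank = personalized_pagerank(edges, downloads)
classes = value_classes(rank)
```

In the toy graph, `core` has the fewest direct downloads but collects rank from everything that depends on it, which is the effect the download-weighted PageRank is meant to capture.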

Orchestrated by src.value.npm_pipeline (fetch-data → fetch-stats → fetch-repos → process). Metric lineage (← = data source, […] = period):

JavaScript / TypeScript (npm)
├── downloads_2021..2025   ← api.npmjs.org/downloads             [2021–2025]
├── avg_downloads          ← derived (mean over populated years) [2021–2025]
├── avg_downloads_share    ← derived (pkg / ecosystem total)     [2021–2025]
├── top                    ← derived (95% cum-download cutoff)   [2021–2025]
├── dep edges (package→dep)← registry.npmjs.org                  [most recent]
├── pagerank               ← derived (DL-weighted PR, α=0.85)    [2021–2025]
├── value_class            ← derived (A/B/C/D, cum-PR share)     [2021–2025]
└── package→repo           ← nice-registry                       [most recent]
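The three derived download metrics in the lineage (avg_downloads, avg_downloads_share, top) can be sketched as follows. This is an assumption-laden illustration: "populated" is taken to mean a year with a non-empty value, the package that crosses the 95% line is kept in the top set, and the function names are hypothetical.

```python
def average_downloads(yearly):
    """Mean over populated years; years with no value (None) are skipped."""
    values = [v for v in yearly.values() if v is not None]
    return sum(values) / len(values) if values else 0.0


def download_shares(avg_by_pkg, ecosystem_total):
    """Each package's share of the ecosystem-wide total (not of the listed packages)."""
    return {pkg: avg / ecosystem_total for pkg, avg in avg_by_pkg.items()}


def top_set(avg_by_pkg, ecosystem_total, coverage=0.95):
    """Take packages in descending download order until `coverage` is reached."""
    top, cum = set(), 0.0
    for pkg, avg in sorted(avg_by_pkg.items(), key=lambda kv: kv[1], reverse=True):
        if cum >= coverage * ecosystem_total:
            break
        top.add(pkg)  # assumption: the package that crosses the cutoff is included
        cum += avg
    return top


# Toy data: ecosystem total of 1000 annual downloads.
avg_by_pkg = {"a": 600.0, "b": 300.0, "c": 60.0, "d": 40.0}
shares = download_shares(avg_by_pkg, ecosystem_total=1000.0)
top = top_set(avg_by_pkg, ecosystem_total=1000.0)
```

The denominator is the ecosystem-wide total from npm-stats.csv rather than the sum over fetched packages, so the shares of the fetched packages need not add up to 1.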

Where it's used downstream

  • Value — each package's value_class is grouped by repo into data/value/value.csv as the class_npm column; the strongest class across ecosystems becomes class.
  • Risk — A/B-class npm repos enter src.risk.run_risk_pipeline (scope set by risk_input.value_classes in src/settings.json).
  • Eligibility — A/B repos that also pass the OSI-license and non-EOL gates reach data/eligibility/eligibility.csv.
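The grouping in the Value bullet can be sketched as below. How several packages in one repo are combined is not spelled out here, so the sketch assumes the strongest package class wins, mirroring the cross-ecosystem rule. The `class_other` column is a hypothetical stand-in for another ecosystem's class column.

```python
ORDER = ("A", "B", "C", "D")  # strongest to weakest


def strongest(classes):
    """Return the strongest class among the given values, ignoring missing ones."""
    present = [c for c in classes if c in ORDER]
    return min(present, key=ORDER.index) if present else None


def class_by_repo(rows):
    """rows: (repo, value_class) pairs, one per package."""
    grouped = {}
    for repo, cls in rows:
        grouped.setdefault(repo, []).append(cls)
    # Assumption: a repo takes the strongest class among its packages.
    return {repo: strongest(classes) for repo, classes in grouped.items()}


npm_rows = [("owner/repo", "C"), ("owner/repo", "A"), ("other/lib", "D")]
class_npm = class_by_repo(npm_rows)

# Combine with a hypothetical second ecosystem column to get the overall class.
class_other = {"owner/repo": "B", "other/lib": None}
overall = {
    repo: strongest([class_npm.get(repo), class_other.get(repo)])
    for repo in class_npm
}
```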

Outputs

results.csv (data/sources/npm/) — one row per dep-tree package:

| Column | Description |
|---|---|
| package | Package name |
| github_repo | owner/repo slug |
| avg_downloads, 2021–2025 | Average annual downloads and per-year downloads |
| top | True if in the 95% cumulative set |
| pagerank | Download-weighted PageRank score |
| value_class | A/B/C/D |

npm funnel & classes

Carried from the cross-ecosystem tables in value.md:

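| Stage | Count |
|---|---|
| Top packages (95% downloads) | 5,765 |
| After dep tree | 6,370 |
| Results | 6,370 |
| With GitHub repo | 6,281 (99%) |

| Class | A | B | C | D | Total |
|---|---|---|---|---|---|
| Packages | 331 | 748 | 1,183 | 4,108 | 6,370 |
| Repos (value.csv) | 144 | 430 | 769 | 3,087 | |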

A+B repos with a GitHub repo: 100% — npm has the cleanest upstream identity of the four ecosystems, so essentially all load-bearing npm packages reach Risk and Eligibility.
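The funnel and class figures can be cross-checked with a few lines of arithmetic. The numbers below are copied from this section. The repo total is not stated in the source and is derived here as the sum of the four class counts.

```python
# Figures copied from the funnel and class tables in this section.
funnel = {"top": 5765, "dep_tree": 6370, "results": 6370, "with_repo": 6281}
packages = {"A": 331, "B": 748, "C": 1183, "D": 4108}
repos = {"A": 144, "B": 430, "C": 769, "D": 3087}

# Packages that enter only because a top package depends on them.
added_by_deps = funnel["dep_tree"] - funnel["top"]

# Share of result packages that resolve to a GitHub repo.
repo_coverage = funnel["with_repo"] / funnel["results"]

package_total = sum(packages.values())
repo_total = sum(repos.values())       # derived; the table leaves this cell empty
ab_repos = repos["A"] + repos["B"]     # repos in scope for Risk and Eligibility

print(f"added by dep tree: {added_by_deps}")
print(f"repo coverage: {repo_coverage:.1%}")
print(f"package total: {package_total}, repo total: {repo_total}, A+B repos: {ab_repos}")
```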