Use Zenodo curated wind dataset instead of scraping MaStR API #2

@MaykThewessen

Suggestion: Use Zenodo curated wind dataset as data source

Current approach

The project scrapes the MaStR API directly using paginated curl requests (pageSize=25000, 7 pages). This has several downsides:

  • Very slow: the API is rate-limited and each page is a ~95 MB download at low throughput
  • Fragile: parallel downloads can produce corrupt or truncated JSON files
  • Raw data: coordinates and metadata come straight from the registry and may contain errors

Proposed alternative

Use the curated dataset from Zenodo:

Corrected and supplemented unit data on approved wind turbines in Germany
(Version 2026_02_19)

This dataset:

  • Contains 40,455 wind turbines (38,606 onshore + 1,849 offshore)
  • Has corrected and validated coordinates and metadata
  • Is available as both CSV and GeoJSON (pre-geocoded in WGS84 and EPSG:25832)
  • Includes all relevant fields: MaStR ID, power (kW), hub height, rotor diameter, manufacturer, type, installation date, decommission date, state, county, municipality, on/offshore status
  • Is a one-time download (~13 MB CSV / ~36 MB GeoJSON) vs. ~660 MB from the API
  • Is versioned and citable (DOI)
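
A quick sanity check after downloading could compare the on/offshore split against the counts listed above. The following is a minimal sketch, not the project's code: the delimiter, the `mastr_id` column name, and the sample rows are assumptions for illustration, and only the `wind_an_land_oder_auf_see` column and its two values come from this issue.

```python
import csv
import io
from collections import Counter


def count_by_location(csv_text: str, delimiter: str = ",") -> Counter:
    """Count rows per value of the on/offshore column."""
    # The delimiter is an assumption; adjust it to match the real file.
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return Counter(row["wind_an_land_oder_auf_see"] for row in reader)


# Tiny made-up sample, NOT real rows from the Zenodo dataset.
sample = (
    "mastr_id,wind_an_land_oder_auf_see\n"
    "A,Windkraft an Land\n"
    "B,Windkraft an Land\n"
    "C,Windkraft auf See\n"
)

counts = count_by_location(sample)
print(counts["Windkraft an Land"], counts["Windkraft auf See"])
```

Run against the full CSV, the two counters should come out at 38,606 and 1,849 if the file matches the description above.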

Column mapping

The Zenodo CSV columns map well to the existing PowerPlant dataclass:

| Zenodo CSV | PowerPlant field | Notes |
| --- | --- | --- |
| nettonennleistung | power | In kW |
| datum_inbetriebnahme | install_date | ISO format (no .NET date parsing needed) |
| datum_endgueltige_stilllegung | removal_date | ISO format |
| lon_x / lat_y | longitude / latitude | WGS84 |
| wind_an_land_oder_auf_see | off_shore | "Windkraft an Land" / "Windkraft auf See" |
| bundesland | | State-level info (new) |
| landkreis | | County-level info (new) |
| nabenhoehe | | Hub height in meters (new) |
| rotordurchmesser | | Rotor diameter in meters (new) |
| hersteller / typenbezeichnung | | Manufacturer & turbine type (new) |
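
The mapping above could be applied per CSV row roughly as follows. This is a sketch under assumptions: the `PowerPlant` class here is a hypothetical stand-in built only from the field names in the mapping (the project's real dataclass may have different fields or types), the helper names are made up, and the sample row is invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PowerPlant:
    # Hypothetical stand-in for the project's dataclass; fields and types
    # are guessed from the column mapping, not copied from the codebase.
    power: float  # kW
    install_date: Optional[date]
    removal_date: Optional[date]
    longitude: float  # WGS84
    latitude: float  # WGS84
    off_shore: bool


def parse_iso_date(value: str) -> Optional[date]:
    """Parse the leading YYYY-MM-DD of an ISO string; empty input gives None."""
    value = value.strip()
    return date.fromisoformat(value[:10]) if value else None


def row_to_power_plant(row: dict) -> PowerPlant:
    """Convert one CSV row (as produced by csv.DictReader) to a PowerPlant."""
    return PowerPlant(
        power=float(row["nettonennleistung"]),
        install_date=parse_iso_date(row["datum_inbetriebnahme"]),
        removal_date=parse_iso_date(row["datum_endgueltige_stilllegung"]),
        longitude=float(row["lon_x"]),
        latitude=float(row["lat_y"]),
        off_shore=row["wind_an_land_oder_auf_see"] == "Windkraft auf See",
    )


# Invented example row, not taken from the dataset.
example_row = {
    "nettonennleistung": "3000",
    "datum_inbetriebnahme": "2015-06-01",
    "datum_endgueltige_stilllegung": "",
    "lon_x": "9.5",
    "lat_y": "53.2",
    "wind_an_land_oder_auf_see": "Windkraft an Land",
}
plant = row_to_power_plant(example_row)
print(plant)
```

Treating an empty decommission date as `None` is a design choice for this sketch; how missing values are actually encoded in the CSV would need to be checked against the file.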

Impact

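This would simplify the data pipeline by replacing the paginated API scrape (~660 MB) with a single file download and removing the .NET date parsing. It would also improve data quality through the corrected coordinates and metadata, and make the project more reproducible because the dataset is versioned and citable by DOI.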
