Suggestion: Use Zenodo curated wind dataset as data source
Current approach
The project scrapes the MaStR API directly using paginated curl requests (pageSize=25000, 7 pages). This has several downsides:
- Very slow: the API is rate-limited, and each page is a ~95 MB download at low throughput
- Fragile: parallel downloads can produce corrupt or truncated JSON files
- Raw data: coordinates and metadata may contain errors carried over from the registry
Proposed alternative
Use the curated dataset from Zenodo:
Corrected and supplemented unit data on approved wind turbines in Germany
(Version 2026_02_19)
This dataset:
- Contains 40,455 wind turbines (38,606 onshore + 1,849 offshore)
- Has corrected and validated coordinates and metadata
- Is available as both CSV and GeoJSON (pre-geocoded in WGS84 and EPSG:25832)
- Includes all relevant fields: MaStR ID, power (kW), hub height, rotor diameter, manufacturer, type, installation date, decommission date, state, county, municipality, on/offshore status
- Is a one-time download (~13 MB CSV / ~36 MB GeoJSON) vs. ~660 MB from the API
- Is versioned and citable (DOI)
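As a rough sketch of how a downloaded copy could be sanity-checked against the totals stated above, the snippet below counts rows per on/offshore category. It assumes a comma-delimited CSV with exactly one row per turbine and the `wind_an_land_oder_auf_see` column; the delimiter and the helper names `count_by_location` and `matches_expected` are illustrative assumptions, not part of the project or the dataset.

```python
import csv
import io
from collections import Counter

# Totals stated for version 2026_02_19 of the dataset.
EXPECTED = {"Windkraft an Land": 38_606, "Windkraft auf See": 1_849}


def count_by_location(csv_text: str, delimiter: str = ",") -> Counter:
    """Count rows per on/offshore category (delimiter is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return Counter(row["wind_an_land_oder_auf_see"] for row in reader)


def matches_expected(counts: Counter) -> bool:
    """True only if both category counts and the overall total match."""
    per_category_ok = all(counts.get(k, 0) == v for k, v in EXPECTED.items())
    return per_category_ok and sum(counts.values()) == sum(EXPECTED.values())


# Tiny invented sample, only to show the call shape.
sample = (
    "wind_an_land_oder_auf_see\n"
    "Windkraft an Land\n"
    "Windkraft an Land\n"
    "Windkraft auf See\n"
)
sample_counts = count_by_location(sample)
```

Running `matches_expected` on the counts of the real file would flag a truncated or mismatched download before it enters the pipeline.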
Column mapping
The Zenodo CSV columns map well to the existing PowerPlant dataclass:
| Zenodo CSV | PowerPlant field | Notes |
| --- | --- | --- |
| nettonennleistung | power | In kW |
| datum_inbetriebnahme | install_date | ISO format (no .NET date parsing needed) |
| datum_endgueltige_stilllegung | removal_date | ISO format |
| lon_x / lat_y | longitude / latitude | WGS84 |
| wind_an_land_oder_auf_see | off_shore | "Windkraft an Land" / "Windkraft auf See" |
| bundesland | (none) | State-level info (new) |
| landkreis | (none) | County-level info (new) |
| nabenhoehe | (none) | Hub height in meters (new) |
| rotordurchmesser | (none) | Rotor diameter in meters (new) |
| hersteller / typenbezeichnung | (none) | Manufacturer & turbine type (new) |
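The mapping above could be implemented roughly as follows. This is a minimal sketch, not the project's code: the `PowerPlant` class here is a hypothetical stand-in with assumed field types, the CSV is assumed to be comma-delimited with dot decimals and no empty power or coordinate cells, and the sample rows are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PowerPlant:
    # Hypothetical stand-in for the project's dataclass; field types are assumed.
    power: float  # kW
    install_date: Optional[date]
    removal_date: Optional[date]
    longitude: float  # WGS84
    latitude: float  # WGS84
    off_shore: bool


def parse_iso_date(value: str) -> Optional[date]:
    """Return a date for an ISO string, or None for an empty cell."""
    value = value.strip()
    # Keep only YYYY-MM-DD in case a time component is appended.
    return date.fromisoformat(value[:10]) if value else None


def row_to_power_plant(row: dict) -> PowerPlant:
    """Map one Zenodo CSV row onto the assumed PowerPlant fields."""
    return PowerPlant(
        power=float(row["nettonennleistung"]),
        install_date=parse_iso_date(row["datum_inbetriebnahme"]),
        removal_date=parse_iso_date(row["datum_endgueltige_stilllegung"]),
        longitude=float(row["lon_x"]),
        latitude=float(row["lat_y"]),
        off_shore=row["wind_an_land_oder_auf_see"] == "Windkraft auf See",
    )


def load_power_plants(csv_text: str, delimiter: str = ",") -> List[PowerPlant]:
    """Parse CSV text into PowerPlant objects (delimiter is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [row_to_power_plant(row) for row in reader]


# Invented sample rows, only to show the call shape.
SAMPLE = (
    "nettonennleistung,datum_inbetriebnahme,datum_endgueltige_stilllegung,"
    "lon_x,lat_y,wind_an_land_oder_auf_see\n"
    "3000,2015-06-01,,9.5,53.2,Windkraft an Land\n"
    "6000,2019-09-15,2024-01-31,6.8,54.1,Windkraft auf See\n"
)
plants = load_power_plants(SAMPLE)
```

Because the dates are already in ISO format, `date.fromisoformat` is enough. The new columns (bundesland, nabenhoehe, and so on) are left out here since `PowerPlant` has no fields for them yet.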
Impact
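This would simplify the data pipeline (one ~13 MB CSV download instead of seven paginated API requests totalling ~660 MB), improve data quality through corrected coordinates and metadata, and make the project more reproducible, since each dataset version is citable by DOI.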