Feature/klimadashboard by JanEggers-hr · Pull Request #27 · wdr-data/wdr-ddj-cloud

JanEggers-hr · 2026-04-13T15:20:44Z

Put the Klimadashboard scraper into trial operation

Split msr_wind.py into scraper + processor using open-mastr library (no API key needed) with S3 database download/upload integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Download all MaStR energy types (wind, solar, biomass, hydro, etc.)
- Rename DB from msr_wind.db to mastr.db
- Scraper becomes msr_scraper.py (technology-agnostic)
- Processor stays wind-specific, more can be added later

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

7 tasks: add dependency, create scraper, create processor, update orchestrator with S3, cleanup old code, gitignore, smoke test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Scraper runs in isolated venv via uv run (PEP 723). Downloads mastr.db from S3, runs scraper, runs processor, uploads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- README_msr.md: new architecture, no API keys, isolated venv
- README.md: updated file references and env vars

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pd.DataFrame.max(axis=1) fails when comparing NaN (float) with date strings. Convert to datetime first, then take max. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
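The fix described above (parse first, then compare) can be illustrated without pandas. A minimal sketch, assuming ISO-formatted date strings and a hypothetical helper `latest_date` that receives the candidate date fields of one row; in pandas the corresponding step would be `pd.to_datetime(col, errors="coerce")` on each column before `.max(axis=1)`.

```python
import math
from datetime import date
from typing import Optional, Union

Cell = Union[str, float, None]


def latest_date(*cells: Cell) -> Optional[date]:
    """Return the latest date among the cells, ignoring None, NaN and empty strings."""
    parsed = []
    for cell in cells:
        # Missing values arrive as None or float NaN; skip them instead of comparing them to strings
        if cell is None or (isinstance(cell, float) and math.isnan(cell)) or cell == "":
            continue
        # Keep only the date part, so "2021-03-15 00:00:00" also parses
        parsed.append(date.fromisoformat(str(cell)[:10]))
    return max(parsed) if parsed else None


print(latest_date(float("nan"), "2019-05-01", "2021-03-15"))  # 2021-03-15
print(latest_date(float("nan"), None))  # None
```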

Lage column is NULL in open-mastr wind_extended table. The correct column for onshore/offshore filtering is WindAnLandOderAufSee. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
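A minimal sketch of filtering on `WindAnLandOderAufSee` instead of `Lage`, using an in-memory SQLite table. The column `Bruttoleistung`, the category strings "Windkraft an Land" / "Windkraft auf See", and the helper `capacity_by_location` are assumptions for illustration, not taken from the PR; the actual values should be checked in the open-mastr `wind_extended` table.

```python
import sqlite3

# Assumed category labels; verify against the real wind_extended data
ONSHORE = "Windkraft an Land"
OFFSHORE = "Windkraft auf See"


def capacity_by_location(conn: sqlite3.Connection) -> dict:
    """Sum Bruttoleistung (kW) per WindAnLandOderAufSee value; Lage is not used because it is NULL."""
    rows = conn.execute(
        """
        SELECT WindAnLandOderAufSee, SUM(Bruttoleistung)
        FROM wind_extended
        WHERE WindAnLandOderAufSee IS NOT NULL
        GROUP BY WindAnLandOderAufSee
        """
    ).fetchall()
    return dict(rows)


# Hypothetical minimal schema; the real table has many more columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wind_extended (Lage TEXT, WindAnLandOderAufSee TEXT, Bruttoleistung REAL)")
conn.executemany(
    "INSERT INTO wind_extended VALUES (?, ?, ?)",
    [(None, ONSHORE, 3000.0), (None, ONSHORE, 4200.0), (None, OFFSHORE, 9500.0)],
)
print(capacity_by_location(conn))
```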

Raw MaStR data is rebuilt each run from open-mastr cache. Only upload ee_wind_taeglich as CSV (~500 KB) to S3. Remove S3 download/upload of mastr.db entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Generate 4 additional CSVs:
- wind_gesamt_monatlich/jaehrlich: cumulative capacity (GW) per period
- wind_zubau_monatlich/jaehrlich: new capacity (MW) per period

Each with onshore, onshore_geplant, offshore, offshore_geplant columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
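How the zubau (additions, MW) and gesamt (cumulative, GW) series relate can be sketched in plain Python. This assumes each unit is reduced to a hypothetical `(commissioning_date, capacity_kw)` pair and, unlike the real processor, ignores shutdown dates and the planned (`_geplant`) columns; the `yearly` flag mirrors what the `YEARLY_AGGREGATES` toggle does, but all names here are invented.

```python
from collections import defaultdict
from datetime import date


def period_key(d: date, yearly: bool = False) -> str:
    """Bucket a date by month ('YYYY-MM') or, if yearly, by year ('YYYY')."""
    return f"{d.year:04d}" if yearly else f"{d.year:04d}-{d.month:02d}"


def zubau_and_gesamt(units, yearly: bool = False) -> dict:
    """Aggregate (commissioning_date, capacity_kw) pairs per period.

    Returns {period: (zubau_mw, gesamt_gw)}: capacity added in that period (MW)
    and the running cumulative total up to and including it (GW).
    Periods without any additions are simply absent.
    """
    added_kw = defaultdict(float)
    for commissioned, kw in units:
        added_kw[period_key(commissioned, yearly)] += kw

    result, running_kw = {}, 0.0
    for period in sorted(added_kw):  # zero-padded keys sort chronologically
        running_kw += added_kw[period]
        result[period] = (added_kw[period] / 1_000, running_kw / 1_000_000)
    return result


# Hypothetical sample: three units, capacities in kW
units = [(date(2024, 1, 10), 3000.0), (date(2024, 1, 20), 2000.0), (date(2024, 3, 5), 4000.0)]
print(zubau_and_gesamt(units))
print(zubau_and_gesamt(units, yearly=True))
```

Running the same function once per category (onshore, offshore, ...) would give one column each; a real time series would also need to fill months without additions.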

- msr_solar_processor.py: analogous to the wind processor, 215 GW target (EEG 2023)
- Generates solar_taeglich, solar_gesamt/zubau_monatlich/jaehrlich CSVs
- Integrated into klimadashboard.py run()
- Updated README.md and TODO.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- DB path now local_storage/klimadashboard/mastr.db
- S3 download before scraper, upload after processing
- Scraper streams progress to stderr ([1/8] wind: 42433 Einheiten)
- klimadashboard.py streams scraper stderr live via Popen

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
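A minimal sketch of streaming a child process's stderr live with `subprocess.Popen`, assuming the scraper runs as a separate process as in this commit (a later commit drops the subprocess and imports `scrape_mastr()` directly). `run_streaming` is a hypothetical helper; the example child just prints one progress line in the format quoted above.

```python
import subprocess
import sys


def run_streaming(cmd: list) -> tuple:
    """Run cmd, echo its stderr line by line as it arrives, return (returncode, stderr_lines)."""
    lines = []
    # stdout is not piped, so it goes straight to our own stdout
    with subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True) as proc:
        for line in proc.stderr:  # yields each line as soon as the child writes it
            sys.stderr.write(line)
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here
    return proc.returncode, lines


code, lines = run_streaming(
    [sys.executable, "-c", "import sys; print('[1/8] wind: 42433 Einheiten', file=sys.stderr)"]
)
```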

Check DatumDownload in wind_extended. If matches today, skip bulk download and return existing counts. Speeds up local testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
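A sketch of the same-day check, assuming `DatumDownload` is stored in `wind_extended` as an ISO date string (possibly with a time part); `downloaded_today` is a hypothetical name. The check only helps when the database from a previous run is still present.

```python
import sqlite3
from datetime import date
from typing import Optional


def downloaded_today(conn: sqlite3.Connection, today: Optional[date] = None) -> bool:
    """True if wind_extended already holds a bulk download dated today (per DatumDownload)."""
    today = today or date.today()
    try:
        row = conn.execute("SELECT MAX(DatumDownload) FROM wind_extended").fetchone()
    except sqlite3.OperationalError:
        return False  # table does not exist yet: nothing cached
    return row is not None and row[0] is not None and str(row[0])[:10] == today.isoformat()


conn = sqlite3.connect(":memory:")
print(downloaded_today(conn))  # False (no table yet)
conn.execute("CREATE TABLE wind_extended (DatumDownload TEXT)")
conn.execute("INSERT INTO wind_extended VALUES (?)", (date.today().isoformat(),))
print(downloaded_today(conn))  # True
```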

open-mastr always does full bulk download, so fetching old DB from S3 is pointless. Scraper rebuilds locally, then uploads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

msr_dw_display.py: builds and uploads 4 charts:
- Wind expansion (onshore/offshore + targets)
- Solar expansion (installed + target)
- Combined renewable expansion
- Yearly capacity additions (wind + solar)

Chart IDs left blank for now (TODO after chart creation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

4 charts, now all monthly:
- CHART_WIND: total capacity onshore/offshore
- CHART_SOLAR: total solar capacity
- CHART_WIND_ZUBAU: capacity additions onshore/offshore per month
- CHART_SOLAR_ZUBAU: solar capacity additions per month

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

YEARLY_AGGREGATES = False (default) uses monthly data. Set to True for yearly aggregation. Affects all 4 charts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

open-mastr leaks tqdm/logging to stdout. Our JSON status is always the last line. Take only that for json.loads(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
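A minimal sketch of parsing only the final line, assuming the status is emitted as a single-line JSON object; `parse_status` is hypothetical, and it also skips trailing blank lines so a final newline doesn't break `json.loads()`.

```python
import json


def parse_status(stdout: str) -> dict:
    """Parse the JSON status from the last non-empty stdout line, ignoring leaked log/progress output."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("scraper produced no output")
    return json.loads(lines[-1])


leaked = "100%|##########| 8/8 [00:42<00:00]\nINFO open_mastr: done\n{\"wind\": 42433}\n"
print(parse_status(leaked))  # {'wind': 42433}
```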

Add solar processor, DW display script, caching, chart IDs, correct S3 flow (upload only), local_storage path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fill f-string with datetime.now() for last-updated timestamp
- Use metadata.annotate.notes (correct DW API path)
- Use update_chart instead of deprecated update_metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
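A sketch of the metadata fragment for the last-updated note, assuming Datawrapper nests the notes text under `annotate.notes` as the commit states. `notes_metadata` and the exact timestamp format are hypothetical, and the call that sends the payload (`update_chart` in the PR) is deliberately left out.

```python
from datetime import datetime
from typing import Optional


def notes_metadata(now: Optional[datetime] = None) -> dict:
    """Build the metadata fragment that sets the chart notes to a last-updated stamp."""
    now = now or datetime.now()
    return {"annotate": {"notes": f"Last updated: {now:%d.%m.%Y %H:%M}"}}


print(notes_metadata(datetime(2026, 4, 13, 15, 20)))
# {'annotate': {'notes': 'Last updated: 13.04.2026 15:20'}}
```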

- Add open-mastr~=0.17 to pyproject.toml (pandas~=2.3 now compatible)
- Remove PEP 723 inline deps and subprocess/JSON protocol from scraper
- Import scrape_mastr() directly in klimadashboard.py
- Fix pandas 2.x freq aliases: M→ME, Y→YE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds a new klimadashboard scraper to run in trial operation, combining Fraunhofer Energy Charts “Energiemix” updates with MaStR bulk data ingestion via open-mastr and derived wind/solar expansion datasets that are uploaded to S3 and pushed to Datawrapper.

Changes:

- Register new klimadashboard scraper job (daily schedule + required env var) and wire deployment secret.
- Add MaStR bulk scraper + wind/solar processors + Datawrapper upload module + orchestrator entrypoint.
- Add open-mastr dependency and accompanying documentation/specs.

Reviewed changes

Copilot reviewed 59 out of 63 changed files in this pull request and generated 9 comments.

File	Description
scrapers_config.json	Adds `klimadashboard` to deployed scrapers and declares required env var.
README.md	Minor formatting tweak in contributor instructions.
pyproject.toml	Adds `open-mastr` dependency.
docs/superpowers/specs/2026-04-02-msr-wind-open-mastr-design.md	Design spec for open-mastr refactor + scraper/processor split.
ddj_cloud/scrapers/klimadashboard/src/TODO.md	Project TODO notes for dashboard workstreams.
ddj_cloud/scrapers/klimadashboard/src/msr_wind_processor.py	Computes daily wind expansion time series and summary aggregates.
ddj_cloud/scrapers/klimadashboard/src/msr_solar_processor.py	Computes daily solar expansion time series and summary aggregates.
ddj_cloud/scrapers/klimadashboard/src/msr_scraper.py	Downloads MaStR bulk dumps via open-mastr and writes `mastr.db`.
ddj_cloud/scrapers/klimadashboard/src/msr_dw_display.py	Uploads aggregated wind/solar datasets to Datawrapper charts.
ddj_cloud/scrapers/klimadashboard/src/energiemix.py	Fetches Fraunhofer API data and updates Datawrapper charts.
ddj_cloud/scrapers/klimadashboard/README.md	Scraper-specific documentation (architecture, env vars, DB layout).
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Netze.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Netzanschlusspunkte.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Marktrollen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Marktfunktionen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/MarktakteureUndRollen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Lokationstypen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Lokationen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Katalogwerte.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Katalogkategorien.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/GeloeschteUndDeaktivierteMarktakteure.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/GeloeschteUndDeaktivierteEinheiten.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Ertuechtigungen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenWind.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenWasser.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Einheitentypen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenStromVerbraucher.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenStromSpeicher.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenSolar.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenKernkraft.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenGeothermieGrubengasDruckentspannung.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenGenehmigung.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenGasverbraucher.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenGasSpeicher.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenGasErzeuger.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenBiomasse.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/EinheitenAenderungNetzbetreiberzuordnungen.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/Bilanzierungsgebiete.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenStromSpeicher.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenKwk.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenGasSpeicher.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenEegWind.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenEegWasser.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenEegSpeicher.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenEegSolar.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenEegGeothermieGrubengasDruckentspannung.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/xsd/AnlagenEegBiomasse.xsd	Adds PHP legacy schema artifact.
ddj_cloud/scrapers/klimadashboard/msr_php/wka_to_data.php	Adds PHP legacy reference implementation.
ddj_cloud/scrapers/klimadashboard/msr_php/wka_daily.php	Adds PHP legacy reference implementation.
ddj_cloud/scrapers/klimadashboard/msr_php/Tabellenstrukturen.txt	Adds PHP-era DB schema notes for reference.
ddj_cloud/scrapers/klimadashboard/klimadashboard.py	New orchestrator entrypoint (Energiemix + MaStR + DW + S3 uploads).
ddj_cloud/scrapers/klimadashboard/CLAUDE.md	Internal notes on implementation steps/stack.
ddj_cloud/scrapers/klimadashboard/.gitignore	Ignores local MaStR DB files.
ddj_cloud/scrapers/klimadashboard/.copier-answers.yml	Copier template answers for scraper scaffold.
ddj_cloud/scrapers/klimadashboard/.claude/settings.json	Claude tooling settings for local development.
ddj_cloud/scrapers/klimadashboard/__init__.py	Marks scraper folder as a Python package.
.vscode/extensions.json	Adds VS Code extension recommendation.
.github/workflows/deploy.yml	Exposes `DW_API_KEY_JE` secret for deployment env.

Copilot · 2026-04-13T15:39:01Z

+
+| Variable | Description | Where? |
+|----------|-------------|-----|
+| `DATAWRAPPER_API_KEY` | API token for Datawrapper charts | [Datawrapper Account Settings](https://app.datawrapper.de/account/api-tokens), in the project's .env |


The docs name DATAWRAPPER_API_KEY, but the scraper code (energiemix/msr_dw_display) and the deploy config expect DW_API_KEY_JE. Please update the README so that the required environment variable is unambiguous and consistent.

Suggested change

| `DATAWRAPPER_API_KEY` | API token for Datawrapper charts | [Datawrapper Account Settings](https://app.datawrapper.de/account/api-tokens), in the project's .env |

| `DW_API_KEY_JE` | API token for Datawrapper charts; this environment variable is expected by the scripts and the deploy configuration | [Datawrapper Account Settings](https://app.datawrapper.de/account/api-tokens), in the project's `.env` |
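Whichever name the README settles on, the scripts can fail fast when it is missing. A minimal sketch, assuming the token is read from `DW_API_KEY_JE` as the comment describes; `datawrapper_token` is a hypothetical helper, and accepting a plain dict in place of `os.environ` keeps it testable.

```python
import os
from typing import Mapping


def datawrapper_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Datawrapper API token from DW_API_KEY_JE, failing fast with a clear message."""
    token = env.get("DW_API_KEY_JE", "").strip()
    if not token:
        raise RuntimeError(
            "DW_API_KEY_JE is not set; add it to the project's .env locally or to the deploy secrets"
        )
    return token


print(datawrapper_token({"DW_API_KEY_JE": "abc123"}))  # abc123
```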

Copilot · 2026-04-13T15:39:01Z

+**Isolated venv:** The scraper uses PEP 723 inline script metadata and is run via `uv run`
+in its own virtual environment (open-mastr requires pandas>=2.2,
+the main project uses pandas~=1.5).
+


The section on the "isolated venv via PEP 723 / pandas~=1.5" doesn't match the current repo: pyproject.toml pins pandas to ~=2.3, and src/msr_scraper.py contains no PEP 723 inline metadata. Please remove or update these statements so that build/deployment expectations are correct.

Copilot · 2026-04-13T15:39:02Z

+# mastr.db in local_storage (analogous to other scrapers)
+DB_LOCAL_PATH = Path(__file__).parent.parent.parent.parent / "local_storage" / "klimadashboard" / "mastr.db"
+DB_S3_KEY = "klimadashboard/mastr.db"
+


DB_LOCAL_PATH points to <repo>/local_storage/.... In AWS Lambda the deployment package is usually read-only, so writes to this path would fail. Please store the DB under /tmp (or work purely in memory) and ideally download mastr.db from S3 at the start, so that the caching via DatumDownload can take effect at all.

Copilot · 2026-04-13T15:39:02Z

+    # Energiemix (Fraunhofer API)
+    df = update_energiemix()
+    upload_dataframe(df, "klimadashboard/test_energiemix1.csv")
+


The upload path klimadashboard/test_energiemix1.csv looks like a test artifact and could linger in production permanently. Please switch to a stable, descriptive key, or remove the upload if it isn't needed.

Copilot · 2026-04-13T15:39:02Z

+Charts (all monthly):
+- Wind expansion: total capacity onshore/offshore + required
+- Solar expansion: total capacity + required
+- Wind additions: new capacity onshore/offshore per month
+- Solar additions: new capacity per month
+"""


The module docstring announces "+ nötig" (required, i.e. a climate-target line) for the wind and solar charts, but build_wind_chart_data/build_solar_chart_data only use installed/planned values (no noetig_* columns). Either correct the docs or add the noetig_* aggregations/uploads, so that the chart setup is reproducible.

Copilot · 2026-04-13T15:39:02Z

+      "Read(//Users/janeggers/Code/wdr-ddj-cloud/ddj_cloud/scrapers/talsperren/**)",
+      "WebFetch(domain:open-mastr.readthedocs.io)",
+      "WebFetch(domain:api.github.com)",
+      "Read(//Users/janeggers/miniconda3/lib/python3.12/site-packages/open_mastr/**)",
+      "Read(//Users/janeggers/Code/wdr-ddj-cloud/**)"


This file contains machine-specific absolute paths (/Users/...) and local Read permissions that are not valid for other developers or CI, and it needlessly checks in machine-specific details. Please reduce the settings to repo-relative, portable paths or remove the file from the repo.

Suggested change

"Read(//Users/janeggers/Code/wdr-ddj-cloud/ddj_cloud/scrapers/talsperren/**)",
"WebFetch(domain:open-mastr.readthedocs.io)",
"WebFetch(domain:api.github.com)",
"Read(//Users/janeggers/miniconda3/lib/python3.12/site-packages/open_mastr/**)",
"Read(//Users/janeggers/Code/wdr-ddj-cloud/**)"

"Read(ddj_cloud/scrapers/talsperren/**)",
"WebFetch(domain:open-mastr.readthedocs.io)",
"WebFetch(domain:api.github.com)",
"Read(ddj_cloud/**)"

Copilot · 2026-04-13T15:39:03Z

+- Datawrapper (charts)
+- SQLite database (MaStR data)
+- Fraunhofer Energy Charts API (energy mix)
+- MaStR SOAP API (wind power expansion)


The tech-stack list still names "MaStR SOAP API (wind power expansion)", but the current scraper uses open-mastr (bulk download) and no longer accesses SOAP. Please update the file so that the architecture and dependencies are not documented misleadingly.

Suggested change

- MaStR SOAP API (wind power expansion)

- open-mastr (bulk download for wind power expansion)

jh0ker

As it stands, this unfortunately won't work in Lambda.

First of all, there are apparently two different SQLite databases: the one from open-mastr, and a second one that you create. For one thing, that exceeds the storage we can get in Lambda out of the box. It's also questionable whether we really need it. Wouldn't it be enough to stick with the open-mastr DB and simply query it directly?

Also, right now only your custom DB is uploaded to S3 (and never downloaded again). That obviously doesn't make sense. If I look at the ~/.open-MaStR directory:

> tree ~/.open-MaStR
/home/jhoeke/.open-MaStR
├── config
│   ├── filenames.yml
│   └── logging.yml
├── data
│   ├── sqlite
│   │   └── open-mastr.db
│   └── xml_download
│       └── Gesamtdatenexport_20260414.zip
└── logs
    └── open_mastr.log

5 directories, 5 files

then we should probably save and restore data/sqlite/open-mastr.db, and possibly also config/filenames.yml (in case the defaults there ever change).

What about the Gesamtdatenexport_*.zip files? Are they downloaded completely fresh every day? Even if so, caching them in S3 can still be worthwhile, especially during the test phase, when the scraper is run several times a day anyway. It might then also be worth deleting old Gesamtdatenexporte at some point (or changing their storage class), so that the S3 bucket doesn't grow toward infinity ^^
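The cleanup half of this idea can be sketched as a pure function over object keys. This assumes cached exports keep open-mastr's `Gesamtdatenexport_YYYYMMDD.zip` file names under some prefix (the `klimadashboard/xml/` prefix below is invented, as is the helper name); actually listing and deleting the objects, or changing their storage class, is left to the storage layer.

```python
import re

EXPORT_RE = re.compile(r"Gesamtdatenexport_(\d{8})\.zip$")


def exports_to_delete(keys: list, keep: int = 2) -> list:
    """Return the Gesamtdatenexport zip keys that are older than the newest `keep` ones."""
    dated = []
    for key in keys:
        match = EXPORT_RE.search(key)
        if match:
            dated.append((match.group(1), key))  # YYYYMMDD sorts chronologically as a string
    dated.sort()
    old = dated[:-keep] if keep > 0 else dated
    return [key for _, key in old]


keys = [
    "klimadashboard/xml/Gesamtdatenexport_20260414.zip",
    "klimadashboard/xml/Gesamtdatenexport_20260412.zip",
    "klimadashboard/xml/Gesamtdatenexport_20260413.zip",
    "klimadashboard/open-mastr.db",  # not an export, never selected
]
print(exports_to_delete(keys))  # ['klimadashboard/xml/Gesamtdatenexport_20260412.zip']
```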

I'm currently building a feature for you that makes it possible to upload files directly from a local file in the first place. At the moment this only works via bytes, which is nonsense with 5 GB files.

I also ran the scraper on my machine as a test, and the RAM requirements are currently a problem as well. The process wanted about 32 GB, and the solar DataFrame alone (df = pd.read_sql_query("SELECT * FROM solar_extended", db)) takes up 19 GB. Yet in the end only a few MB end up in the result as CSV.

Maybe we could always load and process N days at a time with pandas. Lambda also has a 10 GB RAM limit, so we definitely have to stay within that.
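One alternative to loading N days at a time is to push the aggregation into SQLite, so only one row per month ever reaches Python. A sketch under assumptions: the columns `Inbetriebnahmedatum` (ISO date text) and `Bruttoleistung` (kW) and the helper name are illustrative, and filtering by unit status is omitted. If row-level pandas processing is still needed, `pd.read_sql_query(..., chunksize=...)` returns an iterator of smaller DataFrames instead of one large one.

```python
import sqlite3


def monthly_solar_kw(conn: sqlite3.Connection) -> list:
    """Return [(YYYY-MM, summed Bruttoleistung in kW), ...], aggregated inside SQLite."""
    return conn.execute(
        """
        SELECT substr(Inbetriebnahmedatum, 1, 7) AS month, SUM(Bruttoleistung) AS kw
        FROM solar_extended
        WHERE Inbetriebnahmedatum IS NOT NULL
        GROUP BY month
        ORDER BY month
        """
    ).fetchall()


# Hypothetical minimal table standing in for open-mastr's solar_extended
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE solar_extended (Inbetriebnahmedatum TEXT, Bruttoleistung REAL)")
conn.executemany(
    "INSERT INTO solar_extended VALUES (?, ?)",
    [("2024-01-05", 10.0), ("2024-01-20", 5.0), ("2024-02-01", 10.0), (None, 99.0)],
)
print(monthly_solar_kw(conn))  # [('2024-01', 15.0), ('2024-02', 10.0)]
```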

One more question: does the msr_php folder need to be in the repo? It looks like you used it (only) as a reference. Does it add value (as documentation etc.), or would a link be enough?

jh0ker · 2026-04-14T15:51:09Z

    "ms-python.python",
-    "ms-python.vscode-pylance"
+    "ms-python.vscode-pylance",
+    "continue.continue"


Is this needed for anything?

jh0ker · 2026-04-14T15:57:07Z

+from datawrapper import Datawrapper
+from dotenv import load_dotenv
+
+load_dotenv()


This shouldn't actually be necessary (anymore?).

jh0ker · 2026-04-14T16:00:25Z


    uv run manage test <scraper_name>
-
+ 


jh0ker · 2026-04-14T16:02:59Z

+# mastr.db in local_storage (analogous to other scrapers)
+DB_LOCAL_PATH = Path(__file__).parent.parent.parent.parent / "local_storage" / "klimadashboard" / "mastr.db"
+DB_S3_KEY = "klimadashboard/mastr.db"
+


I think both AIs are confused here.

local_storage really shouldn't be referenced in the code. It is only an emulation of the S3 storage for local testing and development. As long as you work with the ddj_cloud.utils.storage module, it decides for you whether data is loaded from/to S3 or local_storage.

But we should talk again about how the databases are being juggled.

jh0ker · 2026-04-14T16:23:48Z

+
+ENERGY_TYPES = ["wind", "solar", "biomass", "hydro", "combustion", "nuclear", "gsgk", "storage"]
+
+OPEN_MASTR_DB = Path.home() / ".open-MaStR" / "data" / "sqlite" / "open-mastr.db"


I don't think this will work in Lambda, since as far as I know we don't have access to any home directory in AWS Lambda.

We should therefore probably set an OUTPUT_PATH for open-mastr via the environment and use that. See https://open-mastr.readthedocs.io/en/latest/advanced/#environment-variables
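A sketch of redirecting open-mastr to a writable location such as `/tmp` on Lambda. It assumes that open-mastr honours `OUTPUT_PATH` as described in the linked docs and keeps its `data/sqlite/open-mastr.db` layout beneath it; setting the variable before importing `open_mastr` is the cautious choice. `configure_open_mastr_output` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def configure_open_mastr_output(base: Optional[str] = None) -> Path:
    """Point open-mastr at a writable directory via OUTPUT_PATH and return the expected DB path.

    Call this before importing open_mastr, so the library sees the variable.
    """
    # tempfile.gettempdir() resolves to /tmp on Lambda
    root = Path(base or tempfile.gettempdir()) / "open-MaStR"
    root.mkdir(parents=True, exist_ok=True)
    os.environ["OUTPUT_PATH"] = str(root)
    # Assumption: open-mastr mirrors its ~/.open-MaStR layout below OUTPUT_PATH
    return root / "data" / "sqlite" / "open-mastr.db"


db_path = configure_open_mastr_output()
print(db_path)
```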

jh0ker · 2026-04-14T16:45:03Z

+        "contact_name": "Jan Eggers",
+        "contact_email": "jan.eggers@fm.wdr.de",
+        "memory_size": "1024",
+        "ephemeral_storage": "512",


We'll definitely have to raise this if the open-mastr database plus XML cache comes to ~8 GB. Lambda goes up to about 10 GB of ephemeral storage, so there is still a bit of headroom. But we can't really afford to copy the database once more on top of that.
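Given the ~8 GB estimate, it may be worth checking free space before downloading anything. A minimal sketch using `shutil.disk_usage`; `ensure_free_space` is a hypothetical helper, and the threshold would have to match whatever `ephemeral_storage` ends up being configured.

```python
import shutil


def ensure_free_space(path: str, needed_bytes: int) -> None:
    """Raise early if the filesystem holding `path` lacks room for `needed_bytes` more data."""
    free = shutil.disk_usage(path).free
    if free < needed_bytes:
        raise OSError(
            f"only {free / 1e9:.1f} GB free under {path}, need {needed_bytes / 1e9:.1f} GB"
        )


# e.g. before the bulk download on Lambda: ensure_free_space("/tmp", 8 * 10**9)
ensure_free_space(".", 0)
```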

jh0ker · 2026-04-14T17:44:35Z

+DB_S3_KEY = "klimadashboard/mastr.db"
+
+
+def _upload_db():


This function is currently gloriously pointless 😄 It loads the DB from DB_LOCAL_PATH into RAM and then writes it back to exactly the same place.

jh0ker · 2026-04-14T18:08:07Z

Something else that worries me: for me, downloading the Gesamtdatenexport alone took over 15 min (of course also limited by my local download speed). If it really has to be downloaded in full every time, we may have to split the scraper, because a Lambda can run for at most 15 min. But maybe it's not a real problem; we'll have to keep an eye on it.

Jan Eggers and others added 28 commits April 2, 2026 09:25

Add design spec for msr_wind open-mastr refactoring

3d37560

Add implementation plan for MaStR open-mastr refactoring

0acb111

feat: add MaStR scraper using open-mastr bulk download (isolated venv)

ed362ec

feat: add wind processor with daily expansion calculation

aa1f063

feat: integrate MaStR scraper+processor with S3 and subprocess

5eb2a89

refactor: update documentation for new MaStR architecture

3d2a5be

chore: add .gitignore for MaStR database files

8f91ad0

fix: handle mixed NaN/string types in shutdown date computation

2d2e7ae

fix: use WindAnLandOderAufSee column instead of Lage for wind location

80dea6f

feat: skip MaStR download if DB already has today's data

6816694

refactor: remove S3 download of mastr.db, keep upload only

c7a3092

feat: add YEARLY_AGGREGATES toggle for DW chart granularity

10009d1

fix: parse only last line of scraper stdout as JSON

0d7cf3a

klimadashboard Commit

b15dc7a

docs: update README with complete architecture

05d06bd

README brought up to date

1287427

Fix: last-updated date

eb6df1c

🔀 Merge branch 'main' into feature/klimadashboard

3abecac

Updated to pandas 2.3, no separate venv for the scraper

0fe566c

jh0ker requested a review from Copilot April 13, 2026 15:32

Copilot started reviewing on behalf of jh0ker April 13, 2026 15:33

Copilot AI reviewed Apr 13, 2026

jh0ker requested changes Apr 14, 2026

jh0ker added 2 commits April 14, 2026 23:16

🔀 Merge branch 'main' into feature/klimadashboard

cd25241

⬆️ Re-lock for Python 3.13

27ca7e3

Conversation

JanEggers-hr commented Apr 13, 2026

Copilot AI left a comment

jh0ker left a comment

jh0ker commented Apr 14, 2026

3 participants