Issue Description
When using powerplantmatching for some non-European countries in PyPSA-Earth, we have observed duplicated power plants, as mentioned in this old issue. For example, such duplication appears when running the build_powerplants rule of PyPSA-Earth 0.8.0 for Colombia. After some investigation, the cause appears to be the way GEM datasets are treated when they are combined with other databases.
Reproducible Example
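# Basic code
import powerplantmatching as pm

config = {
    # Sources that take part in the matching step
    'matching_sources': [
        'GEO', 'GPD',
        'GBPT', 'GGPT', 'GCPT', 'GGTPT', 'GNPT', 'GSPT', 'GWPT', 'GHPT',
    ],
    # Sources whose entries are fully included in the output
    'fully_included_sources': [
        'GEO', 'GPD',
        'GBPT', 'GGPT', 'GCPT', 'GGTPT', 'GNPT', 'GSPT', 'GWPT', 'GHPT',
    ],
    'target_countries': 'Colombia',
    # Keep plants retiring in 2022 or later, or with no retirement date
    # (DateOut != DateOut is true only for NaN)
    'main_query': '(DateOut >= 2022 or DateOut != DateOut)',
}

# Rebuild the dataset locally with the config above, then drop solar and wind
ppl = (
    pm.powerplants(
        from_url=False,
        update=True,
        config_update=config,
    )
    .query('Fueltype not in ["Solar", "Wind"]')
)

ppl.to_csv("test_ppm_co_minimal.csv")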
Expected Behavior
The expected output is a modelling-ready dataset of power plants for Colombia. Instead, the dataset produced ("test_ppm_co_minimal.csv") contains multiple duplicates that require manual clean-up. For example, the Termosierra power plant appears twice: as Termosierra Ccgt Colombia (426 MWe) and as Termosierra (460 MWe).
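To make the kind of duplicate described above easier to spot, here is a minimal sketch of a check that flags pairs of entries whose names overlap and whose capacities are close. It is not part of powerplantmatching: the stop-word list, the 10 % capacity tolerance, and the helper names `name_tokens` and `likely_duplicates` are assumptions made for illustration, and the third entry in the sample data is made up for contrast.

```python
import re
from itertools import combinations

# Hypothetical list of generic words to ignore when comparing plant names.
STOPWORDS = {"ccgt", "ocgt", "power", "plant", "station", "colombia"}


def name_tokens(name):
    """Lower-case a plant name and drop generic tokens."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return {w for w in words if w not in STOPWORDS}


def likely_duplicates(plants, rel_tol=0.10):
    """Return name pairs whose tokens nest and whose capacities are close.

    `plants` is a list of (name, capacity_mw) tuples. `rel_tol` is an
    assumed tolerance, not a value taken from powerplantmatching.
    """
    pairs = []
    for (n1, c1), (n2, c2) in combinations(plants, 2):
        t1, t2 = name_tokens(n1), name_tokens(n2)
        if not t1 or not t2:
            continue  # nothing left to compare after removing stop words
        names_nest = t1 <= t2 or t2 <= t1
        caps_close = abs(c1 - c2) <= rel_tol * max(c1, c2)
        if names_nest and caps_close:
            pairs.append((n1, n2))
    return pairs


plants = [
    ("Termosierra Ccgt Colombia", 426.0),
    ("Termosierra", 460.0),
    ("Example Hydro", 300.0),  # made-up entry, for contrast only
]
print(likely_duplicates(plants))
```

With the two Termosierra entries from the report, the pair is flagged because the names reduce to the same token and the capacities differ by less than 10 %. A check like this only helps to find candidates for manual review; it does not replace the matching logic itself.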
This duplication disappears if the GEM-source datasets ('GEO', 'GPD') are excluded, or if only they are kept (the output files are test_ppm_co_minimal_without_gem.csv and test_ppm_co_minimal_gem_only.csv, respectively).
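The two comparison runs can be set up by filtering the source lists of the base configuration. The following is a sketch under stated assumptions: `variant_config` and `GROUP` are hypothetical names, and the grouping of 'GEO' and 'GPD' simply follows the labels used in this report. Only plain dictionaries are manipulated here; no powerplantmatching call is made.

```python
# Source labels grouped as in the report; the grouping is an assumption
# of this sketch, not something defined by powerplantmatching.
GROUP = {"GEO", "GPD"}

ALL_SOURCES = [
    "GEO", "GPD",
    "GBPT", "GGPT", "GCPT", "GGTPT", "GNPT", "GSPT", "GWPT", "GHPT",
]

base = {
    "matching_sources": list(ALL_SOURCES),
    "fully_included_sources": list(ALL_SOURCES),
    "target_countries": "Colombia",
    "main_query": "(DateOut >= 2022 or DateOut != DateOut)",
}


def variant_config(base, keep):
    """Copy `base`, keeping only the sources for which keep(source) is true."""
    cfg = dict(base)
    for key in ("matching_sources", "fully_included_sources"):
        cfg[key] = [s for s in base[key] if keep(s)]
    return cfg


# Variant 1: drop the grouped sources. Variant 2: keep only them.
without_group = variant_config(base, lambda s: s not in GROUP)
only_group = variant_config(base, lambda s: s in GROUP)

print(without_group["matching_sources"])
print(only_group["matching_sources"])
```

Each resulting dictionary can then be passed as `config_update` in the same `pm.powerplants(...)` call as in the reproducible example, writing the result to a separate CSV file for comparison.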
Installed Versions
powerplantmatching 0.7.1 pyhd8ed1ab_0 conda-forge
Version Checks (indicate both or one)
I have confirmed this bug exists on the latest release of powerplantmatching.
I have confirmed this bug exists on the current master branch of powerplantmatching.