Date: 2025-11-15 Purpose: Research findings for implementing general solutions for SDMX-based providers (OECD, Eurostat, BIS)
Key Finding: SDMX REST APIs are complex and require dataset-specific dimension knowledge. Each dataflow has unique dimension structures that must be discovered dynamically for a general solution.
General Solution Approach:
- Query Data Structure Definition (DSD) endpoint to discover dimensions
- Use wildcards (
.) for unknown dimensions - Cache DSD results to avoid repeated lookups
https://sdmx.oecd.org/public/rest
Purpose: Get metadata about a dataflow's dimensions and codelists
Format:
GET /datastructure/{agency}/{dsd_id}/{version}?references=children&detail=full
Headers:
Accept: application/vnd.sdmx.structure+json;version=1.0
Example:
https://sdmx.oecd.org/public/rest/datastructure/OECD.SDD.TPS/DSD_LFS/1.0?references=children&detail=full
Response Structure:
{
"data": {
"dataStructures": [{
"id": "DSD_LFS",
"name": "Labour force statistics",
"dataStructureComponents": {
"dimensionList": {
"dimensions": [
{
"id": "REF_AREA",
"position": 0,
"localRepresentation": {
"enumeration": "urn:sdmx:org.sdmx.infomodel.codelist.Codelist=OECD:CL_AREA(1.1)"
}
},
// ... more dimensions
]
}
}
}]
}
}Format:
GET /data/{agency},{dsd}@{dataflow},{version}/{dimension_key}?parameters
Dimension Key Format: Values separated by . (dot)
- Use actual value for known dimensions
- Use
.(empty) for wildcard (all values) - Number of positions must match DSD dimension count
Example: Unemployment data (9 dimensions required)
https://sdmx.oecd.org/public/rest/data/OECD.SDD.TPS,DSD_LFS@DF_IALFS_UNE_M,1.0/DEU........M
↑ ↑
Country (pos 0) Freq (pos 8)
Headers:
Accept: application/vnd.sdmx.data+json;version=2.0.0
Query Parameters:
startPeriod=YYYY-MM- Start dateendPeriod=YYYY-MM- End datedimensionAtObservation=AllDimensions- Include all dimensions
DSD: DSD_LFS
Agency: OECD.SDD.TPS
Dimensions (9 total):
REF_AREA(position 0) - Country code (e.g.,DEU,USA,FRA)MEASURE(position 1) - Measurement typeUNIT_MEASURE(position 2) - Units (e.g.,PT_LF_SUB,PC)TRANSFORMATION(position 3) - Data transformation (e.g.,_Zfor none)ADJUSTMENT(position 4) - Seasonal adjustment (e.g.,Y/N)SEX(position 5) - Gender (e.g.,_Tfor total)AGE(position 6) - Age group (e.g.,Y_GE15)ACTIVITY(position 7) - Economic activity sectorFREQ(position 8) - Frequency (e.g.,Mfor monthly,Afor annual)
Working Query Pattern:
/data/OECD.SDD.TPS,DSD_LFS@DF_IALFS_UNE_M,1.0/{country}........{freq}?startPeriod={date}
DSD: DSD_NAAG
Agency: OECD.SDD.NAD
Dimensions (5 total):
FREQ(position 0) - FrequencyREF_AREA(position 1) - CountryMEASURE(position 2) - Measurement typeUNIT_MEASURE(position 3) - UnitsCHAPTER(position 4) - Chapter/category
Working Query Pattern:
/data/OECD.SDD.NAD,DSD_NAAG@DF_NAAG_I,1.0/{freq}.{country}...
-
HTTP 406 Error: Wrong
Acceptheader- Solution: Use
application/vnd.sdmx.structure+jsonfor structure queries - Use
application/vnd.sdmx.data+jsonfor data queries
- Solution: Use
-
HTTP 422 "Not enough key values": Incomplete dimension key
- Solution: Provide all required dimensions (use
.for wildcards) - Example: 9 dimensions needed → provide 9 positions
- Solution: Provide all required dimensions (use
-
HTTP 404 "NoResultsFound": Invalid dimension values or no data
- Solution: Query DSD to find valid codelist values
- Use wildcards if unsure of valid values
Format:
https://stats.oecd.org/SDMX-JSON/data/{dataset}/{key}/all?contentType=csv
Note: This API was updated in July 2024. Some endpoints may be deprecated.
Recommended for Python applications: Handles OECD complexity automatically
import pandasdmx as pdmx
oecd = pdmx.Request("OECD")
data = oecd.data(
resource_id="DF_IALFS_UNE_M",
key="DEU/all?startTime=2023-01"
).to_pandas()https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1
- Query Parameter Filtering: Dimensions passed as query parameters, not in URL path
- Simpler Structure: More forgiving with missing dimensions
- Dataset-Specific Defaults: Each dataset has common default dimensions
Structure Query:
GET /datastructure/ESTAT/{dataset_code}
Accept: application/vnd.sdmx.structure+json
Data Query:
GET /data/{dataset_code}?format=JSON&lang=EN&geo={country}&freq={freq}&startPeriod={year}&endPeriod={year}&{dimension}={value}...
https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/une_rt_a?format=JSON&lang=EN&geo=DE&freq=A&startPeriod=2023&endPeriod=2023&age=Y15-74&sex=T
Required Dimensions:
geo- Country (e.g.,DE,FR,IT)freq- Frequency (e.g.,Aannual,Mmonthly)age- Age group (e.g.,Y15-74)sex- Gender (e.g.,Ttotal,Mmale,Ffemale)startPeriod,endPeriod- Time range (YYYY format for annual)
Reference: See backend/providers/eurostat.py line 47-90
Pattern: Each dataset has a set of default dimension values that work for most queries.
Example:
DATASET_DEFAULT_FILTERS = {
"une_rt_a": {"age": "Y15-74", "sex": "T"}, # Unemployment rate
"nama_10_gdp": {"na_item": "B1GQ", "unit": "CP_MEUR"}, # GDP
"prc_hicp_aind": {"coicop": "CP00"}, # Inflation
}- Query dataset structure (optional - can use defaults)
- Apply dataset-specific default filters
- Always include:
geo,freq,startPeriod,endPeriod - Add dataset-specific dimensions from defaults
https://stats.bis.org/api/v1
Data Query:
GET /data/{dataflow_code}/{dimension_key}?startPeriod={date}&endPeriod={date}
Accept: application/vnd.sdmx.data+json;version=1.0.0
Dimensions: {freq}.{country}
https://stats.bis.org/api/v1/data/WS_CBPOL/M.US?startPeriod=2023-01&endPeriod=2023-12
Dimensions: {freq}.{country}.{sector}
https://stats.bis.org/api/v1/data/WS_CREDIT_GAP/Q.US.P?startPeriod=2023-Q1&endPeriod=2023-Q4
Sector Codes:
P- Private non-financial sectorG- GovernmentH- Households
Dimensions: {freq}.{country}.{sector}.{unit}
https://stats.bis.org/api/v1/data/WS_CREDIT/Q.US.P.770
Unit Codes:
770- All sectors
Reference: See backend/providers/bis.py line 165-196
Simple datasets (2 dimensions):
WS_CBPOL,WS_XRU,WS_EER,WS_PP→{freq}.{country}
Complex datasets (3-4 dimensions):
WS_CREDIT,WS_DSR→{freq}.{country}.P.770WS_DBS→{freq}.{country}.A.A.AWS_GLI→{freq}.{country}.A.M
- Map indicator name to BIS dataflow code
- Use dataflow-specific dimension pattern
- Default to
M(monthly) frequency if not specified - Use ISO 2-letter country codes (US, GB, DE, etc.)
Step 1: DSD Cache
class DSDCache:
"""Cache Data Structure Definitions to avoid repeated API calls."""
def __init__(self):
self.cache = {} # {dataflow_id: dsd_metadata}
async def get_dsd(self, provider, agency, dsd_id, version):
cache_key = f"{provider}:{agency}:{dsd_id}:{version}"
if cache_key not in self.cache:
# Fetch from API
dsd = await self._fetch_dsd(provider, agency, dsd_id, version)
self.cache[cache_key] = dsd
return self.cache[cache_key]Step 2: Dimension Discovery
async def build_dimension_key(self, dsd, user_params):
"""Build dimension key with wildcards for unknown dimensions."""
dimensions = dsd['dimensions'] # List of {id, position, codelist}
key_parts = [''] * len(dimensions)
# Fill known dimensions
for dim in dimensions:
dim_id = dim['id']
position = dim['position']
if dim_id in user_params:
key_parts[position] = user_params[dim_id]
elif dim_id in DEFAULT_VALUES:
key_parts[position] = DEFAULT_VALUES[dim_id]
else:
key_parts[position] = '' # Wildcard
return '.'.join(key_parts)Step 3: Provider-Specific Handlers
class SDMXProvider:
def __init__(self, provider_type):
self.provider_type = provider_type # 'OECD', 'EUROSTAT', 'BIS'
self.dsd_cache = DSDCache()
async def fetch_data(self, dataflow, params):
# Get DSD
dsd = await self.dsd_cache.get_dsd(...)
# Build query based on provider type
if self.provider_type == 'OECD':
return await self._fetch_oecd_style(dsd, dataflow, params)
elif self.provider_type == 'EUROSTAT':
return await self._fetch_eurostat_style(dsd, dataflow, params)
elif self.provider_type == 'BIS':
return await self._fetch_bis_style(dsd, dataflow, params)Option 1: pandasdmx
pip install pandasdmxPros:
- Handles SDMX complexity automatically
- Supports 20+ data providers
- Active maintenance
Cons:
- Additional dependency
- Learning curve
- May not support all our custom requirements
Option 2: pysdmx
pip install pysdmxPros:
- Modern, well-documented
- Async support
- Pythonic API
Cons:
- Newer library (less battle-tested)
- May have breaking changes
scripts/test_oecd_dsd.py- Query OECD DSD endpointsscripts/test_oecd_wildcards.py- Test wildcard dimension patterns
python3 scripts/test_oecd_dsd.py
python3 scripts/test_oecd_wildcards.py/tmp/oecd_dsd_DF_IALFS_UNE_M_attempt1.json- Unemployment DSD/tmp/oecd_dsd_DF_NAAG_I_attempt1.json- GDP DSD/tmp/oecd_dsd_DF_PRICES_ALL_attempt1.json- Prices DSD
- Eurostat: Already working with dimension defaults ✅
- BIS: Simpler structure - implement dataflow-specific patterns ✅
- OECD: Document as "complex" - requires DSD lookup system
- Implement DSD cache system
- Add dynamic dimension key building
- Test with top 5-10 dataflows per provider
- Consider integrating pandasdmx library
- Build admin UI for managing dataflow mappings
- Implement automatic DSD refresh
SDMX APIs are powerful but complex. The key to a general solution is:
- Dynamic DSD lookup - Don't hardcode dimensions
- Intelligent defaults - Use common dimension values as fallbacks
- Caching - Avoid repeated structure queries
- Provider-specific handling - Each API has quirks (query params vs URL path)
Next Steps:
- Implement DSD cache system in
backend/services/dsd_cache.py - Create unified SDMX provider base class
- Migrate Eurostat, OECD, BIS to use dynamic DSD lookup