Closed
21 commits
f6f66f3
WHO Tuberculosis Rifampicin Resistant Data Import
pravnkumar-cloudsufi May 18, 2026
61205f3
WHO Tuberculosis Rifampicin Resistant Data Import
pravnkumar-cloudsufi May 18, 2026
dbbc953
WHO Tuberculosis Rifampicin Resistant Data Import
pravnkumar-cloudsufi May 18, 2026
25f03f4
Delete statvar_imports/tuberculosis_rifampicin_resistant/testdata/Tub…
pravnkumar-cloudsufi May 21, 2026
711f99b
Delete statvar_imports/tuberculosis_rifampicin_resistant/testdata/Tub…
pravnkumar-cloudsufi May 21, 2026
e48170a
Changing the test data file name
pravnkumar-cloudsufi May 21, 2026
e803c4f
Changing the test data file name
pravnkumar-cloudsufi May 21, 2026
2927229
Changing the test data file name
pravnkumar-cloudsufi May 21, 2026
830c5ca
Changing the manifest file
pravnkumar-cloudsufi May 21, 2026
7ffb9aa
Changing the manifest file
pravnkumar-cloudsufi May 21, 2026
6fbd602
adding who indicator import - adolescent birth rate (#1935)
smarthg-gi May 18, 2026
c621f62
feat: Add queries to pre-compute derived edges and Cache table (#2015)
SandeepTuniki May 20, 2026
5b95e39
[DCP Ingestion] Replace c/g/Root by dc/g/Root in Seeding. (#2021)
gmechali May 20, 2026
62449d3
[DCP - BQ Federation] Propagate location in to the BQ connection (#2022)
gmechali May 20, 2026
d3ac93c
fix: Register ValuesEntry helper in Observations PROTO BUNDLE (#2023)
SandeepTuniki May 21, 2026
04263a0
Remove VariableMetadata table (#2020)
vish-cs May 21, 2026
f260a8a
Fix the provenance summary generation query (#2024)
SandeepTuniki May 21, 2026
ecaf82c
NASA_VIIRSActiveFiresEvents_Fixes_corrected config path as well as ad…
balit-raibot May 21, 2026
c33de55
fix: Skip "dc/base/" prefix for custom data instances (#2025)
SandeepTuniki May 22, 2026
766cfe9
Add timeout for aggregations (#2026)
vish-cs May 22, 2026
7e2b7bf
Tuberculosis Rifampicin Resistant Import
pravnkumar-code May 26, 2026
54 changes: 54 additions & 0 deletions statvar_imports/tuberculosis_rifampicin_resistant/README.md
@@ -0,0 +1,54 @@
# WHO Tuberculosis: Treatment outcomes of people with RR/MDR-TB

## Overview
This dataset provides the percentage of TB patients who started rifampicin-resistant TB treatment and whose treatment outcome was recorded as treatment success (cured or treatment completed), treatment failed, died, lost to follow-up, or not evaluated, within the reporting period.

## Data Source

**Source URL:**
https://data.who.int/indicators/i/39E4281/F1912F6

The data comes from the WHO data portal and is reported per country and year, broken down by treatment outcome (the `DISAGGR_1` column in the downloaded file).

## How To Download Input Data
To download the data, you'll need to run the provided download script `who_data_download_tuberculosis_rifampicin_resistant.py`. This script automatically queries the WHO API for the indicator, merges it with the WHO geographical master list to append standard `iso3` country codes, and saves the cleaned `Tuberculosis_rr_mdr_tb_outcomes.csv` file inside an "input_files" folder.
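
The name-based join is the fragile step in this download: a country whose spelling differs between the two WHO sources ends up with an empty `iso3`. Below is a minimal sketch of how such rows could be surfaced before processing. The helper `find_unmatched_countries` is hypothetical (it is not part of the download script), and the sample rows are made up for illustration.

```python
import pandas as pd


def find_unmatched_countries(api_df, master_df):
    """Return sorted country names from api_df with no iso3 match in master_df."""
    merged = pd.merge(
        api_df,
        master_df.drop_duplicates(),
        left_on="COUNTRY",
        right_on="country",
        how="left",
        indicator=True,  # adds a `_merge` column: 'both' or 'left_only'
    )
    unmatched = merged.loc[merged["_merge"] == "left_only", "COUNTRY"]
    return sorted(unmatched.unique())


# Illustrative rows only; the real values come from the WHO endpoints.
api_df = pd.DataFrame({
    "COUNTRY": ["India", "Poland", "Atlantis"],
    "YEAR": [2020, 2020, 2020],
    "VALUE": [57.0, 60.0, 10.0],
})
master_df = pd.DataFrame({
    "country": ["India", "Poland"],
    "iso3": ["IND", "POL"],
})

print(find_unmatched_countries(api_df, master_df))  # prints ['Atlantis']
```

Any name reported this way would need a manual mapping, which fits the `place_resolution` note in this README.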

type of place: Country.

statvars: Health / Tuberculosis.

years: 2010 to 2022

place_resolution: manually.

release_frequency: P1Y

## Processing Instructions
To process the WHO Tuberculosis data and generate statistical variables, use the following commands from your root `data` directory:

**Download input file**
```bash
python3 statvar_imports/tuberculosis_rifampicin_resistant/who_data_download_tuberculosis_rifampicin_resistant.py
```

**For Test Data Run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_rifampicin_resistant/testdata/Tuberculosis_rr_mdr_tb_outcomes.csv" \
--pv_map="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_pvmap.csv" \
--output_path="statvar_imports/tuberculosis_rifampicin_resistant/output_files/tuberculosis_rifampicin_resistant" \
--config_file="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

**For Main data run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_rifampicin_resistant/input_files/Tuberculosis_rr_mdr_tb_outcomes.csv" \
--pv_map="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_pvmap.csv" \
--output_path="statvar_imports/tuberculosis_rifampicin_resistant/output_files/tuberculosis_rifampicin_resistant" \
--config_file="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

#### Refresh type: Fully Autorefresh
26 changes: 26 additions & 0 deletions statvar_imports/tuberculosis_rifampicin_resistant/manifest.json
@@ -0,0 +1,26 @@
{
"import_specifications": [
{
"import_name": "WHO_TuberculosisTreatmentOutcomes_RR_MDR_TB",
"curator_emails": [
"support@datacommons.org"
],
"provenance_url": "https://data.who.int/indicators/i/39E4281/F1912F6",
"provenance_description": "Treatment outcomes among those who started rifampicin-resistant TB treatment during a specified reporting period.",
"scripts": [
"who_data_download_tuberculosis_rifampicin_resistant.py",
"../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/Tuberculosis_rr_mdr_tb_outcomes.csv --pv_map=tuberculosis_rr_mdr_tb_outcomes_pvmap.csv --config_file=tuberculosis_rifampicin_resistant_metadata.csv --output_path=output/tuberculosis_rr_mdr_tb_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
],
"import_inputs": [
{
"template_mcf": "output/tuberculosis_rr_mdr_tb_output.tmcf",
"cleaned_csv": "output/tuberculosis_rr_mdr_tb_output.csv"
}
],
"source_files": [
"input_files/Tuberculosis_rr_mdr_tb_outcomes.csv"
],
"cron_schedule": "0 10 10,21 * *"
}
]
}
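
Read as a standard five-field cron expression (minute, hour, day of month, month, day of week), the `cron_schedule` above fires at 10:00 on the 10th and 21st of every month. The time zone depends on the scheduler and is not stated in the manifest. The sketch below lists upcoming run times; `next_runs` is a hypothetical helper that handles only this shape of expression (fixed minute and hour, comma-separated days, wildcard month and weekday), not cron syntax in general.

```python
from datetime import datetime, timedelta


def next_runs(cron, start, count):
    """List the next `count` run times after `start` for a simple cron expression.

    Supports only a fixed minute, a fixed hour, comma-separated days of month,
    and '*' for both month and day of week.
    """
    minute, hour, days, month, weekday = cron.split()
    if month != "*" or weekday != "*":
        raise ValueError("only '*' is supported for month and day of week")
    run_days = {int(d) for d in days.split(",")}
    runs = []
    # Walk forward one calendar day at a time at the scheduled hour and minute.
    day = start.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    while len(runs) < count:
        if day.day in run_days and day > start:
            runs.append(day)
        day += timedelta(days=1)
    return runs


for run in next_runs("0 10 10,21 * *", datetime(2026, 5, 26), 3):
    print(run.isoformat())
# prints:
# 2026-06-10T10:00:00
# 2026-06-21T10:00:00
# 2026-07-10T10:00:00
```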

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
Node: E:TB_output->E0
observationDate: C:TB_output->observationDate
observationAbout: C:TB_output->observationAbout
value: C:TB_output->value
variableMeasured: C:TB_output->variableMeasured
scalingFactor: 100
typeOf: dcs:StatVarObservation
unit: dcs:Percent
@@ -0,0 +1,14 @@
config,value
provenance_url,https://data.who.int/indicators/i/39E4281/F1912F6
output_columns,"observationDate,observationAbout,variableMeasured,value,unit,scalingFactor"
#places_within,country/POL
#place_types,"AdministrativeArea,AdministrativeArea1,AdministrativeArea2,State"
#debug,1
#input_rows,100
#word_delimiter,''
#skip_rows,1
populationType,Person
measuredProperty,count
header_rows,1
mapped_columns,6
dc_api_root,https://api.datacommons.org
@@ -0,0 +1,9 @@
key,p1,v1,p2,v2,p3,v3,p4,v4,p5,v5,p6,v6,p7,v7
iso3,observationAbout,country/{Data},,,,,,
YEAR,observationDate,{Data},,,,,,
DISAGGR_1:Died,treatmentOutcome,dcs:DiedDuringTreatment,,,,,,
DISAGGR_1:Lost to follow-up,treatmentOutcome,dcs:LostToFollowUp,,,,,,
DISAGGR_1:Not evaluated,treatmentOutcome,dcs:TreatmentNotEvaluated,,,,,,
DISAGGR_1:Successfully treated,treatmentOutcome,dcs:SuccessfullyTreated,,,,,,
DISAGGR_1:Treatment failed,treatmentOutcome,dcs:TreatmentFailed,,,,,,
VALUE,value,{Number},populationType,dcs:Person,measuredProperty,dcs:count,medicalCondition,dcs:MultidrugOrRifampicinResistantTuberculosis,unit,dcs:Percent,scalingFactor,100,,
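
Conceptually, each pvmap row attaches property/value pairs either to an input column header or to a specific `column:value` cell, and placeholders such as `{Data}` and `{Number}` are filled in from the cell. The following is a deliberately simplified illustration of that idea, not the logic of `stat_var_processor.py`. The function `apply_pv_map`, the trimmed-down `PV_MAP`, and the sample row (including the value 12.5) are all hypothetical.

```python
# A subset of the pvmap above, keyed the same way: column name or "column:value".
PV_MAP = {
    "iso3": {"observationAbout": "country/{Data}"},
    "YEAR": {"observationDate": "{Data}"},
    "DISAGGR_1:Died": {"treatmentOutcome": "dcs:DiedDuringTreatment"},
    "VALUE": {
        "value": "{Number}",
        "populationType": "dcs:Person",
        "measuredProperty": "dcs:count",
        "medicalCondition": "dcs:MultidrugOrRifampicinResistantTuberculosis",
        "unit": "dcs:Percent",
        "scalingFactor": "100",
    },
}


def apply_pv_map(row, pv_map):
    """Collect the property/value pairs that one input row would pick up."""
    pvs = {}
    for column, cell in row.items():
        # A "column:value" key matches one specific cell; a bare column key matches any cell.
        for key in (f"{column}:{cell}", column):
            for prop, template in pv_map.get(key, {}).items():
                pvs[prop] = template.replace("{Data}", cell).replace("{Number}", cell)
    return pvs


row = {"iso3": "IND", "YEAR": "2020", "DISAGGR_1": "Died", "VALUE": "12.5"}
for prop, value in apply_pv_map(row, PV_MAP).items():
    print(f"{prop}: {value}")
```

In this simplified view, the outcome column contributes only the `treatmentOutcome` constraint, while the `VALUE` column carries both the observation value and the constant properties shared by every statistical variable in the import.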
@@ -0,0 +1,66 @@
import io
import logging
import os

import pandas as pd
import requests

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Seconds to wait for each WHO endpoint before giving up
REQUEST_TIMEOUT = 60


def download_tb_rr_mdr_data():
    # 1. Get the clean data from the API using the indicator ID
    api_url = "https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA"
    params = {
        "$filter": "IND_ID eq '39E4281F1912F6'",
        "$select": "IND_ID,INDICATOR_NAME,YEAR,COUNTRY,DISAGGR_1,VALUE",
        "$format": "csv"
    }

    logging.info("1. Fetching clean percentage data from WHO API...")
    api_response = requests.get(api_url, params=params, timeout=REQUEST_TIMEOUT)

    if api_response.status_code != 200:
        logging.error(f"Failed to fetch API data. HTTP {api_response.status_code}")
        return

    # Load the clean API data into a pandas table
    api_df = pd.read_csv(io.StringIO(api_response.text))

    # 2. Get ONLY the iso3 code from the master database
    logging.info("2. Fetching country iso3 codes from WHO master database...")
    master_url = "https://extranet.who.int/tme/generateCSV.asp?ds=notifications"
    master_response = requests.get(master_url, timeout=REQUEST_TIMEOUT)
    if master_response.status_code != 200:
        logging.error(f"Failed to fetch master data. HTTP {master_response.status_code}")
        return

    # We only pull the 'country' (for matching) and 'iso3' columns
    geo_columns = ['country', 'iso3']
    master_df = pd.read_csv(io.StringIO(master_response.text), usecols=geo_columns).drop_duplicates()

    # 3. Merge the two datasets together based on the country name
    logging.info("3. Merging data and formatting...")
    # The API uses uppercase 'COUNTRY', the master uses lowercase 'country'
    merged_df = pd.merge(api_df, master_df, left_on='COUNTRY', right_on='country', how='left')

    # A left join keeps rows whose country name has no match; report them
    missing = merged_df.loc[merged_df['iso3'].isna(), 'COUNTRY'].unique()
    if len(missing):
        logging.warning(f"No iso3 code found for: {sorted(missing)}")

    # Drop the duplicate lowercase 'country' column used for joining
    merged_df = merged_df.drop(columns=['country'])

    # Reorder columns so the iso3 code sits right next to the Country name
    final_columns = [
        'IND_ID', 'INDICATOR_NAME', 'YEAR', 'COUNTRY', 'iso3', 'DISAGGR_1', 'VALUE'
    ]
    merged_df = merged_df[final_columns]

    # 4. Save to CSV in an input_files folder next to this script, so the
    # output path does not depend on the directory the script is run from
    script_dir = os.path.dirname(os.path.abspath(__file__))
    output_dir = os.path.join(script_dir, "input_files")
    filename = os.path.join(output_dir, "Tuberculosis_rr_mdr_tb_outcomes.csv")

    os.makedirs(output_dir, exist_ok=True)

    # Save without the pandas index column
    merged_df.to_csv(filename, index=False)
    logging.info(f"Success! Data saved locally as '{filename}'")


if __name__ == "__main__":
    download_tb_rr_mdr_data()