This script connects to the DHS API and extracts data country by country, creating two datasets per country in HDX (national and subnational). The scraper takes around 10 hours to run. It makes on the order of 200 reads from DHS and 1000 reads/writes (API calls) to HDX in total. It creates around 7000 temporary files of at most 1 MB each and uploads them to HDX. It will be run monthly.
Development is currently done using Python 3.13. The environment can be created with:
uv sync
This creates a .venv folder with the versions specified in the project's uv.lock file.
For the script to run, you will need to have a file called .hdx_configuration.yaml in your home directory containing your HDX key, e.g.:
hdx_key: "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
hdx_read_only: false
hdx_site: prod
You will also need to supply the universal .useragents.yaml file in your home directory as specified in the parameter user_agent_config_yaml passed to facade in run.py. The collector reads the key hdx-scraper-dhs as specified in the parameter user_agent_lookup.
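Before a run that takes around 10 hours, it can help to confirm that both files are in place. The following is a minimal sketch, not part of the scraper: the helper name missing_config_files is hypothetical, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names are taken from the text above; the helper itself is a
# hypothetical illustration and is not provided by the scraper.
EXPECTED_FILES = (".hdx_configuration.yaml", ".useragents.yaml")


def missing_config_files(home=None):
    """Return the expected configuration files that are absent from `home`.

    `home` defaults to the current user's home directory.
    """
    base = Path(home) if home is not None else Path.home()
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

Calling missing_config_files() with no arguments checks your home directory; an empty list means both files were found.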
Alternatively, you can set up environment variables: USER_AGENT, HDX_KEY,
HDX_SITE, EXTRA_PARAMS, TEMP_DIR, and LOG_FILE_ONLY.
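As a rough illustration of the environment-variable route, the sketch below collects whichever of these variables are set and reports any that appear to be missing. The variable names come from the list above; the helper names and the choice of which variables count as required are assumptions made for this example, not behaviour confirmed by the scraper.

```python
import os

# Names are taken from the list above. How the scraper interprets each
# value is not shown here; this sketch only collects what is set.
ENV_NAMES = (
    "USER_AGENT",
    "HDX_KEY",
    "HDX_SITE",
    "EXTRA_PARAMS",
    "TEMP_DIR",
    "LOG_FILE_ONLY",
)


def read_env_settings(environ=None):
    """Return the listed variables that are set and non-empty."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in ENV_NAMES if environ.get(name)}


def missing_settings(settings, required=("USER_AGENT", "HDX_KEY", "HDX_SITE")):
    """List required names absent from `settings`.

    The default `required` tuple is an assumption for illustration.
    """
    return [name for name in required if name not in settings]
```

For example, missing_settings(read_env_settings({"HDX_KEY": "XXXX", "HDX_SITE": "prod"})) returns ["USER_AGENT"], which flags the variable still to be exported.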
To run, execute:
uv run python -m hdx.scraper.dhs
pre-commit will be installed when syncing uv. It is run every time you make a git commit if you call it like this:
pre-commit install
With pre-commit, all code is formatted according to ruff guidelines.
To check if your changes pass pre-commit without committing, run:
pre-commit run --all-files
uv is used for package management. If
you've introduced a new package to the source code (i.e. anywhere in src/),
please add it to the project.dependencies section of pyproject.toml with
any known version constraints.
To add packages required only for testing, add them to the
[dependency-groups] section of pyproject.toml.
Any changes to the dependencies will be automatically reflected in
uv.lock with pre-commit, but you can re-generate the file without committing by
executing:
uv lock --upgrade
uv is used for project management. The project can be built using:
uv build
Linting and syntax checking can be run with:
uv run ruff check
To run the tests and view coverage, execute:
uv run pytest