OCHA-DAP/hdx-scraper-dhs

Collector for DHS Datasets

This script connects to the DHS API and extracts data country by country, creating two datasets per country in HDX (national and subnational). The scraper takes around 10 hours to run. It makes on the order of 200 reads from DHS and 1000 reads/writes (API calls) to HDX in total. It creates around 7000 temporary files of at most 1 MB each and uploads them to HDX. It will be run monthly.
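
As a rough sanity check on these figures, the sketch below works out an upper bound on temporary disk usage and the average API call rate. It uses only the approximate numbers quoted in this README; the variable names are illustrative and are not part of the scraper.

```python
# Approximate figures quoted in this README (not measured values).
TEMP_FILES = 7000   # temporary files created per run
MAX_FILE_MB = 1     # each file is at most about 1 MB
RUN_HOURS = 10      # approximate run time
DHS_READS = 200     # approximate reads from DHS
HDX_CALLS = 1000    # approximate read/write calls to HDX

# Upper bound on temporary storage if every file reached the size limit.
max_temp_gb = TEMP_FILES * MAX_FILE_MB / 1000

# Average number of API calls per hour across both services.
calls_per_hour = (DHS_READS + HDX_CALLS) / RUN_HOURS

print(f"Temporary storage upper bound: {max_temp_gb:.1f} GB")
print(f"Average API calls per hour: {calls_per_hour:.0f}")
```

Under these assumptions, a run needs at most about 7 GB of free space in the temporary directory.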

Development

Environment

Development is currently done using Python 3.13. The environment can be created with:

    uv sync

This creates a .venv folder containing the package versions specified in the project's uv.lock file.

Installing and running

For the script to run, you will need to have a file called .hdx_configuration.yaml in your home directory containing your HDX key, e.g.:

    hdx_key: "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    hdx_read_only: false
    hdx_site: prod

You will also need to supply the universal .useragents.yaml file in your home directory as specified in the parameter user_agent_config_yaml passed to facade in run.py. The collector reads the key hdx-scraper-dhs as specified in the parameter user_agent_lookup.
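
Before starting a long run, it can help to confirm that both files are in place. The following is a minimal sketch of such a check, assuming only the two file names mentioned above; the helper `missing_config_files` is hypothetical and is not part of the collector.

```python
from pathlib import Path

# File names taken from this README; both are expected in the home directory.
CONFIG_FILES = (".hdx_configuration.yaml", ".useragents.yaml")


def missing_config_files(home: Path) -> list[str]:
    """Return the names of expected configuration files absent from `home`."""
    return [name for name in CONFIG_FILES if not (home / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files(Path.home())
    if missing:
        print("Missing configuration files:", ", ".join(missing))
    else:
        print("Both configuration files were found.")
```

The check only tests for the presence of the files. It does not validate their contents or the HDX key.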

Alternatively, you can set up environment variables: USER_AGENT, HDX_KEY, HDX_SITE, EXTRA_PARAMS, TEMP_DIR, and LOG_FILE_ONLY.
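
If you go the environment variable route, a short script can report which of the listed variables are still unset. This is a sketch only: the variable names come from the list above, while the helper `unset_variables` is hypothetical, and this README does not say which variables are mandatory, so the script reports all of them.

```python
import os

# Variable names as listed in this README.
ENV_VARS = (
    "USER_AGENT",
    "HDX_KEY",
    "HDX_SITE",
    "EXTRA_PARAMS",
    "TEMP_DIR",
    "LOG_FILE_ONLY",
)


def unset_variables(environ=os.environ):
    """Return the listed variable names that are unset or empty in `environ`."""
    return [name for name in ENV_VARS if not environ.get(name)]


if __name__ == "__main__":
    for name in unset_variables():
        print(f"{name} is not set")
```

Passing a plain dictionary as `environ` makes the helper easy to test without changing the real environment.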

To run, execute:

    uv run python -m hdx.scraper.dhs

Pre-commit

pre-commit is installed when you run uv sync. To have it run every time you make a git commit, install the git hook once with:

    pre-commit install

With pre-commit, all code is formatted according to ruff guidelines.

To check if your changes pass pre-commit without committing, run:

    pre-commit run --all-files

Packages

uv is used for package management. If you've introduced a new package to the source code (i.e. anywhere in src/), please add it to the project.dependencies section of pyproject.toml with any known version constraints.

Packages required only for testing should be added to the [dependency-groups] section of pyproject.toml instead.

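Any changes to the dependencies are automatically reflected in uv.lock by pre-commit, but you can regenerate the lock file without committing by executing the command below. Note that the --upgrade flag also moves locked packages to the newest versions the constraints allow.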

    uv lock --upgrade

Project

uv is used for project management. The project can be built using:

    uv build

Linting and syntax checking can be run with:

    uv run ruff check

To run the tests and view coverage, execute:

    uv run pytest
