This guide walks you through setting up a local development environment and running your first end-to-end pipeline.
Note: These steps assume a Linux-like terminal. On Windows, use WSL.
Install the following tools before proceeding:
| Tool | Purpose | Install guide |
|---|---|---|
| Docker | Runs the local service stack | Ensure at least 4 CPU / 8 GB RAM allocated |
| uv | Python and virtual environment management | Installation |
| prek | Pre-commit hooks / static checks | pip install prek |
| Git | Version control | Your OS package manager |
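
Before continuing, it can help to confirm that the tools in the table are actually on your PATH. The snippet below is a minimal sketch, not part of the repository: the helper name `find_missing_tools` is hypothetical, and the executable names (`docker`, `uv`, `prek`, `git`) are assumed to be the usual command names for each tool.

```python
import shutil

# Assumption: each tool from the table is installed under its usual
# command name. Adjust the names if your installation differs.
REQUIRED_TOOLS = ("docker", "uv", "prek", "git")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found on PATH.")
```

This only checks that each executable can be found; it does not check versions or the CPU and memory allocated to Docker.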

Clone the repository:
git clone git@github.com:ISISNeutronMuon/analytics-data-platform.git
cd analytics-data-platform

The Python version is defined in elt-common/pyproject.toml. Use it to install the correct
version of Python:
cd elt-common
uv python install
cd ..

From the root of the repository run:
uv venv

Activate the Python environment in the current shell:
source .venv/bin/activate

Install the elt-common package and dev dependencies in editable mode:
cd elt-common
uv pip install --editable . --group dev

See elt-common/README.md for more details.

Install the pre-commit hooks:
prek install

Check prek runs successfully:
prek run --all-files

Add the following entry to /etc/hosts on your machine:
127.0.0.1 adp-router

Explanation: Some tools (PyIceberg, Superset in the browser) need to resolve service URLs consistently
inside and outside the Docker container network. The adp-router service is defined in docker-compose
and points to a Traefik instance that routes traffic to the appropriate service.
With this hosts entry, the adp-router hostname resolves to the local machine on the host, so the
same URLs work there as inside the Docker network.
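
To double-check the hosts entry, a small parser is enough. The following is a minimal sketch under the assumption that the file uses the standard `<ip> <name> [<alias> ...]` layout with `#` comments; the function name `hosts_maps` is hypothetical.

```python
def hosts_maps(hosts_text, hostname, ip="127.0.0.1"):
    """Return True if `hosts_text` maps `hostname` to `ip`."""
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split the rest on whitespace.
        entry = line.split("#", 1)[0].split()
        # A hosts line has the form: <ip> <name> [<alias> ...]
        if len(entry) >= 2 and entry[0] == ip and hostname in entry[1:]:
            return True
    return False


# Illustrative content only; not read from your machine.
sample = "127.0.0.1 localhost\n127.0.0.1 adp-router  # local stack\n"
print(hosts_maps(sample, "adp-router"))
```

To check the real file, pass the contents of /etc/hosts instead of the sample string, for example `hosts_maps(open("/etc/hosts").read(), "adp-router")`.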
Create (or append to) $HOME/.dlt/secrets.toml with the local development credentials:
[destination.pyiceberg.credentials]
uri = "http://localhost:50080/iceberg/catalog"
warehouse = "facility_ops_landing"
oauth2_server_uri = "http://localhost:50080/auth/realms/analytics-data-platform/protocol/openid-connect/token"
client_id = "machine-infra"
client_secret = "s3cr3t"
scope = "lakekeeper"

Bring up all services with Docker Compose:
cd infra/local
docker compose --profile superset up --wait

Once running, the following services are available:
| Service | URL | Credentials |
|---|---|---|
| Keycloak (master realm) | http://localhost:50080/auth | admin / admin |
| Lakekeeper UI | http://localhost:50080/iceberg/ui | adpsuperuser / adppassword |
| Superset | http://localhost:50080/workspace/facility_ops | adpsuperuser / adppassword |
| Trino | https://localhost:58443 | (use --insecure flag) |
| Marimo notebooks | http://localhost:50080/marimo/ | — |
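
If a page fails to load, a quick probe of each URL in the table can narrow down which service is not answering. This is a minimal sketch using only the standard library; the helper names `is_reachable` and `check_services` are hypothetical. Certificate verification is switched off because the local Trino certificate is self-signed, which is only appropriate for this local stack.

```python
import http.client
import ssl
import urllib.error
import urllib.request

# URLs copied from the services table.
SERVICES = {
    "Keycloak": "http://localhost:50080/auth",
    "Lakekeeper UI": "http://localhost:50080/iceberg/ui",
    "Superset": "http://localhost:50080/workspace/facility_ops",
    "Trino": "https://localhost:58443",
    "Marimo notebooks": "http://localhost:50080/marimo/",
}


def is_reachable(url, timeout=5.0):
    """Return True if anything answers at `url`, even with an HTTP error status."""
    # Local development only: skip certificate checks for the self-signed cert.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context):
            return True
    except urllib.error.HTTPError:
        # The server replied (for example 401 or 404), so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def check_services(services=SERVICES, probe=is_reachable):
    """Return a {service name: reachable?} mapping."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_services()` with the stack running returns one boolean per service. A `True` value only means that something responded at the URL, not that login or routing is configured correctly.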
From the repository root, run an ingestion script:
cd warehouses/facility_ops_landing/ingest/accelerator/statusdisplay
python statusdisplay.py

Verify the data landed by logging into the Lakekeeper UI and checking that the
facility_ops_landing warehouse contains tables in the accelerator_statusdisplay namespace.
Note: The table preview tab is currently broken. See #329.
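
While the preview tab is unavailable, the tables can also be listed from Python with PyIceberg. The sketch below is an assumption-heavy illustration rather than the project's own tooling: the helper names are hypothetical, the mapping from the secrets.toml keys to PyIceberg REST catalog properties (`credential` as `client_id:client_secret`, `oauth2-server-uri`, `scope`) is assumed, and it presumes PyIceberg is installed in the active environment.

```python
# Values copied from the local development secrets.toml shown earlier.
LOCAL_CREDS = {
    "uri": "http://localhost:50080/iceberg/catalog",
    "warehouse": "facility_ops_landing",
    "oauth2_server_uri": (
        "http://localhost:50080/auth/realms/analytics-data-platform"
        "/protocol/openid-connect/token"
    ),
    "client_id": "machine-infra",
    "client_secret": "s3cr3t",
    "scope": "lakekeeper",
}


def catalog_properties(creds):
    """Map secrets.toml-style keys to PyIceberg REST catalog properties.

    The key mapping is an assumption made for this sketch.
    """
    return {
        "type": "rest",
        "uri": creds["uri"],
        "warehouse": creds["warehouse"],
        "oauth2-server-uri": creds["oauth2_server_uri"],
        "credential": f"{creds['client_id']}:{creds['client_secret']}",
        "scope": creds["scope"],
    }


def list_landed_tables(creds=LOCAL_CREDS, namespace="accelerator_statusdisplay"):
    """Return the table identifiers found in `namespace`."""
    # Imported here so the helpers above work without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("local", **catalog_properties(creds))
    return catalog.list_tables(namespace)
```

With the stack running and the ingestion script completed, `list_landed_tables()` should return a non-empty list if the data landed. An empty list or an authentication error suggests checking the credentials and the service status first.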
From the repository root, run the dbt models that depend on the ingested data:
cd warehouses/facility_ops/transform
uv pip install -r ./requirements/requirements.txt
dbt run --select '+models/marts/accelerator/cycles.sql'

You will see InsecureRequestWarning messages. This is expected locally because of
the self-signed Trino certificate.
Log in to Superset, open SQL Lab from the SQL tab, and run:
SELECT * FROM facility_ops.analytics_accelerator.cycles

- Read the ELT pipeline development guide to understand how to add new data sources and transformations.
- Review the system architecture to understand how the services fit together.
- See contributing for branching and PR conventions.