Getting Started

This guide walks you through setting up a local development environment and running your first end-to-end pipeline.

Note: These steps assume a Linux-like terminal. On Windows, use WSL.

Prerequisites

Install the following tools before proceeding:

| Tool | Purpose | Install guide / notes |
| --- | --- | --- |
| Docker | Runs the local service stack | Ensure at least 4 CPUs and 8 GB RAM are allocated |
| uv | Python and virtual environment management | Installation |
| prek | Pre-commit hooks / static checks | pip install prek |
| Git | Version control | Your OS package manager |
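Before continuing, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch: the executable names (docker, uv, prek, git) are assumed from the tool names in the table, and missing_tools is a hypothetical helper, not part of this repository.

```python
import shutil

# Executable names are assumptions based on the tool names in the table above.
REQUIRED_TOOLS = ("docker", "uv", "prek", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisite tools found on PATH.")
```

This only checks that each executable can be found. It does not check versions or the Docker resource allocation.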

Clone the repository

git clone git@github.com:ISISNeutronMuon/analytics-data-platform.git
cd analytics-data-platform

Install a uv-managed Python interpreter

The Python version is defined in elt-common/pyproject.toml. Use it to install the correct version of Python:

cd elt-common
uv python install
cd ..

Set up a Python environment

From the root of the repository run:

uv venv

Activate the Python environment in the current shell with source .venv/bin/activate, then install the elt-common package and its dev dependencies in editable mode:

cd elt-common
uv pip install --editable . --group dev

See elt-common/README.md for more details.

Install pre-commit hooks

prek install

Check prek runs successfully:

prek run --all-files

Set up /etc/hosts

Add the following entry to /etc/hosts on your machine:

127.0.0.1    adp-router

Explanation: Some tools (PyIceberg, Superset in the browser) need service URLs to resolve the same way inside and outside the Docker container network. The adp-router service is defined in docker-compose and points to a Traefik instance that routes traffic to the appropriate service. With this entry, the adp-router hostname resolves on the host to the local machine, so URLs that use it work on the host just as they do inside the Docker network.
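The entry can be checked without opening the file by hand. The following is a minimal sketch of such a check. hosts_maps is a hypothetical helper that parses hosts-file text and reports whether a hostname is mapped to a given address. It does not query the system resolver.

```python
from pathlib import Path


def hosts_maps(hosts_text, hostname, address="127.0.0.1"):
    """Return True if hosts_text maps hostname to address."""
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        if ip == address and hostname in names:
            return True
    return False


if __name__ == "__main__":
    hosts_file = Path("/etc/hosts")
    if hosts_file.exists():
        present = hosts_maps(hosts_file.read_text(), "adp-router")
        print("adp-router entry present:", present)
```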

Configure dlt secrets

Create (or append to) $HOME/.dlt/secrets.toml with the local development credentials:

[destination.pyiceberg.credentials]
uri = "http://localhost:50080/iceberg/catalog"
warehouse = "facility_ops_landing"
oauth2_server_uri = "http://localhost:50080/auth/realms/analytics-data-platform/protocol/openid-connect/token"
client_id = "machine-infra"
client_secret = "s3cr3t"
scope = "lakekeeper"

Start the local service stack

Bring up all services with Docker Compose:

cd infra/local
docker compose --profile superset up --wait

Once running, the following services are available:

| Service | URL | Credentials |
| --- | --- | --- |
| Keycloak (master realm) | http://localhost:50080/auth | admin / admin |
| Lakekeeper UI | http://localhost:50080/iceberg/ui | adpsuperuser / adppassword |
| Superset | http://localhost:50080/workspace/facility_ops | adpsuperuser / adppassword |
| Trino | https://localhost:58443 | (use --insecure flag) |
| Marimo notebooks | http://localhost:50080/marimo/ | |
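To confirm the router is forwarding requests, you can probe the HTTP endpoints listed above. The following is a minimal sketch using only the standard library. responds and unreachable are hypothetical helpers, and any HTTP reply (including an error status such as 401 or 404) is treated as "reachable". Trino is left out because its self-signed certificate would need separate TLS handling.

```python
import urllib.error
import urllib.request

# HTTP URLs copied from the table above; Trino is omitted because its
# self-signed certificate needs separate TLS handling.
SERVICES = {
    "Keycloak": "http://localhost:50080/auth",
    "Lakekeeper UI": "http://localhost:50080/iceberg/ui",
    "Superset": "http://localhost:50080/workspace/facility_ops",
    "Marimo notebooks": "http://localhost:50080/marimo/",
}


def responds(url, timeout=5.0):
    """Return True if the server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied (e.g. 401 or 404), so the route is reachable.
        return True
    except (urllib.error.URLError, OSError):
        return False


def unreachable(services=SERVICES, probe=responds):
    """Return the names of services for which probe(url) is falsy."""
    return [name for name, url in services.items() if not probe(url)]
```

Calling unreachable() with no arguments probes the four URLs above and returns the names of any that did not answer. An empty list suggests the stack is up.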

Run your first pipeline

1. Ingest data (landing layer)

From the repository root, run an ingestion script:

cd warehouses/facility_ops_landing/ingest/accelerator/statusdisplay
python statusdisplay.py

Verify the data landed by logging into the Lakekeeper UI and checking that the facility_ops_landing warehouse contains tables in the accelerator_statusdisplay namespace.
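The same check can be scripted with PyIceberg instead of the UI. This is a sketch under several assumptions: that PyIceberg is installed in your environment, that the catalog accepts PyIceberg's standard REST properties (uri, warehouse, credential, oauth2-server-uri, scope), and that the values from the secrets.toml example above apply. rest_catalog_properties and list_landing_tables are hypothetical helpers, not part of this repository.

```python
def rest_catalog_properties(credentials):
    """Map the secrets.toml credential keys to PyIceberg REST catalog properties."""
    return {
        "type": "rest",
        "uri": credentials["uri"],
        "warehouse": credentials["warehouse"],
        "oauth2-server-uri": credentials["oauth2_server_uri"],
        # PyIceberg expects the client id and secret joined by a colon.
        "credential": f"{credentials['client_id']}:{credentials['client_secret']}",
        "scope": credentials["scope"],
    }


def list_landing_tables(credentials, namespace="accelerator_statusdisplay"):
    """Return the table identifiers in the given namespace."""
    # Imported here so the mapping above can be used without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("local", **rest_catalog_properties(credentials))
    return catalog.list_tables(namespace)
```

Pass list_landing_tables a dict with the same keys as the [destination.pyiceberg.credentials] table. A non-empty result suggests the ingestion script wrote to the namespace.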

Note: The table preview tab in the Lakekeeper UI is currently broken. See #329.

2. Transform data (analytics layer)

From the repository root, run the dbt models that depend on the ingested data:

cd warehouses/facility_ops/transform
uv pip install -r ./requirements/requirements.txt
dbt run --select '+models/marts/accelerator/cycles.sql'

You will see InsecureRequestWarning messages. These are expected locally because Trino uses a self-signed certificate.

3. Query the results

Log in to Superset, open SQL Lab from the SQL tab, and run:

SELECT * FROM facility_ops.analytics_accelerator.cycles

Next steps