Audience: This is the developer documentation for contributors to the platform. For user-facing documentation (Superset guides, data dictionaries, etc.), see the published site built from docs/.
An Apache Iceberg-based data lakehouse to support analytics for the facility. See Background for a high-level overview of a data lakehouse.
This repository is a monorepo containing all of the code for the platform. It may be split into separate repositories in the future.
.
├── docs/ # User-facing documentation site (MkDocs). See docs/src for content.
├── docs-devel/ # Developer documentation (this directory).
├── elt-common/ # Reusable Python package with common ELT helpers used by the warehouses
├── infra/
│ ├── ansible/ # Ansible playbooks/roles to deploy the system to the STFC (OpenStack) cloud.
│ ├── container-images/ # Container definitions for deployed services
│ └── local/ # docker-compose configuration for local development and end-to-end CI tests.
└── warehouses/ # One subdirectory per Lakekeeper warehouse. Each contains ELT code for that warehouse.
    ├── facility_ops_landing/ # Ingestion scripts (bronze layer)
    └── facility_ops/ # dbt transformation models (silver/gold layers)
New to the project? Start with the Getting Started guide to set up your local development environment.
- Getting started — local setup, first pipeline run
- System architecture — services, tools, ADRs
- Data architecture — medallion layout, catalogs
- ELT pipeline development — how to build and modify pipelines
- CI/CD and testing — running tests, CI workflows
- Deployment — cloud provisioning and service deployment
- Contributing — branching, PRs, code style