This repository contains a lightweight Python pipeline that downloads a curated set of pupil attendance releases from GOV.WALES, combines them into a tidy dataset, and produces static visual outputs.
The example is intentionally small and practical:
- it downloads four representative releases instead of scraping the full publication history
- it builds one combined CSV for inspection and reuse
- it creates a few static PNG charts that tell a clear story
The analysis focus is Bridgend County Borough attendance in the context of Wales-wide attendance trends, with a specific interest in primary-age patterns where the published data allows it.
The pipeline uses the attendance publication series at:
Included sources:
- the latest release workbook, used for Wales multi-year overall and sector history
- 2023/24 final release, used for Bridgend local-authority extraction
- 2024/25 final release, used for Bridgend local-authority extraction
- 2025/26 latest year-to-date release, used for Bridgend local-authority extraction and the latest ranking chart
The public ODS files provide:
- Wales attendance by sector, including primary schools
- local-authority attendance for all maintained schools
The public ODS files do not publish a Bridgend-level breakdown for primary schools. Older releases also differ materially in workbook structure, so not every publication can be read by the same parser. Because of that, this example uses:
- Bridgend local-authority attendance for all maintained schools
- Wales sector-level attendance trends, including primary schools
That keeps the example faithful to the public source rather than inferring a breakdown that is not published.
Install dependencies:

    pip install -r requirements.txt

Install the reusable diagram package in editable mode:

    pip install -e .

This repository contains a configuration of pre-commit hooks. These are language agnostic and focussed on repository security such as detection of passwords and API keys. If approaching this project as a developer, you are encouraged to install and enable pre-commit by running the following in your shell:

- Install pre-commit:

      pip install pre-commit

- Enable pre-commit:

      pre-commit install

Run the pipeline:

    python src/ingest_data.py

The script will:
- download the selected ODS files into the data directory as a local cache
- extract Wales overall, Wales sector, and local-authority attendance measures
- write a combined tidy CSV to the outputs directory
- generate static PNG charts
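The download-and-cache step listed above could look something like the following. This is a minimal sketch, not the code in src/ingest_data.py: the names `cached_download` and `fetch_bytes`, the rule for choosing the local file name, and the injectable `fetch` parameter are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: read the whole response body over HTTP(S)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def cached_download(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    """Return the local path for `url`, downloading only if it is not cached."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the last URL path segment is a usable file name. A real
    # pipeline may prefer an explicit name per release.
    target = cache_dir / url.rstrip("/").rsplit("/", 1)[-1]
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

Passing the fetcher as a parameter keeps the caching logic testable without network access; a second call for the same URL returns the cached file and does not fetch again.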
Generated files:
- outputs/attendance_metrics.csv
- outputs/bridgend_vs_wales_attendance.png
- outputs/wales_sector_attendance.png
- outputs/latest_local_authority_ranking.png
- outputs/attendance_pipeline.mmd
- outputs/attendance_pipeline_static.mmd
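For readers unfamiliar with the term, a tidy CSV holds one observation per row. The sketch below shows one way such rows might be built and written with the standard library. The column names in `FIELDS` and both helper functions are hypothetical, the real layout of outputs/attendance_metrics.csv may differ, and the values are placeholder zeros, not published figures.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column names; the real attendance_metrics.csv may differ.
FIELDS = ["academic_year", "geography", "sector", "measure", "value"]


def to_tidy_rows(
    academic_year: str,
    geography: str,
    sector: str,
    measures: Dict[str, float],
) -> List[dict]:
    """Turn one wide record ({measure: value}) into one row per measure."""
    return [
        {
            "academic_year": academic_year,
            "geography": geography,
            "sector": sector,
            "measure": name,
            "value": value,
        }
        for name, value in measures.items()
    ]


def write_tidy_csv(rows: List[dict], handle: TextIO) -> None:
    """Write the rows with a header line in the FIELDS column order."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder zeros for illustration only, not published figures.
rows = to_tidy_rows(
    "2023/24",
    "Bridgend",
    "All maintained schools",
    {"attendance_pct": 0.0, "absence_pct": 0.0},
)
buffer = io.StringIO()
write_tidy_csv(rows, buffer)
```

Keeping geography, sector, and measure as columns rather than as separate files means Wales-wide and local-authority rows can be filtered from the same table when charting.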
The pipeline now includes decorator-based Mermaid tracing.
- @mermaid_step(...) marks a function as a diagram node
- @mermaid_flow(...) traces a top-level run and writes a Mermaid flowchart file
This means process diagrams are generated from the code itself, so when decorated functions are added, removed, or rearranged, the flow diagram updates with the implementation.
The diagram tooling has been extracted into an installable package:
- package source: src/pipeline_diagrams/
- package name: pipeline-diagrams
- import path after installation: pipeline_diagrams
Example usage in another pipeline:

    from pathlib import Path

    from pipeline_diagrams import mermaid_flow, mermaid_step, write_static_flow_diagram


    # Each decorated function becomes a node in the diagram.
    @mermaid_step("Load data")
    def load_data() -> None:
        pass


    @mermaid_step("Transform data")
    def transform_data() -> None:
        load_data()


    # mermaid_flow traces this top-level run and writes the flowchart file.
    @mermaid_flow(Path("outputs/runtime_flow.mmd"), title="Runtime Pipeline Flow")
    @mermaid_step("Run pipeline")
    def main() -> None:
        transform_data()


    # The static diagram is built from source inspection, without running main().
    write_static_flow_diagram(
        source_root=Path("src"),
        output_path=Path("outputs/static_flow.mmd"),
        entrypoints=["main"],
        title="Static Pipeline Flow",
    )

Compatibility wrappers remain in src/flow_diagram.py and src/static_flow_diagram.py so the current attendance example continues to run without changes to its command-line entry point.
The repository also generates a static Mermaid diagram from Python source inspection.
- runtime diagram: outputs/attendance_pipeline.mmd
- static diagram: outputs/attendance_pipeline_static.mmd
The static version is built from the Python AST and follows direct function calls from the main entry point, so it can show possible structure without executing the full pipeline path.
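As a rough illustration of AST-based call discovery, the sketch below parses source text with the standard `ast` module, collects the plain function names each top-level function calls, and walks outward from an entry point. It is an assumption-laden simplification rather than this repository's implementation: `direct_calls`, `reachable_edges`, and `SAMPLE` are hypothetical, and attribute calls such as `obj.method()` are deliberately ignored.

```python
import ast
from typing import Dict, List, Set, Tuple


def direct_calls(source: str) -> Dict[str, List[str]]:
    """Map each top-level function name to the plain names it calls directly."""
    tree = ast.parse(source)
    calls: Dict[str, List[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names: List[str] = []
            for child in ast.walk(node):
                # Only simple calls like foo(); obj.foo() is skipped here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    if child.func.id not in names:
                        names.append(child.func.id)
            calls[node.name] = names
    return calls


def reachable_edges(
    calls: Dict[str, List[str]], entrypoint: str
) -> List[Tuple[str, str]]:
    """Walk from the entry point, keeping edges between known functions only."""
    edges: List[Tuple[str, str]] = []
    seen: Set[str] = set()
    pending = [entrypoint]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for callee in calls.get(current, []):
            if callee in calls:  # drop builtins and imported names
                edges.append((current, callee))
                pending.append(callee)
    return edges


SAMPLE = '''
def load_data():
    pass

def transform_data():
    load_data()

def main():
    transform_data()
    print("done")
'''

edges = reachable_edges(direct_calls(SAMPLE), "main")
mermaid = "\n".join(["flowchart TD"] + [f"    {a} --> {b}" for a, b in edges])
```

Nothing in `SAMPLE` is executed; the functions are only parsed. That is why a static diagram can include branches that a particular run never reaches.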

    flowchart TD
        A[GOV.WALES release pages] --> B[Download curated ODS files]
        B --> C[Parse Table_1, Table_2, Table_9]
        C --> D[Normalise tidy attendance dataset]
        D --> E[Combined CSV output]
        D --> F[Static PNG charts]

Run the tests with:

    python -m unittest discover -s tests

The code, unless otherwise stated, is released under the MIT Licence.
The documentation for this work is subject to Crown copyright and is available under the terms of the Open Government Licence v3.0.