Skip to content

feat(elt-pipelines): Add initial project with example pipeline#368

Merged
WHTaylor merged 10 commits into
mainfrom
321-pipelines-project
Jun 29, 2026
Merged

feat(elt-pipelines): Add initial project with example pipeline#368
WHTaylor merged 10 commits into
mainfrom
321-pipelines-project

Conversation

@WHTaylor

@WHTaylor WHTaylor commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

ref #321

Creates an elt-pipelines project with a statusdisplay pipeline, which uses elt-common to ingest data from the ISIS cycles endpoint. The pipeline can be run using the instructions from the README.

  • elt-common is currently included as a dependency in elt-pipelines using a relative path pointing at the package in the parent folder. This makes it easy to work on both locally, but means anything wanting to run pipelines needs both packages in its working directory; don't think it's a big deal, but maybe not ideal? Are we aiming to publish elt-common to PyPI so we can use it as a 'normal' dependency?
  • The new pipeline is ingesting into an elt_cycles table for testing purposes. Once we want to migrate to this pipeline in production, it should probably start ingesting into cycles instead, but because it's using a different schema from the current DLT pipeline we'll need to replace the table entirely, so there will be a bit of extra work needed at the time
  • Something that only just occurred to me - why is it called statusdisplay? Should it change to something like cycles or isiscycles?

Summary by CodeRabbit

  • New Features

    • Added a new pipeline that imports cycle and phase status data from an external status source.
    • Included handling for timestamp formatting and reliable loading of the fetched data.
  • Documentation

    • Added an initial README with setup instructions, project structure, and example pipeline usage.
  • Chores

    • Added project configuration and ignore rules for common Python development and build artefacts.
    • Expanded formatting coverage for the new pipeline area.

@WHTaylor WHTaylor requested a review from a team as a code owner June 24, 2026 14:19
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 283deb4f-b582-4288-8c6a-0e3591d77eba

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Introduces the elt-pipelines Python package: adds pyproject.toml with runtime/optional dependencies, a .gitignore, a README covering setup and directory conventions, extends the ruff-format pre-commit hook to cover the new package, and implements a first ETL extractor that fetches and normalises accelerator cycle/phase data from the ISIS status API.

Changes

elt-pipelines scaffold and statusdisplay extractor

Layer / File(s) Summary
Package config, tooling, and docs
elt-pipelines/pyproject.toml, elt-pipelines/.gitignore, .pre-commit-config.yaml, elt-pipelines/README.md
Creates pyproject.toml declaring elt-common, pydantic-settings, and an optional statusdisplay extra; adds .gitignore for Python artefacts; extends ruff-format pre-commit hook to include elt-pipelines; adds README with development setup, CLI usage, and directory structure documentation.
statusdisplay ETL extractor
elt-pipelines/facility_ops/ingest/accelerator/statusdisplay/statusdisplay.py
Defines Extract subclassing BaseExtract, wiring an elt_cycles resource in replace mode. Implements fetch() with 20 s timeout and HTTP error handling, reformat()/clean() for ISO-8601 → "%Y-%m-%d %H:%M:%S" normalisation of phase start/end fields, and NDJSON serialisation loaded via pyarrow.json.read_json from an in-memory buffer.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 A new pipeline hops into view,
With cycles and phases all shiny and new,
Fetch, reformat, and clean with delight,
Arrow reads JSON through the night,
The warehouse grows — what a sight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarises the addition of the new elt-pipelines project and its initial example pipeline.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
elt-pipelines/.gitignore (1)

1-8: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Ignore the local virtual environment directory.

The setup instructions create .venv, but it is not ignored here, so it can be accidentally committed.

Suggested patch
 # ignore basic python artifacts
 .env
+.venv/
 **/__pycache__/
 **/*.py[cod]
 **/*$py.class
 **/build/
 **/*.egg-info/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@elt-pipelines/.gitignore` around lines 1 - 8, The .gitignore entry set is
missing the local virtual environment directory, so update the ignore rules to
also exclude the project’s .venv folder alongside the existing Python artifacts.
Keep the change in the same ignore list near the other environment/build entries
so the setup-created virtual environment is not accidentally committed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@elt-pipelines/pipelines/ingest/accelerator/statusdisplay/statusdisplay.py`:
- Around line 33-38: The fetch() helper currently calls requests.get(CYCLES_URL)
without a timeout and only handles non-OK responses, so transport failures can
escape unhandled. Update fetch() to wrap the requests.get call in try/except
requests.RequestException, add a timeout to the request, and raise one
RuntimeError that includes the CYCLES_URL context for both request failures and
bad responses.
- Around line 26-30: The JSON loading path in statusdisplay’s fetch/read flow
needs an empty-input guard because pyarrow.json.read_json will fail on a
zero-length stream. Update the logic around clean(fetch()) and the yield
pyarrow.json.read_json(f) call to detect when no rows are returned and
immediately yield an empty table or return early instead of building and parsing
an empty buffer.

In `@elt-pipelines/README.md`:
- Around line 16-22: The setup instructions for the `uv sync` flow are
incomplete because `elt-pipelines` depends on the local sibling checkout of
`elt-common`. Update the README section that shows `uv venv`, `source
.venv/bin/activate`, and `uv sync` to explicitly tell users to clone or place
`elt-common` alongside `elt-pipelines` before running those commands, so the
editable local source can be resolved. Use the existing setup text in the README
and mention the dependency on `../elt-common` near the `uv sync` instructions.

---

Outside diff comments:
In `@elt-pipelines/.gitignore`:
- Around line 1-8: The .gitignore entry set is missing the local virtual
environment directory, so update the ignore rules to also exclude the project’s
.venv folder alongside the existing Python artifacts. Keep the change in the
same ignore list near the other environment/build entries so the setup-created
virtual environment is not accidentally committed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 91b01f08-3b41-40bf-bb15-c76f161ed326

📥 Commits

Reviewing files that changed from the base of the PR and between 857eba7 and 781e1d1.

⛔ Files ignored due to path filters (1)
  • elt-pipelines/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • elt-pipelines/.gitignore
  • elt-pipelines/README.md
  • elt-pipelines/pipelines/ingest/accelerator/statusdisplay/statusdisplay.py
  • elt-pipelines/pyproject.toml

Comment thread elt-pipelines/README.md
@martyngigg martyngigg self-assigned this Jun 25, 2026
@martyngigg

Copy link
Copy Markdown
Member

Thanks for this. I'll take a look at the code shortly but to answer the questions:

* `elt-common` is currently included as a dependency in `elt-pipelines` using a relative path pointing at the package in the parent folder. This makes it easy to work on both locally, but means anything wanting to run pipelines needs both packages in its working directory; don't think it's a big deal, but maybe not ideal? Are we aiming to publish `elt-common` to PyPI so we can use it as a 'normal' dependency?

I think for the pipelines here that's fine, at least for now. I'd been mostly hoping to avoid publishing to PyPI if possible as I wasn't really aiming to create a general purpose package for all of the world. In that case the naming is then more challenging. Given you can install with pip from a git url, including a versioned one, then I thought that was a fine way to go. It's pure Python so installing this way should be as easy as a package from PyPi.

* The new pipeline is ingesting into an `elt_cycles` table for testing purposes. Once we want to migrate to this pipeline in production, it should probably start ingesting into `cycles` instead, but because it's using a different schema from the current DLT pipeline we'll need to replace the table entirely, so there will be a bit of extra work needed at the time

Makes sense!

* Something that only just occurred to me - why is it called `statusdisplay`? Should it change to something like `cycles` or `isiscycles`?

I was naming things after the system they came from and the source of the cycles is the system that supports the isis status display. Happy to change if it's found to be too confusing - I'm not particularly wedded to it. It's emphasizing the need for more documentation around this though!

WHTaylor added 2 commits June 25, 2026 12:20
For some reason this didn't pick up when running the linter manually, but did fail in CI https://github.com/ISISNeutronMuon/analytics-data-platform/actions/runs/28166532207/job/83419815751. Making a purposefully incorrect change was enough to trigger the formatter
@WHTaylor

Copy link
Copy Markdown
Contributor Author

ref the last two commits, I was poking around the pre-commit set up (because the markdown-lint step is slightly annoyingly slow) and saw that the ruff linter/formatter needed to be specifically set up to include the new directory. It then didn't pick up the already existing incorrect formatting on commit because the file hadn't changed.

Comment thread elt-pipelines/pipelines/ingest/accelerator/statusdisplay/statusdisplay.py Outdated
Comment thread elt-pipelines/pyproject.toml
WHTaylor added 3 commits June 25, 2026 17:10
This is technically the only use of pydantic-settings in the project, so it could be removed as a dependency. However, any pipeline that wants to include custom configuration will need to extend BaseSettings, so I think we should leave the dependency as is.
Comment thread elt-pipelines/README.md
Comment thread elt-pipelines/README.md Outdated
@WHTaylor WHTaylor merged commit da6cf38 into main Jun 29, 2026
2 checks passed
@WHTaylor WHTaylor deleted the 321-pipelines-project branch June 29, 2026 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants