feat: Gtfs change tracker 1634#1737

Open
jcpitre wants to merge 22 commits into main from gtfs_change_tracker-1634

@jcpitre jcpitre commented Jun 11, 2026

Collaborator

closes #1634

This pull request introduces a new GTFS change tracking feature to the pipeline. It adds a new Cloud Function (gtfs-change-tracker) that computes and stores structured diffs between pairs of GTFS datasets, integrates this function into the batch processing workflow, and updates the infrastructure to support the new service. The changes span Python code, requirements, and Terraform infrastructure.
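To make "structured diff" concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes each GTFS file is a CSV table with a primary-key column, and it reports which keys were added, removed, or modified between a base and a new version. The helper names `index_rows` and `diff_table` and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List


def index_rows(csv_text: str, key: str) -> Dict[str, dict]:
    """Parse a GTFS-style CSV table and index its rows by a primary-key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_table(base_csv: str, new_csv: str, key: str) -> Dict[str, List[str]]:
    """Return the keys added, removed, and modified between two versions of a table."""
    base = index_rows(base_csv, key)
    new = index_rows(new_csv, key)
    return {
        "added": sorted(new.keys() - base.keys()),
        "removed": sorted(base.keys() - new.keys()),
        # A row counts as modified when the key exists in both versions
        # but at least one column value differs.
        "modified": sorted(k for k in base.keys() & new.keys() if base[k] != new[k]),
    }


# Hypothetical sample data standing in for two versions of stops.txt.
BASE_STOPS = "stop_id,stop_name\nS1,Main St\nS2,Oak Ave\n"
NEW_STOPS = "stop_id,stop_name\nS1,Main Street\nS3,Pine Rd\n"

stops_diff = diff_table(BASE_STOPS, NEW_STOPS, "stop_id")
```

In this sample, `S3` is added, `S2` is removed, and `S1` is modified because its name changed. The actual function may well use a different diff representation.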

GTFS Change Tracking Feature:

  • Added a new Cloud Function gtfs-change-tracker that computes diffs between two GTFS datasets, uploads the changelog to GCS, and persists a summary in the database. This includes the main logic (src/main.py), configuration (function_config.json), and dependencies (requirements.txt, requirements_dev.txt). [1] [2] [3] [4]

  • Implemented a utility function create_http_gtfs_change_tracker_task to enqueue Cloud Tasks for the new function, and integrated it into the pipeline so that a change tracking task is created when a new dataset is processed and a previous dataset exists. [1] [2] [3]
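The gating rule described above (enqueue a task only when a previous dataset exists) can be sketched as follows. This is an illustration under assumptions: `build_change_tracker_task_body` is a hypothetical helper, not the signature of `create_http_gtfs_change_tracker_task`, and the payload field names are taken from the curl example in the Testing section.

```python
import json
from typing import Optional


def build_change_tracker_task_body(
    feed_stable_id: str,
    new_dataset_stable_id: str,
    previous_dataset_stable_id: Optional[str],
) -> Optional[bytes]:
    """Build the JSON body for a change tracking task, or None when there is nothing to compare."""
    # Without a previous dataset there is no base to diff against,
    # so no task should be created.
    if not previous_dataset_stable_id:
        return None
    payload = {
        "feed_stable_id": feed_stable_id,
        "base_dataset_stable_id": previous_dataset_stable_id,
        "new_dataset_stable_id": new_dataset_stable_id,
    }
    return json.dumps(payload).encode("utf-8")


# First dataset of a feed: no previous dataset, so no task body.
first_import = build_change_tracker_task_body("ntd-90200", "ntd-90200-202604081703", None)

# Later dataset: the previous dataset becomes the base of the comparison.
later_import = build_change_tracker_task_body(
    "ntd-90200", "ntd-90200-202605211420", "ntd-90200-202604081703"
)
```

Keeping the body construction separate from the Cloud Tasks client call makes the gating rule easy to unit test without any GCP dependency.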

Infrastructure and Deployment:

  • Updated Terraform to provision the new Cloud Function, deploy its code, create a dedicated Cloud Tasks queue, and set up required IAM permissions. This ensures the function is properly deployed and can be invoked by the batch processor. [1] [2] [3] [4] [5] [6] [7]

Testing

Testing was done with only one dataset. Here is the curl call used:

 curl -X POST "https://northamerica-northeast1-mobility-feeds-dev.cloudfunctions.net/gtfs-change-tracker-dev" \
   -H "Authorization: bearer $(gcloud auth print-identity-token)" \
   -H "Content-Type: application/json" \
   -d '{
     "feed_stable_id": "ntd-90200",
     "base_dataset_stable_id": "ntd-90200-202604081703",
     "new_dataset_stable_id": "ntd-90200-202605211420"
}'
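For reviewers who prefer to script the same call, here is a rough Python equivalent using only the standard library. It builds the request without sending it. `build_request` is a hypothetical helper, and the identity token is a placeholder: in practice it would come from `gcloud auth print-identity-token`, as in the curl call.

```python
import json
import urllib.request

# URL copied from the curl example above.
FUNCTION_URL = (
    "https://northamerica-northeast1-mobility-feeds-dev"
    ".cloudfunctions.net/gtfs-change-tracker-dev"
)


def build_request(
    payload: dict, identity_token: str, url: str = FUNCTION_URL
) -> urllib.request.Request:
    """Build an authenticated JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {identity_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    {
        "feed_stable_id": "ntd-90200",
        "base_dataset_stable_id": "ntd-90200-202604081703",
        "new_dataset_stable_id": "ntd-90200-202605211420",
    },
    identity_token="PLACEHOLDER_TOKEN",  # placeholder, not a real token
)

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```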

The resulting JSON diff report is:
ntd-90200-202605211420_ntd-90200-202604081703_changelog.json
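The file name above suggests a naming pattern of new dataset ID, then base dataset ID, then a `_changelog.json` suffix. The sketch below reproduces that pattern. It is inferred from this single example only, so the actual naming logic in the function may differ, and `changelog_object_name` is a hypothetical helper.

```python
def changelog_object_name(new_dataset_stable_id: str, base_dataset_stable_id: str) -> str:
    """Build a changelog file name following the pattern seen in the test run."""
    # Assumed order: new dataset first, base dataset second.
    return f"{new_dataset_stable_id}_{base_dataset_stable_id}_changelog.json"


name = changelog_object_name("ntd-90200-202605211420", "ntd-90200-202604081703")
```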

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

jcpitre changed the title from "Gtfs change tracker 1634" to "feat: Gtfs change tracker 1634" on Jun 11, 2026
Comment thread functions-python/gtfs_datasets_comparer/src/main.py
Comment thread infra/functions-python/main.tf Outdated
Comment thread functions-python/gtfs_datasets_comparer/src/main.py
Comment thread functions-python/gtfs_change_tracker/README.md Outdated
Comment thread infra/functions-python/main.tf Outdated
# google_cloudfunctions2_function does not expose volume mounts in its schema.
# This terraform_data resource mounts both the datasets GCS bucket and an in-memory tmpfs
# on the underlying Cloud Run service after the function is deployed.
resource "terraform_data" "gtfs_change_tracker_gcs_mount" {

Member

🥇

Comment thread functions-python/gtfs_change_tracker/src/main.py Outdated

Development

Successfully merging this pull request may close these issues.

[Backend] Generate and upload changelog JSON when a new dataset version is detected

4 participants