[Backend] Generate and upload changelog JSON when a new dataset version is detected #1634

@cka-y

Describe the problem

There is no persistent artifact capturing what changed between two consecutive datasets of a given feed, and no automated process to generate and store one.

Proposed solution

Create a new Cloud Function gtfs-change-tracker (in functions-python/gtfs_change_tracker/) that orchestrates change tracking for MobilityData feeds. It is HTTP-triggered and handles orchestration via Cloud Tasks.

Responsibilities:

  1. Receive feed_id, previous_dataset_id, current_dataset_id
  2. Resolve GCS URLs for both datasets
  3. Call the gtfs_diff module with those URLs
  4. Upload the resulting changelog JSON to GCS at:
    <dataset_bucket>/<feed_stable_id>/changelogs/<previous_dataset_id>_<current_dataset_id>_changelog.json
  5. Write one row to gtfs_dataset_changelog with the GCS URL
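The five steps above could be sketched roughly as follows. This is only an illustration, not the proposed implementation: `parse_request`, `changelog_blob_path`, `track_changes` and the four injected callables (`resolve_dataset_url`, `compute_diff`, `upload_json`, `record_changelog`) are hypothetical names, `compute_diff` stands in for whatever entry point gtfs_diff ends up exposing, and the sketch assumes the received feed_id is the feed's stable ID used in the GCS path.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class ChangeTrackingRequest:
    feed_id: str  # assumed here to be the feed's stable ID
    previous_dataset_id: str
    current_dataset_id: str


def parse_request(payload: Mapping[str, Any]) -> ChangeTrackingRequest:
    """Step 1: validate the three required fields of the request body."""
    fields = ("feed_id", "previous_dataset_id", "current_dataset_id")
    missing = [name for name in fields if not payload.get(name)]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    return ChangeTrackingRequest(**{name: str(payload[name]) for name in fields})


def changelog_blob_path(
    feed_stable_id: str, previous_dataset_id: str, current_dataset_id: str
) -> str:
    """Step 4: object path inside the dataset bucket."""
    return (
        f"{feed_stable_id}/changelogs/"
        f"{previous_dataset_id}_{current_dataset_id}_changelog.json"
    )


def track_changes(
    request: ChangeTrackingRequest,
    resolve_dataset_url: Callable[[str], str],  # hypothetical: dataset ID -> GCS URL
    compute_diff: Callable[[str, str], dict],  # hypothetical stand-in for gtfs_diff
    upload_json: Callable[[str, str], str],  # hypothetical: (path, body) -> GCS URL
    record_changelog: Callable[[ChangeTrackingRequest, str], None],  # hypothetical DB write
) -> str:
    """Run steps 2 to 5 and return the GCS URL of the uploaded changelog."""
    # Step 2: resolve GCS URLs for both datasets.
    previous_url = resolve_dataset_url(request.previous_dataset_id)
    current_url = resolve_dataset_url(request.current_dataset_id)

    # Step 3: delegate the generic diffing to the injected diff function.
    changelog = compute_diff(previous_url, current_url)

    # Step 4: upload the changelog JSON under the feed's changelogs/ prefix.
    path = changelog_blob_path(
        request.feed_id, request.previous_dataset_id, request.current_dataset_id
    )
    changelog_url = upload_json(path, json.dumps(changelog, sort_keys=True))

    # Step 5: record one row pointing at the uploaded file.
    record_changelog(request, changelog_url)
    return changelog_url
```

Passing the storage, diff and database operations in as callables keeps the MobilityData-specific wiring in this function while leaving the diff logic swappable, and it lets the flow be unit-tested with fakes instead of real GCS or database access.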

This function is intentionally MobilityData-specific. All generic diffing logic lives in gtfs_diff (see Issue 2a).

Alternatives considered

  • Inline the diff logic here: rejected in favour of keeping gtfs_diff reusable and portable.
