Describe the problem
There is no persistent artifact capturing what changed between two subsequent datasets for a given feed, and no automated process to generate and store it.
Proposed solution
Create a new Cloud Function gtfs-change-tracker (in functions-python/gtfs_change_tracker/) that orchestrates change tracking for MobilityData feeds. The function is HTTP-triggered and is invoked via Cloud Tasks.
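As a rough illustration of the Cloud Tasks invocation path, an upstream process could enqueue an HTTP task targeting the function. This is a hedged sketch: the payload keys, function URL, and the `build_task` helper are assumptions for illustration, not part of the actual codebase.

```python
import json


def build_task(function_url, feed_id, previous_dataset_id, current_dataset_id):
    """Build a Cloud Tasks HTTP task body targeting gtfs-change-tracker.

    The payload shape mirrors the parameters listed in this issue; the
    helper itself is hypothetical.
    """
    payload = {
        "feed_id": feed_id,
        "previous_dataset_id": previous_dataset_id,
        "current_dataset_id": current_dataset_id,
    }
    return {
        "http_request": {
            "http_method": "POST",
            "url": function_url,
            "headers": {"Content-Type": "application/json"},
            # Cloud Tasks expects the HTTP body as bytes.
            "body": json.dumps(payload).encode(),
        }
    }
```

With the google-cloud-tasks client, such a task dict would then be submitted via `CloudTasksClient().create_task(parent=queue_path, task=task)`.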
Responsibilities:
- Receive feed_id, previous_dataset_id and current_dataset_id
- Resolve GCS URLs for both datasets
- Call the gtfs_diff module with those URLs
- Upload the resulting changelog JSON to GCS at:
  <dataset_bucket>/<feed_stable_id>/changelogs/<previous_dataset_id>_<current_dataset_id>_changelog.json
- Write one row to gtfs_dataset_changelog with the GCS URL
This function is intentionally MobilityData-specific. All generic diffing logic lives in gtfs_diff (see Issue 2a).
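The responsibilities above could be sketched roughly as follows. This is a minimal outline under stated assumptions: the helper names (`resolve_gcs_url`, `run_gtfs_diff`, `upload_to_gcs`, `insert_changelog_row`, `feed_stable_id`) and the `helpers` bundle are hypothetical, standing in for the real GCS, gtfs_diff, and database interfaces.

```python
def build_changelog_path(feed_stable_id, previous_dataset_id, current_dataset_id):
    """GCS object path for the changelog, per the layout described above."""
    return (
        f"{feed_stable_id}/changelogs/"
        f"{previous_dataset_id}_{current_dataset_id}_changelog.json"
    )


def track_changes(request, helpers):
    """HTTP entry point; `helpers` bundles the (hypothetical) I/O functions."""
    payload = request.get_json()

    # Resolve GCS URLs for both datasets.
    prev_url = helpers.resolve_gcs_url(payload["previous_dataset_id"])
    curr_url = helpers.resolve_gcs_url(payload["current_dataset_id"])

    # All generic diffing logic stays in the gtfs_diff module.
    changelog = helpers.run_gtfs_diff(prev_url, curr_url)

    # Upload the changelog JSON to the documented GCS location.
    path = build_changelog_path(
        helpers.feed_stable_id(payload["feed_id"]),
        payload["previous_dataset_id"],
        payload["current_dataset_id"],
    )
    gcs_url = helpers.upload_to_gcs(path, changelog)

    # Record one gtfs_dataset_changelog row pointing at the uploaded file.
    helpers.insert_changelog_row(payload["feed_id"], gcs_url)
    return {"changelog_url": gcs_url}, 200
```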
Alternatives considered
- Inline the diff logic here: rejected in favour of keeping gtfs_diff reusable and portable.