
[Backend] Generic GTFS Diff Engine #1637

@cka-y

Description

Describe the problem

There is no reusable, infrastructure-agnostic module for computing the difference between two GTFS datasets. Any diff logic built inside mobility-feed-api risks being tightly coupled to MobilityData-specific infrastructure (feed IDs, GCS paths, database schemas), preventing reuse across projects or publication as a standalone community tool.

Proposed solution

Create a standalone GitHub repository containing a generic gtfs_diff Python package published to PyPI. The package takes two GTFS dataset sources (URLs or local paths) and produces a structured changelog of added, removed, and modified rows per GTFS file. It must have no knowledge of MobilityData-specific concepts (no feed IDs, no GCS paths, no database).
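A minimal sketch of the source-loading step, assuming a source is either an http(s) URL to a zipped feed, a local zip path, or an unzipped feed directory. `load_gtfs` and its `{file name: list of row dicts}` return shape are hypothetical illustrations, not an existing gtfs_diff API. It uses only the standard library, which keeps the package free of infrastructure dependencies.

```python
import csv
import io
import os
import urllib.request
import zipfile
from typing import Dict, List

# Hypothetical in-memory shape: one list of row dicts per GTFS file.
Table = List[Dict[str, str]]


def _read_bytes(source: str) -> bytes:
    """Fetch raw zip bytes from an http(s) URL or a local file path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return response.read()
    with open(source, "rb") as fh:
        return fh.read()


def load_gtfs(source: str) -> Dict[str, Table]:
    """Return {file name: rows} for every .txt file in a GTFS zip or directory."""
    tables: Dict[str, Table] = {}
    if os.path.isdir(source):
        # Unzipped feed: parse each .txt file in the directory.
        for name in sorted(os.listdir(source)):
            if name.endswith(".txt"):
                path = os.path.join(source, name)
                # utf-8-sig strips a leading byte-order mark if one is present.
                with open(path, newline="", encoding="utf-8-sig") as fh:
                    tables[name] = list(csv.DictReader(fh))
        return tables
    with zipfile.ZipFile(io.BytesIO(_read_bytes(source))) as archive:
        for name in archive.namelist():
            if name.endswith(".txt"):
                text = archive.read(name).decode("utf-8-sig")
                # Key by base name so feeds zipped inside a folder still match.
                tables[os.path.basename(name)] = list(csv.DictReader(io.StringIO(text)))
    return tables
```

Loading everything into memory keeps the sketch short; large files such as stop_times.txt may call for streaming in a real implementation.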

PyPI publishing: automated via a GitHub Actions workflow triggered by pushing tags matching v*, using pypa/gh-action-pypi-publish with a PYPI_API_TOKEN repository secret.

Acceptance criteria:

  • GtfsDiff.compute() correctly identifies added, removed, and modified rows across all supported GTFS files
  • Unit tests cover: identical feeds (empty diff), one empty feed, added rows, removed rows, modified rows, and an unsupported file being skipped gracefully
  • Package is installable via pip install <package-name> and importable as gtfs_diff
  • PyPI publish workflow triggers on version tag push
  • README.md includes install instructions and a usage example
  • Zero references to MobilityData-specific infrastructure in any module
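The added/removed/modified classification above can be sketched as a keyed comparison: index each file's rows by its primary key, then take set differences (added, removed) and the intersection (candidates for modified). This is a minimal sketch, assuming both feeds are already parsed into `{file name: list of row dicts}`; `diff_tables`, `FileDiff`, and the `PRIMARY_KEYS` subset are hypothetical names, not the package's actual API (the issue only names `GtfsDiff.compute()`). Files without a known key are skipped, matching the "unsupported file gracefully skipped" criterion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical subset of GTFS primary keys; a real package would cover every supported file.
PRIMARY_KEYS: Dict[str, Tuple[str, ...]] = {
    "agency.txt": ("agency_id",),
    "routes.txt": ("route_id",),
    "stops.txt": ("stop_id",),
    "trips.txt": ("trip_id",),
    "stop_times.txt": ("trip_id", "stop_sequence"),
    "calendar.txt": ("service_id",),
    "calendar_dates.txt": ("service_id", "date"),
}


@dataclass
class FileDiff:
    added: List[Row] = field(default_factory=list)
    removed: List[Row] = field(default_factory=list)
    modified: List[Tuple[Row, Row]] = field(default_factory=list)  # (old row, new row)


def _index(rows: List[Row], key_cols: Tuple[str, ...]) -> Dict[Tuple[str, ...], Row]:
    # Later rows with a duplicate key overwrite earlier ones.
    return {tuple(row.get(col, "") for col in key_cols): row for row in rows}


def diff_tables(old: Dict[str, List[Row]], new: Dict[str, List[Row]]) -> Dict[str, FileDiff]:
    """Return {file name: FileDiff} for every supported file that changed."""
    changelog: Dict[str, FileDiff] = {}
    for name in sorted(set(old) | set(new)):
        key_cols = PRIMARY_KEYS.get(name)
        if key_cols is None:
            continue  # unsupported file: skip instead of raising
        before = _index(old.get(name, []), key_cols)
        after = _index(new.get(name, []), key_cols)
        result = FileDiff(
            added=[after[k] for k in sorted(after.keys() - before.keys())],
            removed=[before[k] for k in sorted(before.keys() - after.keys())],
            modified=[(before[k], after[k])
                      for k in sorted(before.keys() & after.keys())
                      if before[k] != after[k]],
        )
        if result.added or result.removed or result.modified:
            changelog[name] = result
    return changelog
```

Rows are compared as whole dicts, so adding a new optional column to a file marks every row in it as modified; comparing only the columns shared by both versions is one way a real implementation could avoid that noise.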

Alternatives considered

  • Inline the diff logic in gtfs-change-tracker: this creates tight coupling and prevents reuse as a standalone package.
  • Keep the module inside mobility-feed-api: this still couples the logic to MobilityData infrastructure and makes PyPI publication awkward.

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests