Implementation: Generic GTFS Diff Engine #2

@cka-y

Describe the problem

There is no reusable, infrastructure-agnostic Python module for computing the difference between two GTFS datasets. Any diff logic built inside mobility-feed-api risks being tightly coupled to MobilityData-specific concepts (feed IDs, GCS paths, database schemas), preventing reuse across projects or publication as a standalone community tool.

Proposed solution

Create a generic gtfs_diff Python package in a standalone GitHub repository. The package takes two GTFS dataset sources (URLs or local paths) and produces a structured changelog. It must have zero knowledge of MobilityData-specific concepts (no feed IDs, no GCS paths, no database).
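A minimal sketch of the row-level comparison such a package might perform for a single GTFS file, assuming rows are matched on a primary key. The names `FileDiff`, `index_rows`, and `diff_rows` are hypothetical and are not the package's actual API; the real `GtfsDiff.compute()` signature and changelog format are not defined in this issue.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class FileDiff:
    """Row-level changes for one GTFS file (hypothetical structure)."""
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    modified: list = field(default_factory=list)  # (old_row, new_row) pairs


def index_rows(csv_text, key_columns):
    """Parse CSV text and map each row's key tuple to the row dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[c] for c in key_columns): row for row in reader}


def diff_rows(old_text, new_text, key_columns):
    """Compare two versions of one file, matching rows on key_columns."""
    old = index_rows(old_text, key_columns)
    new = index_rows(new_text, key_columns)
    result = FileDiff()
    for key, row in new.items():
        if key not in old:
            result.added.append(row)
        elif old[key] != row:
            result.modified.append((old[key], row))
    for key, row in old.items():
        if key not in new:
            result.removed.append(row)
    return result


# Illustrative data only: two tiny versions of stops.txt.
old_stops = "stop_id,stop_name\nS1,Main St\nS2,Oak Ave\n"
new_stops = "stop_id,stop_name\nS1,Main Street\nS3,Pine Rd\n"
changes = diff_rows(old_stops, new_stops, key_columns=["stop_id"])
```

Working on plain CSV text keeps the comparison independent of where the datasets come from (URL, local path, or anything else), which is the decoupling this issue asks for.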

Acceptance criteria:

  • GtfsDiff.compute() correctly identifies added, removed, and modified rows across all supported GTFS files
  • Unit tests cover: identical feeds (empty diff), one feed empty, added rows, removed rows, modified rows, and an unsupported file that is skipped gracefully
  • Package is importable locally after pip install -e .
  • README.md includes install instructions and a usage example
  • Zero references to MobilityData-specific infrastructure in any module
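One way the "unsupported file gracefully skipped" criterion could be met is sketched below, assuming each dataset is a zip archive and the package keeps a table of supported files. `SUPPORTED_FILES`, `load_supported_tables`, and `make_archive` are hypothetical names, and the two listed files are only an illustrative subset, not the package's actual supported set.

```python
import csv
import io
import zipfile

# Hypothetical subset of supported files mapped to their key columns.
SUPPORTED_FILES = {
    "stops.txt": ["stop_id"],
    "routes.txt": ["route_id"],
}


def load_supported_tables(archive_bytes):
    """Return (tables, skipped): parsed rows per supported file, plus skipped names."""
    tables = {}
    skipped = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name not in SUPPORTED_FILES:
                # Record the unsupported file instead of raising an error.
                skipped.append(name)
                continue
            # utf-8-sig strips a leading byte-order mark if one is present.
            text = archive.read(name).decode("utf-8-sig")
            tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables, skipped


def make_archive(files):
    """Build an in-memory zip from {file_name: text}, for tests only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Illustrative data only: one supported file and one unsupported file.
feed = make_archive({
    "stops.txt": "stop_id,stop_name\nS1,Main St\n",
    "notes.txt": "free text\n",
})
tables, skipped = load_supported_tables(feed)
```

Building archives in memory, as `make_archive` does here, would let the unit tests listed above run without fixture files or network access.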

Alternatives considered

  • Inline the diff logic inside gtfs-change-tracker: creates tight coupling, prevents reuse as a standalone package.
  • Keep the module inside mobility-feed-api: still couples the logic to MobilityData infrastructure and makes PyPI publication awkward.
