feat: refactor latest dataset as a relationship of gtfsfeed #1405

Merged
davidgamez merged 10 commits into main from feat/refactor_latest_dataset
Oct 20, 2025

@davidgamez davidgamez commented Oct 16, 2025

Summary:
Closes #1057

This PR improves the performance of all queries by adding a SQL relationship that defines the latest dataset. This avoids loading each feed's full list of datasets, which is growing exponentially in the production environment.

From our AI friend

This pull request refactors how the "latest" GTFS dataset is tracked and accessed throughout the codebase. Instead of relying on a boolean latest flag within the Gtfsdataset model, the code now uses a direct reference (latest_dataset or latest_dataset_id) on the Gtfsfeed model. This change improves data integrity and simplifies queries and object mappings. The update affects database queries, model mappings, test data setup, and unit tests.

Database model and query changes:

  • Refactored queries to use Gtfsfeed.latest_dataset_id for identifying the latest dataset, removing reliance on the Gtfsdataset.latest flag. This includes updates in get_gtfs_feed_datasets, get_gtfs_feeds_query, and get_all_gtfs_feeds to join or load the latest dataset via the feed's reference rather than filtering by a boolean.

  • Updated ORM eager loading to use joinedload(Gtfsfeed.latest_dataset) instead of contains_eager(Gtfsfeed.gtfsdatasets) to directly load the latest dataset.
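
The query change described above can be sketched in plain SQL. This is a minimal illustration using Python's built-in sqlite3 module, not the project's SQLAlchemy code: the table layout is a simplified assumption, and the helper names build_demo_db and feeds_with_latest_dataset are hypothetical. It shows that joining on the feed's latest_dataset_id returns at most one dataset per feed, however many datasets the feed has accumulated.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gtfsdataset (
            id TEXT PRIMARY KEY,
            feed_id TEXT NOT NULL,
            downloaded_at TEXT NOT NULL
        );
        -- The feed points at its latest dataset; datasets carry no "latest" flag.
        CREATE TABLE gtfsfeed (
            id TEXT PRIMARY KEY,
            latest_dataset_id TEXT REFERENCES gtfsdataset (id)
        );
        """
    )
    conn.execute("INSERT INTO gtfsfeed (id) VALUES ('feed-1'), ('feed-2')")
    conn.executemany(
        "INSERT INTO gtfsdataset (id, feed_id, downloaded_at) VALUES (?, ?, ?)",
        [
            ("ds-1", "feed-1", "2025-01-01"),
            ("ds-2", "feed-1", "2025-06-01"),
            ("ds-3", "feed-1", "2025-10-01"),
        ],
    )
    # Only feed-1 has datasets; feed-2 keeps a NULL reference.
    conn.execute("UPDATE gtfsfeed SET latest_dataset_id = 'ds-3' WHERE id = 'feed-1'")
    return conn


def feeds_with_latest_dataset(conn: sqlite3.Connection) -> list:
    """Return one row per feed, joined to its latest dataset if it has one."""
    return conn.execute(
        """
        SELECT f.id, d.id, d.downloaded_at
        FROM gtfsfeed AS f
        LEFT JOIN gtfsdataset AS d ON d.id = f.latest_dataset_id
        ORDER BY f.id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in feeds_with_latest_dataset(build_demo_db()):
        print(row)
```

The LEFT JOIN keeps feeds that have no dataset yet, and the older datasets of feed-1 never enter the result set.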

Model and mapping changes:

  • Modified the GtfsFeedImpl.from_orm method to set latest_dataset using the feed's latest_dataset reference, and updated mapping of visualization_dataset_id.
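
As a rough illustration of the mapping change, the sketch below reads the latest dataset from the feed's own reference instead of scanning a list of datasets for a flag. All classes here (DatasetRow, FeedRow, LatestDatasetView, FeedView) are hypothetical stand-ins built with dataclasses; they are not the project's ORM models or the real GtfsFeedImpl, and the visualization_dataset_id mapping is left out because its details are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRow:
    """Hypothetical stand-in for a dataset ORM row."""
    stable_id: str
    hosted_url: str


@dataclass
class FeedRow:
    """Hypothetical stand-in for a feed ORM row with a direct latest_dataset reference."""
    stable_id: str
    latest_dataset: Optional[DatasetRow] = None


@dataclass
class LatestDatasetView:
    """Hypothetical API-facing view of the latest dataset."""
    id: str
    hosted_url: str


@dataclass
class FeedView:
    """Hypothetical API-facing view of a feed."""
    id: str
    latest_dataset: Optional[LatestDatasetView]

    @classmethod
    def from_orm(cls, feed: Optional[FeedRow]) -> Optional["FeedView"]:
        if feed is None:
            return None
        # Use the feed's direct reference; no scan over all datasets is needed.
        latest = feed.latest_dataset
        view = LatestDatasetView(latest.stable_id, latest.hosted_url) if latest else None
        return cls(id=feed.stable_id, latest_dataset=view)


if __name__ == "__main__":
    feed = FeedRow("feed-1", DatasetRow("ds-3", "https://example.com/ds-3.zip"))
    print(FeedView.from_orm(feed))
    print(FeedView.from_orm(FeedRow("feed-2")))
```

A feed without any dataset maps to a view whose latest_dataset is None, which is the edge case the refactored tests would need to cover.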

Test and fixture updates:

  • Updated test data creation and population scripts to set the latest dataset via latest_dataset or latest_dataset_id on the feed, and removed the latest flag from dataset objects.

  • Refactored unit tests to check for the latest dataset using the new reference, and simplified test object construction for edge cases.

Other minor improvements:

  • Improved logging statements for clarity and consistency in batch_process_dataset/src/main.py.
  • Removed unused imports for code cleanliness.

Expected behavior:

The API and batch functions are expected to behave as before; no new features are added as part of this PR.

Testing tips:

Tested in DEV:

  • API endpoint
  • batch_datasets
  • batch_process_dataset
  • export_csv
  • process_validation_report
  • reverse_geolocation
  • update_feed_status
  • task_executor
  • rebuild_missing_dataset
  • update_json_files_precission

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@davidgamez davidgamez changed the title from "Feat/refactor latest dataset" to "feat: refactor latest dataset as a relationship of gtfsfeed" Oct 20, 2025
@davidgamez davidgamez marked this pull request as ready for review October 20, 2025 15:35
@davidgamez davidgamez requested a review from cka-y October 20, 2025 15:35

@cka-y cka-y left a comment

LGTM

@davidgamez davidgamez merged commit 9c6daad into main Oct 20, 2025
10 of 12 checks passed
@davidgamez davidgamez deleted the feat/refactor_latest_dataset branch October 20, 2025 15:45

Development

Successfully merging this pull request may close these issues.

Add latest dataset as a relation entity
