feat: refactor latest dataset as a relationship of gtfsfeed#1405
Merged
davidgamez merged 10 commits into main on Oct 20, 2025
This reverts commit 5556c67.
Summary:
Closes #1057
This PR enhances the performance of all queries by adding a SQL relationship that defines the latest dataset. This avoids loading the full list of datasets, which grows exponentially in the production environment.
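As a rough illustration of the idea, the sketch below uses Python's built-in `sqlite3` module with a simplified, hypothetical schema. Only the names `gtfsfeed`, `gtfsdataset`, and `latest_dataset_id` come from this PR; the other columns, the sample IDs, and the `latest_dataset` helper are assumptions made for the example and are not the project's actual code.

```python
import sqlite3

# Hypothetical, simplified schema: only the table names and latest_dataset_id
# are taken from the PR text; the remaining columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gtfsfeed (
        id TEXT PRIMARY KEY,
        latest_dataset_id TEXT REFERENCES gtfsdataset(id)
    );
    CREATE TABLE gtfsdataset (
        id TEXT PRIMARY KEY,
        feed_id TEXT NOT NULL REFERENCES gtfsfeed(id),
        downloaded_at TEXT NOT NULL
    );
    """
)

# One feed with several historical datasets (made-up sample data).
conn.execute("INSERT INTO gtfsfeed (id) VALUES ('feed-1')")
conn.executemany(
    "INSERT INTO gtfsdataset (id, feed_id, downloaded_at) VALUES (?, ?, ?)",
    [
        ("ds-1", "feed-1", "2025-01-01"),
        ("ds-2", "feed-1", "2025-06-01"),
        ("ds-3", "feed-1", "2025-10-01"),
    ],
)
# The feed points directly at its latest dataset.
conn.execute("UPDATE gtfsfeed SET latest_dataset_id = 'ds-3' WHERE id = 'feed-1'")


def latest_dataset(connection, feed_id):
    """Return (dataset_id, downloaded_at) for the feed's latest dataset, or None."""
    return connection.execute(
        """
        SELECT d.id, d.downloaded_at
        FROM gtfsfeed AS f
        JOIN gtfsdataset AS d ON d.id = f.latest_dataset_id
        WHERE f.id = ?
        """,
        (feed_id,),
    ).fetchone()


row = latest_dataset(conn, "feed-1")
```

The join touches a single dataset row per feed through the feed's reference, so the query does not need to scan or filter the feed's dataset history.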
From our AI friend
This pull request refactors how the "latest" GTFS dataset is tracked and accessed throughout the codebase. Instead of relying on a boolean `latest` flag within the `Gtfsdataset` model, the code now uses a direct reference (`latest_dataset` or `latest_dataset_id`) on the `Gtfsfeed` model. This change improves data integrity and simplifies queries and object mappings. The update affects database queries, model mappings, test data setup, and unit tests.

Database model and query changes:
- Refactored queries to use `Gtfsfeed.latest_dataset_id` for identifying the latest dataset, removing reliance on the `Gtfsdataset.latest` flag. This includes updates in `get_gtfs_feed_datasets`, `get_gtfs_feeds_query`, and `get_all_gtfs_feeds` to join or load the latest dataset via the feed's reference rather than filtering by a boolean. [1] [2] [3] [4] [5]
- Updated ORM eager loading to use `joinedload(Gtfsfeed.latest_dataset)` instead of `contains_eager(Gtfsfeed.gtfsdatasets)` to directly load the latest dataset. [1] [2] [3]

Model and mapping changes:
- Updated the `GtfsFeedImpl.from_orm` method to set `latest_dataset` using the feed's `latest_dataset` reference, and updated mapping of `visualization_dataset_id`.

Test and fixture updates:
- Updated test data creation and population scripts to set the latest dataset via `latest_dataset` or `latest_dataset_id` on the feed, and removed the `latest` flag from dataset objects. [1] [2] [3] [4] [5] [6] [7]
- Refactored unit tests to check for the latest dataset using the new reference, and simplified test object construction for edge cases. [1] [2] [3]

Other minor improvements:
- `batch_process_dataset/src/main.py`. [1] [2] [3] [4] [5]

Expected behavior:
The API and the batch function are expected to behave as before; no new features are added as part of this PR.
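One way to picture "behaves as before" is a small equivalence check between the old flag-based lookup and the new reference-based lookup. The sketch below uses plain dataclasses as hypothetical stand-ins; `Dataset`, `Feed`, `latest_via_flag`, and `latest_via_reference` are illustrative names, not the project's ORM models or functions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    # Stand-in for a dataset row; `latest` models the old boolean flag.
    id: str
    latest: bool = False


@dataclass
class Feed:
    # Stand-in for a feed; `latest_dataset` models the new direct reference.
    id: str
    datasets: List[Dataset] = field(default_factory=list)
    latest_dataset: Optional[Dataset] = None


def latest_via_flag(feed: Feed) -> Optional[Dataset]:
    """Old approach: scan every dataset of the feed for the flagged one."""
    return next((d for d in feed.datasets if d.latest), None)


def latest_via_reference(feed: Feed) -> Optional[Dataset]:
    """New approach: follow the feed's direct reference."""
    return feed.latest_dataset


old, new = Dataset("ds-1"), Dataset("ds-2", latest=True)
feed = Feed("feed-1", datasets=[old, new], latest_dataset=new)
empty_feed = Feed("feed-2")  # edge case: a feed without datasets

same_result = latest_via_flag(feed) is latest_via_reference(feed)
```

Both lookups resolve to the same dataset, and both return `None` for a feed without datasets, but only the flag-based version has to walk the whole dataset list.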
Testing tips:
Tested in DEV:
Please make sure these boxes are checked before submitting your pull request - thanks!
Run `./scripts/api-tests.sh` to make sure you didn't break anything