feat: add feed availability task and scheduler #1705
Draft
davidgamez wants to merge 13 commits into
Summary:
This PR adds the cloud function and scheduler that check and persist GTFS feed availability. It expands the scope of the issue to include a quick ZIP content check. Initially, the check was intended to perform only HTTP HEAD requests, but while testing in DEV I realized that some servers (~160) don't support HEAD requests. As a workaround, the check executes a HEAD request and, if that fails, falls back to a GET request.
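For reviewers, the HEAD-then-GET fallback with a ZIP check can be sketched as follows. This is a minimal illustration, not the code in this PR: `check_availability`, `looks_like_zip`, the `Fetch` callable, and the shape of the returned dict are all hypothetical names chosen for the example, and the transport is injected so the logic can run without a network.

```python
from typing import Callable, Tuple

# Byte signatures that can open a ZIP archive (regular, empty, spanned).
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")

# Hypothetical transport: (method, url) -> (status_code, first_bytes_of_body).
Fetch = Callable[[str, str], Tuple[int, bytes]]


def looks_like_zip(first_bytes: bytes) -> bool:
    """Return True if the payload starts with a known ZIP signature."""
    return first_bytes.startswith(ZIP_MAGIC)


def check_availability(url: str, fetch: Fetch) -> dict:
    """Try HEAD first; if it errors or is not 2xx, retry with GET."""
    try:
        status, _ = fetch("HEAD", url)
        if 200 <= status < 300:
            # HEAD returns no body, so the ZIP check cannot run here.
            return {"available": True, "method": "HEAD",
                    "status": status, "is_zip": None}
    except OSError:
        pass  # Treat a failed HEAD like an unsupported one and fall back.

    try:
        status, first_bytes = fetch("GET", url)
    except OSError as exc:
        return {"available": False, "method": "GET",
                "status": None, "is_zip": None, "error": str(exc)}

    ok = 200 <= status < 300
    return {"available": ok, "method": "GET", "status": status,
            "is_zip": looks_like_zip(first_bytes) if ok else None}
```

In this sketch the ZIP check only runs on the GET path, because a HEAD response carries no body to inspect. Whether the actual implementation makes the same choice is not stated in this description.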
From our AI friend
This pull request introduces a new feature for checking the availability of GTFS feeds via HTTP HEAD/GET requests, along with several supporting refactors and documentation updates. The main changes include implementing robust HTTP request logic for feed checks, refactoring SSL context and HTTP header handling, and adding a new task handler and documentation for this feature.
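As a rough illustration of the SSL context handling mentioned in the summary (legacy servers, optional disabling of certificate checks), a standard-library sketch might look like this. The function name `make_ssl_context` and its parameters are hypothetical and are not the signature of the PR's `create_feed_ssl_context`.

```python
import ssl


def make_ssl_context(verify: bool = True,
                     allow_legacy: bool = False) -> ssl.SSLContext:
    """Build a client SSL context with optional relaxations (illustrative)."""
    ctx = ssl.create_default_context()
    if allow_legacy:
        # Permit servers without secure renegotiation support. The constant
        # is exposed as ssl.OP_LEGACY_SERVER_CONNECT on Python 3.12+;
        # 0x4 is the corresponding OpenSSL option value.
        ctx.options |= getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)
    if not verify:
        # check_hostname must be disabled before verify_mode is relaxed.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Disabling verification removes protection against man-in-the-middle attacks, so in a sketch like this it would make sense only as a per-feed opt-in for known problematic servers.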
New GTFS Feed Availability Check Feature:
- `perform_request` and supporting functions in `utils.py` for checking GTFS feed availability using HTTP HEAD requests, with optional GET fallback and ZIP file detection via magic bytes. This includes robust error handling and content-type inference.
- `check_gtfs_feed_availability_handler`, registered in the main task executor, enabling the new feed availability check to be triggered as a task.
- Documentation for the `check_gtfs_feed_availability` task, its parameters, and the expected response format, including verbose error reporting.
Refactoring and Helper Improvements:
- `create_feed_ssl_context` function, with improved handling for legacy servers and optional disabling of certificate checks for problematic feeds.
- `build_feed_request_params`, supporting per-feed header overrides and multiple authentication schemes.
Dependency and Import Updates:
- `time`, `urllib3.exceptions`, and `timezone` imports to support the new HTTP request and datetime logic.
Expected behavior:
The GTFS availability is persisted in the DB.
Testing tips:
Internal team: can be tested via Retool in the dev environment.
Please make sure these boxes are checked before submitting your pull request - thanks!
Run `./scripts/api-tests.sh` to make sure you didn't break anything