feat: add support for timezones - 946#952
Conversation
liquibase/changes/feat_946.sql
Outdated
| ALTER TABLE gtfsdataset | ||
| ADD COLUMN service_date_range_timezone TIMESTAMP WITH TIME ZONE No newline at end of file |
There was a problem hiding this comment.
We can call the column agency_timezone since it applies to the entire feed and it should be of type VARCHAR(255) since the values will be ex: America/Montreal or Pacific/Fiji
We need to alter the columns service_date_range_start and service_date_range_end to be TIMESTAMP WITH TIME ZONE instead of DATE
There was a problem hiding this comment.
Altering the service_date_range_start and service_date_range_end columns caused a liquibase.exception.DatabaseException due to a dependency on the feedsearch materialized view (_RETURN rule). Since modifying the column type directly isn’t possible, we discussed on Slack and decided to add a new column instead.
|
Is this PR ready for review or are we planning to add code changes to it as discussed yesterday? |
Good call, it is not ready for review just yet, we are going to make the changes effected by the schema change here |
Got it! Converted to draft to indicate to reviewers that it isn't ready |
d8be5be to
1ef7f0e
Compare
1ef7f0e to
faf0ee8
Compare
|
Preview Firebase Hosting URL: https://mobility-feeds-dev--pr-952-jvn7l2fz.web.app |
docs/DatabaseCatalogAPI.yaml
Outdated
| example: 2023-07-10T06:00:00Z | ||
| format: date-time | ||
| service_date_range_end: | ||
| description: The start date of the service date range for the dataset. |
There was a problem hiding this comment.
[question] should we specify in the descriptions of service_date_range_[start | end] that it's in UTC time?
There was a problem hiding this comment.
yeah I think that would be good info
| formatted_service_start_date = None | ||
| formatted_service_end_date = None |
There was a problem hiding this comment.
[nit] declared but unused variables. can be safely deleted
There was a problem hiding this comment.
these variables are used!
There was a problem hiding this comment.
sorry i misspoke -- since the variables are use but their initialization to None is not, it's redundant initialization and it can be removed from these lines (124-125). This is just a nit! Feel free to ignore it 😄
| dataset.service_date_range_end = extracted_service_end_date | ||
| # If timezone is None (not found in the validation report), it will use UTC as base. | ||
| # It will not save the timezone in the database if it is None. | ||
| formatting_timezone = extracted_timezone |
There was a problem hiding this comment.
[suggestion] add a timezone validation to fallback on UTC if it's invalid. Based on the current logic, if the timezone is defined but invalid, it will throw an exception and no service dates will be added for this dataset.
| refresh_materialized_view(session, t_feedsearch.name) | ||
| session.commit() |
There was a problem hiding this comment.
You should commit before you refresh the materialized view otherwise the changes won't take effect until the next refresh
There was a problem hiding this comment.
I referenced this from functions-python/extract_location/src/reverse_geolocation/location_extractor.py should it be changed there as well?
...n/backfill_dataset_service_date_range/tests/test_backfill_dataset_service_date_range_main.py
Show resolved
Hide resolved
|
|
||
| summary = json_report.get("summary", {}) | ||
| if isinstance(summary.get("agencies"), list) and summary["agencies"]: | ||
| extracted_timezone = summary["agencies"][0].get("timezone", None) |
| refresh_materialized_view(session, t_feedsearch.name) | ||
| session.commit() |
There was a problem hiding this comment.
same comment about commit before refresh
| formatted_service_start_date = datetime.strptime( | ||
| extracted_service_start_date, "%Y-%m-%d" | ||
| ) |
There was a problem hiding this comment.
Could you create a function for this code to avoid repetition?
| if isinstance(summary.get("agencies"), list) and summary["agencies"]: | ||
| extracted_timezone = summary["agencies"][0].get("timezone", None) | ||
|
|
||
| formatted_service_start_date = None |
There was a problem hiding this comment.
Should we just continue the loop if extracted_service_start_date or extracted_service_end_date are None? This will avoid a ValueErrror in case there is no service date in the feed.
| json_data.get("summary", {}) | ||
| .get("feedInfo", {}) | ||
| .get("feedServiceWindowEnd", None) | ||
| extracted_service_end_date = summary.get("feedInfo", {}).get( |
There was a problem hiding this comment.
[suggestion]: We can use here the newly added get_nested_value function to account for empty strings(trimmed). The same for the extracted time zone value.
| local_service_start_date = formatted_service_start_date.replace( | ||
| hour=0, minute=0, tzinfo=ZoneInfo(formatting_timezone) | ||
| ) | ||
| utc_service_start_date = local_service_start_date.astimezone( | ||
| ZoneInfo("UTC") |
There was a problem hiding this comment.
Could you create a function to avoid repetition?
700156c to
78709f7
Compare
…ge_start and service_date_range_end
0ae8191 to
12ca886
Compare
Summary:
Closes #946 #947 #948
Expected behavior:
the schema in gtfsdataset supports timezone for service date range values
Also includes timezone_agency in the dataset / search endpoints (API)
Updated the functions to set data / compare service date range with timezones (UTC)
Testing tips:
For backfill function
For update feed status
For process validation report
Please make sure these boxes are checked before submitting your pull request - thanks!
./scripts/api-tests.shto make sure you didn't break anythingExample of the final result of backfill function or process_validation_report function

Example of the new endpoint return:

When the agency timezone exists it sets the utc offset accordingly
