Skip to content

feat: add support for timezones - 946#952

Merged
Alessandro100 merged 30 commits intomainfrom
937-add-support-for-timezones
Mar 25, 2025
Merged

feat: add support for timezones - 946#952
Alessandro100 merged 30 commits intomainfrom
937-add-support-for-timezones

Conversation

@qcdyx
Copy link
Copy Markdown
Contributor

@qcdyx qcdyx commented Mar 6, 2025

Summary:

Closes #946 #947 #948

Expected behavior:

the schema in gtfsdataset supports timezone for service date range values

Also includes timezone_agency in the dataset / search endpoints (API)

Updated the functions to set data / compare service date range with timezones (UTC)

  • backfill_dataset_service_date_range: backfills timezone if available
  • update_feeds_status
  • process_validation_report

Testing tips:

For backfill function

  • delete the service date range for a dataset validated in 6.0 and assure it has a validation report with timezone, run the function and see the service date range with timezone saved

For update feed status

  • manually update a feed status to wrong, run the script and see the updated correct status. Assure the feeds latest dataset has service date range or it wont update

For process validation report

  • Find a feed, delete its corresponding validation report, and run the function. Should generate the validation report and fill the service date range + timezone if exists

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification(https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

Example of the final result of backfill function or process_validation_report function
Screenshot 2025-03-11 at 16 17 03

Example of the new endpoint return:
image

When the agency timezone exists it sets the utc offset accordingly
image

@qcdyx qcdyx linked an issue Mar 6, 2025 that may be closed by this pull request
@qcdyx qcdyx marked this pull request as draft March 6, 2025 16:54
@qcdyx qcdyx marked this pull request as ready for review March 6, 2025 19:59
@qcdyx qcdyx requested a review from Alessandro100 March 10, 2025 12:43
Comment on lines +1 to +2
ALTER TABLE gtfsdataset
ADD COLUMN service_date_range_timezone TIMESTAMP WITH TIME ZONE No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call the column agency_timezone since it applies to the entire feed and it should be of type VARCHAR(255) since the values will be ex: America/Montreal or Pacific/Fiji

We need to alter the columns service_date_range_start and service_date_range_end to be TIMESTAMP WITH TIME ZONE instead of DATE

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Altering the service_date_range_start and service_date_range_end columns caused a liquibase.exception.DatabaseException due to a dependency on the feedsearch materialized view (_RETURN rule). Since modifying the column type directly isn’t possible, we discussed on Slack and decided to add a new column instead.

@qcdyx qcdyx requested a review from Alessandro100 March 10, 2025 14:34
@cka-y
Copy link
Copy Markdown
Contributor

cka-y commented Mar 11, 2025

Is this PR ready for review or are we planning to add code changes to it as discussed yesterday?

@Alessandro100
Copy link
Copy Markdown
Contributor

Is this PR ready for review or are we planning to add code changes to it as discussed yesterday?

Good call, it is not ready for review just yet, we are going to make the changes effected by the schema change here

@cka-y cka-y marked this pull request as draft March 11, 2025 12:33
@cka-y
Copy link
Copy Markdown
Contributor

cka-y commented Mar 11, 2025

Is this PR ready for review or are we planning to add code changes to it as discussed yesterday?

Good call, it is not ready for review just yet, we are going to make the changes effected by the schema change here

Got it! Converted to draft to indicate to reviewers that it isn't ready

@Alessandro100 Alessandro100 force-pushed the 937-add-support-for-timezones branch 3 times, most recently from d8be5be to 1ef7f0e Compare March 12, 2025 13:23
@Alessandro100 Alessandro100 marked this pull request as ready for review March 12, 2025 13:31
@Alessandro100 Alessandro100 force-pushed the 937-add-support-for-timezones branch from 1ef7f0e to faf0ee8 Compare March 12, 2025 13:32
@emmambd emmambd linked an issue Mar 12, 2025 that may be closed by this pull request
@github-actions
Copy link
Copy Markdown

Preview Firebase Hosting URL: https://mobility-feeds-dev--pr-952-jvn7l2fz.web.app

@emmambd emmambd linked an issue Mar 12, 2025 that may be closed by this pull request
example: 2023-07-10T06:00:00Z
format: date-time
service_date_range_end:
description: The start date of the service date range for the dataset.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] should we specify in the descriptions of service_date_range_[start | end] that it's in UTC time?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think that would be good info

Comment on lines +124 to +125
formatted_service_start_date = None
formatted_service_end_date = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] declared but unused variables. can be safely deleted

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these variables are used!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i misspoke -- since the variables are use but their initialization to None is not, it's redundant initialization and it can be removed from these lines (124-125). This is just a nit! Feel free to ignore it 😄

dataset.service_date_range_end = extracted_service_end_date
# If timezone is None (not found in the validation report), it will use UTC as base.
# It will not save the timezone in the database if it is None.
formatting_timezone = extracted_timezone
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] add a timezone validation to fallback on UTC if it's invalid. Based on the current logic, if the timezone is defined but invalid, it will throw an exception and no service dates will be added for this dataset.

Comment on lines 189 to 197
refresh_materialized_view(session, t_feedsearch.name)
session.commit()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should commit before you refresh the materialized view otherwise the changes won't take effect until the next refresh

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I referenced this from functions-python/extract_location/src/reverse_geolocation/location_extractor.py should it be changed there as well?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the entire folder as part of #985


summary = json_report.get("summary", {})
if isinstance(summary.get("agencies"), list) and summary["agencies"]:
extracted_timezone = summary["agencies"][0].get("timezone", None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 67 to 68
refresh_materialized_view(session, t_feedsearch.name)
session.commit()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about commit before refresh

Comment on lines +127 to +129
formatted_service_start_date = datetime.strptime(
extracted_service_start_date, "%Y-%m-%d"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create a function for this code to avoid repetition?

if isinstance(summary.get("agencies"), list) and summary["agencies"]:
extracted_timezone = summary["agencies"][0].get("timezone", None)

formatted_service_start_date = None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just continue the loop if extracted_service_start_date or extracted_service_end_date are None? This will avoid a ValueErrror in case there is no service date in the feed.

json_data.get("summary", {})
.get("feedInfo", {})
.get("feedServiceWindowEnd", None)
extracted_service_end_date = summary.get("feedInfo", {}).get(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion]: We can use here the newly added get_nested_value function to account for empty strings(trimmed). The same for the extracted time zone value.

Comment on lines +162 to +167
local_service_start_date = formatted_service_start_date.replace(
hour=0, minute=0, tzinfo=ZoneInfo(formatting_timezone)
)
utc_service_start_date = local_service_start_date.astimezone(
ZoneInfo("UTC")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create a function to avoid repetition?

@Alessandro100 Alessandro100 force-pushed the 937-add-support-for-timezones branch from 700156c to 78709f7 Compare March 12, 2025 20:36
@Alessandro100 Alessandro100 force-pushed the 937-add-support-for-timezones branch from 0ae8191 to 12ca886 Compare March 17, 2025 15:39
Copy link
Copy Markdown
Contributor

@cka-y cka-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Alessandro100 Alessandro100 merged commit 904cfeb into main Mar 25, 2025
15 of 17 checks passed
@Alessandro100 Alessandro100 deleted the 937-add-support-for-timezones branch March 25, 2025 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants