
Optimize revalidation of feed detail pages #67

@Alessandro100

Description


In the web app, we expose an endpoint, 'api/revalidate', that manually invalidates the server-side cache when called. With this endpoint you can invalidate the cache for:

  • all feeds
  • GBFS feeds
  • GTFS feeds
  • GTFS-RT feeds
  • specific feeds (by ID)

End goal: refresh the server-side cache of feed pages whenever the underlying feeds change.
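The invalidation options above map naturally onto a small request-body builder. This is a minimal sketch, assuming a JSON body with a "type" field and, for specific feeds, a "feedIds" list; those field names, the scope values, and the helper `build_revalidate_body` are all hypothetical and should be aligned with whatever the 'api/revalidate' handler actually parses.

```python
import json
from enum import Enum
from typing import Optional, Sequence


class RevalidateScope(Enum):
    # Hypothetical scope values mirroring the options listed above;
    # the real endpoint may use different names.
    ALL_FEEDS = "all"
    GBFS_FEEDS = "gbfs"
    GTFS_FEEDS = "gtfs"
    GTFS_RT_FEEDS = "gtfs_rt"
    SPECIFIC_FEEDS = "specific_feeds"


def build_revalidate_body(
    scope: RevalidateScope, feed_ids: Optional[Sequence[str]] = None
) -> str:
    """Return a JSON body for a POST to api/revalidate (assumed body shape)."""
    body = {"type": scope.value}
    if scope is RevalidateScope.SPECIFIC_FEEDS:
        if not feed_ids:
            raise ValueError("feed_ids is required when revalidating specific feeds")
        # Deduplicate and sort so identical requests produce identical bodies.
        body["feedIds"] = sorted(set(feed_ids))
    elif feed_ids:
        raise ValueError("feed_ids is only valid with SPECIFIC_FEEDS")
    return json.dumps(body)
```

For example, `build_revalidate_body(RevalidateScope.SPECIFIC_FEEDS, ["mdb-5", "mdb-2", "mdb-5"])` yields a body listing "mdb-2" and "mdb-5" once each, which keeps repeated calls for the same feeds byte-identical.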

Current scenario
A Vercel cron job runs once a day and invalidates the entire cache.

Ideal scenario
We should invalidate only the cache of feeds that have been updated, as precisely as possible. For example, if feeds mdb-2, mdb-5, and mdb-1992 were updated overnight, only those three should be invalidated.
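One way to get that precision is to fingerprint each feed record before and after an update and revalidate only the IDs whose fingerprint differs. The sketch below assumes feed records can be loaded as plain dicts keyed by feed ID (for instance before and after populate-db.sh runs); `fingerprint` and `changed_feed_ids` are hypothetical helpers, and treating removed feeds as "changed" is a design choice so their pages stop serving stale content.

```python
import hashlib
import json
from typing import Dict, List, Mapping


def fingerprint(record: Mapping) -> str:
    """Stable SHA-256 of a feed record; key order does not affect the result."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_feed_ids(
    before: Mapping[str, Mapping], after: Mapping[str, Mapping]
) -> List[str]:
    """Return IDs of feeds that were added, removed, or modified between snapshots."""
    before_fp: Dict[str, str] = {fid: fingerprint(rec) for fid, rec in before.items()}
    after_fp: Dict[str, str] = {fid: fingerprint(rec) for fid, rec in after.items()}
    # A missing key yields None, so additions and removals also count as changes.
    changed = {
        fid
        for fid in before_fp.keys() | after_fp.keys()
        if before_fp.get(fid) != after_fp.get(fid)
    }
    return sorted(changed)
```

With the overnight example, only mdb-2, mdb-5, and mdb-1992 would come back, and that list is what gets sent as the "specific feeds" invalidation.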

Tasks:

  • The revalidation endpoint is implemented here. The revalidation URL is http://mobilitydatabase.org/api/revalidate (POST), with a header containing the secret. The secret will be defined in 1Password and needs to be added to GCP using Terraform.
  • For GBFS:
    • Revalidate on a schedule: 4am UTC Monday to Saturday and 7am UTC on Sunday. This is handled by a Vercel cron job; no backend work is required.
  • For GTFS-RT:
    • Revalidate when the feed is updated in the mobility-feed-catalog repository, which triggers the populate-db.sh script. After the script runs, only the updated feeds should be revalidated.
    • Revalidate when location extraction completes for the schedule feed associated with the realtime feed. This involves the reverse geolocation logic in functions-python/reverse_geolocation/src/reverse_geolocation_processor.py
  • For GTFS: this part requires deduplication because multiple processes can run back to back, yet each feed should be revalidated only once (suggested implementation: Cloud Tasks with a task ID, so duplicate tasks are rejected within a given time frame):
    • Revalidate when the feed is updated in the mobility-feed-catalog repository, which triggers the populate-db.sh script. After the script runs, only the updated feeds should be revalidated.
    • Revalidate when location extraction completes for the GTFS feed. This involves the reverse geolocation logic in functions-python/reverse_geolocation/src/reverse_geolocation_processor.py
    • Revalidate when the data syncs with external data sources run (functions-python/tasks_executor/src/tasks/data_import):
      • JBDA import
      • TDG import
      • TLD import
    • Revalidate when PMTiles generation finishes (functions-python/pmtiles_builder)
    • Revalidate when the validation report has been generated (functions-python/update_validation_report)
    • Revalidate when the feed has a new dataset, i.e. its hash changes (functions-python/batch_process_dataset)
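The deduplication suggested for GTFS relies on Cloud Tasks rejecting a task whose name is already in use, so every process that finishes for the same feed within the same time window should derive the same task ID. The sketch below shows one way to compute that ID and to prepare the POST that the task would ultimately perform. The 10-minute window, the ID layout, the JSON body shape, and the header name "x-revalidate-secret" are all assumptions; only the URL comes from this issue.

```python
import hashlib
import json
import urllib.request
from typing import Sequence

REVALIDATE_URL = "http://mobilitydatabase.org/api/revalidate"


def revalidation_task_id(
    feed_id: str, epoch_seconds: float, window_seconds: int = 600
) -> str:
    """Deterministic task ID for one feed within one time window (hypothetical scheme).

    populate-db.sh, reverse geolocation, pmtiles_builder, etc. finishing for
    the same feed inside the same window all derive the same ID, so only the
    first task creation succeeds and later attempts are treated as duplicates.
    """
    bucket = int(epoch_seconds // window_seconds)
    key = f"{feed_id}:{bucket}"
    # Hashed prefix rather than a sequential one, to spread task names out.
    prefix = hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-revalidate-{feed_id}-{bucket}"


def build_revalidate_request(
    feed_ids: Sequence[str], secret: str, url: str = REVALIDATE_URL
) -> urllib.request.Request:
    """Prepare (but do not send) the POST that revalidates specific feeds."""
    body = json.dumps(
        {"type": "specific_feeds", "feedIds": sorted(set(feed_ids))}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical header name; use whatever the endpoint actually checks.
            "x-revalidate-secret": secret,
        },
        method="POST",
    )
```

A worker would send the prepared request with `urllib.request.urlopen(req)` once the task fires. One caveat of fixed windows: two events that straddle a window boundary still produce two tasks, which is usually acceptable since the cost is one extra revalidation. Scheduling each task to run at the end of its window also lets back-to-back processes settle before the page is rebuilt.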

Metadata


Labels

performance (Improves the speed or efficiency of the application)

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
