You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am looking for guidance before attempting an implementation.
In time-scheduled DW/DataLake DAGs, data_interval_start and data_interval_end are often still the primary processing contract. Asset-triggered DAGs are useful, but switching a heavy batch consumer to schedule=[Asset(...)] changes the run semantics because asset-triggered runs intentionally do not have a logical date or data interval.
The use case I am trying to model is:
keep the consumer DAG time-scheduled, for example @daily
keep SQL/business logic based on the consumer's data interval
wait for an upstream Airflow asset partition, for example orders / dt=2026-05-21
avoid coupling the consumer to the producer DAG's exact schedule via ExternalTaskSensor(execution_date_fn=...)
Today I can solve this with provider-specific sensors, such as S3, Hive, BigQuery, or Databricks partition sensors, or with ExternalTaskSensor. But I could not find an Airflow-native task-level way to wait for an AssetEvent / asset partition recorded in Airflow metadata.
the consumer DAG is still created by its time schedule
the sensor waits for an Airflow asset event matching asset + partition_key
the consumer can continue using its own data_interval_start / data_interval_end
the producer DAG can be scheduled, manual, backfilled, or otherwise changed, as long as it emits the expected asset partition event
This is not intended to replace asset-triggered scheduling, and I am not proposing to derive data_interval for asset-triggered DAG runs. I am trying to clarify whether there is room for a time-scheduled DAG to wait on asset partition readiness as a task-level dependency.
This seems related to the problem discussed in #55489, where users need a common date/partition concept across scheduled, triggered, and asset-aware workflows. The maintainer comments there also point toward asset partitions / dag_run.partition_key as the more general direction.
Questions:
Is this gap already intended to be covered by asset partitions in Airflow 3.2+?
If not, would a task-level asset partition readiness sensor fit Airflow's direction?
Should such a feature live in the standard provider, core, or only be documented as a pattern using provider-specific sensors?
What semantics would be acceptable: latest event exists, any event exists, source-run-aware matching, or something else?
If this direction makes sense, I would be interested in helping with a small scoped follow-up, likely starting with docs or a design issue before any implementation PR.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
I am looking for guidance before attempting an implementation.
In time-scheduled DW/DataLake DAGs,
data_interval_startanddata_interval_endare often still the primary processing contract. Asset-triggered DAGs are useful, but switching a heavy batch consumer toschedule=[Asset(...)]changes the run semantics because asset-triggered runs intentionally do not have a logical date or data interval.The use case I am trying to model is:
@dailyorders / dt=2026-05-21ExternalTaskSensor(execution_date_fn=...)Today I can solve this with provider-specific sensors, such as S3, Hive, BigQuery, or Databricks partition sensors, or with
ExternalTaskSensor. But I could not find an Airflow-native task-level way to wait for anAssetEvent/ asset partition recorded in Airflow metadata.A concrete shape might look like this:
The intended semantics would be:
asset + partition_keydata_interval_start/data_interval_endThis is not intended to replace asset-triggered scheduling, and I am not proposing to derive
data_intervalfor asset-triggered DAG runs. I am trying to clarify whether there is room for a time-scheduled DAG to wait on asset partition readiness as a task-level dependency.This seems related to the problem discussed in #55489, where users need a common date/partition concept across scheduled, triggered, and asset-aware workflows. The maintainer comments there also point toward asset partitions /
dag_run.partition_keyas the more general direction.Questions:
If this direction makes sense, I would be interested in helping with a small scoped follow-up, likely starting with docs or a design issue before any implementation PR.
Beta Was this translation helpful? Give feedback.
All reactions