Problem
DataDesigner's DAG executes every column for every row unconditionally. In multi-stage synthesis pipelines, expensive downstream generation (LLM calls, segmentation, etc.) runs even when an earlier gate column indicates the row should be filtered out.
Today the only workarounds are:
- Generate all columns unconditionally and post-filter, wasting LLM calls on rows that will be discarded
- Split into multiple DataDesigner.create() calls with intermediate filtering, losing single-pipeline ergonomics
Proposed Feature
Add a skip_when field to column configs that accepts a Jinja2 expression. When the expression evaluates truthy for a row, generation is skipped and the cell is set to None. Skips should auto-propagate through the DAG — downstream columns that depend on a skipped column also skip without requiring explicit configuration.
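The intended semantics can be sketched in a few lines of plain Python. This is only an illustration of the proposal, not DataDesigner's implementation: the `Column` dataclass, the `run_row` helper, and the `depends_on` field are hypothetical names, and the gate is written as a Python callable standing in for the Jinja2 expression. Columns are assumed to arrive in topological order.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Row = dict[str, Any]


@dataclass
class Column:
    """Hypothetical stand-in for a column config (not the DataDesigner API)."""
    name: str
    generate: Callable[[Row], Any]
    depends_on: tuple[str, ...] = ()
    # Stand-in for the proposed Jinja2 skip_when expression.
    skip_when: Optional[Callable[[Row], bool]] = None


def run_row(columns: list[Column]) -> tuple[Row, set[str]]:
    """Generate one row; `columns` must already be in topological order."""
    row: Row = {}
    skipped: set[str] = set()
    for col in columns:
        # A skip propagates: any skipped dependency skips this column too.
        upstream_skipped = any(dep in skipped for dep in col.depends_on)
        # Only evaluate the gate when all upstream values are present.
        gated = (
            not upstream_skipped
            and col.skip_when is not None
            and bool(col.skip_when(row))
        )
        if upstream_skipped or gated:
            row[col.name] = None  # skipped cells are set to None
            skipped.add(col.name)
            continue
        row[col.name] = col.generate(row)
    return row, skipped


# Record which generators actually ran (i.e. which "LLM calls" were paid for).
calls: list[str] = []


def fake_generator(name: str, value: Any) -> Callable[[Row], Any]:
    def generate(row: Row) -> Any:
        calls.append(name)
        return value
    return generate


columns = [
    Column("complexity_score",
           fake_generator("complexity_score", {"overall_complexity_score": 4})),
    Column("categories",
           fake_generator("categories", ["a", "b"]),
           depends_on=("complexity_score",),
           skip_when=lambda r: r["complexity_score"]["overall_complexity_score"] < 6),
    # Assumed dependency chain for illustration only.
    Column("instances", fake_generator("instances", ["x"]),
           depends_on=("categories",)),
    Column("multi_hop_query", fake_generator("multi_hop_query", "q"),
           depends_on=("instances",)),
]

row, skipped = run_row(columns)
```

In this sketch only the `complexity_score` generator runs for a low-scoring row; `categories` is skipped by its gate, and `instances` and `multi_hop_query` are skipped by propagation without declaring gates of their own.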
Example Use Case
config_builder.add_column(
    name="complexity_score", column_type="llm-structured", ...
)
config_builder.add_column(
    name="categories",
    column_type="llm-structured",
    skip_when="{{ complexity_score.overall_complexity_score < 6 }}",
    ...
)
# Everything downstream of categories auto-skips; no extra config needed
config_builder.add_column(name="instances", ...)
config_builder.add_column(name="multi_hop_query", ...)