Commit a9af365
feat: add skip.when conditional column generation (#502)
* plan: add skip_when for conditional column generation (#479)
Adds implementation plan for a `skip_when` field on `SingleColumnConfig`
that enables conditional column generation. When the Jinja2 expression
evaluates truthy, the cell is set to None and the generator is skipped.
Skips auto-propagate through the DAG to downstream columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
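The behavior this plan describes (a truthy gate sets the cell to None, and the skip propagates downstream) can be sketched in plain Python. This is only an illustration under stated assumptions: the `skip_when` callable stands in for the Jinja2 expression, and `run_column`, the `_skipped` bookkeeping key, and the example columns are hypothetical names, not the project's API.

```python
from typing import Any, Callable, Iterable, Optional

Record = dict[str, Any]


def run_column(
    records: list[Record],
    name: str,
    generate: Callable[[Record], Any],
    skip_when: Optional[Callable[[Record], Any]] = None,
    requires: Iterable[str] = (),
) -> list[Record]:
    """Fill column `name`, skipping cells whose gate is truthy or whose inputs were skipped."""
    for record in records:
        skipped = record.setdefault("_skipped", set())  # hypothetical bookkeeping key
        gated = skip_when is not None and bool(skip_when(record))
        propagated = any(col in skipped for col in requires)
        if gated or propagated:
            record[name] = None  # the skipped cell is set to None
            skipped.add(name)    # lets downstream columns inherit the skip
        else:
            record[name] = generate(record)
    return records


rows = [{"rating": 5}, {"rating": 1}]
run_column(
    rows,
    "review",
    generate=lambda r: f"review for rating {r['rating']}",
    skip_when=lambda r: r["rating"] < 3,
)
run_column(rows, "summary", generate=lambda r: r["review"].upper(), requires=("review",))
```

In this sketch the second row never reaches either generator: `review` is gated out, and `summary` is skipped because it depends on `review`.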
* plan: remove HopChain example from skip_when plan
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: replace HopChain example with generic product review example
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: add open questions on skip sentinel value and row filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: major revision — SkipConfig model, sync engine support, decouple propagation
- Introduce SkipConfig(when, value) as nested model on SingleColumnConfig
- Move propagate_skip to SingleColumnConfig as independent field, fixing
bug where columns with no SkipConfig couldn't participate in propagation
- Add full sync engine implementation (Steps 4a-4d) covering both
_fan_out_with_threads and _run_full_column_generator dispatch paths
- Add serialization boundary stripping for both DatasetBatchManager (sync)
and RowGroupBufferManager (async)
- Simplify architecture diagrams for readability
- Update all references, design decisions, verification plan
Made-with: Cursor
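The shape of the revised config can be approximated as follows. This is a sketch using standard-library dataclasses in place of the Pydantic models the commit describes; the attribute name `skip` is an assumption inferred from the `skip.when` spelling, and the defaults shown are assumptions except for `propagate_skip=True`, which a later bullet refers to as the default.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SkipConfig:
    when: str          # Jinja2 expression; a truthy result means "skip this cell"
    value: Any = None  # what a skipped cell is filled with (assumed default)


@dataclass(frozen=True)
class SingleColumnConfig:
    name: str
    skip: Optional[SkipConfig] = None  # assumed attribute name
    propagate_skip: bool = True        # independent of `skip`, so any column can take part


gated = SingleColumnConfig(name="review", skip=SkipConfig(when="{{ rating < 3 }}", value="N/A"))
downstream = SingleColumnConfig(name="summary")                          # no SkipConfig, still propagates
opted_out = SingleColumnConfig(name="audit_note", propagate_skip=False)  # ignores upstream skips
```

Keeping `propagate_skip` on the column rather than inside `SkipConfig` is what lets `downstream` participate in propagation without declaring a gate of its own.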
* updates
* plan: document get_required_columns for skip propagation
- Explain why propagation must not use get_upstream_columns() once
skip.when adds DAG edges; add _required_columns and
get_required_columns() to the execution graph plan
- Point async _run_cell at get_required_columns for parity with sync
- Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for
DataFrames; tighten resolved-questions wording
- Extend DAG/graph verification with gating_col regression case
Refs #479
Made-with: Cursor
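One plausible reading of the `gating_col` regression case is sketched below. The column names, the two dictionaries, and `skipped_by_propagation` are hypothetical; only `get_required_columns` and `get_upstream_columns` are names taken from the plan, and their bodies here are simplified stand-ins.

```python
# Hypothetical pipeline: `answer` is gated on `gating_col`,
# but its generator only reads `question`.
required_columns = {"gating_col": set(), "question": set(), "answer": {"question"}}
skip_columns = {"answer": {"gating_col"}}  # columns referenced by skip.when


def get_required_columns(name: str) -> set:
    """Columns the generator actually consumes."""
    return set(required_columns[name])


def get_upstream_columns(name: str) -> set:
    """All DAG edges used for ordering: data dependencies plus skip.when references."""
    return required_columns[name] | skip_columns.get(name, set())


def skipped_by_propagation(name: str, skipped: set, deps) -> bool:
    return bool(deps(name) & skipped)


skipped = {"gating_col"}  # the gating column itself was skipped for this row
wrong = skipped_by_propagation("answer", skipped, get_upstream_columns)
right = skipped_by_propagation("answer", skipped, get_required_columns)
```

Under these assumptions, `wrong` is `True` because the skip.when edge leaks into propagation, while `right` is `False` and leaves the decision to the expression itself.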
* plan: centralize __skipped__ handling in skip_provenance
- Document new skip_provenance.py (key constant, read/write/strip API)
- Point sync builder, async scheduler, and batch buffers at shared helpers
- Strip metadata before every DataFrame from buffer dicts, including
FULL_COLUMN active subsets
- Split §3 into skip_evaluator vs skip_provenance; extend verification
Refs #479
Made-with: Cursor
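A minimal sketch of what a centralized provenance helper might look like is shown below. `strip_skip_metadata_from_records` is a name that appears later in this message; `write_skip`, `read_skipped`, and all signatures are assumptions made for illustration.

```python
from typing import Any

SKIPPED_KEY = "__skipped__"  # single definition of the metadata key

Record = dict[str, Any]


def write_skip(record: Record, column: str) -> None:
    """Mark `column` as skipped on this record."""
    record.setdefault(SKIPPED_KEY, set()).add(column)


def read_skipped(record: Record) -> set:
    """Return a copy of the skipped-column names for this record."""
    return set(record.get(SKIPPED_KEY, ()))


def strip_skip_metadata_from_records(records: list[Record]) -> list[Record]:
    """Return copies without the metadata key, e.g. before building a DataFrame."""
    return [{k: v for k, v in r.items() if k != SKIPPED_KEY} for r in records]


buffer = [{"rating": 1, "review": None}, {"rating": 5, "review": "great"}]
write_skip(buffer[0], "review")
clean = strip_skip_metadata_from_records(buffer)
```

Stripping returns new dictionaries here, so the live buffer keeps its provenance while the exported rows do not carry it.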
* plan: align doc title with SkipConfig / skip.when
Drop legacy skip_when naming in headings and #362 cross-reference.
Refs #479
Made-with: Cursor
* plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization
- SkipConfig._validate_when_syntax now checks find_undeclared_variables
is non-empty, rejecting expressions without {{ }} delimiters that
would silently skip every row
- evaluate_skip_when centralizes try/except so both sync and async
engines get identical fail-safe behavior on eval errors
- evaluate_skip_when takes a single pre-deserialized record; caller
runs deserialize_json_values once and passes to both skip eval and
generator (no double deserialization, no redundant parameter)
- Update _should_skip_cell, async _run_cell, Files Modified table,
and verification section accordingly
Refs #479
Made-with: Cursor
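The delimiter check above can be illustrated with Jinja2 directly. This is a sketch, not the project's validator: `referenced_variables` and `validate_when` are hypothetical helpers, and a plain `Environment` is used instead of the sandboxed native environment mentioned elsewhere in this commit.

```python
from jinja2 import Environment, meta

env = Environment()


def referenced_variables(expression: str) -> set[str]:
    """Names the template reads from the render context."""
    return meta.find_undeclared_variables(env.parse(expression))


def validate_when(expression: str) -> str:
    """Reject expressions that reference no variables, such as ones missing {{ }}."""
    if not referenced_variables(expression):
        raise ValueError(f"skip.when references no columns: {expression!r}")
    return expression


bare = "in_stock == false"
# Without delimiters the text is template data, so it renders to itself:
# a non-empty string, which is truthy for every row.
rendered = env.from_string(bare).render(in_stock=True)
```

Here `referenced_variables(bare)` is empty while `referenced_variables("{{ in_stock == false }}")` contains `in_stock`, which is the signal the validation relies on.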
* plan: add get_side_effect_columns accessor to execution graph spec
Document _side_effects_by_producer inverse map and
get_side_effect_columns() accessor on ExecutionGraph, needed by
_write_skip_to_record / apply_skip_to_record to clear __trace,
__reasoning_content, etc. on skip. Added to both Step 2b metadata
section and Files Modified table.
The __skipped__ leak into active_df (greptile's other P1) was already
fixed in 7046378 via strip_skip_metadata_from_records.
Refs #479
Made-with: Cursor
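The inverse map and its use on skip might look roughly like this. The function names follow the bullet above, but the signatures, the module-level dictionaries, and the choice to clear side-effect cells to `None` are assumptions for illustration.

```python
from typing import Any, Optional

# Forward map: side-effect column -> column that produces it.
side_effect_to_producer = {
    "review__trace": "review",
    "review__reasoning_content": "review",
}


def invert(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Build producer -> [side-effect columns] from the forward map."""
    by_producer: dict[str, list[str]] = {}
    for side_effect, producer in mapping.items():
        by_producer.setdefault(producer, []).append(side_effect)
    return by_producer


side_effects_by_producer = invert(side_effect_to_producer)


def get_side_effect_columns(name: str) -> list[str]:
    return list(side_effects_by_producer.get(name, []))


def apply_skip_to_record(record: dict[str, Any], column: str, value: Optional[Any] = None) -> None:
    """Write the skip value and clear the producer's side-effect columns."""
    record[column] = value
    for side_effect in get_side_effect_columns(column):
        record[side_effect] = None


row = {"rating": 1, "review__trace": "stale"}
apply_skip_to_record(row, "review")
```

Without the inverse lookup, a skipped `review` cell could leave a stale `review__trace` behind from an earlier write.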
* add skip.when conditional column generation
Introduce SkipConfig on SingleColumnConfig to gate column generation
with a Jinja2 expression. Columns can be skipped by expression or by
upstream propagation (propagate_skip flag).
- SkipConfig: Pydantic model with config-time syntax/delimiter/variable
validation and cached column extraction from the Jinja2 AST
- skip_evaluator: runtime expression evaluation via NativeSandboxedEnvironment
with fail-safe error handling (skip on expected failures)
- skip_provenance: centralized __skipped__ record tracking shared by
sync builder, async scheduler, and buffer managers
- DAG/ExecutionGraph: skip.columns wired as dependency edges in both
topological sort and static execution graph
- Validation: validate_skip_references checks reference existence,
sampler/seed scope, and allow_resize conflicts
- Sync builder: cell-by-cell and full-column skip with merge-back
- Async scheduler: cell and batch skip with live-buffer provenance
Made-with: Cursor
* fix review findings for skip.when implementation
- Add skip evaluation to _fan_out_with_async (was missing, causing
skipped rows to still be sent to the LLM)
- Preserve __skipped__ provenance on non-skipped records after
full-column generation so multi-hop propagation works
- Use single live-buffer reference in _run_batch skip loop for
consistency with _run_cell
- Move Template import to TYPE_CHECKING and reorder import blocks
- Replace O(n²) sum() with itertools.chain in dag.py
- Add set_required_columns/set_propagate_skip/set_skip_config
setters to ExecutionGraph for symmetry with existing API
Made-with: Cursor
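For context on the `sum()` to `itertools.chain` change listed above: flattening a list of lists with `sum(..., [])` copies the accumulated list on every addition, which is quadratic in the total length, whereas `chain.from_iterable` visits each element once. The data below is made up for illustration and is not taken from `dag.py`.

```python
from itertools import chain

groups = [["a", "b"], ["c"], [], ["d", "e"]]

# Quadratic: each `+` builds a new list containing everything seen so far.
flat_quadratic = sum(groups, [])

# Linear: elements are yielded lazily and collected once.
flat_linear = list(chain.from_iterable(groups))
```

Both produce the same flattened list; only the cost differs as the number of groups grows.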
* add conditional generation with skip recipe and refactor skip helpers
Add a new recipe demonstrating skip.when patterns (expression gate,
propagation, opt-out) with a customer support ticket pipeline.
Also extract _should_skip_record in async_scheduler, remove the
redundant propagate_skip param from should_skip_by_propagation, and
pass a precomputed all_side_effects set through the DAG sort.
Made-with: Cursor
* updates
* fixes
* remove recipe > inject conditional gen into existing tutorial
* regen colab notebooks
* fix: handle missing execution graph in _column_can_skip
Return False when the graph has not been initialized instead of raising,
since skip logic cannot apply before generators are set up.
Made-with: Cursor
* parametrize some tests
* public before private
* slight refactor for readability
* parametrize some tests
* minor fixes
* rename internal skip tracker key name
* clarify intent in comment
* when skipped, _run_cell should return the skipped value even though the consumer doesn't currently care about it
* remove inline import
* minor refactor for clarity
* fix: preserve skip metadata across replace_buffer and exclude allow_resize from skip branch
Two bugs in the sequential engine's _run_full_column_generator:
1. replace_buffer(df.to_dict()) erased __internal_skipped_columns in
three code paths (MultiColumnConfig, non-skip-aware, has_skipped=False
fallthrough), breaking propagate_skip for downstream columns when an
independent FULL_COLUMN generator ran between skip-setting and
propagating columns.
2. _column_can_skip returned True for allow_resize=True columns via
propagation, causing the skip-aware merge path to raise on the 1:1
row-count check for 1:N generators.
- Add restore_skip_metadata helper to skip_tracker.py
- Guard _column_can_skip against allow_resize=True columns
- Refactor _run_full_column_generator into three focused methods
- Remove dead allow_resize / _log_resize_if_changed from skip path
- Remove redundant _require_graph() calls in skip helpers
- Add single_column_config_by_name cached property
- Add integration tests for both bugs and unit tests for the helper
Made-with: Cursor
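The first bug can be pictured with plain dictionaries: a round trip through a tabular form drops the metadata key, and a restore step copies it back by row position. `restore_skip_metadata` is the helper name from this commit, but its signature and the positional 1:1 matching shown here are assumptions.

```python
from typing import Any

SKIP_KEY = "__internal_skipped_columns"

Record = dict[str, Any]


def restore_skip_metadata(old_records: list[Record], new_records: list[Record]) -> list[Record]:
    """Copy skip metadata from the old buffer onto the replacement rows, by position."""
    if len(old_records) != len(new_records):
        raise ValueError("row counts differ; positional restore is not possible")
    for old, new in zip(old_records, new_records):
        if SKIP_KEY in old and SKIP_KEY not in new:
            new[SKIP_KEY] = set(old[SKIP_KEY])
    return new_records


old_buffer = [
    {"rating": 1, "review": None, SKIP_KEY: {"review"}},
    {"rating": 5, "review": "great"},
]
# Simulate an unrelated full-column generator whose output lost the metadata.
replaced = [
    {k: v for k, v in row.items() if k != SKIP_KEY} | {"score": row["rating"] * 2}
    for row in old_buffer
]
restored = restore_skip_metadata(old_buffer, replaced)
```

The length check mirrors the second bug's constraint: a positional restore only makes sense for 1:1 generators, which is why `allow_resize=True` columns are kept out of the skip branch.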
* address review comments on skip.when PR (#502)
- Extract shared skip decision logic (_should_skip_cell / _should_skip_record)
into should_skip_column_for_record() in skip_evaluator.py so both sync and
async engines call the same function (andreatgretel review comment)
- Extend SkipConfig self-reference validation to cover side-effect columns
(e.g. review__trace on the review column) — previously only checked
self.name, now checks self.name | self.side_effect_columns
- Add async engine integration tests for skip paths: cell-by-cell with
propagation and full-column batch skip (exercises _run_cell / _run_batch)
- Fix test_allow_resize_column_not_blocked_by_upstream_skip to use default
propagate_skip=True so it actually exercises the allow_resize guard
- Move get_skipped_column_names from skip_tracker to skip_evaluator (sole
production consumer)
Made-with: Cursor
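A shared decision function along the lines described above could look like the sketch below. The name `should_skip_column_for_record` comes from this commit; the parameter list, the callable standing in for the Jinja2 expression, and the specific exception types treated as "expected failures" are assumptions.

```python
from typing import Any, Callable, Optional


def should_skip_column_for_record(
    record: dict[str, Any],
    skipped: set,
    required_columns: set,
    propagate_skip: bool,
    skip_when: Optional[Callable[[dict[str, Any]], Any]],
) -> bool:
    """Decide whether one cell is skipped; usable by both sync and async engines."""
    # Propagation is checked first and short-circuits the expression.
    if propagate_skip and skipped & required_columns:
        return True
    if skip_when is None:
        return False
    try:
        return bool(skip_when(record))
    except (KeyError, TypeError):
        return True  # fail-safe: skip rather than generate on an evaluation error


calls = []


def gate(record: dict[str, Any]) -> bool:
    calls.append(record)
    return record["rating"] < 3


propagated = should_skip_column_for_record({"rating": 5}, {"review"}, {"review"}, True, gate)
calls_after_propagation = len(calls)
opted_out = should_skip_column_for_record({"rating": 5}, {"review"}, {"review"}, False, gate)
failed_eval = should_skip_column_for_record({}, set(), set(), True, gate)
```

In this sketch the gate is never evaluated for the propagated case, and a record missing `rating` is skipped instead of raising.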
* address cr feedback
* fix: full-column generation scrambling the order of skipped rows
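An order-preserving merge-back for full-column generation can be sketched as follows. This illustrates the general technique only; `merge_back` and its parameters are hypothetical and are not taken from the fix itself.

```python
from typing import Any


def merge_back(
    records: list[dict[str, Any]],
    active_indices: list[int],
    generated: list[Any],
    column: str,
    skip_value: Any = None,
) -> list[dict[str, Any]]:
    """Place generated values at their original row positions; other rows get skip_value."""
    if len(active_indices) != len(generated):
        raise ValueError("generator must return exactly one value per active row")
    by_index = dict(zip(active_indices, generated))
    return [{**row, column: by_index.get(i, skip_value)} for i, row in enumerate(records)]


rows = [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}]
active = [0, 2, 3]                  # row 1 is skipped
outputs = ["out-0", "out-2", "out-3"]
merged = merge_back(rows, active, outputs, "answer")
```

Indexing by original position, rather than appending skipped rows after the generated ones, is what keeps row 1 in its place.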
* add skip conditional generation edge case tests
- test_skip_evaluator: parametrized should_skip_column_for_record covering
propagation, expression gates, short-circuiting, and disabled propagation
- test_execution_graph: skip metadata accessors (get_skip_config,
should_propagate_skip, get_required_columns, get_side_effect_columns,
resolve_side_effect, skip.when DAG edges)
- test_dataset_builder: chained transitive propagation (4 levels),
two independent skip gates, custom skip.value, row count preservation
Made-with: Cursor
* fix: make expression jinja validator private
Rename assert_expression_valid_jinja to _assert_expression_valid_jinja
to match the private naming convention used by other model validators.
Made-with: Cursor
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f267e19 commit a9af365
File tree
31 files changed (+2605, -245 lines)
- architecture
- docs
- colab_notebooks
- notebook_source
- packages
- data-designer-config
- src/data_designer/config
- tests/config
- data-designer-engine
- src/data_designer/engine
- dataset_builders
- utils
- tests/engine
- dataset_builders
- utils