Commit d4443d7
* plan: add skip_when for conditional column generation (#479)
Adds implementation plan for a `skip_when` field on `SingleColumnConfig`
that enables conditional column generation. When the Jinja2 expression
evaluates truthy, the cell is set to None and the generator is skipped.
Skips propagate automatically through the DAG to downstream columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
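
A minimal sketch of the skip semantics described above, using a product review row. The plan evaluates a Jinja2 expression; a plain Python callable stands in for it here. The column names, the `COLUMNS` layout, and `generate_row` are hypothetical and only illustrate the idea: a truthy skip condition sets the cell to None without calling the generator, and downstream columns inherit the skip.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
Predicate = Optional[Callable[[Record], bool]]

# Hypothetical column specs: (name, upstream columns, skip predicate, generator).
# The predicate stands in for the Jinja2 `skip_when` expression.
COLUMNS: list[tuple[str, list[str], Predicate, Callable[[Record], Any]]] = [
    ("complaint", ["rating"], lambda r: r["rating"] >= 4, lambda r: "too slow"),
    ("apology", ["complaint"], None, lambda r: f"Sorry that it was {r['complaint']}"),
]


def generate_row(seed: Record) -> tuple[Record, set[str]]:
    """Generate columns in DAG order, skipping cells and propagating skips."""
    record = dict(seed)
    skipped: set[str] = set()
    for name, upstream, skip_when, generator in COLUMNS:
        if any(col in skipped for col in upstream):
            # An upstream cell was skipped, so this cell is skipped too.
            record[name] = None
            skipped.add(name)
        elif skip_when is not None and skip_when(record):
            # Skip condition is truthy: set None and never call the generator.
            record[name] = None
            skipped.add(name)
        else:
            record[name] = generator(record)
    return record, skipped


happy_row, happy_skipped = generate_row({"rating": 5})
unhappy_row, unhappy_skipped = generate_row({"rating": 1})
```

With a rating of 5 both `complaint` and `apology` end up as None; with a rating of 1 both generators run.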
* plan: remove HopChain example from skip_when plan
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: replace HopChain example with generic product review example
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: add open questions on skip sentinel value and row filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: major revision — SkipConfig model, sync engine support, decouple propagation
- Introduce SkipConfig(when, value) as nested model on SingleColumnConfig
- Move propagate_skip to SingleColumnConfig as independent field, fixing
bug where columns with no SkipConfig couldn't participate in propagation
- Add full sync engine implementation (Steps 4a-4d) covering both
_fan_out_with_threads and _run_full_column_generator dispatch paths
- Add serialization boundary stripping for both DatasetBatchManager (sync)
and RowGroupBufferManager (async)
- Simplify architecture diagrams for readability
- Update all references, design decisions, verification plan
Made-with: Cursor
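
The shape of the revised config can be sketched roughly as follows. This uses stdlib dataclasses rather than whatever model base class the project uses, and the `name` field, the `propagate_skip` default, and the `inherits_skip` helper are assumptions made for illustration. The point is that `propagate_skip` lives on the column itself, so a column with no `SkipConfig` can still take part in propagation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SkipConfig:
    when: str          # Jinja2 expression; truthy means "skip this cell"
    value: Any = None  # value written into the cell when it is skipped


@dataclass(frozen=True)
class SingleColumnConfig:
    name: str                           # hypothetical field for this sketch
    skip: Optional[SkipConfig] = None   # nested model, optional
    propagate_skip: bool = True         # default is an assumption


def inherits_skip(config: SingleColumnConfig, upstream_skipped: bool) -> bool:
    """Hypothetical helper: propagation depends only on propagate_skip."""
    return config.propagate_skip and upstream_skipped


gated = SingleColumnConfig("complaint", skip=SkipConfig(when="{{ rating >= 4 }}"))
plain = SingleColumnConfig("apology")  # no SkipConfig, still propagates
opted_out = SingleColumnConfig("footer", propagate_skip=False)
```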
* updates
* plan: document get_required_columns for skip propagation
- Explain why propagation must not use get_upstream_columns() once
skip.when adds DAG edges; add _required_columns and
get_required_columns() to the execution graph plan
- Point async _run_cell at get_required_columns for parity with sync
- Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for
DataFrames; tighten resolved-questions wording
- Extend DAG/graph verification with gating_col regression case
Refs #479
Made-with: Cursor
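
One reading of the `gating_col` regression case, sketched with a toy graph. The class below is not the project's `ExecutionGraph`; `add_column`, its parameters, and `should_propagate_skip` are hypothetical. It assumes that `skip.when` references are added as DAG edges (so they show up in the upstream set) while `_required_columns` holds only the columns the generator itself consumes.

```python
class ToyExecutionGraph:
    """Toy stand-in for the execution graph; not the real class."""

    def __init__(self) -> None:
        self._upstream: dict[str, set[str]] = {}
        self._required_columns: dict[str, set[str]] = {}

    def add_column(self, name, required=(), skip_when_refs=()) -> None:
        # Generator inputs are tracked separately from skip.when references.
        self._required_columns[name] = set(required)
        # Both kinds of reference become DAG edges for scheduling.
        self._upstream[name] = set(required) | set(skip_when_refs)

    def get_upstream_columns(self, name: str) -> set[str]:
        return set(self._upstream.get(name, ()))

    def get_required_columns(self, name: str) -> set[str]:
        return set(self._required_columns.get(name, ()))


def should_propagate_skip(graph: ToyExecutionGraph, name: str, skipped: set[str]) -> bool:
    """Propagate only when a column the generator needs was skipped."""
    return bool(graph.get_required_columns(name) & skipped)


graph = ToyExecutionGraph()
graph.add_column("summary", required=["text"], skip_when_refs=["gating_col"])

# If propagation used the upstream set, a skipped gating_col would wrongly
# force a skip on summary even though its generator never reads gating_col.
naive = bool(graph.get_upstream_columns("summary") & {"gating_col"})
correct = should_propagate_skip(graph, "summary", {"gating_col"})
```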
* plan: centralize __skipped__ handling in skip_provenance
- Document new skip_provenance.py (key constant, read/write/strip API)
- Point sync builder, async scheduler, and batch buffers at shared helpers
- Strip metadata before every DataFrame from buffer dicts, including
FULL_COLUMN active subsets
- Split §3 into skip_evaluator vs skip_provenance; extend verification
Refs #479
Made-with: Cursor
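
A rough sketch of what a shared provenance module could look like. Only the `__skipped__` key and the name `strip_skip_metadata_from_records` come from the commit text; the constant name, the other two function names, all signatures, and the choice to store a sorted list of column names under the key are assumptions.

```python
from typing import Any, Iterable

Record = dict[str, Any]

SKIPPED_KEY = "__skipped__"  # key from the plan; constant name is hypothetical


def read_skipped_columns(record: Record) -> set[str]:
    """Return the column names recorded as skipped for this record."""
    return set(record.get(SKIPPED_KEY, ()))


def write_skipped_column(record: Record, column: str) -> None:
    """Record that `column` was skipped, mutating the record in place."""
    skipped = read_skipped_columns(record)
    skipped.add(column)
    record[SKIPPED_KEY] = sorted(skipped)  # assumed storage format


def strip_skip_metadata_from_records(records: Iterable[Record]) -> list[Record]:
    """Return copies without the metadata key, safe to turn into a DataFrame."""
    return [{k: v for k, v in r.items() if k != SKIPPED_KEY} for r in records]


buffer = [{"rating": 5, "complaint": None}]
write_skipped_column(buffer[0], "complaint")
clean = strip_skip_metadata_from_records(buffer)
```

Stripping returns new dicts, so the buffer keeps its provenance while the DataFrame input does not carry the extra column.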
* plan: align doc title with SkipConfig / skip.when
Drop legacy skip_when naming in headings and #362 cross-reference.
Refs #479
Made-with: Cursor
* plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization
- SkipConfig._validate_when_syntax now checks that the result of
find_undeclared_variables is non-empty, rejecting expressions without
{{ }} delimiters that would otherwise silently skip every row
- evaluate_skip_when centralizes try/except so both sync and async
engines get identical fail-safe behavior on eval errors
- evaluate_skip_when takes a single pre-deserialized record; caller
runs deserialize_json_values once and passes to both skip eval and
generator (no double deserialization, no redundant parameter)
- Update _should_skip_cell, async _run_cell, Files Modified table,
and verification section accordingly
Refs #479
Made-with: Cursor
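
The caller-owns-deserialization flow might look roughly like this. A Python callable stands in for the Jinja2 expression, `deserialize_json_values` is a simplified stand-in rather than the project's function, and `run_cell` is hypothetical. The commit text does not say which way the fail-safe goes, so the `on_error` default of "do not skip" is an assumption.

```python
import json
from typing import Any, Callable

Record = dict[str, Any]


def deserialize_json_values(record: Record) -> Record:
    """Simplified stand-in: parse string values that contain JSON."""
    out: Record = {}
    for key, value in record.items():
        if isinstance(value, str):
            try:
                out[key] = json.loads(value)
            except json.JSONDecodeError:
                out[key] = value  # not JSON, keep the raw string
        else:
            out[key] = value
    return out


def evaluate_skip_when(predicate: Callable[[Record], Any], record: Record,
                       on_error: bool = False) -> bool:
    """Single place for error handling, shared by sync and async callers."""
    try:
        return bool(predicate(record))
    except Exception:
        return on_error  # assumed fail-safe: do not skip on eval errors


def run_cell(raw: Record, predicate, generator) -> Any:
    """Deserialize once, then hand the same record to skip eval and generator."""
    record = deserialize_json_values(raw)
    if evaluate_skip_when(predicate, record):
        return None
    return generator(record)


def is_positive(r: Record) -> bool:
    return r["review"]["stars"] >= 4


def describe(r: Record) -> str:
    return f"{r['review']['stars']} stars"


skipped_cell = run_cell({"review": '{"stars": 5}'}, is_positive, describe)
generated_cell = run_cell({"review": '{"stars": 2}'}, is_positive, describe)
errored_cell = run_cell({"review": '{"stars": 2}'}, lambda r: r["missing"], describe)
```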
* plan: add get_side_effect_columns accessor to execution graph spec
Document _side_effects_by_producer inverse map and
get_side_effect_columns() accessor on ExecutionGraph, needed by
_write_skip_to_record / apply_skip_to_record to clear __trace,
__reasoning_content, etc. on skip. Added to both Step 2b metadata
section and Files Modified table.
The __skipped__ leak into active_df (greptile's other P1) was already
fixed in 7046378 via strip_skip_metadata_from_records.
Refs #479
Made-with: Cursor
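
A small sketch of the inverse map and accessor. The toy class, its constructor argument, the example column names, and the choice to clear side-effect cells by setting them to None are all assumptions; only `_side_effects_by_producer`, `get_side_effect_columns`, and `apply_skip_to_record` are names taken from the commit text.

```python
from typing import Any

Record = dict[str, Any]


class ToyExecutionGraph:
    """Toy stand-in that only models side-effect lookups."""

    def __init__(self, side_effect_to_producer: dict[str, str]) -> None:
        # Invert "side-effect column -> producer" into "producer -> columns".
        self._side_effects_by_producer: dict[str, list[str]] = {}
        for side_effect, producer in side_effect_to_producer.items():
            self._side_effects_by_producer.setdefault(producer, []).append(side_effect)

    def get_side_effect_columns(self, producer: str) -> list[str]:
        return list(self._side_effects_by_producer.get(producer, []))


def apply_skip_to_record(record: Record, column: str, graph: ToyExecutionGraph,
                         value: Any = None) -> None:
    """Write the skip value and clear the producer's side-effect columns."""
    record[column] = value
    for side_effect in graph.get_side_effect_columns(column):
        record[side_effect] = None  # assumed meaning of "clear"


graph = ToyExecutionGraph({
    "summary__trace": "summary",              # hypothetical column names
    "summary__reasoning_content": "summary",
})
row = {"summary": "old", "summary__trace": ["step"], "summary__reasoning_content": "why"}
apply_skip_to_record(row, "summary", graph)
```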
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f78c4e0 commit d4443d7
1 file changed, 667 insertions(+), 0 deletions(-)