Skip to content

fix: drop invalid expression rows#757

Open
nabinchha wants to merge 5 commits into
mainfrom
codex/issue-749-expression-row-drops
Open

fix: drop invalid expression rows#757
nabinchha wants to merge 5 commits into
mainfrom
codex/issue-749-expression-row-drops

Conversation

@nabinchha

Copy link
Copy Markdown
Contributor

📋 Summary

Expression columns currently fail the whole column when a single row renders to empty text or hits a row-specific Jinja/cast error. This PR makes those failures behave like other row-level generation failures: invalid rows are dropped with structured warning counts, while fully broken expressions still fail loudly.

🔗 Related Issue

Fixes #749

🔄 Changes

  • Drop only the affected row when an expression render is empty/whitespace, hits a row-specific template error, or fails dtype conversion.
  • Preserve input row indexes from ExpressionColumnGenerator so shrunken expression batches can be merged back safely.
  • Update the sync builder to allow expression row drops in both full-column and skip-aware merge paths.
  • Update the async scheduler to map expression results by row-group index, drop missing result rows, and treat all-dropped expression batches as fatal instead of salvaging the whole row group.
  • Add regression coverage for generator-level row drops, sync builder shrink behavior, skip-aware sync shrink behavior, partial async row-group shrink, and async all-dropped failure.

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • async_scheduler.py — this is the core async merge/error-boundary change, including the distinction between partial row drops and all-dropped expression failures.
  • dataset_builder.py — sync full-column and skip-aware paths now allow expression generators to shrink batches while preserving skip metadata.

🧪 Testing

  • make test passes (not run; focused engine checks were run instead)
  • Unit tests added/updated
  • E2E tests added/updated (N/A — no E2E surface changed)
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/column_generators/generators/test_expression.py packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py::test_expression_column_row_drops_shrink_sync_batch packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py::test_expression_column_row_drops_shrink_sync_skip_aware_batch packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py::test_expression_row_drops_shrink_async_row_group packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py::test_expression_all_dropped_async_row_group_fails_loudly -q
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py -q
  • .venv/bin/ruff check <changed files>
  • .venv/bin/ruff format --check <changed files>

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO) — existing commits do not include Signed-off-by trailers
  • Architecture docs updated (N/A — existing docs already describe dropped-row outcomes; this PR does not change the public config surface)

Handle per-row expression render and cast failures by dropping only affected rows, preserving row identity in sync and async full-column paths, and failing when no valid rows remain.

Fixes #749
@nabinchha nabinchha requested a review from a team as a code owner June 17, 2026 01:15
@github-actions

Copy link
Copy Markdown
Contributor

PR #757 Review — fix: drop invalid expression rows

Summary

The PR converts row-specific expression failures (empty Jinja renders, row-level template exceptions, dtype-cast errors) from full-column failures into per-row drops with structured warning counts, while preserving "fail-loud" behavior when all rows in a batch drop. It threads the resulting variable-size DataFrames through three places that previously assumed rigid 1:1 input/output sizing:

  1. ExpressionColumnGenerator.generate — collects per-row drop reasons in a Counter, preserves input row indexes on the returned DataFrame, and raises UserTemplateError only when every row drops.
  2. Sync DatasetBuilder_run_full_column_generator_without_skip and _merge_skipped_and_generated now allow expression generators to shrink the buffer, with skip-metadata round-trip honoring allow_resize=True.
  3. Async AsyncTaskScheduler._run_batch — maps batch results by row-group index instead of positional offset, drops rows whose expression results are missing, and elevates UserTemplateError from worker tasks to a fatal scheduler error so all-dropped batches abort instead of silently zeroing the row group.

Test coverage is solid: generator-level drops (empty/template-error/cast-error/all-dropped), sync shrink, sync skip-aware shrink, async partial drop, and async all-dropped fatal.

Findings

Correctness

  • expression.py:55-65except Exception is too broad and degrades diagnostics for systemic config bugs. The bare catch will swallow UserTemplateUnsupportedFiltersError, mistyped variable references, and similar template-level (not row-level) errors. Because these usually fail every row identically, the user ends up seeing the generic Expression column 'X' produced no valid rows. instead of the actionable upstream message (e.g. "No filter named 'foo'"). Consider narrowing the catch to row-level shapes (UndefinedError, TypeError, ZeroDivisionError, …) and letting other UserTemplateError subclasses propagate, or preserving the original exception chain when the all-dropped error is raised (raise UserTemplateError(...) from last_exc) so the root cause is still visible.

  • expression.py:42-43 — duplicate dtype validation. ExpressionColumnConfig.dtype is already a Literal["int","float","str","bool"] enforced by Pydantic, and _cast_type's else branch already raises ValueError on the impossible case. Adding a third runtime guard (with its own _VALID_DTYPES constant) just creates a third source of truth that has to stay in sync with the Literal. Recommend dropping the new check and the constant.

  • expression.py:88-92_is_empty_rendered_expression None branch is dead code. safe_render returns str; it raises rather than returning None. The value is None branch can never fire today.

  • async_scheduler.py:1702-1713 — fatal-error log line uses task.row_index which is None for batch tasks. The new logger.error("Fatal expression failure on %s[rg=%s, row=%s]", task.column, task.row_group, task.row_index, ...) will always log row=None for the only path that reaches it (expression task_type == "batch"). Either drop the row= field or label it as task=batch.

  • async_scheduler.py:1797-1804 — duplicated fatal-error detection in two layers. _run_generator_call now bypasses its DatasetGenerationError wrapping for UserTemplateError, and _execute_task_inner_impl re-checks _is_fatal_expression_template_error on the same exception. The bypass is needed (otherwise the wrapped exception would fail the isinstance(UserTemplateError) check), but the symmetry would be clearer with a comment explaining why both checks exist, or by introducing a sentinel (e.g. _FatalExpressionDrop) wrapper that survives the wrap.

Code quality / reuse

  • Validation is duplicated between sync (_merge_row_dropped_generated_records) and async (_require_expression_row_drop_result). Both do: dedup-check, "indexes outside active set" check, length cap. Worth extracting a single helper (e.g. validate_row_drop_result(result_df, active_indices) -> None) to keep the two paths from drifting. The error messages also differ slightly — "Expression generator returned…" (sync) vs "Batch generator for column 'X' returned…" (async) — unifying them helps users grep logs.

  • retained_indexes: list[object] — overly broad type. A pandas index value is Hashable and in this codebase's actual use is int. list[Hashable] (or list[int]) is closer to ground truth and matches what _require_expression_row_drop_result later coerces with set(result_indexes).

  • _log_row_drops builds a sorted Counter breakdown manually. Counter.most_common() already gives a deterministic ordering by count; if the intent is alphabetical-by-name, a comment would help future readers, since the natural read of Counter is frequency-ordered.

Tests

  • Good: Coverage spans generator unit tests, sync builder paths (with and without skip), and both async outcomes (partial shrink + all-dropped fatal). The async tests exercise RowGroupBufferManager and CompletionTracker interactions, which is the right place to catch index-mapping bugs.

  • Gap: No test exercises a downstream column that depends on the expression result after rows shrink. With replace_buffer(allow_resize=True), the next column's generator now runs against a smaller batch — that's the intended behavior, but a regression test confirming expression → llm-text → expression chains stay aligned (and that skip-metadata round-trips through expression drops) would harden against future merge-back regressions.

  • Gap: test_generate_drops_row_specific_template_errors uses {{ 1 / denominator }} with denominator=0. Worth adding a case that exercises the broad except Exception with a non-arithmetic shape — e.g. {{ items[0] }} where items is sometimes [] — so the test suite would catch a future narrowing of the catch clause if someone tightens it without thinking through the failure surface.

Observability

  • Drop logs include a structured breakdown — good. Consider also emitting a counter through whatever metrics surface the engine already has (self._reporter?) so dashboards can track expression-drop rates without log scraping. Not blocking.

Backward compatibility / public surface

  • The behavior change is the whole point of the fix, but it's worth flagging in the changelog: prior to this PR, a single empty render aborted the column; after, it warns and drops the row. Consumers asserting len(output) == len(input) for expression columns will now see fewer rows. The PR description says "Architecture docs already describe dropped-row outcomes" — verify the public-facing docs (Fern site under fern/) actually reflect this for expression columns specifically; the existing dropped-row docs may have only described LLM/sampler paths.

Verdict

Approve with minor changes. The architectural shift (per-row drops with index-preserving merge) is sound, the async fatal-error path is correctly wired, and tests cover the important scenarios. The asks above (narrowing except Exception, removing the duplicate dtype validation, sharing the row-drop validator between sync and async, fixing the row=None log) are quality cleanups, not blockers. The downstream-chain test would be the most valuable add before merge.

@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR changes expression column generation from a fail-whole-column strategy to a drop-invalid-rows strategy: each row that produces an empty render, a row-specific Jinja error, or a cast failure is silently dropped with structured warning counts, while expressions that drop every row still abort loudly via UserTemplateError. The change threads through the generator, the sync builder's full-column and skip-aware merge paths, and the async scheduler's batch and from-scratch task paths.

  • ExpressionColumnGenerator: processes rows one at a time, retains per-row indexes in the result DataFrame, and raises UserTemplateError only when zero rows survive.
  • dataset_builder.py: adds _merge_row_dropped_generated_records for the skip-aware path and passes allow_resize=True to replace_buffer for expression generators; the skip-metadata round-trip uses hidden restore-ID columns (not position-based matching), so it correctly handles missing rows.
  • async_scheduler.py: adds _run_full_column_from_scratch_with_row_drops for root expression columns (dispatched as from_scratch), maps result rows by index in _batch_result_by_row_index, and promotes UserTemplateError from expression tasks to a fatal scheduler error.

Confidence Score: 5/5

Safe to merge — the row-drop logic is correctly threaded through all three execution paths (generator, sync builder, async scheduler) and the skip-metadata round-trip uses hidden restore-ID columns rather than positional matching, so dropped rows can't corrupt surviving rows' skip state.

All three execution paths (generator per-row loop, sync full-column and skip-aware merge, async batch and from-scratch) handle partial drops and all-dropped failures consistently. The index-keyed merge in both sync and async paths is correct. The skip-metadata restoration is ID-based, not position-based, so row drops do not misalign skip metadata for surviving rows. Tests cover all key combinations including skip+drop interaction and the all-dropped fatal path for both batch and from-scratch tasks.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/column_generators/generators/expression.py Adds per-row drop logic with Counter-based breakdown, preserves row indexes in the result DataFrame, and raises UserTemplateError only when all rows are dropped.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py Adds _run_full_column_from_scratch_with_row_drops for root expression columns, replaces sequential result_idx with index-keyed lookup in _batch_result_by_row_index, and promotes UserTemplateError to a fatal scheduler error via _is_fatal_expression_template_error.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py Adds _merge_row_dropped_generated_records for skip-aware merging of dropped-row results, passes allow_resize=True to both replace_buffer call sites when the generator supports row drops.
packages/data-designer-engine/tests/engine/column_generators/generators/test_expression.py Adds unit tests for empty-render drops, template-error drops, type-cast drops, and all-rows-dropped fatal case; index preservation is verified explicitly.
packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py Adds four async integration tests covering partial row-group shrink, all-dropped fatal abort, root-expression all-dropped fatal, and skip+row-drop interaction.
packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py Adds sync builder tests for full-column row-drop shrink and skip-aware row-drop shrink.

Reviews (3): Last reviewed commit: "Merge branch 'main' into codex/issue-749..." | Re-trigger Greptile

Comment on lines +2063 to 2070
result_records = result_df.to_dict(orient="records")
if supports_row_drops:
return dict(zip(result_df.index.to_list(), result_records, strict=True))
return dict(zip(active_row_indices, result_records, strict=True))

def _get_rg_size(self, row_group: int) -> int:
try:
return self._row_groups.row_group_size(row_group)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Third guard in _require_expression_row_drop_result is unreachable

Given that the two preceding checks already pass — (1) no duplicate result indexes, and (2) every result index is a member of active_index_set — the result set is a subset of active_row_indices with no duplicates. Because active_row_indices is itself duplicate-free (built from range(rg_size) minus pre-dropped rows), len(active_index_set) == len(active_row_indices), so len(result_df) <= len(active_row_indices) is guaranteed and the third if can never fire. This is dead code, not a runtime bug, but it may create a false sense of coverage for this guard.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py
Line: 2063-2070

Comment:
**Third guard in `_require_expression_row_drop_result` is unreachable**

Given that the two preceding checks already pass — (1) no duplicate result indexes, and (2) every result index is a member of `active_index_set` — the result set is a subset of `active_row_indices` with no duplicates. Because `active_row_indices` is itself duplicate-free (built from `range(rg_size)` minus pre-dropped rows), `len(active_index_set) == len(active_row_indices)`, so `len(result_df) <= len(active_row_indices)` is guaranteed and the third `if` can never fire. This is dead code, not a runtime bug, but it may create a false sense of coverage for this guard.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

)
return result

def _task_supports_row_drops(self, task: Task, generator: ColumnGenerator) -> bool:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think root expression columns fall through this check. Root full-column columns are scheduled as from_scratch, so an expression with no upstream refs, like ExpressionColumnConfig(name="empty_root", expr="{{ \"\" }}"), does not hit the new fatal expression path and the scheduler drops the whole row group instead of raising. It also reaches the zero-column DataFrame case in expression.py before rendering. Maybe route root expressions through the batch path, or treat expression from_scratch tasks as row-drop-capable too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in cdf9ffd by taking option 2 (treat expression from_scratch tasks as row-drop-capable).

  • _task_supports_row_drops now accepts both batch and from_scratch for expression configs, so the fatal-error path fires for root expressions too.
  • New helper _run_full_column_from_scratch_with_row_drops mirrors _run_batch's expression handling (active-row-index tracking, _require_expression_row_drop_result validation, per-row _drop_row + update_cell) but skips the pre_skipped evaluation since root expressions have no upstream refs.
  • Fixed the zero-column case you flagged too: DataFrame.to_dict(orient="records") returns [] for any 0-column DataFrame regardless of row count, so a root expression dispatched against a fresh row group buffer would otherwise skip rendering entirely and fail the strict=True zip. ExpressionColumnGenerator.generate now synthesizes empty per-row dicts in that case so the loop still fires once per row.

Covered by the new test_root_expression_all_dropped_async_fails_loudly regression test.

assert tracker.is_row_group_complete(1, 2, all_cols)


@pytest.mark.asyncio(loop_scope="session")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: could add the async equivalent of the sync skip+drop regression here. The temp smoke I ran passes, but a committed test would pin the active_row_indices mapping when a skipped row appears before a dropped expression row.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in cdf9ffd as test_expression_row_drops_with_skip_async_row_group. It mirrors the sync test_expression_column_row_drops_shrink_sync_skip_aware_batch setup: a seed producing ["skip-me", "", "keep"] feeds an expression with SkipConfig(when="{{ seed == 'skip-me' }}", value="SKIPPED") and expr="{{ seed }}". The test asserts:

  • row 0 (skipped) keeps seed="skip-me" with copy="SKIPPED"
  • row 1 (empty render) is dropped via tracker.is_dropped(0, 1)
  • row 2 is preserved with copy="keep"
  • the warning log shows dropped 1/2 (the skipped row never reaches generate)

This pins the active_row_indices mapping when a skipped row precedes a dropped expression row.

Address PR #757 review feedback from @andreatgretel:

- Root expression columns are scheduled as ``from_scratch`` rather than
  ``batch``, so the prior ``_task_supports_row_drops`` check (which gated
  on ``task_type == "batch"``) silently dropped the whole row group on a
  user template error instead of raising. Accept both task types and route
  expression ``from_scratch`` tasks through a shared row-drop helper that
  preserves indices and honors the fatal-when-all-dropped contract.
- ``DataFrame.to_dict(orient='records')`` returns ``[]`` for any 0-column
  DataFrame regardless of row count, so a root expression dispatched
  against a fresh row group buffer would otherwise skip rendering
  entirely. Synthesize empty per-row dicts in ``ExpressionColumnGenerator``
  so the loop fires once per row.
- Add async coverage for the root-expression fatal path and for the
  skip-then-drop interaction (async equivalent of the existing sync
  skip-aware regression).
@nabinchha

Copy link
Copy Markdown
Contributor Author

Pushed cdf9ffd addressing @andreatgretel's review feedback.

Changes

  • async_scheduler.py: _task_supports_row_drops now accepts both batch and from_scratch task types for expression configs. New _run_full_column_from_scratch_with_row_drops helper routes root expression seed tasks through the same per-row drop / fatal-when-all-dropped contract as batch expression tasks, so partial drops shrink the row group and all-dropped expressions raise DatasetGenerationError instead of silently dropping the whole row group.
  • expression.py: handle the 0-column DataFrame case. DataFrame.to_dict(orient="records") returns [] for any 0-column DataFrame regardless of row count, so a root expression dispatched against a fresh row group buffer would otherwise skip rendering entirely and fail the strict=True zip. Synthesizes empty per-row dicts so the loop fires once per row.
  • test_async_builder_integration.py: two new regression tests — test_root_expression_all_dropped_async_fails_loudly (root expression fatal path) and test_expression_row_drops_with_skip_async_row_group (async equivalent of the sync skip+drop coverage).

Tests

  • New tests pass.
  • Full test_async_scheduler.py + test_dataset_builder.py suite still passes (207 tests).
  • Engine suite: 2253 passed (5 pre-existing tiktoken-network failures unrelated to this PR).
  • ruff check and ruff format --check clean on changed files.

@nabinchha nabinchha requested a review from andreatgretel June 23, 2026 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle row-level failures in Jinja expression columns by dropping invalid rows

2 participants