refactor: remove allow_resize support #766

Open
andreatgretel wants to merge 3 commits into main from andreatgretel/fix/remove-allow-resize

@andreatgretel

Contributor

📋 Summary

Remove the deprecated allow_resize column config field now that row-count changes are handled at workflow boundaries. This also removes the sync resize fallback paths and updates docs/tests to point users toward workflow chaining for expansion and filtering.

🔗 Related Issue

Part of #552

🔄 Changes

  • Remove allow_resize from SingleColumnConfig; configs that pass it now fail at the Pydantic boundary as an extra field.
  • Remove resize handling from DatasetBuilder, DatasetBatchManager, skip metadata restore, and custom cell-by-cell generator validation.
  • Keep the legacy sync engine available only through the existing DATA_DESIGNER_ASYNC_ENGINE=0 opt-out.
  • Reject row-count changes from pre-batch processors and document workflow-boundary transforms as the migration path.
  • Update Fern and architecture docs, including the agent rollout ingestion example, to avoid resize-enabled custom columns.
  • Remove obsolete resize behavior tests and add/adjust invariant coverage.
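
To illustrate the migration path named in the list above, here is a minimal sketch of a workflow-boundary transform: stage one produces rows, a plain function between stages filters and expands them, and stage two runs on the result with a fixed row count. `run_stage`, `explode_turns`, and the row shape are hypothetical stand-ins for illustration, not Data Designer APIs.

```python
from typing import Any, Callable

Row = dict[str, Any]


def run_stage(rows: list[Row], add_column: Callable[[Row], Row]) -> list[Row]:
    """Hypothetical stage: adds a column to each row and never changes the row count."""
    out = [add_column(dict(row)) for row in rows]
    assert len(out) == len(rows)  # stages are size-preserving
    return out


def explode_turns(rows: list[Row]) -> list[Row]:
    """Boundary transform: emit one row per turn; rows with no turns are dropped."""
    expanded: list[Row] = []
    for row in rows:
        for turn in row["turns"]:
            expanded.append({"conversation_id": row["conversation_id"], "turn": turn})
    return expanded


def add_turns(row: Row) -> Row:
    n = row["conversation_id"]
    row["turns"] = [f"turn-{n}-{i}" for i in range(n)]
    return row


def add_length(row: Row) -> Row:
    row["length"] = len(row["turn"])
    return row


stage_one = run_stage([{"conversation_id": i} for i in range(4)], add_turns)
between = explode_turns(stage_one)  # the only place where the row count changes
stage_two = run_stage(between, add_length)
```

The row count changes only in `explode_turns`, which runs between the two stages, so each stage can keep its size-preserving invariant.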

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • base.py - public config schema no longer accepts allow_resize.
  • dataset_builder.py - removes the sync resize fallback and resize-specific builder branches.

🧪 Testing

  • make test passes (not run; targeted affected suite passed)
  • Unit tests added/updated
  • E2E tests added/updated (N/A - not applicable)

Additional validation:

  • .venv/bin/ruff check --fix .
  • .venv/bin/ruff format .
  • .venv/bin/ruff check .
  • .venv/bin/ruff format --check .
  • PYTHONDONTWRITEBYTECODE=1 .venv/bin/pytest -p no:cacheprovider packages/data-designer-config/tests/config/test_columns.py packages/data-designer-config/tests/config/test_skip_config.py packages/data-designer-engine/tests/engine/test_validation.py packages/data-designer-engine/tests/engine/column_generators/generators/test_custom.py packages/data-designer-engine/tests/engine/column_generators/generators/test_async_generators.py packages/data-designer-engine/tests/engine/dataset_builders/utils/test_dataset_batch_manager.py packages/data-designer-engine/tests/engine/dataset_builders/utils/test_skip_tracker.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py packages/data-designer/tests/interface/test_data_designer.py (368 passed)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Remove the allow_resize column config field and the sync resize fallback paths now that row-count changes belong at workflow boundaries.

Enforce size-preserving batch replacement and pre-batch processors, update custom column validation, and revise docs/tests for workflow chaining migration.

@andreatgretel requested a review from a team as a code owner June 23, 2026 20:07
@github-actions

Contributor

Fern preview: https://nvidia-preview-pr-766.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Contributor

Greptile Summary

This PR completes the removal of the deprecated allow_resize config field, eliminating all engine-internal row-count-change paths in favor of workflow-boundary transforms. The config schema now hard-rejects allow_resize as an extra field, the async/sync fallback path driven by that field is gone, and both engines unconditionally enforce row-count invariance at every generation-time stage.

  • Config layer: allow_resize removed from SingleColumnConfig; Pydantic's extra="forbid" now rejects it at instantiation. CONFIG_HASH_VERSION bumped to 2, invalidating all pre-removal checkpoints. The stored-config error path flips from COMPATIBLE to INCOMPATIBLE, preventing silent resume against schema-invalid legacy checkpoints.
  • Engine layer: _resolve_async_compatibility, _cell_resize_mode, _finalize_fan_out resize branch, replace_buffer(allow_resize=...), and restore_skip_metadata(allow_resize=...) all removed. _run_stage in ProcessorRunner now raises DatasetProcessingError unconditionally when a PRE_BATCH processor changes row count.
  • Tests and docs: All allow_resize integration tests replaced with invariant-coverage tests; Fern and architecture docs updated to point at workflow chaining as the migration path.

Confidence Score: 5/5

Clean removal with no behavioral regressions; every removed path has either a replacement guard or a test confirming the new rejection behavior.

The removal is thorough and consistent across config, engine, and tests. The fingerprint version bump correctly invalidates old checkpoints, the stored-config error handling was tightened from COMPATIBLE to INCOMPATIBLE (preventing silent resume against now-invalid legacy configs), and every code path that previously required allow_resize has been replaced with an invariant check or removed. No logic errors or broken contracts were found.

No files require special attention. The processor_runner.py change is the subtlest — the log line after _raise_if_pre_batch_resized is unreachable for PRE_BATCH, but this is intentional and covered by the new test.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-config/src/data_designer/config/base.py | Removes allow_resize field from SingleColumnConfig and the validator that prevented it from being combined with skip. |
| packages/data-designer-config/src/data_designer/config/fingerprint.py | Bumps CONFIG_HASH_VERSION 1→2, invalidating all pre-removal checkpoints and preventing unsafe resume from allow_resize-era runs. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Removes _resolve_async_compatibility, _has_allow_resize_columns, _cell_resize_mode, and all resize-aware branches; both build and build_preview now use DATA_DESIGNER_ASYNC_ENGINE unconditionally. Also fixes config-read error handling to return INCOMPATIBLE instead of COMPATIBLE. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/dataset_batch_manager.py | replace_buffer loses allow_resize param; row-count equality is now unconditionally enforced and the _num_records_list resize-bookkeeping path is removed. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/processor_runner.py | PRE_BATCH row-count changes now always raise inside _run_stage; run_pre_batch_on_df drops the strict_row_count knob and relies on the shared enforcement. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/skip_tracker.py | restore_skip_metadata loses the allow_resize parameter; the 1:1 row-identity check is now always enforced (previously conditional on allow_resize=False). |
| packages/data-designer/src/data_designer/interface/data_designer.py | _resolve_client_concurrency_mode drops the allow_resize sync-fallback branch; ClientConcurrencyMode now resolves purely from DATA_DESIGNER_ASYNC_ENGINE. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py | All allow_resize integration tests removed; new test_pre_batch_processor_row_count_change_rejected and test_build_resume_always_raises_on_unreadable_stored_config cover the new invariants. |
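
As a rough illustration of what "resolves purely from DATA_DESIGNER_ASYNC_ENGINE" could look like, the sketch below picks the engine from the environment alone and never inspects column configs. `ConcurrencyMode` and `resolve_concurrency_mode` are hypothetical names, and treating every value other than "0" as async is an assumption; the PR only states that `DATA_DESIGNER_ASYNC_ENGINE=0` is the opt-out.

```python
import os
from collections.abc import Mapping
from enum import Enum
from typing import Optional


class ConcurrencyMode(Enum):
    """Hypothetical stand-in for ClientConcurrencyMode."""

    ASYNC = "async"
    SYNC = "sync"


def resolve_concurrency_mode(env: Optional[Mapping[str, str]] = None) -> ConcurrencyMode:
    """Resolve the engine from the environment only; column configs are not consulted."""
    env = os.environ if env is None else env
    # Assumption: "0" opts out to the legacy sync engine; anything else
    # (including an unset variable) selects the async engine.
    if env.get("DATA_DESIGNER_ASYNC_ENGINE", "1") == "0":
        return ConcurrencyMode.SYNC
    return ConcurrencyMode.ASYNC
```

Passing a mapping explicitly, for example `resolve_concurrency_mode({"DATA_DESIGNER_ASYNC_ENGINE": "0"})`, makes the behavior easy to check without mutating the process environment.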

Reviews (3): Last reviewed commit: "fix: address allow_resize review feedbac..."

@github-actions

Contributor

Thanks for putting this together, @andreatgretel — this is a clean, thorough teardown of allow_resize.

Summary

This PR removes the deprecated allow_resize column-config field and all of its supporting machinery: the sync-engine auto-fallback (_resolve_async_compatibility), the cell-resize buffering paths in DatasetBuilder, the replace_buffer(allow_resize=...) and restore_skip_metadata(allow_resize=...) parameters, the skip/resize validation conflict, and the related docs/tests. Row-count changes now belong at workflow boundaries, and pre-batch processors that change row count are rejected uniformly (no longer async-only). The implementation matches the stated intent in the PR description, and the deletion is complete — I found no dangling references to any removed symbol or attribute.

Findings

Suggestions — Take it or leave it

packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/processor_runner.py:108 — Post-batch row-count message still says "async engine"

  • What: run_pre_batch / run_pre_batch_on_df were unified to reject row-count changes for all engines, and the docs (processors.mdx) were updated to say pre-batch invariance is enforced unconditionally. But run_post_batch's error message still reads "...not supported with the async engine," and the strict_row_count docstring says "Used by the async engine." Since the sync engine is now opt-out-only and on its way out, the post-batch wording is a touch inconsistent with the new pre-batch framing.
  • Why: Minor — could momentarily confuse a user who hits the post-batch error and wonders whether switching engines would help. It wouldn't change behavior.
  • Suggestion: Optionally align the post-batch message with the pre-batch one ("Row-count changes in post-batch processors are not supported; use workflow chaining instead.") if you want the two stages to read consistently. Entirely take-it-or-leave-it.

packages/data-designer-config/tests/config/test_columns.py:286 — Test name slightly outdraws its assertion

  • What: test_allow_resize_extra_field_rejected asserts the generic Pydantic extra="forbid" rejection. That's exactly the right behavior to pin, but the test now really verifies "unknown fields are rejected" rather than anything allow_resize-specific.
  • Why: Purely cosmetic — the assertion is correct and valuable as a regression guard that allow_resize can't sneak back in silently.
  • Suggestion: No change needed; the name is fine as a breadcrumb. Mentioning only in case you'd prefer a comment noting it's a regression guard for the removed field.

What Looks Good

  • The deletion is complete and verifiable. I grepped the full tree for allow_resize, _resolve_async_compatibility, _has_allow_resize_columns, _cell_resize_*, _log_resize_if_changed, and _current_column_display_name — every removed symbol is gone with no stragglers, and the only surviving allow_resize mention is the intentional regression test. warn_at_caller is correctly left in place since it's still used elsewhere in column_configs.py.
  • Nice consolidation of the row-count guard. Factoring the pre-batch check into _raise_if_pre_batch_resized and calling it from both run_pre_batch and run_pre_batch_on_df removes the previous strict_row_count flag duplication and makes the "always enforce" semantics obvious in one place.
  • The error-type plumbing still holds. _raise_if_pre_batch_resized raises DatasetProcessingError, and the new test_pre_batch_processor_row_count_change_rejected expects DatasetGenerationError — these are sibling types, but the batch loop wraps the inner exception's message into DatasetGenerationError, so the match="Pre-batch processor changed row count" substring still lands. Good that the test exercises the real build() path rather than calling the helper directly.
  • Docs were updated in lockstep with behavior. architecture/config.md, architecture/dataset-builders.md, and the Fern pages all had their allow_resize references rewritten toward workflow-chaining, and the rewritten agent-rollout example uses real APIs (DataFrameSeedSource, result.load_dataset()) — I verified those symbols exist and the /concepts/workflow-chaining link target is a real page. This is the kind of doc/code alignment that's easy to skip and you didn't.
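
The error-type plumbing described above can be sketched as follows. This is a simplified model, not the repository's code: the two exception classes are local stand-ins, `build` and `run_pre_batch` are hypothetical reductions of the real paths, and the full message text is assumed apart from the "Pre-batch processor changed row count" substring quoted in the review.

```python
import re


class DatasetProcessingError(Exception):
    """Stand-in for the error raised by the pre-batch guard."""


class DatasetGenerationError(Exception):
    """Stand-in for the error surfaced by build()."""


def run_pre_batch(rows_before: int, rows_after: int) -> None:
    if rows_before != rows_after:
        raise DatasetProcessingError(
            f"Pre-batch processor changed row count from {rows_before} to {rows_after}."
        )


def build(rows_before: int, rows_after: int) -> None:
    """Hypothetical batch loop: re-raises inner failures as DatasetGenerationError."""
    try:
        run_pre_batch(rows_before, rows_after)
    except DatasetProcessingError as exc:
        # Embedding str(exc) keeps the inner message searchable in the outer error.
        raise DatasetGenerationError(f"Failed to process batch: {exc}") from exc


try:
    build(10, 7)
except DatasetGenerationError as err:
    outer = err

# pytest.raises(..., match=...) applies re.search to str(exception),
# so a substring of the inner message is enough for the test to pass.
matched = re.search("Pre-batch processor changed row count", str(outer)) is not None
```

Because the outer error embeds the inner message, a test written against `build()` keeps passing as long as the wrapper does not swallow or rewrite that text.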

Verdict

Ship it (with nits) — This is a tidy, self-consistent removal with matching test and doc updates and no dangling references. The two suggestions are optional polish; nothing blocks merge.

One process note for the author (not a code issue): the PR checklist shows make test was not run (only the targeted affected suites, 368 passing). Given the breadth of the touched builder paths, it'd be worth letting full CI confirm before merge — but the targeted coverage looks well-chosen.


This review was generated by an AI assistant.

@johnnygreco

Contributor

Nice work on this cleanup — the core removal is nicely concentrated and the public docs are much clearer about using workflow boundaries.

Summary

This branch removes allow_resize from the shared column config surface, deletes the sync-engine resize bookkeeping, and updates the main architecture/Fern docs to point row-count-changing workflows at workflow chaining or after-generation processors. The findings below are scoped to that specific removal: stale docs/comments/types that still describe the old allow_resize contract, or docs changed in this PR whose new guidance no longer matches the stricter row-count behavior introduced here.

Findings

Warnings — Worth addressing

Design issues, missing error handling, test gaps, or violations of project standards that could cause problems later.

fern/versions/latest/pages/concepts/processors.mdx:34 — Post-batch row-count guidance still implies sync post-batch resizing works

  • What: The updated warning says Data Designer enforces row-count invariance in process_before_batch(), and that the async engine also enforces it in process_after_batch(). That reads like sync process_after_batch() can still filter/expand rows. This is directly related to this PR because the PR removes the resize path from DatasetBatchManager.replace_buffer(...); the sync build path now calls run_post_batch(...), then _write_processed_batch(...), which calls strict replace_buffer(...), so any post-batch row-count change fails.
  • Why: This PR is removing the old mid-run resizing escape hatch, so the docs need to describe the new invariant consistently. Users who opt into the legacy sync engine may otherwise follow this new doc text, put filtering/expansion in process_after_batch(), and hit a runtime failure.
  • Suggestion: Update the warning to say generation-time processors must preserve row count in both process_before_batch() and process_after_batch(), and that row-count-changing work belongs in process_after_generation() or a workflow boundary. If sync post-batch resizing is intentionally still supported, we should instead make _write_processed_batch handle it deliberately and add a regression test for that path.
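
A minimal sketch of the split suggested above, assuming the three hooks receive and return a list of row dicts. The hook names come from the docs under discussion, but the signatures, the `KeepLongAnswers` class, and `run_generation_time_hook` are hypothetical simplifications (the real processors operate on DataFrames).

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]


class KeepLongAnswers:
    """Hypothetical processor; the real hook signatures may differ."""

    def process_before_batch(self, rows: Rows) -> Rows:
        # Generation-time hook: rewrite values, keep every row.
        return [{**row, "answer": row["answer"].strip()} for row in rows]

    def process_after_batch(self, rows: Rows) -> Rows:
        # Generation-time hook: add a derived column, keep every row.
        return [{**row, "answer_length": len(row["answer"])} for row in rows]

    def process_after_generation(self, rows: Rows) -> Rows:
        # Row-count-changing work lives here (or at a workflow boundary).
        return [row for row in rows if row["answer_length"] >= 5]


def run_generation_time_hook(hook: Callable[[Rows], Rows], rows: Rows) -> Rows:
    """Apply a before/after-batch hook and enforce row-count invariance."""
    out = hook(rows)
    if len(out) != len(rows):
        raise ValueError(f"Hook changed row count from {len(rows)} to {len(out)}.")
    return out


processor = KeepLongAnswers()
rows = [{"answer": "  yes "}, {"answer": "it depends on the seed"}]
rows = run_generation_time_hook(processor.process_before_batch, rows)
rows = run_generation_time_hook(processor.process_after_batch, rows)
final = processor.process_after_generation(rows)
```

Running the filter through `run_generation_time_hook` instead would raise, which is the failure a user would hit after putting filtering in `process_after_batch()`.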

plans/workflow-chaining/workflow-chaining.md:30 — Tracked plan docs still describe allow_resize as active

  • What: A repo-wide cleanup grep still finds many allow_resize references in tracked documentation/planning files, including plans/workflow-chaining/workflow-chaining.md, plans/479/skip-when-conditional-generation.md, and plans/346/async-generators-and-task-queue.md. This is related to this PR because the PR explicitly removes allow_resize support and updates the shipped docs; these tracked docs still contain active-sounding fallback behavior, TODOs saying full removal still requires doc updates, and implementation snippets using allow_resize.
  • Why: The shipped Fern/architecture docs and package comments are clean except for the intentional rejection test, which is great. But these tracked plan files are still discoverable by future contributors and agents, so leaving old allow_resize guidance around weakens this PR's cleanup goal.
  • Suggestion: Either remove/update the obsolete sections, or clearly mark these files/sections as historical and complete so future readers do not treat the old allow_resize behavior as current guidance.

Suggestions — Take it or leave it

Style improvements, minor simplifications, or optional enhancements that would improve code quality.

packages/data-designer-engine/src/data_designer/engine/column_generators/generators/custom.py:177 — Async custom generator type still advertises list returns

  • What: CustomColumnGenerator.generate(...) now returns dict | pd.DataFrame, and this PR updates tests to assert cell-by-cell list returns are rejected. The async counterpart, agenerate(...), is still annotated as returning dict | pd.DataFrame | list[dict].
  • Why: This is directly tied to this PR because list[dict] was the old cell-by-cell resize return shape. The runtime behavior is already fixed because _postprocess_result rejects lists, but the remaining type hint still advertises the contract this PR removes.
  • Suggestion: Change the annotation to dict | pd.DataFrame so the sync and async generator contracts match.

What Looks Good

  • The base config removal is clean: extra="forbid" now rejects allow_resize at config construction, and the tests cover that behavior directly.
  • The engine simplification removes the resize-specific state and restores a single strict replace_buffer contract, which makes row identity and skip metadata easier to reason about.
  • The main shipped docs and architecture pages no longer mention allow_resize; the custom columns, plugin, and agent rollout examples now point users toward workflow boundaries instead.
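
The "single strict replace_buffer contract" can be pictured with the sketch below. `BatchBuffer` is a hypothetical, heavily simplified stand-in for DatasetBatchManager, and the use of `ValueError` is an assumption; only the idea that a replacement must match the existing row count comes from the PR.

```python
from typing import Any

Record = dict[str, Any]


class BatchBuffer:
    """Hypothetical, simplified stand-in for DatasetBatchManager's buffer."""

    def __init__(self, records: list[Record]) -> None:
        self._records = list(records)

    @property
    def records(self) -> list[Record]:
        return list(self._records)

    def replace_buffer(self, records: list[Record]) -> None:
        # No allow_resize escape hatch: a replacement must map 1:1 onto
        # the rows already in the buffer, so row identity stays stable.
        if len(records) != len(self._records):
            raise ValueError(
                f"Replacement has {len(records)} records, expected {len(self._records)}."
            )
        self._records = list(records)


buffer = BatchBuffer([{"id": 0}, {"id": 1}])
buffer.replace_buffer([{"id": 0, "score": 0.5}, {"id": 1, "score": 0.9}])
```

With one unconditional check, anything keyed by row position, such as skip metadata, can rely on positions meaning the same thing before and after a replacement.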

Verdict

Needs changes — Please address the processor docs mismatch and the residual tracked plan references. The agenerate return annotation is a small nit that would be good to clean up while touching this area.


This review was generated by an AI assistant.

@johnnygreco johnnygreco left a comment

Contributor

Requesting changes based on the review feedback already posted: please address the processor docs mismatch and the residual tracked plan references for the removed allow_resize behavior before merge.

@nabinchha

Contributor

Thanks for putting this together, @andreatgretel — the surface-level removal is tidy and the docs/test updates land in the right places.

Summary

This PR deletes allow_resize from the column-config surface and removes the entire sync-engine resize machinery — _resolve_async_compatibility, _cell_resize_* bookkeeping, _log_resize_if_changed, the replace_buffer(allow_resize=...) and restore_skip_metadata(allow_resize=...) parameters, the SKIP_WITH_ALLOW_RESIZE validator branch, the _resolve_client_concurrency_mode allow_resize fallback, and the _has_allow_resize_columns resume guard. Pre-batch processors are now uniformly strict (the per-engine strict_row_count flag is gone), and the docs/tests are updated to point users at workflow chaining for row-count-changing work. Implementation matches the stated intent, and ruff check / ruff format --check are clean on all 20 changed Python files.

Findings

I'm scoping this review to issues not already raised by the existing greptile / @johnnygreco reviews. Those reviews already cover:

  • The processors.mdx post-batch wording mismatch (johnnygreco).
  • The plans/**/*.md residual allow_resize references (johnnygreco).
  • The agenerate return annotation at custom.py:177 still listing list[dict] (johnnygreco).
  • The earlier greptile post-batch wording / test-name nits — these were addressed in the chore: address bot review nits commit.

Suggestions — Take it or leave it

packages/data-designer-engine/src/data_designer/engine/column_generators/generators/custom.py:229 - _generate return type still includes list[dict]

  • What: Same stale-annotation pattern @johnnygreco flagged on agenerate (line 177), but it also appears on the sibling private helper. The runtime path can no longer return list[dict] (the cell-by-cell list branch in _postprocess_result is gone), so _generate's annotated return type is wider than its actual contract.

  • Why: Tied to the same cleanup goal as the existing nit — if you're already touching agenerate's annotation, knocking this out at the same time keeps both sync and async signatures in sync with the new contract.

  • Suggestion: Narrow to dict | pd.DataFrame for consistency:

    def _generate(self, data: dict | pd.DataFrame, is_dataframe: bool) -> dict | pd.DataFrame:

packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/processor_runner.py:63 — Pre-batch row-count INFO log fires immediately before the new error

  • What: _run_stage logs ℹ️ PRE_BATCH processors changed the record count by {delta:+d} records. whenever a processor changes row count, and then both run_pre_batch / run_pre_batch_on_df immediately call _raise_if_pre_batch_resized, which raises DatasetProcessingError. So the user sees an informational "count changed" log line a moment before the error fires — slightly misleading now that pre-batch row-count changes are universally rejected.
  • Why: Minor UX. The breadcrumb is still useful for after-generation and (legacy) sync post-batch, where row-count changes are legitimate, so suppressing it entirely would hurt those callers.
  • Suggestion: Either skip the INFO log when the stage is PRE_BATCH (since we'll error on the next line anyway), or call _raise_if_pre_batch_resized before the log in _run_stage when the stage is PRE_BATCH. Smallest change is probably to inline the check in _run_stage (and drop the helper) so the error and the log don't end up in opposite orders. Entirely optional.
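
One way the second option could look, as a sketch only: the row-count check runs before the INFO log, so a PRE_BATCH failure never emits the "count changed" line. `run_stage`, `Stage`, `ListHandler`, and the message wording beyond what the review quotes are hypothetical; the real `_run_stage` has a different signature.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("processor_runner_sketch")


class Stage(Enum):
    PRE_BATCH = "pre_batch"
    AFTER_GENERATION = "after_generation"


class DatasetProcessingError(Exception):
    """Stand-in for the engine's processing error."""


def run_stage(stage: Stage, rows: list, processors: list[Callable[[list], list]]) -> list:
    original = len(rows)
    for processor in processors:
        rows = processor(rows)
    delta = len(rows) - original
    if delta != 0:
        if stage is Stage.PRE_BATCH:
            # Raise first, so no informational log precedes the error.
            raise DatasetProcessingError(
                f"Pre-batch processor changed row count by {delta:+d} records."
            )
        logger.info("%s processors changed the record count by %+d records.", stage.name, delta)
    return rows


class ListHandler(logging.Handler):
    """Collects log messages so the ordering can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def drop_first(rows: list) -> list:
    return rows[1:]


try:
    run_stage(Stage.PRE_BATCH, [1, 2, 3], [drop_first])
except DatasetProcessingError:
    logged_before_error = list(handler.messages)  # nothing was logged

kept = run_stage(Stage.AFTER_GENERATION, [1, 2, 3], [drop_first])
```

The after-generation call still logs the change, so the breadcrumb survives for stages where row-count changes are legitimate.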

What Looks Good

  • The deletion is genuinely thorough. I grepped for every removed symbol (_resolve_async_compatibility, _has_allow_resize_columns, _cell_resize_results, _cell_resize_mode, _log_resize_if_changed, _current_column_display_name, SKIP_WITH_ALLOW_RESIZE) — every one is gone from the source tree. The only surviving allow_resize mention in shipped code is the intentional regression test in test_columns.py. warn_at_caller is correctly preserved (still used in column_configs.py).
  • Nice unification of the pre-batch row-count guard. Factoring the check into _raise_if_pre_batch_resized and calling it from both run_pre_batch and run_pre_batch_on_df cleanly collapses the previous strict_row_count-flag dance — there's now exactly one definition of "pre-batch processors must preserve row count," and it lives next to its callers.
  • Architecture docs are kept in lock-step with the code. architecture/config.md no longer claims allow_resize is part of the SingleColumnConfig surface, architecture/dataset-builders.md drops the "Resume rejects allow_resize=True" paragraph in favor of the simpler "Resume relies on stable row-group boundaries" framing, and the FULL_COLUMN row-count contract is now stated plainly. This is the kind of cross-cutting doc maintenance that's easy to skip.
  • The new pre-batch invariant test exercises the real build() path (test_pre_batch_processor_row_count_change_rejected) rather than calling the private helper directly, which means the test would still catch a regression if someone reintroduced an inner-wrap that swallowed the inner error. Good test choice.
  • The agent-rollout-ingestion example is realistically rewritten. The new "run the rollout stage, transform between stages, seed the analysis stage with DataFrameSeedSource" pattern uses real public APIs (DataFrameSeedSource exists in seed_source_dataframe.py, result.load_dataset() is the documented accessor) and matches the migration path the PR is selling elsewhere in the docs.

Verdict

Ship it (with nits) once @johnnygreco's CHANGES_REQUESTED items land — the residual plans/** references, the processors.mdx post-batch wording, and the agenerate annotation are the gating items. My two additions above are optional polish you can fold in at the same time you address the agenerate annotation.

One process note: the PR checklist still shows make test was not run (only the targeted suites, 368 passing). Given how broad the builder changes are, it's worth letting a full CI run confirm before merge.


This review was generated by an AI assistant.

@nabinchha

Contributor

Following up on my earlier review — circling back on resume specifically. I think there's one Warning-tier concern that the existing reviews haven't surfaced.

Warning — Worth addressing

packages/data-designer-config/src/data_designer/config/fingerprint.py:37 — Removing allow_resize silently invalidates resume for in-progress datasets created with the prior version

  • What: The PR correctly removes the _has_allow_resize_columns() resume guard from DatasetBuilder.build() — that's safe in itself because new runs can't have allow_resize=True columns anymore. The subtler problem is the config-hash compatibility check (dataset_builder.py:_check_resume_config_compatibility, lines ~801-815): the fingerprint is computed via fingerprint_config(config.to_dict()), and to_dict() calls model_dump(mode="json") (exportable_config.py:23) with no exclude_defaults. Since ConfigBase's model_config also doesn't set exclude_defaults, Pydantic includes every field — so pre-PR, every column contributed "allow_resize": false to the canonical JSON that got hashed. allow_resize is not in any of the _EXCLUDED_* sets in fingerprint.py, and CONFIG_HASH_VERSION is still 1.

    Net effect after upgrading to this PR:

    | | Pre-PR per-column dump | Post-PR per-column dump |
    | --- | --- | --- |
    | sample keys | {name, drop, allow_resize: false, column_type, skip, propagate_skip, ...} | {name, drop, column_type, skip, propagate_skip, ...} |
    | config_hash | hash_A | hash_B (different) |
    | config_hash_version | 1 | 1 (unchanged) |

    In _check_resume_config_compatibility the version check passes (both 1), the hash check fails, and the run is marked INCOMPATIBLE.

  • Why: This silently breaks resume across the upgrade boundary for users whose configs are logically unchanged.

    • ResumeMode.ALWAYS → raises "🛑 Cannot resume: the current config or dropped-column artifact policy does not match the config used in the interrupted run." — misleading, since the user didn't change anything.
    • ResumeMode.IF_POSSIBLE → silently logs "▶️ Config has changed since the last run — starting a fresh generation" and discards the user's prior progress.

    There's also a small follow-on hazard in the JSON-config fallback path (dataset_builder.py:817-838): if metadata.json lacked config_hash and the stored builder_config.json happened to still carry allow_resize: true, BuilderConfig.model_validate(...) would raise ValidationError (extra=forbid), the except (OSError, json.JSONDecodeError, ValidationError) clause would log a warning and return _ConfigCompatibility.COMPATIBLE, and the run would attempt to resume a dataset it can't safely resume. The primary config_hash path makes this rare in practice — but it's still an inconsistent failure mode.

  • Suggestion: Three reasonable options, smallest first:

    1. Bump CONFIG_HASH_VERSION to 2 in fingerprint.py:37. The version-mismatch branch (dataset_builder.py:804-810) already does the right thing — it logs "Stored config_hash_version=1 does not match current version=2" and returns INCOMPATIBLE. Users get a clearer error, and IF_POSSIBLE still falls through to a fresh run. This is the most honest signal that the schema changed.
    2. Add "allow_resize" to a column-level _EXCLUDED_* set in fingerprint.py so the key is stripped from hashed columns whether it's present or not. Preserves resume across this upgrade for users whose configs are otherwise unchanged — more forgiving, less honest.
    3. Document the migration in the PR description / changelog so users know to expect a one-time fresh start after upgrade.

    Option 1 is probably the smallest principled change. Option 2 is the most user-friendly. They aren't mutually exclusive.

    While in there, it might also be worth changing the except (..., ValidationError): fallback in _check_resume_config_compatibility to return INCOMPATIBLE rather than COMPATIBLE — silently treating a parse failure as "compatible" feels backwards, especially for the case where the failure is exactly "this file references a removed schema field."
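
To make the hash drift and option 1 concrete, here is a small sketch. It assumes the fingerprint is a SHA-256 over a canonical JSON dump, which may not match the real `fingerprint_config`; `sketch_fingerprint`, `check_resume`, and the string verdicts are hypothetical.

```python
import hashlib
import json

CONFIG_HASH_VERSION = 2  # bumped from 1 in this sketch, mirroring option 1


def sketch_fingerprint(config: dict) -> str:
    """Hash a canonical JSON dump (an assumption about how fingerprinting works)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_resume(stored: dict, current_hash: str) -> str:
    stored_version = stored.get("config_hash_version")
    if stored_version != CONFIG_HASH_VERSION:
        # The version check runs first, so the user sees the schema change
        # rather than a misleading "your config changed" message.
        return (
            f"INCOMPATIBLE: stored config_hash_version={stored_version} "
            f"does not match current version={CONFIG_HASH_VERSION}"
        )
    if stored.get("config_hash") != current_hash:
        return "INCOMPATIBLE: config changed since the interrupted run"
    return "COMPATIBLE"


pre_pr_column = {"name": "answer", "drop": False, "allow_resize": False}
post_pr_column = {"name": "answer", "drop": False}

hash_a = sketch_fingerprint({"columns": [pre_pr_column]})
hash_b = sketch_fingerprint({"columns": [post_pr_column]})

stored_metadata = {"config_hash": hash_a, "config_hash_version": 1}
verdict = check_resume(stored_metadata, hash_b)
```

Dropping a default-valued key changes the canonical JSON and therefore the hash, even though the config is logically the same; the version bump turns that silent mismatch into an explicit one.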


This follow-up was generated by an AI assistant.

@andreatgretel

Contributor Author

@johnnygreco @nabinchha thanks for the detailed review. I pushed dd7e1353 to address the outstanding feedback:

  • Updated processors.mdx so generation-time processors (process_before_batch() and process_after_batch()) are documented as row-count invariant.
  • Marked the remaining allow_resize plan references as historical / completed so they are not active guidance.
  • Narrowed the stale CustomColumnGenerator.agenerate() and _generate() return annotations.
  • Moved the pre-batch row-count failure ahead of the INFO count-change log and added a regression assertion.
  • Bumped CONFIG_HASH_VERSION to 2 and changed invalid stored-config fallback handling to return incompatible, with resume coverage for schema-invalid legacy configs.
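
The last item's fallback change can be sketched like this. It covers only the readability check for a stored config, not the hash comparison; `ALLOWED_COLUMN_KEYS`, `validate_stored_config`, and `check_stored_config` are hypothetical, and the key check is a stdlib imitation of Pydantic's `extra="forbid"` rather than the real validation.

```python
import json

# Hypothetical, simplified schema: the keys a stored column may carry.
ALLOWED_COLUMN_KEYS = {"name", "drop", "column_type"}


def validate_stored_config(raw: str) -> dict:
    """Parse a stored config and reject unknown column keys."""
    config = json.loads(raw)
    for column in config["columns"]:
        extra = set(column) - ALLOWED_COLUMN_KEYS
        if extra:
            raise ValueError(f"Extra fields not permitted: {sorted(extra)}")
    return config


def check_stored_config(raw: str) -> str:
    try:
        validate_stored_config(raw)
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError subclasses ValueError. An unreadable or
        # schema-invalid stored config is never treated as safe to resume.
        return "INCOMPATIBLE"
    return "COMPATIBLE"


legacy = '{"columns": [{"name": "answer", "allow_resize": true}]}'
current = '{"columns": [{"name": "answer", "drop": false}]}'
```

Here `check_stored_config(legacy)` reports the config as incompatible because of the removed field, where the earlier fallback would have treated the same failure as compatible.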

Greptile re-reviewed the latest commit and is green. Full CI is passing.

@johnnygreco johnnygreco left a comment

Contributor

Thanks for the follow-up, @andreatgretel. I re-checked the requested changes against dd7e135: the processor docs now state both generation-time processor stages are row-count invariant, the tracked plan references are clearly marked historical/completed, and the stale custom generator return annotations are narrowed. I also verified the follow-up resume/fingerprint fixes and the added tests. My requested changes are resolved.
