
Fix CHAR/VARCHAR length overflow when writing reconcile intermediate data #2390

Closed
moomindani wants to merge 1 commit into databrickslabs:main from moomindani:fix/recon-varchar-length-overflow

moomindani (Contributor) commented Apr 22, 2026

Changes

What does this PR do?

Strip CHAR(n)/VARCHAR(n) length constraints from DataFrames before writing intermediate data to Delta during reconciliation. This prevents DELTA_EXCEED_CHAR_VARCHAR_LIMIT errors when source data contains space-padded CHAR values.

Root cause

Some data sources (e.g., Teradata) return CHAR(n) values with space padding via JDBC, resulting in values that exceed the declared column length (e.g., a CHAR(16) column returning 16 digits + 16 spaces = 32 characters). Delta enforces CHAR/VARCHAR length constraints through column metadata (__CHAR_VARCHAR_TYPE_STRING), causing writes to fail for these padded values.
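To make the failure mode concrete, the sketch below models the length check in plain Python. It is a simplified illustration, not Delta's implementation: it assumes the metadata value is a type string such as "char(16)" stored under the __CHAR_VARCHAR_TYPE_STRING key, and it uses a plain character count, whereas the real enforcement in Delta's write path may differ in details such as trailing-space handling. The helpers `declared_length` and `exceeds_limit` are hypothetical.

```python
import re
from typing import Optional

# Metadata key named in the PR description.
CHAR_VARCHAR_KEY = "__CHAR_VARCHAR_TYPE_STRING"

# Assumed format of the stored type string, e.g. "char(16)" or "varchar(255)".
_TYPE_STRING = re.compile(r"^(char|varchar)\((\d+)\)$", re.IGNORECASE)


def declared_length(metadata: dict) -> Optional[int]:
    """Return the declared CHAR/VARCHAR length from column metadata, if any."""
    match = _TYPE_STRING.match(metadata.get(CHAR_VARCHAR_KEY, ""))
    return int(match.group(2)) if match else None


def exceeds_limit(value: str, metadata: dict) -> bool:
    """Hypothetical, simplified check: plain character count vs. declared length."""
    limit = declared_length(metadata)
    return limit is not None and len(value) > limit


# A CHAR(16) value returned with 16 trailing pad spaces (32 characters total).
padded = "1234567890123456" + " " * 16
print(len(padded))                                            # 32
print(exceeds_limit(padded, {CHAR_VARCHAR_KEY: "char(16)"}))  # True
print(exceeds_limit(padded, {}))                              # False: no declared length
```

The last line hints at why stripping the metadata is enough in this model: once the key is gone, there is no declared length left to compare against.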

This was observed with Teradata accessed through Lakehouse Federation, but not with Lakebase (PostgreSQL) accessed the same way.

Fix

Strip all column metadata via col.alias(name, metadata={}) before writing intermediate DataFrames to Delta. This removes the __CHAR_VARCHAR_TYPE_STRING entry that Delta uses for length enforcement, while column names and data types stay unchanged. The intermediate data is temporary, so its metadata does not need to be preserved.
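In PySpark, rebuilding every column with empty metadata is commonly written as `df.select([F.col(c).alias(c, metadata={}) for c in df.columns])`, with `F` being `pyspark.sql.functions` (column names containing dots would need backtick quoting). Because that form needs a Spark session, the sketch below models the same effect on a schema in plain Python. The `Field` dataclass is a hypothetical stand-in for `StructField`, and the function name `strip_char_varchar_constraints` is borrowed from the test names in the test plan; the body is an illustration, not the PR's code.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a Spark StructField: name, type, metadata."""
    name: str
    data_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def strip_char_varchar_constraints(schema: List[Field]) -> List[Field]:
    """Return a copy of the schema with all column metadata removed.

    Mirrors col.alias(name, metadata={}): names and types are kept, and the
    __CHAR_VARCHAR_TYPE_STRING entry disappears along with any other metadata.
    """
    return [replace(f, metadata={}) for f in schema]


source_schema = [
    Field("account_id", "string", {"__CHAR_VARCHAR_TYPE_STRING": "char(16)"}),
    Field("note", "string", {"__CHAR_VARCHAR_TYPE_STRING": "varchar(100)"}),
    Field("amount", "decimal(18,2)"),
]

for f in strip_char_varchar_constraints(source_schema):
    print(f.name, f.data_type, f.metadata)
```

Dropping all metadata rather than only the one key keeps the change simple, which fits temporary intermediate tables; removing just __CHAR_VARCHAR_TYPE_STRING would be the narrower option if other metadata ever had to survive.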

Linked issues

Fixes #2389

Tests

  • manually tested with Teradata via Lakehouse Federation
  • added unit tests
  • added integration tests

Test plan

  • test_strip_char_varchar_constraints_strips_metadata — verifies CHAR/VARCHAR metadata is stripped
  • test_strip_char_varchar_constraints_preserves_types — verifies column types are preserved
  • All existing reconcile unit tests pass

Some data sources (e.g., Teradata) return CHAR(n) values with space
padding via JDBC, resulting in values that exceed the declared column
length. Delta enforces CHAR/VARCHAR length constraints through column
metadata (__CHAR_VARCHAR_TYPE_STRING), causing writes to fail for these
padded values.

Strip all column metadata via col.alias(name, metadata={}) before
writing intermediate DataFrames to Delta. This removes the constraint
that Delta uses for length enforcement.

Observed with Teradata via Lakehouse Federation but not with Lakebase
(PostgreSQL) via Lakehouse Federation.

Co-authored-by: Isaac
moomindani (Contributor, Author) commented

Reopened as #2428 from an upstream branch to work around the fork-PR OIDC restriction on JFrog authentication (CI cannot run on fork PRs). All review comments here are preserved as history; further discussion will happen on #2428.

moomindani closed this May 9, 2026


Development

Successfully merging this pull request may close these issues.

[BUG]: Reconcile fails with DELTA_EXCEED_CHAR_VARCHAR_LIMIT when source has CHAR/VARCHAR padded values
