Fix CHAR/VARCHAR length overflow when writing reconcile intermediate data#2428

Open
moomindani wants to merge 4 commits into main from fix/recon-varchar-length-overflow

@moomindani
Contributor

Changes

What does this PR do?

Strip CHAR(n)/VARCHAR(n) length constraints from DataFrames before writing intermediate data to Delta during reconciliation. This prevents DELTA_EXCEED_CHAR_VARCHAR_LIMIT errors when source data contains space-padded CHAR values.

Root cause

Some data sources (e.g., Teradata) return CHAR(n) values with space padding via JDBC, resulting in values that exceed the declared column length (e.g., a CHAR(16) column returning 16 digits + 16 spaces = 32 characters). Delta enforces CHAR/VARCHAR length constraints through column metadata (__CHAR_VARCHAR_TYPE_STRING), causing writes to fail for these padded values.

This was observed with Teradata via Lakehouse Federation but not with Lakebase (PostgreSQL) via Lakehouse Federation.
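The length arithmetic in the example above can be checked with a few lines of plain Python. This is only an illustration of why a padded value is longer than the declared length; the helper `exceeds_declared_length` is a hypothetical, naive comparison and is not how Delta implements its check.

```python
DECLARED_LENGTH = 16  # the column is declared as CHAR(16)

digits = "1234567890123456"   # 16 meaningful characters
padded = digits + " " * 16    # same value with 16 trailing spaces of padding


def exceeds_declared_length(value: str, declared_length: int) -> bool:
    """Naive length comparison (hypothetical helper, not Delta's real check)."""
    return len(value) > declared_length


print(len(digits), exceeds_declared_length(digits, DECLARED_LENGTH))  # 16 False
print(len(padded), exceeds_declared_length(padded, DECLARED_LENGTH))  # 32 True
```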

Fix

Strip all column metadata via col.alias(name, metadata={}) before writing intermediate DataFrames to Delta. This removes the constraint that Delta uses for length enforcement. The intermediate data is temporary and does not need metadata preservation.
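A minimal sketch of the approach, assuming a PySpark DataFrame: the function name `strip_column_metadata` is hypothetical and the PR's actual helper may be written differently. It re-aliases every column to its own name with empty metadata, which leaves names and types as they were.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for type hints only, not at runtime
    from pyspark.sql import DataFrame


def strip_column_metadata(df: DataFrame) -> DataFrame:
    """Return df with the same columns but empty column metadata (hypothetical helper)."""
    # Aliasing a column to its own name with metadata={} replaces whatever
    # metadata it carried, including __CHAR_VARCHAR_TYPE_STRING.
    return df.select(*[df[name].alias(name, metadata={}) for name in df.columns])
```

A caller would apply it just before the write, for example `strip_column_metadata(df).write.format("delta")...`, so that only the intermediate copy loses its metadata.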

Linked issues

Fixes #2389

Tests

  • manually tested with Teradata via Lakehouse Federation
  • added unit tests
  • added integration tests

Test plan

  • test_strip_char_varchar_constraints_strips_metadata — verifies CHAR/VARCHAR metadata is stripped
  • test_strip_char_varchar_constraints_preserves_types — verifies column types are preserved
  • All existing reconcile unit tests pass
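The two properties checked by the new tests can be sketched without Spark. The stand-in below is an assumption-heavy illustration: `Field`, `strip_metadata`, and both test functions are hypothetical, plain dataclasses replace Spark schema fields, and the metadata value `"char(16)"` is illustrative. The PR's real tests run against the actual helper.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Field:
    """Stand-in for a schema field: name, type string, and metadata."""
    name: str
    data_type: str
    metadata: dict = field(default_factory=dict)


def strip_metadata(fields: list[Field]) -> list[Field]:
    """Return copies of the fields with empty metadata; names and types are kept."""
    return [replace(f, metadata={}) for f in fields]


SCHEMA = [
    Field("account_id", "string", {"__CHAR_VARCHAR_TYPE_STRING": "char(16)"}),
    Field("amount", "int"),
]


def test_strips_metadata() -> None:
    assert all(f.metadata == {} for f in strip_metadata(SCHEMA))


def test_preserves_types() -> None:
    stripped = strip_metadata(SCHEMA)
    assert [(f.name, f.data_type) for f in stripped] == [
        (f.name, f.data_type) for f in SCHEMA
    ]
```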

Reopened from #2390 on an upstream branch to bypass the fork-PR OIDC restriction on JFrog auth (CI cannot run on fork PRs). All review comments and history are preserved on the original PR.

Some data sources (e.g., Teradata) return CHAR(n) values with space
padding via JDBC, resulting in values that exceed the declared column
length. Delta enforces CHAR/VARCHAR length constraints through column
metadata (__CHAR_VARCHAR_TYPE_STRING), causing writes to fail for these
padded values.

Strip all column metadata via col.alias(name, metadata={}) before writing
intermediate DataFrames to Delta. This removes the constraint that
Delta uses for length enforcement.

Observed with Teradata via Lakehouse Federation but not with Lakebase
(PostgreSQL) via Lakehouse Federation.

Co-authored-by: Isaac
- black reformats list comprehension to single line in test helper
- ruff removes unused StringType import (was used in main, dropped after merge)

Co-authored-by: Isaac
@codecov

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.78%. Comparing base (1c32cbb) to head (2ef8da0).

Files with missing lines | Patch % | Lines
...abricks/labs/lakebridge/reconcile/recon_capture.py | 60.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2428      +/-   ##
==========================================
- Coverage   65.78%   65.78%   -0.01%     
==========================================
  Files          98       98              
  Lines        9237     9242       +5     
  Branches      992      992              
==========================================
+ Hits         6077     6080       +3     
- Misses       2984     2986       +2     
  Partials      176      176              

`_write_df_to_delta` is a module-level function that accessed
ReconIntermediatePersist._strip_char_varchar_constraints from outside the
class, which pylint flags as protected-access. Rename the helper to a public
name, since it is effectively a utility.

Also rename mock_select unused arg to *_cols and fix test fn names.

Co-authored-by: Isaac
@github-actions

github-actions Bot commented May 9, 2026

✅ 148/148 passed, 5 skipped, 24m59s total

Running from acceptance #4311

Development

Successfully merging this pull request may close these issues.

[BUG]: Reconcile fails with DELTA_EXCEED_CHAR_VARCHAR_LIMIT when source has CHAR/VARCHAR padded values