Skip to content

fix(connectors): Add unwrap_envelope transform and envelope detection to fix source-sink format mismatch#3197

Merged
hubcio merged 5 commits into
apache:masterfrom
atharvalade:fix/shared-source-sink-contract
May 25, 2026
Merged

fix(connectors): Add unwrap_envelope transform and envelope detection to fix source-sink format mismatch#3197
hubcio merged 5 commits into
apache:masterfrom
atharvalade:fix/shared-source-sink-contract

Conversation

@atharvalade

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #3174

Rationale

Sources (e.g. Postgres) wrap row data in a DatabaseRecord envelope while sinks (e.g. Iceberg) expect flat JSON matching the target table schema — no shared contract exists, producing silent null failures.

What changed?

The Postgres source emits {table_name, operation_type, timestamp, data: {...}, old_data} envelopes, but the Iceberg sink's Arrow JSON reader maps these nested structures to top-level fields as null, silently violating non-nullable constraints.

This adds a reusable unwrap_envelope transform to the connector SDK that extracts a nested field (e.g. data) and promotes it as the top-level payload, plus explicit envelope detection in the Iceberg sink that errors with an actionable message instead of failing silently.

Local Execution

  • Passed
  • Pre-commit hooks ran (fmt, clippy, license-headers, trailing-whitespace, trailing-newline all clean; 119 tests pass across SDK + integration suites)

AI Usage

  1. Opus 4.6
  2. used for codebase exploration and following existing transform patterns
  3. All 8 new unit tests pass locally, clippy/fmt clean, existing 111 tests unaffected
  4. Yes, all code can be explained

@codecov

codecov Bot commented Apr 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.47826% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.39%. Comparing base (9947345) to head (38d37f7).

Files with missing lines Patch % Lines
...nectors/sdk/src/transforms/json/unwrap_envelope.rs 96.27% 5 Missing and 1 partial ⚠️
core/connectors/sdk/src/transforms/mod.rs 0.00% 3 Missing ⚠️
...e/connectors/sdk/src/transforms/unwrap_envelope.rs 85.00% 3 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3197       +/-   ##
=============================================
- Coverage     74.17%   55.39%   -18.79%     
  Complexity      943      943               
=============================================
  Files          1237     1238        +1     
  Lines        112641    99864    -12777     
  Branches      89200    76424    -12776     
=============================================
- Hits          83549    55317    -28232     
- Misses        26299    41956    +15657     
+ Partials       2793     2591      -202     
Components Coverage Δ
Rust Core 50.65% <93.47%> (-24.66%) ⬇️
Java SDK 58.44% <ø> (ø)
C# SDK 70.65% <ø> (ø)
Python SDK 81.43% <ø> (ø)
Node SDK 91.41% <ø> (-0.12%) ⬇️
Go SDK 39.91% <ø> (ø)
Files with missing lines Coverage Δ
core/connectors/sdk/src/transforms/json/mod.rs 53.84% <ø> (ø)
core/connectors/sdk/src/transforms/mod.rs 26.66% <0.00%> (-2.97%) ⬇️
...e/connectors/sdk/src/transforms/unwrap_envelope.rs 85.00% <85.00%> (ø)
...nectors/sdk/src/transforms/json/unwrap_envelope.rs 96.27% <96.27%> (ø)

... and 297 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@atharvalade atharvalade changed the title Add unwrap_envelope transform and envelope detection to fix source-sink format mismatch fix(connectors): Add unwrap_envelope transform and envelope detection to fix source-sink format mismatch Apr 29, 2026
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added S-stale Inactive issue or pull request and removed S-stale Inactive issue or pull request labels May 7, 2026
@hubcio

hubcio commented May 14, 2026

Copy link
Copy Markdown
Contributor

@atharvalade please rebase this PR

/author

@github-actions github-actions Bot added the S-waiting-on-author PR is waiting on author response label May 14, 2026
@atharvalade atharvalade force-pushed the fix/shared-source-sink-contract branch from 8defa26 to 7c92361 Compare May 17, 2026 06:32
@atharvalade

Copy link
Copy Markdown
Contributor Author

@atharvalade please rebase this PR

/author

done

@atharvalade

Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 17, 2026

@hubcio hubcio left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review notes (not posted as line comments because targets are out-of-diff or already tracked):

  • the FFI (consume) return code is still discarded at core/connectors/runtime/src/sink.rs:642-650 (ConsumeCallback returns i32, sdk/src/sink.rs:28-36) and AutoCommit::When(AutoCommitWhen::PollingMessages) at core/connectors/runtime/src/sink.rs:467 commits offsets before the sink runs. this PR widens the blast radius: any envelope-detected batch now return Err(Error::InvalidRecord) from the sink, gets logged, and the offsets have already advanced. tracked in #2927 and #2928 but this PR adds a brand new fail path triggerable purely by source/sink misconfiguration, so they should land together.

  • core/connectors/sinks/delta_sink/src/sink.rs:106 has the same envelope-vs-flat-JSON mismatch and no detection or unwrap guidance. fixing one sink and leaving the other broken means #3174 is still reachable through the postgres → delta path. either replicate the detection (with the tighter shape suggested below) or, better, document unwrap_envelope as the single fix and drop sink-side sniffing entirely.

  • neither core/connectors/sdk/README.md nor core/connectors/sinks/iceberg_sink/README.md mention the new unwrap_envelope transform or the source-compatibility requirement for postgres_source → iceberg. discoverability is the whole point of this fix; please add at least a short entry to both.

  • architecturally, sink-side envelope sniffing is the wrong layer. transform discipline + docs is the real fix. if sink-side detection stays, it should be opt-in via config, not a hardcoded postgres-shaped heuristic.

overall, i'm not fan of this PR.

Comment thread core/connectors/sinks/iceberg_sink/src/router/mod.rs Outdated
Comment thread core/connectors/sinks/iceberg_sink/src/router/mod.rs Outdated
Comment thread core/connectors/sinks/iceberg_sink/src/router/mod.rs Outdated
Comment thread core/connectors/sinks/iceberg_sink/src/router/mod.rs Outdated
Comment thread core/connectors/sdk/src/transforms/json/unwrap_envelope.rs
Comment thread core/connectors/sdk/src/transforms/unwrap_envelope.rs
Comment thread core/connectors/sdk/src/transforms/json/unwrap_envelope.rs
Comment thread core/connectors/sdk/src/transforms/mod.rs Outdated
@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 20, 2026
@atharvalade

Copy link
Copy Markdown
Contributor Author

removed all sink-side sniffing and shifted to transform discipline + docs (SDK, Iceberg, and Postgres READMEs updated). The FFI return code and AutoCommit issues are pre-existing (#2927/#2928) and out of scope now that this PR no longer adds a new fail path through that code @hubcio

@atharvalade

Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 22, 2026
Comment thread core/connectors/sinks/iceberg_sink/README.md Outdated
@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 22, 2026
@atharvalade atharvalade force-pushed the fix/shared-source-sink-contract branch from 9114935 to b6d6b47 Compare May 23, 2026 03:58
@atharvalade

Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 23, 2026
@hubcio hubcio merged commit 96d4677 into apache:master May 25, 2026
55 checks passed
@github-actions github-actions Bot removed the S-waiting-on-review PR is waiting on a reviewer label May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No shared contract between source output and sink input format

3 participants