Skip to content

fix(connectors): handle unrecognized types with explicit coverage and diagnostics#3196

Open
atharvalade wants to merge 3 commits into
apache:masterfrom
atharvalade:fix/postgres-unrecognized-types-coerce-string
Open

fix(connectors): handle unrecognized types with explicit coverage and diagnostics#3196
atharvalade wants to merge 3 commits into
apache:masterfrom
atharvalade:fix/postgres-unrecognized-types-coerce-string

Conversation

@atharvalade
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #3175

Rationale

Unrecognized Postgres types silently coerced to String, returning opaque InvalidRecord on failure with no column or type context.

What changed?

The default _ branch in extract_column_value attempted to read any unrecognized Postgres type as Option<String>, failing silently with Error::InvalidRecord and no indication of which column or type caused it.

Added explicit handlers for DATE, TIME, INTERVAL, OID, NAME, BPCHAR, and 13 array types (_BOOL, _INT2, _INT4, _INT8, _FLOAT4, _FLOAT8, _TEXT, _VARCHAR, _CHAR, _BPCHAR, _UUID, _JSON, _JSONB). The default branch now logs a warning with column name and type before attempting String fallback, and returns InvalidRecordValue with a diagnostic message on failure.

Local Execution

  • Passed
  • Pre-commit hooks ran

AI Usage

  1. Opus 4.6
  2. Scaffolding, Writing comments, Writing PR Description
  3. Verified via cargo check, clippy, fmt, and full test suite (24/24 pass)
  4. Yes, all code can be explained

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 36.04061% with 126 lines in your changes missing coverage. Please review.
✅ Project coverage is 19.23%. Comparing base (611fca0) to head (73fcce0).
⚠️ Report is 36 commits behind head on master.

Files with missing lines Patch % Lines
core/connectors/sources/postgres_source/src/lib.rs 36.04% 124 Missing and 2 partials ⚠️

❌ Your patch check has failed because the patch coverage (36.04%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3196       +/-   ##
=============================================
- Coverage     74.10%   19.23%   -54.87%     
  Complexity      943      943               
=============================================
  Files          1159     1157        -2     
  Lines        102033    90511    -11522     
  Branches      79083    67579    -11504     
=============================================
- Hits          75607    17414    -58193     
- Misses        23765    72678    +48913     
+ Partials       2661      419     -2242     
Components Coverage Δ
Rust Core 1.05% <36.04%> (-74.28%) ⬇️
Java SDK 60.14% <ø> (ø)
C# SDK 69.07% <ø> (-0.31%) ⬇️
Python SDK 81.43% <ø> (ø)
Node SDK 91.40% <ø> (-0.13%) ⬇️
Go SDK 39.43% <ø> (ø)
Files with missing lines Coverage Δ
core/connectors/sources/postgres_source/src/lib.rs 41.04% <36.04%> (-26.14%) ⬇️

... and 657 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@atharvalade atharvalade changed the title fix(postgres): handle unrecognized types with explicit coverage and diagnostics fix(connectors): handle unrecognized types with explicit coverage and diagnostics Apr 29, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added stale Inactive issue or pull request and removed stale Inactive issue or pull request labels May 7, 2026
@hubcio
Copy link
Copy Markdown
Contributor

hubcio commented May 14, 2026

/ready

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label May 14, 2026
Copy link
Copy Markdown
Contributor

@slbotbm slbotbm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/author

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code matches against wrong literals, which means that everything is coerced into a string.

Current literals (wrong):

- _BOOL
- _INT2
- _INT4
- _INT8
- _FLOAT4
- _FLOAT8
- _TEXT
- _VARCHAR
- _CHAR
- _BPCHAR
- _UUID
- _JSON
- _JSONB

Correct literals:

- BOOL[]
- INT2[]
- INT4[]
- INT8[]
- FLOAT4[]
- FLOAT8[]
- TEXT[]
- VARCHAR[]
- CHAR[]
- UUID[]
- JSON[]
- JSONB[]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the code treats BPCHAR as a possible option, but sqlx reports the display name CHAR[], so the BPCHAR-style array match can never fire.

Comment on lines +1025 to +1037
let value: Option<Vec<bool>> = row
.try_get(column_index)
.map_err(|_| Error::InvalidRecord)?;
Ok(value
.map(|arr| {
serde_json::Value::Array(
arr.into_iter().map(serde_json::Value::Bool).collect(),
)
})
.unwrap_or(serde_json::Value::Null))
}
"_INT2" => {
let value: Option<Vec<i16>> = row
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For arms decoding Option<Vec<T>>, the code assumes every element is non-null, but postgresql can contain null elements. Thus, if a null value is supplied to the current code, sqlx will fail to decode it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like Option<Vec<Option<T>>> could mitigate this.

Copy link
Copy Markdown
Contributor

@slbotbm slbotbm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, the current code also does not handle the following types:

  • TIMETZ
  • TIMETZ[]
  • DATE[]
  • TIME[]
  • TIMESTAMP[]
  • TIMESTAMPTZ[]
  • INTERVAL[]

Comment on lines +1201 to +1210
let total_secs = interval.microseconds / 1_000_000;
let remaining_us = (interval.microseconds % 1_000_000).unsigned_abs();
let hours = total_secs / 3600;
let mins = (total_secs % 3600) / 60;
let secs = total_secs % 60;
if remaining_us != 0 {
parts.push(format!(
"{:02}:{:02}:{:02}.{:06}",
hours, mins, secs, remaining_us
));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part formats negative numbers incorrectly. For example, the above code would convert -1.5 seconds into 00:00:-1.500000

Comment on lines +916 to 919
"INT4" | "OID" => {
let value: Option<i32> = row
.try_get(column_index)
.map_err(|_| Error::InvalidRecord)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code current treats OID as i32, but in postgres, that is u32, which can lead batch processing failures. I feel decoding this to an Oid type would be a good approach.

I also noticed you're decoding individual OID's, but not but not Vec<OID>

Comment on lines +980 to 983
}
"TIMESTAMP" | "TIMESTAMPTZ" => {
let value: Option<DateTime<Utc>> = row
.try_get(column_index)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though not related to your changes, in sqlx, TIMESTAMP is chrono::NaiveDateTime, while TIMESTAMPZ is chrono::DateTime<Utc>. If the user passes TIMESTAMP, the conversion will fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review PR is waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unrecognized Postgres types silently coerce to String

3 participants