fix(teradata): validate destination table schema in precheck and upload #686
Conversation
When an existing table is missing required columns (e.g., only has id and text), the upload silently fails with a raw Teradata SQL error that surfaces as "Unexpected job failure after 3 retries" in the UI. Add schema validation in both precheck() and upload_dataframe() so users see a clear error message naming the missing columns and suggesting remediation.
… import

- Bump version to 1.4.21 (1.4.20 already taken on main)
- Add CHANGELOG.md entry for schema validation feature
- Remove unused REQUIRED_DESTINATION_COLUMNS import in test file (ruff F401)
…DatabaseName

Teradata doesn't support LIMIT syntax — the precheck schema validation was silently failing due to a broad except clause. Also adds a DatabaseName filter to DBC.ColumnsV queries to avoid false matches in multi-database environments. Addresses Cursor Bugbot findings (high: invalid LIMIT syntax; medium: missing DatabaseName scope).
…validation

Eliminate an unnecessary database round trip by using a single SELECT ColumnName query instead of a separate existence check followed by a column fetch. An empty result already indicates the table doesn't exist.
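The single-round-trip check described in the commits above might look like the following sketch. The column set mirrors the PR, but the query shape and the `missing_columns` helper are illustrative, not the merged implementation:

```python
# Required columns mirrored from the PR description.
REQUIRED_DESTINATION_COLUMNS = {"id", "record_id", "element_id", "text", "type"}

# One query serves both purposes: zero rows means the table doesn't exist,
# and a non-empty result is the column list to validate. DatabaseName is
# filtered to avoid false matches in multi-database environments.
SCHEMA_QUERY = (
    "SELECT ColumnName FROM DBC.ColumnsV "
    "WHERE DatabaseName = ? AND TableName = ?"
)

def missing_columns(fetched_rows):
    """Return required columns absent from DBC.ColumnsV rows for one table."""
    if not fetched_rows:
        # Table doesn't exist; defer creation to create_destination().
        return set()
    # Teradata returns CHAR-padded (and often uppercase) names; normalize.
    existing = {row[0].strip().lower() for row in fetched_rows}
    return REQUIRED_DESTINATION_COLUMNS - existing
```

An empty result short-circuits validation entirely, which is what lets the existence check and the column fetch collapse into one query.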
failed tests unrelated to changes

I don't think we usually go this deep on precheck. What was the motivation behind this?

Users could set a custom record id key. I am not real psyched on this thing in general.
```python
missing = REQUIRED_DESTINATION_COLUMNS - existing
if missing:
    raise DestinationConnectionError(
```
Recommend swapping out the name of this error to avoid the word "Connection" which could lead to a network-related hypothesis of the issue.
Perhaps "DestinationConfigurationMismatchError" or something like that.
```python
with self.get_cursor() as cursor:
    cursor.execute("SELECT DATABASE")
    current_db = cursor.fetchone()[0].strip()
    cursor.execute(
```
Do we want to also check for columns in the schema which are NOT NULL if we don't have a way to fill them? It's not a perfectly easy check because there could be NOT NULL with a default specified also, in which case we don't want to fail (this pattern is common with last_updated columns, for example).
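A check along these lines could reuse the same dictionary view — DBC.ColumnsV exposes Nullable ('Y'/'N') and DefaultValue columns — though the helper below is only a sketch of the idea, not part of this PR:

```python
# Columns the uploader always populates; NOT NULL on these can't fail an insert.
UPLOADED_COLUMNS = {"id", "record_id", "element_id", "text", "type"}

def not_null_without_default(columns):
    """Flag columns likely to break inserts: NOT NULL, no default, not ours.

    `columns` is an iterable of (ColumnName, Nullable, DefaultValue) tuples
    as a DBC.ColumnsV query might return them. Hypothetical helper.
    """
    risky = []
    for name, nullable, default in columns:
        name = name.strip().lower()
        if name in UPLOADED_COLUMNS:
            continue  # the uploader supplies a value for these
        # NOT NULL *with* a default (e.g. last_updated TIMESTAMP DEFAULT
        # CURRENT_TIMESTAMP) is fine, so only a missing default is flagged.
        if nullable.strip().upper() == "N" and default is None:
            risky.append(name)
    return risky
```

This captures the reviewer's caveat: a NOT NULL column with a default (the common last_updated pattern) should not fail the precheck.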
We want to move toward richer errors, especially if they are able to deterministically predict failure of the job. In fact, I'd say that's the entire purpose of the precheck. Also note that precheck-on-save is just one case. We also want to (soon? Eventually?) support a preflight of jobs when they start up in case some condition has changed (think column added or removed).
Schema validation errors (missing required columns) were raised as DestinationConnectionError which surfaces as "Error in connecting to downstream data source" in the UI — misleading for a schema mismatch. Added DestinationSchemaError as a new recognized error type with a clear "Destination table schema error" message.
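A minimal sketch of such an error type and its message follows; the connector's actual class hierarchy and message wording may differ (this assumes a plain Exception base, and `schema_error_message` is a hypothetical helper):

```python
class DestinationSchemaError(Exception):
    """Destination table schema error: table exists but lacks required columns.

    Kept distinct from DestinationConnectionError so the UI reports a schema
    mismatch rather than a connectivity failure.
    """

def schema_error_message(table_name, missing):
    """Build a clear, actionable message naming the missing columns."""
    cols = ", ".join(sorted(missing))
    return (
        f"Destination table '{table_name}' is missing required column(s): "
        f"{cols}. Add the column(s), or drop the table so it can be "
        "auto-created with the expected schema."
    )
```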
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit a4dfe43.
```python
# Columns that must exist in the destination table for uploads to succeed.
# record_id is needed for delete-before-upsert; element_id, text, and type
# carry the core document content.
REQUIRED_DESTINATION_COLUMNS = {"id", "record_id", "element_id", "text", "type"}
```
Hardcoded record_id ignores configurable record_id_key setting
Medium Severity
REQUIRED_DESTINATION_COLUMNS hardcodes "record_id", but SQLUploaderConfig.record_id_key is a configurable field that users can set to a different column name. Both _validate_table_schema and upload_dataframe compare against this hardcoded set, so a table with a valid custom record-ID column (e.g. "my_record_id") will be falsely rejected as missing "record_id". The required set needs to dynamically include self.upload_config.record_id_key instead.
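The fix Bugbot suggests amounts to building the required set dynamically from the config. A hypothetical sketch (the real code would read `self.upload_config.record_id_key` rather than take a parameter):

```python
# Base columns carry the core document content; the record-ID column name
# comes from the uploader config rather than being hardcoded.
BASE_DESTINATION_COLUMNS = {"id", "element_id", "text", "type"}

def required_destination_columns(record_id_key="record_id"):
    """Required column set honoring a user-configured record_id_key."""
    return BASE_DESTINATION_COLUMNS | {record_id_key}
```

With this shape, a table using a custom record-ID column such as "my_record_id" passes validation instead of being falsely rejected for lacking "record_id".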


Summary
- `TeradataUploader.precheck()` — queries `DBC.ColumnsV` to verify an existing table has all required columns (`id`, `record_id`, `element_id`, `text`, `type`) before the workflow starts
- `upload_dataframe()` — repeats the check as a runtime guard, so even if precheck is bypassed or the table is altered mid-run, the error is caught and surfaced cleanly
- Raises `DestinationConnectionError` (a recognized error type) with a clear message naming the missing columns and suggesting remediation
- Previously, a table missing required columns (e.g., one with only `id` and `text`) would fail deep in the SQL INSERT and surface as the generic "Unexpected job failure after 3 retries" in the UI

Test plan

- `test_teradata_uploader_precheck_rejects_missing_columns` — precheck raises on incomplete schema
- `test_teradata_uploader_precheck_schema_check_skipped_when_table_name_none` — auto-create mode skips validation
- `test_teradata_uploader_precheck_schema_check_tolerates_extra_columns` — extra columns are fine
- `test_teradata_uploader_precheck_schema_check_case_insensitive` — handles Teradata uppercase
- `test_teradata_uploader_precheck_skips_schema_check_when_table_missing` — nonexistent table defers to create_destination
- `test_teradata_uploader_upload_dataframe_rejects_missing_columns` — runtime upload guard

Note
Medium Risk
Adds new fail-fast schema validation for Teradata uploads, which can newly block workflows that previously proceeded (and failed later) when destination tables are missing required columns. Low code-surface change, but it affects runtime behavior and user-facing errors for existing Teradata destinations.
Overview
Teradata destination uploads now validate the destination table schema before a workflow runs and again right before inserting rows.
- `TeradataUploader.precheck()` queries `DBC.ColumnsV` for an existing table and raises a new `DestinationSchemaError` if required columns (`id`, `record_id`, `element_id`, `text`, `type`) are missing; it skips validation when the table is missing (deferring to `create_destination`) or when `table_name` is `None` (auto-create mode).
- `upload_dataframe()` adds the same guard using discovered table columns, failing with a clear, actionable message instead of a late SQL failure.

Also bumps the version to `1.4.21` and expands unit tests to cover missing/extra columns, case-insensitivity, and skip conditions.