fix(destination-s3-data-lake): produce lowercase column names for Glue catalog compatibility by devin-ai-integration[bot] · Pull Request #76059 · airbytehq/airbyte

devin-ai-integration · 2026-04-02T22:49:34Z

What

The S3 Data Lake connector does not produce lowercase column names when writing to the Glue catalog, which breaks downstream engines like Snowflake and Athena.

The connector was missing a custom TableSchemaMapper implementation and fell back to NoopTableSchemaMapper, which passes column names through unchanged. The sibling GCS Data Lake connector already has a working implementation of this pattern.

Resolves https://github.com/airbytehq/oncall/issues/11856
Original issue: #76058

How

New S3DataLakeTableSchemaMapper: implements TableSchemaMapper and is annotated with @Singleton. toColumnName() calls Transformations.toAlphanumericAndUnderscore(name).lowercase() to sanitize and lowercase every column name. toFinalTableName() applies the same transformation to the namespace and table name.
transformSchemaWithMappedNames() in S3DataLakeStreamLoader: post-processes the Iceberg schema produced by IcebergUtil.toIcebergSchema(), replacing the original column names with the mapped (lowercased) names from stream.tableSchema.columnSchema.inputToFinalColumnNames. Airbyte metadata columns are skipped because their names are already valid.

Both changes follow the pattern established by GcsDataLakeTableSchemaMapper and GcsDataLakeStreamLoader.
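
For readers without the CDK at hand, the name mapping described above can be sketched roughly as follows. This is a minimal, self-contained approximation: the local `toAlphanumericAndUnderscore` is a hypothetical stand-in for the CDK's `Transformations.toAlphanumericAndUnderscore()` (the real helper may handle some characters differently), and `mapColumnNames` is an illustrative name, not a function from this PR.

```kotlin
// Hypothetical stand-in for the CDK's Transformations.toAlphanumericAndUnderscore().
// Assumption: every character outside [A-Za-z0-9_] becomes an underscore.
fun toAlphanumericAndUnderscore(name: String): String =
    name.replace(Regex("[^A-Za-z0-9_]"), "_")

// Mirrors the toColumnName() behaviour described in this PR: sanitize, then lowercase.
fun toColumnName(name: String): String =
    toAlphanumericAndUnderscore(name).lowercase()

// Illustrative helper: builds an input-name -> final-name mapping for a list of columns.
fun mapColumnNames(inputNames: List<String>): Map<String, String> =
    inputNames.associateWith { toColumnName(it) }
```

Under these assumptions, `toColumnName("First-Name")` yields `first_name` and `toColumnName("UserId")` yields `userid`.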

Review guide

⚠️ Key concern for reviewers: transformSchemaWithMappedNames creates a new Schema(mappedFields) without preserving identifier field IDs from the original schema. The original code (icebergUtil.toIcebergSchema(stream)) returned a Schema that included identifier fields for dedup mode. The GCS version handles this explicitly with a withIdentifierFields boolean parameter — the S3 version does not. Please verify whether dropping identifier field IDs here causes a regression for dedup syncs, or whether they are re-applied downstream by IcebergTableSynchronizer.maybeApplySchemaChanges().
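
To make the concern concrete, here is a simplified model of the two behaviours. `Field` and `SimpleSchema` are hypothetical stand-ins, not Iceberg's `Schema` or `NestedField`, and the `_airbyte_` prefix check is an assumption about how metadata columns are recognised. The sketch only illustrates that rebuilding a schema from renamed fields loses the identifier set unless it is passed along explicitly; it does not show what the S3 or GCS connector actually does.

```kotlin
// Hypothetical, simplified stand-ins for an Iceberg schema and its fields.
data class Field(val id: Int, val name: String)
data class SimpleSchema(val fields: List<Field>, val identifierFieldIds: Set<Int> = emptySet())

// Renames fields using the input-to-final name mapping, leaving metadata columns untouched.
// Assumption: metadata columns are recognised by the "_airbyte_" prefix.
fun renameFields(
    fields: List<Field>,
    names: Map<String, String>,
    metaPrefix: String = "_airbyte_",
): List<Field> =
    fields.map { f -> if (f.name.startsWith(metaPrefix)) f else f.copy(name = names[f.name] ?: f.name) }

// Rebuilds the schema from the renamed fields only: the identifier field IDs are lost.
fun renameDroppingIdentifiers(schema: SimpleSchema, names: Map<String, String>): SimpleSchema =
    SimpleSchema(renameFields(schema.fields, names))

// Rebuilds the schema and carries the identifier field IDs over. Field IDs do not
// change on rename, so the original ID set is still valid.
fun renamePreservingIdentifiers(schema: SimpleSchema, names: Map<String, String>): SimpleSchema =
    SimpleSchema(renameFields(schema.fields, names), schema.identifierFieldIds)
```

Because a rename keeps each field's ID, carrying the original identifier set across is sufficient in this model. Whether the real connector needs to do so depends on the downstream behaviour the reviewers are asked to verify.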

S3DataLakeTableSchemaMapper.kt — New file. Core fix lives in toColumnName() (line 41). Review the toColumnType() mappings — these were copied from the GCS connector (BigLake/Parquet types) and should be verified for S3/Glue correctness.
S3DataLakeStreamLoader.kt — Lines 46–89. The transformSchemaWithMappedNames() method and the changed incomingSchema initialization.
S3DataLakeTableSchemaMapperTest.kt — Unit tests for the mapper.
metadata.yaml — Version bump to 0.3.47.
docs/integrations/destinations/s3-data-lake.md — Changelog entry.

User Impact

Column names written to the Glue catalog will now be lowercase and sanitized (special characters replaced with underscores). This fixes compatibility with Snowflake, Athena, and other engines that require lowercase Glue identifiers.

For existing tables with mixed-case column names: the schema evolution logic in IcebergTableSynchronizer will see the lowercased names as new columns. Reviewers should verify the behavior here — it may require a full refresh for affected streams.
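
The scenario above can be illustrated with a small name-based diff. This is a sketch under a stated assumption: that schema evolution compares column names case-sensitively, which this PR does not confirm. `ColumnDiff` and `diffByName` are hypothetical names and are not part of IcebergTableSynchronizer.

```kotlin
// Hypothetical result type: columns that appear only in the incoming or only in the existing schema.
data class ColumnDiff(val added: Set<String>, val removed: Set<String>)

// Case-sensitive, name-based comparison (assumption, see the note above).
fun diffByName(existing: Set<String>, incoming: Set<String>): ColumnDiff =
    ColumnDiff(added = incoming - existing, removed = existing - incoming)
```

With an existing table containing `UserId` and `email` and an incoming schema containing `userid` and `email`, this diff reports `userid` as added and `UserId` as removed, which is the situation reviewers are asked to check.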

Can this PR be safely reverted and rolled back?

YES 💚

Link to Devin session: https://app.devin.ai/sessions/4328e94738f64f5880a3d21a37916762

…e catalog compatibility

Add S3DataLakeTableSchemaMapper to transform column names using Transformations.toAlphanumericAndUnderscore() followed by .lowercase(), matching the pattern used by the GCS Data Lake connector.

Also add transformSchemaWithMappedNames() to S3DataLakeStreamLoader to post-process the Iceberg schema with the mapped column names from the stream's table schema.

Resolves airbytehq/oncall#11856

Co-Authored-By: bot_apk <apk@cognition.ai>

devin-ai-integration · 2026-04-02T22:49:36Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

github-actions · 2026-04-02T22:49:57Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

🛠️ Quick Fixes
- /format-fix - Fixes most formatting issues.
- /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  - Bump types: patch (default), minor, major, major_rc, rc, promote.
  - The rc type is a smart default: applies minor_rc if stable, or bumps the RC number if already RC.
  - The promote type strips the RC suffix to finalize a release.
  - Example: /bump-version type=rc or /bump-version type=minor
- /bump-progressive-rollout-version - Alias for /bump-version type=rc. Bumps with an RC suffix and enables progressive rollout.
❇️ AI Testing and Review (internal link: AI-SDLC Docs):
- /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
- /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
- /ai-review - AI-powered PR review for connector safety and quality gates.
🚀 Connector Releases:
- /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
☕️ JVM connectors:
- /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
  Example: /update-connector-cdk-version connector=destination-bigquery
🐍 Python connectors:
- /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
- /poe source example lock - Alias for /poe connector source-example lock.
- /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
- /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
⚙️ Admin commands:
- /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
  Example: /force-merge reason="CI is flaky, tests pass locally"

📚 Show Repo Guidance

Helpful Resources

Breaking Changes Guide - Breaking changes, migration guides, and upgrade deadlines
Developing Connectors Locally
Managing Connector Secrets
On-Demand Regression Tests
#connector-ci-issues
#connector-publish-updates
#connector-build-statuses

github-actions · 2026-04-02T22:56:19Z

`destination-s3-data-lake` Connector Test Results

25 tests: 24 ✅ passed, 0 💤 skipped, 1 ❌ failed
3 suites, 3 files, 3s ⏱️

For more details on these failures, see this check.

Results for commit e0728d7.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-02T22:57:39Z

Deploy preview for airbyte-docs ready!

✅ Preview
https://airbyte-docs-vz12t84aj-airbyte-growth.vercel.app

Built with commit e0728d7.
This pull request is being automatically deployed with vercel-action

Octavia Squidington III (octavia-squidington-iii) added the connectors/destination/s3-data-lake label Apr 2, 2026

devin-ai-integration Bot and others added 2 commits April 2, 2026 22:51

chore: bump destination-s3-data-lake version to 0.3.47

ec1059a

Co-Authored-By: bot_apk <apk@cognition.ai>

style: fix Spotless formatting in S3DataLakeTableSchemaMapper and tests

11f52c8

Co-Authored-By: bot_apk <apk@cognition.ai>

fix: use correct ColumnType.type property name in tests

dd844e5

Co-Authored-By: bot_apk <apk@cognition.ai>

style: collapse short assertEquals calls to single lines per Spotless

e0728d7

Co-Authored-By: bot_apk <apk@cognition.ai>

devin-ai-integration[bot] wants to merge 5 commits into master from devin/1775169931-s3-data-lake-lowercase-columns
