Fixes #22644: Add Iceberg table support for GCS and S3 Datalake connector #27735

Open

mohitjeswani01 wants to merge 13 commits into open-metadata:main from mohitjeswani01:feature/22644-iceberg-gcp-support

Conversation


@mohitjeswani01 mohitjeswani01 commented Apr 25, 2026

🧊 Iceberg Table Support for GCS and S3 Datalake Connector

Fixes #22644

Submitted as part of the WeMakeDevs × OpenMetadata Hackathon
(Main Project Track — OpenMetadata Connectors & Ingestion)
Implemented following maintainer direction from @harshach:
"Check existing connectors such as S3 connector how its supporting
iceberg and reuse existing parsing logic that's already there"


The Problem

Companies migrating from BigQuery to Iceberg on GCS could not
ingest Iceberg table metadata into OpenMetadata. The Datalake
connector already supported GCS and already had Iceberg JSON
parsing logic — but two specific bugs prevented them from
working together.

Bug 1 — Wrong Table Discovery
DatalakeGcsClient.get_table_names() listed every blob
individually. An Iceberg table orders/ produced 5+ entries:

orders/metadata/v1.metadata.json → treated as separate table
orders/metadata/v2.metadata.json → treated as separate table
orders/data/00000-0-abc.parquet → treated as separate table

Result: 5 bogus tables instead of 1 correct orders table.

Bug 2 — Iceberg Column Parser Never Called
In readers/dataframe/json.py, the raw_data gate only
opened for JSON files containing a "$schema" key.
Iceberg metadata JSON uses "format-version" — so
raw_data was always None, meaning
_is_iceberg_delta_metadata() and
_parse_iceberg_delta_schema() were never reached despite
being already correctly implemented.

Result: Iceberg tables showed garbage columns
(format-version, table-uuid, location)
instead of the actual data schema.
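
To make the gate failure concrete, here is a hedged Python sketch
(not the connector's code) using a trimmed, hypothetical Iceberg
metadata payload:

# Trimmed, hypothetical Iceberg table metadata (illustrative values only).
iceberg_metadata = {
    "format-version": 2,
    "table-uuid": "9c12d441-03fe-4693-9a96-a0705ddf69c1",
    "location": "gs://warehouse/orders",
    "schema": {
        "type": "struct",
        "fields": [
            {"id": 1, "name": "order_id", "type": "long", "required": True},
            {"id": 2, "name": "amount", "type": "double", "required": False},
        ],
    },
}

# Old gate: Iceberg metadata has no "$schema" key, so raw_data stayed None
# and the downstream Iceberg parser was never invoked.
assert iceberg_metadata.get("$schema") is None

# The key the fixed gate checks for instead:
assert iceberg_metadata.get("format-version") is not None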


The Fix — 5 Production Files, Zero New Abstractions

All fixes follow existing patterns exactly and reuse
existing parsing logic unchanged.

Fix 1 — raw_data Gate (readers/dataframe/json.py)

# BEFORE — only JSON Schema files triggered Iceberg parsing:
raw_data = content if data.get("$schema") else None

# AFTER — Iceberg metadata files now also trigger it:
raw_data = (
    content if isinstance(data, dict) and (
        data.get("$schema") is not None           # JSON Schema (existing)
        or data.get("format-version") is not None # Apache Iceberg ← NEW
        or (                                       # Delta Lake structure ← NEW
            isinstance(data.get("schema"), dict)
            and isinstance(data.get("schema", {}).get("fields"), list)
        )
    ) else None
)

This single change unlocks the entire Iceberg parsing pipeline
that was already correctly implemented in datalake_utils.py.

Fix 2 — Iceberg Table Directory Detection (gcs.py + s3.py)

Added _get_iceberg_tables() to both DatalakeGcsClient
and DatalakeS3Client. Detects Iceberg table directories
by scanning for */metadata/v*.metadata.json pattern,
keeps only the latest version per table directory, and
yields one entry per Iceberg table instead of individual blobs.

Non-Iceberg buckets fall through to the original listing
behavior — zero breaking changes.
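
Below is a minimal sketch of the detection idea (helper name and
shape are illustrative, not the PR's exact implementation): group
blobs by Iceberg table directory and keep only the highest metadata
version, comparing versions as integers.

import re
from typing import Dict, Iterable, Tuple

# Matches keys like "warehouse/orders/metadata/v2.metadata.json".
ICEBERG_METADATA_PATTERN = re.compile(
    r"^(?P<table_dir>.+)/metadata/v(?P<version>\d+)\.metadata\.json$"
)

def latest_iceberg_metadata(keys: Iterable[str]) -> Dict[str, Tuple[int, str]]:
    """Map each Iceberg table directory to its highest-version metadata key."""
    tables: Dict[str, Tuple[int, str]] = {}
    for key in keys:
        match = ICEBERG_METADATA_PATTERN.match(key)
        if not match:
            continue
        version = int(match.group("version"))  # integer compare, so v10 > v9
        table_dir = match.group("table_dir")
        if table_dir not in tables or version > tables[table_dir][0]:
            tables[table_dir] = (version, key)
    return tables

# The multi-blob example from Bug 1 collapses to one entry per table:
blobs = [
    "warehouse/orders/metadata/v1.metadata.json",
    "warehouse/orders/metadata/v2.metadata.json",
    "warehouse/orders/data/00000-0-abc.parquet",
]
assert latest_iceberg_metadata(blobs) == {
    "warehouse/orders": (2, "warehouse/orders/metadata/v2.metadata.json")
}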

Fix 3 — Table Name + Type (datalake_utils.py + metadata.py)

Added get_iceberg_table_name_from_metadata_path() which
extracts the directory name from the metadata path:
"warehouse/orders/metadata/v2.metadata.json""orders"

standardize_table_name() now uses this for Iceberg paths.
get_tables_name_and_type() now yields TableType.Iceberg
for Iceberg metadata files.
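
A hedged sketch of the helper's contract, assuming paths of the
form <prefix>/<table>/metadata/vN.metadata.json (illustrative, not
the PR's exact code):

import re
from typing import Optional

def iceberg_table_name_from_metadata_path(path: str) -> Optional[str]:
    # Capture the directory segment immediately above "metadata/".
    match = re.match(
        r"^(?:.*/)?(?P<table>[^/]+)/metadata/v\d+\.metadata\.json$", path
    )
    return match.group("table") if match else None

assert iceberg_table_name_from_metadata_path(
    "warehouse/orders/metadata/v2.metadata.json"
) == "orders"
assert iceberg_table_name_from_metadata_path("warehouse/orders/data.csv") is None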

Fix 4 — Fetch Path Preservation (metadata.py)

Ensured fetch_dataframe_first_chunk() receives the original
metadata blob path for fetching while the Table entity displays
the clean directory name. Both values flow through the pipeline
correctly.
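
Sketched as a tuple (field names and values are illustrative; the
5-tuple shape is the one discussed in the review comments below):

# Hedged sketch: the clean name and the raw blob path travel together,
# so display and fetch never diverge.
table_name = "orders"                                      # shown on the Table entity
fetch_key = "warehouse/orders/metadata/v2.metadata.json"   # passed to the reader

entry = (table_name, "Iceberg", "json", None, fetch_key)   # name, type, format, size, fetch path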


Complete E2E Flow After Fix

For an Iceberg table warehouse/orders/ on GCS:

| Step | Before | After |
| --- | --- | --- |
| get_table_names() | 5 blobs yielded | 1 entry: latest metadata path |
| standardize_table_name() | warehouse/orders/metadata/v2.metadata.json | orders |
| TableType | Regular | Iceberg |
| raw_data gate | None → parser never called | Set → _is_iceberg_delta_metadata() called |
| get_columns() | Garbage keys from JSON | Correct schema from schema.fields |

What Was NOT Changed (Reused 100%)

Following @harshach's direction to reuse existing logic:

  • _is_iceberg_delta_metadata() — zero changes
  • _parse_iceberg_delta_schema() — zero changes
  • _parse_struct_fields() — zero changes
  • set_google_credentials() — zero changes
  • DatalakeGcsClient credential initialization — zero changes
  • No new connector directory created
  • No schema JSON changes
  • No frontend changes
  • No Java changes

Tests — 18 Tests, Zero Infrastructure Required


New tests added: 18 across 2 files

tests/unit/readers/test_json_reader.py — 4 new tests:

  • test_raw_data_set_for_iceberg_metadata — gate opens for Iceberg JSON
  • test_iceberg_columns_parsed_correctly — correct columns extracted
  • test_raw_data_none_for_regular_json — backward compatibility
  • test_raw_data_set_for_json_schema — existing behavior preserved

tests/unit/source/database/test_iceberg_discovery.py — 14 new tests:

  • GCS: table detection, single table yield, multiple tables,
    fallback for non-Iceberg, mixed bucket handling
  • S3: same coverage for parity
  • Table name extraction: correct names, non-Iceberg returns None
  • TableType: Iceberg for metadata files, Regular for others

Files Changed

| File | Change |
| --- | --- |
| readers/dataframe/json.py | +12 lines — raw_data gate fix |
| datalake/clients/gcs.py | +35 lines — Iceberg directory detection |
| datalake/clients/s3.py | +38 lines — symmetric S3 fix |
| utils/datalake/datalake_utils.py | +22 lines — table name helper |
| datalake/metadata.py | +6 lines — TableType + fetch path |
| tests/unit/readers/test_json_reader.py | +80 lines — 4 tests |
| tests/unit/source/database/test_iceberg_discovery.py | +305 lines — 14 tests (new file) |

Type of Change

  • Bug fix
  • New feature

Checklist

  • I have read the CONTRIBUTING document
  • My PR title is Fixes #22644: Add Iceberg table support for GCS and S3 Datalake connector
  • I have commented on my code, particularly in
    hard-to-understand areas
  • For JSON Schema changes: No schema changes were made.
    All fixes are within the Python ingestion layer only.
    Existing GCS and S3 credential schemas are reused unchanged.
  • I have added tests around the new logic (18 new tests)
  • The issue properly describes the goal and this PR
    implements it fully following maintainer guidance

Summary by Gitar

  • Refactored JSON schema parsing:
    • Modularized column processing by introducing _parse_column to handle type inference for complex ARRAY and JSON types.
  • Improved JSON structure merging:
    • Extracted _process_unique_json_key to streamline the logic for merging nested structures and _ArrayOfStruct types during metadata discovery.


Copilot AI review requested due to automatic review settings April 25, 2026 21:18
@mohitjeswani01 mohitjeswani01 requested a review from a team as a code owner April 25, 2026 21:18
@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!


Comment thread ingestion/src/metadata/ingestion/source/database/datalake/clients/gcs.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/datalake/clients/s3.py Outdated
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


@mohitjeswani01
Author

Hi @harshach sir, could you please add the safe to test label? Thank you! 🙏

@harshach harshach added the safe to test Add this label to run secure Github workflows on PRs label Apr 26, 2026
@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@mohitjeswani01
Author

Thanks @harshach sir, I will monitor the checks and work accordingly! 🙏

@github-actions
Contributor

github-actions Bot commented Apr 26, 2026

🔴 Playwright Results — 5 failure(s), 12 flaky

✅ 3980 passed · ❌ 5 failed · 🟡 12 flaky · ⏭️ 86 skipped

| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| ✅ Shard 1 | 299 | 0 | 0 | 4 |
| 🟡 Shard 2 | 751 | 0 | 3 | 8 |
| 🟡 Shard 3 | 744 | 0 | 3 | 7 |
| 🔴 Shard 4 | 767 | 5 | 3 | 18 |
| ✅ Shard 5 | 687 | 0 | 0 | 41 |
| 🟡 Shard 6 | 732 | 0 | 3 | 8 |

Genuine Failures (failed on all attempts)

Pages/DataContractInheritance.spec.ts › Edit Inherited Contract - Creates new asset contract instead of modifying parent (shard 4)
Error: expect(locator).toContainText(expected) failed

Locator: getByTestId('contract-title')
Expected substring: "dp_for_edit_inherited_95e266ca"
Timeout: 15000ms
Error: element(s) not found

Call log:
  - Expect "toContainText" with timeout 15000ms
  - waiting for getByTestId('contract-title')

Pages/DataContractInheritance.spec.ts › Delete Button Disabled - Fully inherited contracts cannot be deleted (shard 4)
Error: expect(locator).toContainText(expected) failed

Locator: getByTestId('contract-title')
Expected substring: "dp_delete_disabled_0def01e3"
Timeout: 15000ms
Error: element(s) not found

Call log:
  - Expect "toContainText" with timeout 15000ms
  - waiting for getByTestId('contract-title')

Pages/DataContractInheritance.spec.ts › Run Validation - Inherited contract validation uses entity-based validation (shard 4)
Error: expect(locator).toContainText(expected) failed

Locator: getByTestId('contract-title')
Expected substring: "dp_run_validation_ed1327eb"
Timeout: 15000ms
Error: element(s) not found

Call log:
  - Expect "toContainText" with timeout 15000ms
  - waiting for getByTestId('contract-title')

Pages/DataContractInheritance.spec.ts › Remove Asset - Inherited contract no longer shown when asset is removed from Data Product (shard 4)
Error: expect(locator).toContainText(expected) failed

Locator: getByTestId('contract-title')
Expected substring: "dp_remove_asset_cf5f3e5e"
Timeout: 15000ms
Error: element(s) not found

Call log:
  - Expect "toContainText" with timeout 15000ms
  - waiting for getByTestId('contract-title')

Pages/DataContractInheritance.spec.ts › Delete Asset Contract - Falls back to showing inherited contract from Data Product (shard 4)
Error: expect(locator).toContainText(expected) failed

Locator: getByTestId('contract-title')
Expected substring: "dp_delete_fallback_ad1070c6"
Timeout: 15000ms
Error: element(s) not found

Call log:
  - Expect "toContainText" with timeout 15000ms
  - waiting for getByTestId('contract-title')

🟡 12 flaky test(s) (passed on retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 2 retries)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/Workflows/WorkflowOssRestrictions.spec.ts › execution history tab loads and API call succeeds (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Should clear search and show all properties for apiCollection in right panel (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Set dateTime-cp custom property on column and verify in UI (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Directory (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can edit teams from the user profile (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@mohitjeswani01
Author

Quality Gate failed for 'open-metadata-ingestion'

Failed conditions: E Security Review Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Oops, the SonarQube check failed; I will look into it shortly!

…tadata#22644)

- Integer version comparison (v10 > v9) via regex group capture
- Single-pass listing: eliminates double bucket scan for non-Iceberg buckets
- Mixed buckets: regular files outside Iceberg dirs are now yielded
- Removes extra head_object/get_blob API calls (use listing size directly)
- Fix get_tables_name_and_type return type annotation to 5-tuple
- Update tests: remove _get_iceberg_tables direct calls, add v10 regression
@mohitjeswani01 mohitjeswani01 force-pushed the feature/22644-iceberg-gcp-support branch from 59033b6 to b7b8904 Compare April 28, 2026 18:12
Copilot AI review requested due to automatic review settings April 28, 2026 18:12
Comment thread ingestion/src/metadata/ingestion/source/database/datalake/clients/s3.py Outdated
Copilot AI review requested due to automatic review settings April 30, 2026 09:48
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Comment thread ingestion/tests/unit/source/database/test_iceberg_discovery.py Outdated
@gitar-bot

gitar-bot Bot commented Apr 30, 2026

Code Review 👍 Approved with suggestions 7 resolved / 8 findings

Adds Iceberg table support for GCS and S3 Datalake connectors, resolving memory bloat, redundant API calls, and versioning bugs. Refactor test classes to move away from unittest.TestCase to align with project guidelines.

💡 Quality: Test classes inherit unittest.TestCase contrary to guidelines

📄 ingestion/tests/unit/readers/test_json_reader.py:264-278

The new tests in test_json_reader.py (lines 264-343) are added inside a unittest.TestCase class, and the new test file test_iceberg_discovery.py uses plain classes (good) but the existing test file uses self.assertEqual patterns. The project's testing guidelines state: 'Use pytest. Use plain assert statements. No unittest.TestCase inheritance.'

The new tests in test_iceberg_discovery.py correctly use plain assert — good. However, the 4 new tests added to test_json_reader.py are methods on an existing unittest.TestCase subclass, which is pre-existing and out of scope for this PR, but the new test methods themselves do use plain assert instead of self.assert*, which is a minor inconsistency within the TestCase class (mixing styles).

✅ 7 resolved
Bug: Lexicographic version comparison fails for v10+ metadata

📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/gcs.py:140 📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/s3.py:90
The Iceberg "latest metadata" detection uses lexicographic string comparison (key_name > existing / blob.name > existing) to determine the highest version. This breaks for version numbers ≥ 10 because v9.metadata.json > v10.metadata.json lexicographically. The GCS docstring even acknowledges this ("v10 > v9 in padded form; for typical low version counts this is correct") but the assumption is fragile — production Iceberg tables commonly exceed 10 metadata versions as each schema evolution or snapshot creates a new version.

Extract the numeric version and compare as integers instead.
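
A short sketch of the suggested repair (illustrative, mirroring what
the later force-push describes as "integer version comparison via
regex group capture"):

import re

VERSION_RE = re.compile(r"/v(\d+)\.metadata\.json$")

def metadata_version(key: str) -> int:
    # Extract the numeric version; -1 for keys that are not metadata files.
    match = VERSION_RE.search(key)
    return int(match.group(1)) if match else -1

# String comparison orders v9 above v10; integer comparison does not.
assert "v9.metadata.json" > "v10.metadata.json"
assert metadata_version("t/v10.metadata.json") > metadata_version("t/v9.metadata.json")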

Bug: Mixed Iceberg + regular files: regular tables silently dropped

📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/gcs.py:153-157 📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/s3.py:101-107
When _get_iceberg_tables() returns any results, get_table_names() yields only Iceberg metadata entries and then returns, completely skipping all non-Iceberg files (CSV, Parquet, etc.) in the same bucket/prefix. This means a bucket containing both Iceberg tables and regular data files will lose visibility of all regular tables.

The test test_gcs_mixed_iceberg_and_regular_files documents this as intentional, but it's a data loss issue for users who legitimately have mixed content in a single bucket prefix. Instead of an early return, yield both Iceberg and non-Iceberg entries, filtering out blobs that belong to Iceberg table directories (metadata/, data/ subdirs).

Performance: Double bucket listing for non-Iceberg S3/GCS buckets

📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/s3.py:100 📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/s3.py:109 📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/gcs.py:152 📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/gcs.py:159
get_table_names() calls _get_iceberg_tables() which iterates all objects in the bucket via paginated API calls, then if no Iceberg tables are found, iterates all objects again in the regular listing loop. For large buckets with thousands of objects this doubles the number of LIST API calls (which are also billed per-request on S3/GCS).

Consider doing a single pass: iterate objects once, check each for the Iceberg metadata pattern, collect Iceberg tables and regular files in one loop, then yield appropriately.

Quality: Return type annotation still says 4-tuple, yields 5-tuple

📄 ingestion/src/metadata/ingestion/source/database/datalake/metadata.py:230 📄 ingestion/src/metadata/ingestion/source/database/datalake/metadata.py:278
The get_tables_name_and_type method's return type annotation at line 230 still declares Iterable[Tuple[str, TableType, SupportedTypes, Optional[int]]] but the method now yields a 5-tuple including key_name. This will cause type-checking tools (mypy, pyright) to flag a mismatch and confuses readers about the method's contract.

Performance: Entire bucket listing materialized in memory before yielding

📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/s3.py:110-124 📄 ingestion/src/metadata/ingestion/source/database/datalake/clients/gcs.py:148-162
Both get_table_names implementations (GCS and S3) now collect all objects into two in-memory lists (iceberg_tables dict and regular_files list) before yielding anything. The previous code was a streaming generator that yielded objects one at a time with O(1) memory overhead.

For buckets with millions of objects (common in data lakes), this can cause significant memory pressure. For example, 10M objects × ~200 bytes per tuple ≈ 2 GB of RAM just for the listing.

Consider a two-pass approach or a streaming approach:

  1. First pass: only collect the iceberg_tables dict (small — one entry per Iceberg table) while streaming regular files through a secondary check.
  2. Alternatively, do a prefix-based listing for */metadata/v*.metadata.json first (if the storage API supports it), then list the rest while excluding known Iceberg prefixes.

A simpler partial fix: yield Iceberg entries after the scan, but for regular files, do a second streaming pass instead of accumulating them all in a list. This keeps memory proportional to the number of Iceberg tables (typically small) rather than the total object count.
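
A hedged sketch of that simpler partial fix (names are illustrative;
list_objects stands in for the paginated GCS/S3 listing call):

import re
from typing import Callable, Dict, Iterable, Iterator, Tuple

ICEBERG_RE = re.compile(r"^(?P<table_dir>.+)/metadata/v(?P<version>\d+)\.metadata\.json$")

def iter_table_entries(
    list_objects: Callable[[str], Iterable[str]], prefix: str
) -> Iterator[str]:
    # Pass 1: memory stays proportional to the number of Iceberg tables.
    iceberg: Dict[str, Tuple[int, str]] = {}
    for key in list_objects(prefix):
        match = ICEBERG_RE.match(key)
        if match:
            version = int(match.group("version"))
            table_dir = match.group("table_dir")
            if table_dir not in iceberg or version > iceberg[table_dir][0]:
                iceberg[table_dir] = (version, key)

    for _version, key in iceberg.values():  # one entry per Iceberg table
        yield key

    # Pass 2: stream regular files without accumulating them in a list,
    # skipping anything inside a known Iceberg table directory. This also
    # keeps mixed buckets visible instead of dropping non-Iceberg files.
    for key in list_objects(prefix):
        if not any(key.startswith(table_dir + "/") for table_dir in iceberg):
            yield key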

...and 2 more resolved from earlier reviews



@mohitjeswani01
Author

Quality Gate failed for 'open-metadata-ingestion'

Failed conditions: E Security Review Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

@harshach sir, I resolved all the issues reported by SonarQube, but I cannot see exactly why it keeps failing. Could you take a look? Thanks 🙏

Copilot AI review requested due to automatic review settings May 1, 2026 08:26
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment on lines 203 to 206

 def get_tables_name_and_type(  # pylint: disable=too-many-branches
     self,
-) -> Iterable[Tuple[str, TableType, SupportedTypes, Optional[int]]]:  # noqa: UP006, UP045
+) -> Iterable[Tuple[str, TableType, SupportedTypes, Optional[int], str]]:  # noqa: UP006, UP045
     """

Copilot AI May 1, 2026


DatalakeSource overrides DatabaseServiceSource.get_tables_name_and_type() / yield_table() with incompatible tuple shapes (5-tuple instead of the base 2-tuple). Ingestion runs basedpyright with reportIncompatibleMethodOverride = "error", so this change is very likely to fail type-checking and break the expected contract for the database topology stages.

Recommendation: keep the override signatures compatible with DatabaseServiceSource (return/accept Tuple[str, TableType]) and carry fetch_key via an internal mapping/context (e.g., store table_name -> key_name during discovery) or by using displayName for the human-readable name while keeping name/fetch key stable.

Comment on lines 249 to 252

 def yield_table(
     self,
-    table_name_and_type: Tuple[str, TableType, SupportedTypes, Optional[int]],  # noqa: UP006, UP045
+    table_name_and_type: Tuple[str, TableType, SupportedTypes, Optional[int], str],  # noqa: UP006, UP045
 ) -> Iterable[Either[CreateTableRequest]]:

Copilot AI May 1, 2026


yield_table() now expects a 5-tuple, but DatabaseServiceSource.yield_table() is defined to accept Tuple[str, TableType]. With basedpyright configured to error on incompatible overrides, this signature change is likely to fail type-checking.

Recommendation: keep the override signature compatible and source fetch_key from an internal mapping/context built during get_tables_name_and_type() (or refactor the topology contract to carry a dedicated table-info object).
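
A hedged sketch of that recommendation (class and attribute names are
illustrative, not OpenMetadata's actual classes):

from typing import Dict, Iterable, Tuple

class DatalakeSourceSketch:
    def __init__(self) -> None:
        # table name -> blob key to fetch, populated during discovery
        self._fetch_keys: Dict[str, str] = {}

    def get_tables_name_and_type(self) -> Iterable[Tuple[str, str]]:
        # Remember the real metadata path internally...
        self._fetch_keys["orders"] = "warehouse/orders/metadata/v2.metadata.json"
        yield "orders", "Iceberg"  # ...while keeping the base 2-tuple shape

    def yield_table(self, table_name_and_type: Tuple[str, str]) -> None:
        table_name, _table_type = table_name_and_type
        fetch_key = self._fetch_keys.get(table_name)  # look up the path here
        print(f"fetching {fetch_key} for table {table_name}")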

@sonarqubecloud

sonarqubecloud Bot commented May 1, 2026

Quality Gate failed for 'open-metadata-ingestion'

Failed conditions: E Security Review Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


Labels

safe to test Add this label to run secure Github workflows on PRs


Development

Successfully merging this pull request may close these issues.

Add support for Iceberg tables in GCP

3 participants