Fixes #22644: Add Iceberg table support for GCS and S3 Datalake connector #27735
mohitjeswani01 wants to merge 13 commits into open-metadata:main
Conversation
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
Hi @harshach sir, could you please add a
The Python checkstyle failed. Please run You can install the pre-commit hooks with
Thanks @harshach sir, I will monitor the checks and work accordingly! 🙏
🔴 Playwright Results — 5 failure(s), 12 flaky
✅ 3980 passed · ❌ 5 failed · 🟡 12 flaky · ⏭️ 86 skipped
Genuine Failures (failed on all attempts) ❌
Oops, the SonarQube check failed, I will check it shortly!
…tadata#22644)
- Integer version comparison (v10 > v9) via regex group capture
- Single-pass listing: eliminates double bucket scan for non-Iceberg buckets
- Mixed buckets: regular files outside Iceberg dirs are now yielded
- Removes extra head_object/get_blob API calls (use listing size directly)
- Fix get_tables_name_and_type return type annotation to 5-tuple
- Update tests: remove _get_iceberg_tables direct calls, add v10 regression
Force-pushed from 59033b6 to b7b8904
… tests per review
Code Review 👍 Approved with suggestions (7 resolved / 8 findings)

Adds Iceberg table support for GCS and S3 Datalake connectors, resolving memory bloat, redundant API calls, and versioning bugs. Refactors test classes to move away from unittest.TestCase to align with project guidelines.

💡 Quality: Test classes inherit unittest.TestCase contrary to guidelines
📄 ingestion/tests/unit/readers/test_json_reader.py:264-278
The new tests in

✅ 7 resolved
- ✅ Bug: Lexicographic version comparison fails for v10+ metadata
- ✅ Bug: Mixed Iceberg + regular files: regular tables silently dropped
- ✅ Performance: Double bucket listing for non-Iceberg S3/GCS buckets
- ✅ Quality: Return type annotation still says 4-tuple, yields 5-tuple
- ✅ Performance: Entire bucket listing materialized in memory before yielding
...and 2 more resolved from earlier reviews
@harshach sir, I solved all the problems that appeared in SonarQube, but I cannot figure out exactly why it is continuously failing. Could you check that once? Thanks 🙏
```diff
 def get_tables_name_and_type(  # pylint: disable=too-many-branches
     self,
-) -> Iterable[Tuple[str, TableType, SupportedTypes, Optional[int]]]:  # noqa: UP006, UP045
+) -> Iterable[Tuple[str, TableType, SupportedTypes, Optional[int], str]]:  # noqa: UP006, UP045
     """
```
DatalakeSource overrides DatabaseServiceSource.get_tables_name_and_type() / yield_table() with incompatible tuple shapes (5-tuple instead of the base 2-tuple). Ingestion runs basedpyright with reportIncompatibleMethodOverride = "error", so this change is very likely to fail type-checking and break the expected contract for the database topology stages.
Recommendation: keep the override signatures compatible with DatabaseServiceSource (return/accept Tuple[str, TableType]) and carry fetch_key via an internal mapping/context (e.g., store table_name -> key_name during discovery) or by using displayName for the human-readable name while keeping name/fetch key stable.
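A minimal sketch of that direction, assuming a hypothetical `_fetch_keys` dict on the source; `TableType` and `standardize_table_name` are the PR's names, while the listing call and attribute names are illustrative:

```python
from typing import Dict, Iterable, Tuple

# Sketch of a method on DatalakeSource: keep the base-class 2-tuple
# contract and stash extra discovery details on the instance instead
# of widening the yielded tuple.
def get_tables_name_and_type(self) -> Iterable[Tuple[str, TableType]]:
    self._fetch_keys: Dict[str, str] = {}  # table name -> original blob key (illustrative)
    for key in self.client.get_table_names(self.bucket_name):  # assumed listing call
        table_name = standardize_table_name(key)  # existing helper per the PR
        self._fetch_keys[table_name] = key  # keep the real fetch path out of the tuple
        is_iceberg = key.endswith(".metadata.json")  # simplified detection for the sketch
        yield table_name, TableType.Iceberg if is_iceberg else TableType.Regular
```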
```diff
 def yield_table(
     self,
-    table_name_and_type: Tuple[str, TableType, SupportedTypes, Optional[int]],  # noqa: UP006, UP045
+    table_name_and_type: Tuple[str, TableType, SupportedTypes, Optional[int], str],  # noqa: UP006, UP045
 ) -> Iterable[Either[CreateTableRequest]]:
```
yield_table() now expects a 5-tuple, but DatabaseServiceSource.yield_table() is defined to accept Tuple[str, TableType]. With basedpyright configured to error on incompatible overrides, this signature change is likely to fail type-checking.
Recommendation: keep the override signature compatible and source fetch_key from an internal mapping/context built during get_tables_name_and_type() (or refactor the topology contract to carry a dedicated table-info object).
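And the matching consumption side of the same sketch, reading the stashed key back instead of widening the tuple (`_create_table_request` is a hypothetical helper, not the actual API):

```python
from typing import Iterable, Tuple

# Sketch: yield_table keeps the base-class 2-tuple signature.
def yield_table(
    self,
    table_name_and_type: Tuple[str, TableType],  # base-class 2-tuple preserved
) -> Iterable[Either[CreateTableRequest]]:
    table_name, table_type = table_name_and_type
    # Recover the original metadata blob path recorded during discovery,
    # falling back to the table name for regular entries.
    fetch_key = self._fetch_keys.get(table_name, table_name)
    yield from self._create_table_request(table_name, table_type, fetch_key)  # hypothetical helper
```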
🧊 Iceberg Table Support for GCS and S3 Datalake Connector
Fixes #22644
The Problem
Companies migrating from BigQuery to Iceberg on GCS could not
ingest Iceberg table metadata into OpenMetadata. The Datalake
connector already supported GCS and already had Iceberg JSON
parsing logic — but two precise bugs prevented them from
working together.
Bug 1 — Wrong Table Discovery
`DatalakeGcsClient.get_table_names()` listed every blob individually. An Iceberg table `orders/` produced 5+ entries:

- `orders/metadata/v1.metadata.json` → treated as separate table
- `orders/metadata/v2.metadata.json` → treated as separate table
- `orders/data/00000-0-abc.parquet` → treated as separate table

Result: 5 bogus tables instead of 1 correct `orders` table.

Bug 2 — Iceberg Column Parser Never Called
In `readers/dataframe/json.py`, the `raw_data` gate only opened for JSON files containing a `"$schema"` key. Iceberg metadata JSON uses `"format-version"`, so `raw_data` was always `None`, meaning `_is_iceberg_delta_metadata()` and `_parse_iceberg_delta_schema()` were never reached despite being already correctly implemented.

Result: Iceberg tables showed garbage columns (`format-version`, `table-uuid`, `location`) instead of the actual data schema.
The Fix — 5 Production Files, Zero New Abstractions
All fixes follow existing patterns exactly and reuse
existing parsing logic unchanged.
Fix 1 — `raw_data` Gate (`readers/dataframe/json.py`)

This single change unlocks the entire Iceberg parsing pipeline that was already correctly implemented in `datalake_utils.py`.
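The diff itself isn't shown in this thread, but from the bug description the change amounts to widening the gate condition, roughly as below (a sketch with an illustrative function name, not the actual code in `readers/dataframe/json.py`):

```python
import json
from typing import Any, Optional

def _get_raw_data(text: bytes) -> Optional[Any]:
    """Sketch of the widened gate: keep the parsed JSON around when the
    file is either a JSON Schema document or Iceberg table metadata."""
    data = json.loads(text)
    # Before the fix only "$schema" opened the gate; Iceberg metadata is
    # keyed by "format-version" instead (per the bug description above).
    if isinstance(data, dict) and ("$schema" in data or "format-version" in data):
        return data
    return None
```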
gcs.py+s3.py)Added
_get_iceberg_tables()to bothDatalakeGcsClientand
DatalakeS3Client. Detects Iceberg table directoriesby scanning for
*/metadata/v*.metadata.jsonpattern,keeps only the latest version per table directory, and
yields one entry per Iceberg table instead of individual blobs.
Non-Iceberg buckets fall through to the original listing
behavior — zero breaking changes.
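A standalone sketch of that discovery rule, including the integer version comparison from the commit message ("v10 > v9 via regex group capture"); the function and pattern names are illustrative:

```python
import re
from typing import Dict, Iterable, Tuple

# Illustrative pattern for "<table dir>/metadata/v<N>.metadata.json"
METADATA_PATTERN = re.compile(r"^(?P<table>.+)/metadata/v(?P<version>\d+)\.metadata\.json$")

def latest_iceberg_metadata(keys: Iterable[str]) -> Dict[str, str]:
    """Map each Iceberg table directory to its highest-versioned metadata
    file, comparing versions as integers so v10 sorts after v9."""
    latest: Dict[str, Tuple[int, str]] = {}
    for key in keys:
        match = METADATA_PATTERN.match(key)
        if not match:
            continue  # non-Iceberg keys fall through to the regular listing
        table = match.group("table")
        version = int(match.group("version"))  # integer, not lexicographic
        if table not in latest or version > latest[table][0]:
            latest[table] = (version, key)
    return {table: key for table, (_, key) in latest.items()}
```

Comparing `int(match.group("version"))` rather than the raw string is what makes `v10` sort after `v9`, which a lexicographic comparison gets wrong (`"v10" < "v9"` as strings).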
Fix 3 — Table Name + Type (`datalake_utils.py` + `metadata.py`)

Added `get_iceberg_table_name_from_metadata_path()`, which extracts the directory name from the metadata path: `"warehouse/orders/metadata/v2.metadata.json"` → `"orders"`. `standardize_table_name()` now uses this for Iceberg paths, and `get_tables_name_and_type()` now yields `TableType.Iceberg` for Iceberg metadata files.
Fix 4 — Fetch Path Preservation (`metadata.py`)

Ensured `fetch_dataframe_first_chunk()` receives the original metadata blob path for fetching while the Table entity displays the clean directory name. Both values flow through the pipeline correctly.
Complete E2E Flow After Fix

For an Iceberg table `warehouse/orders/` on GCS:

| Step | Before | After |
| --- | --- | --- |
| `get_table_names()` | one entry per blob | one entry per table directory |
| `standardize_table_name()` | `warehouse/orders/metadata/v2.metadata.json` | `orders` |
| `TableType` | `Regular` | `Iceberg` |
| `raw_data` gate | `None` → parser never called | `_is_iceberg_delta_metadata()` called |
| `get_columns()` | raw metadata keys as columns | parsed from `schema.fields` |

What Was NOT Changed (Reused 100%)
Following @harshach's direction to reuse existing logic:
- `_is_iceberg_delta_metadata()` — zero changes
- `_parse_iceberg_delta_schema()` — zero changes
- `_parse_struct_fields()` — zero changes
- `set_google_credentials()` — zero changes
- `DatalakeGcsClient` credential initialization — zero changes

Tests — 18 Tests, Zero Infrastructure Required
Test run screenshots for `tests/unit/readers/test_json_reader.py` and `tests/unit/source/database/test_iceberg_discovery.py`.
New tests added: 18 across 2 files
`tests/unit/readers/test_json_reader.py` — 4 new tests:

- `test_raw_data_set_for_iceberg_metadata` — gate opens for Iceberg JSON
- `test_iceberg_columns_parsed_correctly` — correct columns extracted
- `test_raw_data_none_for_regular_json` — backward compatibility
- `test_raw_data_set_for_json_schema` — existing behavior preserved

`tests/unit/source/database/test_iceberg_discovery.py` — 14 new tests: fallback for non-Iceberg, mixed bucket handling
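For flavor, a hedged sketch of the two gate tests as plain pytest functions, reusing the illustrative `_get_raw_data` from the Fix 1 sketch (the real tests live in `tests/unit/readers/test_json_reader.py` and may differ):

```python
import json

def test_raw_data_set_for_iceberg_metadata():
    # Minimal Iceberg metadata: keyed by "format-version", not "$schema"
    payload = json.dumps({"format-version": 2, "table-uuid": "abc"}).encode()
    assert _get_raw_data(payload) is not None  # gate opens for Iceberg JSON

def test_raw_data_none_for_regular_json():
    # Plain data JSON must keep the old behavior and stay closed
    payload = json.dumps({"order_id": 1, "status": "shipped"}).encode()
    assert _get_raw_data(payload) is None
```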
Files Changed
- `readers/dataframe/json.py`
- `datalake/clients/gcs.py`
- `datalake/clients/s3.py`
- `utils/datalake/datalake_utils.py`
- `datalake/metadata.py`
- `tests/unit/readers/test_json_reader.py`
- `tests/unit/source/database/test_iceberg_discovery.py`

Type of Change
Checklist
- Fixes #22644: Add Iceberg table support for GCS and S3 Datalake connector
- All fixes are within the Python ingestion layer only.
- Existing GCS and S3 credential schemas are reused unchanged.
- Implements it fully following maintainer guidance.
Summary by Gitar
- `_parse_column`: handles type inference for complex `ARRAY` and `JSON` types.
- `_process_unique_json_key`: streamlines the logic for merging nested structures and `_ArrayOfStruct` types during metadata discovery.