Fixes #25721: Resolve Pinot JSON ingestion crash using custom PinotJSONType#27722
Fixes #25721: Resolve Pinot JSON ingestion crash using custom PinotJSONType#27722mohitjeswani01 wants to merge 3 commits intoopen-metadata:mainfrom
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
Resolves a Pinot connector sample-data ingestion crash for JSON columns by introducing a Pinot-specific SQLAlchemy TypeDecorator that bypasses SQLAlchemy JSON’s bytes-oriented result processor while still normalizing Pinot driver outputs.
Changes:
- Added
PinotJSONType(TypeDecorator overString) to safely deserialize/normalize Pinot JSON column values. - Updated Pinot type mapping to return
PinotJSONTypeforjsoncolumns. - Registered
PinotJSONTypeinColumnTypeParserand expanded Pinot unit tests to cover JSON normalization + framework wiring.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/database/pinotdb/custom_types.py | Introduces PinotJSONType to normalize Pinot JSON values and avoid SQLAlchemy JSON bytes-processing. |
| ingestion/src/metadata/ingestion/source/database/pinotdb/metadata.py | Switches Pinot json type mapping from types.JSON to PinotJSONType. |
| ingestion/src/metadata/ingestion/source/database/column_type_parser.py | Registers PinotJSONType to resolve to OM JSON at the framework parser layer. |
| ingestion/tests/unit/topology/database/test_pinotdb.py | Adds/extends unit coverage for Pinot JSON normalization behavior and mapping integration. |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Hi @harshach and @PubChimps Sir |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Hi @PubChimps @harshach sir resolved the merge conflict here successfully could you please review and add a |
Code Review ✅ Approved 1 resolved / 1 findingsCustom PinotJSONType implementation fixes the JSON ingestion crash by correcting the skip logic in pytest. No issues found. ✅ 1 resolved✅ Edge Case: pytest.importorskip skips PinotJSONType tests unnecessarily
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
| def test_python_type_contract(self): | ||
| assert PinotJSONType().python_type is Any | ||
|
|
| bytes-processor is never invoked. The pinotdb driver always delivers | ||
| JSON values as Python str (multistage engine) or already-deserialized | ||
| containers (single-stage engine). We normalize both shapes here. |
Description:
Fixes #25721
I worked on resolving the
TypeError: expected bytes, str foundthat occurs when ingesting sample data from Apache Pinot tables containing JSON columns.The Problem:
Apache Pinot (specifically the Multistage Engine) returns JSON columns as raw Python
str. However, SQLAlchemy 2.0's built-inJSONtype expects a byte stream for its internal result processor. This leads to a crash during the sample data fetch before the ingestion framework can even process the row.The Fix:
I implemented a custom
PinotJSONTypeusing aTypeDecoratorwithimpl = String.Stringas the base implementation, we completely sidestep SQLAlchemy's default JSON bytes-processor.process_result_valuelogic is designed to handle both raw JSON strings (Multistage Engine) and already-deserialized dictionaries/lists (Single-stage Engine).Falseor empty strings, ensuring they map safely toNonewithout killing the ingestion pipeline.column_type_parser.pywith a guarded import to ensure the Pinot-specific logic is only loaded when the connector is active, maintaining the architectural integrity of the framework.How did you test your changes?
I wrote an exhaustive regression suite in
ingestion/tests/unit/topology/database/test_pinotdb.pycovering 21 new test cases. These verify:column_type_parser.Test Results:
The suite is 100% green with 31/31 tests passed (including the original mappings and new regression cases).
Screenshots:

Successful test run showing exhaustive coverage of Pinot JSON scenarios:
Type of change:
Checklist:
I have read the CONTRIBUTING document.
My PR title is
Fixes #25721: Resolve Pinot JSON ingestion crash using custom PinotJSONTypeI have commented on my code, particularly in hard-to-understand areas.
I have added a test that covers the exact scenario we are fixing. For complex issues, comment the issue number in the test for future reference.
Summary by Gitar
column_type_parser.pyandpinotdb/metadata.pyto include multiple Ruff and PEP-8 compliant linting fixes (noqaannotations, type hint updates).This will update automatically on new commits.