Fixes #24798: Prevent avro parser from leaking file content in warning logs#27498
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
6b5d865 to
3fbe124
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Ready for review. Addressed the gitar-bot feedback — moved the test to unit tests. |
| except Exception as exc: # pylint: disable=broad-except | ||
| logger.debug(traceback.format_exc()) | ||
| logger.warning(f"Unable to parse the avro schema: {exc}") | ||
| logger.warning(f"Unable to parse the avro schema: {type(exc).__name__}") |
There was a problem hiding this comment.
@rajattiwari-stack instead of doing this can you do a special exception handling for type SchemaParseException
because even if you hide the message here, in the debug log the same content will be exposed
3fbe124 to
80635c7
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| except avro.errors.SchemaParseException: | ||
| logger.warning("Unable to parse the avro schema: SchemaParseException") |
There was a problem hiding this comment.
⚠️ Bug: SchemaParseException handler drops debug traceback
The new except avro.errors.SchemaParseException blocks don't call logger.debug(traceback.format_exc()), unlike the generic except Exception blocks that follow them. This means when the most common parse failure occurs (SchemaParseException), no traceback is logged even at DEBUG level, making it harder to diagnose parsing issues.
The PR description states "Full exception details remain available at DEBUG level through the existing traceback.format_exc() call", but this is only true for non-SchemaParseException exceptions.
Suggested fix:
except avro.errors.SchemaParseException:
logger.debug(traceback.format_exc())
logger.warning("Unable to parse the avro schema: SchemaParseException")
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
Code Review
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
|
@ulixius9 Updated — added specific |
Describe your changes:
Fixes #24798
When the avro schema parser fails to parse a file during S3 ingestion, the exception message includes the actual file content. This was being logged at WARNING level via
logger.warning(f"Unable to parse the avro schema: {exc}"), which can leak sensitive data into log storage.Changed warning logs to only include the exception type name instead of the full exception message. Full exception details remain available at DEBUG level through the existing
traceback.format_exc()call.Before:
WARNING - Unable to parse the avro schema: No "type" property: {"secret_key": "super_secret_value_12345"}After:
WARNING - Unable to parse the avro schema: SchemaParseExceptionAdded a unit test to verify warning-level logs do not contain any of the sensitive file content.
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>