fix(ingestion): enhance SSL safety and log sanitization#27719
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
7dfaf18 to
0600652
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
The
Maintainers may need to either update the CI runner's trust store or explicitly allow |
0600652 to
7af9422
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
7af9422 to
53ed0bf
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
53ed0bf to
576301e
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
576301e to
0370c4e
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| url = base_url + endpoint | ||
| response = requests.request( | ||
| "POST", url, headers=headers, data=payload, verify=False, timeout=10 | ||
| "POST", url, headers=headers, data=payload, verify=True, timeout=10 |
There was a problem hiding this comment.
@RinZ27 instead of making this hardcoded, can you please make this configurable in the SASConnection.json?
There was a problem hiding this comment.
- SAS Client: Added a configurable
verifySSLproperty to theSASConnection.jsonschema (defaulting totrue). The client now respects this setting instead of using a hardcoded value. - AWS Secrets Manager: Restored the
secret_idin the error log paths. I agree it's essential for identifying misconfigured secret IDs in production logs. - Error Handling: Fixed an issue where
APIErrorwas being used incorrectly (passing a string instead of a dict), and switched toRuntimeErrorfor cleaner exception messages without potential body leakage.
| except ClientError as err: | ||
| logger.debug(traceback.format_exc()) | ||
| logger.error(f"Couldn't get value for secret [{secret_id}]: {err}") | ||
| logger.error(f"Couldn't get value from secrets manager: {err}") |
There was a problem hiding this comment.
why are you changing this? - it is important in case of error for me to know it failed while fetching which secret value maybe I just configured the wrong secret id
0370c4e to
1ce4500
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
1ce4500 to
32bcc7c
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
32bcc7c to
83b5fc7
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Thanks for the PR. This needs to be updated against the latest Could you please rebase on |
83b5fc7 to
b6a344e
Compare
b982d13 to
b8fc155
Compare
|
b8fc155 to
37d4ff7
Compare
| if "error" in response.keys(): # noqa: SIM118 | ||
| raise APIError(response["error"]) | ||
| if response and isinstance(response, dict) and "error" in response: | ||
| raise APIError({"message": response["error"]}) |
There was a problem hiding this comment.
⚠️ Bug: APIError raised without required 'code' key will crash on .code access
All APIError raise sites in this diff pass {"message": response["error"]} without a "code" key. While the constructor only accesses error["message"], any caller (or logging middleware) that accesses the .code property will trigger a KeyError. The APIError class assumes both "message" and "code" are present in the dict. Every raise site in this file is affected.
Include a 'code' key in every APIError dict to prevent KeyError when .code is accessed. Use the error code from the response if available, or a default.:
raise APIError({"message": response["error"], "code": response.get("errorCode", 0)})
- Apply fix
Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎
| @@ -111,7 +132,7 @@ def get_views(self, query): | |||
| logger.info(f"{query}") | |||
| response = self.client.post(path=endpoint, data=query, headers=headers) | |||
| if "error" in response.keys(): # noqa: SIM118 | |||
There was a problem hiding this comment.
💡 Bug: get_views still uses unguarded response.keys() pattern
The get_views method at line 134 still uses if "error" in response.keys(): without the response and isinstance(response, dict) guard that was consistently applied to all other methods in this PR. If the response is None or not a dict, this will raise an AttributeError.
Apply the same guard pattern used in all other methods for consistency and safety.:
if response and isinstance(response, dict) and "error" in response:
raise APIError({"message": "Error fetching views from SAS", "code": 0})
- Apply fix
Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎
| @@ -33,13 +37,14 @@ class SASClient: | |||
| def __init__(self, config: SASConnection): | |||
| self.config: SASConnection = config | |||
| self.auth_token = self.get_token(config.serverHost, config.username, config.password.get_secret_value()) | |||
There was a problem hiding this comment.
💡 Bug: get_token calls _get_verify() before self.config is fully ready
In __init__, line 39 calls self.get_token(...) which internally calls self._get_verify() at line 197. _get_verify() accesses self.config.verifySSL and self.config.sslConfig. While self.config is assigned on line 38 (just before), this ordering dependency is fragile — any reordering of __init__ could break it. More importantly, if get_token is called independently (it's a public-ish method), it silently depends on self.config being set. This is minor but worth noting for maintainability.
Add a safety guard or pass verify as a parameter to get_token to make the dependency explicit.:
def get_token(self, base_url, user, password):
endpoint = "/SASLogon/oauth/token"
payload = {"grant_type": "password", "username": user, "password": password}
headers = {
"Content-type": "application/x-www-form-urlencoded",
"Authorization": SAS_CLI_AUTH_HEADER,
}
url = base_url + endpoint
verify = self._get_verify() if hasattr(self, 'config') else True
...
- Apply fix
Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎
| if verify_ssl == VerifySSL.ignore: | ||
| return ssl._create_unverified_context() # pylint: disable=protected-access |
There was a problem hiding this comment.
⚠️ Edge Case: Elasticsearch get_ssl_context returns unverified context for no certs
When verify_ssl == VerifySSL.validate (the default) but no ssl_config or certificates are provided, the function falls through to line 143 and returns ssl.create_default_context(). This is correct. However, when verify_ssl == VerifySSL.ignore, it returns ssl._create_unverified_context() which disables hostname and cert verification. This is intentional per the PR but worth noting that VerifySSL.ignore in the Elasticsearch path creates an SSL context that will still attempt TLS (just without verification), whereas in the SAS path it sets verify=False which may skip TLS entirely depending on the HTTP library. The behavior difference between connectors for the same enum value could confuse users.
Was this helpful? React with 👍 / 👎
Code Review
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
|
|



Describe your changes:
The ingestion logic for SAS and Elasticsearch previously defaulted to insecure behaviors by disabling SSL verification (
verify=Falseor unverified contexts). This update restores standard certificate validation to ensure production environments are protected against man-in-the-middle attacks by default.Key improvements:
verifySSL=true.verifySSLproperty to bothSASConnectionandElasticSearchConnectionschemas. This allows users to explicitly toggle certificate validation if they are using self-signed certificates in non-production or restricted environments.These changes prioritize safety while providing the necessary escape hatches for specific deployment scenarios.
Type of change:
Checklist:
Fixes :Summary by Gitar
verifySSLparameter tosasConnection.jsonandelasticSearchConnection.jsonto allow toggling certificate validation.test_sas_client.pyto cover connection initialization, SSL verification toggling, and robust error handling scenarios.This will update automatically on new commits.