Skip to content

[WIP] SAP-success factors#27350

Closed
varun-lakhyani wants to merge 0 commit intoopen-metadata:mainfrom
varun-lakhyani:sap-success-factors
Closed

[WIP] SAP-success factors#27350
varun-lakhyani wants to merge 0 commit intoopen-metadata:mainfrom
varun-lakhyani:sap-success-factors

Conversation

@varun-lakhyani
Copy link
Copy Markdown
Member

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

@varun-lakhyani varun-lakhyani requested review from a team as code owners April 14, 2026 10:24
@varun-lakhyani varun-lakhyani added the safe to test Add this label to run secure Github workflows on PRs label Apr 14, 2026
@varun-lakhyani varun-lakhyani changed the title SAP-success factors [WIP] SAP-success factors Apr 14, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 14, 2026

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 64%
64.27% (59560/92662) 43.76% (31031/70905) 46.94% (9363/19943)

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 14, 2026

🟡 Playwright Results — all passed (16 flaky)

✅ 3145 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 205 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 2 640 0 2 32
🟡 Shard 3 649 0 2 26
🟡 Shard 4 620 0 2 47
🟡 Shard 5 605 0 2 67
🟡 Shard 6 631 0 8 33
🟡 16 flaky test(s) (passed on retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DataQuality/BundleSuiteBulkOperations.spec.ts › Bulk selection operations (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Tag Add, Update and Remove for child entities (shard 4, 2 retries)
  • Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Import ODCS with security/roles (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/ProfilerConfigurationPage.spec.ts › Non admin user (shard 6, 1 retry)
  • Pages/Teams.spec.ts › Teams Page Flow (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Directory (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Comment thread ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py Outdated
@sonarqubecloud
Copy link
Copy Markdown

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 17, 2026

Wrapping up

Code Review ⚠️ Changes requested 3 resolved / 6 findings

Integrates SAP SuccessFactors connectivity while resolving attribute errors and navigation property resolution bugs. Requires remediation for critical XML injection vulnerabilities in SAML assertions, inadequate OAuth2 TTL buffer handling, and schema validation gaps.

⚠️ Security: SAML assertion built via string interpolation without XML escaping

📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:191-201 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:204-216 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:241-248

In _build_assertion_elem and its helper methods (_issuer_xml, _subject_xml, etc.), user-supplied configuration values like client_id, username, and token_url are interpolated directly into raw XML strings using f-strings. If any of these values contain XML special characters (<, >, &, ", '), the resulting XML will be malformed or could inject unexpected elements.

While these values come from service configuration (limiting the attack surface), this is still a fragile and insecure pattern — especially for username and token_url which are user-provided inputs. Malformed XML will cause cryptic lxml parsing failures rather than a clear validation error.

Suggested fix
from xml.sax.saxutils import escape, quoteattr

# In _issuer_xml:
return f"<saml2:Issuer>{escape(client_id)}</saml2:Issuer>"

# In _subject_xml, escape text content and use quoteattr for attributes:
f'<saml2:SubjectConfirmationData NotOnOrAfter="{escape(expire_str)}" Recipient={quoteattr(token_url)}/>'

# Apply escape() to all interpolated values in XML text nodes
# and quoteattr() to all interpolated values in XML attributes.
⚠️ Edge Case: OAuth2 token TTL < 300s causes re-auth on every request

📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:136-139 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:153-161

In _setup_oauth2, the token expiry is calculated as ttl - _TOKEN_REFRESH_BUFFER_SECONDS where the buffer is 300 seconds. If the OAuth2 server returns an expires_in value ≤ 300 (e.g., a short-lived token of 60 seconds), _token_expiry will be set to a time in the past. This causes _refresh_token_if_needed() to re-authenticate before every API call, creating a request storm against the token endpoint.

Additionally, the retry in _get() for HTTP 401 would compound this by triggering yet another token fetch per failed request.

Suggested fix
ttl = int(payload.get("expires_in", _OAUTH2_DEFAULT_TTL_SECONDS))
buffer = min(_TOKEN_REFRESH_BUFFER_SECONDS, ttl // 2)
self._token_expiry = datetime.now(timezone.utc) + timedelta(
    seconds=ttl - buffer
)
💡 Quality: Connection schema lacks conditional required fields per auth type

📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/sapSuccessFactorsConnection.json:119

The JSON schema for sapSuccessFactorsConnection.json only requires baseUrl and companyId. However, BasicAuth needs username and password, while OAuth2Credentials needs clientId, privateKey, tokenUrl, and username. Without conditional validation (e.g., using if/then or oneOf), users can save an incomplete configuration that will fail at runtime with an obscure AttributeError or NoneType error rather than a clear validation message.

Other connectors in this project (e.g., sapErpConnection) use oneOf patterns for conditional auth requirements — this connector should follow the same pattern.

✅ 3 resolved
Bug: OAuth2 path crashes with AttributeError if privateKey is None

📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:276 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:113 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:105 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/sapSuccessFactorsConnection.json:119
When authType is OAuth2Credentials, _sign_assertion calls self.config.privateKey.get_secret_value() (line 276) without a None guard. Since the JSON schema does not mark privateKey, clientId, or tokenUrl as required, a misconfigured connection will produce an unhelpful AttributeError: 'NoneType' object has no attribute 'get_secret_value' instead of a clear validation error.

Similarly, _setup_oauth2 calls str(self.config.tokenUrl) (line 113) which would produce "None" as a string URL if tokenUrl is not set, leading to a confusing HTTP error rather than a clear config validation failure.

For BasicAuth, username being None would produce "None@companyId" (line 105).

Edge Case: Navigation property target_entity_set resolution uses document order

📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:628-642
In _resolve_navigation_properties (line 638), the target entity set is always assigned to_end (the second <End> in document order from the <AssociationSet>). However, the two <End> elements in an <AssociationSet> are not guaranteed to appear in any particular order relative to FromRole/ToRole. If the document happens to list the "from" end second, the resolved target_entity_set will point to the wrong entity set, producing incorrect foreign key references.

The code even has a comment acknowledging this: "use document order as fallback" (line 637), but it's actually the only logic path — there's no attempt to match ToRole against the <End Role="..."> attributes.

Quality: Typo: class name '_AssertionContext' should be '_AssertionContext'

📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:61 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:181 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:186 📄 ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:271
The class at line 61 is named _AssertionContext but the correct English word is "Assertion" (not "Assertion"). This typo propagates to type hints and variable names throughout the SAML assertion builder code.

🤖 Prompt for agents
Code Review: Integrates SAP SuccessFactors connectivity while resolving attribute errors and navigation property resolution bugs. Requires remediation for critical XML injection vulnerabilities in SAML assertions, inadequate OAuth2 TTL buffer handling, and schema validation gaps.

1. ⚠️ Security: SAML assertion built via string interpolation without XML escaping
   Files: ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:191-201, ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:204-216, ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:241-248

   In `_build_assertion_elem` and its helper methods (`_issuer_xml`, `_subject_xml`, etc.), user-supplied configuration values like `client_id`, `username`, and `token_url` are interpolated directly into raw XML strings using f-strings. If any of these values contain XML special characters (`<`, `>`, `&`, `"`, `'`), the resulting XML will be malformed or could inject unexpected elements.
   
   While these values come from service configuration (limiting the attack surface), this is still a fragile and insecure pattern — especially for `username` and `token_url` which are user-provided inputs. Malformed XML will cause cryptic lxml parsing failures rather than a clear validation error.

   Suggested fix:
   from xml.sax.saxutils import escape, quoteattr
   
   # In _issuer_xml:
   return f"<saml2:Issuer>{escape(client_id)}</saml2:Issuer>"
   
   # In _subject_xml, escape text content and use quoteattr for attributes:
   f'<saml2:SubjectConfirmationData NotOnOrAfter="{escape(expire_str)}" Recipient={quoteattr(token_url)}/>'
   
   # Apply escape() to all interpolated values in XML text nodes
   # and quoteattr() to all interpolated values in XML attributes.

2. ⚠️ Edge Case: OAuth2 token TTL < 300s causes re-auth on every request
   Files: ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:136-139, ingestion/src/metadata/ingestion/source/database/sapsuccessfactors/client.py:153-161

   In `_setup_oauth2`, the token expiry is calculated as `ttl - _TOKEN_REFRESH_BUFFER_SECONDS` where the buffer is 300 seconds. If the OAuth2 server returns an `expires_in` value ≤ 300 (e.g., a short-lived token of 60 seconds), `_token_expiry` will be set to a time in the past. This causes `_refresh_token_if_needed()` to re-authenticate before *every* API call, creating a request storm against the token endpoint.
   
   Additionally, the retry in `_get()` for HTTP 401 would compound this by triggering yet another token fetch per failed request.

   Suggested fix:
   ttl = int(payload.get("expires_in", _OAUTH2_DEFAULT_TTL_SECONDS))
   buffer = min(_TOKEN_REFRESH_BUFFER_SECONDS, ttl // 2)
   self._token_expiry = datetime.now(timezone.utc) + timedelta(
       seconds=ttl - buffer
   )

3. 💡 Quality: Connection schema lacks conditional required fields per auth type
   Files: openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/sapSuccessFactorsConnection.json:119

   The JSON schema for `sapSuccessFactorsConnection.json` only requires `baseUrl` and `companyId`. However, BasicAuth needs `username` and `password`, while OAuth2Credentials needs `clientId`, `privateKey`, `tokenUrl`, and `username`. Without conditional validation (e.g., using `if`/`then` or `oneOf`), users can save an incomplete configuration that will fail at runtime with an obscure `AttributeError` or `NoneType` error rather than a clear validation message.
   
   Other connectors in this project (e.g., sapErpConnection) use `oneOf` patterns for conditional auth requirements — this connector should follow the same pattern.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@varun-lakhyani varun-lakhyani marked this pull request as draft April 17, 2026 12:53
@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant