Skip to content

Fixes #28065: [openlinege] Add job ownership support in ingestion pipeline#28381

Merged
harshach merged 5 commits into
open-metadata:mainfrom
jsingh-yelp:add-ownership-support-for-openlineage-pipelines
Jun 15, 2026
Merged

Fixes #28065: [openlinege] Add job ownership support in ingestion pipeline#28381
harshach merged 5 commits into
open-metadata:mainfrom
jsingh-yelp:add-ownership-support-for-openlineage-pipelines

Conversation

@jsingh-yelp

@jsingh-yelp jsingh-yelp commented May 22, 2026

Copy link
Copy Markdown
Contributor

Describe your changes:

Fixes #28065

Change Summary:

  • This PR adds OpenLineage job ownership support for pipeline ingestion.
  • OpenLineage events can now populate Pipeline owners from the job-level ownership facet. The ingestion flow resolves OpenLineage owner names to existing OpenMetadata users or group teams, supports both replace and append ownership update modes.
  • Updated user/team ownership APIs to efficiently filter existing Pipeline owners.

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

High-level design:

N/A — small change.

Tests:

  • Ran git diff --check
  • Ran Python compile checks for the touched OpenLineage ingestion files/tests
  • Ran mvn -Dmaven.repo.local=/nail/tmp/om-m2 -pl openmetadata-service spotless:apply
  • Manually tested OpenLineage ownership events through Kafka using jaskaran_openlineage_test
  • Verified append mode behavior by sending multiple ownership events for the same job with different teams

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • For UI changes: I attached a screen recording and/or screenshots above.
  • I have added tests (unit / integration / Playwright as applicable) and listed them above.

Summary by Gitar

  • Ingestion logic refinement:
    • Improved OwnershipResolver to handle None values for include_owners and ownership_update_mode configuration parameters.
    • Added a defensive check in _ensure_pipeline_owner_cache using _owner_cache_loaded flag instead of relying on nullability of dictionaries.
  • Entity API updates:
    • Updated EntityRepository methods to pass ListFilter into listInternal, serializeJsons, and setFieldsInBulk to improve context-aware entity processing.
  • Testing enhancements:
    • Added unit tests to verify None value handling for include_owners and default replace behavior for ownership_update_mode in test_openlineage_ownership.py.

This will update automatically on new commits.

@jsingh-yelp jsingh-yelp requested a review from a team as a code owner May 22, 2026 15:28
@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/tests/unit/topology/pipeline/test_openlineage_ownership.py Outdated
@jsingh-yelp jsingh-yelp force-pushed the add-ownership-support-for-openlineage-pipelines branch 2 times, most recently from 75512f1 to 12d0abd Compare May 22, 2026 15:39
@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

1 similar comment
@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/tests/unit/topology/pipeline/test_openlineage_ownership.py Outdated
@jsingh-yelp jsingh-yelp force-pushed the add-ownership-support-for-openlineage-pipelines branch 2 times, most recently from 21bfb20 to bf415d5 Compare May 22, 2026 16:02
@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

1 similar comment
@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@jsingh-yelp jsingh-yelp force-pushed the add-ownership-support-for-openlineage-pipelines branch from bf415d5 to 8574a04 Compare May 22, 2026 16:30
@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@jsingh-yelp jsingh-yelp force-pushed the add-ownership-support-for-openlineage-pipelines branch from 8574a04 to 50206df Compare May 22, 2026 16:31
@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@jsingh-yelp jsingh-yelp changed the title Add openlineage job ownership support in ingestion pipeline Fixes #28065: Add openlineage job ownership support in ingestion pipeline May 22, 2026
@jsingh-yelp jsingh-yelp changed the title Fixes #28065: Add openlineage job ownership support in ingestion pipeline Fixes #28065: [openlinege] Add openlineage job ownership support in ingestion pipeline May 22, 2026
@jsingh-yelp jsingh-yelp changed the title Fixes #28065: [openlinege] Add openlineage job ownership support in ingestion pipeline Fixes #28065: [openlinege] Add job ownership support in ingestion pipeline May 22, 2026
@harshach harshach added the safe to test Add this label to run secure Github workflows on PRs label May 22, 2026
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ TypeScript Types Need Update

The generated TypeScript types are out of sync with the JSON schema changes.

Since this is a pull request from a forked repository, the types cannot be automatically committed.
Please generate and commit the types manually:

cd openmetadata-ui/src/main/resources/ui
./json2ts-generate-all.sh -l true
git add src/generated/
git commit -m "Update generated TypeScript types"
git push

After pushing the changes, this check will pass automatically.

@github-actions

Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

…ntityType filter is set

On the filtered users/teams list path (ownsEntityType / directOwnsOnly),
fetchAndSetFieldsExcept iterated only fieldFetchers and never called
fetchAndSetRelationshipFieldsInBulk. Any field handled solely by the batched
relationship fetch was therefore dropped (Team.domains came back null), and the
relationship fields that the bulk layer normally batches fell back to per-field
N+1 fetches.

Run the batched relationship fetch inside fetchAndSetFieldsExcept and skip both
the excluded fields and the relationship-handled fields. fetchAndSetFields now
delegates to it with an empty exclusion set, so the default path is unchanged.

Add a TeamResourceIT regression test: listing teams with
fields=owns,domains&ownsEntityType=pipeline must keep domains populated and
restrict owns to pipelines.
@sonika-shah

sonika-shah commented May 29, 2026

Copy link
Copy Markdown
Collaborator

@jsingh-yelp pushed a Java-side fix to this branch (b61d956).

Java bug fixed: the new ownsEntityType/directOwnsOnly list path went through fetchAndSetFieldsExcept, which skipped the batched relationship fetch — so GET /v1/teams?fields=owns,domains&ownsEntityType=pipeline returned domains == null, and relationship fields fell back to N+1. Fix: run fetchAndSetRelationshipFieldsInBulk in that helper too + added a TeamResourceIT regression test. (spotless + compile pass; IT not run locally yet.)

Two Python issues worth a look:

  1. append mode drops owners — existing owners are rebuilt only from Group teams + users, so non-Group team owners and bot owners get silently dropped during merge.
  2. unbounded owner cache — it paginates and loads all users + teams into memory upfront → unbounded memory + slow first-event processing on large deployments.

cc @ulixius9 please check when you get time.

@sonarqubecloud

Copy link
Copy Markdown

@sonarqubecloud

Copy link
Copy Markdown

@jsingh-yelp

Copy link
Copy Markdown
Contributor Author

@sonika-shah, @ulixius9

append mode drops owners existing owners are rebuilt only from Group teams + users, so non-Group team owners and bot owners get silently dropped during merge.

Yes, append mode rebuilds existing owners from cached Group teams + users. Non-Group team owners are excluded, but that matches current OpenMetadata validation rules code ref: https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java#L7383

it paginates and loads all users + teams into memory upfront → unbounded memory + slow first-event processing on large deployments.

This is intentional for append mode we preload user/team ownership once so existing pipeline owners can be preserved while merging owners from OpenLineage events. This avoids owner lookups on each message, in my view, it should make the overall pipeline a lot faster; we can revisit a lazy/on-demand resolver if first-event cache loading becomes too expensive.

@jsingh-yelp

Copy link
Copy Markdown
Contributor Author

cc: @mohittilala Can you please review this PR? as I see you made some recent openlineage changes

@jsingh-yelp

Copy link
Copy Markdown
Contributor Author

cc: @ulixius9 , @mohittilala Can I please have review on this PR?

@harshach harshach merged commit b60fbb4 into open-metadata:main Jun 15, 2026
63 of 65 checks passed
@gitar-bot

gitar-bot Bot commented Jun 15, 2026

Copy link
Copy Markdown
Code Review ✅ Approved 10 resolved / 10 findings

Integrates OpenLineage job ownership into pipeline ingestion by enabling owner resolution and configurable update modes, addressing previous build and logic issues including missing API field propagation, incorrect cache initialization, and unhandled null configurations.

✅ 10 resolved
Edge Case: Owner name with colon prefix (e.g. ":jdoe") silently resolves

📄 ingestion/src/metadata/ingestion/source/pipeline/openlineage/ownership_resolver.py:90-95 📄 ingestion/src/metadata/ingestion/source/pipeline/openlineage/ownership_resolver.py:101-114
In ownership_resolver.py line 90, raw_owner_name.partition(":") on an input like :jdoe yields owner_type="", separator=":", owner_name="jdoe". Since separator is truthy, it enters the qualified branch and does owner_type = "".lower()"". This doesn't match "team" or "user", so it falls into the else (unqualified) branch at line 106, resolving as if it were unqualified. While the end result is functional, the intent is unclear—it may be better to explicitly handle or warn about a malformed qualified identifier like :name.

Performance: Owner cache loads ALL teams/users, not just pipeline-owning ones

📄 ingestion/src/metadata/ingestion/source/pipeline/openlineage/ownership_resolver.py:162-176 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/teams/UserResource.java:319-322
In _ensure_pipeline_owner_cache, the resolver calls list_all_entities(entity=Team, ...) with params={"ownsEntityType": "pipeline"} which should filter server-side. However for users it also passes "directOwnsOnly": "true". If the server-side filtering for ownsEntityType is not implemented or returns all entities regardless (since this is new functionality being added in the same PR), the cache will load every user/team in the system into memory. For large deployments this could be significant. Ensure the API actually filters—the UserResource.java change only passes the param into ListFilter but the DAO query needs to handle it.

Quality: New test file uses unittest.TestCase instead of pytest

📄 ingestion/tests/unit/topology/pipeline/test_openlineage_ownership.py:1-2 📄 ingestion/tests/unit/topology/pipeline/test_openlineage_ownership.py:42
The project's testing guidelines require using pytest with plain assert statements and no unittest.TestCase inheritance. The new test_openlineage_ownership.py file uses class OpenLineageOwnershipResolverTest(unittest.TestCase) with self.assertIsNotNone, self.assertEqual, etc. This is inconsistent with project conventions and makes it harder to leverage pytest features (fixtures, parametrize, better assertion introspection).

Bug: ownsEntityType filter added unconditionally even when null

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/teams/UserResource.java:319 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/teams/TeamResource.java:238
In UserResource.java line 319, filter.addQueryParam("ownsEntityType", ownsEntityType) is called unconditionally—even when ownsEntityType is null. This differs from how isBot is handled (guarded by if (isBot != null)). If downstream code treats a null-valued query param differently from an absent one, this could cause unexpected behavior (e.g., nullOrEmpty(filter.getQueryParam("ownsEntityType")) returns true for null, so it's likely fine, but it's inconsistent and could break if ListFilter.addQueryParam doesn't handle null gracefully).

Bug: Missing base setFieldsInBulk(Fields, List, ListFilter) causes compile error

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:2153
The change in EntityRepository.java:2153 calls setFieldsInBulk(fields, entities, filter) with 3 arguments, but the base class only defines a 2-parameter overload (setFieldsInBulk(Fields, List<T>)). While UserRepository and TeamRepository define their own 3-parameter versions, these are not overrides of a base method — they are new overloads with a different signature. Since listAll is defined in EntityRepository<T> (not in the subclasses), it calls the method on this which resolves to the base class signature at compile time. This will fail to compile.

You need to add a 3-parameter base method in EntityRepository that delegates to the 2-parameter version by default, so subclasses can override it.

...and 5 more resolved from earlier reviews

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add OpenLineage ownership facet support for pipelines

4 participants