Commit 8020475
* Fix: [OpenLineage] resolve table identity from symlinks facet
* memoize table resolution per event
Each dataset is resolved 2-3 times per event (name->fqn map, column
lineage, edge building). Memoize resolution in a per-event LRUCache
(capacity 1000) so that resolution and its warning log run once per
dataset. The cache is bounded and reset on every event, so it cannot
grow across events or cause an OOM.
* harden symlink facet parsing against malformed events
* restore dotted fallback, scope creation by namespace
* remove entity creation from lineage path
* drop duplicate status warning entries
* align ResolvedTable docstring with read-only resolution
* use raw identities as resolution cache key
* capture database for 3-part dotted names
* fix candidate dedup and column lineage edge cases
* sharpen lineage warnings and harden pipeline fqn lookup
* cover candidate dedup, column lineage edges, and pipeline fqn guard with tests
* bind pipeline_name to a local for basedpyright
* tolerate explicit null in symlink identifier fields
* read context.pipeline via getattr to satisfy basedpyright
* harden malformed-event handling for null namespace and non-dict columnLineage
* make column lineage defensiveness tests non-vacuous
* alias symlink identities in the columnLineage lookup map
(cherry picked from commit 4daaf6d)
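
The per-event memoization described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `BoundedCache`, `EventResolver`, `start_event`, and the `lookup` callable are hypothetical names, and the cache is built on `collections.OrderedDict` rather than the project's own `LRUCache`. It assumes the raw `(namespace, name)` pair is the cache key and that a failed resolution (`None`) is cached too, so the lookup and any warning it logs run once per dataset per event.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional, Tuple


class BoundedCache:
    """Small LRU cache (hypothetical stand-in for the project's LRUCache)."""

    def __init__(self, capacity: int = 1000) -> None:
        self._capacity = capacity
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        # Membership test (not .get) so cached None results count as hits.
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = compute()
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used
        return value

    def clear(self) -> None:
        self._data.clear()

    def __len__(self) -> int:
        return len(self._data)


class EventResolver:
    """Resolves dataset identities, memoized for the lifetime of one event."""

    def __init__(self, lookup: Callable[[str, str], Optional[str]]) -> None:
        self._lookup = lookup  # assumed: the expensive resolution call
        self._cache = BoundedCache(capacity=1000)
        self.lookup_calls = 0  # counter kept only to make the effect visible

    def start_event(self) -> None:
        # Reset per event so entries never accumulate across events.
        self._cache.clear()

    def resolve(self, namespace: str, name: str) -> Optional[str]:
        key: Tuple[str, str] = (namespace, name)  # raw identity as the key

        def compute() -> Optional[str]:
            self.lookup_calls += 1
            return self._lookup(namespace, name)

        return self._cache.get_or_compute(key, compute)  # type: ignore[return-value]
```

With this shape, the name->fqn map, column lineage, and edge-building steps can each call `resolve()` for the same dataset within one event while the underlying lookup runs only once. Calling `start_event()` before the next event discards all cached entries.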
1 parent 6ec66dc commit 8020475
3 files changed
Lines changed: 1163 additions & 937 deletions
File tree
- ingestion
- src/metadata/ingestion/source/pipeline/openlineage
- tests/unit/topology/pipeline