[DCP - Ingestion] Adds the provenance URL #510

Merged
gmechali merged 7 commits into datacommonsorg:master from gmechali:provUrl
May 19, 2026

Conversation

Contributor

@gmechali gmechali commented May 19, 2026

This PR modifies the JSON-LD exporter to also add the provenance URL to observation nodes. The provenance URL is then read back from the MCFGraph and written to the Spanner Observation mutation.

I published a new Dataflow test template and successfully loaded the undata import into a test Spanner DB.
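
For readers who have not seen the diff, here is a minimal sketch of the round trip described above: an observation node carries the provenance URL when it is exported, and the same value is read back on the consuming side. The property name `provenanceUrl`, the helper names, and the sample IDs and URL are all assumptions made for illustration; they are not taken from `jsonld_exporter.py` or `GraphUtils.java`.

```python
import json
from typing import Optional


def build_observation_node(obs_id: str, provenance_id: str,
                           provenance_url: Optional[str] = None) -> dict:
    """Build a JSON-LD-style observation node (hypothetical shape)."""
    node = {
        "@id": obs_id,
        "@type": "StatVarObservation",
        "provenance": {"@id": provenance_id},
    }
    # Only attach the URL when one is known; the key name is an assumption.
    if provenance_url:
        node["provenanceUrl"] = provenance_url
    return node


def read_provenance_url(serialized: str) -> Optional[str]:
    """Read the URL back from a serialized node, as a consumer might."""
    return json.loads(serialized).get("provenanceUrl")


# Illustrative values only.
node = build_observation_node(
    "obs/1", "prov/undata", "https://example.org/undata")
round_tripped = read_provenance_url(json.dumps(node))

node_without_url = build_observation_node("obs/2", "prov/unknown")
missing = read_provenance_url(json.dumps(node_without_url))
```

In this sketch a node without a known URL simply omits the key, so the reader gets `None` rather than an empty string.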

codacy-production Bot commented May 19, 2026

Not up to standards ⛔

🔴 Issues 2 medium · 4 minor

Alerts:
⚠ 6 issues (≤ 0 issues of at least minor severity)

Results:
6 new issues

Category Results
Security 1 medium
CodeStyle 4 minor
Complexity 1 medium

🟢 Metrics 0 complexity · 0 duplication

Metric Results
Complexity 0
Duplication 0

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds provenance URLs to the JSON-LD export by fetching them from the database and including them in the observation graph. Review feedback identifies a significant performance issue where the provenance URL query is executed redundantly for every data chunk, potentially causing full table scans; it is recommended to fetch this data once in the main process. Additionally, the reviewer suggests using expand_id for the URL values and adding a check for empty strings to ensure consistency and data integrity.
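
The performance point in the review can be sketched as follows, assuming the exporter processes data in chunks: the provenance URL lookup runs once up front, empty values are dropped, and the resulting map is passed to every chunk. The function names, the `provenanceUrl` key, and the row shapes are hypothetical and not the PR's actual code; the `expand_id` suggestion is left out because its signature is not shown in this conversation.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def load_provenance_urls(
        fetch_rows: Callable[[], Iterable[Tuple[str, str]]]) -> Dict[str, str]:
    """Run the provenance query once and keep only non-empty URLs."""
    urls: Dict[str, str] = {}
    for prov_id, url in fetch_rows():
        if url and url.strip():
            urls[prov_id] = url.strip()
    return urls


def export_chunk(rows: List[dict],
                 provenance_urls: Dict[str, str]) -> List[dict]:
    """Export one chunk using the pre-fetched map; no query happens here."""
    nodes = []
    for row in rows:
        node = {"@id": row["id"], "provenance": row["provenance"]}
        url = provenance_urls.get(row["provenance"])
        if url:
            node["provenanceUrl"] = url  # hypothetical key name
        nodes.append(node)
    return nodes


# Stand-in for the database query, counting how often it is executed.
calls = {"count": 0}


def fake_fetch_rows():
    calls["count"] += 1
    return [("prov/undata", "https://example.org/undata"), ("prov/empty", "")]


provenance_urls = load_provenance_urls(fake_fetch_rows)
chunks = [
    [{"id": "obs/1", "provenance": "prov/undata"}],
    [{"id": "obs/2", "provenance": "prov/empty"}],
]
exported = [export_chunk(chunk, provenance_urls) for chunk in chunks]
```

However many chunks are exported, the stand-in query runs once, and the observation whose provenance has an empty URL is written without the key.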

Comment thread simple/stats/jsonld_exporter.py Outdated
Comment thread simple/stats/jsonld_exporter.py Outdated
@gmechali gmechali requested review from dwnoble and vish-cs May 19, 2026 16:53
@gmechali gmechali requested a review from keyurva May 19, 2026 17:36
Contributor

@keyurva keyurva left a comment

Thanks Gabe!

Comment thread simple/stats/jsonld_exporter.py Outdated
Comment thread util/src/main/java/org/datacommons/util/GraphUtils.java Outdated
Comment thread util/src/main/java/org/datacommons/util/GraphUtils.java Outdated
@gmechali gmechali merged commit 9cbfdc0 into datacommonsorg:master May 19, 2026
5 of 6 checks passed
@gmechali gmechali deleted the provUrl branch May 19, 2026 18:50

3 participants