Commit bb0daa1

harshach, github-actions[bot], aniketkatkar97, github-advanced-security[bot], and Copilot authored
RDF, cleanup relations and remove unnecessary bindings, add distributed mode for RDF reindex (#26902)
* RDF, cleanup relations and remove unnecessary bindings, add distributed mode for RDF reindex
* Update generated TypeScript types
* Address comments from copilot
* Update generated TypeScript types
* fix test issues
* Fix minor UI bugs
* Add the missing filters
* Fix RDF export API error
* Add export functionality
* Fix ui-checkstyle
* Fix java checkstyle
* Fix unit tests
* Fix and increase the coverage for KnowledgeGraph.spec.ts
* Fix tests
* Remove rdf as default in playwright and local docker
* fix ui-checkstyle
* Address comments
* Potential fix for pull request finding 'CodeQL / Artifact poisoning'
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Address copilot comments
* Address copilot comments
* FIx tests
* FIx docker
* Update openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/rdf/distributed/DistributedRdfIndexCoordinator.java
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Address copilot review comments: license headers, JSON escaping, type safety, border-color, stop semantics
  Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/c026e52e-162b-4c9a-9874-43791d4aaac1
  Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
* Show error toast for unsupported export format in KnowledgeGraph
  Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/c026e52e-162b-4c9a-9874-43791d4aaac1
  Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
* Fix docker
* Fix docker for playwright
* Fix docker for playwright
* Fix tests
* Fix tests
* Fix docker
* Fix docker
* Fix glossary and pagination spec flakiness
* update the missing translations
* Fix docker
* Fix docker
* Fix integration test
* Fix fuseki not starting
* Fixed the run local docker script
* worked on comments
* Fix flakiness in knowledge graph tests
* Fix checkstyle

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
1 parent e873c9d commit bb0daa1

99 files changed

Lines changed: 8090 additions & 1502 deletions

.dockerignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+docker/development/docker-volume
+docker/docker-compose-quickstart/docker-volume

.github/actions/setup-openmetadata-test-environment/action.yml

Lines changed: 5 additions & 1 deletion
@@ -15,6 +15,10 @@ inputs:
     description: Arguments to pass to run_local_docker.sh
     required: false
     default: "-m no-ui -d mysql" # Use "-d postgresql" for postgres and Opensearch
+  startup-script:
+    description: Startup script used to launch the local OpenMetadata test environment
+    required: false
+    default: "./docker/run_local_docker.sh"
   ingestion_dependency:
     description: Ingestion dependency to pass to run_local_docker.sh
     required: false
@@ -97,4 +101,4 @@ runs:
         timeout_minutes: 60
         max_attempts: 2
         retry_on: error
-        command: ./docker/run_local_docker.sh ${{ inputs.args }}
+        command: ${{ inputs.startup-script }} ${{ inputs.args }}
Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
+# Copyright 2021 Collate
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+# http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: Postgresql PR Knowledge Graph E2E Tests
+on:
+  workflow_dispatch:
+  pull_request:
+    types:
+      - opened
+      - synchronize
+      - reopened
+      - ready_for_review
+    paths:
+      - ".github/actions/setup-openmetadata-test-environment/action.yml"
+      - ".github/workflows/playwright-knowledge-graph-postgresql-e2e.yml"
+      - "docker/run_local_docker.sh"
+      - "docker/run_local_docker_common.sh"
+      - "docker/run_local_docker_rdf.sh"
+      - "docker/validate_compose.py"
+      - "docker/development/docker-compose-fuseki.yml"
+      - "docker/development/docker-compose-postgres-fuseki.yml"
+      - "docs/rdf-local-development.md"
+      - "openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/rdf/**"
+      - "openmetadata-service/src/main/java/org/openmetadata/service/rdf/**"
+      - "openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/**"
+      - "openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/rdf/**"
+      - "openmetadata-service/src/test/java/org/openmetadata/service/rdf/**"
+      - "openmetadata-service/src/test/java/org/openmetadata/service/resources/rdf/**"
+      - "openmetadata-spec/src/main/resources/rdf/**"
+      - "openmetadata-ui/src/main/resources/ui/playwright/e2e/Features/KnowledgeGraph.spec.ts"
+      - "openmetadata-ui/src/main/resources/ui/playwright.config.ts"
+      - "openmetadata-ui/src/main/resources/ui/src/components/KnowledgeGraph/**"
+      - "openmetadata-ui/src/main/resources/ui/src/components/OntologyExplorer/**"
+      - "openmetadata-ui/src/main/resources/ui/src/rest/rdfAPI.ts"
+      - "openmetadata-ui/src/main/resources/ui/src/types/knowledgeGraph.types.ts"
+      - "openmetadata-ui/src/main/resources/ui/src/utils/TableUtils.tsx"
+
+permissions:
+  contents: read
+
+concurrency:
+  group: playwright-knowledge-graph-pr-postgresql-${{ github.event.pull_request.number || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    if: ${{ github.event_name != 'pull_request' || !github.event.pull_request.draft }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
+
+      - name: Setup JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Cache Maven Dependencies
+        uses: actions/cache@v4
+        with:
+          path: ~/.m2
+          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
+          restore-keys: |
+            ${{ runner.os }}-maven-
+
+      - name: Install antlr cli
+        run: sudo make install_antlr_cli
+
+      - name: Build with Maven
+        run: mvn -DskipTests clean package
+
+      - name: Upload Maven build artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: openmetadata-build
+          path: openmetadata-dist/target/openmetadata-*.tar.gz
+          retention-days: 1
+
+  playwright-knowledge-graph-postgresql:
+    needs: [build]
+    runs-on: ubuntu-latest
+    if: ${{ !cancelled() && needs.build.result == 'success' && (github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork) }}
+    environment: test
+    steps:
+      - name: Free Disk Space (Ubuntu)
+        uses: jlumbroso/free-disk-space@main
+        with:
+          tool-cache: false
+          android: true
+          dotnet: true
+          haskell: true
+          large-packages: false
+          swap-storage: true
+          docker-images: false
+
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
+
+      - name: Prepare temporary directory for Maven build artifact
+        run: mkdir -p "${{ runner.temp }}/openmetadata-build-artifact"
+
+      - name: Download Maven build artifact
+        uses: actions/download-artifact@v4
+        with:
+          name: openmetadata-build
+          path: ${{ runner.temp }}/openmetadata-build-artifact
+
+      - name: Copy Maven build artifact into workspace
+        run: |
+          mkdir -p openmetadata-dist/target
+          cp -a "${{ runner.temp }}/openmetadata-build-artifact/." openmetadata-dist/target/
+
+      - name: Setup Openmetadata Test Environment
+        uses: ./.github/actions/setup-openmetadata-test-environment
+        with:
+          python-version: "3.10"
+          args: "-d postgresql -s true"
+          startup-script: "./docker/run_local_docker_rdf.sh"
+          ingestion_dependency: "all"
+
+      - name: Wait for Fuseki to be healthy
+        run: |
+          echo "Verifying Fuseki is healthy before running tests..."
+          for i in $(seq 1 30); do
+            if curl -sf "http://localhost:3030/\$/ping" > /dev/null 2>&1; then
+              echo "Fuseki is healthy"
+              exit 0
+            fi
+            echo "Waiting for Fuseki ($i/30)..."
+            sleep 10
+          done
+          echo "Fuseki failed health check. Container logs:"
+          docker logs openmetadata-fuseki --tail 100
+          exit 1
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version-file: "openmetadata-ui/src/main/resources/ui/.nvmrc"
+
+      - name: Install dependencies
+        working-directory: openmetadata-ui/src/main/resources/ui/
+        run: yarn --ignore-scripts --frozen-lockfile
+
+      - name: Install Playwright Browsers
+        run: npx playwright@1.57.0 install chromium --with-deps
+
+      - name: Run Knowledge Graph Playwright tests
+        working-directory: openmetadata-ui/src/main/resources/ui/
+        run: npx playwright test --project="Knowledge Graph"
+        env:
+          PLAYWRIGHT_IS_OSS: true
+          PLAYWRIGHT_SNOWFLAKE_USERNAME: ${{ secrets.TEST_SNOWFLAKE_USERNAME }}
+          PLAYWRIGHT_SNOWFLAKE_PASSWORD: ${{ secrets.TEST_SNOWFLAKE_PASSWORD }}
+          PLAYWRIGHT_SNOWFLAKE_ACCOUNT: ${{ secrets.TEST_SNOWFLAKE_ACCOUNT }}
+          PLAYWRIGHT_SNOWFLAKE_DATABASE: ${{ secrets.TEST_SNOWFLAKE_DATABASE }}
+          PLAYWRIGHT_SNOWFLAKE_WAREHOUSE: ${{ secrets.TEST_SNOWFLAKE_WAREHOUSE }}
+          PLAYWRIGHT_SNOWFLAKE_PASSPHRASE: ${{ secrets.TEST_SNOWFLAKE_PASSPHRASE }}
+          PLAYWRIGHT_BQ_PRIVATE_KEY: ${{ secrets.TEST_BQ_PRIVATE_KEY }}
+          PLAYWRIGHT_BQ_PROJECT_ID: ${{ secrets.PLAYWRIGHT_BQ_PROJECT_ID }}
+          PLAYWRIGHT_BQ_PRIVATE_KEY_ID: ${{ secrets.TEST_BQ_PRIVATE_KEY_ID }}
+          PLAYWRIGHT_BQ_PROJECT_ID_TAXONOMY: ${{ secrets.TEST_BQ_PROJECT_ID_TAXONOMY }}
+          PLAYWRIGHT_BQ_CLIENT_EMAIL: ${{ secrets.TEST_BQ_CLIENT_EMAIL }}
+          PLAYWRIGHT_BQ_CLIENT_ID: ${{ secrets.TEST_BQ_CLIENT_ID }}
+          PLAYWRIGHT_REDSHIFT_HOST: ${{ secrets.E2E_REDSHIFT_HOST_PORT }}
+          PLAYWRIGHT_REDSHIFT_USERNAME: ${{ secrets.E2E_REDSHIFT_USERNAME }}
+          PLAYWRIGHT_REDSHIFT_PASSWORD: ${{ secrets.E2E_REDSHIFT_PASSWORD }}
+          PLAYWRIGHT_REDSHIFT_DATABASE: ${{ secrets.TEST_REDSHIFT_DATABASE }}
+          PLAYWRIGHT_METABASE_USERNAME: ${{ secrets.TEST_METABASE_USERNAME }}
+          PLAYWRIGHT_METABASE_PASSWORD: ${{ secrets.TEST_METABASE_PASSWORD }}
+          PLAYWRIGHT_METABASE_DB_SERVICE_NAME: ${{ secrets.TEST_METABASE_DB_SERVICE_NAME }}
+          PLAYWRIGHT_METABASE_HOST_PORT: ${{ secrets.TEST_METABASE_HOST_PORT }}
+          PLAYWRIGHT_SUPERSET_USERNAME: ${{ secrets.TEST_SUPERSET_USERNAME }}
+          PLAYWRIGHT_SUPERSET_PASSWORD: ${{ secrets.TEST_SUPERSET_PASSWORD }}
+          PLAYWRIGHT_SUPERSET_HOST_PORT: ${{ secrets.TEST_SUPERSET_HOST_PORT }}
+          PLAYWRIGHT_KAFKA_BOOTSTRAP_SERVERS: ${{ secrets.TEST_KAFKA_BOOTSTRAP_SERVERS }}
+          PLAYWRIGHT_KAFKA_SCHEMA_REGISTRY_URL: ${{ secrets.TEST_KAFKA_SCHEMA_REGISTRY_URL }}
+          PLAYWRIGHT_GLUE_ACCESS_KEY: ${{ secrets.TEST_GLUE_ACCESS_KEY }}
+          PLAYWRIGHT_GLUE_SECRET_KEY: ${{ secrets.TEST_GLUE_SECRET_KEY }}
+          PLAYWRIGHT_GLUE_AWS_REGION: ${{ secrets.TEST_GLUE_AWS_REGION }}
+          PLAYWRIGHT_GLUE_ENDPOINT: ${{ secrets.TEST_GLUE_ENDPOINT }}
+          PLAYWRIGHT_GLUE_STORAGE_SERVICE: ${{ secrets.TEST_GLUE_STORAGE_SERVICE }}
+          PLAYWRIGHT_MYSQL_USERNAME: ${{ secrets.TEST_MYSQL_USERNAME }}
+          PLAYWRIGHT_MYSQL_PASSWORD: ${{ secrets.TEST_MYSQL_PASSWORD }}
+          PLAYWRIGHT_MYSQL_HOST_PORT: ${{ secrets.TEST_MYSQL_HOST_PORT }}
+          PLAYWRIGHT_MYSQL_DATABASE_SCHEMA: ${{ secrets.TEST_MYSQL_DATABASE_SCHEMA }}
+          PLAYWRIGHT_POSTGRES_USERNAME: ${{ secrets.TEST_POSTGRES_USERNAME }}
+          PLAYWRIGHT_POSTGRES_PASSWORD: ${{ secrets.TEST_POSTGRES_PASSWORD }}
+          PLAYWRIGHT_POSTGRES_HOST_PORT: ${{ secrets.TEST_POSTGRES_HOST_PORT }}
+          PLAYWRIGHT_POSTGRES_DATABASE: ${{ secrets.TEST_POSTGRES_DATABASE }}
+          PLAYWRIGHT_AIRFLOW_HOST_PORT: ${{ secrets.TEST_AIRFLOW_HOST_PORT }}
+          PLAYWRIGHT_ML_MODEL_TRACKING_URI: ${{ secrets.TEST_ML_MODEL_TRACKING_URI }}
+          PLAYWRIGHT_ML_MODEL_REGISTRY_URI: ${{ secrets.TEST_ML_MODEL_REGISTRY_URI }}
+          PLAYWRIGHT_S3_STORAGE_ACCESS_KEY_ID: ${{ secrets.TEST_S3_STORAGE_ACCESS_KEY_ID }}
+          PLAYWRIGHT_S3_STORAGE_SECRET_ACCESS_KEY: ${{ secrets.TEST_S3_STORAGE_SECRET_ACCESS_KEY }}
+          PLAYWRIGHT_S3_STORAGE_END_POINT_URL: ${{ secrets.TEST_S3_STORAGE_END_POINT_URL }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+      - uses: actions/upload-artifact@v4
+        if: ${{ !cancelled() }}
+        with:
+          name: playwright-knowledge-graph-report
+          path: openmetadata-ui/src/main/resources/ui/playwright/output/playwright-report
+          retention-days: 5
+
+      - name: Clean Up
+        if: always()
+        run: |
+          docker compose -f docker/development/docker-compose-postgres.yml -f docker/development/docker-compose-fuseki.yml down --remove-orphans || true
+          docker compose -f docker/development/docker-compose-postgres.yml down --remove-orphans || true
+          sudo rm -rf ${PWD}/docker/development/docker-volume
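
Reviewer note: the "Wait for Fuseki to be healthy" step in this workflow is a bounded polling loop. It probes `http://localhost:3030/$/ping` up to 30 times, 10 seconds apart, and fails the job if no probe succeeds. For anyone who wants the same pattern outside a shell step, here is a minimal Python sketch. The names `http_ping` and `wait_until_healthy`, and the injectable `probe` and `sleep` parameters, are hypothetical and not part of this commit; only the URL, the attempt count, and the interval come from the workflow.

```python
import time
import urllib.request
from typing import Callable


def http_ping(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status (similar in spirit to `curl -sf`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, timeouts and refused connections all derive from OSError.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 30,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times, sleeping after each failed try."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        print(f"Waiting for service ({attempt}/{attempts})...")
        sleep(interval_seconds)
    return False
```

With Fuseki listening locally, a call such as `wait_until_healthy(lambda: http_ping("http://localhost:3030/$/ping"))` would mirror the shell loop. Passing `probe` and `sleep` as parameters is a design choice of this sketch so the loop can be exercised without a network or real delays.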

bootstrap/sql/migrations/native/1.13.0/mysql/schemaChanges.sql

Lines changed: 79 additions & 0 deletions
@@ -129,3 +129,82 @@ SELECT ue.id, re.id, 'user', 'role', 10
 FROM user_entity ue, role_entity re
 WHERE ue.name = 'mcpapplicationbot'
   AND re.name = 'ApplicationBotImpersonationRole';
+
+-- RDF distributed indexing state tables
+CREATE TABLE IF NOT EXISTS rdf_index_job (
+    id VARCHAR(36) NOT NULL,
+    status VARCHAR(32) NOT NULL,
+    jobConfiguration JSON NOT NULL,
+    totalRecords BIGINT NOT NULL DEFAULT 0,
+    processedRecords BIGINT NOT NULL DEFAULT 0,
+    successRecords BIGINT NOT NULL DEFAULT 0,
+    failedRecords BIGINT NOT NULL DEFAULT 0,
+    stats JSON,
+    createdBy VARCHAR(256) NOT NULL,
+    createdAt BIGINT NOT NULL,
+    startedAt BIGINT,
+    completedAt BIGINT,
+    updatedAt BIGINT NOT NULL,
+    errorMessage TEXT,
+    PRIMARY KEY (id),
+    INDEX idx_rdf_index_job_status (status),
+    INDEX idx_rdf_index_job_created (createdAt DESC)
+);
+
+CREATE TABLE IF NOT EXISTS rdf_index_partition (
+    id VARCHAR(36) NOT NULL,
+    jobId VARCHAR(36) NOT NULL,
+    entityType VARCHAR(128) NOT NULL,
+    partitionIndex INT NOT NULL,
+    rangeStart BIGINT NOT NULL,
+    rangeEnd BIGINT NOT NULL,
+    estimatedCount BIGINT NOT NULL,
+    workUnits BIGINT NOT NULL,
+    priority INT NOT NULL DEFAULT 50,
+    status VARCHAR(32) NOT NULL DEFAULT 'PENDING',
+    processingCursor BIGINT NOT NULL DEFAULT 0,
+    processedCount BIGINT NOT NULL DEFAULT 0,
+    successCount BIGINT NOT NULL DEFAULT 0,
+    failedCount BIGINT NOT NULL DEFAULT 0,
+    assignedServer VARCHAR(255),
+    claimedAt BIGINT,
+    startedAt BIGINT,
+    completedAt BIGINT,
+    lastUpdateAt BIGINT,
+    lastError TEXT,
+    retryCount INT NOT NULL DEFAULT 0,
+    claimableAt BIGINT NOT NULL DEFAULT 0,
+    PRIMARY KEY (id),
+    UNIQUE KEY uk_rdf_partition_job_entity_idx (jobId, entityType, partitionIndex),
+    INDEX idx_rdf_partition_job (jobId),
+    INDEX idx_rdf_partition_status_priority (status, priority DESC),
+    INDEX idx_rdf_partition_claimable (jobId, status, claimableAt),
+    INDEX idx_rdf_partition_assigned_server (jobId, assignedServer),
+    CONSTRAINT fk_rdf_partition_job FOREIGN KEY (jobId) REFERENCES rdf_index_job(id) ON DELETE CASCADE
+);
+
+CREATE TABLE IF NOT EXISTS rdf_reindex_lock (
+    lockKey VARCHAR(64) NOT NULL,
+    jobId VARCHAR(36) NOT NULL,
+    serverId VARCHAR(255) NOT NULL,
+    acquiredAt BIGINT NOT NULL,
+    lastHeartbeat BIGINT NOT NULL,
+    expiresAt BIGINT NOT NULL,
+    PRIMARY KEY (lockKey)
+);
+
+CREATE TABLE IF NOT EXISTS rdf_index_server_stats (
+    id VARCHAR(36) NOT NULL,
+    jobId VARCHAR(36) NOT NULL,
+    serverId VARCHAR(256) NOT NULL,
+    entityType VARCHAR(128) NOT NULL,
+    processedRecords BIGINT DEFAULT 0,
+    successRecords BIGINT DEFAULT 0,
+    failedRecords BIGINT DEFAULT 0,
+    partitionsCompleted INT DEFAULT 0,
+    partitionsFailed INT DEFAULT 0,
+    lastUpdatedAt BIGINT NOT NULL,
+    PRIMARY KEY (id),
+    UNIQUE INDEX idx_rdf_index_server_stats_job_server_entity (jobId, serverId, entityType),
+    INDEX idx_rdf_index_server_stats_job_id (jobId)
+);
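
Reviewer note: the `status`, `priority`, `claimableAt`, `assignedServer`, and `claimedAt` columns on `rdf_index_partition`, together with the `(jobId, status, claimableAt)` index, suggest that workers claim pending partitions in priority order once `claimableAt` has passed. The Java claim logic is not visible in this part of the diff, so the following is only a sketch under that assumption. It uses an in-memory SQLite database and a trimmed copy of the table. The function `claim_next_partition` and the `'PROCESSING'` status value are hypothetical; only `'PENDING'` appears in the migration.

```python
import sqlite3
import time
from typing import Optional

# Trimmed, SQLite-flavoured copy of rdf_index_partition. Assumption: only the
# columns needed to illustrate claiming are kept.
SCHEMA = """
CREATE TABLE rdf_index_partition (
    id TEXT PRIMARY KEY,
    jobId TEXT NOT NULL,
    priority INTEGER NOT NULL DEFAULT 50,
    status TEXT NOT NULL DEFAULT 'PENDING',
    assignedServer TEXT,
    claimedAt INTEGER,
    claimableAt INTEGER NOT NULL DEFAULT 0
);
"""


def claim_next_partition(
    conn: sqlite3.Connection,
    job_id: str,
    server_id: str,
    now_ms: Optional[int] = None,
) -> Optional[str]:
    """Claim the highest-priority claimable PENDING partition; return its id or None."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM rdf_index_partition "
            "WHERE jobId = ? AND status = 'PENDING' AND claimableAt <= ? "
            "ORDER BY priority DESC, id LIMIT 1",
            (job_id, now_ms),
        ).fetchone()
        if row is None:
            return None
        # Re-check the status in the UPDATE so a partition that another worker
        # claimed in the meantime is not claimed twice.
        updated = conn.execute(
            "UPDATE rdf_index_partition "
            "SET status = 'PROCESSING', assignedServer = ?, claimedAt = ? "
            "WHERE id = ? AND status = 'PENDING'",
            (server_id, now_ms, row[0]),
        )
        return row[0] if updated.rowcount == 1 else None
```

A caller that gets `None` back would typically wait and retry. The conditional `UPDATE ... WHERE status = 'PENDING'` is a common optimistic-claim idiom; whether the commit uses it, row locking, or another mechanism is not shown here.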
