Skip to content

feat(data-quality): Add impact ranking for data quality checks #26658#27346

Open
AshutoshIIT1234 wants to merge 3 commits intoopen-metadata:mainfrom
AshutoshIIT1234:feature/data-quality-check-impact
Open

feat(data-quality): Add impact ranking for data quality checks #26658#27346
AshutoshIIT1234 wants to merge 3 commits intoopen-metadata:mainfrom
AshutoshIIT1234:feature/data-quality-check-impact

Conversation

@AshutoshIIT1234
Copy link
Copy Markdown

Implement Data Quality Checks Impact feature for issue #26658

  • Add scoring model that calculates impact based on:
    • Downstream usage (40% weight)
    • Consumer count (30% weight)
    • Recent incidents (30% weight)
  • Add REST API endpoint for impact ranking
  • Add frontend components to display impact scores
  • Add schema for dataQualityCheckImpact
    This helps prioritize which data quality checks are most critical to fix by ranking them based on their business impact.
    Describe your changes:
    I worked on the Data Quality Checks Impact feature because the issue requested ranking data quality checks by impact to give a clear idea of data quality failure criticality.
    Type of change:
  • [✓] New feature
    Checklist:
  • [✓] I have read the CONTRIBUTING document.
  • [✓] My PR title is Fixes Data Quality Checks Impact #26658: Add impact ranking for data quality checks
  • [✓ ] For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Implement Data Quality Checks Impact feature for issue open-metadata#26658

- Add scoring model that calculates impact based on:
  - Downstream usage (40% weight)
  - Consumer count (30% weight)
  - Recent incidents (30% weight)
- Add REST API endpoint for impact ranking
- Add frontend components to display impact scores
- Add schema for dataQualityCheckImpact

This helps prioritize which data quality checks are most critical to fix
by ranking them based on their business impact.
Copilot AI review requested due to automatic review settings April 14, 2026 09:49
@AshutoshIIT1234 AshutoshIIT1234 requested a review from a team as a code owner April 14, 2026 09:49
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

}

public DataQualityReport getDataQualityReport(
public DataQualityReport getDataQualityReport(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Indentation damage on pre-existing lines (accidental whitespace changes)

Several pre-existing lines had their leading indentation removed (e.g., getDataQualityReport at line 362, searchDataQualityLineage at line 2787, SubjectContext at line 545, getDataQualityTab at line 74). These appear to be accidental whitespace-only changes that break the consistent indentation style of the codebase. They should be reverted to avoid noisy diffs.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an “Impact Ranking” feature for Data Quality checks, spanning backend search/query logic, a new REST endpoint, and a new Data Quality UI tab to display ranked checks.

Changes:

  • UI: Introduces a new Data Quality “Impact” tab and an ImpactRanking table/summary view.
  • Service: Adds a GET /v1/dataQuality/testSuites/dataQualityCheckImpact endpoint and supporting repository/search methods.
  • Spec: Adds a new JSON schema for dataQualityCheckImpact.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 22 comments.

Show a summary per file
File Description
openmetadata-ui/src/main/resources/ui/src/rest/testAPI.ts Adds a client helper and TS types for fetching impact ranking data
openmetadata-ui/src/main/resources/ui/src/pages/DataQuality/DataQualityPage.interface.ts Adds IMPACT tab enum value
openmetadata-ui/src/main/resources/ui/src/pages/DataQuality/DataQualityClassBase.ts Registers the new Impact tab and component
openmetadata-ui/src/main/resources/ui/src/components/DataQuality/ImpactRanking/ImpactRanking.component.tsx New UI component to render impact summaries + ranked table
openmetadata-spec/src/main/resources/json/schema/tests/dataQualityCheckImpact.json New schema intended to define the impact ranking payload
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java Adds a helper to search test cases for impact ranking
openmetadata-service/src/main/java/org/openmetadata/service/resources/dqtests/TestSuiteResource.java Adds the REST endpoint to fetch impact ranking
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java Implements ranking/scoring logic and builds the search query

Comment on lines +2832 to +2834
doc.put("downstreamUsage", 0);
doc.put("consumerCount", 0);
results.add(doc);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downstreamUsage and consumerCount are hardcoded to 0 (and recentIncidents isn’t populated at all), which makes the impact scoring endpoint always produce 0 scores. These metrics need to be computed/fetched before ranking, otherwise the feature is non-functional.

Copilot uses AI. Check for mistakes.
Comment on lines +388 to +392
for (Map<String, Object> testCase : testCases) {
int downstreamUsage =
((Number) testCase.getOrDefault("downstreamUsage", 0)).intValue();
int consumerCount = ((Number) testCase.getOrDefault("consumerCount", 0)).intValue();
int recentIncidents =
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scoring loop assumes downstreamUsage, consumerCount, and recentIncidents are present in the input docs, but the current retrieval code defaults them to 0. That makes impactScore always 0 and defeats the ranking.

Copilot uses AI. Check for mistakes.
"description": "Fully qualified name of the test suite.",
"$ref": "../type/basic.json#/definitions/fullyQualifiedEntityName"
},
"entityFullyQualifiedName": {
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This schema uses entityFullyQualifiedName, while the UI/API currently refer to entityFQN. Align the property names (and required fields) with the actual API payload so generated models and UI types stay consistent.

Suggested change
"entityFullyQualifiedName": {
"entityFQN": {

Copilot uses AI. Check for mistakes.
Comment on lines +108 to +112
dataIndex: 'testCaseStatus',
key: 'testCaseStatus',
render: (status: string) => (
<Tag color={status === 'Failed' ? 'error' : 'success'}>{status}</Tag>
),
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testCaseStatus can be Aborted or Queued as well as Failed/Success, but this render treats everything except Failed as success. Consider mapping each status to an appropriate tag color (e.g., warning/processing for Queued, default for Aborted) to avoid misleading UI.

Copilot uses AI. Check for mistakes.
}

public getDataQualityTab() {
public getDataQualityTab() {
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method declaration lost its indentation (public getDataQualityTab() is no longer aligned with the rest of the class). This is likely to fail Prettier/lint formatting checks; please reformat to match surrounding style.

Suggested change
public getDataQualityTab() {
public getDataQualityTab() {

Copilot uses AI. Check for mistakes.
Comment on lines +2797 to +2801
.withIndex(Entity.TEST_CASE)
.withQuery(query)
.withFrom(from)
.withSize(size)
.withSortFieldParam("timestamp")
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This request sorts by timestamp on the testCase index, but the test_case mapping doesn’t have a root timestamp field (it’s under testCaseResult.timestamp). This sort (and any range filter on timestamp) won’t behave as intended unless you use the correct index/field path.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +34
import { getTestCaseDetailPath } from '../../utils/RouterUtils';
import {
DataQualityCheckImpact,
getDataQualityCheckImpact,
} from '../../rest/testAPI';
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RouterUtils/testAPI imports are using an incorrect relative path for this file location, and getTestCaseDetailPath isn’t exported (the codebase has getTestCaseDetailPagePath). This will fail TypeScript compilation until the import paths/symbols are corrected.

Suggested change
import { getTestCaseDetailPath } from '../../utils/RouterUtils';
import {
DataQualityCheckImpact,
getDataQualityCheckImpact,
} from '../../rest/testAPI';
import { getTestCaseDetailPagePath as getTestCaseDetailPath } from '../../../utils/RouterUtils';
import {
DataQualityCheckImpact,
getDataQualityCheckImpact,
} from '../../../rest/testAPI';

Copilot uses AI. Check for mistakes.
Comment on lines +75 to +77
const handleTestCaseClick = (testCaseFQN: string) => {
history.push(getTestCaseDetailPagePath(testCaseFQN));
};
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleTestCaseClick calls getTestCaseDetailPagePath(...), but that symbol isn’t imported in this file, so this will be a compile-time error. Import the correct RouterUtils helper (and remove the unused one).

Copilot uses AI. Check for mistakes.
Comment on lines +422 to +432
StringBuilder query = new StringBuilder();
query.append("{\"query\": {\"bool\": {\"must\": [");

if (testCaseStatus != null && !testCaseStatus.isEmpty()) {
query.append(String.format("{\"term\": {\"testCaseStatus\": \"%s\"}},", testCaseStatus.toLowerCase()));
}

query.append(
String.format(
"{\"range\": {\"timestamp\": {\"gte\": %d}}}}}", thirtyDaysAgo / 1000));
return query.toString();
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildTestCaseImpactQuery starts a JSON query string but never closes the bool.must array/object properly, so it will generate invalid JSON. Build this query with a JSON builder (or fix the missing closing brackets/braces).

Suggested change
StringBuilder query = new StringBuilder();
query.append("{\"query\": {\"bool\": {\"must\": [");
if (testCaseStatus != null && !testCaseStatus.isEmpty()) {
query.append(String.format("{\"term\": {\"testCaseStatus\": \"%s\"}},", testCaseStatus.toLowerCase()));
}
query.append(
String.format(
"{\"range\": {\"timestamp\": {\"gte\": %d}}}}}", thirtyDaysAgo / 1000));
return query.toString();
JsonArrayBuilder mustClauses = Json.createArrayBuilder();
if (testCaseStatus != null && !testCaseStatus.isEmpty()) {
mustClauses.add(
Json.createObjectBuilder()
.add(
"term",
Json.createObjectBuilder()
.add("testCaseStatus", testCaseStatus.toLowerCase())));
}
mustClauses.add(
Json.createObjectBuilder()
.add(
"range",
Json.createObjectBuilder()
.add(
"timestamp",
Json.createObjectBuilder().add("gte", thirtyDaysAgo / 1000))));
JsonObjectBuilder query =
Json.createObjectBuilder()
.add(
"query",
Json.createObjectBuilder()
.add(
"bool",
Json.createObjectBuilder().add("must", mustClauses)));
return query.build().toString();

Copilot uses AI. Check for mistakes.
Comment on lines +2827 to +2831
"testCaseFullyQualifiedName",
JsonUtils.extractValue(sourceNode.toString(), FULLY_QUALIFIED_NAME));
doc.put("entityFQN", JsonUtils.extractValue(sourceNode.toString(), "entityFQN"));
doc.put("testCaseStatus", JsonUtils.extractValue(sourceNode.toString(), "testCaseStatus"));
doc.put("timestamp", JsonUtils.extractValue(sourceNode.toString(), "timestamp"));
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fields being extracted here (entityFQN, testCaseStatus, timestamp) are not present at the root of the test_case index documents (they’re nested under testCaseResult.*). As a result, these values will be null unless the query/index is adjusted.

Copilot uses AI. Check for mistakes.
- Fix malformed JSON query in buildTestCaseImpactQuery (missing ] bracket)
- Fix wrong function name import (getTestCaseDetailPagePath)
- Fix wrong import path for RouterUtils
- Compute recentIncidents from grouped test case results instead of hardcoded 0
- Fix property name mismatch (entityFQN -> entityFullyQualifiedName)

Fixes open-metadata#26658
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot AI review requested due to automatic review settings April 14, 2026 16:20
@AshutoshIIT1234 AshutoshIIT1234 force-pushed the feature/data-quality-check-impact branch from 2741b5d to 1ddfe68 Compare April 14, 2026 16:20
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.

Comment on lines +569 to +572
schema = @Schema(implementation = List.class)))
})
public List<?> getDataQualityCheckImpact(
@Context UriInfo uriInfo,
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint returns List<?> and documents the response as List.class, even though a dedicated DataQualityCheckImpact schema/model was added. Please return a typed List<DataQualityCheckImpact> (or DTO) and update the OpenAPI response annotation to the correct item type so clients can generate proper bindings.

Copilot uses AI. Check for mistakes.
Comment on lines +121 to +123
<Tooltip title={t('message.impact-score-tooltip')}>
<Tag color={getImpactColor(score)}>{getImpactLabel(score)}</Tag>
</Tooltip>
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several i18n keys referenced in this component (e.g. label.impact-ranking, label.impact-score, message.impact-score-tooltip, label.average-impact, label.data-quality-checks-by-impact, message.no-data-quality-checks-found) are not present in src/locale/languages/en-us.json, so the UI will render raw keys. Please add the missing keys (and run the i18n sync).

Copilot uses AI. Check for mistakes.
testCaseFullyQualifiedName: string;
entityFullyQualifiedName: string;
testCaseStatus: TestCaseStatus;
timestamp: number;
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamp is declared as a number, but the backend currently extracts it as a string from the search response and returns it in the response map as a JSON string. Either return a numeric timestamp from the API (preferred) or update this type to match the actual response shape.

Suggested change
timestamp: number;
timestamp: string;

Copilot uses AI. Check for mistakes.
query.append("{\"query\": {\"bool\": {\"must\": [");

if (testCaseStatus != null && !testCaseStatus.isEmpty()) {
query.append(String.format("{\"term\": {\"testCaseStatus\": \"%s\"}},", testCaseStatus.toLowerCase()));
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User input (testCaseStatus) is interpolated into the JSON query without escaping. A value containing quotes will break the JSON (and can be used for query injection); escape it (e.g., escapeDoubleQuotes) or build the query via a JSON builder.

Suggested change
query.append(String.format("{\"term\": {\"testCaseStatus\": \"%s\"}},", testCaseStatus.toLowerCase()));
query.append(
String.format(
"{\"term\": {\"testCaseStatus\": \"%s\"}},",
escapeDoubleQuotes(testCaseStatus.toLowerCase())));

Copilot uses AI. Check for mistakes.
query.append("{\"query\": {\"bool\": {\"must\": [");

if (testCaseStatus != null && !testCaseStatus.isEmpty()) {
query.append(String.format("{\"term\": {\"testCaseStatus\": \"%s\"}},", testCaseStatus.toLowerCase()));
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This term filter targets testCaseStatus, but in the TestCase index mapping the field is testCaseResult.testCaseStatus. Using the wrong field path means the status filter will never match any documents.

Suggested change
query.append(String.format("{\"term\": {\"testCaseStatus\": \"%s\"}},", testCaseStatus.toLowerCase()));
query.append(
String.format(
"{\"term\": {\"testCaseResult.testCaseStatus\": \"%s\"}},",
testCaseStatus.toLowerCase()));

Copilot uses AI. Check for mistakes.

query.append(
String.format(
"{\"range\": {\"timestamp\": {\"gte\": %d}}}]}}", thirtyDaysAgo / 1000));
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This range filter targets top-level timestamp, but in the TestCase index mapping the field is testCaseResult.timestamp. Using the wrong field path means the last-30-days filter won’t apply (and may return no results).

Suggested change
"{\"range\": {\"timestamp\": {\"gte\": %d}}}]}}", thirtyDaysAgo / 1000));
"{\"range\": {\"testCaseResult.timestamp\": {\"gte\": %d}}}]}}",
thirtyDaysAgo / 1000));

Copilot uses AI. Check for mistakes.

query.append(
String.format(
"{\"range\": {\"timestamp\": {\"gte\": %d}}}]}}", thirtyDaysAgo / 1000));
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON built here is malformed: it opens { "query": { "bool": { "must": [ but the final append only closes with }]}}, missing a final }. This will fail query_filter parsing and the endpoint will return empty results.

Suggested change
"{\"range\": {\"timestamp\": {\"gte\": %d}}}]}}", thirtyDaysAgo / 1000));
"{\"range\": {\"timestamp\": {\"gte\": %d}}}]}}}", thirtyDaysAgo / 1000));

Copilot uses AI. Check for mistakes.
Comment on lines +2824 to +2831
JsonNode sourceNode = JsonUtils.extractValue(jsonNode.toString(), SEARCH_SOURCE);
if (sourceNode != null) {
String testCaseFQN =
JsonUtils.extractValue(sourceNode.toString(), FULLY_QUALIFIED_NAME);
String testCaseId = JsonUtils.extractValue(sourceNode.toString(), ID);
String entityFQN = JsonUtils.extractValue(sourceNode.toString(), "entityFQN");
String testCaseStatus = JsonUtils.extractValue(sourceNode.toString(), "testCaseStatus");
String timestamp = JsonUtils.extractValue(sourceNode.toString(), "timestamp");
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop repeatedly converts JsonNode to string and re-parses it via JsonUtils.extractValue(...). There are existing overloads that operate directly on JsonNode (used elsewhere in this class), which avoids extra parsing and is less error-prone.

Suggested change
JsonNode sourceNode = JsonUtils.extractValue(jsonNode.toString(), SEARCH_SOURCE);
if (sourceNode != null) {
String testCaseFQN =
JsonUtils.extractValue(sourceNode.toString(), FULLY_QUALIFIED_NAME);
String testCaseId = JsonUtils.extractValue(sourceNode.toString(), ID);
String entityFQN = JsonUtils.extractValue(sourceNode.toString(), "entityFQN");
String testCaseStatus = JsonUtils.extractValue(sourceNode.toString(), "testCaseStatus");
String timestamp = JsonUtils.extractValue(sourceNode.toString(), "timestamp");
JsonNode sourceNode = JsonUtils.extractValue(jsonNode, SEARCH_SOURCE);
if (sourceNode != null) {
String testCaseFQN = JsonUtils.extractValue(sourceNode, FULLY_QUALIFIED_NAME);
String testCaseId = JsonUtils.extractValue(sourceNode, ID);
String entityFQN = JsonUtils.extractValue(sourceNode, "entityFQN");
String testCaseStatus = JsonUtils.extractValue(sourceNode, "testCaseStatus");
String timestamp = JsonUtils.extractValue(sourceNode, "timestamp");

Copilot uses AI. Check for mistakes.
Comment on lines +389 to +393
int downstreamUsage =
((Number) testCase.getOrDefault("downstreamUsage", 0)).intValue();
int consumerCount = ((Number) testCase.getOrDefault("consumerCount", 0)).intValue();
int recentIncidents =
((Number) testCase.getOrDefault("recentIncidents", 0)).intValue();
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downstreamUsage and consumerCount are inputs to the scoring model here, but they’re currently always 0 in the search results (set in searchTestCasesForImpact). That means the 40%/30% weights never affect the score, which is misleading given the feature description; please populate these metrics or remove them from scoring until available.

Copilot uses AI. Check for mistakes.

query.append(
String.format(
"{\"range\": {\"timestamp\": {\"gte\": %d}}}]}}", thirtyDaysAgo / 1000));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 Bug: JSON query still malformed — missing one closing brace

The fix at line 431 changed }}}} to }]}}, which correctly adds the missing ] for the must array. However, it still produces invalid JSON because there are only 2 closing braces after ] when 3 are needed.

The structure opened is: {"query": {"bool": {"must": [ — that's 3 { and 1 [. After the range clause closes its own 3 braces, the remaining closers should be ]}}} (1 ] + 3 }), but the code produces ]}} (1 ] + 2 }).

This means the Elasticsearch query is still malformed and searchTestCasesForImpact will throw a parse error, causing the entire impact ranking feature to return empty results.

Suggested fix:

Change the format string to include the missing brace:

  "{"range": {"timestamp": {"gte": %d}}}]}}}")

Or better yet, use a JSON library to build queries instead of string concatenation to prevent these bracket-matching issues.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Comment on lines +2858 to +2859
doc.put("downstreamUsage", 0);
doc.put("consumerCount", 0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: downstreamUsage and consumerCount always 0, crippling score

While recentIncidents is now correctly computed by counting failed runs (lines 2843-2846), downstreamUsage and consumerCount remain hardcoded to 0 (lines 2858-2859). Since the impact score formula is 0.4 * downstream + 0.3 * consumer + 0.3 * incidents, 70% of the score weight is permanently zeroed out. The ranking will only ever reflect recent incident counts, making the multi-factor scoring model misleading — users see weights advertised for downstream/consumer impact that have no effect.

This was flagged in the previous review and is only partially addressed by this commit.

Suggested fix:

Either:
1. Populate `downstreamUsage` and `consumerCount` with real data from lineage/usage APIs, or
2. Remove these factors from the score formula and UI until they are implemented, to avoid confusing users with always-zero metrics.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Map<String, Object> doc;
if (groupedTestCases.containsKey(testCaseFQN)) {
doc = groupedTestCases.get(testCaseFQN);
int runCount = ((Number) doc.getOrDefault("runCount", 0)).intValue();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: ClassCastException risk on getOrDefault with Integer 0

At lines 2840 and 2843, doc.getOrDefault("runCount", 0) and doc.getOrDefault("recentIncidents", 0) use an int literal as the default. Since the map stores Integer values from doc.put("runCount", 1), the cast (Number) works. However, the default 0 is auto-boxed to Integer which is fine, but mixing with the pattern is fragile — if anyone changes the stored value type (e.g., to Long), the cast chain could break. This is minor but worth noting for maintainability.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@prasanthcp
Copy link
Copy Markdown

As suggested, I think the JsonArrayBuilder or JsonUtils should really help to handle dynamic JSON.

@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 16, 2026

Code Review 🚫 Blocked 4 resolved / 8 findings

Impact ranking for data quality checks is currently blocked by a malformed JSON query and hardcoded zero values for downstream usage metrics. Additionally, accidental indentation changes and a potential ClassCastException require attention.

🚨 Bug: JSON query still malformed — missing one closing brace

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:431

The fix at line 431 changed }}}} to }]}}, which correctly adds the missing ] for the must array. However, it still produces invalid JSON because there are only 2 closing braces after ] when 3 are needed.

The structure opened is: {"query": {"bool": {"must": [ — that's 3 { and 1 [. After the range clause closes its own 3 braces, the remaining closers should be ]}}} (1 ] + 3 }), but the code produces ]}} (1 ] + 2 }).

This means the Elasticsearch query is still malformed and searchTestCasesForImpact will throw a parse error, causing the entire impact ranking feature to return empty results.

Suggested fix
Change the format string to include the missing brace:

  "{"range": {"timestamp": {"gte": %d}}}]}}}")

Or better yet, use a JSON library to build queries instead of string concatenation to prevent these bracket-matching issues.
⚠️ Bug: downstreamUsage and consumerCount always 0, crippling score

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2858-2859 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:395-403

While recentIncidents is now correctly computed by counting failed runs (lines 2843-2846), downstreamUsage and consumerCount remain hardcoded to 0 (lines 2858-2859). Since the impact score formula is 0.4 * downstream + 0.3 * consumer + 0.3 * incidents, 70% of the score weight is permanently zeroed out. The ranking will only ever reflect recent incident counts, making the multi-factor scoring model misleading — users see weights advertised for downstream/consumer impact that have no effect.

This was flagged in the previous review and is only partially addressed by this commit.

Suggested fix
Either:
1. Populate `downstreamUsage` and `consumerCount` with real data from lineage/usage APIs, or
2. Remove these factors from the score formula and UI until they are implemented, to avoid confusing users with always-zero metrics.
💡 Quality: Indentation damage on pre-existing lines (accidental whitespace changes)

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:362 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2787 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/dqtests/TestSuiteResource.java:545 📄 openmetadata-ui/src/main/resources/ui/src/pages/DataQuality/DataQualityClassBase.ts:74

Several pre-existing lines had their leading indentation removed (e.g., getDataQualityReport at line 362, searchDataQualityLineage at line 2787, SubjectContext at line 545, getDataQualityTab at line 74). These appear to be accidental whitespace-only changes that break the consistent indentation style of the codebase. They should be reverted to avoid noisy diffs.

💡 Edge Case: ClassCastException risk on getOrDefault with Integer 0

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2840 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2843

At lines 2840 and 2843, doc.getOrDefault("runCount", 0) and doc.getOrDefault("recentIncidents", 0) use an int literal as the default. Since the map stores Integer values from doc.put("runCount", 1), the cast (Number) works. However, the default 0 is auto-boxed to Integer which is fine, but mixing with the pattern is fragile — if anyone changes the stored value type (e.g., to Long), the cast chain could break. This is minor but worth noting for maintainability.

✅ 4 resolved
Bug: Malformed JSON query — missing ] and } in buildTestCaseImpactQuery

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:429-431 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:425-427
The buildTestCaseImpactQuery method constructs an Elasticsearch query string with unbalanced braces/brackets. The opening is {"query": {"bool": {"must": [ (3 braces + 1 bracket), but the closing }}}}} provides only 5 braces and no bracket. The correct closing after the range clause should be }}}]}} — 3 braces to close the range object, ] to close the must array, and }} to close bool and query. This will cause a JSON parse error at runtime, making the entire impact ranking endpoint non-functional.

Bug: Wrong function imported — will fail to compile or crash at runtime

📄 openmetadata-ui/src/main/resources/ui/src/components/DataQuality/ImpactRanking/ImpactRanking.component.tsx:30 📄 openmetadata-ui/src/main/resources/ui/src/components/DataQuality/ImpactRanking/ImpactRanking.component.tsx:76
Line 30 imports getTestCaseDetailPath from RouterUtils, but line 76 calls getTestCaseDetailPagePath which is never imported. The correct function getTestCaseDetailPagePath exists in RouterUtils.ts (line 555) but is not imported. This will either fail at build time (TypeScript) or crash at runtime when clicking a test case row.

Bug: Impact scores always based on zeros — feature is non-functional

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2832-2834 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:388-402
searchTestCasesForImpact hardcodes downstreamUsage and consumerCount to 0 (lines 2832-2833), and recentIncidents is never populated either. Since these are the three inputs to the impact score formula (40%, 30%, 30% weights), every test case will receive an impact score of 0. The feature as shipped will show a table of zeros and provide no ranking value. The backend needs to actually compute these metrics — e.g., by querying lineage for downstream count, querying usage for consumers, and counting recent failed test results.

Quality: Relative import path for RouterUtils is likely wrong

📄 openmetadata-ui/src/main/resources/ui/src/components/DataQuality/ImpactRanking/ImpactRanking.component.tsx:30 📄 openmetadata-ui/src/main/resources/ui/src/components/DataQuality/ImpactRanking/ImpactRanking.component.tsx:32-34
The component is at components/DataQuality/ImpactRanking/ImpactRanking.component.tsx (3 levels deep from src/), but the import uses ../../utils/RouterUtils which resolves to components/utils/RouterUtils — not the correct src/utils/RouterUtils. The path should be ../../../utils/RouterUtils.

🤖 Prompt for agents
Code Review: Impact ranking for data quality checks is currently blocked by a malformed JSON query and hardcoded zero values for downstream usage metrics. Additionally, accidental indentation changes and a potential ClassCastException require attention.

1. 💡 Quality: Indentation damage on pre-existing lines (accidental whitespace changes)
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:362, openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2787, openmetadata-service/src/main/java/org/openmetadata/service/resources/dqtests/TestSuiteResource.java:545, openmetadata-ui/src/main/resources/ui/src/pages/DataQuality/DataQualityClassBase.ts:74

   Several pre-existing lines had their leading indentation removed (e.g., `getDataQualityReport` at line 362, `searchDataQualityLineage` at line 2787, `SubjectContext` at line 545, `getDataQualityTab` at line 74). These appear to be accidental whitespace-only changes that break the consistent indentation style of the codebase. They should be reverted to avoid noisy diffs.

2. 🚨 Bug: JSON query still malformed — missing one closing brace
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:431

   The fix at line 431 changed `}}}}` to `}]}}`, which correctly adds the missing `]` for the `must` array. However, it still produces invalid JSON because there are only 2 closing braces after `]` when 3 are needed.
   
   The structure opened is: `{"query": {"bool": {"must": [` — that's 3 `{` and 1 `[`. After the range clause closes its own 3 braces, the remaining closers should be `]}}}` (1 `]` + 3 `}`), but the code produces `]}}` (1 `]` + 2 `}`).
   
   This means the Elasticsearch query is still malformed and `searchTestCasesForImpact` will throw a parse error, causing the entire impact ranking feature to return empty results.

   Suggested fix:
   Change the format string to include the missing brace:
   
     "{"range": {"timestamp": {"gte": %d}}}]}}}")
   
   Or better yet, use a JSON library to build queries instead of string concatenation to prevent these bracket-matching issues.

3. ⚠️ Bug: downstreamUsage and consumerCount always 0, crippling score
   Files: openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2858-2859, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestSuiteRepository.java:395-403

   While `recentIncidents` is now correctly computed by counting failed runs (lines 2843-2846), `downstreamUsage` and `consumerCount` remain hardcoded to 0 (lines 2858-2859). Since the impact score formula is `0.4 * downstream + 0.3 * consumer + 0.3 * incidents`, 70% of the score weight is permanently zeroed out. The ranking will only ever reflect recent incident counts, making the multi-factor scoring model misleading — users see weights advertised for downstream/consumer impact that have no effect.
   
   This was flagged in the previous review and is only partially addressed by this commit.

   Suggested fix:
   Either:
   1. Populate `downstreamUsage` and `consumerCount` with real data from lineage/usage APIs, or
   2. Remove these factors from the score formula and UI until they are implemented, to avoid confusing users with always-zero metrics.

4. 💡 Edge Case: ClassCastException risk on getOrDefault with Integer 0
   Files: openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2840, openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2843

   At lines 2840 and 2843, `doc.getOrDefault("runCount", 0)` and `doc.getOrDefault("recentIncidents", 0)` use an `int` literal as the default. Since the map stores `Integer` values from `doc.put("runCount", 1)`, the cast `(Number)` works. However, the default `0` is auto-boxed to `Integer` which is fine, but mixing with the pattern is fragile — if anyone changes the stored value type (e.g., to `Long`), the cast chain could break. This is minor but worth noting for maintainability.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data Quality Checks Impact

3 participants