Context center by harshach · Pull Request #27558 · open-metadata/OpenMetadata

harshach · 2026-04-20T22:28:34Z

Summary

Brings the Context Center feature into OpenMetadata core:

Drive pillar — Folder + ContextFile entities, object-storage-backed uploads, deduplicated asset_entity blobs, async delete queue, signed-URL download for S3/Azure providers, and PDF / Office / image OCR / spreadsheet / text extraction via PDFBox, POI, and Tika.
Knowledge pillar — Page entity with Article and QuickLink page types, search-backed hierarchy listing, review workflow, and a UI that ships as a sidebar entry below Govern.
Shared plumbing — new entity schemas (entity/data/{folder,contextFile,contextFileContent,page,article,quickLink,pageHierarchy}.json, attachments/asset.json), Java repositories + REST resources, Elasticsearch/OpenSearch index mappings for all supported locales, SearchSettings asset-type configurations, and native 2.0.0 SQL migrations that are idempotent against existing customer databases.

All moved code was previously shipped in Collate and is being upstreamed so every OpenMetadata user gets Drive + Knowledge Center in 2.0.0. Collate continues to work because it bundles this submodule; the Collate-side cleanup removes the now-duplicated code once this PR merges.

Major pieces

Schemas under openmetadata-spec/src/main/resources/json/schema/entity/data/ and api/data/ (+ attachments/ for the Asset blob).
Backend: KnowledgePageRepository, FolderRepository, ContextFileRepository, ContextFileContentRepository, AssetRepository, DaoListFilter, ContextFileExtractionService, ContextFileTextExtractor, AssetService + S3/Azure/InMemory/NoOp providers, QueuedDeleteAssetService, ContextEntityPromptService + loaders; REST resources under resources/knowledge and resources/drive (including ContextFileResource.uploadFile with RFC 5987 Content-Disposition handling).
Search: PageIndex, FolderIndex, ContextFileIndex, ES/OS listPageHierarchy + listPageHierarchyForActivePage (single-aggregation children count).
UI: sidebar entry + Explore tree branch for Knowledge Center, KnowledgePages widget wired via EntityRightPanelClassBase and CommonWidgetClassBase so Topic/Table pages surface linked articles. KNOWLEDGE_ARTICLE added to default layouts with the isLeaf: childrenCount === 0 contract preserved.
Migrations: bootstrap/sql/migrations/native/2.0.0/{mysql,postgres}/schemaChanges.sql creates knowledge_center, drive_folder, context_file, context_file_content, asset_entity, search_index_job, search_index_retry_queue; adds tag_usage.metadata (JSON) and audit_log_event.search_text columns. Every statement uses IF NOT EXISTS / prepared-statement guards so it is a no-op on databases that already have these tables from prior Collate migrations.

Type of change

New feature (Drive pillar + Knowledge Center upstream)
Breaking change — adds new entities, REST endpoints, DB tables, and search indexes. Backwards-compatible for Collate customers upgrading.

Test plan

mvn -pl openmetadata-service test — 866 unit tests, including new ContextFileResourceTest, ContextFileUploadSupportTest, ContextFileTextExtractorTest, ContextFileExtractionServiceTest, DriveMapperTest, PageIndexTest, FolderIndexTest, ContextFileIndexTest, and the DefaultContextEntityPromptLoader / ContextEntityPromptService tests.
mvn verify -pl openmetadata-integration-tests — 46 new IT tests covering folder CRUD, file upload against MinIO (auto-started by TestSuiteBootstrap), text extraction, recursive folder delete, and knowledge page hierarchy; existing-customer migration path exercised.
Playwright — moved the Knowledge Center spec suite from Collate; Explore Filter by Knowledge Center passes end-to-end against the rebuilt UI + reindexed search cluster. The multi-step "Knowledge Center page" test is still in the process of being stabilised against the new backend semantics.
Fresh-DB + existing-customer migration check on MySQL: knowledge_center, context_file, context_file_content, drive_folder, asset_entity, search_index_job, search_index_retry_queue, tag_usage.metadata, audit_log_event.search_text all present after migration.

Checklist

I have read the CONTRIBUTING document.
Unit + integration tests added for new logic.
JSON Schema changes come with native 2.0.0 migration scripts (idempotent).
Review comments from gitar-bot and copilot addressed in commits 7e94bd48a7, b8458e2868, and 4d8c434412.

Summary by Gitar

UI/UX improvements:
- Updated KnowledgeCenterLayout to utilize react-i18next for better internationalization support.
Testing updates:
- Refactored EntityRightPanelClassBase.test to verify that getKnowLedgeArticlesWidget correctly returns the KnowledgePages component.

_{This will update automatically on new commits.}

…support

github-actions · 2026-04-20T22:31:05Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds “Context Center” Drive + Knowledge Center support by introducing Folder/ContextFile entities, object storage-backed assets, text extraction, and search/indexing enhancements (including knowledge-page hierarchy).

Changes:

Introduces new JSON schemas + create request schemas for Folder, ContextFile, ContextFileContent, Asset, and Article.
Adds search indexes/settings for page/folder/contextFile and implements page-hierarchy listing in both Elasticsearch & OpenSearch clients.
Adds object storage abstraction (S3/Azure/InMemory/NoOp), async delete queueing, extraction service + extensive unit/integration test coverage.

Reviewed changes

Copilot reviewed 94 out of 364 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
openmetadata-spec/src/main/resources/json/schema/entity/data/folder.json	Defines Folder entity schema for Drive.
openmetadata-spec/src/main/resources/json/schema/entity/data/contextFileContent.json	Defines ContextFileContent snapshot entity schema.
openmetadata-spec/src/main/resources/json/schema/entity/data/contextFile.json	Defines ContextFile entity schema for Drive.
openmetadata-spec/src/main/resources/json/schema/entity/data/article.json	Adds Article schema for knowledge pages.
openmetadata-spec/src/main/resources/json/schema/attachments/asset.json	Adds Asset schema for uploaded blobs.
openmetadata-spec/src/main/resources/json/schema/api/data/createPage.json	Extends page creation schema (article/quicklink union).
openmetadata-spec/src/main/resources/json/schema/api/data/createFolder.json	Adds Folder create request schema.
openmetadata-spec/src/main/resources/json/schema/api/data/createContextFile.json	Adds ContextFile create request schema.
openmetadata-spec/src/main/resources/json/schema/api/attachments/createAsset.json	Adds Asset create request schema.
openmetadata-spec/src/main/resources/elasticsearch/indexMapping.json	Registers new search indices for page/folder/contextFile.
openmetadata-spec/src/main/antlr4/org/openmetadata/schema/EntityLink.g4	Adds entity link grammar types for new entities.
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/PageIndexTest.java	Tests Page search document fields (fqnDepth/deleted).
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/FolderIndexTest.java	Tests Folder search index doc fields.
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/ContextFileIndexTest.java	Tests ContextFile search index doc fields.
openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/DriveMapperTest.java	Tests Drive mappers (parent/folder references).
openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/ContextFileUploadSupportTest.java	Tests upload buffering, checksum, asset/content building.
openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/ContextFileResourceTest.java	Tests filename sanitization + expiry clamping.
openmetadata-service/src/test/java/org/openmetadata/service/drive/ContextFileTextExtractorTest.java	Tests text extraction for PDF/XLSX/Text/Image OCR modes.
openmetadata-service/src/test/java/org/openmetadata/service/drive/ContextFileExtractionServiceTest.java	Tests extraction service success/failure paths and queue rejection handling.
openmetadata-service/src/test/java/org/openmetadata/service/context/DefaultContextEntityPromptLoaderTest.java	Tests prompt loader preferring canonical snapshot text.
openmetadata-service/src/test/java/org/openmetadata/service/context/ContextEntityPromptServiceTest.java	Tests prompt assembly, budget truncation, and query relevance chunking.
openmetadata-service/src/main/resources/json/data/tags/knowledgeCenterTags.json	Adds KnowledgeCenter classification + tags.
openmetadata-service/src/main/resources/json/data/settings/searchSettings.json	Adds search settings for contextFile/folder/page.
openmetadata-service/src/main/java/org/openmetadata/service/util/SearchUtils.java	Adds mapping helper from search hit -> PageHierarchy.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java	Implements knowledge page hierarchy listing via OpenSearch.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PageIndex.java	Adds Page search index representation.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/FolderIndex.java	Adds Folder search index representation.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ContextFileIndex.java	Adds ContextFile search index representation.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchClient.java	Implements knowledge page hierarchy listing via Elasticsearch.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexFactory.java	Registers Folder/ContextFile/Page index builders.
openmetadata-service/src/main/java/org/openmetadata/service/resources/knowledge/KnowledgePageMapper.java	Adds mapper for CreatePage -> Page defaults/fields.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/FolderResource.java	Adds REST resource for folder CRUD + contents endpoint.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/FolderMapper.java	Adds mapper for CreateFolder -> Folder.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileUploadSupport.java	Adds upload utility: size cap, checksum, temp buffering, asset/content factories.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileMapper.java	Adds mapper for CreateContextFile -> ContextFile.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/FolderRepository.java	Adds Folder repository + parent/children relationship management.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DaoListFilter.java	Adds DAO list-filter extension with pageType condition.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextFileRepository.java	Adds ContextFile repository + cleanup of assets/content snapshots.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextFileContentRepository.java	Adds ContextFileContent repository + list-by-file-id query.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java	Adds DAOs for folder/contextFile/content/page/assets.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/AssetRepository.java	Adds Asset repository for asset_entity persistence.
openmetadata-service/src/main/java/org/openmetadata/service/drive/ContextFileExtractionService.java	Adds async extraction orchestration and status updates.
openmetadata-service/src/main/java/org/openmetadata/service/context/TokenCounter.java	Adds token counting utility for prompt budgets.
openmetadata-service/src/main/java/org/openmetadata/service/context/ResolvedContextEntity.java	Adds prompt-ready resolved entity record.
openmetadata-service/src/main/java/org/openmetadata/service/context/DefaultContextEntityPromptLoader.java	Loads pages/files into prompt sections with authorization checks.
openmetadata-service/src/main/java/org/openmetadata/service/context/ContextPromptInjectionResult.java	Adds result record for prompt injection.
openmetadata-service/src/main/java/org/openmetadata/service/context/ContextEntityPromptService.java	Builds structured prompt context with budgets and relevance chunking.
openmetadata-service/src/main/java/org/openmetadata/service/context/ContextEntityPromptLoader.java	Adds loader interface for prompt resolution.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/S3AssetService.java	Implements S3-backed asset IO + signed URL generation.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/QueuedDeleteAssetService.java	Wraps asset delete with queued deletes + blocking wait.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/ObjectDeleteQueueService.java	Adds bounded delete queue service with capacity accounting.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/NoOpAssetService.java	Adds no-op asset provider for disabled storage.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/InMemoryAssetService.java	Adds in-memory asset provider for local testing.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/AzureAssetService.java	Implements Azure Blob-backed asset IO + SAS URLs.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/AssetServiceFactory.java	Factory for selecting provider + wrapping with delete queue.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/AssetService.java	Defines AssetService interface additions.
openmetadata-service/src/main/java/org/openmetadata/service/Entity.java	Registers new entity type constants.
openmetadata-service/pom.xml	Adds dependencies for extraction (PDFBox/POI/Tika) + Azure blob.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/TestNamespaceExtension.java	JUnit extension for per-test namespace.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/TestNamespace.java	Utility for unique names across IT runs.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/SdkClients.java	JWT-backed SDK clients for IT roles/users.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/RestClient.java	Low-level REST test client for endpoints not in SDK.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/auth/JwtAuthProvider.java	Issues RSA JWTs for integration tests.
openmetadata-sdk/pom.xml	Adds dependencies to support SDK test utilities compilation.
openmetadata-integration-tests/src/test/resources/openmetadata-secure-test.yaml	Enables object storage for IT and sets S3 config placeholders.
openmetadata-integration-tests/src/test/resources/drive/sample-pricing.xlsx	Adds spreadsheet fixture (currently stub content).
openmetadata-integration-tests/src/test/resources/drive/sample-notes.txt	Adds text upload fixture.
openmetadata-integration-tests/src/test/resources/drive/sample-data.csv	Adds CSV upload fixture.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/knowledge/KnowledgeCenterIT.java	Adds IT coverage for page related entities/domains/data products.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/drive/DriveTestUsers.java	Adds helper to create drive test users.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java	Starts MinIO testcontainer when object storage uses S3.
openmetadata-integration-tests/pom.xml	Pins commons-compress for POI compatibility in tests.
bootstrap/sql/migrations/native/2.0.0/postgres/schemaChanges.sql	Adds DB tables for knowledge_center/drive_folder/context_file/assets/content.
bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql	Adds DB tables for knowledge_center/drive_folder/context_file/assets/content.

github-actions · 2026-04-20T22:36:25Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

- KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch and getHierarchyWithSearchForActivePage so the /search/hierarchy endpoint no longer NPEs when the pageType query param is omitted. The ES/OS client helpers already skip the pageType term when the value is null or empty, so this is a pure null-guard. - ContextFileResource.uploadFile: when a failure happens after the ContextFileContent row is created (e.g. inside extractionService.submit), the cleanup path now hard-deletes that content row so the DB is not left with an orphaned record. - ContextFileResource: replace the raw Content-Disposition string with a buildContentDisposition helper that emits both the legacy quoted filename= and the RFC 5987 filename*=UTF-8'' parameter with percent-encoded bytes, so international filenames round-trip while staying header-injection safe. sanitizeFileName also falls back to "download" on null/blank input. - ContextFileResourceTest: new cases for sanitizeFileName null/blank fallbacks and for buildContentDisposition ASCII/unicode/space/injection behaviour (18 tests, all passing).

github-actions · 2026-04-20T22:50:31Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copilot

Pull request overview

Copilot reviewed 94 out of 338 changed files in this pull request and generated 8 comments.

@deprecated

- AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix) matches the DAO signature — previous ordering always missed the index. - FolderResource / ContextFileResource getEntitySpecificOperations: return List.of() instead of null so callers iterating the returned list cannot NPE. - SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid helper that returns null for missing/malformed values and logs a warning instead of failing the whole hierarchy response. - DaoListFilter: qualify the pageType column with the caller-provided tableName, rename getArticleCondition to getPageTypeCondition (legacy no-arg method kept as @deprecated wrapper for compatibility). - Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit getChildrenCountForPage search (N+1) with a single pass over the batch that derives childrenCount from pages whose parent is in the same result set. Also drops the now-unused helper and its throws clause. - openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient, jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional> so they don't leak into SDK consumers that only use the core client. - InMemoryAssetService: use the shared AsyncService executor for upload /read/delete instead of the JVM common ForkJoinPool. - sample-pricing.xlsx: replace the plain-text placeholder with a real minimal XLSX workbook so detection-based and extraction-based code paths see a valid Microsoft Excel 2007+ file.

github-actions · 2026-04-20T23:05:13Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Follow-up to b8458e2. The previous fix derived childrenCount from pages whose parent appeared in the same batch — that worked for listPageHierarchyForActivePage (which fetches all depths) but always returned 0 on the plain listPageHierarchy path (which only fetches one depth), so top-level listings lost the count semantically. Replace with a single `filters` aggregation keyed by page id: each named bucket matches descendants via a fullyQualifiedName prefix query against the page's FQN. That gives accurate direct-descendant counts for every returned page in one aggregation round-trip, still O(1) additional search requests regardless of batch size.

Copilot

Pull request overview

Copilot reviewed 91 out of 338 changed files in this pull request and generated 5 comments.

github-actions · 2026-04-21T02:53:58Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Fixes SearchSettingsHandlerTest.testEveryAssetTypeHasCorrespondingAllowedFields. searchSettings.json already had assetTypeConfigurations for contextFile, folder, and page but no matching allowedFields entries, so the test that asserts every assetType has a corresponding allowedFields block failed with 'Asset type contextFile has no corresponding allowedFields entry'. Adds the three missing blocks with the fields that each index actually exposes — name / displayName (with .keyword and .ngram variants), description, fqn, fqnParts, tags/tier/domains/dataProducts, plus entity-specific fields (fileType/contentType/extractedText for contextFile, parent.displayName for folder/page, pageType for page).

Copilot

Pull request overview

Copilot reviewed 90 out of 338 changed files in this pull request and generated 5 comments.

- ES/OS populateChildrenCounts: add fqnDepth == parentDepth + 1 to the per-page filter so childrenCount is direct children only, matching the field name and the UI's isLeaf check semantics. Previously matched all descendants. - ES/OS buildPageNestedSearchHierarchy: filter out hits with a null id before Collectors.toMap, which would otherwise NPE when SearchUtils drops a malformed UUID. - SearchUtils.getPageHierarchy: wrap PageType.fromValue in a parsePageType helper that logs and returns null on unknown values, so a single bad hit can no longer break the whole hierarchy response. - TestSuiteBootstrap.setupMinIO: pin minio/minio to RELEASE.2024-01-16T16-07-38Z instead of :latest so a newly-published image cannot break integration tests without a code change. - createContextFile.json: rewrite the assetId description to be provider agnostic (S3 / Azure Blob / in-memory / no-op) and flag it as the legacy path, preferring headContentId / ContextFileContent.

github-actions · 2026-04-23T16:08:09Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Carried over from the latest AttachmentResource review. Three issues: 1. Content-Disposition header injection (security) — downloadAsset() built the header by direct string interpolation of asset.getFileName(). A filename containing double-quotes or CRLF could inject arbitrary HTTP headers. ContextFileResource already has a sanitize + RFC-5987 encode helper; rather than duplicate it, promote ContextFileUploadSupport.sanitizeFileName / buildContentDisposition to public, delete the duplicates from ContextFileResource (now delegators), and reuse the shared helpers from AttachmentResource. 2. Unbounded upload buffering (performance / DoS) — createAssetFromUpload read the full multipart body into a byte[] via IOUtils.toByteArray before checking against MAX_FILE_SIZE. An attacker could send an arbitrarily large body and exhaust heap before the validation ran. Replace with ContextFileUploadSupport.bufferUpload(), which streams to a bounded temp file and throws MaxFileSizeExceededException the moment the configured limit is passed; translate that into the same AttachmentException size-validation error the previous code raised. Promoted BufferedUpload and MaxFileSizeExceededException to public so the attachments package can consume them. 3. Startup NPE when objectStorage is null (bug) — initialize() called config.getObjectStorage().getMaxFileSize() without a null guard, so a deployment that doesn't configure object storage would NPE on server start. Added the same guard ContextFileResource.initialize() already uses, gave MAX_FILE_SIZE a safe 5 MiB default, and also null-guarded the S3-configuration branch of the CDN URL lookup so a pure-Azure or pure-NoOp setup doesn't fall off the end of the ternary. Ran mvn spotless:apply — picks up formatting-only changes in CollectionDAO.java and FolderIndex.java as a side effect of the shared helper additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI's UI Checkstyle workflow has three per-area jobs (lint-src, lint-playwright, lint-core-components) that reformat the files changed in the PR and fail if the reformat produces a diff. CLAUDE.md and AGENTS.md didn't previously document this flow, so re-running the fix was a guessing game — the two lint-core-components and lint-playwright failures on this branch came from stale import order left over from the main→context_center merge. This commit: - Adds a dedicated invocable skill at .claude/skills/ui-checkstyle/SKILL.md (Claude Code harness) and a mirror at .agents/skills/ui-checkstyle/SKILL.md (Codex-style agents). Both describe the exact three-command sequence CI runs — organize-imports-cli → eslint --fix → prettier --write — the per-area file scoping, the `--check` dry-run mode, and the rule that organize-imports must run BEFORE prettier (otherwise the indentation / trailing-comma round-trip leaves a dirty diff). - Promotes the existing one-liner in CLAUDE.md and AGENTS.md to an explicit "run before finishing any UI task" instruction that points at the skill. - Fixes two residual import-order drifts (KnowledgePagesHierarchy.tsx, EntityUtilClassBase.ts) surfaced by running the skill's sequence locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the fallback return in getEntityByFqn. Committing the formatted version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… parsing The previous polling search used the namespaced unique name as a free-text q= argument. The namespace prefix contains '-' which the ES 9.x query_string parser treats as a NOT operator, producing a deterministic 500 across the full 30s polling window even when the document was indexed. Switch to the direct get-by-id endpoint (/v1/search/get/{index}/doc/{id}), which performs a real-time ES GET with no query_string parsing and no analyzer involvement — the most reliable signal that the document was indexed. Bump the timeout to 60s and capture the response body on any non-200 so future regressions surface the real ES error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-authored-by: Copilot <copilot@github.com>

This reverts commit f0cca5f.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

KnowledgeCenterLayout was importing i18n directly from LocalUtil, but the global setupTests mock for that module only exposes t/on. Switch to the useTranslation() hook so it picks up the react-i18next mock that already provides i18n.dir(), matching how LeftSidebar and RichTextEditor use the direction. EntityRightPanelClassBase.getKnowLedgeArticlesWidget now returns the KnowledgePages component instead of null. Update the corresponding test case to assert the new return value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-05-02T19:25:13Z

Code Review ⚠️ Changes requested 9 resolved / 10 findings

Upstreams the Context Center feature with comprehensive entity schemas, search indexing, and REST resources. Lexicographic ordering in the getExtensionsWithOffset query must be corrected to prevent breaks in paginated history.

⚠️

Bug: Lexicographic version ordering breaks paginated history

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1529-1533

The getExtensionsWithOffset query was changed from numeric version ordering (COALESCE(versionNum, CAST(…))) to plain ORDER BY extension DESC, which sorts lexicographically. Version extension strings look like table.version.0.1, table.version.0.9, table.version.0.10, etc. Lexicographically, 0.9 > 0.10 (because '9' > '1'), so the DB returns the wrong rows for a given page.

The caller in EntityRepository.listVersionsWithOffset() does re-sort results in Java, but that only fixes ordering within the page — with LIMIT/OFFSET pagination, the wrong set of versions is selected in the first place. For an entity with 15+ versions, requesting page 1 (limit=10) would return versions like 0.9, 0.8, …, 0.2, 0.11, 0.10 instead of the correct 1.5, 1.4, …, 0.10, 0.9.

Suggested fix

Restore numeric ordering. If `versionNum` is now always populated,
use `ORDER BY versionNum DESC, extension DESC`. Otherwise restore
the original COALESCE-based ordering with @ConnectionAwareSqlQuery
for MySQL/Postgres variants:
  MySQL:  ORDER BY COALESCE(versionNum, CAST(SUBSTRING_INDEX(extension, '.version.', -1) AS DOUBLE)) DESC
  Postgres: ORDER BY COALESCE(versionNum, split_part(extension, '.version.', 2)::DOUBLE PRECISION) DESC

✅ 9 resolved

✅ Security: Content-Disposition filename vulnerable to quote injection

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java:444-446 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java:368
In ContextFileResource.sanitizeFileName(), the regex ["\\r ] replaces double-quotes, backslashes, CR, and LF. However, the filename is then placed in a quoted string (filename="...") without considering characters like semicolons or encoding issues with non-ASCII characters. While the current regex prevents response splitting, it doesn't handle RFC 5987 encoding for international filenames. This is minor since the main injection vectors are covered.

✅ Bug: NullPointerException in search hierarchy when pageType omitted

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/knowledge/KnowledgePageResource.java:278-283 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/KnowledgePageRepository.java:248-258
The listHierarchyWithSearch endpoint in KnowledgePageResource passes knowledgePageType (a nullable @QueryParam) directly to repository.getHierarchyWithSearch() and repository.getHierarchyWithSearchForActivePage(). These repository methods immediately call pageType.value() without a null check, causing a NullPointerException when the query parameter is not provided.

This is a reachable crash condition since pageType has no @DefaultValue annotation and can be null.

✅ Edge Case: Upload cleanup doesn't delete ContextFileContent on failure

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java:276-290
In the uploadFile method, if an exception occurs after repository.getContentRepository().create(...) succeeds but before extractionService.submit(...) completes, the catch block only cleans up the ContextFile entity and Asset but does not clean up the persisted ContextFileContent record. This leaves an orphaned content record in the database.

✅ Bug: childrenCount always 0 in flat page listing breaks UI expand

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchClient.java:1245-1259 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java:1278-1292
The refactored processPageHierarchyHits now derives childrenCount solely from pages present in the current batch. However, getPageHierarchyFromSearch fetches only one depth level (siblings), so no page in that batch will be referenced as a parent by another page in the same batch. This means childrenCount will always be 0 for the flat/top-level listing.

The UI (KnowledgePageUtils.tsx:157) uses isLeaf: page.childrenCount === 0 to decide whether to show an expand arrow. With this change, all pages in the top-level listing will appear as leaf nodes even if they have children, removing the ability to expand them.

The old code issued a count query per page (N+1) which was expensive but correct. A middle ground would be to issue a single aggregation query that counts children for all pages in the batch at once, or include a hasChildren boolean from the index.

✅ Edge Case: NoOpAssetService may return null instead of empty string

📄 openmetadata-service/src/main/java/org/openmetadata/service/attachments/NoOpAssetService.java:31
The new implementation asset == null ? "" : asset.getUrl() intends to return a safe value, but Asset.url is not a required field in the schema, so asset.getUrl() can return null. This breaks the implied contract that the method now returns empty-string when storage is disabled. The caller in AttachmentResource does check for null, so this won't crash, but the inconsistency undermines the stated goal in the Javadoc ("Returning a fake URL would let clients issue downloads that can never succeed").

...and 4 more resolved from earlier reviews

🤖 Prompt for agents

Code Review: Upstreams the Context Center feature with comprehensive entity schemas, search indexing, and REST resources. Lexicographic ordering in the `getExtensionsWithOffset` query must be corrected to prevent breaks in paginated history.

1. ⚠️ Bug: Lexicographic version ordering breaks paginated history
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1529-1533

   The `getExtensionsWithOffset` query was changed from numeric version ordering (`COALESCE(versionNum, CAST(…))`) to plain `ORDER BY extension DESC`, which sorts lexicographically. Version extension strings look like `table.version.0.1`, `table.version.0.9`, `table.version.0.10`, etc. Lexicographically, `0.9` > `0.10` (because `'9' > '1'`), so the DB returns the wrong rows for a given page.
   
   The caller in `EntityRepository.listVersionsWithOffset()` does re-sort results in Java, but that only fixes ordering *within* the page — with `LIMIT/OFFSET` pagination, the wrong set of versions is selected in the first place. For an entity with 15+ versions, requesting page 1 (limit=10) would return versions like 0.9, 0.8, …, 0.2, 0.11, 0.10 instead of the correct 1.5, 1.4, …, 0.10, 0.9.

   Suggested fix:
   Restore numeric ordering. If `versionNum` is now always populated,
   use `ORDER BY versionNum DESC, extension DESC`. Otherwise restore
   the original COALESCE-based ordering with @ConnectionAwareSqlQuery
   for MySQL/Postgres variants:
     MySQL:  ORDER BY COALESCE(versionNum, CAST(SUBSTRING_INDEX(extension, '.version.', -1) AS DOUBLE)) DESC
     Postgres: ORDER BY COALESCE(versionNum, split_part(extension, '.version.', 2)::DOUBLE PRECISION) DESC

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

github-actions · 2026-05-02T19:46:54Z

Jest test Coverage

UI tests summary

Lines	Statements	Branches	Functions
	62.24% (63861/102602)	42.67% (34387/80581)	45.55% (10175/22334)

sonarqubecloud · 2026-05-02T19:52:30Z

Quality Gate passed for 'open-metadata-ui'

Issues
1109 New issues
0 Accepted issues

Measures
17 Security Hotspots
0.0% Coverage on New Code
3.8% Duplication on New Code

See analysis details on SonarQube Cloud

sonarqubecloud · 2026-05-02T20:49:17Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

harshach added 2 commits April 18, 2026 10:11

Add Context Center: Migrate Knowledge Center , Images/ PDFs document …

f33a4af

…support

Add Context Center: Migrate Knowledge Center , Images/ PDFs document …

a1c2970

…support

Copilot AI review requested due to automatic review settings April 20, 2026 22:28

harshach requested a review from a team as a code owner April 20, 2026 22:28

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 20, 2026

gitar-bot Bot reviewed Apr 20, 2026

View reviewed changes

Comment thread ...ervice/src/main/java/org/openmetadata/service/resources/knowledge/KnowledgePageResource.java

gitar-bot Bot reviewed Apr 20, 2026

View reviewed changes

Comment thread ...data-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java

gitar-bot Bot reviewed Apr 20, 2026

View reviewed changes

Comment thread ...data-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java

Copilot AI reviewed Apr 20, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into context_center

208b781

Copilot AI review requested due to automatic review settings April 20, 2026 22:48

Copilot AI reviewed Apr 20, 2026

View reviewed changes

gitar-bot Bot reviewed Apr 20, 2026

View reviewed changes

Comment thread ...service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchClient.java Outdated

Copilot AI review requested due to automatic review settings April 21, 2026 02:51

Copilot AI reviewed Apr 21, 2026

View reviewed changes

harshach added 2 commits April 20, 2026 19:57

Fix ui checkstyle

bede305

Copilot AI review requested due to automatic review settings April 21, 2026 02:58

Fix Java checkstyle

f616f2f

Copilot AI reviewed Apr 21, 2026

View reviewed changes

harshach had a problem deploying to test April 23, 2026 16:21 — with GitHub Actions Error

harshach and others added 18 commits April 23, 2026 09:21

Merge branch 'main' into context_center

839c6ff

Fix UI checkstyle on EntityUtilClassBase.ts

e000225

ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the fallback return in getEntityByFqn. Committing the formatted version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/context_center' into context_center

32ab0de

Merge branch 'main' into context_center

85c3ffa

Merge branch 'main' into context_center

ad90d5b

Merge branch 'main' into context_center

b5565be

Merge branch 'main' into context_center

d54735e

Merge branch 'main' into context_center

80a9103

Merge branch 'main' into context_center

4ee94e3

Merge branch 'main' into context_center

4d5e1b9

Fix knowledge center icon

18d6060

update knowledge center to context center

f0cca5f

Co-authored-by: Copilot <copilot@github.com>

Revert "update knowledge center to context center"

c1f6abb

This reverts commit f0cca5f.

Merge branch 'main' into context_center

43ec22d

Fix UI checkstyle: sort tag*-related imports in SearchClassBase

5eb9000

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Conversation

harshach commented Apr 20, 2026 • edited by gitar-bot Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Major pieces

Type of change

Test plan

Checklist

Summary by Gitar

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

gitar-bot Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 2, 2026

Jest test Coverage

UI tests summary

Uh oh!

sonarqubecloud Bot commented May 2, 2026

Quality Gate passed for 'open-metadata-ui'

Uh oh!

harshach commented Apr 20, 2026 •

edited by gitar-bot Bot

Loading

gitar-bot Bot commented May 2, 2026 •

edited

Loading