Skip to content

Context center#27558

Open
harshach wants to merge 60 commits intomainfrom
context_center
Open

Context center#27558
harshach wants to merge 60 commits intomainfrom
context_center

Conversation

@harshach
Copy link
Copy Markdown
Collaborator

@harshach harshach commented Apr 20, 2026

Summary

Brings the Context Center feature into OpenMetadata core:

  • Drive pillar — Folder + ContextFile entities, object-storage-backed uploads, deduplicated asset_entity blobs, async delete queue, signed-URL download for S3/Azure providers, and PDF / Office / image OCR / spreadsheet / text extraction via PDFBox, POI, and Tika.
  • Knowledge pillar — Page entity with Article and QuickLink page types, search-backed hierarchy listing, review workflow, and a UI that ships as a sidebar entry below Govern.
  • Shared plumbing — new entity schemas (entity/data/{folder,contextFile,contextFileContent,page,article,quickLink,pageHierarchy}.json, attachments/asset.json), Java repositories + REST resources, Elasticsearch/OpenSearch index mappings for all supported locales, SearchSettings asset-type configurations, and native 2.0.0 SQL migrations that are idempotent against existing customer databases.

All moved code was previously shipped in Collate and is being upstreamed so every OpenMetadata user gets Drive + Knowledge Center in 2.0.0. Collate continues to work because it bundles this submodule; the Collate-side cleanup removes the now-duplicated code once this PR merges.

Major pieces

  • Schemas under openmetadata-spec/src/main/resources/json/schema/entity/data/ and api/data/ (+ attachments/ for the Asset blob).
  • Backend: KnowledgePageRepository, FolderRepository, ContextFileRepository, ContextFileContentRepository, AssetRepository, DaoListFilter, ContextFileExtractionService, ContextFileTextExtractor, AssetService + S3/Azure/InMemory/NoOp providers, QueuedDeleteAssetService, ContextEntityPromptService + loaders; REST resources under resources/knowledge and resources/drive (including ContextFileResource.uploadFile with RFC 5987 Content-Disposition handling).
  • Search: PageIndex, FolderIndex, ContextFileIndex, ES/OS listPageHierarchy + listPageHierarchyForActivePage (single-aggregation children count).
  • UI: sidebar entry + Explore tree branch for Knowledge Center, KnowledgePages widget wired via EntityRightPanelClassBase and CommonWidgetClassBase so Topic/Table pages surface linked articles. KNOWLEDGE_ARTICLE added to default layouts with the isLeaf: childrenCount === 0 contract preserved.
  • Migrations: bootstrap/sql/migrations/native/2.0.0/{mysql,postgres}/schemaChanges.sql creates knowledge_center, drive_folder, context_file, context_file_content, asset_entity, search_index_job, search_index_retry_queue; adds tag_usage.metadata (JSON) and audit_log_event.search_text columns. Every statement uses IF NOT EXISTS / prepared-statement guards so it is a no-op on databases that already have these tables from prior Collate migrations.

Type of change

  • New feature (Drive pillar + Knowledge Center upstream)
  • Breaking change — adds new entities, REST endpoints, DB tables, and search indexes. Backwards-compatible for Collate customers upgrading.

Test plan

  • mvn -pl openmetadata-service test — 866 unit tests, including new ContextFileResourceTest, ContextFileUploadSupportTest, ContextFileTextExtractorTest, ContextFileExtractionServiceTest, DriveMapperTest, PageIndexTest, FolderIndexTest, ContextFileIndexTest, and the DefaultContextEntityPromptLoader / ContextEntityPromptService tests.
  • mvn verify -pl openmetadata-integration-tests — 46 new IT tests covering folder CRUD, file upload against MinIO (auto-started by TestSuiteBootstrap), text extraction, recursive folder delete, and knowledge page hierarchy; existing-customer migration path exercised.
  • Playwright — moved the Knowledge Center spec suite from Collate; Explore Filter by Knowledge Center passes end-to-end against the rebuilt UI + reindexed search cluster. The multi-step "Knowledge Center page" test is still in the process of being stabilised against the new backend semantics.
  • Fresh-DB + existing-customer migration check on MySQL: knowledge_center, context_file, context_file_content, drive_folder, asset_entity, search_index_job, search_index_retry_queue, tag_usage.metadata, audit_log_event.search_text all present after migration.

Checklist

  • I have read the CONTRIBUTING document.
  • Unit + integration tests added for new logic.
  • JSON Schema changes come with native 2.0.0 migration scripts (idempotent).
  • Review comments from gitar-bot and copilot addressed in commits 7e94bd48a7, b8458e2868, and 4d8c434412.

Summary by Gitar

  • UI/UX improvements:
    • Updated KnowledgeCenterLayout to utilize react-i18next for better internationalization support.
  • Testing updates:
    • Refactored EntityRightPanelClassBase.test to verify that getKnowLedgeArticlesWidget correctly returns the KnowledgePages component.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings April 20, 2026 22:28
@harshach harshach requested a review from a team as a code owner April 20, 2026 22:28
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds “Context Center” Drive + Knowledge Center support by introducing Folder/ContextFile entities, object storage-backed assets, text extraction, and search/indexing enhancements (including knowledge-page hierarchy).

Changes:

  • Introduces new JSON schemas + create request schemas for Folder, ContextFile, ContextFileContent, Asset, and Article.
  • Adds search indexes/settings for page/folder/contextFile and implements page-hierarchy listing in both Elasticsearch & OpenSearch clients.
  • Adds object storage abstraction (S3/Azure/InMemory/NoOp), async delete queueing, extraction service + extensive unit/integration test coverage.

Reviewed changes

Copilot reviewed 94 out of 364 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
openmetadata-spec/src/main/resources/json/schema/entity/data/folder.json Defines Folder entity schema for Drive.
openmetadata-spec/src/main/resources/json/schema/entity/data/contextFileContent.json Defines ContextFileContent snapshot entity schema.
openmetadata-spec/src/main/resources/json/schema/entity/data/contextFile.json Defines ContextFile entity schema for Drive.
openmetadata-spec/src/main/resources/json/schema/entity/data/article.json Adds Article schema for knowledge pages.
openmetadata-spec/src/main/resources/json/schema/attachments/asset.json Adds Asset schema for uploaded blobs.
openmetadata-spec/src/main/resources/json/schema/api/data/createPage.json Extends page creation schema (article/quicklink union).
openmetadata-spec/src/main/resources/json/schema/api/data/createFolder.json Adds Folder create request schema.
openmetadata-spec/src/main/resources/json/schema/api/data/createContextFile.json Adds ContextFile create request schema.
openmetadata-spec/src/main/resources/json/schema/api/attachments/createAsset.json Adds Asset create request schema.
openmetadata-spec/src/main/resources/elasticsearch/indexMapping.json Registers new search indices for page/folder/contextFile.
openmetadata-spec/src/main/antlr4/org/openmetadata/schema/EntityLink.g4 Adds entity link grammar types for new entities.
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/PageIndexTest.java Tests Page search document fields (fqnDepth/deleted).
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/FolderIndexTest.java Tests Folder search index doc fields.
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/ContextFileIndexTest.java Tests ContextFile search index doc fields.
openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/DriveMapperTest.java Tests Drive mappers (parent/folder references).
openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/ContextFileUploadSupportTest.java Tests upload buffering, checksum, asset/content building.
openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/ContextFileResourceTest.java Tests filename sanitization + expiry clamping.
openmetadata-service/src/test/java/org/openmetadata/service/drive/ContextFileTextExtractorTest.java Tests text extraction for PDF/XLSX/Text/Image OCR modes.
openmetadata-service/src/test/java/org/openmetadata/service/drive/ContextFileExtractionServiceTest.java Tests extraction service success/failure paths and queue rejection handling.
openmetadata-service/src/test/java/org/openmetadata/service/context/DefaultContextEntityPromptLoaderTest.java Tests prompt loader preferring canonical snapshot text.
openmetadata-service/src/test/java/org/openmetadata/service/context/ContextEntityPromptServiceTest.java Tests prompt assembly, budget truncation, and query relevance chunking.
openmetadata-service/src/main/resources/json/data/tags/knowledgeCenterTags.json Adds KnowledgeCenter classification + tags.
openmetadata-service/src/main/resources/json/data/settings/searchSettings.json Adds search settings for contextFile/folder/page.
openmetadata-service/src/main/java/org/openmetadata/service/util/SearchUtils.java Adds mapping helper from search hit -> PageHierarchy.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java Implements knowledge page hierarchy listing via OpenSearch.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PageIndex.java Adds Page search index representation.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/FolderIndex.java Adds Folder search index representation.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ContextFileIndex.java Adds ContextFile search index representation.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchClient.java Implements knowledge page hierarchy listing via Elasticsearch.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexFactory.java Registers Folder/ContextFile/Page index builders.
openmetadata-service/src/main/java/org/openmetadata/service/resources/knowledge/KnowledgePageMapper.java Adds mapper for CreatePage -> Page defaults/fields.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/FolderResource.java Adds REST resource for folder CRUD + contents endpoint.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/FolderMapper.java Adds mapper for CreateFolder -> Folder.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileUploadSupport.java Adds upload utility: size cap, checksum, temp buffering, asset/content factories.
openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileMapper.java Adds mapper for CreateContextFile -> ContextFile.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/FolderRepository.java Adds Folder repository + parent/children relationship management.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DaoListFilter.java Adds DAO list-filter extension with pageType condition.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextFileRepository.java Adds ContextFile repository + cleanup of assets/content snapshots.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextFileContentRepository.java Adds ContextFileContent repository + list-by-file-id query.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Adds DAOs for folder/contextFile/content/page/assets.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/AssetRepository.java Adds Asset repository for asset_entity persistence.
openmetadata-service/src/main/java/org/openmetadata/service/drive/ContextFileExtractionService.java Adds async extraction orchestration and status updates.
openmetadata-service/src/main/java/org/openmetadata/service/context/TokenCounter.java Adds token counting utility for prompt budgets.
openmetadata-service/src/main/java/org/openmetadata/service/context/ResolvedContextEntity.java Adds prompt-ready resolved entity record.
openmetadata-service/src/main/java/org/openmetadata/service/context/DefaultContextEntityPromptLoader.java Loads pages/files into prompt sections with authorization checks.
openmetadata-service/src/main/java/org/openmetadata/service/context/ContextPromptInjectionResult.java Adds result record for prompt injection.
openmetadata-service/src/main/java/org/openmetadata/service/context/ContextEntityPromptService.java Builds structured prompt context with budgets and relevance chunking.
openmetadata-service/src/main/java/org/openmetadata/service/context/ContextEntityPromptLoader.java Adds loader interface for prompt resolution.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/S3AssetService.java Implements S3-backed asset IO + signed URL generation.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/QueuedDeleteAssetService.java Wraps asset delete with queued deletes + blocking wait.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/ObjectDeleteQueueService.java Adds bounded delete queue service with capacity accounting.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/NoOpAssetService.java Adds no-op asset provider for disabled storage.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/InMemoryAssetService.java Adds in-memory asset provider for local testing.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/AzureAssetService.java Implements Azure Blob-backed asset IO + SAS URLs.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/AssetServiceFactory.java Factory for selecting provider + wrapping with delete queue.
openmetadata-service/src/main/java/org/openmetadata/service/attachments/AssetService.java Defines AssetService interface additions.
openmetadata-service/src/main/java/org/openmetadata/service/Entity.java Registers new entity type constants.
openmetadata-service/pom.xml Adds dependencies for extraction (PDFBox/POI/Tika) + Azure blob.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/TestNamespaceExtension.java JUnit extension for per-test namespace.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/TestNamespace.java Utility for unique names across IT runs.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/SdkClients.java JWT-backed SDK clients for IT roles/users.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/RestClient.java Low-level REST test client for endpoints not in SDK.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/auth/JwtAuthProvider.java Issues RSA JWTs for integration tests.
openmetadata-sdk/pom.xml Adds dependencies to support SDK test utilities compilation.
openmetadata-integration-tests/src/test/resources/openmetadata-secure-test.yaml Enables object storage for IT and sets S3 config placeholders.
openmetadata-integration-tests/src/test/resources/drive/sample-pricing.xlsx Adds spreadsheet fixture (currently stub content).
openmetadata-integration-tests/src/test/resources/drive/sample-notes.txt Adds text upload fixture.
openmetadata-integration-tests/src/test/resources/drive/sample-data.csv Adds CSV upload fixture.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/knowledge/KnowledgeCenterIT.java Adds IT coverage for page related entities/domains/data products.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/drive/DriveTestUsers.java Adds helper to create drive test users.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java Starts MinIO testcontainer when object storage uses S3.
openmetadata-integration-tests/pom.xml Pins commons-compress for POI compatibility in tests.
bootstrap/sql/migrations/native/2.0.0/postgres/schemaChanges.sql Adds DB tables for knowledge_center/drive_folder/context_file/assets/content.
bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql Adds DB tables for knowledge_center/drive_folder/context_file/assets/content.

Comment thread openmetadata-sdk/pom.xml Outdated
Comment thread openmetadata-integration-tests/src/test/resources/drive/sample-pricing.xlsx Outdated
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

- KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch
  and getHierarchyWithSearchForActivePage so the /search/hierarchy
  endpoint no longer NPEs when the pageType query param is omitted. The
  ES/OS client helpers already skip the pageType term when the value is
  null or empty, so this is a pure null-guard.
- ContextFileResource.uploadFile: when a failure happens after the
  ContextFileContent row is created (e.g. inside extractionService.submit),
  the cleanup path now hard-deletes that content row so the DB is not
  left with an orphaned record.
- ContextFileResource: replace the raw Content-Disposition string with a
  buildContentDisposition helper that emits both the legacy quoted
  filename= and the RFC 5987 filename*=UTF-8'' parameter with
  percent-encoded bytes, so international filenames round-trip while
  staying header-injection safe. sanitizeFileName also falls back to
  "download" on null/blank input.
- ContextFileResourceTest: new cases for sanitizeFileName null/blank
  fallbacks and for buildContentDisposition ASCII/unicode/space/injection
  behaviour (18 tests, all passing).
Copilot AI review requested due to automatic review settings April 20, 2026 22:48
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 94 out of 338 changed files in this pull request and generated 8 comments.

- AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix)
  matches the DAO signature — previous ordering always missed the index.
- FolderResource / ContextFileResource getEntitySpecificOperations: return
  List.of() instead of null so callers iterating the returned list cannot
  NPE.
- SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid
  helper that returns null for missing/malformed values and logs a warning
  instead of failing the whole hierarchy response.
- DaoListFilter: qualify the pageType column with the caller-provided
  tableName, rename getArticleCondition to getPageTypeCondition (legacy
  no-arg method kept as @deprecated wrapper for compatibility).
- Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit
  getChildrenCountForPage search (N+1) with a single pass over the batch
  that derives childrenCount from pages whose parent is in the same
  result set. Also drops the now-unused helper and its throws clause.
- openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient,
  jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional>
  so they don't leak into SDK consumers that only use the core client.
- InMemoryAssetService: use the shared AsyncService executor for upload
  /read/delete instead of the JVM common ForkJoinPool.
- sample-pricing.xlsx: replace the plain-text placeholder with a real
  minimal XLSX workbook so detection-based and extraction-based code
  paths see a valid Microsoft Excel 2007+ file.
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Follow-up to b8458e2. The previous fix derived childrenCount from
pages whose parent appeared in the same batch — that worked for
listPageHierarchyForActivePage (which fetches all depths) but always
returned 0 on the plain listPageHierarchy path (which only fetches one
depth), so top-level listings lost the count semantically.

Replace with a single `filters` aggregation keyed by page id: each
named bucket matches descendants via a fullyQualifiedName prefix query
against the page's FQN. That gives accurate direct-descendant counts
for every returned page in one aggregation round-trip, still O(1)
additional search requests regardless of batch size.
Copilot AI review requested due to automatic review settings April 21, 2026 02:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 91 out of 338 changed files in this pull request and generated 5 comments.

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/util/SearchUtils.java Outdated
Comment thread openmetadata-spec/src/main/resources/json/schema/api/data/createContextFile.json Outdated
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Fixes SearchSettingsHandlerTest.testEveryAssetTypeHasCorrespondingAllowedFields.

searchSettings.json already had assetTypeConfigurations for contextFile,
folder, and page but no matching allowedFields entries, so the test that
asserts every assetType has a corresponding allowedFields block failed
with 'Asset type contextFile has no corresponding allowedFields entry'.

Adds the three missing blocks with the fields that each index actually
exposes — name / displayName (with .keyword and .ngram variants),
description, fqn, fqnParts, tags/tier/domains/dataProducts, plus
entity-specific fields (fileType/contentType/extractedText for
contextFile, parent.displayName for folder/page, pageType for page).
Copilot AI review requested due to automatic review settings April 21, 2026 02:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 90 out of 338 changed files in this pull request and generated 5 comments.

Comment thread bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql
Comment thread bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql
- ES/OS populateChildrenCounts: add fqnDepth == parentDepth + 1 to the
  per-page filter so childrenCount is direct children only, matching the
  field name and the UI's isLeaf check semantics. Previously matched all
  descendants.
- ES/OS buildPageNestedSearchHierarchy: filter out hits with a null id
  before Collectors.toMap, which would otherwise NPE when SearchUtils
  drops a malformed UUID.
- SearchUtils.getPageHierarchy: wrap PageType.fromValue in a parsePageType
  helper that logs and returns null on unknown values, so a single bad
  hit can no longer break the whole hierarchy response.
- TestSuiteBootstrap.setupMinIO: pin minio/minio to
  RELEASE.2024-01-16T16-07-38Z instead of :latest so a newly-published
  image cannot break integration tests without a code change.
- createContextFile.json: rewrite the assetId description to be provider
  agnostic (S3 / Azure Blob / in-memory / no-op) and flag it as the legacy
  path, preferring headContentId / ContextFileContent.
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Carried over from the latest AttachmentResource review. Three issues:

1. Content-Disposition header injection (security) — downloadAsset() built
   the header by direct string interpolation of asset.getFileName(). A
   filename containing double-quotes or CRLF could inject arbitrary HTTP
   headers. ContextFileResource already has a sanitize + RFC-5987 encode
   helper; rather than duplicate it, promote
   ContextFileUploadSupport.sanitizeFileName / buildContentDisposition to
   public, delete the duplicates from ContextFileResource (now delegators),
   and reuse the shared helpers from AttachmentResource.

2. Unbounded upload buffering (performance / DoS) — createAssetFromUpload
   read the full multipart body into a byte[] via IOUtils.toByteArray
   before checking against MAX_FILE_SIZE. An attacker could send an
   arbitrarily large body and exhaust heap before the validation ran.
   Replace with ContextFileUploadSupport.bufferUpload(), which streams to
   a bounded temp file and throws MaxFileSizeExceededException the moment
   the configured limit is passed; translate that into the same
   AttachmentException size-validation error the previous code raised.
   Promoted BufferedUpload and MaxFileSizeExceededException to public so
   the attachments package can consume them.

3. Startup NPE when objectStorage is null (bug) — initialize() called
   config.getObjectStorage().getMaxFileSize() without a null guard, so a
   deployment that doesn't configure object storage would NPE on server
   start. Added the same guard ContextFileResource.initialize() already
   uses, gave MAX_FILE_SIZE a safe 5 MiB default, and also null-guarded
   the S3-configuration branch of the CDN URL lookup so a pure-Azure or
   pure-NoOp setup doesn't fall off the end of the ternary.

Ran mvn spotless:apply — picks up formatting-only changes in
CollectionDAO.java and FolderIndex.java as a side effect of the shared
helper additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshach and others added 18 commits April 23, 2026 09:21
CI's UI Checkstyle workflow has three per-area jobs (lint-src,
lint-playwright, lint-core-components) that reformat the files changed
in the PR and fail if the reformat produces a diff. CLAUDE.md and
AGENTS.md didn't previously document this flow, so re-running the fix
was a guessing game — the two lint-core-components and lint-playwright
failures on this branch came from stale import order left over from the
main→context_center merge.

This commit:

- Adds a dedicated invocable skill at .claude/skills/ui-checkstyle/SKILL.md
  (Claude Code harness) and a mirror at .agents/skills/ui-checkstyle/SKILL.md
  (Codex-style agents). Both describe the exact three-command sequence CI
  runs — organize-imports-cli → eslint --fix → prettier --write — the
  per-area file scoping, the `--check` dry-run mode, and the rule that
  organize-imports must run BEFORE prettier (otherwise the indentation /
  trailing-comma round-trip leaves a dirty diff).

- Promotes the existing one-liner in CLAUDE.md and AGENTS.md to an explicit
  "run before finishing any UI task" instruction that points at the skill.

- Fixes two residual import-order drifts (KnowledgePagesHierarchy.tsx,
  EntityUtilClassBase.ts) surfaced by running the skill's sequence locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the
fallback return in getEntityByFqn. Committing the formatted version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… parsing

The previous polling search used the namespaced unique name as a free-text
q= argument. The namespace prefix contains '-' which the ES 9.x query_string
parser treats as a NOT operator, producing a deterministic 500 across the
full 30s polling window even when the document was indexed.

Switch to the direct get-by-id endpoint (/v1/search/get/{index}/doc/{id}),
which performs a real-time ES GET with no query_string parsing and no
analyzer involvement — the most reliable signal that the document was
indexed. Bump the timeout to 60s and capture the response body on any
non-200 so future regressions surface the real ES error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KnowledgeCenterLayout was importing i18n directly from LocalUtil, but the
global setupTests mock for that module only exposes t/on. Switch to the
useTranslation() hook so it picks up the react-i18next mock that already
provides i18n.dir(), matching how LeftSidebar and RichTextEditor use the
direction.

EntityRightPanelClassBase.getKnowLedgeArticlesWidget now returns the
KnowledgePages component instead of null. Update the corresponding test
case to assert the new return value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 2, 2026

Code Review ⚠️ Changes requested 9 resolved / 10 findings

Upstreams the Context Center feature with comprehensive entity schemas, search indexing, and REST resources. Lexicographic ordering in the getExtensionsWithOffset query must be corrected to prevent breaks in paginated history.

⚠️ Bug: Lexicographic version ordering breaks paginated history

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1529-1533

The getExtensionsWithOffset query was changed from numeric version ordering (COALESCE(versionNum, CAST(…))) to plain ORDER BY extension DESC, which sorts lexicographically. Version extension strings look like table.version.0.1, table.version.0.9, table.version.0.10, etc. Lexicographically, 0.9 > 0.10 (because '9' > '1'), so the DB returns the wrong rows for a given page.

The caller in EntityRepository.listVersionsWithOffset() does re-sort results in Java, but that only fixes ordering within the page — with LIMIT/OFFSET pagination, the wrong set of versions is selected in the first place. For an entity with 15+ versions, requesting page 1 (limit=10) would return versions like 0.9, 0.8, …, 0.2, 0.11, 0.10 instead of the correct 1.5, 1.4, …, 0.10, 0.9.

Suggested fix
Restore numeric ordering. If `versionNum` is now always populated,
use `ORDER BY versionNum DESC, extension DESC`. Otherwise restore
the original COALESCE-based ordering with @ConnectionAwareSqlQuery
for MySQL/Postgres variants:
  MySQL:  ORDER BY COALESCE(versionNum, CAST(SUBSTRING_INDEX(extension, '.version.', -1) AS DOUBLE)) DESC
  Postgres: ORDER BY COALESCE(versionNum, split_part(extension, '.version.', 2)::DOUBLE PRECISION) DESC
✅ 9 resolved
Security: Content-Disposition filename vulnerable to quote injection

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java:444-446 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java:368
In ContextFileResource.sanitizeFileName(), the regex ["\\r ] replaces double-quotes, backslashes, CR, and LF. However, the filename is then placed in a quoted string (filename="...") without considering characters like semicolons or encoding issues with non-ASCII characters. While the current regex prevents response splitting, it doesn't handle RFC 5987 encoding for international filenames. This is minor since the main injection vectors are covered.

Bug: NullPointerException in search hierarchy when pageType omitted

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/knowledge/KnowledgePageResource.java:278-283 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/KnowledgePageRepository.java:248-258
The listHierarchyWithSearch endpoint in KnowledgePageResource passes knowledgePageType (a nullable @QueryParam) directly to repository.getHierarchyWithSearch() and repository.getHierarchyWithSearchForActivePage(). These repository methods immediately call pageType.value() without a null check, causing a NullPointerException when the query parameter is not provided.

This is a reachable crash condition since pageType has no @DefaultValue annotation and can be null.

Edge Case: Upload cleanup doesn't delete ContextFileContent on failure

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileResource.java:276-290
In the uploadFile method, if an exception occurs after repository.getContentRepository().create(...) succeeds but before extractionService.submit(...) completes, the catch block only cleans up the ContextFile entity and Asset but does not clean up the persisted ContextFileContent record. This leaves an orphaned content record in the database.

Bug: childrenCount always 0 in flat page listing breaks UI expand

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchClient.java:1245-1259 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java:1278-1292
The refactored processPageHierarchyHits now derives childrenCount solely from pages present in the current batch. However, getPageHierarchyFromSearch fetches only one depth level (siblings), so no page in that batch will be referenced as a parent by another page in the same batch. This means childrenCount will always be 0 for the flat/top-level listing.

The UI (KnowledgePageUtils.tsx:157) uses isLeaf: page.childrenCount === 0 to decide whether to show an expand arrow. With this change, all pages in the top-level listing will appear as leaf nodes even if they have children, removing the ability to expand them.

The old code issued a count query per page (N+1) which was expensive but correct. A middle ground would be to issue a single aggregation query that counts children for all pages in the batch at once, or include a hasChildren boolean from the index.

Edge Case: NoOpAssetService may return null instead of empty string

📄 openmetadata-service/src/main/java/org/openmetadata/service/attachments/NoOpAssetService.java:31
The new implementation asset == null ? "" : asset.getUrl() intends to return a safe value, but Asset.url is not a required field in the schema, so asset.getUrl() can return null. This breaks the implied contract that the method now returns empty-string when storage is disabled. The caller in AttachmentResource does check for null, so this won't crash, but the inconsistency undermines the stated goal in the Javadoc ("Returning a fake URL would let clients issue downloads that can never succeed").

...and 4 more resolved from earlier reviews

🤖 Prompt for agents
Code Review: Upstreams the Context Center feature with comprehensive entity schemas, search indexing, and REST resources. Lexicographic ordering in the `getExtensionsWithOffset` query must be corrected to prevent breaks in paginated history.

1. ⚠️ Bug: Lexicographic version ordering breaks paginated history
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1529-1533

   The `getExtensionsWithOffset` query was changed from numeric version ordering (`COALESCE(versionNum, CAST(…))`) to plain `ORDER BY extension DESC`, which sorts lexicographically. Version extension strings look like `table.version.0.1`, `table.version.0.9`, `table.version.0.10`, etc. Lexicographically, `0.9` > `0.10` (because `'9' > '1'`), so the DB returns the wrong rows for a given page.
   
   The caller in `EntityRepository.listVersionsWithOffset()` does re-sort results in Java, but that only fixes ordering *within* the page — with `LIMIT/OFFSET` pagination, the wrong set of versions is selected in the first place. For an entity with 15+ versions, requesting page 1 (limit=10) would return versions like 0.9, 0.8, …, 0.2, 0.11, 0.10 instead of the correct 1.5, 1.4, …, 0.10, 0.9.

   Suggested fix:
   Restore numeric ordering. If `versionNum` is now always populated,
   use `ORDER BY versionNum DESC, extension DESC`. Otherwise restore
   the original COALESCE-based ordering with @ConnectionAwareSqlQuery
   for MySQL/Postgres variants:
     MySQL:  ORDER BY COALESCE(versionNum, CAST(SUBSTRING_INDEX(extension, '.version.', -1) AS DOUBLE)) DESC
     Postgres: ORDER BY COALESCE(versionNum, split_part(extension, '.version.', 2)::DOUBLE PRECISION) DESC

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 2, 2026

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 62%
62.24% (63861/102602) 42.67% (34387/80581) 45.55% (10175/22334)

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 2, 2026

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 2, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants