Conversation
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds “Context Center” Drive + Knowledge Center support by introducing Folder/ContextFile entities, object storage-backed assets, text extraction, and search/indexing enhancements (including knowledge-page hierarchy).
Changes:
- Introduces new JSON schemas + create request schemas for Folder, ContextFile, ContextFileContent, Asset, and Article.
- Adds search indexes/settings for page/folder/contextFile and implements page-hierarchy listing in both Elasticsearch & OpenSearch clients.
- Adds object storage abstraction (S3/Azure/InMemory/NoOp), async delete queueing, extraction service + extensive unit/integration test coverage.
Reviewed changes
Copilot reviewed 94 out of 364 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-spec/src/main/resources/json/schema/entity/data/folder.json | Defines Folder entity schema for Drive. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/contextFileContent.json | Defines ContextFileContent snapshot entity schema. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/contextFile.json | Defines ContextFile entity schema for Drive. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/article.json | Adds Article schema for knowledge pages. |
| openmetadata-spec/src/main/resources/json/schema/attachments/asset.json | Adds Asset schema for uploaded blobs. |
| openmetadata-spec/src/main/resources/json/schema/api/data/createPage.json | Extends page creation schema (article/quicklink union). |
| openmetadata-spec/src/main/resources/json/schema/api/data/createFolder.json | Adds Folder create request schema. |
| openmetadata-spec/src/main/resources/json/schema/api/data/createContextFile.json | Adds ContextFile create request schema. |
| openmetadata-spec/src/main/resources/json/schema/api/attachments/createAsset.json | Adds Asset create request schema. |
| openmetadata-spec/src/main/resources/elasticsearch/indexMapping.json | Registers new search indices for page/folder/contextFile. |
| openmetadata-spec/src/main/antlr4/org/openmetadata/schema/EntityLink.g4 | Adds entity link grammar types for new entities. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/PageIndexTest.java | Tests Page search document fields (fqnDepth/deleted). |
| openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/FolderIndexTest.java | Tests Folder search index doc fields. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/ContextFileIndexTest.java | Tests ContextFile search index doc fields. |
| openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/DriveMapperTest.java | Tests Drive mappers (parent/folder references). |
| openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/ContextFileUploadSupportTest.java | Tests upload buffering, checksum, asset/content building. |
| openmetadata-service/src/test/java/org/openmetadata/service/resources/drive/ContextFileResourceTest.java | Tests filename sanitization + expiry clamping. |
| openmetadata-service/src/test/java/org/openmetadata/service/drive/ContextFileTextExtractorTest.java | Tests text extraction for PDF/XLSX/Text/Image OCR modes. |
| openmetadata-service/src/test/java/org/openmetadata/service/drive/ContextFileExtractionServiceTest.java | Tests extraction service success/failure paths and queue rejection handling. |
| openmetadata-service/src/test/java/org/openmetadata/service/context/DefaultContextEntityPromptLoaderTest.java | Tests prompt loader preferring canonical snapshot text. |
| openmetadata-service/src/test/java/org/openmetadata/service/context/ContextEntityPromptServiceTest.java | Tests prompt assembly, budget truncation, and query relevance chunking. |
| openmetadata-service/src/main/resources/json/data/tags/knowledgeCenterTags.json | Adds KnowledgeCenter classification + tags. |
| openmetadata-service/src/main/resources/json/data/settings/searchSettings.json | Adds search settings for contextFile/folder/page. |
| openmetadata-service/src/main/java/org/openmetadata/service/util/SearchUtils.java | Adds mapping helper from search hit -> PageHierarchy. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java | Implements knowledge page hierarchy listing via OpenSearch. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PageIndex.java | Adds Page search index representation. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/FolderIndex.java | Adds Folder search index representation. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ContextFileIndex.java | Adds ContextFile search index representation. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchClient.java | Implements knowledge page hierarchy listing via Elasticsearch. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexFactory.java | Registers Folder/ContextFile/Page index builders. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/knowledge/KnowledgePageMapper.java | Adds mapper for CreatePage -> Page defaults/fields. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/FolderResource.java | Adds REST resource for folder CRUD + contents endpoint. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/FolderMapper.java | Adds mapper for CreateFolder -> Folder. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileUploadSupport.java | Adds upload utility: size cap, checksum, temp buffering, asset/content factories. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/drive/ContextFileMapper.java | Adds mapper for CreateContextFile -> ContextFile. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/FolderRepository.java | Adds Folder repository + parent/children relationship management. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DaoListFilter.java | Adds DAO list-filter extension with pageType condition. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextFileRepository.java | Adds ContextFile repository + cleanup of assets/content snapshots. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextFileContentRepository.java | Adds ContextFileContent repository + list-by-file-id query. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java | Adds DAOs for folder/contextFile/content/page/assets. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/AssetRepository.java | Adds Asset repository for asset_entity persistence. |
| openmetadata-service/src/main/java/org/openmetadata/service/drive/ContextFileExtractionService.java | Adds async extraction orchestration and status updates. |
| openmetadata-service/src/main/java/org/openmetadata/service/context/TokenCounter.java | Adds token counting utility for prompt budgets. |
| openmetadata-service/src/main/java/org/openmetadata/service/context/ResolvedContextEntity.java | Adds prompt-ready resolved entity record. |
| openmetadata-service/src/main/java/org/openmetadata/service/context/DefaultContextEntityPromptLoader.java | Loads pages/files into prompt sections with authorization checks. |
| openmetadata-service/src/main/java/org/openmetadata/service/context/ContextPromptInjectionResult.java | Adds result record for prompt injection. |
| openmetadata-service/src/main/java/org/openmetadata/service/context/ContextEntityPromptService.java | Builds structured prompt context with budgets and relevance chunking. |
| openmetadata-service/src/main/java/org/openmetadata/service/context/ContextEntityPromptLoader.java | Adds loader interface for prompt resolution. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/S3AssetService.java | Implements S3-backed asset IO + signed URL generation. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/QueuedDeleteAssetService.java | Wraps asset delete with queued deletes + blocking wait. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/ObjectDeleteQueueService.java | Adds bounded delete queue service with capacity accounting. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/NoOpAssetService.java | Adds no-op asset provider for disabled storage. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/InMemoryAssetService.java | Adds in-memory asset provider for local testing. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/AzureAssetService.java | Implements Azure Blob-backed asset IO + SAS URLs. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/AssetServiceFactory.java | Factory for selecting provider + wrapping with delete queue. |
| openmetadata-service/src/main/java/org/openmetadata/service/attachments/AssetService.java | Defines AssetService interface additions. |
| openmetadata-service/src/main/java/org/openmetadata/service/Entity.java | Registers new entity type constants. |
| openmetadata-service/pom.xml | Adds dependencies for extraction (PDFBox/POI/Tika) + Azure blob. |
| openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/TestNamespaceExtension.java | JUnit extension for per-test namespace. |
| openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/TestNamespace.java | Utility for unique names across IT runs. |
| openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/SdkClients.java | JWT-backed SDK clients for IT roles/users. |
| openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/util/RestClient.java | Low-level REST test client for endpoints not in SDK. |
| openmetadata-sdk/src/main/java/org/openmetadata/sdk/test/auth/JwtAuthProvider.java | Issues RSA JWTs for integration tests. |
| openmetadata-sdk/pom.xml | Adds dependencies to support SDK test utilities compilation. |
| openmetadata-integration-tests/src/test/resources/openmetadata-secure-test.yaml | Enables object storage for IT and sets S3 config placeholders. |
| openmetadata-integration-tests/src/test/resources/drive/sample-pricing.xlsx | Adds spreadsheet fixture (currently stub content). |
| openmetadata-integration-tests/src/test/resources/drive/sample-notes.txt | Adds text upload fixture. |
| openmetadata-integration-tests/src/test/resources/drive/sample-data.csv | Adds CSV upload fixture. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/knowledge/KnowledgeCenterIT.java | Adds IT coverage for page related entities/domains/data products. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/drive/DriveTestUsers.java | Adds helper to create drive test users. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java | Starts MinIO testcontainer when object storage uses S3. |
| openmetadata-integration-tests/pom.xml | Pins commons-compress for POI compatibility in tests. |
| bootstrap/sql/migrations/native/2.0.0/postgres/schemaChanges.sql | Adds DB tables for knowledge_center/drive_folder/context_file/assets/content. |
| bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql | Adds DB tables for knowledge_center/drive_folder/context_file/assets/content. |
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
- KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch and getHierarchyWithSearchForActivePage so the /search/hierarchy endpoint no longer NPEs when the pageType query param is omitted. The ES/OS client helpers already skip the pageType term when the value is null or empty, so this is a pure null-guard. - ContextFileResource.uploadFile: when a failure happens after the ContextFileContent row is created (e.g. inside extractionService.submit), the cleanup path now hard-deletes that content row so the DB is not left with an orphaned record. - ContextFileResource: replace the raw Content-Disposition string with a buildContentDisposition helper that emits both the legacy quoted filename= and the RFC 5987 filename*=UTF-8'' parameter with percent-encoded bytes, so international filenames round-trip while staying header-injection safe. sanitizeFileName also falls back to "download" on null/blank input. - ContextFileResourceTest: new cases for sanitizeFileName null/blank fallbacks and for buildContentDisposition ASCII/unicode/space/injection behaviour (18 tests, all passing).
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
- AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix) matches the DAO signature — previous ordering always missed the index. - FolderResource / ContextFileResource getEntitySpecificOperations: return List.of() instead of null so callers iterating the returned list cannot NPE. - SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid helper that returns null for missing/malformed values and logs a warning instead of failing the whole hierarchy response. - DaoListFilter: qualify the pageType column with the caller-provided tableName, rename getArticleCondition to getPageTypeCondition (legacy no-arg method kept as @deprecated wrapper for compatibility). - Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit getChildrenCountForPage search (N+1) with a single pass over the batch that derives childrenCount from pages whose parent is in the same result set. Also drops the now-unused helper and its throws clause. - openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient, jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional> so they don't leak into SDK consumers that only use the core client. - InMemoryAssetService: use the shared AsyncService executor for upload /read/delete instead of the JVM common ForkJoinPool. - sample-pricing.xlsx: replace the plain-text placeholder with a real minimal XLSX workbook so detection-based and extraction-based code paths see a valid Microsoft Excel 2007+ file.
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
Follow-up to b8458e2. The previous fix derived childrenCount from pages whose parent appeared in the same batch — that worked for listPageHierarchyForActivePage (which fetches all depths) but always returned 0 on the plain listPageHierarchy path (which only fetches one depth), so top-level listings lost the count semantically. Replace with a single `filters` aggregation keyed by page id: each named bucket matches descendants via a fullyQualifiedName prefix query against the page's FQN. That gives accurate direct-descendant counts for every returned page in one aggregation round-trip, still O(1) additional search requests regardless of batch size.
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
Fixes SearchSettingsHandlerTest.testEveryAssetTypeHasCorrespondingAllowedFields. searchSettings.json already had assetTypeConfigurations for contextFile, folder, and page but no matching allowedFields entries, so the test that asserts every assetType has a corresponding allowedFields block failed with 'Asset type contextFile has no corresponding allowedFields entry'. Adds the three missing blocks with the fields that each index actually exposes — name / displayName (with .keyword and .ngram variants), description, fqn, fqnParts, tags/tier/domains/dataProducts, plus entity-specific fields (fileType/contentType/extractedText for contextFile, parent.displayName for folder/page, pageType for page).
- ES/OS populateChildrenCounts: add fqnDepth == parentDepth + 1 to the per-page filter so childrenCount is direct children only, matching the field name and the UI's isLeaf check semantics. Previously matched all descendants. - ES/OS buildPageNestedSearchHierarchy: filter out hits with a null id before Collectors.toMap, which would otherwise NPE when SearchUtils drops a malformed UUID. - SearchUtils.getPageHierarchy: wrap PageType.fromValue in a parsePageType helper that logs and returns null on unknown values, so a single bad hit can no longer break the whole hierarchy response. - TestSuiteBootstrap.setupMinIO: pin minio/minio to RELEASE.2024-01-16T16-07-38Z instead of :latest so a newly-published image cannot break integration tests without a code change. - createContextFile.json: rewrite the assetId description to be provider agnostic (S3 / Azure Blob / in-memory / no-op) and flag it as the legacy path, preferring headContentId / ContextFileContent.
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
Carried over from the latest AttachmentResource review. Three issues: 1. Content-Disposition header injection (security) — downloadAsset() built the header by direct string interpolation of asset.getFileName(). A filename containing double-quotes or CRLF could inject arbitrary HTTP headers. ContextFileResource already has a sanitize + RFC-5987 encode helper; rather than duplicate it, promote ContextFileUploadSupport.sanitizeFileName / buildContentDisposition to public, delete the duplicates from ContextFileResource (now delegators), and reuse the shared helpers from AttachmentResource. 2. Unbounded upload buffering (performance / DoS) — createAssetFromUpload read the full multipart body into a byte[] via IOUtils.toByteArray before checking against MAX_FILE_SIZE. An attacker could send an arbitrarily large body and exhaust heap before the validation ran. Replace with ContextFileUploadSupport.bufferUpload(), which streams to a bounded temp file and throws MaxFileSizeExceededException the moment the configured limit is passed; translate that into the same AttachmentException size-validation error the previous code raised. Promoted BufferedUpload and MaxFileSizeExceededException to public so the attachments package can consume them. 3. Startup NPE when objectStorage is null (bug) — initialize() called config.getObjectStorage().getMaxFileSize() without a null guard, so a deployment that doesn't configure object storage would NPE on server start. Added the same guard ContextFileResource.initialize() already uses, gave MAX_FILE_SIZE a safe 5 MiB default, and also null-guarded the S3-configuration branch of the CDN URL lookup so a pure-Azure or pure-NoOp setup doesn't fall off the end of the ternary. Ran mvn spotless:apply — picks up formatting-only changes in CollectionDAO.java and FolderIndex.java as a side effect of the shared helper additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's UI Checkstyle workflow has three per-area jobs (lint-src, lint-playwright, lint-core-components) that reformat the files changed in the PR and fail if the reformat produces a diff. CLAUDE.md and AGENTS.md didn't previously document this flow, so re-running the fix was a guessing game — the two lint-core-components and lint-playwright failures on this branch came from stale import order left over from the main→context_center merge. This commit: - Adds a dedicated invocable skill at .claude/skills/ui-checkstyle/SKILL.md (Claude Code harness) and a mirror at .agents/skills/ui-checkstyle/SKILL.md (Codex-style agents). Both describe the exact three-command sequence CI runs — organize-imports-cli → eslint --fix → prettier --write — the per-area file scoping, the `--check` dry-run mode, and the rule that organize-imports must run BEFORE prettier (otherwise the indentation / trailing-comma round-trip leaves a dirty diff). - Promotes the existing one-liner in CLAUDE.md and AGENTS.md to an explicit "run before finishing any UI task" instruction that points at the skill. - Fixes two residual import-order drifts (KnowledgePagesHierarchy.tsx, EntityUtilClassBase.ts) surfaced by running the skill's sequence locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the fallback return in getEntityByFqn. Committing the formatted version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… parsing
The previous polling search used the namespaced unique name as a free-text
q= argument. The namespace prefix contains '-' which the ES 9.x query_string
parser treats as a NOT operator, producing a deterministic 500 across the
full 30s polling window even when the document was indexed.
Switch to the direct get-by-id endpoint (/v1/search/get/{index}/doc/{id}),
which performs a real-time ES GET with no query_string parsing and no
analyzer involvement — the most reliable signal that the document was
indexed. Bump the timeout to 60s and capture the response body on any
non-200 so future regressions surface the real ES error.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>
This reverts commit f0cca5f.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KnowledgeCenterLayout was importing i18n directly from LocalUtil, but the global setupTests mock for that module only exposes t/on. Switch to the useTranslation() hook so it picks up the react-i18next mock that already provides i18n.dir(), matching how LeftSidebar and RichTextEditor use the direction. EntityRightPanelClassBase.getKnowLedgeArticlesWidget now returns the KnowledgePages component instead of null. Update the corresponding test case to assert the new return value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
|
|



Summary
Brings the Context Center feature into OpenMetadata core:
asset_entityblobs, async delete queue, signed-URL download for S3/Azure providers, and PDF / Office / image OCR / spreadsheet / text extraction via PDFBox, POI, and Tika.entity/data/{folder,contextFile,contextFileContent,page,article,quickLink,pageHierarchy}.json,attachments/asset.json), Java repositories + REST resources, Elasticsearch/OpenSearch index mappings for all supported locales,SearchSettingsasset-type configurations, and native2.0.0SQL migrations that are idempotent against existing customer databases.All moved code was previously shipped in Collate and is being upstreamed so every OpenMetadata user gets Drive + Knowledge Center in 2.0.0. Collate continues to work because it bundles this submodule; the Collate-side cleanup removes the now-duplicated code once this PR merges.
Major pieces
openmetadata-spec/src/main/resources/json/schema/entity/data/andapi/data/(+attachments/for the Asset blob).KnowledgePageRepository,FolderRepository,ContextFileRepository,ContextFileContentRepository,AssetRepository,DaoListFilter,ContextFileExtractionService,ContextFileTextExtractor,AssetService+ S3/Azure/InMemory/NoOp providers,QueuedDeleteAssetService,ContextEntityPromptService+ loaders; REST resources underresources/knowledgeandresources/drive(includingContextFileResource.uploadFilewith RFC 5987 Content-Disposition handling).PageIndex,FolderIndex,ContextFileIndex, ES/OSlistPageHierarchy+listPageHierarchyForActivePage(single-aggregation children count).KnowledgePageswidget wired viaEntityRightPanelClassBaseandCommonWidgetClassBaseso Topic/Table pages surface linked articles.KNOWLEDGE_ARTICLEadded to default layouts with theisLeaf: childrenCount === 0contract preserved.bootstrap/sql/migrations/native/2.0.0/{mysql,postgres}/schemaChanges.sqlcreatesknowledge_center,drive_folder,context_file,context_file_content,asset_entity,search_index_job,search_index_retry_queue; addstag_usage.metadata(JSON) andaudit_log_event.search_textcolumns. Every statement usesIF NOT EXISTS/ prepared-statement guards so it is a no-op on databases that already have these tables from prior Collate migrations.Type of change
Test plan
mvn -pl openmetadata-service test— 866 unit tests, including newContextFileResourceTest,ContextFileUploadSupportTest,ContextFileTextExtractorTest,ContextFileExtractionServiceTest,DriveMapperTest,PageIndexTest,FolderIndexTest,ContextFileIndexTest, and theDefaultContextEntityPromptLoader/ContextEntityPromptServicetests.mvn verify -pl openmetadata-integration-tests— 46 new IT tests covering folder CRUD, file upload against MinIO (auto-started byTestSuiteBootstrap), text extraction, recursive folder delete, and knowledge page hierarchy; existing-customer migration path exercised.Explore Filter by Knowledge Centerpasses end-to-end against the rebuilt UI + reindexed search cluster. The multi-step "Knowledge Center page" test is still in the process of being stabilised against the new backend semantics.knowledge_center,context_file,context_file_content,drive_folder,asset_entity,search_index_job,search_index_retry_queue,tag_usage.metadata,audit_log_event.search_textall present after migration.Checklist
2.0.0migration scripts (idempotent).7e94bd48a7,b8458e2868, and4d8c434412.Summary by Gitar
KnowledgeCenterLayoutto utilizereact-i18nextfor better internationalization support.EntityRightPanelClassBase.testto verify thatgetKnowLedgeArticlesWidgetcorrectly returns theKnowledgePagescomponent.This will update automatically on new commits.