Skip to content

feat(condo): DOMA-13081 field type for store data externally#7412

Merged
AleX83Xpert merged 85 commits intomainfrom
feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw
Apr 24, 2026
Merged

feat(condo): DOMA-13081 field type for store data externally#7412
AleX83Xpert merged 85 commits intomainfrom
feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw

Conversation

@AleX83Xpert
Copy link
Copy Markdown
Member

@AleX83Xpert AleX83Xpert commented Mar 30, 2026

Code Review Guide - ExternalContent Field

🎯 Start Here

This PR adds a new ExternalContent field type that stores large data in external files instead of the database. It also includes a generic backfill script to migrate any existing inline JSON field to file storage.


📍 Review Priority Order

1. Security - Path Traversal Protection ⚠️ CRITICAL

File: packages/keystone/fieldsUtils/ExternalContent/validateFilePath.js

What to check:

  • Empty string and non-string validation at the start of validateFilePath() (for both basePath and filename)
  • Absolute path blocking with path.isAbsolute(filename) before any join
  • path.join() + path.normalize() on both the full path and the base path
  • Guard against normalized === normalizedBase (resolving to base dir itself)
  • Final check: normalized.startsWith(normalizedBase + path.sep) - returns normalized, not fullPath

Why it matters: Prevents attackers from reading arbitrary files from the server.


2. Race Condition Fix - DataLoader ⚠️ CRITICAL

File: packages/keystone/fieldsUtils/ExternalContent/FileContentLoader.js

What to check:

  • Constructor initializes three structures: cache (Map), pending (Map), and queue (Array)
  • load() checks cache first, then pending map before creating a new promise
  • New promise is stored in pending immediately after creation, keyed by cacheKey
  • promise.finally() moves result from pending to cache and cleans up pending
  • Batch scheduling uses _schedulingBatch flag to prevent a second timer from being set between the !this.batchTimer check and the timer assignment
  • batchDelayMs: 0 uses setImmediate instead of setTimeout for immediate execution

Why it matters: Without pending map, concurrent GraphQL resolver calls for the same file create duplicate queue entries and duplicate reads.


3. Backfill Script - Migration Safety ⚠️ CRITICAL

File: apps/rb/bin/backfillExternalContentField.js

What to check:

  • isFileMeta(fieldValue) check skips records that are already migrated (idempotent)
  • null/undefined raw values are skipped (falsy check is intentional - false/0/'' are valid JSON payloads)
  • Migration is delegated to serverUtils[model].update() which triggers resolveInput — the field's own save logic handles file creation and DB update atomically within Keystone's mutation pipeline
  • --dry-run flag logs what would happen without writing files or updating DB
  • --continue-on-error flag allows partial runs; failed record IDs are collected and printed at the end
  • Offset-based pagination (skip) — safe because processed records change their shape (become file-meta), so they won't appear in a re-run

Why it matters: Prevents re-migrating already-migrated records and provides a safe recovery path on partial failures.


4. Dual GraphQL Fields - field vs fieldResolved

File: packages/keystone/fields/ExternalContent/Implementation.js

What to check:

  • gqlOutputFields() returns two fields: ${path}: String (raw file-meta JSON for Admin UI) and ${path}Resolved: <format type> (deserialized content for API clients)
  • gqlOutputFieldResolvers() only registers a resolver for ${path}Resolved — the raw field is served directly from the DB column
  • resolveInput() receives context and passes context?.authedItem as user to adapter.publicUrl() — needed for adapters that sign URLs per-user
  • Returned object always includes _type: FILE_META_TYPE and meta: { format } so resolveExternalContentValue can deserialize without guessing the format

5. Admin UI - Field Component

File: packages/keystone/fields/ExternalContent/views/Field.js

What to check:

  • isReadOnly = field.isReadOnly !== false — reads directly from the controller instance, not from field.controller
  • initialEditValue stringify logic and useEffect both use typeof resolvedContent === 'string' ? resolvedContent : JSON.stringify(resolvedContent, null, 2) — consistent handling prevents [object Object] in textarea
  • useEffect dependency is [resolvedContent] only — triggers when async resolved content arrives after initial render

6. Input Validation

File: packages/keystone/fieldsUtils/ExternalContent/createExternalDataField.js

What to check:

  • adapter is required — throws immediately with clear message if missing
  • maxSizeBytes validated as a positive finite number
  • batchDelayMs validated as a non-negative finite number
  • isAdminUIReadOnly mapped to adminConfig.isReadOnly and not leaked into the returned field object
  • type: 'ExternalContent' is placed after ...otherProps to prevent the caller from overriding it

Why it matters: Fails fast with clear errors if misconfigured.

Summary by CodeRabbit

  • New Features

    • ExternalContent field: file-backed storage with adapters, configurable formats (json/xml/text), size limits, batching/DataLoader, safe path validation, admin UI editor/cell/controller, GraphQL raw + resolved virtuals.
  • Documentation

    • Detailed ExternalContent README covering usage, config, migration, security, and troubleshooting.
  • Tests

    • Extensive tests for processors, loaders, validation, serialization, resolution, batching, and error handling.
  • Chores

    • Submodule pointer updated; package exports and historical handling extended to include ExternalContent.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new ExternalContent Keystone field type with adapter-backed external storage, processors (json/xml/text), request-scoped batched loader, secure path validation, admin UI views/controllers, GraphQL raw + resolved outputs, utilities/tests/README, and a submodule pointer bump.

Changes

Cohort / File(s) Summary
Field Implementation
packages/keystone/fields/ExternalContent/index.js, packages/keystone/fields/ExternalContent/Implementation.js
New ExternalContent field module and implementation: adapter-backed file-meta persistence, GraphQL raw and ${path}Resolved virtual field, processor selection, input serialization, max-size enforcement, and afterChange cleanup (delete previous files).
Admin UI / Views
packages/keystone/fields/ExternalContent/views/Field.js, packages/keystone/fields/ExternalContent/views/Cell.js, packages/keystone/fields/ExternalContent/views/Controller.js, packages/keystone/fields/ExternalContent/views/utils.js
New controller and React components plus parsing utils: handle file-meta vs legacy inline content, read-only vs editable UI, query fragment selection, and safe download-link rendering.
Field Factory & Registry
packages/keystone/fieldsUtils/ExternalContent/createExternalDataField.js, packages/keystone/fields/index.js, packages/keystone/KSv5v6/v5/registerSchema.js
Factory to create validated ExternalContent field configs (adapter, format, maxSizeBytes, batchDelayMs), register the field export, and map ExternalContent in schema registration.
Loaders & Caching
packages/keystone/fieldsUtils/ExternalContent/FileContentLoader.js, packages/keystone/fieldsUtils/ExternalContent/getOrCreateLoader.js
Request-scoped FileContentLoader with batching/deduplication, adapter-aware local vs cloud reads, ENOENT/404 handling, and helper that caches loaders per-context per-adapter.
Processors & Serialization
packages/keystone/fieldsUtils/ExternalContent/defaultProcessors.js
DEFAULT_PROCESSORS for json, xml, and text including serialize/deserialize semantics, mimetypes, file extensions, and GraphQL type names.
Resolution & Validation Utils
packages/keystone/fieldsUtils/ExternalContent/resolveExternalContentValue.js, .../isFileMeta.js, .../validateFilePath.js, .../constants.js
Utilities to resolve/deserialized content (via loader or direct stream), validate file-meta structure, secure path validation, and FILE_META_TYPE constant.
Utilities Index & Exports
packages/keystone/fieldsUtils/ExternalContent/index.js, packages/keystone/fieldsUtils/index.js, packages/keystone/package.json
Aggregation entrypoints and package exports entry for ./fieldsUtils, re-exporting ExternalContent utilities.
Tests
packages/keystone/fields/ExternalContent/Implementation.spec.js, packages/keystone/fieldsUtils/ExternalContent/*spec.js
Extensive Jest coverage: implementation behavior, loader batching/error paths, processors, validators, path validation, factory validation, DataLoader semantics, size limits, and edge cases.
Docs
packages/keystone/fields/ExternalContent/README.md
Comprehensive README describing usage, storage format, GraphQL semantics, admin UI behavior, migration tooling, security, and examples.
Miscellaneous
apps/rb (git submodule), packages/keystone/plugins/historical.js
apps/rb submodule pointer updated; historical plugin mapping extended to treat ExternalContent as Text; lodash imports changed to single-entry requires.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant AdminUI as Admin UI
  participant Keystone as Keystone Server
  participant Loader as FileContentLoader
  participant Adapter as File Adapter
  participant Storage as Storage/HTTP

  AdminUI->>Keystone: Query <field> and <field>Resolved
  Keystone->>Loader: load(fileMeta) [context-scoped]
  Loader->>Adapter: read / generateUrl (filename, metadata)
  Adapter->>Storage: fetch/read file
  Storage-->>Adapter: bytes / 404 / error
  Adapter-->>Loader: Buffer | null | error
  Loader-->>Keystone: Buffer | null
  Keystone->>Keystone: processor.deserialize(Buffer -> value)
  Keystone-->>AdminUI: resolved content
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through bytes and metadatums bright,
I batched the reads beneath the moonlight,
JSON, XML, and plain-text cheer,
Paths guarded close — no traversal near,
A nibble of files, a carrot-coded delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main feature being added: a new field type for storing data externally, directly aligned with the primary change throughout the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@AleX83Xpert AleX83Xpert added the 🔬 WIP Not intended to be merged right now, it is a work in progress label Mar 30, 2026
@AleX83Xpert AleX83Xpert force-pushed the feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw branch from 598b875 to aca2921 Compare April 1, 2026 09:17
@AleX83Xpert AleX83Xpert force-pushed the feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw branch from f1a0bf9 to 5b4a31b Compare April 2, 2026 05:02
@AleX83Xpert AleX83Xpert force-pushed the feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw branch from 5b4a31b to cdeadf1 Compare April 2, 2026 11:53
@AleX83Xpert AleX83Xpert force-pushed the feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw branch from cdeadf1 to d7bbce1 Compare April 3, 2026 04:18
@AleX83Xpert AleX83Xpert force-pushed the feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw branch from 506c858 to c82ddcf Compare April 3, 2026 12:55
@AleX83Xpert AleX83Xpert force-pushed the feat/condo/doma-13081/field-type-for-store-data-externally-for-billingreceipt-raw branch from 5894686 to fc54960 Compare April 7, 2026 06:58
…typeof for undefined comparisons in ExternalContent
…t to top-level in ExternalContent Implementation
…nConfig in ExternalContentController getQueryFragment
…entLoader by wrapping filename in data object
…king for cloud adapter capabilities instead of local adapter properties
…t null checks in ExternalContentController serialize and deserialize methods
…ENT_FIELD_TYPE_META and change file-meta detection logic
… URL generation and update file metadata with item id in afterChange
…o clarify cloud storage URL handling and access control, replace <oops>Cloud reference with generic cloud storage term
@sonarqubecloud
Copy link
Copy Markdown

// Update file metadata with actual item id for new files
// This allows OBSFilesMiddleware to verify access control
const prevFilename = isFileMeta(prevValue) ? prevValue.filename : null
const nextFilename = isFileMeta(nextValue) ? nextValue.filename : null
Copy link
Copy Markdown
Contributor

@YEgorLu YEgorLu Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1) Above filename is being made with random ID every time. Does that mean that nextFilename === prefFilename situation impossible?

2) If filename changes, we do not delete old file. But if filename did not change, we delete old file.... I don't really get it. We delete same file after it's update or something?

It seems I was mistaken. For some reason I read all "if"s backwards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🐘 BIG No so easy to review changes up to 1300 lines of code 📦 Packages ✋🙂 Review please Comments are resolved, take a look, please

Development

Successfully merging this pull request may close these issues.

5 participants