Skip to content

Extend CacheRuntime phase 3: add dataload/dataprocess support#5798

Open
xliuqq wants to merge 5 commits intofluid-cloudnative:masterfrom
xliuqq:cache_op
Open

Extend CacheRuntime phase 3: add dataload/dataprocess support#5798
xliuqq wants to merge 5 commits intofluid-cloudnative:masterfrom
xliuqq:cache_op

Conversation

@xliuqq
Copy link
Copy Markdown
Collaborator

@xliuqq xliuqq commented Apr 20, 2026

Ⅰ. Describe what this PR does

add dataload/dataprocess support, use curvine as an example.

Ⅱ. Does this pull request fix one issue?

a part of #5412

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

added e2e test for curvine dataload.

Ⅳ. Describe how to verify it

as a part of github action test.

Ⅴ. Special notes for reviews

Signed-off-by: xliuqq <xlzq1992@gmail.com>

add imagePullPolicy

Signed-off-by: xliuqq <xlzq1992@gmail.com>

add test

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@fluid-e2e-bot
Copy link
Copy Markdown

fluid-e2e-bot Bot commented Apr 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zwwhdls for approval by writing /assign @zwwhdls in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xliuqq xliuqq requested review from Syspretor and cheyang April 20, 2026 07:20
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements data operation support for the CacheRuntime, including DataLoad, DataBackup, DataMigrate, and DataProcess. It introduces new API fields in CacheRuntimeClass, refactors the operation reconciliation logic for better reusability, and adds a dedicated Helm chart for data loading. Feedback focuses on correcting the DataOperationSpec enum to include DataProcess, generalizing the CronJob template to remove Alluxio-specific scripts, and fixing a type mismatch in the Helm values. Additionally, several copy-paste errors referencing JuiceFSRuntime were found, and the image selection priority logic needs to be aligned with its documentation.

Comment thread api/v1alpha1/cacheruntimeclass_types.go Outdated
Comment thread charts/fluid-dataloader/cache/templates/cronjob.yaml Outdated
Comment thread charts/fluid-dataloader/cache/values.yaml Outdated
Comment thread pkg/ddc/cache/engine/data_operation.go Outdated
Comment thread pkg/ddc/cache/engine/databackup.go Outdated
Comment thread pkg/ddc/cache/engine/dataload.go Outdated
Comment thread pkg/ddc/cache/engine/datamigrate.go Outdated
Comment thread pkg/ddc/cache/engine/image.go Outdated
Comment thread pkg/ddc/cache/engine/dataload.go Dismissed
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 6.80851% with 219 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.17%. Comparing base (041354d) to head (a28d00f).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
pkg/ddc/cache/engine/dataload.go 0.00% 116 Missing ⚠️
pkg/ddc/cache/engine/data_operation.go 0.00% 48 Missing ⚠️
pkg/ddc/base/operation.go 15.78% 16 Missing ⚠️
pkg/ddc/cache/engine/image.go 0.00% 14 Missing ⚠️
pkg/ddc/cache/engine/dataprocess.go 0.00% 11 Missing ⚠️
pkg/ddc/cache/engine/databackup.go 0.00% 6 Missing ⚠️
pkg/ddc/cache/engine/datamigrate.go 0.00% 6 Missing ⚠️
pkg/ddc/base/operation_lock.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5798      +/-   ##
==========================================
- Coverage   58.46%   58.17%   -0.29%     
==========================================
  Files         473      478       +5     
  Lines       32222    32485     +263     
==========================================
+ Hits        18839    18899      +60     
- Misses      11836    12042     +206     
+ Partials     1547     1544       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

xliuqq added 2 commits April 20, 2026 20:53
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the CacheRuntime (CacheEngine) data lifecycle support by adding DataLoad/DataProcess integration (with Curvine as an e2e example), introducing a CacheRuntimeClass dataOperationSpecs API surface, and wiring a cache-specific dataloader Helm chart.

Changes:

  • Add CacheEngine support for generating Helm values for DataLoad/DataProcess (and explicit “not supported” stubs for DataBackup/DataMigrate).
  • Introduce dataOperationSpecs in CacheRuntimeClass + CRD/OpenAPI updates, plus RBAC adjustments needed to read CacheRuntimeClass.
  • Add a new fluid-dataloader cache Helm chart and Curvine GHA e2e updates to exercise DataLoad.

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tools/diagnose-fluid-curvine.sh Capture additional pod diagnostics during e2e troubleshooting.
test/gha-e2e/curvine/write_job.yaml Pin pull behavior for busybox in e2e.
test/gha-e2e/curvine/test.sh Add DataLoad creation + completion wait to Curvine e2e flow.
test/gha-e2e/curvine/read_job.yaml Pin pull behavior for busybox in e2e.
test/gha-e2e/curvine/minio_create_bucket.yaml Pin pull behavior for minio/mc in e2e.
test/gha-e2e/curvine/minio.yaml Pin pull behavior for minio in e2e.
test/gha-e2e/curvine/dataset.yaml Update comment/example mount command.
test/gha-e2e/curvine/dataload.yaml New DataLoad CR used by Curvine e2e.
test/gha-e2e/curvine/cacheruntimeclass.yaml Add dataOperationSpecs example for Curvine DataLoad.
pkg/ddc/cache/engine/image.go Derive data-operation image from runtime/runtimeclass worker image.
pkg/ddc/cache/engine/error.go Remove unused “not implemented” helper.
pkg/ddc/cache/engine/engine.go Import cleanup/reorder.
pkg/ddc/cache/engine/dataprocess.go Add CacheEngine DataProcess values generation hook.
pkg/ddc/cache/engine/datamigrate.go Add CacheEngine DataMigrate “not supported” stub.
pkg/ddc/cache/engine/dataload.go Add CacheEngine DataLoad values generation + runtimeclass op spec handling.
pkg/ddc/cache/engine/databackup.go Add CacheEngine DataBackup “not supported” stub.
pkg/ddc/cache/engine/data_operation.go Route operations through shared reconciler + implement CheckRuntimeReady and value-file dispatch.
pkg/ddc/base/template_engine_operation.go Add TemplateEngine wrapper for shared operation reconciler.
pkg/ddc/base/operation_lock.go Generalize dataset operation lock to accept a new OperationEngine interface.
pkg/ddc/base/operation.go Refactor shared operation reconcile logic into EngineOperationReconciler.
pkg/dataload/value.go Extend DataLoadInfo with env/command/args for cache-engine-driven jobs.
pkg/controllers/operation_controller.go Add CacheRuntime shortcut mapping to cache engine implementation.
config/crd/bases/data.fluid.io_cacheruntimeclasses.yaml CRD schema: add dataOperationSpecs.
charts/fluid/fluid/templates/role/dataset/rbac.yaml RBAC: allow reading cacheruntimeclasses.
charts/fluid/fluid/templates/role/cache/rbac.yaml RBAC tweak for cacheruntimeclasses/status.
charts/fluid/fluid/crds/data.fluid.io_cacheruntimeclasses.yaml Helm CRD: add dataOperationSpecs.
charts/fluid-dataloader/cache/values.yaml New cache dataloader chart default values.
charts/fluid-dataloader/cache/templates/job.yaml New Job template for cache DataLoad (once policy).
charts/fluid-dataloader/cache/templates/cronjob.yaml New CronJob template for cache DataLoad (cron policy).
charts/fluid-dataloader/cache/Chart.yaml New cache dataloader Helm chart metadata/deps.
api/v1alpha1/openapi_generated.go OpenAPI generation updates for new CacheRuntimeClass field/type.
api/v1alpha1/cacheruntimeclass_types.go API types: add DataOperationSpecs and DataOperationSpec.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/ddc/cache/engine/datamigrate.go Outdated
Comment thread pkg/ddc/cache/engine/databackup.go Outdated
Comment thread pkg/ddc/cache/engine/dataload.go
Comment thread pkg/ddc/cache/engine/dataload.go
Comment thread pkg/ddc/cache/engine/dataload.go
Comment thread pkg/ddc/base/operation.go
Comment thread charts/fluid-dataloader/cache/values.yaml Outdated
Comment thread pkg/ddc/cache/engine/dataprocess.go Outdated
Comment thread pkg/ddc/cache/engine/dataload.go Outdated
Comment thread charts/fluid-dataloader/cache/values.yaml Outdated
Comment thread pkg/ddc/cache/engine/dataload.go
Comment thread pkg/ddc/base/operation.go Outdated
Comment thread pkg/controllers/operation_controller.go
Comment thread pkg/ddc/cache/engine/dataload.go Outdated
Comment thread pkg/ddc/cache/engine/image.go Outdated
Comment thread charts/fluid-dataloader/cache/templates/cronjob.yaml
Image string `json:"image,omitempty"`

// Command for data operation Pod container
Command []string `json:"command,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataOperationSpec.Command has no validation — if both command and args are empty, the pod launches with the container's default entrypoint, which may produce unexpected behavior for cache engines. Should the API require at least one of command/args to be non-empty when the operation is declared?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check in genDataLoadValue.

// SetDataOperationInTargetDataset set status of target dataset to mark the data operation being performed.
func SetDataOperationInTargetDataset(ctx cruntime.ReconcileRequestContext, operation dataoperation.OperationInterface, engine *TemplateEngine) error {
func SetDataOperationInTargetDataset(ctx cruntime.ReconcileRequestContext, operation dataoperation.OperationInterface, engine OperationEngine) error {
targetDataset := ctx.Dataset
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment asks "can we remove this check? dataset binds runtime only when runtime is ready!" — this suggests CheckRuntimeReady may be redundant. For CacheEngine, the check requires both master and worker ready. If worker is still initializing, DataLoad stays Pending indefinitely. Is this the intended behavior, or should the check be relaxed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data operation reconciler gets runtime by dataset.Status.Runtimes, dataset binds runtime when cache runtime is ready in the engine::SetUp . Is this check used when workers re-construct after SetUp ?

Comment thread pkg/ddc/cache/engine/dataload.go Outdated
Comment thread pkg/ddc/base/operation.go Outdated
Comment thread pkg/ddc/cache/engine/databackup.go
Copy link
Copy Markdown
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR — the DataOperationSpecs declarative pattern on CacheRuntimeClass is a good design, and the EngineOperationReconciler extraction reduces duplication nicely.

Three items that should be fixed before merge:

  1. Nil-pointer panic in dataload.go L83 — runtimeClass.Topology.Worker is dereferenced without a nil check (image.go does check). This will panic if Worker is not defined.
  2. envs type mismatch in values.yamlenvs: {} (map) conflicts with []Env (slice) in Go and Helm templates. Should be envs: [].
  3. Missing license headers on all new files (dataload.go, databackup.go, datamigrate.go, dataprocess.go, image.go, template_engine_operation.go).

Non-blocking suggestions: mixed client sources in processTTL, duplicate DataOperationSpecs iteration, comment placement in image.go, env indentation inconsistency in cronjob.yaml, CClient naming, unused imports in stub files.

Patch test coverage is very low (7%) — would recommend at least unit tests for getDataOperationImage priority logic and the supportDataLoad check path.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@xliuqq
Copy link
Copy Markdown
Collaborator Author

xliuqq commented Apr 21, 2026

cache runtime related tests will be added in next pr.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 21, 2026

Quality Gate Passed Quality Gate passed

Issues
11 New issues
1 Accepted issue

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

@xliuqq xliuqq requested a review from cheyang April 22, 2026 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants