Extend CacheRuntime phase 3: add dataload/dataprocess support#5798
Extend CacheRuntime phase 3: add dataload/dataprocess support#5798xliuqq wants to merge 5 commits intofluid-cloudnative:masterfrom
Conversation
Signed-off-by: xliuqq <xlzq1992@gmail.com> add imagePullPolicy Signed-off-by: xliuqq <xlzq1992@gmail.com> add test Signed-off-by: xliuqq <xlzq1992@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request implements data operation support for the CacheRuntime, including DataLoad, DataBackup, DataMigrate, and DataProcess. It introduces new API fields in CacheRuntimeClass, refactors the operation reconciliation logic for better reusability, and adds a dedicated Helm chart for data loading. Feedback focuses on correcting the DataOperationSpec enum to include DataProcess, generalizing the CronJob template to remove Alluxio-specific scripts, and fixing a type mismatch in the Helm values. Additionally, several copy-paste errors referencing JuiceFSRuntime were found, and the image selection priority logic needs to be aligned with its documentation.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #5798 +/- ##
==========================================
- Coverage 58.46% 58.17% -0.29%
==========================================
Files 473 478 +5
Lines 32222 32485 +263
==========================================
+ Hits 18839 18899 +60
- Misses 11836 12042 +206
+ Partials 1547 1544 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the CacheRuntime (CacheEngine) data lifecycle support by adding DataLoad/DataProcess integration (with Curvine as an e2e example), introducing a CacheRuntimeClass dataOperationSpecs API surface, and wiring a cache-specific dataloader Helm chart.
Changes:
- Add CacheEngine support for generating Helm values for DataLoad/DataProcess (and explicit “not supported” stubs for DataBackup/DataMigrate).
- Introduce
dataOperationSpecsin CacheRuntimeClass + CRD/OpenAPI updates, plus RBAC adjustments needed to read CacheRuntimeClass. - Add a new
fluid-dataloadercache Helm chart and Curvine GHA e2e updates to exercise DataLoad.
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/diagnose-fluid-curvine.sh | Capture additional pod diagnostics during e2e troubleshooting. |
| test/gha-e2e/curvine/write_job.yaml | Pin pull behavior for busybox in e2e. |
| test/gha-e2e/curvine/test.sh | Add DataLoad creation + completion wait to Curvine e2e flow. |
| test/gha-e2e/curvine/read_job.yaml | Pin pull behavior for busybox in e2e. |
| test/gha-e2e/curvine/minio_create_bucket.yaml | Pin pull behavior for minio/mc in e2e. |
| test/gha-e2e/curvine/minio.yaml | Pin pull behavior for minio in e2e. |
| test/gha-e2e/curvine/dataset.yaml | Update comment/example mount command. |
| test/gha-e2e/curvine/dataload.yaml | New DataLoad CR used by Curvine e2e. |
| test/gha-e2e/curvine/cacheruntimeclass.yaml | Add dataOperationSpecs example for Curvine DataLoad. |
| pkg/ddc/cache/engine/image.go | Derive data-operation image from runtime/runtimeclass worker image. |
| pkg/ddc/cache/engine/error.go | Remove unused “not implemented” helper. |
| pkg/ddc/cache/engine/engine.go | Import cleanup/reorder. |
| pkg/ddc/cache/engine/dataprocess.go | Add CacheEngine DataProcess values generation hook. |
| pkg/ddc/cache/engine/datamigrate.go | Add CacheEngine DataMigrate “not supported” stub. |
| pkg/ddc/cache/engine/dataload.go | Add CacheEngine DataLoad values generation + runtimeclass op spec handling. |
| pkg/ddc/cache/engine/databackup.go | Add CacheEngine DataBackup “not supported” stub. |
| pkg/ddc/cache/engine/data_operation.go | Route operations through shared reconciler + implement CheckRuntimeReady and value-file dispatch. |
| pkg/ddc/base/template_engine_operation.go | Add TemplateEngine wrapper for shared operation reconciler. |
| pkg/ddc/base/operation_lock.go | Generalize dataset operation lock to accept a new OperationEngine interface. |
| pkg/ddc/base/operation.go | Refactor shared operation reconcile logic into EngineOperationReconciler. |
| pkg/dataload/value.go | Extend DataLoadInfo with env/command/args for cache-engine-driven jobs. |
| pkg/controllers/operation_controller.go | Add CacheRuntime shortcut mapping to cache engine implementation. |
| config/crd/bases/data.fluid.io_cacheruntimeclasses.yaml | CRD schema: add dataOperationSpecs. |
| charts/fluid/fluid/templates/role/dataset/rbac.yaml | RBAC: allow reading cacheruntimeclasses. |
| charts/fluid/fluid/templates/role/cache/rbac.yaml | RBAC tweak for cacheruntimeclasses/status. |
| charts/fluid/fluid/crds/data.fluid.io_cacheruntimeclasses.yaml | Helm CRD: add dataOperationSpecs. |
| charts/fluid-dataloader/cache/values.yaml | New cache dataloader chart default values. |
| charts/fluid-dataloader/cache/templates/job.yaml | New Job template for cache DataLoad (once policy). |
| charts/fluid-dataloader/cache/templates/cronjob.yaml | New CronJob template for cache DataLoad (cron policy). |
| charts/fluid-dataloader/cache/Chart.yaml | New cache dataloader Helm chart metadata/deps. |
| api/v1alpha1/openapi_generated.go | OpenAPI generation updates for new CacheRuntimeClass field/type. |
| api/v1alpha1/cacheruntimeclass_types.go | API types: add DataOperationSpecs and DataOperationSpec. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Image string `json:"image,omitempty"` | ||
|
|
||
| // Command for data operation Pod container | ||
| Command []string `json:"command,omitempty"` |
There was a problem hiding this comment.
DataOperationSpec.Command has no validation — if both command and args are empty, the pod launches with the container's default entrypoint, which may produce unexpected behavior for cache engines. Should the API require at least one of command/args to be non-empty when the operation is declared?
There was a problem hiding this comment.
check in genDataLoadValue.
| // SetDataOperationInTargetDataset set status of target dataset to mark the data operation being performed. | ||
| func SetDataOperationInTargetDataset(ctx cruntime.ReconcileRequestContext, operation dataoperation.OperationInterface, engine *TemplateEngine) error { | ||
| func SetDataOperationInTargetDataset(ctx cruntime.ReconcileRequestContext, operation dataoperation.OperationInterface, engine OperationEngine) error { | ||
| targetDataset := ctx.Dataset |
There was a problem hiding this comment.
The TODO comment asks "can we remove this check? dataset binds runtime only when runtime is ready!" — this suggests CheckRuntimeReady may be redundant. For CacheEngine, the check requires both master and worker ready. If worker is still initializing, DataLoad stays Pending indefinitely. Is this the intended behavior, or should the check be relaxed?
There was a problem hiding this comment.
The data operation reconciler gets runtime by dataset.Status.Runtimes, dataset binds runtime when cache runtime is ready in the engine::SetUp . Is this check used when workers re-construct after SetUp ?
cheyang
left a comment
There was a problem hiding this comment.
Thanks for this PR — the DataOperationSpecs declarative pattern on CacheRuntimeClass is a good design, and the EngineOperationReconciler extraction reduces duplication nicely.
Three items that should be fixed before merge:
- Nil-pointer panic in
dataload.goL83 —runtimeClass.Topology.Workeris dereferenced without a nil check (image.go does check). This will panic if Worker is not defined. - envs type mismatch in
values.yaml—envs: {}(map) conflicts with[]Env(slice) in Go and Helm templates. Should beenvs: []. - Missing license headers on all new files (
dataload.go,databackup.go,datamigrate.go,dataprocess.go,image.go,template_engine_operation.go).
Non-blocking suggestions: mixed client sources in processTTL, duplicate DataOperationSpecs iteration, comment placement in image.go, env indentation inconsistency in cronjob.yaml, CClient naming, unused imports in stub files.
Patch test coverage is very low (7%) — would recommend at least unit tests for getDataOperationImage priority logic and the supportDataLoad check path.
Signed-off-by: xliuqq <xlzq1992@gmail.com>
|
cache runtime related tests will be added in next pr. |
Signed-off-by: xliuqq <xlzq1992@gmail.com>
|

Ⅰ. Describe what this PR does
add dataload/dataprocess support, use curvine as an example.
Ⅱ. Does this pull request fix one issue?
a part of #5412
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
added e2e test for curvine dataload.
Ⅳ. Describe how to verify it
as a part of github action test.
Ⅴ. Special notes for reviews