Commit 09724fe

add ufs update and inplace update
Signed-off-by: xliuqq <xlzq1992@gmail.com>
1 parent 64729e4 commit 09724fe

3 files changed: +54 -10 lines changed
proposals/runtime/v1.1.0_extend_cache_runtime/full_cache_runtime.md

Lines changed: 34 additions & 5 deletions
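@@ -83,15 +83,22 @@ type MountUFS struct {
 }
 ```

-The processing flow of the Curvine Cache Runtime is shown in the figure below. This work focuses on:
+The processing flow of the Curvine Cache Runtime during the SetUp stage is shown in the figure below:
+
+![img](./pics/curvine_integration.jpeg)
+
+The SetUp stage involves the following tasks:

 1. Provide a startup script that converts the RuntimeConfig supplied by Fluid into the configuration file used by Curvine;
 - The plan is to define the Curvine configuration file in the format required by go template and fill in the values by substitution;
 1. Add a Mount UFS step: once the Master StatefulSet has started, enter the Master Pod and run the cv mount operation;
 - **The mount operation is defined in the CacheRuntimeClass, which specifies the command to run in the Pod of a given role (such as Master), with the RuntimeConfig file as its argument.**
 - For the JuiceFS cache system, no separate mount step is needed; simply leave it undefined in the CacheRuntimeClass;
+1. For Worker/Client components, the context provided at creation time still lacks the Master Service information they need to communicate with the Master.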
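
As a rough illustration of the first task in the list above, the following Go sketch renders a configuration file from a go template. The `RuntimeConfig` fields, the template keys, and the `RenderConfig` helper are all assumptions made for this example; they are not Fluid's actual RuntimeConfig structure or Curvine's real configuration schema.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// RuntimeConfig is a hypothetical, simplified stand-in for the configuration
// that Fluid hands to the startup script.
type RuntimeConfig struct {
	MasterHost string
	MasterPort int
	CacheDirs  []string
}

// curvineTemplate is an illustrative template; the section and key names are
// assumptions, not Curvine's real configuration keys.
const curvineTemplate = `[master]
hostname = "{{ .MasterHost }}"
rpc_port = {{ .MasterPort }}

[worker]
data_dir = [{{ range $i, $d := .CacheDirs }}{{ if $i }}, {{ end }}"{{ $d }}"{{ end }}]
`

// RenderConfig substitutes the RuntimeConfig values into the template text.
func RenderConfig(tmplText string, cfg RuntimeConfig) (string, error) {
	tmpl, err := template.New("curvine").Parse(tmplText)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, cfg); err != nil {
		return "", fmt.Errorf("render template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	out, err := RenderConfig(curvineTemplate, RuntimeConfig{
		MasterHost: "curvine-master-0",
		MasterPort: 9000, // arbitrary port chosen for the example
		CacheDirs:  []string{"/cache/a", "/cache/b"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

A template that references a field missing from the struct fails at render time, so a mismatch between the template and the RuntimeConfig would surface in the startup script rather than as a silently incomplete configuration file.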
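
-![img](./pics/curvine_integration.jpeg)
+In addition, if the mount path of the Dataset is modified, a UFS update must be performed in the Sync stage; the logic can follow the existing TemplateEngine implementation.
+
+- The Command defined in MountUfs must be able to perform the required umount and mount operations based on the current mount information and the target mount information.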
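
One way to read the requirement above is that the Command has to derive a list of umount and mount steps from two mount tables. The Go sketch below shows that comparison; `MountOp`, `PlanUFSUpdate`, and the map-based representation of mounts are hypothetical names for illustration and are not taken from Fluid or Curvine.

```go
package main

import (
	"fmt"
	"sort"
)

// MountOp is a hypothetical description of one step the MountUfs Command
// would carry out.
type MountOp struct {
	Action    string // "umount" or "mount"
	MountPath string // path inside the cache system
	UFSPath   string // backing UFS address; empty for umount
}

// PlanUFSUpdate compares the current and target mounts (mount path -> UFS path).
// Paths that disappeared or point to a different UFS are unmounted first;
// new or changed paths are mounted afterwards. Each group is sorted by path
// so the plan is deterministic.
func PlanUFSUpdate(current, target map[string]string) []MountOp {
	var umounts, mounts []MountOp
	for path, ufs := range current {
		if want, ok := target[path]; !ok || want != ufs {
			umounts = append(umounts, MountOp{Action: "umount", MountPath: path})
		}
	}
	for path, ufs := range target {
		if have, ok := current[path]; !ok || have != ufs {
			mounts = append(mounts, MountOp{Action: "mount", MountPath: path, UFSPath: ufs})
		}
	}
	byPath := func(ops []MountOp) {
		sort.Slice(ops, func(i, j int) bool { return ops[i].MountPath < ops[j].MountPath })
	}
	byPath(umounts)
	byPath(mounts)
	return append(umounts, mounts...)
}

func main() {
	current := map[string]string{"/data": "s3://bucket/old", "/logs": "s3://bucket/logs"}
	target := map[string]string{"/data": "s3://bucket/new", "/models": "s3://bucket/models"}
	for _, op := range PlanUFSUpdate(current, target) {
		fmt.Println(op.Action, op.MountPath, op.UFSPath)
	}
}
```

Running every umount before any mount is a design choice in this sketch: it lets a changed path be remounted without colliding with its old mount. Unchanged mounts produce no steps at all.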

 ### 2. Define a standard API to support DataOperation

@@ -131,7 +138,7 @@ type DataOperator interface {

 <img src="pics/state_transform.jpeg" alt="img" style="zoom:80%;" />

-In the core implementation, the difference from TemplateEngine lies in the generation of the Helm files; that is, the following interface must be implemented
+In the core implementation, the difference from TemplateEngine lies in the generation of the files that Helm uses in the Executing stage; that is, the following interface must be implemented

 ```go
 // DataOperatorYamlGenerator is the implementation of DataOperator interface for runtime engine.
@@ -145,13 +152,35 @@ type DataOperatorYamlGenerator interface {
 - Define the corresponding Pod through the newly added DataOperationSpec, then start it and run the data operation command; the Fluid DataOperation configuration is mounted at the file /etc/fluid/config/dataop;


+
 ### 3. Support In-Place Upgrade and ReBuild

-In-place upgrade on version update:
+Fluid plans to replace the StatefulSet currently used by the Cache Runtime with a capability similar to AdvancedStatefulSet in OpenKruise. Because AdvancedStatefulSet itself supports in-place upgrade, this work consists of reviewing the Cache Runtime lifecycle, identifying the required changes, and implementing support for in-place upgrade and cache rebuild.

+The Reconcile flow of Fluid for the Cache Runtime is shown on the far left of the figure below:

+![img](pics/inplace_add_funcs.jpg)

-Cache rebuild on configuration update:
+The Cache Runtime abstracts each component of the cache system (the Master/Worker/Client Component) behind the ComponentHelper interface, which is defined as follows:
+
+- Component destruction is managed through OwnerReference, so no interface method needs to be defined for it.
+
+```go
+type ComponentHelper interface {
+	// reconcile to create component workload
+	Reconciler(ctx context.Context, component *common.CacheRuntimeComponentValue) error
+	// create RuntimeComponentStatus according to component workload status
+	ConstructComponentStatus(ctx context.Context, component *common.CacheRuntimeComponentValue) (datav1alpha1.RuntimeComponentStatus, error)
+	// get TopologyConfig according to component workload spec, will be recorded in the Runtime ConfigMap
+	GetComponentTopologyInfo(ctx context.Context, component *common.CacheRuntimeComponentValue) (common.TopologyConfig, error)
+	// check whether the component exists, currently unused
+	CheckComponentExist(ctx context.Context, component *common.CacheRuntimeComponentValue) (bool, error)
+	// clean up orphaned resources, currently unused
+	CleanupOrphanedComponentResources(ctx context.Context, component *common.CacheRuntimeComponentValue) error
+}
+```

+Therefore, the first task is to implement the interface above using AdvancedStatefulSet.

+Second, the current CacheRuntime framework modifies the resource objects of each Component (replica count, environment variables, and so on) by changing the CacheRuntimeSpec definition. When the Spec changes, a new syncCacheRuntimeSpec function must be added to the Sync function to compare the component's current AdvancedStatefulSet with the Spec definition and, by modifying the corresponding AdvancedStatefulSet fields, achieve in-place update and upgrade (a capability provided by the AdvancedStatefulSet controller). A new checkRuntimeHealthy function is also added to update the CacheRuntimeStatus field from the latest component state (such as the desired and available replica counts).
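
The two functions named above could look roughly like the following Go sketch. It uses plain structs instead of Kubernetes API objects so that it runs on its own; `ComponentSpec`, `Workload`, `ComponentStatus`, and the function signatures are assumptions for illustration, not the types or signatures that Fluid or OpenKruise define.

```go
package main

import "fmt"

// ComponentSpec is a hypothetical, trimmed-down view of what the
// CacheRuntimeSpec declares for one component.
type ComponentSpec struct {
	Replicas int32
	Image    string
	Env      map[string]string
}

// Workload is a hypothetical stand-in for the component's AdvancedStatefulSet.
type Workload struct {
	Replicas          int32
	Image             string
	Env               map[string]string
	AvailableReplicas int32
}

// ComponentStatus is a hypothetical slice of CacheRuntimeStatus.
type ComponentStatus struct {
	DesiredReplicas   int32
	AvailableReplicas int32
	Healthy           bool
}

func equalEnv(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// syncCacheRuntimeSpec copies every field that differs from the spec onto the
// workload and reports whether anything changed. In a real controller the
// changed object would then be submitted to the API server, and the workload
// controller would carry out the in-place update.
func syncCacheRuntimeSpec(spec ComponentSpec, w *Workload) bool {
	changed := false
	if w.Replicas != spec.Replicas {
		w.Replicas = spec.Replicas
		changed = true
	}
	if w.Image != spec.Image {
		w.Image = spec.Image
		changed = true
	}
	if !equalEnv(w.Env, spec.Env) {
		w.Env = make(map[string]string, len(spec.Env))
		for k, v := range spec.Env {
			w.Env[k] = v
		}
		changed = true
	}
	return changed
}

// checkRuntimeHealthy derives a status from the workload's replica counts.
func checkRuntimeHealthy(w Workload) ComponentStatus {
	return ComponentStatus{
		DesiredReplicas:   w.Replicas,
		AvailableReplicas: w.AvailableReplicas,
		Healthy:           w.AvailableReplicas >= w.Replicas,
	}
}

func main() {
	w := &Workload{Replicas: 2, Image: "curvine:v1", AvailableReplicas: 2}
	spec := ComponentSpec{Replicas: 3, Image: "curvine:v2", Env: map[string]string{"LOG_LEVEL": "info"}}
	fmt.Println("changed:", syncCacheRuntimeSpec(spec, w))
	fmt.Printf("status: %+v\n", checkRuntimeHealthy(*w))
}
```

Returning a boolean from the sync step lets the caller skip the update call when the workload already matches the Spec, which keeps repeated reconciles idempotent.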

roadmap/v1.1.0_extend_cache_runtime .md

Lines changed: 20 additions & 5 deletions
@@ -31,8 +31,8 @@ Extend generic cache runtime interface to achieve:

 | Task | Description | Deliverable |
 | ---- | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| 1.1 | Integrate Curvine with cache runtime and improve the implementation code of cache runtime | Cache Runtime supports the usage scenarios of Curvine |
-| 1.2 | Write cache runtime unit test code | 80%+ unit test coverage |
+| 1.1 | Adjust the implementation code of cache runtime to integrate Curvine seamlessly | Cache Runtime supports the usage scenarios of Curvine |
+| 1.2 | Write cache runtime unit test code | 75%+ unit test coverage |
 | 1.3 | Write cache runtime e2e test code | E2E test for Curvine with cache runtime |
 | 1.4 | Document how to use Cache Runtime for Cache System | Docs for "Cache System Integration for Cache Runtime with Curvine Example" |

@@ -44,7 +44,7 @@ Objective: Cache Runtime supports DataOperation (including DataLoad, DataProcess
 | ---- | -------------------------------------------------------- | ------------------------------------------------------------ |
 | 1.1 | Support DataLoad for Cache Runtime | Cache Runtime supports DataLoad of Curvine |
 | 1.2 | Support DataProcess for Cache Runtime | Cache Runtime supports DataProcess of Curvine |
-| 1.3 | Write cache runtime unit test code | 80%+ unit test coverage |
+| 1.3 | Write cache runtime unit test code | 75%+ unit test coverage |
 | 1.4 | Write cache runtime e2e test code | E2E test for Curvine with cache runtime |
 | 1.5 | Document how to implement DataOperation for Cache System | Docs section: "Implement DataOperation for Cache System using Cache Runtime". |

@@ -63,9 +63,24 @@ Objective: To support in-place upgrades during the version upgrade of the cachin



-## 3. Dependencies
+## 3. Implementation Notes

-- No new external service dependencies at build time.
+For further details of the design, please refer to [the proposal](../proposals/runtime/v1.1.0_extend_cache_runtime/full_cache_runtime.md).
+
+### 3.1 Technology Choices
+
+**Language**: Go, consistent with the Fluid codebase.
+
+**Test Framework**: consistent with the current Fluid implementation.
+
+- Unit tests will use Ginkgo + Gomega + Gomonkey.
+- Kind-e2e will use Shell + YAML; fluid-e2e will use Prow + Python.
+
+
+
+### 3.2 Dependencies
+
+No new external service dependencies at build time.