Skip to content

fix(compose): fix ToolInfo.ParamsOneOf loss and eliminate deep-copy in checkpoint serialization#1007

Open
shentongmartin wants to merge 1 commit intoalpha/09from
fix/checkpoint-serialization
Open

fix(compose): fix ToolInfo.ParamsOneOf loss and eliminate deep-copy in checkpoint serialization#1007
shentongmartin wants to merge 1 commit intoalpha/09from
fix/checkpoint-serialization

Conversation

@shentongmartin
Copy link
Copy Markdown
Contributor

Summary

Problem Solution
ToolInfo.ParamsOneOf lost during checkpoint serialization — InternalSerializer.internalMarshal passes a non-addressable value to json.Marshal, preventing (*ToolInfo).MarshalJSON() (pointer receiver) from being dispatched Ensure a pointer is always passed to json.Marshal when the reflect.Value is not addressable, so custom pointer-receiver marshalers are correctly invoked
Deep-copy via InternalSerializer was inherently fragile — any type with unexported fields or custom marshaling could silently lose data Replace deepCopyState with mutex-based synchronization: acquire state.mu around checkPointer.set() to get mutual exclusion with concurrent ProcessState calls, eliminating the need for serialization-based deep-copy entirely

Key Insight

The root cause is a subtle interaction between Go's reflect package and encoding/json:

internalMarshal extracted a struct value via rv.Interface() and passed it to json.Marshal. When rv is not addressable (e.g., it came from a map value or was created via reflect.New(...).Elem()-style paths that lost addressability), Go cannot obtain a pointer to it. Since MarshalJSON is defined on *ToolInfo (pointer receiver), json.Marshal never sees the custom marshaler and falls back to default struct encoding — which silently drops ParamsOneOf (a field with unexported sub-fields only accessible through the custom method).

The second insight is architectural: deep-copying state via marshal/unmarshal was introduced to prevent races between checkpoint persistence and concurrent ProcessState mutations. But ProcessState already acquires state.mu, and convertCheckPoint only reads cp.Channels and cp.Inputs (not cp.State). Therefore, acquiring the same mutex around checkPointer.set() provides the same mutual exclusion guarantee without relying on serialization fidelity.

Changed Files

  • internal/serialization/serialization.go — in internalMarshal, check addressability; if not addressable, create a new pointer and set the value before calling json.Marshal
  • compose/graph_run.go — remove deepCopyState function; acquire state.mu.Lock() around checkPointer.set(cp) in runPregel

概要

问题 解决方案
checkpoint 序列化时 ToolInfo.ParamsOneOf 丢失 — InternalSerializer.internalMarshal 传入不可寻址的值给 json.Marshal,导致 (*ToolInfo).MarshalJSON()(指针接收者)无法被调用 确保传入 json.Marshal 的始终是指针,使自定义指针接收者的 marshaler 能被正确派发
基于 InternalSerializer 的深拷贝本质上脆弱 — 任何包含未导出字段或自定义序列化的类型都可能静默丢失数据 用互斥锁同步替代 deepCopyState:在 checkPointer.set() 周围获取 state.mu 锁,与并发的 ProcessState 调用互斥,从而完全消除对序列化深拷贝的依赖

核心洞察

根因是 Go 的 reflect 包与 encoding/json 之间的微妙交互:

internalMarshal 通过 rv.Interface() 提取结构体值并传给 json.Marshal。当 rv 不可寻址时(例如来自 map 值、或经过丧失可寻址性的 reflect 路径),Go 无法获取其指针。由于 MarshalJSON 定义在 *ToolInfo(指针接收者)上,json.Marshal 看不到自定义 marshaler,回退到默认结构体编码 — 静默丢弃 ParamsOneOf(一个仅通过自定义方法可访问未导出子字段的字段)。

第二个洞察是架构层面的:通过 marshal/unmarshal 深拷贝 state 是为了防止 checkpoint 持久化与并发 ProcessState 修改之间的竞争。但 ProcessState 已经获取 state.mu,且 convertCheckPoint 只读取 cp.Channelscp.Inputs(不读 cp.State)。因此在 checkPointer.set() 周围获取同一把锁即可提供相同的互斥保证,无需依赖序列化的正确性。

变更文件

  • internal/serialization/serialization.go — 在 internalMarshal 中检查可寻址性;不可寻址时创建新指针并设置值后再调用 json.Marshal
  • compose/graph_run.go — 删除 deepCopyState 函数;在 runPregel 中的 checkPointer.set(cp) 周围获取 state.mu.Lock()

… InternalSerializer

When InternalSerializer marshals a struct value that implements
json.Marshaler via pointer receiver (e.g. *ToolInfo), rv.Interface()
produces a non-addressable copy. json.Marshal then cannot call the
pointer method and falls back to default struct encoding, which skips
unexported fields — causing ParamsOneOf data loss after deepCopyState
during interrupt/resume.

Fix: pass a pointer to json.Marshal by using rv.Addr() when addressable,
or copying into reflect.New() otherwise.

Change-Id: Ib325f5cbb1f97271f1609d8f29b9669e9ec01d60
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (alpha/09@58d7b8d). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             alpha/09    #1007   +/-   ##
===========================================
  Coverage            ?   82.97%           
===========================================
  Files               ?      162           
  Lines               ?    22040           
  Branches            ?        0           
===========================================
  Hits                ?    18288           
  Misses              ?     2529           
  Partials            ?     1223           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@shentongmartin shentongmartin force-pushed the fix/checkpoint-serialization branch 2 times, most recently from 6c1d86a to 58b8751 Compare April 30, 2026 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant