Encapsulate Checkpoint internal state#148
Encapsulate Checkpoint internal state#148nojnhuh wants to merge 13 commits intokubernetes-sigs:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: nojnhuh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/cc |
| "sigs.k8s.io/dra-example-driver/internal/profiles" | ||
| ) | ||
|
|
||
| type PreparedClaims map[string]profiles.PreparedDevices |
There was a problem hiding this comment.
type PreparedDevice struct {
drapbv1.Device
ContainerEdits *cdiapi.ContainerEdits
AdminAccess bool
}This does not look like a struct that is intended to be used in an on-disk file format. All it takes is for someone to rename one of these fields and encoding/decoding will break. The current unit test can catch that only for fields which are covered by a test case.
Should we use this opportunity (i.e. this PR) to introduced proper serialization of the internal state with a versioned on-disk format?
See kubernetes/kubernetes#137086 for the machinery, used there for the on-disk metadata information file format. The problem is the same: one process writing a file, another reading it. That's an API, in this case between one version of the example driver and another.
There was a problem hiding this comment.
We might as well do that here.
I'm trying to make the internal version contain a literal k8s.io/kubelet/pkg/apis/dra/v1beta1.Device and tags.cncf.io/container-device-interface/pkg/cdi.ContainerEdits since that's what the driver ultimately uses, but codegen is not liking those at all. The protobuf type has unexported fields which controller-gen tries to access in generated DeepCopy methods. The CDI types are structured kind of weirdly that's either weird to mirror in our API or needs a bunch of manual conversion to handle. To generate code that compiles it seems like we'll need to entirely duplicate all the types for every version and then manually convert from the internal API version to the types that the CDI and kubelet Go interfaces use. At that point I'd almost rather generate no code at all. I lost steam following this all the way through but pushed what I have so far in the newest commit here in case that's informative.
Am I way off base or do we need to accept that this just isn't going to be smooth?
There was a problem hiding this comment.
Does all of that information really need to be persisted?
There was a problem hiding this comment.
I pared down the checkpoints all the way down to just the claim UID and rely on recomputing everything else from the ResourceClaim. I added some comments and refactored a bit to describe how real drivers can handle more complex scenarios.
04676a0 to
240f94e
Compare
Co-authored-by: Jon Huhn <nojnhuh@users.noreply.github.com>
240f94e to
848c141
Compare
|
I plan to squash before merging. /hold |
…into encapsulate_checkpoint # Conflicts: # go.mod # go.sum
pohly
left a comment
There was a problem hiding this comment.
I think this goes in the right direction, but needs documentation of the concept.
| checkpointScheme, | ||
| json.SerializerOptions{ | ||
| Pretty: true, | ||
| Strict: true, |
There was a problem hiding this comment.
"Strict" can bite us. This causes unknown fields during decoding to result in an error, right?
Suppose we need to extend the on-disk format with a field that older releases of the DRA driver don't know about. If we treat API versions like in the rest of Kubernetes, we'd add that field to all API versions and serialize it. Older clients then need to ignore unknown fields, which we can rely upon when adding fields that are okay to be ignored.
The alternative is to add new fields only in a new API version and then only serialize using that version. This immediately prevents downgrading, because older clients won't be able to read the file.
Which brings me to my next point: how do new API versions get rolled out? We need to document it. Probably something like this:
- In driver version x.y, support v1alpha1.
- In driver version x.y+1, support v1beta1, but still serialize as v1alpha1 (thus supporting downgrades).
- In driver version x.y+2, write as v1beta1 and then only support downgrades back to x.y+1.
There was a problem hiding this comment.
The docs suggest it means something different: https://pkg.go.dev/k8s.io/apimachinery@v0.35.4/pkg/runtime/serializer/json#SerializerOptions.Strict
Strict: configures the Serializer to return strictDecodingError's when duplicate fields are present decoding JSON or YAML. Note that enabling this option is not as performant as the non-strict variant, and should not be used in fast paths.
I tried manually editing the checkpoint on disk to add an unknown field and the next NodePrepareResources succeeds, removing the unknown field when it serializes the checkpoint again. I think that's the expected behavior?
- In driver version x.y+1, support v1beta1, but still serialize as v1alpha1 (thus supporting downgrades).
Does it make sense to even define a v1beta1 if the driver only writes v1alpha1? If v1beta1 is added to support some feature, then the driver should start writing v1beta1 right away. Maybe the driver can be more intelligent and continue writing v1alpha1 if it can tell it doesn't need any of the features in v1beta1. I think we can worry about that though once we need a new API version. Added some go docs to describe that process.
There was a problem hiding this comment.
Interesting. In https://pkg.go.dev/sigs.k8s.io/yaml#UnmarshalStrict, "strict" includes "presence of fields in the serialized data unknown to the struct will yield an error".
I tried manually editing the checkpoint on disk to add an unknown field and the next NodePrepareResources succeeds, removing the unknown field when it serializes the checkpoint again. I think that's the expected behavior?
Yes, thanks for checking.
There was a problem hiding this comment.
Does it make sense to even define a v1beta1 if the driver only writes v1alpha1?
I think we can do the opposite and go straight to v1. Basically the driver then commits to only use that version and maybe extend it with new fields, with the caveat that older releases will ignore and drop those. Versioning as "v1" is just future-proofing the storage format because if structural changes are ever going to be needed, they can be done.
There was a problem hiding this comment.
Note this code was pulled from a (very) old version of the DRA driver for GPUs, and we have since relaxed the use of strict for (most) code paths. The example driver should probably do the same: kubernetes-sigs/dra-driver-nvidia-gpu@fb4822f
There was a problem hiding this comment.
Thanks, updated to match. Strict for config, non-strict for checkpoints.
There was a problem hiding this comment.
I think we can do the opposite and go straight to v1. Basically the driver then commits to only use that version and maybe extend it with new fields, with the caveat that older releases will ignore and drop those. Versioning as "v1" is just future-proofing the storage format because if structural changes are ever going to be needed, they can be done.
Makes sense, updated to v1.
69b1bf0 to
ff58c14
Compare
|
/assign |
This PR builds off of #90:
The second commit drops our dependency on k/k by refactoring away from the kubelet's
CheckpointManagerto a pair of functions to read and write checkpoints from the local filesystem directly. Checksums have been dropped from the checkpoint format in favor of safely updating the file by writing a temporary copy and performing an atomic rename.In addition, the checkpoint format has been changed to include an
apiVersionandkindlike other Kubernetes APIs. The new format also includes less data, relying on most container config for a ResourceClaim to be recomputed rather than restored directly from the checkpoint.