Add support for cgroup-only device nodes#314
Conversation
This change moves adding device nodes to the OCI spec to a new function so as to make future extensions easier to implement. Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2dcb801 to
cde499e
Compare
| var major *int64 | ||
| if d.Major != -1 { | ||
| major = &d.Major | ||
| } | ||
| var minor *int64 | ||
| if d.Minor != -1 { | ||
| minor = &d.Minor | ||
| } |
There was a problem hiding this comment.
It feels as if this is the CORE change here since this SHOULD allow us to inject the cgroup rule even without performing additional checks as long as SOME non-zero device major is specified. This works because we explicitly ignore errors getting device node information if at least the Major and Type is is specified. What we need is the -1 to nil mapping since nil is interpreted as * by the OCI runtime spec.
There was a problem hiding this comment.
Just a mental note... It's a pity we don't have Major and Minor as *int64 instead, so we could differentiate between zero- and omitted values.
There was a problem hiding this comment.
I actually wonder, do we need "safety net' to prevent accidental "allow all devices" setting?
| func (d *DeviceNode) isWildcardDevice() bool { | ||
| if d.Minor == -1 || d.Major == -1 { | ||
| return true | ||
| } |
There was a problem hiding this comment.
Cannot we (and if we can, shouldn't we) always require that the device path is always set to "*" for such a device ? I think it would be less confusing if we didn't allow it, and doing it otherwise also might give the false impression that something might be filled in from the same device on the host...
There was a problem hiding this comment.
My one thought here was whether we wanted to use the path for naming (from a user perspective), but maybe always using * is better. That is to say that a wildcard device ALWAYS has a path that is exactly equal to "*" and MAY include specific major and minor numbers or -1 as a wildcard major or minor number.
|
@elezar This looks fine to me as a feature to have, but I have a few questions about some implementation/design details. |
cde499e to
bdc6b5a
Compare
klihub
left a comment
There was a problem hiding this comment.
This LGTM now. We'll need to update documentation and tests, though.
Yes, I just wanted to get some early feedback before adding more of those. Another question. I suppose that this would justify a spec bump, or would we consider back-porting this to older versions? If we do bump the spec, then we would have to update the minimum version checks. |
Yes, I think this probably warrants a spec version bump. In principle it changes how we semantically interpret the spec. Although now that we require the device name for a wildcard injection to be a literal "*", the practical difference is reduced to a "it would have failed earlier but now it won't and will have a special meaning", and it hard to imagine that someone could possibly rely on the old failing behavior. But better be safe. About backporting (without bumping the Spec version), I think that would not help those who need the new behavior, if/when they ran with an already released runtime version. That would just unexpectedly fail. |
bdc6b5a to
07292c1
Compare
4be965e to
680d858
Compare
This change allows the following device node CDI spec to be considered valid: path: "*" type: "c" major: 195 minor: -1 permissions: "rwm" When this is included in the spec, no device nodes are added to the OCI runtime spec, but the linux resources (cgroups) are updated to allow access to all devices in the same way as the Docker `--device-cgroup-rule` flag. Note that a major or minor number of -1 is equivalent to a * cgroup rule. This is useful when starting a container that needs to be able to create device nodes that do not exist when it is started. Signed-off-by: Evan Lezar <elezar@nvidia.com>
680d858 to
7646b1b
Compare
| v080: requiresV080, | ||
| v100: requiresV100, | ||
| v110: requiresV110, | ||
| v120: requiresV120, |
There was a problem hiding this comment.
Note to self. I still need to add tests for this.
This change proposes a mechanism to allow for the specification of cgroup-only device nodes (See #302). This is represented by a device node with a
pathequal to*. For such a device node, the device node information is NOT injected into the OCI runtime spec, but a cgroup rule is added for the major and minor number of the device. This is similar to the Docker--device-cgroup-ruleargument and means that the device node NEED not exist on the host.In addition, a mojor or minor number of
-1as specified inopencontainers/cgrouphttps://github.com/opencontainers/cgroups/blob/d492e44fbd24c97075c46c33e67ae43b5c7fa0ac/devices/config/device.go#L10 and used inruncis treated as a "wildcard" device matching all major (or minor) numbers.Note that
runcskips device nodes with an empty path: https://github.com/opencontainers/runc/blob/edbed618bff99edaaa9d622f9f8687004fe5e50a/libcontainer/rootfs_linux.go#L984-L988 assuming they are only provided for cgroup access.