runtime-rs: port patches from go shim to rust shim#54

Closed
TheRayquaza wants to merge 3 commits into datadog from mateo.lelong/datadog-rust-ports

TheRayquaza commented Apr 8, 2026

Summary

  • Netkit endpoint support: Port the Go runtime's netkit endpoint to the Rust shim. Adds NetkitEndpoint modeled after VethEndpoint with L3-mode detection (missing MAC address → clear error).
  • CPU shares fallback: When no CPU quota or cpuset is set, calculate vCPUs from CPU shares (shares / 1024), mirroring Go's CalculateCPUsF().
  • Block device annotation mounts: Parse io.katacontainers.volume.block-mounts annotation and convert matching volumeDevices into agent Storage objects, enabling block device passthrough via annotation.

Motivation

These are direct ports of existing Go runtime patches to keep the Rust shim at feature parity with the Go runtime for Datadog's use cases.

Test plan

  • Netkit endpoint: container with netkit interface starts without panic; L3 device without MAC returns clear error
  • CPU shares: pod with only resources.requests.cpu (no limit) gets correct vCPU count
  • Block mount annotation: pod with io.katacontainers.volume.block-mounts annotation correctly provisions block storage
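
As a rough illustration of the first test-plan item, the L3 guard can be pictured as a check that refuses to build an endpoint when the link reports no hardware address. This is a sketch, not the code from this PR: `require_mac`, `EndpointError`, and the decision to treat only an empty string as "missing" are assumptions made for illustration.

```rust
use std::fmt;

/// Hypothetical error type; the real shim's error handling may differ.
#[derive(Debug, PartialEq)]
enum EndpointError {
    MissingMac { iface: String },
}

impl fmt::Display for EndpointError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EndpointError::MissingMac { iface } => write!(
                f,
                "netkit interface {} has no MAC address (L3 mode?); cannot create endpoint",
                iface
            ),
        }
    }
}

/// Hypothetical guard: return the MAC if present, otherwise a descriptive error.
/// Assumption: "missing" means an empty or whitespace-only string.
fn require_mac<'a>(iface: &str, hw_addr: &'a str) -> Result<&'a str, EndpointError> {
    if hw_addr.trim().is_empty() {
        return Err(EndpointError::MissingMac {
            iface: iface.to_string(),
        });
    }
    Ok(hw_addr)
}

fn main() {
    // A populated hardware address passes the guard.
    println!("{:?}", require_mac("eth0", "00:00:00:00:00:00"));
    // An empty hardware address takes the error path.
    if let Err(e) = require_mac("eth0", "") {
        println!("{}", e);
    }
}
```

Returning a typed error rather than panicking is what lets the caller surface a clear message for an L3-mode device.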

@TheRayquaza TheRayquaza force-pushed the mateo.lelong/datadog-rust-ports branch from b46e687 to bca9f07 Compare April 8, 2026 15:56

TheRayquaza commented Apr 9, 2026

🧪 Tests - Rust Shim Patch

Nodes tested: us3-staging-dog-arbok-90eec5c94014646d000003
Runtime class: kata-qemu-runtime-rs · CNI: Cilium + netkit

| Patch | Feature | Test | Result | Verdict |
| --- | --- | --- | --- | --- |
| b46e6878c | CPU shares fallback | A1 (shares→2 vCPU) | ✅ nproc=2 | ✅ PASS |
| b46e6878c | CPU shares fallback | A2 (with limits) | ✅ nproc=2 | ✅ PASS |
| f09414eb6 | Netkit endpoint | B1 | ✅ Running in ~2s | ✅ PASS |
| b46e6878c | Block annotation mounts | C1 (PVC→FS) | ✅ Pod Ready / Can write to FS | ✅ PASS |
| b46e6878c | Block annotation mounts | C2 (JSON → error) | ✅ clean parse error | ✅ PASS |

TheRayquaza commented Apr 9, 2026

Test B1 - Netkit endpoint (pod starts + routing check)

  • Patch: f09414eb6
  • Expected: pod reaches Running, kata shim logs "netkit network interface found: eth0"

apiVersion: v1
kind: Pod
metadata:
  name: kata-val-b2
  namespace: default
spec:
  nodeName: us3-staging-dog-arbok-90eec5c94014646d000003
  runtimeClassName: kata-qemu-runtime-rs
  containers:
  - name: test
    image: registry.ddbuild.io/images/base/gbi-ubuntu_2204:release
    command: ["sleep", "3600"]

Verify:

  kubectl apply -f kata-val-b2-netkit.yaml
  kubectl wait --for=condition=Ready pod/kata-val-b2 --timeout=120s

  # Check routing table inside VM
  kubectl exec kata-val-b2 -- cat /proc/net/route

  # Check kata shim logs on node for netkit detection
  ssh us3-staging-dog-arbok-90eec5c94014646d000003 \
    "journalctl -u containerd --since '5 minutes ago' | grep -i netkit | tail -10"

  kubectl delete pod kata-val-b2 --ignore-not-found

Pass signals:

  • Pod reaches Running - netkit endpoint created without L3 guard firing
  • Shim log contains field_type="netkit" - correct code path exercised

root@us3-staging-dog-arbok-90eec5c94014646d000003:/home/ddeng# journalctl -u containerd --since '5 minutes ago' | grep -i netkit | tail -10
Apr 09 12:25:53 us3-staging-dog-arbok-90eec5c94014646d000003 kata[51438]: netkit network interface found: eth0
Apr 09 12:25:53 us3-staging-dog-arbok-90eec5c94014646d000003 kata[51438]: network info NetworkInfoFromLink { interface: Interface { device: "eth0", name: "eth0", ip_addresses: [IPAddress { family: V4, address: "10.192.210.50", mask: "32" }], mtu: 1500, hw_addr: "00:00:00:00:00:00", device_path: "", field_type: "netkit", raw_flags: 128 }, neighs: [], routes: [Route { dest: "", gateway: "10.192.210.41", device: "eth0", source: "", scope: 0, family: V4, flags: 0, mtu: 1500 }, Route { dest: "10.192.210.41", gateway: "", device: "eth0", source: "", scope: 253, family: V4, flags: 0, mtu: 0 }, Route { dest: "", gateway: "10.192.210.41", device: "eth0", source: "", scope: 0, family: V4, flags: 0, mtu: 1500 }, Route { dest: "10.192.210.41", gateway: "", device: "eth0", source: "", scope: 253, family: V4, flags: 0, mtu: 0 }] }

  • /proc/net/route shows a default gateway entry

root@kata-val-b2:/# cat /proc/net/route
Iface   Destination     Gateway         Flags   RefCnt  Use     Metric  Mask            MTU     Window  IRTT
eth0    00000000        29D2C00A        0003    0       0       0       00000000        0       0       0
eth0    29D2C00A        00000000        0005    0       0       0       FFFFFFFF        0       0       0
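
The address columns of /proc/net/route are hexadecimal in host byte order, so `29D2C00A` is not immediately recognizable as the `10.192.210.41` gateway reported in the shim log. A small sketch of the decoding, assuming the table was produced on a little-endian guest (x86_64 or typical aarch64); `decode_route_addr` is a hypothetical helper, not part of the shim:

```rust
use std::net::Ipv4Addr;

/// Decode one address column (Destination, Gateway or Mask) of /proc/net/route.
/// Assumption: the file was written by a little-endian kernel, so the printed
/// hex value has the address octets in reverse order.
fn decode_route_addr(hex: &str) -> Option<Ipv4Addr> {
    let raw = u32::from_str_radix(hex, 16).ok()?;
    // to_le_bytes() recovers the in-memory (network-order) octets.
    Some(Ipv4Addr::from(raw.to_le_bytes()))
}

fn main() {
    // Gateway column of the first row in the table above.
    if let Some(gw) = decode_route_addr("29D2C00A") {
        println!("gateway = {}", gw); // prints "gateway = 10.192.210.41"
    }
}
```

Read this way, the first row is a default route via 10.192.210.41 (flags 0003 = RTF_UP | RTF_GATEWAY) and the second is a /32 host route to the gateway itself (flags 0005 = RTF_UP | RTF_HOST), which matches the routes in the shim log.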

TheRayquaza commented Apr 9, 2026

Test A1 - CPU shares fallback (shares→2 vCPU)

  • Patch: b46e6878c

apiVersion: v1
kind: Pod
metadata:
  name: kata-val-a2
  namespace: default
spec:
  nodeName: us3-staging-dog-arbok-90eec5c94014646d000003
  runtimeClassName: kata-qemu-runtime-rs
  containers:
  - name: test
    image: registry.ddbuild.io/images/base/gbi-ubuntu_2204:release
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "2"

Pass signals:

  • Pod reaches Running
  • nproc = 2 instead of fallback 1
root@kata-val-a2:/# nproc
2

Some logs from containerd:
journalctl -u containerd --since "2 minutes ago" | grep -i "vcpu\|cpu_shares\|initial size"

Apr 09 12:51:45 us3-staging-dog-arbok-90eec5c94014646d000003 kata[57343]: resource with vcpu 2
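
Test A1 exercises the fallback path: with only `resources.requests.cpu: "2"`, Kubernetes sets CPU shares to 2048 and no CFS quota, so shares / 1024 yields 2 vCPUs. A minimal sketch of that rule follows. The function name, argument types, and rounding up are assumptions; it is modeled on this PR's description of Go's CalculateCPUsF(quota, period, shares), not copied from either runtime.

```rust
/// Hypothetical sizing rule: prefer quota/period, fall back to shares/1024.
/// Returns 0 when nothing usable is set; choosing a default in that case is
/// left to the caller and is outside this sketch.
fn calculate_vcpus(quota: i64, period: u64, shares: u64) -> u32 {
    const SHARES_PER_CPU: u64 = 1024;

    let cpus = if quota > 0 && period > 0 {
        // CFS bandwidth is configured: quota and period are in microseconds.
        quota as f64 / period as f64
    } else if shares > 0 {
        // No quota: derive the CPU count from the relative weight.
        shares as f64 / SHARES_PER_CPU as f64
    } else {
        0.0
    };

    // Assumption: fractional CPUs are rounded up to whole vCPUs.
    cpus.ceil() as u32
}

fn main() {
    // Test A1: no quota, shares = 2 * 1024.
    println!("{}", calculate_vcpus(0, 0, 2048)); // prints 2
}
```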

TheRayquaza commented Apr 9, 2026

Test A2 - CPU shares fallback (with limits set)

  • Patch: b46e6878c

apiVersion: v1
kind: Pod
metadata:
  name: kata-val-a3
  namespace: default
spec:
  nodeName: us3-staging-dog-arbok-90eec5c94014646d000003
  runtimeClassName: kata-qemu-runtime-rs
  containers:
  - name: test
    image: registry.ddbuild.io/images/base/gbi-ubuntu_2204:release
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "4"

Pass signals:

  • Pod reaches Running
  • nproc = 2
root@kata-val-a3:/# nproc
2

TheRayquaza commented Apr 9, 2026

Test C2 - Invalid annotation JSON

apiVersion: v1
kind: Pod
metadata:
  name: kata-val-e3
  namespace: default
  annotations:
    io.katacontainers.volume.block-mounts: "not valid json {"
spec:
  nodeName: us3-staging-dog-arbok-90eec5c94014646d000003
  runtimeClassName: kata-qemu-runtime-rs
  containers:
  - name: test
    image: registry.ddbuild.io/images/base/gbi-ubuntu_2204:release
    command: ["sleep", "3600"]

Pass signals:

  • The shim logs a warning that the block-mounts annotation could not be parsed, and the result is an error

TheRayquaza commented Apr 9, 2026

Test C1 - Block annotation mounts

  • Patch: b46e6878c

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kata-val-e1-pvc-2
  namespace: default
spec:
  accessModes: [ReadWriteOnce]
  volumeMode: Block
  storageClassName: ephemeral-premium-v2-lrs
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: kata-val-e1-2
  namespace: default
  annotations:
    io.katacontainers.volume.block-mounts: |
      {"/dev/block-vol": {"mount": "/data", "fstype": "ext4", "options": ["rw"]}}
spec:
  nodeName: us3-staging-dog-arbok-90eec5c94014646d000003
  runtimeClassName: kata-qemu-runtime-rs
  containers:
  - name: test
    image: registry.ddbuild.io/images/base/gbi-ubuntu_2204:release
    command: ["sleep", "3600"]
    volumeDevices:
    - name: block-storage
      devicePath: /dev/block-vol
  volumes:
  - name: block-storage
    persistentVolumeClaim:
      claimName: kata-val-e1-pvc-2

Pass signals:

  • Pod reaches Running
  • Can write to the filesystem at the specified mount point

After formatting the block device with mkfs.ext4 from another pod running as root:

dog@kata-val-e1-2:/$ ls /data/
lost+found
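
The annotation in this test maps a container `devicePath` to mount parameters, and the PR summary says matching volumeDevices are converted into agent Storage objects. The matching step could look roughly like the sketch below. Everything named here is hypothetical: `BlockMount` mirrors the JSON keys used above, `Storage` is a simplified stand-in for the agent type, and using the device path as `source` is a placeholder for whatever guest-side device reference the shim actually passes. JSON parsing itself is left out to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// One value of the block-mounts annotation (keys taken from the test above).
#[derive(Debug, Clone, PartialEq)]
struct BlockMount {
    mount: String,
    fstype: String,
    options: Vec<String>,
}

/// Simplified, hypothetical stand-in for the agent's Storage object.
#[derive(Debug, Clone, PartialEq)]
struct Storage {
    source: String,
    mount_point: String,
    fstype: String,
    options: Vec<String>,
}

/// Build a Storage for every container device path that the annotation names.
/// Device paths without an annotation entry are skipped.
fn storages_for_devices(
    block_mounts: &HashMap<String, BlockMount>,
    device_paths: &[&str],
) -> Vec<Storage> {
    device_paths
        .iter()
        .filter_map(|path| {
            block_mounts.get(*path).map(|m| Storage {
                source: (*path).to_string(), // placeholder, see lead-in
                mount_point: m.mount.clone(),
                fstype: m.fstype.clone(),
                options: m.options.clone(),
            })
        })
        .collect()
}

fn main() {
    let mut mounts = HashMap::new();
    mounts.insert(
        "/dev/block-vol".to_string(),
        BlockMount {
            mount: "/data".to_string(),
            fstype: "ext4".to_string(),
            options: vec!["rw".to_string()],
        },
    );
    // Only the annotated device produces a Storage entry.
    let storages = storages_for_devices(&mounts, &["/dev/block-vol", "/dev/other"]);
    println!("{:?}", storages);
}
```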

@TheRayquaza TheRayquaza changed the title from "runtime-rs: port Go runtime patches to Rust shim" to "runtime-rs: port go runtime patches to rust shim" Apr 9, 2026
TheRayquaza and others added 3 commits April 9, 2026 17:12

Port the Go runtime netkit endpoint to runtime-rs. Add NetkitEndpoint
modeled after VethEndpoint with L3-mode detection. Handle InfoKind::Netkit
and InfoData::Netkit in link_info() to avoid "unsupported link type: device"
errors on netkit interfaces (kernel sends [Kind, Data] in LIFO order via
pop(), Data arm must be handled before Kind fires).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When sandbox-cpu-quota/period annotations are zero (CFS disabled or no
limits set), fall back to sandbox-cpu-shares/1024 to size the microVM
vCPUs, mirroring Go's CalculateCPUsF(quota, period, shares). Also wire
the computed vCPU count into hv.cpu_info.default_vcpus in setup_config
so it is actually applied to the hypervisor (previously only logged).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Parse io.katacontainers.volume.block-mounts annotation and convert
matching volumeDevices into agent Storage objects, enabling block device
annotation mounts in the Rust shim, mirroring the Go runtime behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
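
The netkit commit message notes that the attributes arrive as [Kind, Data] but are consumed with pop(), so the Data entry is seen first. The toy below only demonstrates that ordering. The `Info` enum is a hypothetical simplification with string payloads and is not the netlink type that link_info() actually matches on.

```rust
/// Hypothetical, simplified stand-in for the link-info attributes.
#[derive(Debug, Clone, PartialEq)]
enum Info {
    Kind(String),
    Data(String),
}

/// Return the attributes in the order a pop()-driven loop visits them.
fn visit_order(mut infos: Vec<Info>) -> Vec<Info> {
    let mut seen = Vec::new();
    // Vec::pop() removes from the back, so iteration is last-in, first-out.
    while let Some(info) = infos.pop() {
        seen.push(info);
    }
    seen
}

fn main() {
    let infos = vec![
        Info::Kind("netkit".to_string()),
        Info::Data("netkit".to_string()),
    ];
    // Data is visited before Kind.
    println!("{:?}", visit_order(infos));
}
```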
@TheRayquaza TheRayquaza force-pushed the mateo.lelong/datadog-rust-ports branch from bca9f07 to f790c54 Compare April 9, 2026 15:14
@TheRayquaza TheRayquaza marked this pull request as ready for review April 9, 2026 15:20
@TheRayquaza TheRayquaza changed the title from "runtime-rs: port go runtime patches to rust shim" to "runtime-rs: port patches from go shim to rust shim" Apr 9, 2026
@TheRayquaza

Splitting this PR into three: #55, #56, #57
