Support local ephemeral nvme disks #1411
}

disk {
  auto_delete = true
No reason to keep a boot or cache disk around after we delete the VM instance, right?
Maybe in some strange debugging circumstances, but agree here.
Does this mean we are accumulating the disk somewhere right now?
No, it turns out auto_delete is set to true by default. I can remove this if we don't need it to be explicit.
echo "persisting array configuration"
sudo mdadm --detail --scan --verbose | sudo tee -a /etc/mdadm/mdadm.conf
%{ else }
DISK="/dev/disk/by-id/google-persistent-disk-1"
I wasn't sure if we want to commit to this, or a/b test this in production. I can assume we'll commit and remove the old stuff.
Will the local cache cleaner in orchestrator work ok with the RAID?
I'll double check, but assuming it's just looking at files and folders, shouldn't be a problem.
Yup,
Local benchmarks indicate that this drops base sandbox latency by ~20-30ms.
Downsides:
Note
Replaces PD cache disks with configurable local NVMe SSDs for client/build nodes, provisioning them via startup script and wiring new disk-count variables through Terraform.
local-ssd NVMe scratch disks (375GB) using dynamic "disk" and range(var.*_cluster_cache_disk_count).
auto_delete on client; keep root disk sizing via build_cluster_root_disk_size_gb.
[…] (scripts/start-client.sh):
[…] (mdadm) when multiple; persist config.
[…] /etc/fstab, mount at /orchestrator; create sandbox, template, build dirs.
LOCAL_CACHE_DISK_COUNT passed from Terraform.
build_cluster_cache_disk_count and client_cluster_cache_disk_count (with validation) in root and module; pass through in main.tf.
[…] nomad-cluster module (PD-based settings).
Written by Cursor Bugbot for commit 4e008cd. This will update automatically on new commits.
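
For readers who want a concrete picture of the startup-script steps summarized above (RAID when there is more than one disk, persist the array config, add an /etc/fstab entry, mount at /orchestrator, create the cache directories), here is a minimal sketch. It is not the code from this PR. The device naming (google-local-nvme-ssd-N), the /dev/md0 array name, RAID level 0, and the ext4 filesystem are all assumptions made for illustration, and the function names are hypothetical. The sketch only prints the commands it would run, so the plan can be reviewed without touching any disks.

```shell
# Sketch only: device names, RAID level, and filesystem are assumptions,
# not taken from the PR's scripts/start-client.sh.
MOUNT_POINT="/orchestrator"
RAID_DEVICE="/dev/md0"   # hypothetical array name

# Print a space-separated list of by-id paths for N local NVMe disks
# (assumed GCE naming scheme).
local_ssd_devices() {
  n="$1"
  i=0
  devices=""
  while [ "$i" -lt "$n" ]; do
    devices="${devices}${devices:+ }/dev/disk/by-id/google-local-nvme-ssd-${i}"
    i=$((i + 1))
  done
  echo "$devices"
}

# Print (do not execute) the commands that would provision the cache volume.
plan_cache_disk() {
  count="$1"
  if [ "$count" -lt 1 ]; then
    echo "error: need at least one local disk" >&2
    return 1
  fi
  if [ "$count" -gt 1 ]; then
    # Several disks: stripe them into one array, then persist its config.
    echo "mdadm --create $RAID_DEVICE --level=0 --raid-devices=$count $(local_ssd_devices "$count")"
    echo "mdadm --detail --scan --verbose | tee -a /etc/mdadm/mdadm.conf"
    disk="$RAID_DEVICE"
  else
    # Single disk: use it directly, no array needed.
    disk="$(local_ssd_devices 1)"
  fi
  echo "mkfs.ext4 -F $disk"
  echo "mkdir -p $MOUNT_POINT"
  echo "echo '$disk $MOUNT_POINT ext4 defaults,nofail 0 2' >> /etc/fstab"
  echo "mount $MOUNT_POINT"
  echo "mkdir -p $MOUNT_POINT/sandbox $MOUNT_POINT/template $MOUNT_POINT/build"
}
```

Calling plan_cache_disk "$LOCAL_CACHE_DISK_COUNT" prints the plan for the disk count passed in from Terraform. Since local SSDs are ephemeral, a real script would likely also need to handle the case where the array or filesystem already exists after a reboot, which this sketch leaves out.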