
DiskPressure: overlayfs snapshot accumulation accelerates significantly from 1.57.0 → 1.58.0 → 1.59.0 #4825

@tom-birk-easygo

Description


Summary

We are running EKS clusters on Bottlerocket across multiple regions and observe persistent DiskPressure caused by unbounded growth in:

/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/

Nodes fill to the kubelet eviction threshold within hours to days of launch under normal rolling-deployment churn. GC tuning (image-gc-high-threshold, image-gc-low-threshold) does not resolve this; the accumulation persists regardless of the thresholds.
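For scale, the growth is directly observable from a root shell on a node; a trivial sketch, assuming root access (e.g. via the SSM admin container):

```sh
# Watch the snapshotter directory grow on a node (root shell required).
watch -n 60 'du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/'
```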

Fill rate acceleration across versions

We have measured a significant acceleration in fill rate across Bottlerocket versions:

| BR version | containerd | Fill rate observed | Disk / node type |
|------------|------------|--------------------|------------------|
| 1.57.0 (k8s-1.33) | 1.7.30 | ≥ 8.9 GB/h | 500 GB / m6i.4xlarge |
| 1.58.0 | 2.1.6 | 2.9–3.3 GB/h | 500 GB / c6a.4xlarge |
| 1.59.0 | 2.1.6 | ≥ 17.2 GB/h | 500 GB / m6in.8xlarge |

Note: rates marked ≥ are lower bounds derived from the age of the youngest node to reach DiskPressure. Staging rates are measured directly via the kubelet proxy stats API (see the sketch below). Because instance types and workloads differ, this is not a controlled comparison, but the pattern is consistent across clusters.
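For context, this is roughly how the staging measurement works; a minimal sketch assuming kubectl access, jq, and a placeholder node name:

```sh
# Sample imageFs and node fs usage via the kubelet summary API.
# Sketch only: $NODE is a hypothetical node name; adjust for your cluster.
NODE=ip-10-0-0-1.eu-west-1.compute.internal
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/stats/summary" \
  | jq '{imageFsUsedGB: (.node.runtime.imageFs.usedBytes / 1e9),
         nodeFsUsedGB: (.node.fs.usedBytes / 1e9)}'
# Sampling this periodically and diffing consecutive values gives GB/h.
```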

What we observe on affected nodes

From logdog bundles collected via SSM on DiskPressure nodes:

  • containerd-shim-runc-v2 processes become unresponsive after container exit, logging context deadline exceeded over ttrpc every 2 seconds indefinitely; they are never cleaned up
  • Snapshot counts significantly exceed active mount counts (e.g. 264 snapshots vs. 107 active mounts, i.e. 157 snapshots with no active mount); see the diagnostic sketch after this list
  • The zombie task directory count exceeds the number of active pods
  • sync_remove = false (AMI default): the asynchronous snapshot-removal goroutine is abandoned if the shim is killed before it completes, silently leaving the snapshot on disk
  • discard_unpacked_layers = false (AMI default): compressed image tarballs are retained in the content store permanently, adding a second, independent growth vector (57 GB content store observed vs. 8.9 GB on a clean node)
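A rough version of the checks behind the counts above, runnable from a root shell on an affected node (sketch; assumes ctr is on the PATH and containerd's k8s.io namespace):

```sh
# Compare containerd's snapshot count against live overlay mounts and
# list shim processes. Sketch only: assumes a root shell with ctr available.

# Snapshots known to the overlayfs snapshotter (skip the header line).
snapshots=$(ctr -n k8s.io snapshots ls | tail -n +2 | wc -l)

# Overlay filesystems currently mounted on the node.
mounts=$(grep -c ' overlay ' /proc/mounts)

echo "snapshots=${snapshots} active_overlay_mounts=${mounts}"

# Long-lived shims for exited containers show up with large ETIME values.
ps -eo pid,etime,args | grep '[c]ontainerd-shim-runc-v2'
```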

Root cause hypothesis

This matches the upstream bug reported and fixed in containerd (the shim/snapshot cleanup fixes in PRs #12400 and #12397, referenced in question 1 below).

The discard_unpacked_layers = false default is tracked separately in #3314. We confirmed this setting is not configurable via Bottlerocket userdata or apiclient.
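Both defaults are straightforward to confirm on a node; a sketch, assuming the stock containerd config path:

```sh
# Confirm the two defaults discussed above (root shell on the node).
grep -E 'sync_remove|discard_unpacked_layers' /etc/containerd/config.toml
```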

Questions

  1. Which Bottlerocket release will include the containerd build containing the fixes from containerd PRs #12400 and #12397?
  2. Is there a configuration workaround available in the meantime, e.g. via [plugins."io.containerd.gc.v1.scheduler"] settings in userdata (a sketch of what we mean follows this list), or a supported snapshotter alternative?
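To make question 2 concrete, this is the shape of the GC scheduler override we have in mind. Illustrative values only: we have not validated that Bottlerocket accepts any of this via userdata, and the node regenerates its containerd config, so this is a sketch of the TOML rather than a working workaround:

```sh
# Print the example TOML; values are illustrative, not recommendations.
cat <<'EOF'
[plugins."io.containerd.gc.v1.scheduler"]
  deletion_threshold = 1    # run a GC pass after every deletion
  mutation_threshold = 100  # or after this many metadata mutations
  schedule_delay = "0s"     # no extra delay once GC is triggered
  startup_delay = "100ms"   # brief delay after containerd startup
EOF
```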

Environment

  • Bottlerocket versions: 1.57.0, 1.58.0, 1.59.0
  • Kubernetes: 1.33
  • containerd: 1.7.30 (1.57.0), 2.1.6 (1.58.0 and 1.59.0)
  • Regions: eu-west-1, ca-central-1
  • Disk: 500 GB
  • Workload: high-churn rolling deployments (CI/CD), ~4,000 active pods per cluster
