status.deploy.stdout can exceed etcd's 2 MiB limit on large clusters, causing reconciliation delays

**What steps did you take:**

- Run kapp-controller on a cluster with 250+ nodes.
- Deploy an App CR that manages a large number of resources (e.g. DaemonSets or cluster-wide resources that produce one change entry per node in kapp's output).
- Observe that the App enters a ReconcileFailed loop and takes ~40 minutes to show Reconcile succeeded.

**What happened:**

When kapp deploys a large number of resources it produces verbose stdout — on a 250-node cluster this can reach several MiB. kapp-controller writes this verbatim into status.deploy.stdout and then calls the Kubernetes API to persist the status. etcd rejects objects larger than its hard 2 MiB gRPC message limit, causing the UpdateStatus call to fail.

The deployment itself has succeeded; only the status write fails. On the next reconcile cycle kapp produces a much smaller "no-op" diff that fits within the limit, so the status update eventually goes through — but only after repeated failed retries, resulting in a ~40-minute apparent delay.

**What did you expect:**

The status.deploy.stdout field (and other output fields such as status.fetch.stdout, status.inspect.stdout, and status.usefulErrorMessage) should be bounded so that the status object never exceeds etcd's 2 MiB limit, regardless of cluster size. When output is clipped, the field should begin with a clear truncation notice (e.g. `[output truncated]\n`) so operators immediately know the field is incomplete, rather than being truncated silently.

**Anything else you would like to add:**

The issue is non-fatal, since reconciliation eventually succeeds, but the delay is significant in production (observed: ~40 minutes) and can mask real problems if operators treat ReconcileFailed as a signal to investigate.

The fix is to truncate individual output fields to a reasonable limit (e.g. 1 MiB per field) before writing them into the status struct. Keeping the tail of the output is preferable because the most actionable content (final resource summary, error lines) typically appears at the end of kapp output.

Other status string fields that may also grow large in failure scenarios: status.deploy.stderr, status.fetch.stderr, status.usefulErrorMessage.
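One thing that may be worth checking when choosing the per-field limit: if several fields hit their cap in the same status object, their sum could still exceed the total. The sketch below derives a per-field budget from the total instead. The `perFieldBudget` helper and the `reservedBytes` headroom value are assumptions made for illustration, and the field list is simply the set of fields named in this issue.

```go
package main

import "fmt"

const (
	// totalLimit is the 2 MiB figure quoted in this issue.
	totalLimit = 2 * 1024 * 1024
	// reservedBytes is a hypothetical allowance for metadata, spec,
	// conditions, and encoding overhead.
	reservedBytes = 256 * 1024
)

// perFieldBudget splits the space left after the reserved bytes evenly
// across n fields. It returns 0 if there is nothing left to split.
func perFieldBudget(total, reserved, n int) int {
	if n <= 0 || total <= reserved {
		return 0
	}
	return (total - reserved) / n
}

func main() {
	fields := []string{
		"status.deploy.stdout",
		"status.deploy.stderr",
		"status.fetch.stdout",
		"status.fetch.stderr",
		"status.inspect.stdout",
		"status.usefulErrorMessage",
	}
	b := perFieldBudget(totalLimit, reservedBytes, len(fields))
	worst := reservedBytes + b*len(fields)
	fmt.Printf("%d fields, %d bytes each, worst case %d bytes\n", len(fields), b, worst)
}
```

An even split is the simplest choice; weighting the budget toward stdout of the deploy step, which is the field observed to grow here, would be an equally reasonable variant.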

**Environment:**

- kapp Controller version (execute `kubectl get deployment -n kapp-controller kapp-controller -o yaml` and the annotation is `kbld.k14s.io/images`): any version
- Kubernetes version (use `kubectl version`): any version; effect is most pronounced on 250+ node clusters

---
Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible"
👎 "There are other more important things to focus on right now"

We are also happy to receive and review Pull Requests if you want to help work on this issue.


status.deploy.stdout can exceed etcd's 2 MiB limit on large clusters, causing reconciliation delays #1839
