Rapidly updating a parent image can cause a child image to use stale base #6784

Open
metronom72 wants to merge 1 commit into tilt-dev:master from metronom72:fix/6634-stale-parent-base-image

@metronom72

Problem

Fixes #6634.

Under concurrent rebuilds of a shared base image, a child image can be built
against a stale parent base image and never self-correct. The parent
deployment runs the latest base, but a child keeps an older base baked into its
FROM, so files inherited from the base diverge across images. The larger the
project and the more image relations there are, the easier it is to trigger.

Root Cause

A child image resolves its base (FROM) tag from the base ImageMap. That tag
is published over two channels with different timing:

  • the engine's build result — written synchronously when the build completes;
  • the apiserver ImageMap.Status — flushed asynchronously by the
    dockerimage/cmdimage reconcilers.

In internal/engine/buildcontrol/image_build_and_deployer.go, BuildAndDeploy
builds its imageMapSet from the apiserver (ctrlClient.Get) for every image
target, including ones that are reused (not rebuilt) in this pass. Tilt
deduplicates base rebuilds across manifests that share a base image, so when
a base is rebuilt in one manifest and reused in another, the second manifest's
child reads its FROM from the apiserver ImageMap, which can still lag behind
the engine's build result. The engine already considers the child up-to-date,
so nothing re-triggers it and the stale tag becomes permanent.
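
The timing gap can be illustrated with a small, self-contained model. This is only a sketch: `publisher`, `completeBuild`, `flush`, and `fromTagViaAPIServer` are hypothetical names invented for illustration and are not Tilt APIs. The two maps stand in for the engine's build result and the apiserver ImageMap status.

```go
package main

import "fmt"

// imageStatus is a hypothetical stand-in for the tag an ImageMap publishes.
type imageStatus struct {
	Tag string
}

// publisher models the two channels described above (assumed shape, not Tilt's).
type publisher struct {
	engineResults map[string]imageStatus // written synchronously when a build completes
	apiserver     map[string]imageStatus // updated only when flush runs
}

func newPublisher() *publisher {
	return &publisher{
		engineResults: map[string]imageStatus{},
		apiserver:     map[string]imageStatus{},
	}
}

// completeBuild records a finished build in the engine's results only.
func (p *publisher) completeBuild(name, tag string) {
	p.engineResults[name] = imageStatus{Tag: tag}
}

// flush models the reconcilers catching up at some later point.
func (p *publisher) flush() {
	for name, st := range p.engineResults {
		p.apiserver[name] = st
	}
}

// fromTagViaAPIServer models the pre-fix read path for a reused base.
func (p *publisher) fromTagViaAPIServer(base string) string {
	return p.apiserver[base].Tag
}

func main() {
	p := newPublisher()
	p.completeBuild("base", "base:tilt-1")
	p.flush()

	// The base is rebuilt in one manifest; the flush has not happened yet.
	p.completeBuild("base", "base:tilt-2")

	fmt.Println("engine:", p.engineResults["base"].Tag)       // engine: base:tilt-2
	fmt.Println("child FROM:", p.fromTagViaAPIServer("base")) // child FROM: base:tilt-1
}
```

In this model the child would pick up `base:tilt-1` even though the engine already knows about `base:tilt-2`, which mirrors the stale read described above.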

(Within a single manifest this can't happen: UpdateImageMap mutates the shared
imageMapSet in place and RunBuilds is topologically ordered, so a child
always sees its freshly-built base. The bug is specific to the cross-manifest
reuse path.)
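
A rough sketch of why the single-manifest case is safe, assuming a plain map in place of the real imageMapSet. `target` and `buildInOrder` are hypothetical and only model "bases are built first and the shared set is updated in place"; they do not reproduce RunBuilds or UpdateImageMap.

```go
package main

import "fmt"

// target is a hypothetical image target: its name and the base it builds FROM
// (empty when it has no base).
type target struct {
	Name string
	Base string
}

// buildInOrder builds targets that are assumed to be topologically sorted
// (bases before children). It returns the FROM tag each child observed.
func buildInOrder(targets []target, imageMapSet map[string]string, rev int) map[string]string {
	seen := map[string]string{}
	for _, t := range targets {
		if t.Base != "" {
			// The child reads the shared set, which the base has already updated.
			seen[t.Name] = imageMapSet[t.Base]
		}
		// In-place update, standing in for UpdateImageMap.
		imageMapSet[t.Name] = fmt.Sprintf("%s:tilt-%d", t.Name, rev)
	}
	return seen
}

func main() {
	// The seed for the base is stale, but the base is rebuilt in this pass.
	set := map[string]string{"base": "base:tilt-stale"}
	targets := []target{
		{Name: "base"},
		{Name: "child", Base: "base"},
	}
	seen := buildInOrder(targets, set, 2)
	fmt.Println(seen["child"]) // base:tilt-2
}
```

Because the base overwrites its entry before the child reads it, the stale seed never reaches the child. The cross-manifest case differs only in that the base is not rebuilt in the pass, so the seed is what the child sees.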

Solution

When seeding imageMapSet, for an image that is reused in this pass, take its
status from the authoritative build result the engine already computed
(TargetQueue.ReusedResults()) instead of the asynchronously-flushed apiserver
ImageMap.Status. Rebuilt targets are still overwritten by UpdateImageMap as
before.
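
The seeding rule can be sketched as follows. This is a simplified illustration, not the code in this PR: `seedImageMapSet` and `imageStatus` are hypothetical, and the `reused` map stands in for whatever TargetQueue.ReusedResults() provides.

```go
package main

import "fmt"

// imageStatus is a hypothetical stand-in for an ImageMap status.
type imageStatus struct {
	Tag string
}

// seedImageMapSet builds the initial set for a build pass. For each image it
// starts from the apiserver status and, if the image is reused in this pass,
// prefers the status from the engine's reused build result.
func seedImageMapSet(names []string, apiserver, reused map[string]imageStatus) map[string]imageStatus {
	set := make(map[string]imageStatus, len(names))
	for _, name := range names {
		st := apiserver[name]
		if r, ok := reused[name]; ok {
			st = r // the engine's result is authoritative for reused images
		}
		set[name] = st
	}
	return set
}

func main() {
	apiserver := map[string]imageStatus{
		"base":  {Tag: "base:tilt-stale"}, // not yet flushed by the reconcilers
		"child": {Tag: "child:tilt-1"},
	}
	reused := map[string]imageStatus{
		"base": {Tag: "base:tilt-2"}, // rebuilt in another manifest, reused here
	}

	set := seedImageMapSet([]string{"base", "child"}, apiserver, reused)
	fmt.Println(set["base"].Tag)  // base:tilt-2
	fmt.Println(set["child"].Tag) // child:tilt-1
}
```

Images without a reused result keep their apiserver status in this sketch, which matches the description that rebuilt targets are overwritten later rather than at seeding time.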

Changes

  • internal/engine/buildcontrol/image_build_and_deployer.go: prefer the reused
    build result's status over the apiserver ImageMap status when building
    imageMapSet.
  • internal/engine/buildcontrol/image_build_and_deployer_test.go: add
    TestMultiStageDockerBuildReusedBaseWithStaleImageMap.

Testing

  • New test reproduces the bug: a base image is reused while the apiserver
    ImageMap status is deliberately stale, and the child must be built FROM the
    latest base tag. It fails without the fix (child built FROM …:tilt-stale)
    and passes with it.
  • Existing cross-manifest / base-image tests still pass, including
    TestManifestsWithCommonAncestorAndTrigger,
    TestManifestsWithTwoCommonAncestors, TestTwoK8sTargetsWithBaseImage* and
    the multi-stage build tests — dedup and trigger-spillover behavior unchanged.
  • No change to the propagation guards in buildcontrols/reducers.go, so the
    infinite-build behavior referenced by #3542 ("engine: fix bugs in image
    build caching") is unaffected.

A child image resolves its base (FROM) tag from the base ImageMap. That
tag is published over two channels: the engine's build result, which is
written synchronously, and the apiserver ImageMap status, which is
flushed asynchronously by the image reconcilers.

When a base image is reused (not rebuilt) in a build pass -- e.g. it was
just rebuilt in another manifest that shares it -- BuildAndDeploy read
the FROM tag from the apiserver ImageMap status, which can still lag the
propagated result. The child was then built against a stale base tag and
never self-corrected, because the engine already considered it
up-to-date.

Seed the imageMapSet status from the reused build result instead of the
apiserver status, so reused base images inject the latest tag.

Fixes tilt-dev#6634

Signed-off-by: Mikhail Dorokhovich <mikhail@dorokhovich.com>
metronom72 force-pushed the fix/6634-stale-parent-base-image branch from 82dc61a to a2af852 on June 19, 2026 08:21