Skip to content

v1.2.6: SSH connection drops (EOF) during Ansible provisioning over IAP tunnel #347

Description

@kthhrv

Title

v1.2.6: SSH connection drops (EOF) during Ansible provisioning over IAP tunnel

Body

Overview

After upgrading from v1.2.5 to v1.2.6 (automatic, via >= 1.1.0 constraint), Packer builds that use use_iap = true with Ansible provisioning fail with SSH EOF errors. The SSH connection drops mid-playbook, consistently after heavier tasks (apt install, SFTP directory copy). Pinning back to v1.2.5 immediately restores normal behaviour.

Environment

  • Packer 1.15.0
  • packer-plugin-ansible v1.1.4
  • ansible-core 2.20.0 (installed via Alpine apk at build time)
  • Cloud Build (GCP) running gcr.io/google.com/cloudsdktool/cloud-sdk:alpine
  • Target: GCE e2-standard-4, ubuntu-2204-lts, use_iap = true, use_os_login = true, internal IP only

Reproduction

Minimal Packer config:

source "googlecompute" "example" {
    # ...
    use_internal_ip = true
    use_iap         = true
    omit_external_ip = true
    use_os_login     = true
}

build {
    sources = ["source.googlecompute.example"]

    provisioner "ansible" {
        playbook_file = "playbook.yml"
        user          = "packer"
        extra_arguments = [
            "--extra-vars", "ansible_become=true",
        ]
    }
}

The playbook installs packages via apt and copies files via the copy module. With v1.2.6, the SSH connection drops (EOF) partway through the playbook. The exact task varies between runs but is always after heavier operations (large apt install or multi-file SFTP transfer).

Observed behaviour

v1.2.5 — Every build completes in ~7 minutes. Tested across dozens of runs over several weeks.

v1.2.6 — Every build fails with SSH EOF. Ansible reports either:

EOF
[ERROR]: Task failed: Timeout (12s) waiting for privilege escalation prompt:
fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Task failed: Timeout (12s) waiting for privilege escalation prompt:", "unreachable": true}

or just:

EOF

(when ANSIBLE_BECOME_TIMEOUT is raised high enough that the timeout isn't hit before Packer gives up on the dead connection)

The failure point shifts depending on mitigations applied:

  • Default config: fails on template or file tasks after a large apt install
  • With ANSIBLE_PIPELINING=True: gets further (fewer SSH round-trips) but still EOF on copy module tasks that require SFTP
  • With ServerAliveInterval=15 + TCPKeepAlive=yes: still EOF

This rules out Ansible become timeout, SSH keepalives, and task-level issues. The underlying SSH channel through the IAP tunnel is being dropped.

What we tested to isolate

Variable Tested Result
Ubuntu source image (v20260504 vs v20260520) Yes No effect — both fail on v1.2.6
cloud-sdk builder image (pinned May 13 digest vs latest) Yes No effect — both fail on v1.2.6
VM contention (1 vs 8 concurrent VMs) Yes No effect on the EOF
ANSIBLE_BECOME_TIMEOUT=60 Yes Still EOF — not a timeout issue
ANSIBLE_PIPELINING=True Yes Gets further but still EOF on SFTP tasks
SSH keepalives (ServerAliveInterval=15) Yes Still EOF
ANSIBLE_SSH_RETRIES=5 Yes Still EOF
Pin googlecompute plugin to v1.2.5 Yes Immediate fix — builds complete in ~7 min

Likely cause

v1.2.6 bumped several dependencies that touch the SSH path:

  • Go toolchain: 1.24.0 → 1.25.10
  • packer-plugin-sdk: v0.6.4 → v0.6.9
  • golang.org/x/crypto: v0.46.0 → v0.52.0

The x/crypto bump includes changes to the Go SSH implementation. The IAP tunnel proxies SSH through gcloud compute start-iap-tunnel, so the Packer-side SSH client (backed by x/crypto) is on one end and OpenSSH sshd is on the other, with the IAP proxy in between. Something in the new x/crypto SSH client behaviour appears to cause the proxied connection to drop under sustained load.

Workaround

Pin the plugin version:

packer {
    required_plugins {
        googlecompute = {
            version = "1.2.5"
            source  = "github.com/hashicorp/googlecompute"
        }
    }
}

Versions

$ packer version
Packer v1.15.0

# v1.2.5 (works)
Installed plugin github.com/hashicorp/googlecompute v1.2.5

# v1.2.6 (broken)
Installed plugin github.com/hashicorp/googlecompute v1.2.6

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Fields

No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions