
Provide preseeding job to avoid stuck pre-fetcher image pulls #188

@porridge


From time to time, the prefetcher pods themselves get stuck waiting indefinitely for an image pull to complete.
One way to avoid this would be a "preseed" Job that runs one pod per node and sets `activeDeadlineSeconds`, so that no pod can wait indefinitely.
This Job could be launched very early after the cluster is created, so that it has hopefully finished by the time the actual prefetcher runs.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: preseed
spec:
  completions: $NUM_NODES # placeholder: substitute the number of nodes before applying
  parallelism: $NUM_NODES

  backoffLimit: # no value given, so the Kubernetes default applies

  template:
    metadata:
      labels:
        job-name: preseed
    spec:
      # The pod is forcefully killed if it runs past 30 seconds, even during the image pull phase
      activeDeadlineSeconds: 30

      # Schedule at most one preseed pod per node
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: job-name
                operator: In
                values:
                - preseed
            topologyKey: kubernetes.io/hostname
      containers:
      - name: task
        image: busybox # for testing only; the actual job would use the prefetcher image and exit 0 immediately
        # Pick a random number of seconds between 20 and 35 (inclusive), print it, and sleep that long
        command:
        - "sh"
        - "-c"
        - |
          SLEEP_TIME=$(( (RANDOM % 16) + 20 ))
          echo "Sleeping for $SLEEP_TIME seconds..."
          sleep $SLEEP_TIME
      restartPolicy: Never # to play nice with activeDeadlineSeconds
```
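The manifest uses `$NUM_NODES` as a placeholder that has to be filled in before the Job is created. A minimal sketch in shell, assuming the manifest is saved to a file (the name `preseed.yaml` below is hypothetical) and that every node currently registered should be preseeded. `render_preseed` is a hypothetical helper, not part of any existing tool. It replaces only the literal `$NUM_NODES` token, so the `$SLEEP_TIME` and `$RANDOM` references inside the container command are left for the pod's own shell to expand.

```shell
# render_preseed (hypothetical helper): print the manifest at "$2" with every
# literal "$NUM_NODES" token replaced by the node count given in "$1".
# Other "$..." references, such as $SLEEP_TIME and $RANDOM in the container
# command, are passed through unchanged.
render_preseed() {
  num_nodes="$1"
  manifest="$2"
  # In the sed pattern, \$ matches a literal dollar sign.
  sed 's/\$NUM_NODES/'"$num_nodes"'/g' "$manifest"
}
```

For example, `render_preseed "$(kubectl get nodes --no-headers | wc -l | tr -d ' ')" preseed.yaml | kubectl apply -f -` would count the nodes (one line per node) and create the Job in a single step. A blanket substitution tool that expands every `$VAR` would be a poor fit here, because it would also rewrite the variables in the pod's shell script.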
