
Docker Swarm Integration

Purpose

Docker Swarm integration lets you deploy orchestrated GPU workloads across multiple nodes by advertising GPUs to Swarm via their UUIDs through Docker's generic-resource framework.

Docker Daemon Configuration for Swarm

Configure each swarm node's Docker daemon with GPU resources in /etc/docker/daemon.json:

{
  "default-runtime": "amd",
  "runtimes": {
    "amd": {
      "path": "amd-container-runtime",
      "runtimeArgs": []
    }
  },
  "node-generic-resources": [
    "AMD_GPU=0x378041e1ada6015",
    "AMD_GPU=0xef39dad16afb86ad",
    "GPU_COMPUTE=0x583de6f2d99dc333"
  ]
}
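Before restarting the daemon, it is worth validating the file's syntax: a malformed daemon.json will prevent Docker from starting at all. A minimal sketch, writing a sample copy to /tmp for illustration (on a real node you would point the validator at /etc/docker/daemon.json):

```shell
# Sketch: validate daemon.json syntax before restarting Docker.
# A sample copy is written to /tmp here for illustration only.
cat > /tmp/daemon.json <<'EOF'
{
  "default-runtime": "amd",
  "runtimes": {
    "amd": {
      "path": "amd-container-runtime",
      "runtimeArgs": []
    }
  },
  "node-generic-resources": [
    "AMD_GPU=0x378041e1ada6015"
  ]
}
EOF

# python3 -m json.tool exits non-zero on malformed JSON
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json: valid JSON"
```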

After updating the configuration, restart the Docker daemon:

sudo systemctl restart docker
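Once the daemon is back up, you can confirm that the node actually advertises the configured resources. A sketch using docker node inspect with a Go-template format string (assumes the node has already joined a swarm; "self" refers to the local node):

```shell
# Show the generic resources this swarm node advertises.
docker node inspect self \
  --format '{{ json .Description.Resources.GenericResources }}'
```

If the output is null or empty, the node-generic-resources entries were not picked up and the compose reservations below will never be satisfiable.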

Deploy GPU-Enabled Services

Deploy services with specific GPU requirements using a Compose file:

Using generic resources:

# docker-compose.yml for Swarm deployment
version: '3.8'
services:
  rocm-service:
    image: rocm/dev-ubuntu-24.04
    command: rocm-smi
    deploy:
      replicas: 1
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'AMD_GPU'  # Matches daemon.json key
                value: 1

Using environment variables:

# docker-compose.yml for Swarm deployment with environment variable
version: '3.8'
services:
  rocm-service:
    image: rocm/dev-ubuntu-24.04
    command: rocm-smi
    environment:
      - AMD_VISIBLE_DEVICES=all
    deploy:
      replicas: 1
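AMD_VISIBLE_DEVICES=all exposes every GPU on the node to the container. The variable also accepts comma-separated device indices, so a service can be pinned to a subset of GPUs; a hypothetical variant of the service above:

```yaml
# Variant: pin the container to specific GPUs by enumeration index
# instead of exposing all of them.
services:
  rocm-service:
    image: rocm/dev-ubuntu-24.04
    command: rocm-smi
    environment:
      - AMD_VISIBLE_DEVICES=0,1   # comma-separated device indices
    deploy:
      replicas: 1
```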

Deploy the service:

docker stack deploy -c docker-compose.yml rocm-stack
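After deployment, the stack can be checked with the standard Swarm commands (shown here against the rocm-stack name used above; these require a running swarm):

```shell
# List the services in the stack and check task placement/state
docker stack services rocm-stack
docker service ps rocm-stack_rocm-service

# Stream the service output (the rocm-smi results, in this example)
docker service logs rocm-stack_rocm-service
```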