You can also attach an Azure Arc-enabled Kubernetes cluster and create a KubernetesCompute target using the Azure ML 2.0 CLI.
- Refer to Install, set up, and use the 2.0 CLI (preview) to install the Azure ML 2.0 CLI. Compute attach support requires ml extension version 2.0.1a4 or later.
- Attach the Arc-enabled Kubernetes cluster:
az ml compute attach --resource-group
                     --workspace-name
                     --name
                     --resource-id
                     --type
                     [--file]
                     [--no-wait]
Required Parameters
- --resource-group -g: Name of the resource group. You can configure the default group using az configure --defaults group=<name>.
- --workspace-name -w: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
- --name -n: Name of the compute target.
- --resource-id: The fully qualified ID of the resource, including the resource name and resource type.
- --type -t: The type of compute target. Allowed values: kubernetes, AKS, virtualmachine. Specify kubernetes to attach an Arc-enabled Kubernetes cluster.
Optional Parameters
- --file: Local path to the YAML file containing the compute specification. Omit this parameter to use the default compute configuration (simple attach scenario), or specify a YAML file with a customized compute definition (advanced attach scenario).
- --no-wait: Do not wait for the long-running operation to finish.
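The parameters above can also be assembled from a script. The following is a minimal sketch that builds the argument list in Python, for example to pass to subprocess.run. The helper names build_resource_id and build_attach_command are hypothetical. The resource ID layout is copied from the connectedClusters example ID used in the SDK snippets of this document. The sketch only builds the list and does not call the CLI.

```python
from typing import List, Optional


def build_resource_id(subscription_id: str, resource_group: str, cluster_name: str) -> str:
    """Compose the fully qualified ID of an Azure Arc-enabled Kubernetes cluster.

    Layout taken from the example ID in this document:
    /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Kubernetes/connectedClusters/<name>
    """
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Kubernetes/connectedClusters/{cluster_name}"
    )


def build_attach_command(
    resource_group: str,
    workspace_name: str,
    compute_name: str,
    resource_id: str,
    config_file: Optional[str] = None,
    no_wait: bool = False,
) -> List[str]:
    """Build the argv for `az ml compute attach` (hypothetical helper; runs nothing)."""
    argv = [
        "az", "ml", "compute", "attach",
        "--resource-group", resource_group,
        "--workspace-name", workspace_name,
        "--name", compute_name,
        "--resource-id", resource_id,
        # "kubernetes" is the --type value for an Arc-enabled Kubernetes cluster
        "--type", "kubernetes",
    ]
    if config_file is not None:
        # Optional YAML file with a customized compute definition
        argv += ["--file", config_file]
    if no_wait:
        argv.append("--no-wait")
    return argv


if __name__ == "__main__":
    rid = build_resource_id("123", "rg", "amlarc-cluster")
    cmd = build_attach_command("rg", "my-workspace", "amlarc-ml", rid, config_file="attach.yaml")
    print(" ".join(cmd))
```

To run the command, you could pass the list to subprocess.run(cmd, check=True) on a machine that has the CLI and the ml extension installed. Building a list instead of a single string avoids shell quoting issues with the resource ID.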
The AzureML Kubernetes compute target lets you specify an attach configuration file to enable advanced compute target capabilities. The following is a full example of an attach configuration YAML file:
default_instance_type: gpu_instance
namespace: amlarc-testing
instance_types:
  - name: gpu_instance
    node_selector:
      accelerator: nvidia-tesla-k80
    resources:
      requests:
        cpu: 1
        memory: 4Gi
        "nvidia.com/gpu": 1
      limits:
        cpu: 1
        memory: 4Gi
        "nvidia.com/gpu": 1
  - name: big_cpu_sku
    node_selector: null
    resources:
      requests:
        cpu: 4
        memory: 16Gi
        "nvidia.com/gpu": 0
      limits:
        cpu: 4
        memory: 16Gi
        "nvidia.com/gpu": 0

The attach configuration YAML file lets you specify three kinds of custom properties for a compute target:
- namespace - Defaults to the default namespace if not specified. All training jobs use this namespace, and their pods run in it. The namespace specified in the compute target must already exist. It is usually created by a user with cluster admin privileges.
- default_instance_type - Required if you specify the instance_types property. Its value must be the name of one of the entries in instance_types.
- instance_types - The list of instance types available for running training jobs. Each instance type is defined by node_selector and resources requests/limits properties:
  - node_selector - One or more node labels. Creating labels on cluster nodes requires cluster admin privileges. If specified, the training job is scheduled only on nodes that carry the specified labels. You can use node_selector to target a subset of nodes for training workload placement. This is useful when a cluster has different SKUs or different node types, such as CPU and GPU nodes, and you want to target a specific node pool. For example, you could label all GPU nodes and define an instance type for the GPU node pool. Training jobs submitted with that instance type then run on the GPU node pool.
  - resources requests/limits - The resource requests and limits for the training job pod.

Note: You can specify the compute target and instance type when submitting a job. If no instance type is specified, the default instance type is used.
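The rules above can be expressed as a small check. The following is a minimal sketch. It assumes the attach configuration has already been loaded into a Python dict with the same keys as the YAML example (namespace, default_instance_type, instance_types). The function names validate_attach_config and resolve_instance_type are hypothetical. They are not part of the Azure ML SDK or CLI.

```python
from typing import Any, Dict, List, Optional


def _instance_type_names(config: Dict[str, Any]) -> List[str]:
    """Names of the entries listed under instance_types (empty if none)."""
    return [entry["name"] for entry in config.get("instance_types") or []]


def validate_attach_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Check the attach-configuration rules described above (hypothetical helper).

    Returns a copy in which namespace falls back to "default" when missing or empty.
    Raises ValueError if instance_types is set but default_instance_type is
    missing or does not name one of its entries.
    """
    checked = dict(config)
    if not checked.get("namespace"):
        checked["namespace"] = "default"

    names = _instance_type_names(checked)
    if names:
        default = checked.get("default_instance_type")
        if default is None:
            raise ValueError("default_instance_type is required when instance_types is set")
        if default not in names:
            raise ValueError(f"default_instance_type {default!r} is not one of {names}")
    return checked


def resolve_instance_type(config: Dict[str, Any], requested: Optional[str] = None) -> str:
    """Instance type a job would use: the requested one, else default_instance_type.

    Assumes a config that passed validate_attach_config and defines instance_types.
    """
    if requested is None:
        return config["default_instance_type"]
    names = _instance_type_names(config)
    if requested not in names:
        raise ValueError(f"unknown instance type {requested!r}; expected one of {names}")
    return requested


if __name__ == "__main__":
    example = {
        "default_instance_type": "gpu_instance",
        "instance_types": [{"name": "gpu_instance"}, {"name": "big_cpu_sku"}],
    }
    checked = validate_attach_config(example)
    print(checked["namespace"])                           # default
    print(resolve_instance_type(checked))                 # gpu_instance
    print(resolve_instance_type(checked, "big_cpu_sku"))  # big_cpu_sku
```

The sketch returns a copy instead of mutating the input, so the original dict can still be written back to a file unchanged.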
Note: For a simple compute attach without a compute configuration file, AzureML uses the following configuration for training jobs. To help training jobs run to completion, we recommend always specifying resources requests/limits that match the job's needs.
default_instance_type: defaultInstanceType
namespace: default
instance_types:
  - name: defaultInstanceType
    node_selector: null
    resources:
      requests:
        cpu: 1
        memory: 4Gi
        "nvidia.com/gpu": 0
      limits:
        cpu: 1
        memory: 4Gi
        "nvidia.com/gpu": 0
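When you replace the default configuration with your own resources requests/limits, a quick sanity check can catch mistakes before attaching. Kubernetes requires that a container's request for a resource not exceed its limit. The following is a minimal sketch of that check and is not part of Azure ML. It parses only a subset of Kubernetes quantity formats: plain numbers, the millicore suffix m, and the binary suffixes Ki, Mi, Gi, and Ti. The names parse_quantity and requests_exceeding_limits are hypothetical.

```python
from decimal import Decimal
from typing import Dict, List, Union

# Binary suffixes handled by this sketch (a subset of Kubernetes quantity syntax)
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

Quantity = Union[int, float, str]


def parse_quantity(value: Quantity) -> Decimal:
    """Convert a quantity such as 1, "500m", or "4Gi" to a Decimal number of base units."""
    text = str(value).strip()
    for suffix, factor in _BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    if text.endswith("m"):
        # Milli-units, e.g. CPU millicores: "500m" is half a CPU
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


def requests_exceeding_limits(resources: Dict[str, Dict[str, Quantity]]) -> List[str]:
    """Names of resources whose request is greater than the matching limit."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return [
        name
        for name, requested in requests.items()
        if name in limits and parse_quantity(requested) > parse_quantity(limits[name])
    ]


if __name__ == "__main__":
    # The default configuration above: requests equal limits, so nothing is flagged
    default_resources = {
        "requests": {"cpu": 1, "memory": "4Gi", "nvidia.com/gpu": 0},
        "limits": {"cpu": 1, "memory": "4Gi", "nvidia.com/gpu": 0},
    }
    print(requests_exceeding_limits(default_resources))  # []

    # A CPU request above its limit is flagged
    too_big = {"requests": {"cpu": 2, "memory": "8Gi"}, "limits": {"cpu": 1, "memory": "16Gi"}}
    print(requests_exceeding_limits(too_big))  # ['cpu']
```

Decimal is used instead of float so that values like "500m" compare exactly.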
You can also attach an Azure Arc-enabled Kubernetes cluster to an AML workspace from the AML studio portal:
- In AML studio, go to Compute > Attached compute, click the "+New" button, and select "Kubernetes (Preview)".
- Enter a compute name, and select your cluster from the Azure Arc-enabled Kubernetes cluster dropdown list.
- (Optional) Browse and upload an attach configuration file. Skip this step to use the default compute configuration (simple attach scenario), or upload a YAML file with a customized compute definition (advanced attach scenario).
- Click the 'Attach' button. The provisioning state shows 'Creating'. It changes to 'Succeeded' if the attach succeeds, or to 'Failed' otherwise.
You can also attach an Arc cluster and create a KubernetesCompute target using the AML Python SDK, version 1.30 or later.
The following Python code snippet shows how to attach an Arc cluster and create a compute target for training jobs.
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, KubernetesCompute
import os

ws = Workspace.from_config()

# choose a name for your Azure Arc-enabled Kubernetes compute
amlarc_compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "amlarc-ml")

# resource ID for your Azure Arc-enabled Kubernetes cluster (example value; replace with your own)
resource_id = "/subscriptions/123/resourceGroups/rg/providers/Microsoft.Kubernetes/connectedClusters/amlarc-cluster"

if amlarc_compute_name in ws.compute_targets:
    # reuse the existing compute target
    amlarc_compute = ws.compute_targets[amlarc_compute_name]
    if isinstance(amlarc_compute, KubernetesCompute):
        print("found compute target: " + amlarc_compute_name)
else:
    # attach the cluster with the default compute configuration
    print("creating new compute target...")
    amlarc_attach_configuration = KubernetesCompute.attach_configuration(resource_id)
    amlarc_compute = ComputeTarget.attach(ws, amlarc_compute_name, amlarc_attach_configuration)
    amlarc_compute.wait_for_completion(show_output=True)

# For a more detailed view of current KubernetesCompute status, use get_status()
print(amlarc_compute.get_status().serialize())

You can also create a compute target with a list of instance types, including custom properties such as namespace, nodeSelector, and resources requests/limits. The following Python code snippet shows how to do this.
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, KubernetesCompute
import os

ws = Workspace.from_config()

# choose a name for your Azure Arc-enabled Kubernetes compute
amlarc_compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "amlarc-ml")

# resource ID for your Azure Arc-enabled Kubernetes cluster (example value; replace with your own)
resource_id = "/subscriptions/123/resourceGroups/rg/providers/Microsoft.Kubernetes/connectedClusters/amlarc-cluster"

if amlarc_compute_name in ws.compute_targets:
    # reuse the existing compute target
    amlarc_compute = ws.compute_targets[amlarc_compute_name]
    if isinstance(amlarc_compute, KubernetesCompute):
        print("found compute target: " + amlarc_compute_name)
else:
    print("creating new compute target...")
    # namespace that training pods run in; it must already exist on the cluster
    ns = "amlarc-testing"
    # each instance type pairs a nodeSelector with optional resources requests/limits
    instance_types = {
        "gpu_instance": {
            "nodeSelector": {
                "accelerator": "nvidia-tesla-k80"
            },
            "resources": {
                "requests": {
                    "cpu": "2",
                    "memory": "16Gi",
                    "nvidia.com/gpu": "1"
                },
                "limits": {
                    "cpu": "2",
                    "memory": "16Gi",
                    "nvidia.com/gpu": "1"
                }
            }
        },
        "big_cpu_sku": {
            "nodeSelector": {
                "VMSizes": "VM-64vCPU-256GB"
            }
        }
    }
    amlarc_attach_configuration = KubernetesCompute.attach_configuration(
        resource_id=resource_id,
        namespace=ns,
        default_instance_type="gpu_instance",
        instance_types=instance_types,
    )
    amlarc_compute = ComputeTarget.attach(ws, amlarc_compute_name, amlarc_attach_configuration)
    amlarc_compute.wait_for_completion(show_output=True)

# For a more detailed view of current KubernetesCompute status, use get_status()
print(amlarc_compute.get_status().serialize())
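The YAML attach file and the SDK snippet store the same information in different shapes. The YAML lists instance types with snake_case keys (name, node_selector, resources). The SDK's instance_types argument is a dict keyed by instance-type name with camelCase keys (nodeSelector, resources). The following is a minimal sketch of converting the YAML shape into the SDK shape. The mapping is inferred only from comparing the two examples in this document. The function name yaml_instance_types_to_sdk is hypothetical, and the sketch assumes the YAML has already been parsed into Python lists and dicts, for example with a YAML library.

```python
from typing import Any, Dict, List


def yaml_instance_types_to_sdk(instance_types: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Reshape YAML-style instance_types into the dict form used in the SDK snippet (hypothetical)."""
    converted: Dict[str, Dict[str, Any]] = {}
    for entry in instance_types:
        spec: Dict[str, Any] = {}
        # This sketch omits a null node_selector instead of passing None through
        if entry.get("node_selector"):
            spec["nodeSelector"] = dict(entry["node_selector"])
        if entry.get("resources"):
            # The SDK example passes quantities as strings (e.g. "16Gi", "1")
            spec["resources"] = {
                kind: {res: str(qty) for res, qty in values.items()}
                for kind, values in entry["resources"].items()
            }
        converted[entry["name"]] = spec
    return converted


if __name__ == "__main__":
    yaml_style = [
        {
            "name": "gpu_instance",
            "node_selector": {"accelerator": "nvidia-tesla-k80"},
            "resources": {
                "requests": {"cpu": 1, "memory": "4Gi", "nvidia.com/gpu": 1},
                "limits": {"cpu": 1, "memory": "4Gi", "nvidia.com/gpu": 1},
            },
        },
        {"name": "big_cpu_sku", "node_selector": None},
    ]
    print(yaml_instance_types_to_sdk(yaml_style))
```

You could then pass the result as instance_types= to KubernetesCompute.attach_configuration, together with default_instance_type, in the same way as the snippet above.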


