This page contains in-depth details on how to use the Batch Shipyard tool. Please see the Container Image CLI section for information regarding how to use the Docker or Singularity image if not invoking the Python script or pre-built binary directly.
If you installed Batch Shipyard using the install.sh script, then
you can invoke as:
# Change directory to batch-shipyard installed directory
./shipyard
You can also invoke shipyard from any directory if given the full path
to the script.
If you are on Windows and installed using the install.cmd script, then
you can invoke as:
shipyard.cmd
If you installed manually (i.e., took the non-recommended installation path and did not use the installer scripts), then you will need to invoke the Python interpreter and pass the script as an argument. For example:
python3 shipyard.py
The -h or --help option will list the available options, which are
explained below.
Nearly all REST calls or commands that are issued against the normal Azure Batch APIs and tooling such as the Azure Portal or Azure CLI will work fine against Azure Batch Shipyard created resources. However, there are some notable exceptions:
- All pools must be created with Batch Shipyard if you intend to use any Batch Shipyard functionality.
- Please note all of the current limitations for other actions.
- Batch Shipyard pools that are deleted outside of Batch Shipyard will not
have their associated metadata (in Azure Storage) cleaned up. Please use
the pool del command instead. You can use the storage command to clean up
orphaned data if you accidentally deleted Batch Shipyard pools outside of
Batch Shipyard.
shipyard (and shipyard.py) is invoked with commands and sub-commands as
positional arguments, i.e.:
shipyard <command> <subcommand> <options>
For instance:
shipyard pool add --configdir config
# or equivalent in Linux for this particular command
SHIPYARD_CONFIGDIR=config shipyard pool add
This would create a pool on the Batch account as specified in the config files
found in the config directory. Please note that <options> must be
specified after the command and subcommand.
You can issue the -h or --help option at every level to view all
available options for that level and additional help text. For example:
shipyard -h
shipyard pool -h
shipyard pool add -h
There is a set of shared options used by most sub-commands. These options must be specified after the command and sub-command. These are:
-y, --yes                       Assume yes for all confirmation prompts
--raw                           Output data as returned by the service for
                                supported operations as raw json
--show-config                   Show configuration
-v, --verbose                   Verbose output
--configdir TEXT                Configuration directory where all
                                configuration files can be found. Each
                                config file must be named exactly the same
                                as the regular switch option, e.g.,
                                pool.yaml for --pool. Individually specified
                                config options take precedence over this
                                option. This defaults to "." if no other
                                configuration option is specified.
--credentials TEXT              Credentials config file
--config TEXT                   Global config file
--fs TEXT                       RemoteFS config file
--pool TEXT                     Pool config file
--jobs TEXT                     Jobs config file
--monitor TEXT                  Resource monitoring config file
--subscription-id TEXT          Azure Subscription ID
--keyvault-uri TEXT             Azure KeyVault URI
--keyvault-credentials-secret-id TEXT
                                Azure KeyVault credentials secret id
--aad-endpoint TEXT             Azure Active Directory endpoint
--aad-directory-id TEXT         Azure Active Directory directory (tenant) id
--aad-application-id TEXT       Azure Active Directory application (client)
                                id
--aad-auth-key TEXT             Azure Active Directory authentication key
--aad-authority-url TEXT        Azure Active Directory authority URL
--aad-user TEXT                 Azure Active Directory user
--aad-password TEXT             Azure Active Directory password
--aad-cert-private-key TEXT     Azure Active Directory private key for X.509
                                certificate
--aad-cert-thumbprint TEXT      Azure Active Directory certificate SHA1
                                thumbprint
- -y or --yes is to assume yes for all confirmation prompts
- --raw will output JSON to stdout for the command result. Only a subset of commands support this option. Note that many of the supported commands return raw JSON body results from the Batch API server, thus the output may change/break if the underlying service version changes. It is important to pin the Batch Shipyard release to a specific version if using this feature and perform upgrade testing/validation for your scenario and workflow between releases. The following commands support this option:
  - account images
  - account info
  - account quota
  - cert list
  - fed create
  - fed destroy
  - fed list
  - fed jobs add
  - fed jobs del
  - fed jobs list
  - fed jobs term
  - fed jobs zap
  - fed pool add
  - fed pool remove
  - fed proxy status
  - jobs list
  - jobs tasks count
  - jobs tasks list
  - monitor status
  - pool autoscale evaluate
  - pool autoscale lastexec
  - pool images list
  - pool images update
  - pool list
  - pool nodes count
  - pool nodes grls
  - pool nodes list
  - pool nodes ps
  - pool nodes prune
  - pool nodes zap
- --show-config will output the merged configuration prior to execution
- -v or --verbose is for verbose output
- --configdir path can be used instead of the individual config switches below if all configuration files are in one directory and named after their switch. For example, if you have a directory named config and under that directory you have the files credentials.yaml, config.yaml, pool.yaml and jobs.yaml, then you can use this argument instead of the following individual conf options. If neither this parameter nor any of the individual conf options is specified, then this parameter defaults to the current working directory (i.e., .).
- --credentials path/to/credentials.yaml is required for all actions except for a select few keyvault commands.
- --config path/to/config.yaml is required for all actions.
- --pool path/to/pool.yaml is required for most actions.
- --jobs path/to/jobs.yaml is required for job-related actions.
- --fs path/to/fs.yaml is required for fs-related actions and some pool actions.
- --monitor path/to/monitor.yaml is required for resource monitoring actions.
- --subscription-id is the Azure Subscription Id associated with the Batch account or Remote file system resources. This is only required for creating pools with a virtual network specification or with fs commands.
- --keyvault-uri is required for all keyvault commands.
- --keyvault-credentials-secret-id is required if utilizing a credentials config stored in Azure KeyVault
- --aad-endpoint is the Active Directory endpoint for the resource. Note that this can cause conflicts for actions that require multiple endpoints for different resources. It is better to specify endpoints explicitly in the credential file.
- --aad-directory-id is the Active Directory Directory Id (or Tenant Id)
- --aad-application-id is the Active Directory Application Id (or Client Id)
- --aad-auth-key is the authentication key for the application (or client)
- --aad-authority-url is the Azure Active Directory Authority URL
- --aad-user is the Azure Active Directory user
- --aad-password is the Azure Active Directory password for the user
- --aad-cert-private-key is the Azure Active Directory Service Principal RSA private key corresponding to the X.509 certificate for certificate-based auth
- --aad-cert-thumbprint is the X.509 certificate thumbprint for Azure Active Directory certificate-based auth
Note that only one of Active Directory Service Principal or User/Password can
be specified at once, i.e., --aad-auth-key, --aad-password, and
--aad-cert-private-key are mutually exclusive.
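As a hedged illustration of the --raw option described above, the sketch below captures the JSON from one supported command into a file, which can help when comparing output between Batch Shipyard releases. The helper name snapshot_raw and the output file name are hypothetical; only the shipyard command, its sub-commands, and --raw come from this page.

```shell
# snapshot_raw FILE COMMAND SUBCOMMAND [OPTIONS...]
# Hypothetical helper: runs a shipyard command with --raw appended and
# writes the JSON to FILE. Output goes to a temporary file first so that
# a failed command does not leave a partial FILE behind.
snapshot_raw() {
    out_file="$1"
    shift
    tmp_file="${out_file}.tmp"
    if shipyard "$@" --raw > "$tmp_file"; then
        mv "$tmp_file" "$out_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}
```

For example, snapshot_raw pools.json pool list --configdir config would run shipyard pool list --configdir config --raw and store the result in pools.json.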
Note that the following options can be specified as environment variables instead:
- SHIPYARD_AAD_APPLICATION_ID in lieu of --aad-application-id
- SHIPYARD_AAD_AUTH_KEY in lieu of --aad-auth-key
- SHIPYARD_AAD_AUTHORITY_URL in lieu of --aad-authority-url
- SHIPYARD_AAD_CERT_PRIVATE_KEY in lieu of --aad-cert-private-key
- SHIPYARD_AAD_CERT_THUMBPRINT in lieu of --aad-cert-thumbprint
- SHIPYARD_AAD_DIRECTORY_ID in lieu of --aad-directory-id
- SHIPYARD_AAD_ENDPOINT in lieu of --aad-endpoint
- SHIPYARD_AAD_PASSWORD in lieu of --aad-password
- SHIPYARD_AAD_USER in lieu of --aad-user
- SHIPYARD_CONFIG_CONF in lieu of --config
- SHIPYARD_CONFIGDIR in lieu of --configdir
- SHIPYARD_CREDENTIALS_CONF in lieu of --credentials
- SHIPYARD_FS_CONF in lieu of --fs
- SHIPYARD_JOBS_CONF in lieu of --jobs
- SHIPYARD_KEYVAULT_CREDENTIALS_SECRET_ID in lieu of --keyvault-credentials-secret-id
- SHIPYARD_KEYVAULT_URI in lieu of --keyvault-uri
- SHIPYARD_MONITOR_CONF in lieu of --monitor
- SHIPYARD_POOL_CONF in lieu of --pool
- SHIPYARD_SLURM_CONF in lieu of --slurm
- SHIPYARD_SUBSCRIPTION_ID in lieu of --subscription-id
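A minimal sketch of using these environment variables for a shell session follows. The directory, the placeholder ids, and the key file path are all assumptions for illustration; substitute your own values. Only one authentication method is set here, in keeping with the mutual exclusivity noted earlier.

```shell
# Hypothetical values: replace the directory and ids with your own.
export SHIPYARD_CONFIGDIR="$HOME/shipyard/config"
export SHIPYARD_AAD_DIRECTORY_ID="00000000-0000-0000-0000-000000000000"
export SHIPYARD_AAD_APPLICATION_ID="11111111-1111-1111-1111-111111111111"

# Read the authentication key from a file (assumed path) instead of
# typing it on the command line, where it would land in shell history.
if [ -r "$HOME/.shipyard-auth-key" ]; then
    SHIPYARD_AAD_AUTH_KEY="$(cat "$HOME/.shipyard-auth-key")"
    export SHIPYARD_AAD_AUTH_KEY
fi

# With the variables exported, the matching options can be omitted.
# The guard skips the call on machines where shipyard is not on PATH.
if command -v shipyard >/dev/null 2>&1; then
    shipyard pool list
fi
```

Options given explicitly on the command line can still be combined with these variables; how the two interact when both are set is not covered on this page, so test that case before relying on it.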
shipyard has the following top-level commands:
account Batch account actions
cert Certificate actions
data Data actions
diag Diagnostics actions
fed Federation actions
fs Filesystem in Azure actions
jobs Jobs actions
keyvault KeyVault actions
misc Miscellaneous actions
monitor Monitoring actions
pool Pool actions
slurm Slurm on Batch actions
storage Storage actions
- account commands deal with Batch accounts
- cert commands deal with certificates to be used with Azure Batch
- data commands deal with data ingress and egress from Azure
- diag commands deal with diagnostics for Azure Batch
- fed commands deal with Batch Shipyard Federations
- fs commands deal with Batch Shipyard provisioned remote filesystems in Azure
- jobs commands deal with Azure Batch jobs and tasks
- keyvault commands deal with Azure KeyVault secrets for use with Batch Shipyard
- misc commands are miscellaneous commands that don't fall into other categories
- pool commands deal with Azure Batch pools
- slurm commands deal with Slurm on Batch
- storage commands deal with Batch Shipyard metadata on Azure Storage
The account command has the following sub-commands:
images List available VM images available to the Batch account
info Retrieve Batch account information and quotas
list Retrieve a list of Batch accounts and associated quotas in...
quota Retrieve Batch account quota at the subscription level for the...
- images lists available VM images to deploy as compute nodes in a Batch pool
  - --show-unrelated will additionally list images that are unrelated for Batch Shipyard. This option is not applicable with --raw.
  - --show-unverified will additionally list images that are unverified
- info provides information about the specified Batch account provided in credentials
  - --name is the name of the Batch account to query instead of the one specified in credentials
  - --resource-group is the name of the resource group to use associated with the Batch account instead of the one specified in credentials
- list provides information about all (or a subset) of accounts within the subscription in credentials
  - --resource-group is the name of the resource group to scope the query to
- quota provides service level quota information for the subscription for a given location. Requires a valid location argument, e.g., westus.
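Because account quota takes one location per invocation, checking several regions means repeating the command. The sketch below wraps that in a loop. The function name quota_for_locations is hypothetical, and placing the location argument after the options is an assumption of this sketch rather than something this page specifies.

```shell
# quota_for_locations CONFIGDIR LOCATION [LOCATION...]
# Hypothetical helper: queries the subscription-level quota for each
# location in turn and stops at the first failure.
quota_for_locations() {
    configdir="$1"
    shift
    for location in "$@"; do
        shipyard account quota --configdir "$configdir" "$location" || return 1
    done
}
```

For example, quota_for_locations config westus eastus would query both regions using the configuration files in the config directory.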
The cert command has the following sub-commands:
add Add a certificate to a Batch account
create Create a certificate to use with a Batch...
del Deletes certificate from a Batch account
list List all certificates in a Batch account
- add will add a certificate to the Batch account
  - --file is the certificate file to add. The operation to transform the cert so it is acceptable for the Batch Service is determined by the file extension. Only .cer, .pem and .pfx files are supported. If this option is omitted, the encryption:pfx specified in the global configuration is used.
  - --pem-no-certs will convert and add the PEM file as a CER in the Batch service without any certificates.
  - --pem-public-key will convert and add the PEM file as a CER in the Batch service with only the public key.
  - --pfx-password is the PFX password to use
- create will create a certificate locally for use with the Batch account.
  - --file-prefix is the PEM and PFX file name prefix to use. If this option is omitted, the global configuration encryption:pfx section options are used.
  - --pfx-password is the PFX passphrase to set. If this option is omitted, the global configuration encryption:pfx section options are used. If neither is specified, you are prompted for the passphrase.
- del will delete certificates from the Batch account
  - --sha1 specifies the thumbprint to delete. If this option is omitted, then the certificate referenced in the global configuration setting encryption:pfx will be deleted.
- list will list certificates in the Batch account
Note that in order to use certificates created by cert create for
credential encryption, you must edit your config.yaml to incorporate the
generated certificate and then invoke the cert add command. Please see the
credential encryption guide
for more information.
The data command has the following sub-commands:
files Compute node file actions
ingress Ingress data into Azure
The data files sub-command has the following sub-sub-commands:
list List files for tasks in jobs
node Retrieve file(s) from a compute node
stream Stream a file as text to the local console or...
task Retrieve file(s) from a job/task
- files list will list files for all tasks in jobs
  - --jobid force scope to just this job id
  - --taskid force scope to just this task id
- files node will retrieve a file with node id and filename semantics
  - --all --filespec <nodeid>,<include pattern> can be given to download all files from the compute node with the optional include pattern
  - --filespec <nodeid>,<filename> can be given to download one specific file from compute node
- files stream will stream a file as text (UTF-8 decoded) to the local console or binary if streamed to disk
  - --disk will write the streamed data as binary to disk instead of output to local console
  - --filespec <jobid>,<taskid>,<filename> can be given to stream a specific file. If <taskid> is set to @FIRSTRUNNING, then the first running task within the job of <jobid> will be used to locate the <filename>.
- files task will retrieve a file with job, task, filename semantics
  - --all --filespec <jobid>,<taskid>,<include pattern> can be given to download all files for the job and task with an optional include pattern
  - --filespec <jobid>,<taskid>,<filename> can be given to download one specific file from the job and task. If <taskid> is set to @FIRSTRUNNING, then the first running task within the job of <jobid> will be used to locate the <filename>.
- ingress will ingress data as specified in configuration files
  - --to-fs <STORAGE_CLUSTER_ID> transfers data as specified in configuration files to the specified remote file system storage cluster instead of Azure Storage
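The --filespec value is a single comma-separated string, which is easy to mistype. The sketch below shows one way to assemble it for files stream using the @FIRSTRUNNING placeholder described above. The function name stream_first_running and the job id and file name in the usage note are hypothetical.

```shell
# stream_first_running JOBID FILENAME [OPTIONS...]
# Hypothetical helper: streams FILENAME from the first running task of
# JOBID. The filespec is quoted so it reaches shipyard as one argument.
stream_first_running() {
    jobid="$1"
    filename="$2"
    shift 2
    shipyard data files stream \
        --filespec "${jobid},@FIRSTRUNNING,${filename}" "$@"
}
```

For example, stream_first_running myjob stdout.txt --configdir config would run shipyard data files stream --filespec myjob,@FIRSTRUNNING,stdout.txt --configdir config. Adding --disk to the trailing options writes the stream to disk as binary instead.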
The diag command has the following sub-commands:
logs Diagnostic log actions
The diag logs sub-command has the following sub-sub-commands:
upload Upload Batch Service Logs from compute node
- logs upload will upload the Batch compute node service logs to a specified Azure storage container.
  - --cardinal is the zero-based cardinal number of the compute node in the pool to upload from
  - --nodeid is the node id to upload from
  - --wait will wait until the operation completes
The fed command has the following sub-commands:
create Create a federation
destroy Destroy a federation
jobs Federation jobs actions
list List all federations
pool Federation pool actions
proxy Federation proxy actions
The fed jobs sub-command has the following sub-sub-commands:
add Add jobs to a federation
del Delete a job or job schedule in a federation
list List jobs or job schedules in a federation
term Terminate a job or job schedule in a...
zap Zap a queued unique id from a federation
The fed pool sub-command has the following sub-sub-commands:
add Add a pool to a federation
remove Remove a pool from a federation
The fed proxy sub-command has the following sub-sub-commands:
create Create a federation proxy
destroy Destroy a federation proxy
ssh Interactively login via SSH to federation...
start Starts a previously suspended federation...
status Query status of a federation proxy
suspend Suspend a federation proxy
- create will create a federation
  - FEDERATION_ID is the federation id name
  - --force force creates the federation even if a federation with the same id exists.
  - --no-unique-job-ids creates a federation without unique job id enforcement.
- destroy will destroy a previously created federation
  - FEDERATION_ID is the federation id name
- jobs add submits jobs/task groups or job schedules to a federation
  - FEDERATION_ID is the federation id name
- jobs del submits an action to delete jobs or job schedules from a federation
  - FEDERATION_ID is the federation id name
  - --all-jobs deletes all jobs in the federation
  - --all-jobschedules deletes all job schedules in the federation
  - --job-id deletes a specific job id. This can be specified multiple times.
  - --job-schedule-id deletes a specific job schedule id. This can be specified multiple times.
- jobs list lists jobs or locates a job or job schedule
  - FEDERATION_ID is the federation id name
  - --blocked will list blocked actions
  - --job-id locates a specific job id
  - --job-schedule-id locates a specific job schedule id
  - --queued will list queued actions
- jobs term submits an action to terminate jobs or job schedules from a federation
  - FEDERATION_ID is the federation id name
  - --all-jobs terminates all jobs in the federation
  - --all-jobschedules terminates all job schedules in the federation
  - --force forces submission of a termination action for a job even if it doesn't exist
  - --job-id terminates a specific job id. This can be specified multiple times.
  - --job-schedule-id terminates a specific job schedule id. This can be specified multiple times.
- jobs zap removes a unique id action from a federation
  - FEDERATION_ID is the federation id name
  - --unique-id is the unique id associated with the action to zap
- list will list federations
  - --federation-id will limit the list to the specified federation id
- pool add will add a pool to a federation
  - FEDERATION_ID is the federation id name
  - --batch-service-url is the batch service url of the pool id to add instead of read from the credentials configuration
  - --pool-id is the pool id to add instead of the pool id read from the pool configuration
- pool remove will remove a pool from a federation
  - FEDERATION_ID is the federation id name
  - --all remove all pools from the federation
  - --batch-service-url is the batch service url of the pool id to remove instead of read from the credentials configuration
  - --pool-id is the pool id to remove instead of the pool id read from the pool configuration
- proxy create will create the federation proxy
- proxy destroy will destroy the federation proxy
  - --delete-resource-group will delete the entire resource group that contains the federation proxy. Please take care when using this option as any resource in the resource group is deleted which may be other resources that are not Batch Shipyard related.
  - --delete-virtual-network will delete the virtual network and all of its subnets
  - --generate-from-prefix will attempt to generate all resource names using conventions used. This is helpful when there was an issue with federation proxy creation/deletion and the original virtual machine resources cannot be enumerated. Note that OS disks cannot be deleted with this option. Please use an alternate means (i.e., the Azure Portal) to delete disks.
  - --no-wait does not wait for deletion completion. It is not recommended to use this parameter.
- proxy ssh will interactively log into the federation proxy via SSH
  - COMMAND is an optional argument to specify the command to run. If your command has switches, preface COMMAND with double dash as per POSIX convention, e.g., pool ssh -- sudo docker ps -a.
  - --tty allocates a pseudo-terminal
- proxy start will start a previously suspended federation proxy
  - --no-wait does not wait for the restart to complete. It is not recommended to use this parameter.
- proxy status will query status of a federation proxy
- proxy suspend suspends a federation proxy
  - --no-wait does not wait for the suspension to complete. It is not recommended to use this parameter.
The fs command has the following sub-commands which work on two different
parts of a remote filesystem:
cluster Filesystem storage cluster in Azure actions
disks Managed disk actions
The fs cluster command has the following sub-commands:
add Create a filesystem storage cluster in Azure
del Delete a filesystem storage cluster in Azure
expand Expand a filesystem storage cluster in Azure
orchestrate Orchestrate a filesystem storage cluster in Azure with the...
resize Resize a filesystem storage cluster in Azure.
ssh Interactively login via SSH to a filesystem storage cluster...
start Starts a previously suspended filesystem storage cluster in...
status Query status of a filesystem storage cluster in Azure
suspend Suspend a filesystem storage cluster in Azure
As the fs.yaml configuration file can contain multiple storage cluster
definitions, all fs cluster commands require the STORAGE_CLUSTER_ID
argument, given after any of the options below, to identify the storage
cluster to perform actions against.
- add will create a remote fs cluster as defined in the fs config file
- del will delete a remote fs cluster as defined in the fs config file
  - --delete-resource-group will delete the entire resource group that contains the server. Please take care when using this option as any resource in the resource group is deleted which may be other resources that are not Batch Shipyard related.
  - --delete-data-disks will delete attached data disks
  - --delete-virtual-network will delete the virtual network and all of its subnets
  - --generate-from-prefix will attempt to generate all resource names using conventions used. This is helpful when there was an issue with cluster creation/deletion and the original virtual machine(s) resources cannot be enumerated. Note that OS disks and data disks cannot be deleted with this option. Please use fs disks del to delete disks that may have been used in the storage cluster.
  - --no-wait does not wait for deletion completion. It is not recommended to use this parameter.
- expand expands the number of disks used by the underlying filesystems on the file server.
  - --no-rebalance disables the rebalancing of data and metadata among the disks that is otherwise performed, for better data spread and performance, after the disk is added to the array.
- orchestrate will create the remote disks and the remote fs cluster as defined in the fs config file
- resize resizes the storage cluster with additional virtual machines as specified in the configuration. This is an experimental feature.
- ssh will interactively log into a virtual machine in the storage cluster. If neither --cardinal nor --hostname is specified, --cardinal 0 is assumed.
  - COMMAND is an optional argument to specify the command to run. If your command has switches, preface COMMAND with double dash as per POSIX convention, e.g., fs cluster ssh mycluster -- df -h.
  - --cardinal is the zero-based cardinal number of the virtual machine in the storage cluster to connect to.
  - --hostname is the hostname of the virtual machine in the storage cluster to connect to
  - --tty allocates a pseudo-terminal
- start will start a previously suspended storage cluster
  - --no-wait does not wait for the restart to complete. It is not recommended to use this parameter.
- status displays the status of the storage cluster
  - --detail reports in-depth details about each virtual machine in the storage cluster
  - --hosts will output the public IP to hosts mapping for mounting a glusterfs based remote filesystem locally. glusterfs must be allowed in the network security rules for this to work properly.
- suspend suspends a storage cluster
  - --no-wait does not wait for the suspension to complete. It is not recommended to use this parameter.
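The argument ordering for fs cluster ssh (options, then STORAGE_CLUSTER_ID, then a double dash before a remote command that has switches) can be sketched as follows. The function name cluster_df is hypothetical, and mycluster in the usage note is the placeholder cluster id used in the example above.

```shell
# cluster_df CLUSTER_ID [OPTIONS...]
# Hypothetical helper: runs "df -h" on the first virtual machine
# (--cardinal 0) of the given storage cluster.
cluster_df() {
    cluster_id="$1"
    shift
    # Options first, then STORAGE_CLUSTER_ID, then "--" so that the
    # switches of the remote command are not parsed by shipyard.
    shipyard fs cluster ssh "$@" --cardinal 0 "$cluster_id" -- df -h
}
```

For example, cluster_df mycluster --configdir config would run shipyard fs cluster ssh --configdir config --cardinal 0 mycluster -- df -h.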
The fs disks command has the following sub-commands:
add Create managed disks in Azure
del Delete managed disks in Azure
list List managed disks in resource group
- add creates managed disks as specified in the fs config file
- del deletes managed disks as specified in the fs config file
  - --all deletes all managed disks found in a specified resource group
  - --delete-resource-group deletes the specified resource group
  - --name deletes a specific named disk in a resource group
  - --no-wait does not wait for disk deletion to complete. It is not recommended to use this parameter.
  - --resource-group deletes one or more managed disks in this resource group
- list lists managed disks found in a resource group
  - --resource-group lists disks in this resource group only
  - --restrict-scope lists disks only if found in the fs config file
The jobs command has the following sub-commands:
add Add jobs
cmi Cleanup non-native multi-instance jobs
del Delete jobs and job schedules
disable Disable jobs and job schedules
enable Enable jobs and job schedules
list List jobs
migrate Migrate jobs or job schedules to another pool
stats Get statistics about jobs
tasks Tasks actions
term Terminate jobs and job schedules
The jobs tasks sub-command has the following sub-sub-commands:
count Get task counts for a job
del Delete specified tasks in jobs
list List tasks within jobs
term Terminate specified tasks in jobs
- add will add all jobs and tasks defined in the jobs configuration file to the Batch pool
  - --recreate will recreate any completed jobs with the same id
  - --tail will tail the specified file of the last job and task added with this command invocation
- cmi will clean up any stale non-native multi-instance tasks and jobs. Note that this sub-command is typically not required if auto_complete is set to true in the job specification for the job.
  - --delete will delete any stale cleanup jobs
- del will delete jobs and job schedules specified in the jobs configuration file. If an autopool is specified for all jobs and a jobid option is not specified, the storage associated with the autopool will be cleaned up.
  - --all-jobs will delete all jobs found in the Batch account
  - --all-jobschedules will delete all job schedules found in the Batch account
  - --jobid force deletion scope to just this job id
  - --jobscheduleid force deletion scope to just this job schedule id
  - --termtasks will manually terminate tasks prior to deletion. Termination of running tasks requires a valid SSH user if the tasks are running on a non-native container support pool.
  - --wait will wait for deletion to complete
- disable will disable jobs or job schedules
  - --jobid force disable scope to just this job id
  - --jobscheduleid force disable scope to just this job schedule id
  - --requeue requeue running tasks
  - --terminate terminate running tasks
  - --wait wait for running tasks to complete
- enable will enable jobs or job schedules
  - --jobid force enable scope to just this job id
  - --jobscheduleid force enable scope to just this job schedule id
- list will list all jobs in the Batch account
- migrate will migrate jobs or job schedules to another pool. Ensure that the new target pool has the Docker images required to run the job.
  - --jobid force migration scope to just this job id
  - --jobscheduleid force migration scope to just this job schedule id
  - --poolid force migration to this specified pool id
  - --requeue requeue running tasks
  - --terminate terminate running tasks
  - --wait wait for running tasks to complete
- stats will generate a statistics summary of a job or jobs
  - --jobid will query the specified job instead of all jobs
- tasks count will count the task states within a job
  - --jobid will query the specified job instead of all jobs
- tasks del will delete tasks within jobs specified in the jobs configuration file. Active or running tasks will be terminated first on non-native container support pools.
  - --jobid force deletion scope to just this job id
  - --taskid force deletion scope to just this task id
  - --wait will wait for deletion to complete
- tasks list will list tasks from jobs specified in the jobs configuration file
  - --all list all tasks in all jobs in the account
  - --jobid force scope to just this job id
  - --poll-until-tasks-complete will poll until all tasks have completed
- tasks term will terminate tasks within jobs specified in the jobs configuration file. Termination of running tasks requires a valid SSH user if tasks are running on a non-native container support pool.
  - --force force send docker kill signal regardless of task state
  - --jobid force termination scope to just this job id
  - --taskid force termination scope to just this task id
  - --wait will wait for termination to complete
- term will terminate jobs and job schedules found in the jobs configuration file. If an autopool is specified for all jobs and a jobid option is not specified, the storage associated with the autopool will be cleaned up.
  - --all-jobs will terminate all jobs found in the Batch account
  - --all-jobschedules will terminate all job schedules found in the Batch account
  - --jobid force termination scope to just this job id
  - --jobscheduleid force termination scope to just this job schedule id
  - --termtasks will manually terminate tasks prior to termination. Termination of running tasks requires a valid SSH user if tasks are running on a non-native container support pool.
  - --wait will wait for termination to complete
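Several of these sub-commands are commonly chained: submit the jobs, wait for their tasks, then clean up. The sketch below strings together jobs add, jobs tasks list --poll-until-tasks-complete, and jobs del using only options listed on this page. The function name run_jobs_to_completion is hypothetical, and the sketch deletes the jobs once polling returns without inspecting whether individual tasks succeeded.

```shell
# run_jobs_to_completion CONFIGDIR
# Hypothetical helper: adds the jobs from CONFIGDIR, polls until all
# tasks have completed, then deletes the jobs without prompting.
# Stops early if submission or polling fails.
run_jobs_to_completion() {
    configdir="$1"
    shipyard jobs add --configdir "$configdir" || return 1
    shipyard jobs tasks list --configdir "$configdir" \
        --poll-until-tasks-complete || return 1
    shipyard jobs del --configdir "$configdir" -y --wait
}
```

If you need task output, retrieve it (for instance with data files task) before the final deletion step, since this sketch removes the jobs as soon as polling finishes.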
The keyvault command has the following sub-commands:
add Add a credentials config file as a secret to...
del Delete a secret from Azure KeyVault
list List secret ids and metadata in an Azure...
The following subcommands require --keyvault-* and --aad-* options in
order to work. Alternatively, you can specify these in the credentials.yaml
file, but these options are mutually exclusive of other properties.
Please refer to the
Azure KeyVault and Batch Shipyard guide
for more information.
- add will add the specified credentials config file as a secret to an Azure KeyVault. A valid credentials config file must be specified as an option.
  - NAME argument is required which is the name of the secret associated with the credentials config to store in the KeyVault
- del will delete a secret from the Azure KeyVault
  - NAME argument is required which is the name of the secret to delete from the KeyVault
- list will list all secret ids and metadata in an Azure KeyVault
The misc command has the following sub-commands:
mirror-images Mirror Batch Shipyard system images to the...
tensorboard Create a tunnel to a Tensorboard instance for...
- mirror-images will mirror Batch Shipyard Docker images to the designated fallback_registry specified in the global configuration for the version of Batch Shipyard that is executed in the command invocation.
- tensorboard will create a tunnel to the compute node that is running or has run the specified task
  - --jobid specifies the job id to use. If this is not specified, the first and only jobspec is used from jobs.yaml.
  - --taskid specifies the task id to use. If this is not specified, the last run or running task for the job is used.
  - --logdir specifies the TensorFlow logs directory generated by summary operations
  - --image specifies an alternate TensorFlow image to use for Tensorboard. The tensorboard.py file must be in the expected location in the Docker image as stock TensorFlow images. If not specified, Batch Shipyard will attempt to find a suitable TensorFlow image from Docker images in the global resource list or will acquire one on demand for this command.
The monitor command has the following sub-commands:
add Add a resource to monitor
create Create a monitoring resource
destroy Destroy a monitoring resource
list List all monitored resources
remove Remove a resource from monitoring
ssh Interactively login via SSH to monitoring...
start Starts a previously suspended monitoring...
status Query status of a monitoring resource
suspend Suspend a monitoring resource
- `add` will add a resource to monitor to an existing monitoring VM
  - `--poolid` will add the specified Batch pool to monitor
  - `--remote-fs` will add the specified RemoteFS cluster to monitor
- `create` will create a monitoring resource VM
- `destroy` will destroy a monitoring resource VM
  - `--delete-resource-group` will delete the entire resource group that contains the monitoring resource. Please take care when using this option, as every resource in the resource group is deleted, including any that are not Batch Shipyard related.
  - `--delete-virtual-network` will delete the virtual network and all of its subnets
  - `--generate-from-prefix` will attempt to generate all resource names using the naming conventions used. This is helpful when there was an issue with monitoring creation/deletion and the original virtual machine resources cannot be enumerated. Note that OS disks cannot be deleted with this option. Please use an alternate means (e.g., the Azure Portal) to delete disks that may have been used by the monitoring VM.
  - `--no-wait` does not wait for deletion completion. It is not recommended to use this parameter.
- `list` will list all monitored resources
- `remove` will remove a monitored resource from an existing monitoring VM
  - `--all` will remove all resources that are currently monitored
  - `--poolid` will remove the specified Batch pool from monitoring
  - `--remote-fs` will remove the specified RemoteFS cluster from monitoring
- `ssh` will interactively log into the monitoring resource via SSH.
  - `COMMAND` is an optional argument to specify the command to run. If your command has switches, preface `COMMAND` with double dash as per POSIX convention, e.g., `pool ssh -- sudo docker ps -a`.
  - `--tty` allocates a pseudo-terminal
- `start` will start a previously suspended monitoring VM
  - `--no-wait` does not wait for the restart to complete. It is not recommended to use this parameter.
- `status` will query status of a monitoring VM
- `suspend` suspends a monitoring VM
  - `--no-wait` does not wait for the suspension to complete. It is not recommended to use this parameter.
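As a rough illustration of how the monitor sub-commands fit together, the sketch below walks a monitoring VM through one possible lifecycle. The pool id `mypool` and the use of `--configdir .` are placeholders, and the `shipyard` shell function at the top is a hypothetical dry-run stub (not part of Batch Shipyard) that only prints each command.

```shell
# Dry-run stub (hypothetical, for illustration only): prints each command
# instead of executing it. Delete this function to invoke the real tool.
shipyard() { echo "shipyard $*"; }

# Create the monitoring VM, then register a Batch pool with it.
# "mypool" is a placeholder pool id.
shipyard monitor create --configdir .
shipyard monitor add --configdir . --poolid mypool

# Inspect what is being monitored and check on the VM
shipyard monitor list --configdir .
shipyard monitor status --configdir .

# Suspend the VM when it is not needed and start it again later
shipyard monitor suspend --configdir .
shipyard monitor start --configdir .

# Stop monitoring the pool, then tear the VM down
shipyard monitor remove --configdir . --poolid mypool
shipyard monitor destroy --configdir .
```

With the stub in place the script is safe to run as-is, since it only echoes the sequence for review.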
The pool command has the following sub-commands:
add Add a pool to the Batch account
autoscale Autoscale actions
del Delete a pool from the Batch account
exists Check if a pool exists
images Container images actions
list List all pools in the Batch account
nodes Compute node actions
rdp Interactively login via RDP to a node in a pool
resize Resize a pool
ssh Interactively login via SSH to a node in a pool
stats Get statistics about a pool
user Remote user actions
The pool autoscale sub-command has the following sub-sub-commands:
disable Disable autoscale on a pool
enable Enable autoscale on a pool
evaluate Evaluate autoscale formula
lastexec Get the result of the last execution of the...
The pool images sub-command has the following sub-sub-commands:
list List container images in a pool
update Update container images in a pool
The pool nodes sub-command has the following sub-sub-commands:
count Get node counts in pool
del Delete a node or nodes from a pool
grls Get remote login settings for all nodes in...
list List nodes in pool
prune Prune container/image data on nodes in pool
ps List running containers on nodes in pool
reboot Reboot a node or nodes in a pool
zap Zap all container processes on nodes in pool
The pool user sub-command has the following sub-sub-commands:
add Add a remote user to all nodes in pool
del Delete a remote user from all nodes in pool
- `add` will add the pool defined in the pool configuration file to the Batch account
  - `--no-wait` will not wait for nodes to provision successfully. This will prevent creation of remote users and is incompatible with certain options that require nodes to be provisioned.
  - `--recreate` will delete and recreate the pool if there already exists a pool with the same id. Note that you should only use this option if you are certain that it will not cause side-effects.
- `autoscale disable` will disable autoscale on the pool
- `autoscale enable` will enable autoscale on the pool
- `autoscale evaluate` will evaluate the autoscale formula in the pool configuration file
- `autoscale lastexec` will query the last execution information for autoscale
- `del` will delete the pool defined in the pool configuration file from the Batch account along with associated metadata in Azure Storage used by Batch Shipyard. It is recommended to use this command instead of deleting a pool directly from the Azure Portal, Batch Labs, or other tools, as this action also removes all associated Batch Shipyard metadata on Azure Storage.
  - `--poolid` will delete the specified pool instead of the pool from the pool configuration file
  - `--wait` will wait for deletion to complete
- `exists` will check if a pool exists under the Batch account
  - `--pool-id` will query the specified pool instead of the pool from the pool configuration file
- `images list` will query the nodes in the pool for Docker images. Common and mismatched images will be listed. Requires a provisioned SSH user and private key.
- `images update` will update container images on all compute nodes of the pool. This command may require a valid SSH user. This command does not work on Windows. Specific Singularity images updated with `--singularity-image` will not be verified.
  - `--docker-image` will restrict the update to just the Docker image or image:tag
  - `--docker-image-digest` will restrict the update to just the Docker image or image:tag and a specific digest
  - `--singularity-image` will restrict the update to just the Singularity image or image:tag
  - `--ssh` will force the update to occur over an SSH side channel rather than a Batch job.
- `list` will list all pools in the Batch account
- `nodes count` will count the node states within a pool
  - `--poolid` will query the specified pool instead of the pool from the pool configuration file
- `nodes del` will delete the specified node from the pool
  - `--all-start-task-failed` will delete all nodes in the start task failed state
  - `--all-starting` will delete all nodes in the starting state
  - `--all-unusable` will delete all nodes in the unusable state
  - `--nodeid` is the node id to delete
- `nodes grls` will retrieve all of the remote login settings for every node in the specified pool
  - `--no-generate-tunnel-script` will disable generating an SSH tunnel script even if enabled in the pool configuration
- `nodes list` will list all nodes in the specified pool
  - `--start-task-failed` will list nodes in start task failed state
  - `--unusable` will list nodes in unusable state
- `nodes prune` will prune unused Docker data. This command requires a provisioned SSH user.
  - `--volumes` will also include volumes
- `nodes ps` will list all Docker containers and their status. This command requires a provisioned SSH user.
- `nodes reboot` will reboot a specified node in the pool
  - `--all-start-task-failed` will reboot all nodes in the start task failed state
  - `--nodeid` is the node id to reboot
- `nodes zap` will send a kill signal to all running Docker containers. This command requires a provisioned SSH user.
  - `--no-remove` will not remove exited containers
  - `--stop` will execute `docker stop` instead
- `rdp` will interactively log into a compute node via RDP. If neither `--cardinal` nor `--nodeid` is specified, `--cardinal 0` is assumed. This command requires Batch Shipyard executing on Windows and targeting Windows container pools.
  - `--cardinal` is the zero-based cardinal number of the compute node in the pool to connect to as listed by `grls`
  - `--no-auto` will prevent automatic login via temporary credential saving if an RDP password is supplied via the pool configuration file
  - `--nodeid` is the node id to connect to in the pool
- `resize` will resize the pool to the `vm_count` specified in the pool configuration file
  - `--wait` will wait for resize to complete
- `ssh` will interactively log into a compute node via SSH. If neither `--cardinal` nor `--nodeid` is specified, `--cardinal 0` is assumed.
  - `COMMAND` is an optional argument to specify the command to run. If your command has switches, preface `COMMAND` with double dash as per POSIX convention, e.g., `pool ssh -- sudo docker ps -a`.
  - `--cardinal` is the zero-based cardinal number of the compute node in the pool to connect to as listed by `grls`
  - `--nodeid` is the node id to connect to in the pool
  - `--tty` allocates a pseudo-terminal
- `stats` will generate a statistics summary of the pool
  - `--poolid` will query the specified pool instead of the pool from the pool configuration file
- `user add` will add an SSH or RDP user defined in the pool configuration file to all nodes in the specified pool
- `user del` will delete the SSH or RDP user defined in the pool configuration file from all nodes in the specified pool
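The pool sub-commands above are often used in sequence. The following is a minimal sketch of one such sequence, assuming the configuration files live in the current directory; the `shipyard` function is a hypothetical dry-run stub (not part of Batch Shipyard) that prints each command rather than running it.

```shell
# Dry-run stub (hypothetical, for illustration only): prints each command
# instead of executing it. Delete this function to invoke the real tool.
shipyard() { echo "shipyard $*"; }

# Create the pool described in the pool configuration and confirm it exists
shipyard pool add --configdir .
shipyard pool exists --configdir .

# Look for problem nodes, then list containers on the nodes
# (pool nodes ps requires a provisioned SSH user)
shipyard pool nodes list --configdir . --start-task-failed
shipyard pool nodes ps --configdir .

# Run a one-off command on the first node; "--" keeps the switches of the
# remote command from being parsed as shipyard options
shipyard pool ssh --configdir . --cardinal 0 -- sudo docker ps -a

# Resize to the vm_count in the pool configuration, then delete the pool
# together with its Batch Shipyard metadata
shipyard pool resize --configdir . --wait
shipyard pool del --configdir . --wait
```

Passing `--wait` to `resize` and `del` keeps each step from returning before the operation has finished, which matters when the commands are chained in a script.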
The slurm command has the following sub-commands:
cluster Slurm cluster actions
ssh Slurm SSH actions
The slurm cluster sub-command has the following sub-sub-commands:
create Create a Slurm cluster with controllers and login nodes
destroy Destroy a Slurm controller
orchestrate Orchestrate a Slurm cluster with shared file system and
Batch...
start Starts a previously suspended Slurm cluster
status Query status of Slurm controllers and login nodes
suspend Suspend a Slurm cluster controller and/or login nodes
The slurm ssh sub-command has the following sub-sub-commands:
controller Interactively login via SSH to a Slurm controller virtual...
login Interactively login via SSH to a Slurm login/gateway virtual...
node Interactively login via SSH to a Slurm compute node virtual...
- `cluster create` will create the Slurm controller and login portions of the cluster
- `cluster destroy` will destroy the Slurm controller and login portions of the cluster
  - `--delete-resource-group` will delete the entire resource group that contains the Slurm resources. Please take care when using this option, as every resource in the resource group is deleted, including any that are not Batch Shipyard related.
  - `--delete-virtual-network` will delete the virtual network and all of its subnets
  - `--generate-from-prefix` will attempt to generate all resource names using the naming conventions used. This is helpful when there was an issue with creation/deletion and the original virtual machine resources cannot be enumerated. Note that OS disks cannot be deleted with this option. Please use an alternate means (e.g., the Azure Portal) to delete disks that may have been used by the Slurm resource VMs.
  - `--no-wait` does not wait for deletion completion. It is not recommended to use this parameter.
- `cluster orchestrate` will orchestrate the entire Slurm cluster with a single Batch pool
  - `--storage-cluster-id` will orchestrate the specified RemoteFS shared file system
- `cluster start` will start a previously suspended Slurm cluster
  - `--no-controller-nodes` does not start controller nodes
  - `--no-login-nodes` does not start login nodes
  - `--no-wait` does not wait for the restart to complete. It is not recommended to use this parameter.
- `cluster status` queries the status of the Slurm controller and login nodes
- `cluster suspend` suspends the Slurm cluster
  - `--no-controller-nodes` does not suspend controller nodes
  - `--no-login-nodes` does not suspend login nodes
  - `--no-wait` does not wait for the suspension to complete. It is not recommended to use this parameter.
- `ssh controller` will SSH into the Slurm controller nodes if permitted with the controller SSH user
  - `COMMAND` is an optional argument to specify the command to run. If your command has switches, preface `COMMAND` with double dash as per POSIX convention, e.g., `pool ssh -- sudo docker ps -a`.
  - `--offset` is the cardinal offset of the controller node
  - `--tty` allocates a pseudo-terminal
- `ssh login` will SSH into the Slurm login nodes with the cluster user identity
  - `COMMAND` is an optional argument to specify the command to run. If your command has switches, preface `COMMAND` with double dash as per POSIX convention, e.g., `pool ssh -- sudo docker ps -a`.
  - `--offset` is the cardinal offset of the login node
  - `--tty` allocates a pseudo-terminal
- `ssh node` will SSH into a Batch compute node with the cluster user identity
  - `COMMAND` is an optional argument to specify the command to run. If your command has switches, preface `COMMAND` with double dash as per POSIX convention, e.g., `pool ssh -- sudo docker ps -a`.
  - `--node-name` is the required Slurm node name
  - `--tty` allocates a pseudo-terminal
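To make the Slurm options more concrete, here is a hedged sketch of bringing a cluster up, using it, and pausing part of it. The storage cluster id `mystoragecluster` is a placeholder, `sinfo -N` is simply a standard Slurm command chosen as an example remote command, and the `shipyard` function is a hypothetical dry-run stub (not part of Batch Shipyard) that prints instead of executing.

```shell
# Dry-run stub (hypothetical, for illustration only): prints each command
# instead of executing it. Delete this function to invoke the real tool.
shipyard() { echo "shipyard $*"; }

# Bring up the whole cluster in one step, including the RemoteFS shared
# file system. "mystoragecluster" is a placeholder storage cluster id.
shipyard slurm cluster orchestrate --configdir . --storage-cluster-id mystoragecluster
shipyard slurm cluster status --configdir .

# Run a Slurm command on the first login node; "--" separates the remote
# command and its switches from shipyard's own options
shipyard slurm ssh login --configdir . --offset 0 -- sinfo -N

# Suspend only the login nodes (controllers keep running), then resume them
shipyard slurm cluster suspend --configdir . --no-controller-nodes
shipyard slurm cluster start --configdir . --no-controller-nodes

# Tear down the controller and login portions of the cluster
shipyard slurm cluster destroy --configdir .
```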
The storage command has the following sub-commands:
clear Clear Azure Storage containers used by Batch...
del Delete Azure Storage containers used by Batch...
sas SAS token actions
The storage sas sub-command has the following sub-sub-commands:
create Create a container- or object-level SAS key
- `clear` will clear the Azure Storage containers used by Batch Shipyard for metadata purposes
  - `--poolid` will target a specific pool id rather than from configuration
- `del` will delete the Azure Storage containers used by Batch Shipyard for metadata purposes
  - `--clear-tables` will clear tables instead of deleting them
  - `--poolid` will target a specific pool id
- `sas create` will create a SAS key for containers, file shares, individual blobs or file objects.
  - `STORAGE_ACCOUNT` is the storage account link to target. This link must be specified as a credential.
  - `PATH` is the Azure storage path including the container or file share name
  - `--create` adds a create permission (only applicable to objects)
  - `--delete` adds a delete permission
  - `--list` adds a list permission (only applicable to container/file share)
  - `--file` creates a file SAS rather than a blob SAS
  - `--read` adds a read permission
  - `--write` adds a write permission
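The sketch below shows how the storage options might be combined. It assumes the positional arguments are given in the order listed above (`STORAGE_ACCOUNT` then `PATH`); `mystorageaccount`, `mycontainer`, `myshare`, the file path, and `mypool` are all placeholders, and the `shipyard` function is a hypothetical dry-run stub (not part of Batch Shipyard).

```shell
# Dry-run stub (hypothetical, for illustration only): prints each command
# instead of executing it. Delete this function to invoke the real tool.
shipyard() { echo "shipyard $*"; }

# Read + list SAS for a whole blob container. "mystorageaccount" stands for
# a storage account link defined in the credentials configuration.
shipyard storage sas create --configdir . --read --list mystorageaccount mycontainer

# Read + write SAS for a single object on a file share (--file selects a
# file SAS rather than a blob SAS)
shipyard storage sas create --configdir . --file --read --write mystorageaccount myshare/results/output.txt

# Clear metadata left behind by a pool that was deleted outside of
# Batch Shipyard
shipyard storage clear --configdir . --poolid mypool
```

Note that `--list` is only paired with the container-level path and not with the single-object path, following the applicability notes above.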
shipyard pool add --credentials credentials.yaml --config config.yaml --pool pool.yaml
# ... or if all config files are in the current working directory named as above ...
# (note this is not strictly necessary, as Batch Shipyard will search the
# current working directory if the options above are not explicitly specified)
shipyard pool add --configdir .
# ... or use environment variables instead
SHIPYARD_CONFIGDIR=. shipyard pool add
The above invocation will add the pool specified to the Batch account. Notice that the options and shared options are given after the command and sub-command and not before.
shipyard jobs add --configdir .
# ... or use environment variables instead
SHIPYARD_CONFIGDIR=. shipyard jobs add
The above invocation will add the jobs specified in the jobs.yaml file to the designated pool.
shipyard data files stream --configdir . --filespec job1,task-00000,stdout.txt
# ... or use environment variables instead
SHIPYARD_CONFIGDIR=. shipyard data files stream --filespec job1,task-00000,stdout.txt
The above invocation will stream the stdout.txt file from the job job1 and
task task-00000 from a live compute node. Because all portions of the
--filespec option are specified, the tool will not prompt for any input.
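When scripting against several tasks, it can help to build the `--filespec` value programmatically so that every portion is always present. The helpers below are hypothetical and only illustrate the comma-separated job,task,file form used above; the `task-NNNNN` numbering is an assumption based on the `task-00000` id in the example. The loop prints the commands rather than running them.

```shell
# Hypothetical helper: join job id, task id, and file name into the
# comma-separated triple passed to --filespec.
make_filespec() {
    printf '%s,%s,%s\n' "$1" "$2" "$3"
}

# Assumption: task ids follow the zero-padded task-NNNNN pattern seen in
# the example above.
task_id() {
    printf 'task-%05d\n' "$1"
}

# Print (rather than run) one stream command per task for review
for n in 0 1 2; do
    spec="$(make_filespec job1 "$(task_id "$n")" stdout.txt)"
    echo "shipyard data files stream --configdir . --filespec $spec"
done
```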
If using either the Docker image mcr.microsoft.com/azure-batch/shipyard:latest-cli or the Singularity image library://alfpark/batch/shipyard:latest-cli, then you would invoke Batch Shipyard as:
# if using Docker
docker run --rm -it mcr.microsoft.com/azure-batch/shipyard:latest-cli \
<command> <subcommand> <options...>
# if using Singularity
singularity run library://alfpark/batch/shipyard:latest-cli \
<command> <subcommand> <options...>
where <command> <subcommand> is the command and subcommand as described
above and <options...> are any additional options to pass to the
<subcommand>.
Invariably, you will need to pass config files to the tool which reside
on the host and not in the container by default. Please use the -v volume
mount option with docker run or -B bind option with singularity run
to mount host directories inside the container. For example, if your Batch
Shipyard configs are stored in the host path
/home/user/batch-shipyard-configs you could modify the invocations as:
# if using Docker
docker run --rm -it \
-v /home/user/batch-shipyard-configs:/configs \
-w /configs \
mcr.microsoft.com/azure-batch/shipyard:latest-cli \
<command> <subcommand> <options...>
# if using Singularity
singularity run \
-B /home/user/batch-shipyard-configs:/configs \
--pwd /configs \
library://alfpark/batch/shipyard:latest-cli \
<command> <subcommand> <options...>
Notice that we specified the working directory as -w for Docker or
--pwd for Singularity to match the /configs container path.
Additionally, if you wish to ingress data from locally accessible file systems using Batch Shipyard, then you will need to map additional volume mounts as appropriate from the host to the container.
Batch Shipyard may generate files with some actions, such as adding an SSH
user or creating a pool with an SSH user. In this case, you will need to
create a volume mount with the -v (or -B) option and also ensure that the
pool specification ssh object has a generated_file_export_path property
set to the volume mount path. This ensures that generated files are
written to the host and persist after the container exits. Otherwise,
the generated files will reside only within the container and
will not be available for use on the host (e.g., to SSH into a compute node with
the generated RSA private key or to use the generated SSH Docker tunnel script).
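To avoid retyping the mount options, the Docker invocation can be wrapped in a small shell function. This is only a sketch: `shipyard_docker` is a hypothetical name, and it assumes the `/configs` mount point used in the examples above. Generated files such as SSH keys would persist on the host only if generated_file_export_path in the pool specification points at a path under that mount.

```shell
# Hypothetical wrapper: the first argument is the host directory holding
# the Batch Shipyard configs; the remaining arguments are passed to the CLI.
shipyard_docker() {
    config_dir="$1"
    shift
    docker run --rm -it \
        -v "$config_dir:/configs" \
        -w /configs \
        mcr.microsoft.com/azure-batch/shipyard:latest-cli \
        "$@"
}

# Example usage (host path taken from the example above):
# shipyard_docker /home/user/batch-shipyard-configs pool add
```

Because the working directory inside the container is `/configs`, the tool finds the config files there without any explicit `--configdir` option.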
For more information regarding remote filesystems and Batch Shipyard, please see this page.
For more information regarding data movement with respect to Batch Shipyard, please see this page.
For more information regarding Multi-Instance Tasks and/or MPI jobs using Batch Shipyard, please see this page.
Please see this page for current limitations.
Visit the recipes directory for different sample Docker workloads using Azure Batch and Batch Shipyard.
Open an issue on the GitHub project page.