From e42b34e4373bdb23cc0e55c6c0018cd2e10da791 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 4 Aug 2025 16:52:41 +0200 Subject: [PATCH 1/7] Update readme for new node_type_map functionality --- README.md | 74 ++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 46 insertions(+), 28 deletions(-) diff --git a/README.md b/README.md index cc4d0ff5..5612b04b 100644 --- a/README.md +++ b/README.md @@ -777,37 +777,38 @@ for signing. The bot calls the script with the two arguments: The section `[architecturetargets]` defines for which targets (OS/SUBDIR), (for example `linux/x86_64/amd/zen2`) the EESSI bot should submit jobs, and which additional `sbatch` parameters will be used for requesting a compute node with the CPU microarchitecture needed to build the software stack. ```ini -arch_target_map = { - "linux/x86_64/generic": "--partition x86-64-generic-node", - "linux/x86_64/amd/zen2": "--partition x86-64-amd-zen2-node" } +node_type_map = { + "cpu_zen2": { + "os": "linux", + "cpu_subdir": "x86_64/amd/zen2", + "slurm_params": "-p rome --nodes 1 --ntasks-per-node 16 --cpus-per-task 1", + "repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"] + }, + "gpu_h100": { + "os": "linux", + "cpu_subdir": "x86_64/amd/zen4", + "accel": "nvidia/cc90", + "slurm_params": "-p gpu_h100 --nodes 1 --tasks-per-node 16 --cpus-per-task 1 --gpus-per-node 1", + "repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"] + }} ``` -The map has one-to-many entries of the format `OS/SUBDIR: -ADDITIONAL_SBATCH_PARAMETERS`. For your cluster, you will have to figure out -which microarchitectures (`SUBDIR`) are available (as `OS` only `linux` is -currently supported) and how to instruct Slurm to allocate nodes with that -architecture to a job (`ADDITIONAL_SBATCH_PARAMETERS`). +Each entry in the `node_type_map` dictionary describes a build node type. 
The key is a (descriptive) name for this build node, and its value is a dictionary containing the following build node properties as key-value pairs: + - `os`: its operating system (os) + - `cpu_subdir`: its CPU architecture + - `slurm_params`: the SLURM parameters that need to be passed to submit jobs to it + - `repo_targets`: supported repository targets for this node type + - `accel` (optional): which accelerators this node has +All values are strings, except repo_targets, which is a list of strings. Repository targets listed in `repo_target` should correspond to the repository IDs as defined in the `repos.cfg` file in the `repos_cfg_dir` (see below). -Note, if you do not have to specify additional parameters to `sbatch` to request a compute node with a specific microarchitecture, you can just write something like: +Note that the Slurm parameters should typically be chosen such that a single type of node (with one specific type of CPU and one specific type of GPU) should be allocated. + +To command the bot to build on the `cpu_zen2` node type above, one would give the command `bot:build on:arch=zen2 ...`. To command the bot to build on the `gpu_h100` node type, one would give the command `bot:build on:arch=zen4,accel=nvidia/cc90 ...` -```ini -arch_target_map = { "linux/x86_64/generic": "" } -``` #### `[repo_targets]` section The `[repo_targets]` section defines for which repositories and architectures the bot can run a job. -Repositories are referenced by IDs (or `repo_id`). Architectures are identified -by `OS/SUBDIR` which correspond to settings in the `arch_target_map`. - -```ini -repo_target_map = { - "OS_SUBDIR_1": ["REPO_ID_1_1","REPO_ID_1_2"], - "OS_SUBDIR_2": ["REPO_ID_2_1","REPO_ID_2_2"] } -``` - -For each `OS/SUBDIR` combination a list of available repository IDs can be -provided. 
The repository IDs are defined in a separate file, say `repos.cfg` which is stored in the directory defined via `repos_cfg_dir`: @@ -911,19 +912,36 @@ event handler will throw an exception when formatting the update of the PR comment corresponding to the job. ```ini -initial_comment = New job on instance `{app_name}` for architecture `{arch_name}`{accelerator_spec} for repository `{repo_id}` in job dir `{symlink}` +new_job_instance_repo = New job on instance `{app_name}` for repository `{repo_id}` ``` -`initial_comment` is used to create a comment to a PR when a new job has been -created. Note, the part '{accelerator_spec}' is only filled-in by the bot if the -argument 'accelerator' to the `bot: build` command has been used. +`new_job_instance_repo` is used as the first line in a comment to a PR when a new job has been created. + +```ini +build_on_arch = Building on: `{on_arch}`{on_accelerator} +``` + +`build_on_arch` is used as the second line in a comment to a PR when a new job has been created. Note that the `on_accelerator` spec is only filled-in by the bot if the `on:...,accel=...` has been passed to the bot. + +```ini +build_for_arch = Building for: `{for_arch}`{for_accelerator} +``` + +`build_for_arch` is used as the third line in a comment to a PR when a new job has been created. Note that the `for_accelerator` spec is only filled-in by the bot if the `for:...,accel=...` has been passed to the bot. + +```ini +jobdir = Job dir: `{symlink}` +``` + +`jobdir` is used as the fourth line in a comment to a PR when a new job has been created. + ```ini with_accelerator =  and accelerator `{accelerator}` ``` `with_accelerator` is used to provide information about the accelerator the job -should build for if and only if the argument `accelerator:X/Y` has been provided. +should build for if and only if the argument `on:...,accel=...` or `for:...,accel=...` has been provided. 
#### `[new_job_comments]` section From 0b4e22fc3b2cba23bc19c49b07745ca69f822db2 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 4 Aug 2025 17:04:56 +0200 Subject: [PATCH 2/7] Add examples of bot build commands --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 5612b04b..0caf77b9 100644 --- a/README.md +++ b/README.md @@ -803,7 +803,11 @@ All values are strings, except repo_targets, which is a list of strings. Reposit Note that the Slurm parameters should typically be chosen such that a single type of node (with one specific type of CPU and one specific type of GPU) should be allocated. -To command the bot to build on the `cpu_zen2` node type above, one would give the command `bot:build on:arch=zen2 ...`. To command the bot to build on the `gpu_h100` node type, one would give the command `bot:build on:arch=zen4,accel=nvidia/cc90 ...` +To command the bot to build on the `cpu_zen2` node type above, one would give the command `bot:build on:arch=zen2 ...`. To command the bot to build on the `gpu_h100` node type, one would give the command `bot:build on:arch=zen4,accel=nvidia/cc90 ...`. + +For a native build (i.e. building for `zen2` on a `zen2` node), one can pass `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2`, or use the short-hand `bot:build for:arch=x86_64/amd/zen2` (i.e. omitting the `on` argument implies a native build). This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture. + +For cross-compiling GPU code for Nvidia Compute Capabiltiy 8.0 (and a `zen2` CPU architecture), one would instruct the bot with `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc80`. 
This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture with an `nvidia/cc80` GPU architecture. #### `[repo_targets]` section From be7efe42036eaf4bae399bfe1b789e65a19ccbc5 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 5 Aug 2025 10:58:15 +0200 Subject: [PATCH 3/7] Process review comments --- README.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 0caf77b9..54edfe9d 100644 --- a/README.md +++ b/README.md @@ -805,14 +805,20 @@ Note that the Slurm parameters should typically be chosen such that a single typ To command the bot to build on the `cpu_zen2` node type above, one would give the command `bot:build on:arch=zen2 ...`. To command the bot to build on the `gpu_h100` node type, one would give the command `bot:build on:arch=zen4,accel=nvidia/cc90 ...`. -For a native build (i.e. building for `zen2` on a `zen2` node), one can pass `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2`, or use the short-hand `bot:build for:arch=x86_64/amd/zen2` (i.e. omitting the `on` argument implies a native build). This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture. +For a native build (i.e. building for `zen2` on a `zen2` node), one can pass `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2`, or use the short-hand `bot:build for:arch=x86_64/amd/zen2` (i.e. omitting the `on` argument implies a native build; note that the reverse, omitting the `for` argument, does not work). This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture. 
-For cross-compiling GPU code for Nvidia Compute Capabiltiy 8.0 (and a `zen2` CPU architecture), one would instruct the bot with `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc80`. This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture with an `nvidia/cc80` GPU architecture. +For cross-compiling GPU code for NVIDIA Compute Capabiltiy 8.0 (and a `zen2` CPU architecture), one would instruct the bot with `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc80`. This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture with an `nvidia/cc80` GPU architecture. +Note that the `arch_target_map` and `repo_target_map` (used in version <=0.8.0) configuration option was replaced by `node_type_map`. The `arch_target_map` and `repo_target_map` that would be equivalent to the `node_type_map` above was + +```ini +arch_target_map = { "linux/x86_64/amd/zen2": "-p rome --nodes 1 --ntasks-per-node 16 --cpus-per-task 1", "linux/x86_64/amd/zen4": "-p gpu_h100 --nodes 1 --tasks-per-node 16 --cpus-per-task 1 --gpus-per-node 1" } +repo_target_map = { "linux/x86_64/amd/zen2": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"], "linux/x86_64/amd/zen4": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"] } +``` #### `[repo_targets]` section -The `[repo_targets]` section defines for which repositories and architectures the bot can run a job. 
+The `[repo_targets]` section defines where the configuration for the repository targets defined in the `node_type_map` can be found The repository IDs are defined in a separate file, say `repos.cfg` which is stored in the directory defined via `repos_cfg_dir`: From ae691e6f61696df102ee258e652dd6790cd08499 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 5 Aug 2025 11:00:02 +0200 Subject: [PATCH 4/7] Fix linting errors --- README.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 54edfe9d..11ea821b 100644 --- a/README.md +++ b/README.md @@ -794,11 +794,13 @@ node_type_map = { ``` Each entry in the `node_type_map` dictionary describes a build node type. The key is a (descriptive) name for this build node, and its value is a dictionary containing the following build node properties as key-value pairs: - - `os`: its operating system (os) - - `cpu_subdir`: its CPU architecture - - `slurm_params`: the SLURM parameters that need to be passed to submit jobs to it - - `repo_targets`: supported repository targets for this node type - - `accel` (optional): which accelerators this node has + +- `os`: its operating system (os) +- `cpu_subdir`: its CPU architecture +- `slurm_params`: the SLURM parameters that need to be passed to submit jobs to it +- `repo_targets`: supported repository targets for this node type +- `accel` (optional): which accelerators this node has + All values are strings, except repo_targets, which is a list of strings. Repository targets listed in `repo_target` should correspond to the repository IDs as defined in the `repos.cfg` file in the `repos_cfg_dir` (see below). Note that the Slurm parameters should typically be chosen such that a single type of node (with one specific type of CPU and one specific type of GPU) should be allocated. @@ -945,7 +947,6 @@ jobdir = Job dir: `{symlink}` `jobdir` is used as the fourth line in a comment to a PR when a new job has been created. 
- ```ini with_accelerator =  and accelerator `{accelerator}` ``` From 2ad48ec91b44143a7029e8a673f1969f97bf92fb Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com> Date: Tue, 5 Aug 2025 13:48:46 +0200 Subject: [PATCH 5/7] Update README.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Bob Dröge --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 11ea821b..d4177c1c 100644 --- a/README.md +++ b/README.md @@ -811,7 +811,7 @@ For a native build (i.e. building for `zen2` on a `zen2` node), one can pass `bo For cross-compiling GPU code for NVIDIA Compute Capabiltiy 8.0 (and a `zen2` CPU architecture), one would instruct the bot with `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc80`. This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture with an `nvidia/cc80` GPU architecture. -Note that the `arch_target_map` and `repo_target_map` (used in version <=0.8.0) configuration option was replaced by `node_type_map`. The `arch_target_map` and `repo_target_map` that would be equivalent to the `node_type_map` above was +Note that the `arch_target_map` and `repo_target_map` (used in version <=0.8.0) configuration options were replaced by `node_type_map`. 
The `arch_target_map` and `repo_target_map` that would be equivalent to the `node_type_map` above are: ```ini arch_target_map = { "linux/x86_64/amd/zen2": "-p rome --nodes 1 --ntasks-per-node 16 --cpus-per-task 1", "linux/x86_64/amd/zen4": "-p gpu_h100 --nodes 1 --tasks-per-node 16 --cpus-per-task 1 --gpus-per-node 1" } From 769150ddd53dfef21176c0919ac41ef735e84abe Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com> Date: Tue, 5 Aug 2025 13:50:16 +0200 Subject: [PATCH 6/7] Update README.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Bob Dröge --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index d4177c1c..a35e675d 100644 --- a/README.md +++ b/README.md @@ -820,7 +820,7 @@ repo_target_map = { "linux/x86_64/amd/zen2": ["eessi.io-2023.06-compat","eessi.i #### `[repo_targets]` section -The `[repo_targets]` section defines where the configuration for the repository targets defined in the `node_type_map` can be found +The `[repo_targets]` section defines where the configuration for the repository targets defined in the `node_type_map` can be found. The repository IDs are defined in a separate file, say `repos.cfg` which is stored in the directory defined via `repos_cfg_dir`: From b4d86e0e0649fd2dd1d1b6758926c641148a72ba Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com> Date: Tue, 5 Aug 2025 13:58:16 +0200 Subject: [PATCH 7/7] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a35e675d..c139f1d9 100644 --- a/README.md +++ b/README.md @@ -805,7 +805,7 @@ All values are strings, except repo_targets, which is a list of strings. 
Reposit Note that the Slurm parameters should typically be chosen such that a single type of node (with one specific type of CPU and one specific type of GPU) should be allocated. -To command the bot to build on the `cpu_zen2` node type above, one would give the command `bot:build on:arch=zen2 ...`. To command the bot to build on the `gpu_h100` node type, one would give the command `bot:build on:arch=zen4,accel=nvidia/cc90 ...`. +To command the bot to build on the `cpu_zen2` node type above, one would give the command `bot:build on:arch=zen2 for:...`. To command the bot to build on the `gpu_h100` node type, one would give the command `bot:build on:arch=zen4,accel=nvidia/cc90 for:...`. For a native build (i.e. building for `zen2` on a `zen2` node), one can pass `bot:build on:arch=zen2 for:arch=x86_64/amd/zen2`, or use the short-hand `bot:build for:arch=x86_64/amd/zen2` (i.e. omitting the `on` argument implies a native build; note that the reverse, omitting the `for` argument, does not work). This will trigger a build on the `cpu_zen2` node type (as configured above) and prepare a configuration file in the job directory that instructs to build for a `zen2` CPU architecture.
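The relationship between `node_type_map` entries and the `on:` argument described in this series can be sketched in Python. This is a minimal illustration, not the bot's actual implementation: the function names `parse_on_argument` and `select_node_types` are hypothetical, and the matching rule is an assumption inferred from the examples (`on:arch=zen2` selecting the entry with `cpu_subdir` `x86_64/amd/zen2`). That rule is that `arch` must equal `cpu_subdir` or its last path component, and `accel` must equal the entry's `accel`, or both must be absent.

```python
import json

# Same structure as the `node_type_map` example added to the README in this series.
NODE_TYPE_MAP = json.loads("""
{
  "cpu_zen2": {
    "os": "linux",
    "cpu_subdir": "x86_64/amd/zen2",
    "slurm_params": "-p rome --nodes 1 --ntasks-per-node 16 --cpus-per-task 1",
    "repo_targets": ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"]
  },
  "gpu_h100": {
    "os": "linux",
    "cpu_subdir": "x86_64/amd/zen4",
    "accel": "nvidia/cc90",
    "slurm_params": "-p gpu_h100 --nodes 1 --tasks-per-node 16 --cpus-per-task 1 --gpus-per-node 1",
    "repo_targets": ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"]
  }
}
""")


def parse_on_argument(on_arg):
    """Split e.g. 'arch=zen4,accel=nvidia/cc90' into {'arch': ..., 'accel': ...}."""
    return dict(item.split("=", 1) for item in on_arg.split(","))


def select_node_types(node_type_map, on_arg):
    """Return the names of node types matching an `on:` argument.

    Assumed (hypothetical) rule: `arch` matches the full `cpu_subdir` or its
    last component; `accel` must be identical, or absent on both sides.
    """
    wanted = parse_on_argument(on_arg)
    arch = wanted["arch"]
    accel = wanted.get("accel")  # None when no accelerator was requested
    matches = []
    for name, props in node_type_map.items():
        subdir = props["cpu_subdir"]
        arch_ok = subdir == arch or subdir.endswith("/" + arch)
        accel_ok = props.get("accel") == accel
        if arch_ok and accel_ok:
            matches.append(name)
    return matches
```

Under these assumptions, `select_node_types(NODE_TYPE_MAP, "arch=zen2")` returns `['cpu_zen2']`, and the `slurm_params` of the selected entry would be the extra arguments passed to `sbatch`.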