Merged
Changes from 11 commits
Commits
48 commits
4415afc
Make a first attempt to change the expected arch_target_map. Make sur…
Apr 17, 2025
d3240e4
Remove old code that was replaced
Apr 17, 2025
dfbcec3
Merge branch 'EESSI:develop' into adapt_arch_target_map
casparvl Jun 30, 2025
b54fdf4
Fix some small issues with non-existing keys
Jun 30, 2025
1e6c3d9
Avoid doing string += None for the arch_dir if accelerator = None. Al…
Jun 30, 2025
e000c91
Make sure that if the context (i.e. app.cfg) defines AN accelerator, …
Jun 30, 2025
40ceefe
Some cleanup
Jun 30, 2025
fad4f47
Remove repo_target_map from config, and all occurrences that import it…
Jul 1, 2025
b4032a6
Fix quotation of keys
Jul 1, 2025
3898df7
Fix flake8 issue
Jul 1, 2025
ef0c430
Unpack the actual arch_target_map by accessing it with a key to get t…
Jul 1, 2025
e3df690
Fix mistake in build path
Jul 8, 2025
424a001
Parse on: and for: options, and pass the correct values on to the com…
Jul 9, 2025
3f00e51
Make sure that the for: arguments are used as build parameters
Jul 10, 2025
2625b30
Change path for job dir so that it represents the 'for' architectures
Jul 10, 2025
ffa2303
More extensive reporting by the bot on what to build for/on
Jul 14, 2025
c98e7e8
This is no longer needed, as it is done with the codecs (decode) now
Jul 14, 2025
5904f11
Print real arch_target_map keys when doing show_config
Jul 14, 2025
7c869f0
Reduce number of possible accelerators per node type to one. Nodes wi…
Jul 14, 2025
179ab0a
Fix app.cfg for the fact that partition_info['accel'] is now a string…
Jul 14, 2025
a9a2585
Make sure that we don't access a dict item that doesn't exist
Jul 14, 2025
51a9c74
Make sure a context match fails if the context doesn't provide e.g. a…
Jul 14, 2025
b12c911
Make old config items invalid, rename to node_type and node_type_map,…
Jul 15, 2025
697dc6e
Update the status command to account for the new on:... for:... syntax
Jul 16, 2025
c0fe051
Remove debugging print statements
Jul 16, 2025
f179b66
Warn about the removal of the repo_target_map
Jul 16, 2025
aad663e
Fix typo
Jul 16, 2025
be8c7d0
Fix hound issues
Jul 16, 2025
7f766f4
Format relevant output of show_config as code
Jul 16, 2025
d205598
Rephrase to make things more clear
Jul 16, 2025
ebcc7fd
Forgot to add this new file...
Jul 16, 2025
0a8bc9b
Fix hound issues
Jul 16, 2025
81257db
Update build params call signature
Jul 16, 2025
f974463
Fix example argument, and argument used to create build parameters in…
Jul 16, 2025
4104796
Forgot to actually git add this file again... anyway, updated the syn…
Jul 16, 2025
0b82386
Update the app.cfg used for the unit tests to account for the changes…
Jul 16, 2025
372a7fe
Update tests for new requirement that all filters have to be present …
Jul 16, 2025
d2be02a
Update tests to accommodate the new behaviour of filter checking that …
Jul 16, 2025
6b3a118
Fix hound issues
Jul 16, 2025
3b310f5
Fix flake8 issues
Jul 16, 2025
de0bd1c
Removed some comments that were only there for development, no longer…
Jul 21, 2025
d4ecc7b
Apply suggestions from code review
casparvl Jul 24, 2025
af731e1
Re-comment the awaits_release, as this was done in develop as well. T…
Jul 28, 2025
d48b355
Replace Partition with Node type in show_config output. Also, update …
Jul 28, 2025
6017433
Processed various smaller review comments for tasks/build.py. Elabora…
Jul 28, 2025
279e08f
Apply suggestions from code review
casparvl Jul 29, 2025
80f5f1d
Fix indentation issue
Jul 29, 2025
2f3c0ae
Update tasks/build.py
casparvl Jul 31, 2025
79 changes: 69 additions & 10 deletions app.cfg.example
@@ -306,19 +306,78 @@ signing =

[architecturetargets]
# defines for which architectures the bot will build and what job submission
Contributor:

Suggested change
# defines for which architectures the bot will build and what job submission
# defines for which architectures (CPU and/or GPU) the bot can build and what job submission

# parameters shall be used to allocate a compute node with the correct
# parameters shall be used to allocate a compute node with the correct
Contributor:

Suggested change
# parameters shall be used to allocate a compute node with the correct
# parameters shall be used to allocate a compute node with the correct CPU(+GPU) architecture

# The keys of the arch_target_map are virtual partition names. They don't have any meaning in the bot code,
Contributor:

Suggested change
# The keys of the arch_target_map are virtual partition names. They don't have any meaning in the bot code,
# The keys of the arch_target_map are just strings. They don't have any meaning in the bot code,

"virtual" and "partition" carry some meaning which could be misleading.

# and can thus be chosen as desired.
Contributor:

Suggested change
# and can thus be chosen as desired.
# and can thus be chosen as desired (however, they could be standardised across different bot instances run by an organisation, so users easily understand what they mean).

# Note that you are responsible that ANY bot:build command ONLY matches a single virtual partition!
Contributor:

As of now this is the case, but with #310 which only supports exact matches this wouldn't be an issue any longer.

Suggested change
# Note that you are responsible that ANY bot:build command ONLY matches a single virtual partition!
# Note that you are responsible that ANY bot:build command ONLY matches a single key for the architecture!

Actually, this is a little off. Maybe it would be better to explain that (without #310) if one has defined the keys zen4 and zen4+H100, then a bot:build arch:zen4 would result in two jobs. So, at the moment, none of the keys should be a substring of another key.
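The substring pitfall described in this comment can be sketched in a few lines (the map and the matching helper below are purely illustrative, not the bot's actual code):

```python
# Illustrative only: assume the arch: filter is matched as a substring
# against the arch_target_map keys (hypothetical helper, not real bot code).
arch_target_map = {
    "zen4": {"slurm_params": "-p genoa"},
    "zen4+H100": {"slurm_params": "-p gpu_h100"},
}

def matching_partitions(requested_arch, arch_map):
    # Every key that contains the requested architecture as a substring matches
    return [key for key in arch_map if requested_arch in key]

# 'bot: build arch:zen4' would match both keys, so two job dirs get prepared
matches = matching_partitions("zen4", arch_target_map)
```

This is why, without the exact-match behaviour of #310, no key should be a substring of another key.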

# If multiple partitions match the same bot:build command, a failure will be triggered in the job dir preparation
Contributor:

This triggering of a failure is a new feature?

Contributor Author:

Well, it's not really a feature :P I think it is just something that emerges since you can now have multiple items in the arch_target_map that match a filter - that situation couldn't happen before I think. We need to consider how to deal with it. Currently, I give bot admins sufficient knobs to turn to make sure a filter only matches one partition. But we can deal with it differently if we so prefer - e.g. we could decide that the first matching partition is the one where the job will be submitted, and we break the loop that matches the filter to partitions. In that case, the order in which partitions are listed in arch_target_map becomes important.

arch_target_map = {
Contributor:

Maybe it would be worthwhile to explain the structure of the entries in the map and what the key/value pairs are used for?

Contributor:

What does it mean when one uses architecture:cpu_zen2 (or architecture:anykey_without_meaning) with a bot:build command?

  • bot performs check X, Y, Z (if any)
  • bot uses slurm_params value for submitting a job
  • bot does x with os and y with cpu_subdir
  • bot checks value of filter repository against value of repo_targets
  • ...

"linux/x86_64/generic": "--partition x86-64-generic-node",
"linux/x86_64/amd/zen2": "--partition x86-64-amd-zen2-node" }

# This is a CPU-based partition. We do not specify an "accel" property explicitly. In this case, invoking the bot
# with ANY accelerator command will trigger a build on this partition, as long as the CPU type matches. E.g.
# bot: build instance:xyz repo:eessi.io-2023.06-software arch:zen2 accel:nvidia/cc90
# will cause the event_filter to mark this virtual partition as a valid match, and will use it to start building
# for zen2 + nvidia/cc90. Thus, by not specifying an "accel" property, this partition may be used for
# cross-compilation for any accelerator.
"cpu_zen2": {
"os": "linux",
"cpu_subdir": "x86_64/amd/zen2",
"slurm_params": "-p rome --nodes 1 --ntasks-per-node 16 --cpus-per-task 1",
"repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"]
},
# This is a CPU partition. We specify an explicit "accel": "None" property. Thus, this partition will only be
# used if the bot build command does NOT contain an accel argument, e.g.
# bot: build instance:xyz repo:eessi.io-2023.06-software arch:zen4
# will cause the event_filter to mark this virtual partition as a valid match, and will use it to start building
# for zen4.
Contributor:

Hmm, I don't think will use it to start building for zen4 really captures what is going on. The bot does not know anything about architectures, it will just submit jobs for any matches and uses the slurm_params to submit the job(s). Then EasyBuild attempts to build for the architecture using -march=native (or something similar).

# When invoking the bot with an accelerator command, such as
# bot: build instance:xyz repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90
# the event_filter will NOT mark this virtual partition as a valid match. This is intentional, as this particular
# (example) cluster has a native zen4+cc90 partition (gpu_h100) and we want this command to trigger a native build
# on that partition, rather than cross-compiling on this cpu_zen4 partition.
Contributor:

Is that really the intention? It changes the meaning of accel as we used it until now (defining which accelerator we want to build for): if the zen4 architecture target defines "accel": ["None"], accel is interpreted as a means to select (or not select) a build node, while if the cpu_zen4 architecture target does not define "accel", accel is interpreted as a means to define which accelerator we want to build for.

Plus the selection of the build node is now sometimes the result of only arch and sometimes the result of both arch and accel?

Contributor Author:

It's always been a bit strange to me why accel had a different meaning than arch. I.e. arch was used for node selection, but accel wasn't. The only reason that 'worked' was that we always do native builds, so arch actually meant both: give me a node that matches this architecture, and build for this architecture (i.e. in this architecture's prefix). Somehow, accel was special in that it did not have any effect on node selection. But it has to if we ever want to enable a combination of native and cross-compiled builds (how else do I ensure a native build?).

Anyway, I think this:

if zen4 architecture target defines "accel": ["None"] it is interpreted as a means to select or not select a build node. If cpu_zen4 architecture target does not define "accel" it is interpreted as a means to define for which accelerator we want to build for.

Should be seen differently. If a partition defines "accel": ["None"] that is a declaration that this partition is not suitable for (/should not be used for) compiling for accelerators. If a partition does not define "accel", that is an (implicit) declaration that it may be used to (cross)-compile for any accelerator.

The build command then defines what accelerator we want to build for (e.g. it determines the prefix in which we'll install software). It is the job of the matching logic to then select a partition that can facilitate that request.

This is pretty equivalent to how ReFrame does things with partition features. A ReFrame config can declare "Partition A has a feature 'GPU', Partition B does not". A test can declare "I need a partition with the feature 'GPU' to run". As a result, when running ReFrame on a system with Partition A and B, that test will then only be scheduled on Partition A.
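The matching semantics described in this reply can be condensed into a small predicate (partition_matches is a hypothetical helper used for illustration; the bot's real matching lives in its filter logic):

```python
def partition_matches(partition_info, requested_accel):
    """Hypothetical sketch of the matching semantics described above.

    requested_accel is None when the bot:build command carries no accel: filter.
    """
    accel_list = partition_info.get("accel")
    if accel_list is None:
        # No 'accel' property: partition may (cross-)compile for any accelerator
        return True
    if requested_accel is None:
        # CPU-only request: only partitions that explicitly allow "None" match
        return "None" in accel_list
    # Accelerated request: the requested accelerator must be declared
    return requested_accel in accel_list

# Example partitions mirroring the app.cfg.example entries
cpu_zen2 = {"cpu_subdir": "x86_64/amd/zen2"}                     # no 'accel'
cpu_zen4 = {"cpu_subdir": "x86_64/amd/zen4", "accel": ["None"]}
gpu_h100 = {"cpu_subdir": "x86_64/amd/zen4", "accel": ["nvidia/cc90"]}
```

Under these semantics, `accel:nvidia/cc90` selects gpu_h100 (and would select cpu_zen2 for cross-compilation), while a CPU-only command selects cpu_zen4 but never gpu_h100.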

# One could still allow cross-compilation for other accelerator architectures, e.g. cc70 and cc80 by defining
# "accel": ["nvidia/cc70", "nvidia/cc80"]
"cpu_zen4": {
"os": "linux",
"cpu_subdir": "x86_64/amd/zen4",
"accel": ["None"],
"slurm_params": "-p genoa --nodes 1 --ntasks-per-node 24 --cpus-per-task 1",
"repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"]
},
# This is a GPU partition. We specify an explicit "accel" property. Thus, only if the bot build command
# specifies that explicit accelerator in combination with the relevant CPU type,
# bot: build instance:xyz repo:eessi.io-2023.06-software arch:icelake accel:nvidia/cc80
# will a build be triggered on this partition
# If you want to use this partition also for CPU only builds, you can alter the "accel" property to
# "accel": ["None", "nvidia/cc80"]
"gpu_a100": {
"os": "linux",
"cpu_subdir": "x86_64/intel/icelake",
"accel": ["nvidia/cc80"],
"slurm_params": "-p gpu_a100 --nodes 1 --tasks-per-node 18 --cpus-per-task 1 --gpus-per-node 1",
"repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"]
},
# This is a GPU partition. We specify an explicit "accel" property. Thus, only if the bot build command
# specifies that explicit accelerator in combination with the relevant CPU type,
# bot: build instance:xyz repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90
# will a build be triggered on this partition
# If you want to use this partition also for cross-compiling for cc70 and cc80 architectures, you can alter
# the "accel" property to
# "accel": ["nvidia/cc70", "nvidia/cc80", "nvidia/cc90"]
# Note that setting:
# "accel": ["None", "nvidia/cc90"]
# is invalid here, since it would lead to both the cpu_zen4 and the gpu_h100 partitions matching the build command
Contributor:

"Invalid" in the sense the bot checks and refuses to even start or "invalid" in the sense it is not recommended because it's ambiguous?

Contributor Author:

Invalid in the sense that it will lead to an error (the bot will fail to prepare the job dir for the gpu_h100 partition, because that job dir was already prepared for the same job on cpu_zen4).

Contributor:

Which would never happen if partitions have unique names and a user targets a partition by its name. Would also be easier to understand what is going on.

# bot: build instance:xyz repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90
# This would cause the same job dir to be prepared twice, for different virtual partitions, which will lead
# to an error in the job preparation step
"gpu_h100": {
"os": "linux",
"cpu_subdir": "x86_64/amd/zen4",
"accel": ["nvidia/cc90"],
"slurm_params": "-p gpu_h100 --nodes 1 --tasks-per-node 16 --cpus-per-task 1 --gpus-per-node 1",
"repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software"]
}}

[repo_targets]
# defines for which repositories an arch_target should be built
#
# EESSI/2023.06 and EESSI/2025.06
repo_target_map = {
"linux/x86_64/amd/zen2" : ["eessi.io-2023.06-software","eessi.io-2025.06-software"] }

# points to definition of repositories (default repository defined by build container)
repos_cfg_dir = PATH_TO_SHARED_DIRECTORY/repos

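For reference, a map like the one in the app.cfg.example diff above can be read with Python's standard configparser plus json.loads (a sketch; the bot's actual loader may differ):

```python
import configparser
import json

# Minimal stand-in for app.cfg; continuation lines of a multi-line value
# must stay indented so configparser treats them as part of the option
cfg_text = """
[architecturetargets]
arch_target_map = {
    "cpu_zen4": {
        "os": "linux",
        "cpu_subdir": "x86_64/amd/zen4",
        "accel": ["None"],
        "slurm_params": "-p genoa --nodes 1",
        "repo_targets": ["eessi.io-2023.06-software"]
    }}
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)
# The raw option value is a JSON string spanning several lines
arch_target_map = json.loads(parser.get("architecturetargets", "arch_target_map"))
```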
33 changes: 16 additions & 17 deletions eessi_bot_event_handler.py
@@ -29,8 +29,8 @@

# Local application imports (anything from EESSI/eessi-bot-software-layer)
from connections import github
from tasks.build import check_build_permission, get_architecture_targets, get_repo_cfg, \
request_bot_build_issue_comments, submit_build_jobs
from tasks.build import check_build_permission, get_architecture_targets, request_bot_build_issue_comments, \
submit_build_jobs
from tasks.deploy import deploy_built_artefacts, determine_job_dirs
from tasks.clean_up import move_to_trash_bin
from tools import config
@@ -104,7 +104,6 @@
config.SECTION_JOB_MANAGER: [
config.JOB_MANAGER_SETTING_POLL_INTERVAL], # required
config.SECTION_REPO_TARGETS: [
config.REPO_TARGETS_SETTING_REPO_TARGET_MAP, # required
config.REPO_TARGETS_SETTING_REPOS_CFG_DIR], # required
config.SECTION_SUBMITTED_JOB_COMMENTS: [
config.SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT, # required
@@ -412,22 +411,22 @@ def handle_pull_request_opened_event(self, event_info, pr, req_chatlevel=ChatLevel
# TODO check if PR already has a comment with arch targets and
# repositories
arch_map = get_architecture_targets(self.cfg)
repo_cfg = get_repo_cfg(self.cfg)

comment = f"Instance `{app_name}` is configured to build for:"
architectures = ['/'.join(arch.split('/')[1:]) for arch in arch_map.keys()]
comment += "\n- architectures: "
if len(architectures) > 0:
comment += f"{', '.join([f'`{arch}`' for arch in architectures])}"
else:
comment += "none"
repositories = list(set([repo_id for repo_ids in repo_cfg[config.REPO_TARGETS_SETTING_REPO_TARGET_MAP].values()
for repo_id in repo_ids]))
comment += "\n- repositories: "
if len(repositories) > 0:
comment += f"{', '.join([f'`{repo_id}`' for repo_id in repositories])}"
else:
comment += "none"
for partition_num, arch in enumerate(arch_map):
# Do not print virtual partition names, a bot admin may not want to share those
# Instead, just number them
comment += f"\n- Partition {partition_num+1}:"
Contributor:

How would one be able to know the names if they are not shown?

Contributor Author (casparvl, Jul 2, 2025):

You don't need to know. They could be named foo and bar, it's not relevant to the end user, since the names don't have any meaning (it's just helpful for the bot admin to pick a 'sensible' name :))

current_partition = arch_map[arch]
if "os" in current_partition:
comment += f"\n - OS: {current_partition['os']}"
if "cpu_subdir" in current_partition:
comment += f"\n - CPU architecture: {current_partition['cpu_subdir']}"
if "repo_targets" in current_partition:
comment += f"\n - Repositories: {current_partition['repo_targets']}"
if "accel" in current_partition:
comment += f"\n - Accelerators: {current_partition['accel']}"
comment += "\n"

self.log(f"PR opened: comment '{comment}'")

107 changes: 71 additions & 36 deletions tasks/build.py
@@ -241,8 +241,6 @@ def get_repo_cfg(cfg):
Returns:
(dict): dictionary containing repository settings as follows
- {config.REPO_TARGETS_SETTING_REPOS_CFG_DIR: path to repository config directory as defined in 'app.cfg'}
- {config.REPO_TARGETS_SETTING_REPO_TARGET_MAP: json of
config.REPO_TARGETS_SETTING_REPO_TARGET_MAP value as defined in 'app.cfg'}
- for all sections [repo_id] defined in config.REPO_TARGETS_SETTING_REPOS_CFG_DIR/repos.cfg add a
mapping {repo_id: dictionary containing settings of that section}
"""
settings_repos_cfg_dir = config.REPO_TARGETS_SETTING_REPOS_CFG_DIR
repo_cfg[settings_repos_cfg_dir] = repo_cfg_org.get(settings_repos_cfg_dir, None)

repo_map = {}
try:
repo_map_str = repo_cfg_org.get(config.REPO_TARGETS_SETTING_REPO_TARGET_MAP)
log(f"{fn}(): repo_map '{repo_map_str}'")

if repo_map_str is not None:
repo_map = json.loads(repo_map_str)

log(f"{fn}(): repo_map '{json.dumps(repo_map)}'")
except json.JSONDecodeError as err:
print(err)
error(f"{fn}(): Value for repo_map ({repo_map_str}) could not be decoded.")

repo_cfg[config.REPO_TARGETS_SETTING_REPO_TARGET_MAP] = repo_map

if repo_cfg[config.REPO_TARGETS_SETTING_REPOS_CFG_DIR] is None:
return repo_cfg

@@ -627,13 +610,34 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
return []

jobs = []
for arch, slurm_opt in arch_map.items():
arch_dir = arch.replace('/', '_')
# check if repo_target_map contains an entry for {arch}
if arch not in repocfg[config.REPO_TARGETS_SETTING_REPO_TARGET_MAP]:
log(f"{fn}(): skipping arch {arch} because repo target map does not define repositories to build for")
# This loop assumes the following structure for arch_target_map
# Note that 'accel' is a list, to easily allow a single CPU partition to be used for cross compilation
# for a lot of accelerator targets
# arch_target_map = {
bedroge marked this conversation as resolved. (Outdated)
# 'virtual_partition_name': {
# 'os': 'linux',
# 'cpu_subdir': 'x86_64/amd/zen4',
# 'accel': ['nvidia/cc90'],
# 'slurm_params': '-p genoa <etc>',
# 'repo_targets': ["eessi.io-2023.06-compat","eessi.io-2023.06-software"],
# },
# 'virtual_partition_name2': {
# ... etc
Contributor:

This could be used at the start of the explanation in app.cfg.example.

Contributor Author:

Good point, I'll move it.

for virtual_partition_name, partition_info in arch_map.items():
log(f"{fn}(): virtual_partition_name is {virtual_partition_name}, partition_info is {partition_info}")
# Unpack for convenience
arch_dir = partition_info['cpu_subdir']
if 'accel' in partition_info and accelerator is not None:
# Use the accelerator as defined by the action_filter. We check if this is valid for the current
# virtual partition later
arch_dir += accelerator
Contributor:

arch_dir could become x86_64/amd/zen4nvidia/cc90?

Contributor Author (casparvl, Jul 2, 2025):

You mean: the slash after zen4 is missing? That's indeed a mistake.

arch_dir.replace('/', '_')
# check if repo_targets is defined for this virtual partition
if 'repo_targets' not in partition_info:
log(f"{fn}(): skipping arch {virtual_partition_name}, "
"because no repo_targets were defined for this (virtual) partition")
continue
for repo_id in repocfg[config.REPO_TARGETS_SETTING_REPO_TARGET_MAP][arch]:
for repo_id in partition_info['repo_targets']:
# ensure repocfg contains information about the repository repo_id if repo_id != EESSI
# Note, EESSI is a bad/misleading name, it should be more like AS_IN_CONTAINER
if (repo_id != "EESSI" and repo_id != "EESSI-pilot") and repo_id not in repocfg:
# false --> log & continue to next iteration of for loop
if action_filter:
log(f"{fn}(): checking filter {action_filter.to_string()}")
context = {"architecture": arch, "repository": repo_id, "instance": app_name}
log(f"{fn}(): context is '{json.dumps(context, indent=4)}'")
if not action_filter.check_filters(context):
log(f"{fn}(): context does NOT satisfy filter(s), skipping")
continue
context = {
"architecture": partition_info['cpu_subdir'],
"repository": repo_id,
"instance": app_name
}
# Optionally add accelerator to the context
if 'accel' in partition_info:
match = False
# Create a context for each accelerator defined in app.cfg, then
# check if _any_ of them is valid (one is enough to continue)
for accel in partition_info['accel']:
context['accelerator'] = accel
log(f"{fn}(): context is '{json.dumps(context, indent=4)}'")
if not action_filter.check_filters(context):
log(f"{fn}(): context does NOT satisfy filter(s), skipping")
continue
# check = check | action_filter.check_filters(context)
else:
log(f"{fn}(): context DOES satisfy filter(s), going on with job")
match = True
# Break as soon as we have found a valid context, it means the build args are valid
# for at least one of the accelerators in this virtual partition, that's enough
break
# If we get to this point, and none of the contexts matched the filter, we should continue to the
# next iteration of the partition_info['repo_targets'] loop
if not match:
continue
else:
log(f"{fn}(): context DOES satisfy filter(s), going on with job")
log(f"{fn}(): context is '{json.dumps(context, indent=4)}'")
if not action_filter.check_filters(context):
log(f"{fn}(): context does NOT satisfy filter(s), skipping")
continue
else:
log(f"{fn}(): context DOES satisfy filter(s), going on with job")
# we reached this point when the filter matched (otherwise we
# 'continue' with the next repository)
# for each match of the filter we create a specific job directory
)
comment_download_pr(base_repo_name, pr, download_pr_exit_code, download_pr_error, error_stage)
# prepare job configuration file 'job.cfg' in directory <job_dir>/cfg
cpu_target = '/'.join(arch.split('/')[1:])
os_type = arch.split('/')[0]

log(f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , os_type = '{os_type}'"
f", accelerator = '{accelerator}'")
msg = f"{fn}(): virtual partition = '{virtual_partition_name}' => "
msg += f"configured cpu_target = '{partition_info['cpu_subdir']}' , "
msg += f"configured os = '{partition_info['os']}', "
if 'accel' in partition_info:
msg += f"configured accelerator(s) = '{partition_info['accel']}, "
msg += f"requested accelerator = '{accelerator}'"
log(msg)

prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type, accelerator)
prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, partition_info['cpu_subdir'],
partition_info['os'], accelerator)

if exportvars:
prepare_export_vars_file(job_dir, exportvars)

# enlist jobs to proceed
job = Job(job_dir, arch, repo_id, slurm_opt, year_month, pr_id, accelerator)
job = Job(job_dir, partition_info['cpu_subdir'], repo_id, partition_info['slurm_params'], year_month,
pr_id, accelerator)
jobs.append(job)

log(f"{fn}(): {len(jobs)} jobs to proceed after applying white list")
1 change: 0 additions & 1 deletion tools/config.py
@@ -110,7 +110,6 @@
NEW_JOB_COMMENTS_SETTING_AWAITS_LAUNCH = 'awaits_launch'

SECTION_REPO_TARGETS = 'repo_targets'
REPO_TARGETS_SETTING_REPO_TARGET_MAP = 'repo_target_map'
REPO_TARGETS_SETTING_REPOS_CFG_DIR = 'repos_cfg_dir'

SECTION_RUNNING_JOB_COMMENTS = 'running_job_comments'
13 changes: 13 additions & 0 deletions tools/filter.py
@@ -303,4 +303,17 @@ def check_filters(self, context):
else:
check = False
break

# If the context declares an accelerator, but the build command did not (i.e. no action filter is defined
# for the accelerator component) then the check should only return True if "None" was the accelerator defined
# in the context. This ensures that no CPU-only builds are done on accelerated partitions, unless these
# partitions are explicitly configured with "None" as _one_ of the valid accelerators in their `accel:` list
# in app.cfg
if (
FILTER_COMPONENT_ACCEL in context and not
any(af.component == FILTER_COMPONENT_ACCEL for af in self.action_filters)
):
if not context[FILTER_COMPONENT_ACCEL] == "None":
check = False

return check
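The rule added to check_filters above can be condensed into a standalone predicate (a simplified sketch of the tools/filter.py behaviour; the function and parameter names here are illustrative):

```python
def accel_fallback_ok(context, filter_components):
    """Simplified sketch of the rule above: if the context declares an
    accelerator but the build command carries no accel: filter, the check
    only passes when the context's accelerator is the string "None".
    """
    if "accelerator" in context and "accelerator" not in filter_components:
        return context["accelerator"] == "None"
    # Otherwise this rule imposes no extra constraint
    return True

# A GPU partition's context is rejected for a CPU-only build command...
gpu_context = {"architecture": "x86_64/amd/zen4", "accelerator": "nvidia/cc90"}
# ...while a partition listing "None" among its accelerators is accepted
cpu_context = {"architecture": "x86_64/amd/zen4", "accelerator": "None"}
```

This ensures no CPU-only builds land on accelerated partitions unless those partitions explicitly list "None" in their accel list in app.cfg.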