Skip to content

Add DiB variables to allow platform architecture choice#1497

Closed
MaxBed4d wants to merge 8 commits intostackhpc/2025.1from
aarch64-image-build
Closed

Add DiB variables to allow platform architecture choice#1497
MaxBed4d wants to merge 8 commits intostackhpc/2025.1from
aarch64-image-build

Conversation

@MaxBed4d
Copy link
Copy Markdown
Contributor

@MaxBed4d MaxBed4d commented Feb 6, 2025

No description provided.

@MaxBed4d MaxBed4d requested a review from a team as a code owner February 6, 2025 13:34
@product-auto-label product-auto-label bot added size: s ansible Ansible playbooks kolla workflows Workflow files have been modified labels Feb 6, 2025
@Alex-Welsh
Copy link
Copy Markdown
Member

@MaxBed4d could you fix the merge conflict, then re-request reviews from whoever you need?

@Alex-Welsh
Copy link
Copy Markdown
Member

@MaxBed4d does this depend on any changes to DIB or our DIB elements, or is it ready to go?

@Alex-Welsh
Copy link
Copy Markdown
Member

What's the plan for consuming the aarch images? Are they just going to be pulled manually or do we need to extend the existing bifrost settings

stackhpc_overcloud_host_image_url: "{{ stackhpc_release_pulp_content_url_with_auth }}/kayobe-images/\
{{ openstack_release }}/{{ os_distribution }}/{{ os_release }}/\
{{ 'ofed/' if stackhpc_overcloud_host_image_is_ofed else '' }}\
{{ stackhpc_overcloud_host_image_version }}/\
overcloud-{{ os_distribution }}-{{ os_release }}\
{{ '-ofed' if stackhpc_overcloud_host_image_is_ofed else '' }}.qcow2"

@cityofships cityofships added the arm64 Work related to ARM architecture support label Jun 10, 2025
@MaxBed4d
Copy link
Copy Markdown
Contributor Author

MaxBed4d commented Sep 9, 2025

This PR has been updated, rebased onto stackhpc/2025.1 and tested.

As of today (09/09/25) builds are succeeding on SMS for Rocky9, Ubuntu Noble and Ubuntu Jammy. However, builds are for some reason suddenly failing on LeafCloud without a clear error message, however this can be looked at in a patch following this PR.

The updates include the way in which images are saved and consumed. All x86_64 images are saved under the same path they always have been, whereas, aarch64 images are saved in a specific subdirectory within the x86_64's path.

Once this has been merged, we are able to merge in a PR to enable periodic image builds for all images across both architectures.

@Alex-Welsh
Copy link
Copy Markdown
Member

@MaxBed4d could you rebase & squash this, and take a look at the CI failures?

@MaxBed4d
Copy link
Copy Markdown
Contributor Author

MaxBed4d commented Sep 9, 2025

@MaxBed4d could you rebase & squash this, and take a look at the CI failures?

Yeah, I can tidy up the commits.

The failing CI for the AiO seems to be due to a failed clock synchronisation, which seems to be more appropriate for a separate PR to this one. I will look into the tox pep8 tests to resolve them.

Comment on lines +433 to +437
###############################################################################
# Kolla platform architecture configuration.

kolla_base_arch: "{{ 'aarch64' if stackhpc_cpu_arch == 'arm64' else stackhpc_cpu_arch }}"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this? It'll make it harder to sync the upstream kayobe-config when we create a new release

Comment on lines +18 to +19
properties:
- cpu_arch: "{{ overcloud_dib_architecture }}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to define the architecture for the os_images role in ansible-collection-openstack. This PR shows it being added.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section implies the default can be used:

    # If an architecture
    # of arm64 or aarch64 is defined, set cpu_arch: aarch64. In
    # all other cases, leave it up to the operator to set
    # properties.cpu_arch.
    cpu_arch_properties:
      {{
          {"cpu_arch": "aarch64"}
          if (item.0.architecture is defined and item.0.architecture in ["arm64", "aarch64"])
          else ({"cpu_arch": "x86_64"} if item.0.architecture is not defined else {})
      }}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But only if the build is for x86, if set to aarch64 there will be issues with the image if cpu_arch isn't set, this variable being set ensures this won't ever be an issue.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default says {"cpu_arch": "aarch64"} if (item.0.architecture is defined and item.0.architecture in ["arm64", "aarch64"])

Are you saying it doesn't work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing ansible-collection-openstack it seems that there were some additional changes made to the os_images role which I wasn't aware of which makes this variable not needed. It should instead be replaced with an architecture item.

However, this means that all aarch based tests ran in the last 2 weeks are invalid.

Thank you for bringing this to my attention; I will make the appropriate changes and restart testing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make the appropriate changes and restart testing.

It's probably best to hold off on testing for now and address all the other comments. Then test at the end


# Pick build architecture for kolla and disk image builder
# Choose between x86_64 or arm64
stackhpc_cpu_arch: 'x86_64'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be better if we could evaluate this on a host-by-host basis. I think it'd be better to invert the relationship between this and kolla_base_arch i.e. leave kolla_base_arch as it was, and here say stackhpc_cpu_arch: "{{ 'arm64' if kolla_base_arch == 'aarch64' else kolla_base_arch }}"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to implement a host-by-host check then I think we should also include a manual override for this variable. This is because stackhpc_cpu_arch at the moment also allow users to build images for either cpu architecture regardless of the host's cpu architecture, and I think capability this should be retained.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, though I don't think that changes the decision here. You'll always be able to override the variable no matter what it is.

@cityofships
Copy link
Copy Markdown
Member

We should preferably expand pulp-host-image-download.yml too, so that it covers a new architecture. Could be a new PR

vars:
local_image_path: /opt/kayobe/images/overcloud-{{ os_distribution }}-{{ os_release }}/overcloud-{{ os_distribution }}-{{ os_release }}.qcow2
image_name: overcloud-{{ os_distribution }}-{{ os_release }}
image_name: overcloud-{{ os_distribution }}-{{ os_release }}-{{ cpu_platform }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean X86 images will start getting published under a new URL? If so, this breaks a couple of variables, e.g. stackhpc_overcloud_host_image_url_no_auth

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having had a look, this shouldn't impact the image URL as defined in stackhpc_overcloud_host_image_url. I have changed this to be a conditional variable to only change the name for arm64.

Copy link
Copy Markdown
Member

@cityofships cityofships left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the docs under doc/source/configuration/host-images.rst to be updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ansible Ansible playbooks arm64 Work related to ARM architecture support kolla size: s workflows Workflow files have been modified

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants