Skc baremetal environment #2023

Draft
claudia-lola wants to merge 24 commits into stackhpc/2025.1 from skc-baremetal-environment

@claudia-lola

Contributor

No description provided.

Comment thread etc/kayobe/environments/stackhpc-baremetal/ansible/baremetal-1-check-bmc-up.yml Outdated

- name: Undeploy baremetals in 'deploy failed' or 'error' state
  ansible.builtin.command:
    cmd: "{{ venv }}/bin/openstack baremetal node undeploy {{ inventory_hostname }}"

Contributor Author

JohnG: "Maybe let's move this into ./baremetal-4-clean.yml, at least for 'deploy failed'."

Comment thread etc/kayobe/environments/stackhpc-baremetal/ironic.yml
Comment thread etc/kayobe/environments/stackhpc-sushy-baremetal/ansible/auto-setup.yml Outdated
Comment thread etc/kayobe/environments/stackhpc-sushy-baremetal/ansible/sushy-emulator.yml Outdated
Comment thread etc/kayobe/environments/stackhpc-sushy-baremetal/ansible/vbmc-net.xml.j2 Outdated
Comment thread etc/kayobe/environments/stackhpc-sushy-baremetal/ansible/vbmc-pool.xml.j2 Outdated
Comment thread etc/kayobe/environments/stackhpc-sushy-baremetal/ansible/auto-setup.yml Outdated
- python3-devel
state: present

- name: Start and enable the QEMU service

Contributor

We do actually have a role that configures a host for libvirt: https://github.com/stackhpc/ansible-role-libvirt-host. Ideally we'd use that, but is there a reason not to?

Contributor

Possibly enable compute_libvirt_enabled?

Contributor

Config is here https://github.com/stackhpc/stackhpc-kayobe-config/blob/stackhpc/2025.1/etc/kayobe/compute.yml

Maybe do kolla_enable_nova: true and kolla_enable_nova_libvirt_container: false in kolla.yml?
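
A minimal sketch of that suggestion, assuming both variables behave as their names imply in the Kayobe release in use. The file path and comments are illustrative only:

```yaml
# etc/kayobe/kolla.yml (sketch)
# Both variable names are quoted from the comment above; verify them against
# the Kayobe release in use before relying on this.
kolla_enable_nova: true
kolla_enable_nova_libvirt_container: false
```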

Contributor

Still think these templates should be factored out into a sushy role:

roles/sushy/templates/sushy.conf.j2
...

Contributor

I guess it would make sense to handle the tasks for sushy and skc-baremetal in roles though perhaps that's gonna delay this merge? Maybe a refactor for later?

@stackhpc-zuul

stackhpc-zuul Bot commented Mar 3, 2026


Build failed.
https://zuul.stackhpc.com/t/stackhpc/buildset/5b11ad8f45754ae994970939c90d8eb6

openstack-tox-pep8 FAILURE in 2m 10s

Warning:
Failed to update check run stackhpc/check: 502 Server Error: Bad Gateway for url: https://api.github.com/repos/stackhpc/stackhpc-kayobe-config/check-runs/65583502413

Comment thread etc/kayobe/environments/baremetal/README-fix-merge.rst
Comment thread etc/kayobe/environments/stackhpc-baremetal/inventory/groups Outdated
Comment thread etc/kayobe/environments/stackhpc-baremetal/inventory/groups Outdated
ironic_redfish_password: ""
ironic_redfish_verify_ca: false

ironic_resource_class: "example_resource_class"

Contributor

Maybe everything from line 12 onwards should be in 'baremetal' group_vars rather than 'baremetal-redfish'?

ironic_provision_image: "{{ stackhpc_overcloud_host_image_version }}"
ironic_provision_key_name: "stack"
#new variables for baremetal-0
ironic_boot_interface: "redfish-virtual-media"

Contributor

Except for these 3 lines

Contributor

Need to reconsider this now we need to support iPXE

Comment thread etc/kayobe/environments/baremetal/README-fix-merge.rst
Comment thread etc/kayobe/environments/stackhpc-sushy-baremetal/kolla.yml Outdated
@Alex-Welsh

Member

@claudia-lola @assumptionsandg could you outline the next steps for this change? It's a massive PR at the moment and I think it would make sense to break it down into some smaller steps.

@assumptionsandg

Contributor

> @claudia-lola @assumptionsandg could you outline the next steps for this change? It's a massive PR at the moment and I think it would make sense to break it down into some smaller steps.

We're dropping the Sushy environment for now and will consider merging it later, since it's a bit out of scope for this PR.

I've rebased this on Jack's work and have started the initial work needed to merge the two. We still have some outstanding complications, such as ensuring the 'redfish' groups work with all possible boot interfaces and resolving documentation conflicts.

I propose limiting our new scripts to running specifically on the 'redfish' groups, since they have only been tested in this configuration and have BMC checks which specifically use the Redfish API. In addition, I will ensure the release note for this change mentions that existing users of this mixin will need to move nodes currently under 'baremetal-redfish' to 'baremetal-compute-redfish', because overcloud baremetal support is being merged in this change.
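
For illustration, the upgrade note described above might look roughly like this, assuming the repository's release notes follow the usual reno YAML layout. The file path and wording are hypothetical:

```yaml
---
# Hypothetical reno note, e.g. releasenotes/notes/<slug>.yaml
upgrade:
  - |
    Overcloud baremetal support has been merged into the baremetal
    environment. Nodes currently in the ``baremetal-redfish`` inventory group
    must be moved to ``baremetal-compute-redfish``.
```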

Contributor

What is wrong with kayobe baremetal compute register?

Contributor

'kayobe baremetal compute register' is limited to baremetal compute nodes only, rather than including the overcloud. It's probably wise to keep a distinction between the two while supporting both.

In addition, it's missing the fields we need to configure interfaces individually on baremetal nodes, rather than just relying on global Ironic configuration, which hasn't been ideal at 6G, for instance when needing to set RAID interfaces.

@dougszumski dougszumski left a comment

Member

Thanks @assumptionsandg for trimming this back
I think it largely looks good

Comment on lines +1 to +3
---

- name: Register baremetal compute nodes

Member

nit: Change to "Install OpenStack client"

{% endfor %}
--resource-class {{ ironic_resource_class }} \
{% if ironic_boot_interface %}
--boot-interface {{ ironic_boot_interface }} \

Member

Please could we support --deploy-interface? We are going to need to start setting it to autodetect at some point for diskless. We can continue to set it to direct by default
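
One way this could be wired in, as a sketch rather than the PR's actual task: `ironic_deploy_interface` is a hypothetical variable name, and `--driver redfish` is assumed here rather than taken from the diff. The default keeps today's `direct` behaviour unless a host or group overrides it:

```yaml
# Sketch only: `ironic_deploy_interface` is a hypothetical variable, and the
# driver name is an assumption.
- name: Register baremetal node with an explicit deploy interface
  ansible.builtin.command:
    cmd: >-
      {{ venv }}/bin/openstack baremetal node create
      --name {{ inventory_hostname }}
      --driver redfish
      --resource-class {{ ironic_resource_class }}
      --deploy-interface {{ ironic_deploy_interface | default('direct') }}
```

A group that later needs a different deploy interface would then only have to set the variable in its group_vars.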

{% for key, value in ironic_properties.items() %}
--property {{ key }}={{ value }} \
{% endfor %}
--resource-class {{ ironic_resource_class }} \

Member

At this point - please can we use {{ ironic_resource_class }}_enroll for the resource class, and then change it back at the end. This is to make it clear that the node is not ready for scheduling.
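
A rough sketch of that two-step pattern, assuming the node is created and later updated through the `openstack` CLI as elsewhere in this PR. The task names and `--driver redfish` are illustrative assumptions:

```yaml
# Step 1: create the node under a temporary resource class so that it
# cannot match a schedulable flavor while it is still being prepared.
- name: Create node with a temporary enrolment resource class
  ansible.builtin.command:
    cmd: >-
      {{ venv }}/bin/openstack baremetal node create
      --name {{ inventory_hostname }}
      --driver redfish
      --resource-class {{ ironic_resource_class }}_enroll

# ... remaining preparation steps for the node run here ...

# Step 2: switch to the real resource class once the node is ready.
- name: Restore the schedulable resource class
  ansible.builtin.command:
    cmd: >-
      {{ venv }}/bin/openstack baremetal node set {{ inventory_hostname }}
      --resource-class {{ ironic_resource_class }}
```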

controller_host: "{{ groups['controllers'][0] }}"

tasks:
- name: Check Ironic variables are defined

Member

nit: This check task could be split out into a separate playbook and included here and in other playbooks via an include task to avoid repeating it.
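
As a sketch of that refactor, assuming the shared check lives in a task file pulled in with `include_tasks`: the file name `check-ironic-vars.yml` and the exact list of variables asserted are hypothetical, since the original task body is not shown in this thread.

```yaml
---
# File 1 (hypothetical name): check-ironic-vars.yml
# A plain task list, so that it can be reused from several playbooks.
- name: Check Ironic variables are defined
  ansible.builtin.assert:
    that:
      - ironic_redfish_address is defined
      - ironic_resource_class is defined
    fail_msg: "Required Ironic variables are missing for {{ inventory_hostname }}"
---
# File 2: placed under `tasks:` in each playbook that needs the check.
- name: Include the shared Ironic variable check
  ansible.builtin.include_tasks: check-ironic-vars.yml
```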

url: "{{ ironic_redfish_address + '/redfish/v1' }}"
method: GET
status_code: 200
validate_certs: false

Member

validate_certs should be {{ redfish_verify_ca | bool }} given we define it above?
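
Applied to the snippet above, the suggestion would look something like this. The task name is an assumption; the variable name is the one used in the comment:

```yaml
- name: Check BMC Redfish endpoint is reachable
  ansible.builtin.uri:
    url: "{{ ironic_redfish_address + '/redfish/v1' }}"
    method: GET
    status_code: 200
    # Follow the configured CA verification setting instead of hard-coding false.
    validate_certs: "{{ redfish_verify_ca | bool }}"
```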


Baremetal nodes are defined in the inventory located ``stackhpc-baremetal/inventory/hosts`` file.
This inventory can be hand-written or generated (e.g. from a Python script).
Each node must have the required Ironic and Redfish variables.

@dougszumski dougszumski Jun 4, 2026

Member

We now need to consider the various other hardware types - even if we simply tell people not to use the groups - they are there for legacy reasons.

dependencies:
- ci-aio

Activate the environment using ``source kayobe-config/kayobe-env --environment stackhpc-baremetal``

Member

s/stackhpc-baremetal/baremetal

Run the full baremetal workflow using::

kayobe playbook run \
etc/kayobe/environments/stackhpc-baremetal/ansible/baremetal-all.yml

Member

ditto

Comment thread etc/kayobe/kolla.yml
- stackhpc_radosgw_usage_exporter_frontend_port
- stackhpc_radosgw_usage_exporter_backend_port
- internal_net_name
- "{{ internal_net_name }}_ips"

Member

I think we need `- "{{ oob_oc_net_name }}_ips"` here too

Comment thread etc/kayobe/kolla.yml
- gpu_group_map
- stackhpc_radosgw_usage_exporter_frontend_port
- stackhpc_radosgw_usage_exporter_backend_port
- internal_net_name

Member

where is internal_net_name used?


[baremetal-overcloud-idrac]
[baremetal-overcloud-ipmi]
[baremetal-overcloud-redfish]

Member

Should we create `[baremetal-overcloud-redfish-vmedia]` and `[baremetal-overcloud-redfish-ipxe]` groups to support both mechanisms?

# Default is 60 seconds
heartbeat_timeout = 360
sync_local_state_interval = 360
# Neccesary for virtual media boot

Member

This comes via Kolla now, which is neat

- node_show.rc != 0
changed_when: false

# NOTE: The openstack.cloud.baremetal_node module cannot be used in this

Member

We shifted chunks of this into the Kayobe node commands:

https://github.com/openstack/kayobe/blob/master/ansible/baremetal-compute-register.yml#L52

I think we should switch over to using those and extend wherever needed.

We can use groups to make a distinction between hypervisors and user facing Ironic nodes.

The end goal should be that this all lives in the Kayobe upstream code base, and we can test it there.

@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

Happy Friday @claudia-lola, this is a friendly reminder that this PR is waiting for your changes or response. Please take a look when you have a moment!

Note: Once your changes are ready, remove the waiting-author-response label and add the waiting-review label.

@@ -1,2 +1,4 @@
---
kolla_enable_ironic: true
kolla_enable_ironic_dnsmasq: false # NOTE(hollie): need to double check this since it could break iPXE.

Contributor

This is a little controversial in 2025.1 as people could be using inspection rules, but is the default in 2026.1 with kolla_inspector_enable_discovery: false.

If no one is using the environment, then it doesn't matter. Perhaps we could do a company survey.

Labels

waiting-author-response PR is waiting for the author to respond

10 participants