Skc baremetal environment #2023
|
|
||
| - name: Undeploy baremetals in 'deploy failed' or 'error' state | ||
| ansible.builtin.command: | ||
| cmd: "{{ venv }}/bin/openstack baremetal node undeploy {{ inventory_hostname }}" |
There was a problem hiding this comment.
JohnG: "Maybe let's move this into ./baremetal-4-clean.yml, at least for 'deploy failed'."
| - python3-devel | ||
| state: present | ||
|
|
||
| - name: Start and enable the QEMU service |
There was a problem hiding this comment.
We do actually have a role that configures a host for libvirt: https://github.com/stackhpc/ansible-role-libvirt-host. Ideally we'd use that; is there a reason not to?
There was a problem hiding this comment.
Possibly enable compute_libvirt_enabled?
There was a problem hiding this comment.
Config is here https://github.com/stackhpc/stackhpc-kayobe-config/blob/stackhpc/2025.1/etc/kayobe/compute.yml
Maybe do kolla_enable_nova: true and kolla_enable_nova_libvirt_container: false in kolla.yml?
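For reference, a sketch of that override in kolla.yml. Both variable names are taken from the comment above and have not been checked against the Kayobe release in use:

```yaml
# etc/kayobe/kolla.yml (sketch; verify both flags exist in your Kayobe version)
kolla_enable_nova: true
kolla_enable_nova_libvirt_container: false
```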
There was a problem hiding this comment.
I still think these templates should be factored out into a sushy role:
roles/sushy/templates/sushy.conf.j2
...
There was a problem hiding this comment.
I guess it would make sense to handle the tasks for sushy and skc-baremetal in roles though perhaps that's gonna delay this merge? Maybe a refactor for later?
…hpc-kayobe-config into skc-baremetal-environment
|
Build failed. ❌ openstack-tox-pep8 FAILURE in 2m 10s Warning: |
| ironic_redfish_password: "" | ||
| ironic_redfish_verify_ca: false | ||
|
|
||
| ironic_resource_class: "example_resource_class" |
There was a problem hiding this comment.
Maybe everything from line 12 onwards should be in 'baremetal' group_vars rather than 'baremetal-redfish'?
| ironic_provision_image: "{{ stackhpc_overcloud_host_image_version }}" | ||
| ironic_provision_key_name: "stack" | ||
| #new variables for baremetal-0 | ||
| ironic_boot_interface: "redfish-virtual-media" |
There was a problem hiding this comment.
Except for these 3 lines
There was a problem hiding this comment.
We need to reconsider this now that we need to support iPXE.
|
@claudia-lola @assumptionsandg could you outline the next steps for this change? It's a massive PR at the moment and I think it would make sense to break it down into some smaller steps. |
We're dropping the Sushy environment for now and will consider merging it later, since it's a bit out of scope for this PR. I've rebased this on Jack's work and have started the initial work needed to merge the two. We still have some outstanding complications, such as ensuring the 'redfish' groups work with all possible boot interfaces and resolving documentation conflicts. I propose limiting our new scripts to the 'redfish' groups, since they have only been tested in this configuration and their BMC checks specifically use the Redfish API. In addition, I will ensure the release note for this change mentions that existing users of this mixin need to move nodes currently under 'baremetal-redfish' to 'baremetal-compute-redfish', because overcloud baremetal support is being merged in this change. |
There was a problem hiding this comment.
What is wrong with kayobe baremetal compute register?
There was a problem hiding this comment.
'kayobe baremetal compute register' is limited to baremetal compute nodes and does not cover the overcloud. It's probably wise to keep a distinction between the two while supporting both.
In addition, it's missing the fields we need to configure interfaces individually on baremetal nodes, rather than relying on global Ironic configuration. That hasn't been ideal at 6G, for instance when needing to set RAID interfaces.
dougszumski
left a comment
There was a problem hiding this comment.
Thanks @assumptionsandg for trimming this back
I think it largely looks good
| --- | ||
|
|
||
| - name: Register baremetal compute nodes |
There was a problem hiding this comment.
nit: Change to "Install OpenStack client"
| {% endfor %} | ||
| --resource-class {{ ironic_resource_class }} \ | ||
| {% if ironic_boot_interface %} | ||
| --boot-interface {{ ironic_boot_interface }} \ |
There was a problem hiding this comment.
Please could we support --deploy-interface? We are going to need to start setting it to autodetect at some point for diskless nodes. We can continue to set it to direct by default.
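Something along these lines might work, mirroring the existing boot interface block. This is only a sketch: `ironic_deploy_interface` is a hypothetical variable name that does not exist in the PR yet.

```yaml
# group_vars (sketch): 'ironic_deploy_interface' is a hypothetical variable name.
# Keep 'direct' as the default; switch to 'autodetect' later for diskless nodes.
ironic_deploy_interface: "direct"

# Fragment for the node create command template, following the same
# pattern as the existing --boot-interface block:
#
#   {% if ironic_deploy_interface %}
#   --deploy-interface {{ ironic_deploy_interface }} \
#   {% endif %}
```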
| {% for key, value in ironic_properties.items() %} | ||
| --property {{ key }}={{ value }} \ | ||
| {% endfor %} | ||
| --resource-class {{ ironic_resource_class }} \ |
There was a problem hiding this comment.
At this point - please can we use {{ ironic_resource_class }}_enroll for the resource class, and then change it back at the end. This is to make it clear that the node is not ready for scheduling.
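A rough sketch of the idea, assuming nodes are named after `inventory_hostname` as in the undeploy task. The task names, the `--driver` value and the position of the second task in the workflow are all assumptions:

```yaml
# Sketch only: register with a temporary resource class so the node
# cannot be matched by a flavor until the workflow has finished.
- name: Register node with a temporary resource class
  ansible.builtin.command:
    cmd: >-
      {{ venv }}/bin/openstack baremetal node create
      --name {{ inventory_hostname }}
      --driver redfish
      --resource-class {{ ironic_resource_class }}_enroll

# Run as the last step, once the node is ready for scheduling.
- name: Restore the real resource class
  ansible.builtin.command:
    cmd: >-
      {{ venv }}/bin/openstack baremetal node set
      --resource-class {{ ironic_resource_class }}
      {{ inventory_hostname }}
```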
| controller_host: "{{ groups['controllers'][0] }}" | ||
|
|
||
| tasks: | ||
| - name: Check Ironic variables are defined |
There was a problem hiding this comment.
nit: This check task could be split out into a separate playbook and pulled into this and other playbooks via an include task, to avoid repeating it.
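For illustration, a minimal sketch of a shared task file. The file name `check-ironic-vars.yml` and the exact list of variables are assumptions, not what the PR currently checks:

```yaml
# check-ironic-vars.yml (hypothetical file name)
- name: Check Ironic variables are defined
  ansible.builtin.assert:
    that:
      - ironic_redfish_address is defined
      - ironic_resource_class is defined
    fail_msg: "Required Ironic variables are missing for {{ inventory_hostname }}"

# In each playbook that needs the check, replace the inline task with:
#
#   tasks:
#     - name: Include shared Ironic variable checks
#       ansible.builtin.include_tasks: check-ironic-vars.yml
```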
| url: "{{ ironic_redfish_address + '/redfish/v1' }}" | ||
| method: GET | ||
| status_code: 200 | ||
| validate_certs: false |
There was a problem hiding this comment.
validate_certs should be {{ redfish_verify_ca | bool }} given we define it above?
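That is, something like the sketch below. The task name is an assumption, and the variable name is the one given in this comment; the group_vars quoted earlier in this review call it `ironic_redfish_verify_ca`, so use whichever is actually defined.

```yaml
- name: Check the Redfish endpoint responds
  ansible.builtin.uri:
    url: "{{ ironic_redfish_address + '/redfish/v1' }}"
    method: GET
    status_code: 200
    # Honour the configured CA verification setting instead of hardcoding false.
    validate_certs: "{{ redfish_verify_ca | bool }}"
```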
|
|
||
| Baremetal nodes are defined in the inventory located ``stackhpc-baremetal/inventory/hosts`` file. | ||
| This inventory can be hand-written or generated (e.g. from a Python script). | ||
| Each node must have the required Ironic and Redfish variables. |
There was a problem hiding this comment.
We now need to consider the various other hardware types - even if we simply tell people not to use the groups - they are there for legacy reasons.
| dependencies: | ||
| - ci-aio | ||
|
|
||
| Activate the environment using ``source kayobe-config/kayobe-env --environment stackhpc-baremetal`` |
There was a problem hiding this comment.
s/stackhpc-baremetal/baremetal
| Run the full baremetal workflow using:: | ||
|
|
||
| kayobe playbook run \ | ||
| etc/kayobe/environments/stackhpc-baremetal/ansible/baremetal-all.yml |
| - stackhpc_radosgw_usage_exporter_frontend_port | ||
| - stackhpc_radosgw_usage_exporter_backend_port | ||
| - internal_net_name | ||
| - "{{ internal_net_name }}_ips" |
There was a problem hiding this comment.
I think we need `- "{{ oob_oc_net_name }}_ips"` here too
| - gpu_group_map | ||
| - stackhpc_radosgw_usage_exporter_frontend_port | ||
| - stackhpc_radosgw_usage_exporter_backend_port | ||
| - internal_net_name |
There was a problem hiding this comment.
Where is internal_net_name used?
|
|
||
| [baremetal-overcloud-idrac] | ||
| [baremetal-overcloud-ipmi] | ||
| [baremetal-overcloud-redfish] |
There was a problem hiding this comment.
Should we create `[baremetal-overcloud-redfish-vmedia]` and `[baremetal-overcloud-redfish-ipxe]` groups to support both mechanisms?
| # Default is 60 seconds | ||
| heartbeat_timeout = 360 | ||
| sync_local_state_interval = 360 | ||
| # Neccesary for virtual media boot |
There was a problem hiding this comment.
This comes via Kolla now, which is neat
| - node_show.rc != 0 | ||
| changed_when: false | ||
|
|
||
| # NOTE: The openstack.cloud.baremetal_node module cannot be used in this |
There was a problem hiding this comment.
We shifted chunks of this into the Kayobe node commands:
https://github.com/openstack/kayobe/blob/master/ansible/baremetal-compute-register.yml#L52
I think we should switch over to using those and extend wherever needed.
We can use groups to make a distinction between hypervisors and user facing Ironic nodes.
The end goal should be that this all lives in the Kayobe upstream code base, and we can test it there.
|
| @@ -1,2 +1,4 @@ | |||
| --- | |||
| kolla_enable_ironic: true | |||
| kolla_enable_ironic_dnsmasq: false # NOTE(hollie): need to double check this since it could break iPXE. | |||
There was a problem hiding this comment.
This is a little controversial in 2025.1 as people could be using inspection rules, but is the default in 2026.1 with kolla_inspector_enable_discovery: false.
If no one is using the environment, then it doesn't matter. Perhaps we could do a company survey.