Add telco NFV day2ops procedures and IPI configuration for shiftstack #16
eshulman2 wants to merge 7 commits
Change-Id: I7c112db92016ddd7bd2c93b5c404455c479bb722
Allow overriding the bootstrap VM flavor (needed because CAPI mode uses the control-plane flavor for bootstrap). Also add additionalTrustBundle to the install-config so bootstrap VMs trust self-signed OSP TLS certs (e.g. Glance endpoint). Change-Id: I9b8c15b5797cbc2066cd1984ee64eab3d4e91a19
Telco flavors pre-exist on the cloud and only have a 'name' field in the topology definition. Guard the flavor-creation task so it only runs when 'ram' is defined, avoiding failures when iterating over these entries. Change-Id: I88c1c19a168615306fc182741be6b6bd2e94343e
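The guard described in this commit could look roughly like the task below. This is a sketch, not the actual task from the repo: the list name `topology.flavors`, the `cloud_name` variable, and the `vcpus`/`disk` keys are assumptions made for illustration, and the use of `openstack.cloud.compute_flavor` is also assumed.

```yaml
# Hypothetical flavor-creation task. Entries that only carry a 'name'
# (pre-existing telco flavors) are skipped by the 'when' condition,
# which Ansible evaluates once per loop item.
- name: Create flavors defined in the topology
  openstack.cloud.compute_flavor:
    cloud: "{{ cloud_name }}"          # hypothetical variable
    state: present
    name: "{{ item.name }}"
    ram: "{{ item.ram }}"
    vcpus: "{{ item.vcpus }}"          # assumed key
    disk: "{{ item.disk }}"            # assumed key
  loop: "{{ topology.flavors }}"       # hypothetical list name
  when: item.ram is defined
```

Because the condition is checked before the module arguments are templated, a name-only entry does not fail on the missing `ram` key.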
Add a variant of the procedure runner that skips result verification, needed for setup steps that do not produce a JUnit report. Change-Id: Ib5dc76d9804c1a6b97e01fc0764392b70d226e76
Add procedures and templates for creating telco worker MachineSets with SR-IOV/DPDK networking, including SR-IOV operator static manifests and defaults for the machineset configuration variables. Change-Id: I223bbcf83d552cc63ff65cc8df6f8c818b6358b3
Add procedures to apply performance profiles and SR-IOV network node policies, and to run testpmd-based DPDK throughput tests via ansible-performance-test. Change-Id: I58f563ed8bf5513a8729f198ecc9281c74add5d5
Add a job definition for the telco (SR-IOV/DPDK) verification pipeline and the nfv_setup playbook that orchestrates the day2ops procedures for machineset creation, performance tuning, and test execution. Change-Id: I01690ab8206ad8ef616f7a0c19c984045f5c548d
- bootstrap_flavor_override is defined
- bootstrap_flavor_override | length > 0

- name: Install OpenShift cluster using openshift-install (standard)
The original block merges nightly_disable_image_policy into the environment - this rewrite drops it. Might break nightly builds? Same for ipi_bootstrap_flavor_workaround.yml.
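One way to keep the merge is the `combine` filter, sketched below. The original block is not shown here, so the names `install_env` and `install_dir` are hypothetical, and `nightly_disable_image_policy` is assumed to be a dict of environment variables that may be undefined.

```yaml
# Sketch only: 'install_env' and 'install_dir' are hypothetical names.
# nightly_disable_image_policy is assumed to be a dict of env vars;
# default({}) keeps the task working when it is not defined.
- name: Install OpenShift cluster using openshift-install (standard)
  ansible.builtin.command:
    cmd: openshift-install create cluster --dir "{{ install_dir }}"
  environment: "{{ install_env | combine(nightly_disable_image_policy | default({})) }}"
```

The same expression would need to appear in every install path, including the workaround file, so that nightly payloads behave the same regardless of which branch runs.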
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "{{ sno_channel.stdout | default('stable') }}"
default('stable') only catches undefined, not empty strings (Jinja2 docs). Since the oc get above has failed_when: false, a failure gives stdout: "" - empty channel. default('stable', true) would catch both.
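As a sketch of the suggested change, only the filter call differs. Passing `true` as the second argument makes Jinja2's `default` apply to any falsy value, including the empty string, rather than only to undefined variables.

```yaml
spec:
  # default(value, true) also replaces "" (e.g. when the preceding
  # 'oc get' failed and stdout is empty), not just undefined.
  channel: "{{ sno_channel.stdout | default('stable', true) }}"
```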
# Telco MachineSet configuration for SRIOV/DPDK workers
# Used by the create-telco-machinesets procedure
telco_machinesets:
  delete_default_workers: true # Whether to delete the default worker machineset
delete_default_workers: true with machinesets: [] - if someone includes this procedure without overriding machinesets, all workers get deleted with no replacements. Consider defaulting to false?
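A possible shape for a safer default, plus a guard, is sketched below. It assumes `machinesets` lives under `telco_machinesets`; the assert task is a hypothetical addition, not something present in the PR.

```yaml
# Defaults: deleting the default workers becomes an explicit opt-in.
telco_machinesets:
  delete_default_workers: false
  machinesets: []
```

```yaml
# Hypothetical guard at the start of the procedure: refuse to delete
# the default workers when no replacement machinesets are configured.
- name: Refuse to delete default workers without replacements
  ansible.builtin.assert:
    that:
      - >-
        not (telco_machinesets.delete_default_workers | bool)
        or (telco_machinesets.machinesets | length > 0)
    fail_msg: >-
      delete_default_workers is true but telco_machinesets.machinesets is
      empty; this would leave the cluster without workers.
```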
ansible.builtin.set_fact:
  api_ip: "{{ ic_content.platform.openstack.apiFloatingIP }}"
  apps_ip: "{{ ic_content.platform.openstack.ingressFloatingIP }}"
  machines_subnet_name: "{{ ic_content.platform.openstack.machinesSubnet | default('') }}"
machinesSubnet | default('') then queries OpenStack unconditionally - empty name returns all subnets and silently picks the first. An assert might be cleaner here.
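The assert could be sketched as follows, placed before the OpenStack subnet query. The task name and message are illustrative; only `ic_content.platform.openstack.machinesSubnet` comes from the diff.

```yaml
# Fail early instead of querying OpenStack with an empty subnet name.
- name: Ensure install-config names a machines subnet
  ansible.builtin.assert:
    that:
      - (ic_content.platform.openstack.machinesSubnet | default('')) | length > 0
    fail_msg: >-
      platform.openstack.machinesSubnet is missing or empty in the
      install-config; refusing to pick an arbitrary subnet.
```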
project:
name: "{{ user_cloud }}"
user: user
password: redhat
Plaintext password in a public repo - should go into configs/secret.yaml (vault-encrypted) or be injected via CI secrets. See similar discussion on PR #10.
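For illustration, the job definition could reference a variable instead of a literal. The variable name `telco_user_password` is hypothetical; it would be defined in the vault-encrypted `configs/secret.yaml` or injected by CI.

```yaml
user: user
# Hypothetical variable, defined in vault-encrypted configs/secret.yaml
# or supplied through CI secrets; no literal password in the repo.
password: "{{ telco_user_password }}"
```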
- name: Clone ansible-nfv repository
  ansible.builtin.git:
    repo: "{{ perf_test.ansible_nfv_repo }}"
NFV18/ansible-nfv is a small GitHub org (3 repos). Is this a team-controlled fork? The legacy jobs clone ansible-nfv from GerritHub - worth confirming this is the same trusted codebase.
Yes, this is the right one. We moved to a new org for OSP 18 automation because there were many changes we had to make. It is just as trustworthy, since I'm also the owner of the org :)
imatza-rh left a comment
Checked again after your replies - looks like the fixes weren't pushed yet (branch still has the June 22 commits). The big one is the nightly_disable_image_policy env merge in ipi_tenant.yml since that breaks all IPI nightly jobs, not just telco. Also noticed run_procedure_no_verify.yml drops the must-gather and XML reporting from run_procedure.yml - might want to keep those for debugging failed runs.
project:
name: "{{ user_cloud }}"
user: user
password: redhat
This is the only hardcoded password across all job definitions - the rest use configs/secret.yaml. Worth moving there too?
- bootstrap_flavor_override is defined
- bootstrap_flavor_override | length > 0

- name: Install OpenShift cluster using openshift-install (standard)
The original block combines nightly_disable_image_policy into the environment - both paths here (and the workaround file) dropped that, which will break nightly payloads for all IPI jobs.
success_msg: |
  Running task file {{ procedure_task_file }}

- name: Run procedure {{ procedure_task_file }}
This skips the whole block/rescue/always from run_procedure.yml - so if a procedure fails, no must-gather gets collected and the XML report stays empty. Could you keep the rescue and always blocks and just drop the verification role call?
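A rough sketch of what that could look like is below. The contents of `run_procedure.yml` are not shown in this conversation, so the included file names `collect_must_gather.yml` and `write_xml_report.yml` are hypothetical placeholders for whatever the existing rescue and always sections do.

```yaml
# Sketch: same block/rescue/always skeleton, minus the verification role.
- name: Run procedure {{ procedure_task_file }}
  block:
    - name: Include the procedure tasks
      ansible.builtin.include_tasks: "{{ procedure_task_file }}"
  rescue:
    - name: Collect must-gather after a failed procedure
      ansible.builtin.include_tasks: collect_must_gather.yml   # hypothetical file
    - name: Propagate the failure after collecting diagnostics
      ansible.builtin.fail:
        msg: "Procedure {{ procedure_task_file }} failed"
  always:
    - name: Write the XML report
      ansible.builtin.include_tasks: write_xml_report.yml      # hypothetical file
```

The explicit `fail` in `rescue` matters: without it, a rescued block counts as recovered and the play would continue as if the procedure had succeeded.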
Add additionalTrustBundle support to IPI install-config so bootstrap VMs can trust self-signed OSP TLS certificates (e.g. Glance endpoint).
Change-Id: I8020620c904f2171e234dfe75580af534945ec5a
Assisted-By: Claude Opus 4.6 (1M context) noreply@anthropic.com