Add telco NFV day2ops procedures and IPI configuration for shiftstack #16

Open
eshulman2 wants to merge 7 commits into shiftstack:main from eshulman2:NFV

@eshulman2

Add additionalTrustBundle support to IPI install-config so bootstrap VMs can trust self-signed OSP TLS certificates (e.g. Glance endpoint).

Change-Id: I8020620c904f2171e234dfe75580af534945ec5a
Assisted-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Change-Id: I7c112db92016ddd7bd2c93b5c404455c479bb722
Allow overriding the bootstrap VM flavor (needed because CAPI mode uses
the control-plane flavor for bootstrap). Also add additionalTrustBundle
to the install-config so bootstrap VMs trust self-signed OSP TLS certs
(e.g. Glance endpoint).

Change-Id: I9b8c15b5797cbc2066cd1984ee64eab3d4e91a19
Telco flavors pre-exist on the cloud and only have a 'name' field in the
topology definition. Guard the flavor-creation task so it only runs when
'ram' is defined, avoiding failures when iterating over these entries.

Change-Id: I88c1c19a168615306fc182741be6b6bd2e94343e
Add a variant of the procedure runner that skips result verification,
needed for setup steps that do not produce a JUnit report.

Change-Id: Ib5dc76d9804c1a6b97e01fc0764392b70d226e76
Add procedures and templates for creating telco worker MachineSets with
SR-IOV/DPDK networking, including SR-IOV operator static manifests and
defaults for the machineset configuration variables.

Change-Id: I223bbcf83d552cc63ff65cc8df6f8c818b6358b3
Add procedures to apply performance profiles and SR-IOV network node
policies, and to run testpmd-based DPDK throughput tests via
ansible-performance-test.

Change-Id: I58f563ed8bf5513a8729f198ecc9281c74add5d5
Add a job definition for the telco (SR-IOV/DPDK) verification pipeline
and the nfv_setup playbook that orchestrates the day2ops procedures
for machineset creation, performance tuning, and test execution.

Change-Id: I01690ab8206ad8ef616f7a0c19c984045f5c548d
@eshulman2 eshulman2 requested review from eurijon and imatza-rh and removed request for imatza-rh June 23, 2026 11:50
- bootstrap_flavor_override is defined
- bootstrap_flavor_override | length > 0

- name: Install OpenShift cluster using openshift-install (standard)

Contributor

The original block merges nightly_disable_image_policy into the environment - this rewrite drops it. Might break nightly builds? Same for ipi_bootstrap_flavor_workaround.yml.

Author

fixed

  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "{{ sno_channel.stdout | default('stable') }}"

Contributor

default('stable') only catches undefined, not empty strings (Jinja2 docs). Since the oc get above has failed_when: false, a failure gives stdout: "" - empty channel. default('stable', true) would catch both.
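
A minimal sketch of the suggested change, assuming the template keeps reading the channel from the same `sno_channel` register. Passing `true` as the second argument makes Jinja2's `default` filter replace falsy values (such as an empty string) as well as undefined ones:

```yaml
spec:
  # Falls back to 'stable' when stdout is undefined OR empty,
  # which is what a failed `oc get` with failed_when: false produces.
  channel: "{{ sno_channel.stdout | default('stable', true) }}"
```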

Author

fixed

# Telco MachineSet configuration for SRIOV/DPDK workers
# Used by the create-telco-machinesets procedure
telco_machinesets:
  delete_default_workers: true  # Whether to delete the default worker machineset

Contributor

delete_default_workers: true with machinesets: [] - if someone includes this procedure without overriding machinesets, all workers get deleted with no replacements. Consider defaulting to false?
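
One way this could look, as a sketch only: a safer default plus a guard that refuses to delete the default workers when no replacement machinesets are configured. The guard task and its placement at the start of the procedure are assumptions, not part of the PR.

```yaml
# Defaults: deleting the default workers becomes opt-in
telco_machinesets:
  delete_default_workers: false
  machinesets: []

# Hypothetical guard task for the create-telco-machinesets procedure
- name: Refuse to delete default workers without replacement machinesets
  ansible.builtin.assert:
    that:
      - "not (telco_machinesets.delete_default_workers | bool) or (telco_machinesets.machinesets | default([]) | length > 0)"
    fail_msg: >-
      delete_default_workers is true but telco_machinesets.machinesets is empty;
      the cluster would be left without workers.
```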

Author

fixed

  ansible.builtin.set_fact:
    api_ip: "{{ ic_content.platform.openstack.apiFloatingIP }}"
    apps_ip: "{{ ic_content.platform.openstack.ingressFloatingIP }}"
    machines_subnet_name: "{{ ic_content.platform.openstack.machinesSubnet | default('') }}"

Contributor

machinesSubnet | default('') then queries OpenStack unconditionally - empty name returns all subnets and silently picks the first. An assert might be cleaner here.
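
A sketch of the assert, assuming it runs before the `set_fact` above and that `ic_content` is the parsed install-config as in the snippet. The task name and message are illustrative:

```yaml
- name: Require machinesSubnet in the install-config
  ansible.builtin.assert:
    that:
      # Fail early instead of querying OpenStack with an empty subnet name
      - ic_content.platform.openstack.machinesSubnet is defined
      - ic_content.platform.openstack.machinesSubnet | length > 0
    fail_msg: "platform.openstack.machinesSubnet is missing or empty in install-config"
```

With the assert in place, the `| default('')` on `machines_subnet_name` is no longer needed.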

Author

fixed

project:
name: "{{ user_cloud }}"
user: user
password: redhat

Contributor

Plaintext password in a public repo - should go into configs/secret.yaml (vault-encrypted) or be injected via CI secrets. See similar discussion on PR #10.
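
For illustration, the job definition could reference a variable instead of the literal value. `vault_user_password` is a hypothetical name that would need to be defined in the vault-encrypted configs/secret.yaml, and the nesting under `project` is assumed from the snippet above:

```yaml
project:
  name: "{{ user_cloud }}"
  user: user
  # Hypothetical variable; the real value lives in configs/secret.yaml
  password: "{{ vault_user_password }}"
```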


- name: Clone ansible-nfv repository
  ansible.builtin.git:
    repo: "{{ perf_test.ansible_nfv_repo }}"

Contributor

NFV18/ansible-nfv is a small GitHub org (3 repos). Is this a team-controlled fork? The legacy jobs clone ansible-nfv from GerritHub - worth confirming this is the same trusted codebase.

Author

Yes, this is the right one. We moved to a new org for OSP 18 automation because there were many changes we had to make. It is just as trustworthy, since I'm also the owner of the org :)

@eshulman2 eshulman2 requested a review from imatza-rh June 25, 2026 10:02

@imatza-rh imatza-rh left a comment

Contributor

Checked again after your replies - looks like the fixes weren't pushed yet (branch still has the June 22 commits). The big one is the nightly_disable_image_policy env merge in ipi_tenant.yml since that breaks all IPI nightly jobs, not just telco. Also noticed run_procedure_no_verify.yml drops the must-gather and XML reporting from run_procedure.yml - might want to keep those for debugging failed runs.

project:
name: "{{ user_cloud }}"
user: user
password: redhat

Contributor

This is the only hardcoded password across all job definitions - the rest use configs/secret.yaml. Worth moving there too?

- bootstrap_flavor_override is defined
- bootstrap_flavor_override | length > 0

- name: Install OpenShift cluster using openshift-install (standard)

Contributor

The original block combines nightly_disable_image_policy into the environment - both paths here (and the workaround file) dropped that, which will break nightly payloads for all IPI jobs.
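
A rough sketch of what restoring the merge might look like, assuming `nightly_disable_image_policy` is a dict of environment variables. `install_dir` and `base_install_env` are hypothetical names, and the actual command and merge in ipi_tenant.yml may differ:

```yaml
- name: Install OpenShift cluster using openshift-install (standard)
  ansible.builtin.command:
    cmd: openshift-install create cluster --dir "{{ install_dir }}"  # install_dir is hypothetical
  # Merge the nightly image-policy variables into the task environment;
  # default({}) keeps non-nightly jobs working when the variable is unset.
  environment: "{{ base_install_env | default({}) | combine(nightly_disable_image_policy | default({})) }}"
```

The same `environment` expression would apply to the bootstrap-flavor path and to ipi_bootstrap_flavor_workaround.yml.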

success_msg: |
Running task file {{ procedure_task_file }}

- name: Run procedure {{ procedure_task_file }}

Contributor

This skips the whole block/rescue/always from run_procedure.yml - so if a procedure fails, no must-gather gets collected and the XML report stays empty. Could you keep the rescue and always blocks and just drop the verification role call?
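
Roughly what that could look like, as a sketch: the `block`/`rescue`/`always` structure stays and only the verification role call is left out. The two included task files are hypothetical placeholders for whatever run_procedure.yml uses for must-gather and XML reporting:

```yaml
- name: Run procedure {{ procedure_task_file }}
  block:
    - name: Include the procedure tasks
      ansible.builtin.include_tasks: "{{ procedure_task_file }}"
    # No verification role call here; that is the only intended difference.
  rescue:
    - name: Collect must-gather for debugging
      ansible.builtin.include_tasks: collect_must_gather.yml  # hypothetical file name
    - name: Propagate the failure after collecting diagnostics
      ansible.builtin.fail:
        msg: "Procedure {{ procedure_task_file }} failed"
  always:
    - name: Write the XML report
      ansible.builtin.include_tasks: write_xml_report.yml  # hypothetical file name
```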
