COS-4032: Add numad and numactl packages #4051

Merged
dustymabe merged 2 commits into coreos:testing-devel from angelcerveraroldan:support-numa-numad
Apr 9, 2026

@angelcerveraroldan
Member

Add numad and numactl to x86_64, aarch64, and ppc64le. numad is not available for s390x, so we will not install the packages on that architecture.
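
The architecture split described above can be sketched as a small shell check. This is only an illustration, not the manifest change itself: the function name `numa_packages_expected` is hypothetical, and the list of architectures is taken from the description above.

```shell
# Hypothetical helper (not part of this PR): succeed if numad and numactl
# are expected on the given architecture. Per the description above, the
# packages are added on x86_64, aarch64 and ppc64le, and skipped on s390x.
numa_packages_expected() {
    case "$1" in
        x86_64|aarch64|ppc64le) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use on a running host:
#   if numa_packages_expected "$(uname -m)"; then rpm -q numad numactl; fi
```

The check is kept as a pure function of the architecture string so it can be exercised without access to each platform.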

@angelcerveraroldan
Member Author

angelcerveraroldan commented Mar 12, 2026

Tests seem to be failing due to a kernel warning:

arch/x86/kernel/smpboot.c:332 topology_sane.isra.0+0xc5/0x200

The error may or may not show up depending on the host CPU.

Depends on coreos/coreos-assembler#4479 (which has now been merged)
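
To see whether a given host hits the kernel warning quoted above, one option is to search the kernel log for the `topology_sane` symbol. The sketch below is an assumption about how one might check, not something this PR does; the helper name `has_topology_warning` is hypothetical.

```shell
# Hypothetical helper: read kernel log text on stdin and succeed if it
# mentions topology_sane, the function named in the warning above.
has_topology_warning() {
    grep -q 'topology_sane'
}

# Possible use inside the test VM:
#   dmesg | has_topology_warning && echo "topology_sane warning present"
```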

@angelcerveraroldan angelcerveraroldan added the jira label (For syncing to Jira. Only works for issues, i.e. not PRs) Mar 16, 2026
Comment thread tests/kola/numad/config.bu
Comment thread tests/kola/numad/test.sh Outdated
# run a process and check that the daemon is actively monitoring it
podman run --privileged \
--name pod-stress-ng --pid=host \
ghcr.io/colinianking/stress-ng \
Member

@dustymabe dustymabe Mar 17, 2026

Pulling from a namespace/registry we don't control is not something we typically do.

stress-ng is in Fedora, so we could just build our own container for it, but TBH it would be better to get rid of most of those containers and do something like coreos/coreos-assembler@8dbfe3e in more places (at least for tests that can only run on the qemu platform).

Member Author

@angelcerveraroldan angelcerveraroldan Mar 25, 2026

I have modified the test to use rpm-ostree install --apply-live stress-ng instead of pulling the container. Would this be better?

I have tested it on x86 (locally, CI in this PR) and ARM (debug-pod) with this change and it seems to work. I still need to test ppc64le.

Member

I have modified the test to use rpm-ostree install --apply-live stress-ng instead of pulling the container. Would this be better?

Not exactly. rpm-ostree install won't work on RHEL CoreOS, where there aren't any yum repos configured by default.

I have an idea for how to implement coreos/coreos-assembler@8dbfe3e more generically and use it here, but I think I'm obligated to review coreos/coreos-assembler#4377 and help nikita get that merged first before I make any more changes to the code base there. Let's sync early next week to see if that has merged or not.
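
The repo concern above could be made explicit with a guard before any package layering. This is a rough sketch under stated assumptions: `has_repo_files` is a hypothetical helper, `/etc/yum.repos.d` is the usual repo directory, and finding a `.repo` file does not prove the repo is enabled or reachable.

```shell
# Hypothetical helper: succeed if the given directory contains at least
# one *.repo file. This only checks that a file exists; it does not check
# whether any repo defined in it is enabled.
has_repo_files() {
    for f in "$1"/*.repo; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Possible use in a test:
#   if has_repo_files /etc/yum.repos.d; then
#       rpm-ostree install --apply-live stress-ng
#   else
#       echo "no yum repos configured; cannot layer stress-ng" >&2
#   fi
```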

Member Author

Sounds good, thanks.

@angelcerveraroldan angelcerveraroldan changed the title Add numad and numactl packages COS-4032: Add numad and numactl packages Mar 19, 2026
@angelcerveraroldan
Member Author

/retest

@angelcerveraroldan angelcerveraroldan force-pushed the support-numa-numad branch 6 times, most recently from 02ed041 to dbfe3a2 Compare March 25, 2026 10:40
Comment thread manifests/system-configuration.yaml Outdated
Comment thread manifests/system-configuration.yaml Outdated
Comment thread tests/kola/numad/test.sh Outdated
dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Mar 30, 2026
For testing numad as part of [1]. The thinking is that we'll
bind mount in COSA into the VM and run a container based on the
COSA rootfs that spawns stress-ng.

[1] coreos/fedora-coreos-config#4051
@dustymabe
Member

@angelcerveraroldan with coreos/coreos-assembler#4513 I was able to test this with:

diff --git a/tests/kola/numad/test.sh b/tests/kola/numad/test.sh
index 5b858a80..d8e008f9 100755
--- a/tests/kola/numad/test.sh
+++ b/tests/kola/numad/test.sh
@@ -7,6 +7,8 @@
 ##   minMemory: 2048
 ##   architectures: "!s390x"
 ##   description: Verify that numad detects nodes and tracks set -euo pipefail
+##   # We use the COSA filesystem to start a container to run stress-ng
+##   bindMountHostRO: ["/,/var/cosaroot"]
 
 set -euo pipefail
 
@@ -27,11 +29,13 @@ if ! lscpu | grep -Eq "NUMA node\(s\):\s*2"; then
 fi
 
 # As part of the test we want to run a somewhat intensive process, so that
-# we can verify that numad is successfully tracking processes.
-rpm-ostree install --apply-live stress-ng
-stress-ng --temp-path /var/tmp         \
-                 --vm 1 --vm-bytes 1024M       \
-                 --timeout 25s
+# we can verify that numad is successfully tracking processes. Here we
+# use the same pattern of using a mounted in COSA as the container root as:
+# https://github.com/coreos/coreos-assembler/blob/8dbfe3ea8b8f571e732e8cc0ab307e983a0be1f3/mantle/cmd/kola/resources/iscsi_butane_setup.yaml#L102-L113
+podman run --privileged --name stress-ng --pid=host              \
+    --volume=root:/root/:nocopy --volume=vartmp:/var/tmp/:nocopy \
+    --workdir /root --rootfs /var/cosaroot                       \
+        stress-ng --temp-path /var/tmp --vm 1 --vm-bytes 1024M --timeout 25s
 
 logfile="/var/log/numad.log"
 for node in 0 1; do

but I will note it failed a few times. I'm not sure if it's because the test is flaky or if running the container this way is causing it to be flaky.
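
The diff's hunk context shows the test gating on `lscpu` reporting two NUMA nodes. A slightly stricter variant of that check might look like the sketch below. The helper name `numa_node_count_is` is hypothetical; the pattern tolerates leading indentation and anchors the count so that, for example, `20` does not match `2`.

```shell
# Hypothetical helper: read lscpu output on stdin and succeed only if it
# reports exactly $1 NUMA nodes. Leading whitespace is allowed because
# some lscpu versions indent this field.
numa_node_count_is() {
    grep -Eq "^[[:space:]]*NUMA node\(s\):[[:space:]]*$1[[:space:]]*\$"
}

# Possible use in the test:
#   lscpu | numa_node_count_is 2 || { echo "expected 2 NUMA nodes" >&2; exit 1; }
```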

@angelcerveraroldan
Member Author

angelcerveraroldan commented Mar 31, 2026

but I will note it failed a few times. I'm not sure if it's because the test is flaky or if running the container this way is causing it to be flaky.

@dustymabe, I don't remember it being flaky. I have run the test (using the old method, where I download the image from within the test) another 80 times in the aarch64 pod, and none of the runs failed.
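
A repeated-run check like the one described above can be scripted. The sketch below is not the command that was actually used: `count_failures` is a hypothetical helper, and `./run-numad-test.sh` in the usage comment is a placeholder for whatever command launches the test.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters for the tally.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Possible use (the script name is a placeholder):
#   count_failures 80 ./run-numad-test.sh
```

Running both container approaches through the same loop would give comparable failure counts, which could help separate test flakiness from flakiness introduced by the rootfs-based container.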

dustymabe added a commit to coreos/coreos-assembler that referenced this pull request Apr 1, 2026
For testing numad as part of [1]. The thinking is that we'll
bind mount in COSA into the VM and run a container based on the
COSA rootfs that spawns stress-ng.

[1] coreos/fedora-coreos-config#4051
@angelcerveraroldan angelcerveraroldan force-pushed the support-numa-numad branch 2 times, most recently from 436a05d to e96a332 Compare April 1, 2026 11:49
Comment thread manifests/system-configuration.yaml Outdated
packages:
# https://github.com/coreos/fedora-coreos-tracker/issues/2096
# numad: Daemon that provides placement advice for efficient use of CPUs and memory on systems with NUMA topology.
# numadct: Control NUMA policy for processes or shared memory
Member

Suggested change
-# numadct: Control NUMA policy for processes or shared memory
+# numactl: Control NUMA policy for processes or shared memory

dustymabe
dustymabe previously approved these changes Apr 1, 2026
Add the `numad` and `numactl` libraries to the manifest. The library
does not have versions after `el8` for `s390x`, so we will
install them conditionally.
@dustymabe dustymabe enabled auto-merge (rebase) April 1, 2026 14:42
@dustymabe dustymabe disabled auto-merge April 1, 2026 16:09
@dustymabe
Member

I disabled automerge. @angelcerveraroldan feel free to merge when it passes tests if you want.

I wasn't sure if you wanted to wait for and use the potential change in coreos/coreos-assembler#4518 for this new test.

@angelcerveraroldan
Member Author

I wasn't sure if you wanted to wait for and use the potential change in coreos/coreos-assembler#4518 for this new test.

I did want to wait for that change to merge first. That way, if the flakiness turns out to be a problem, it won't be as big an issue.

Add a test that checks that numad can start up and track processes
without any issues.
Member

@dustymabe dustymabe left a comment

LGTM

@dustymabe dustymabe merged commit fac4d7a into coreos:testing-devel Apr 9, 2026
3 of 11 checks passed
@angelcerveraroldan angelcerveraroldan deleted the support-numa-numad branch April 10, 2026 09:28