RHOAIENG-48973: fix notebook tests and add required auto param for trainer accelerator #1001

Merged
openshift-merge-bot[bot] merged 1 commit into project-codeflare:main from pawelpaszki:RHOAIENG-48973 on Feb 10, 2026

Conversation

@pawelpaszki

@pawelpaszki pawelpaszki commented Feb 9, 2026

Contributor

Issue link

RHOAIENG-48973

What changes have been made

RHOAIENG-48973: fix notebook tests and add required auto param for trainer accelerator

Verification steps

  • For notebook tests: all should pass.
  • For the required accelerator param:

Tests that submit a job without the accelerator parameter (the upgrade and MNIST job-submit tests) fail, e.g.:

tests/upgrade/01_raycluster_sdk_upgrade_test.py::TestMnistJobSubmit::test_mnist_job_submission Using kube-authkit auto-detection for authentication
Setting up authentication with kube-authkit auto-detection...
2026-02-10 08:58:20,342 Using authentication strategy: Kubernetes KubeConfig (/tmp/kubeconfig-1)
2026-02-10 08:58:20,343 Authenticating using kubeconfig: /tmp/kubeconfig-1
2026-02-10 08:58:20,346 Successfully authenticated using kubeconfig
✅ Successfully set up authentication with kube-authkit
Using kube-authkit auto-detection for authentication
Using kube-authkit auto-detection for authentication
Setting up authentication with kube-authkit auto-detection...
2026-02-10 08:58:20,346 Using authentication strategy: Kubernetes KubeConfig (/tmp/kubeconfig-1)
2026-02-10 08:58:20,346 Authenticating using kubeconfig: /tmp/kubeconfig-1
2026-02-10 08:58:20,349 Successfully authenticated using kubeconfig
✅ Successfully set up authentication with kube-authkit
Yaml resources loaded for mnist
No BYOIDC OIDC providers found in cluster Authentication resource
No BYOIDC OIDC providers found in cluster Authentication resource
Using legacy authentication for Ray Dashboard job submission...
2026-02-10 08:58:27,993	INFO dashboard_sdk.py:355 -- Uploading package gcs://_ray_pkg_d9977c9a7d4c9bd8.zip.
2026-02-10 08:58:27,997	INFO packaging.py:588 -- Creating a file package for local module './tests/e2e/'.
Submitted job with ID: raysubmit_2QzNnqbaezMfH6cZ
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
PENDING
RUNNING
RUNNING
2026-02-10 08:58:31,233	INFO job_manager.py:568 -- Runtime env is setting up.
Running entrypoint for job raysubmit_2QzNnqbaezMfH6cZ: python mnist.py
prior to running the trainer
MASTER_ADDR: is  None
MASTER_PORT: is  None
ACCELERATOR: is  None
STORAGE_BUCKET_EXISTS:  False



GROUP:  1
LOCAL:  1
Traceback (most recent call last):
  File "/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/working_dir_files/_ray_pkg_d9977c9a7d4c9bd8/mnist.py", line 245, in <module>
    trainer = Trainer(
              ^^^^^^^^
  File "/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 417, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 129, in __init__
    self._check_config_and_set_final_flags(
  File "/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 204, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid accelerator name: `accelerator=None`. Available names are: auto, cuda, tpu, cpu, mps.

Job has completed: 'FAILED'
FAILED

With the accelerator parameter provided, the tests pass:

tests/upgrade/01_raycluster_sdk_upgrade_test.py::TestMnistJobSubmit::test_mnist_job_submission Using kube-authkit auto-detection for authentication
Setting up authentication with kube-authkit auto-detection...
2026-02-10 09:09:10,920 Using authentication strategy: Kubernetes KubeConfig (/tmp/kubeconfig-1)
2026-02-10 09:09:10,921 Authenticating using kubeconfig: /tmp/kubeconfig-1
2026-02-10 09:09:10,924 Successfully authenticated using kubeconfig
✅ Successfully set up authentication with kube-authkit
Using kube-authkit auto-detection for authentication
Using kube-authkit auto-detection for authentication
Setting up authentication with kube-authkit auto-detection...
2026-02-10 09:09:10,925 Using authentication strategy: Kubernetes KubeConfig (/tmp/kubeconfig-1)
2026-02-10 09:09:10,925 Authenticating using kubeconfig: /tmp/kubeconfig-1
2026-02-10 09:09:10,928 Successfully authenticated using kubeconfig
✅ Successfully set up authentication with kube-authkit
Yaml resources loaded for mnist
No BYOIDC OIDC providers found in cluster Authentication resource
No BYOIDC OIDC providers found in cluster Authentication resource
Using legacy authentication for Ray Dashboard job submission...
2026-02-10 09:09:18,760	INFO dashboard_sdk.py:355 -- Uploading package gcs://_ray_pkg_be21486f12aefa83.zip.
2026-02-10 09:09:18,763	INFO packaging.py:588 -- Creating a file package for local module './tests/e2e/'.
Submitted job with ID: raysubmit_FjnqSyCQ43zMrMxM
PENDING
RUNNING
RUNNING
2026-02-10 09:09:21,594	INFO job_manager.py:568 -- Runtime env is setting up.
Running entrypoint for job raysubmit_FjnqSyCQ43zMrMxM: python mnist.py
prior to running the trainer
MASTER_ADDR: is  None
MASTER_PORT: is  None
ACCELERATOR: is  auto
STORAGE_BUCKET_EXISTS:  False



GROUP:  1
LOCAL:  1
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

Downloading MNIST dataset...
Using default MNIST mirror reference to download datasets...

  0%|          | 0.00/9.91M [00:00<?, ?B/s]
  1%|          | 98.3k/9.91M [00:00<00:11, 854kB/s]
  4%|▎         | 360k/9.91M [00:00<00:05, 1.68MB/s]
 15%|█▍        | 1.44M/9.91M [00:00<00:01, 5.51MB/s]
 59%|█████▉    | 5.83M/9.91M [00:00<00:00, 18.7MB/s]
100%|██████████| 9.91M/9.91M [00:00<00:00, 18.8MB/s]

  0%|          | 0.00/28.9k [00:00<?, ?B/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 497kB/s]

  0%|          | 0.00/1.65M [00:00<?, ?B/s]
  6%|▌         | 98.3k/1.65M [00:00<00:01, 842kB/s]
 14%|█▍        | 229k/1.65M [00:00<00:01, 1.01MB/s]
 44%|████▎     | 721k/1.65M [00:00<00:00, 2.47MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 4.03MB/s]

  0%|          | 0.00/4.54k [00:00<?, ?B/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 15.3MB/s]
┏━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃   ┃ Name          ┃ Type               ┃ Params ┃ Mode  ┃ FLOPs ┃
┡━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ 0 │ model         │ Sequential         │ 55.1 K │ train │     0 │
│ 1 │ val_accuracy  │ MulticlassAccuracy │      0 │ train │     0 │
│ 2 │ test_accuracy │ MulticlassAccuracy │      0 │ train │     0 │
└───┴───────────────┴────────────────────┴────────┴───────┴───────┘
Trainable params: 55.1 K                                                        
Non-trainable params: 0                                                         
Total params: 55.1 K                                                            
Total estimated model params size (MB): 0                                       
Modules in train mode: 11                                                       
Modules in eval mode: 0                                                         
Total FLOPs: 0                                                                  

Sanity Checking: |          | 0/? [00:00<?, ?it/s]/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:434: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.

Sanity Checking: |          | 0/? [00:00<?, ?it/s]
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 68.34it/s]/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:433: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.

                                                                           
/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:434: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:317: The number of training batches (16) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.

Training: |          | 0/? [00:00<?, ?it/s]
Training: |          | 0/? [00:00<?, ?it/s]
Epoch 0:   0%|          | 0/16 [00:00<?, ?it/s]
Epoch 0: 100%|██████████| 16/16 [00:00<00:00, 67.05it/s]
Epoch 0: 100%|██████████| 16/16 [00:00<00:00, 67.00it/s, v_num=0]

Validation: |          | 0/? [00:00<?, ?it/s]
Validation: |          | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/79 [00:00<?, ?it/s]
Validation DataLoader 0:  25%|██▌       | 20/79 [00:00<00:00, 78.90it/s]
Validation DataLoader 0:  51%|█████     | 40/79 [00:00<00:00, 76.37it/s]
Validation DataLoader 0:  76%|███████▌  | 60/79 [00:00<00:00, 75.45it/s]
Validation DataLoader 0: 100%|██████████| 79/79 [00:01<00:00, 73.54it/s]
Epoch 0: 100%|██████████| 16/16 [00:01<00:00, 12.02it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 0: 100%|██████████| 16/16 [00:01<00:00, 12.01it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 0:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.180, val_acc=0.347]         
Epoch 1:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 1: 100%|██████████| 16/16 [00:00<00:00, 67.51it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 1: 100%|██████████| 16/16 [00:00<00:00, 67.44it/s, v_num=0, val_loss=2.180, val_acc=0.347]

Validation: |          | 0/? [00:00<?, ?it/s]
Validation: |          | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/79 [00:00<?, ?it/s]
Validation DataLoader 0:  25%|██▌       | 20/79 [00:00<00:00, 68.90it/s]
Validation DataLoader 0:  51%|█████     | 40/79 [00:00<00:00, 70.67it/s]
Validation DataLoader 0:  76%|███████▌  | 60/79 [00:00<00:00, 70.99it/s]
Validation DataLoader 0: 100%|██████████| 79/79 [00:01<00:00, 71.94it/s]
Epoch 1: 100%|██████████| 16/16 [00:01<00:00, 11.84it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 1: 100%|██████████| 16/16 [00:01<00:00, 11.84it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 1:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.000, val_acc=0.423]         
Epoch 2:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 2: 100%|██████████| 16/16 [00:00<00:00, 63.55it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 2: 100%|██████████| 16/16 [00:00<00:00, 63.49it/s, v_num=0, val_loss=2.000, val_acc=0.423]

Validation: |          | 0/? [00:00<?, ?it/s]
Validation: |          | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/79 [00:00<?, ?it/s]
Validation DataLoader 0:  25%|██▌       | 20/79 [00:00<00:00, 75.61it/s]
Validation DataLoader 0:  51%|█████     | 40/79 [00:00<00:00, 73.06it/s]
Validation DataLoader 0:  76%|███████▌  | 60/79 [00:00<00:00, 73.20it/s]
Validation DataLoader 0: 100%|██████████| 79/79 [00:01<00:00, 73.65it/s]
Epoch 2: 100%|██████████| 16/16 [00:01<00:00, 11.92it/s, v_num=0, val_loss=1.790, val_acc=0.536]
Epoch 2: 100%|██████████| 16/16 [00:01<00:00, 11.91it/s, v_num=0, val_loss=1.790, val_acc=0.536]`Trainer.fit` stopped: `max_epochs=3` reached.

Epoch 2: 100%|██████████| 16/16 [00:01<00:00, 11.88it/s, v_num=0, val_loss=1.790, val_acc=0.536]

Job has completed: 'SUCCEEDED'
PASSED
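
The two runs above differ only in the value printed as `ACCELERATOR`: `None` is rejected by the Lightning `Trainer`, while `auto` is accepted. A minimal sketch of a guard that falls back to `auto` when no value is supplied is shown below. It is an illustration only, not the change made in this PR: the `ACCELERATOR` environment variable name and the `resolve_accelerator` helper are assumptions, and the list of valid names is copied from the `ValueError` message in the failing log.

```python
import os

# Names listed in the ValueError message from the failing run.
VALID_ACCELERATORS = ("auto", "cuda", "tpu", "cpu", "mps")


def resolve_accelerator(env=None, default="auto"):
    """Return a usable accelerator name, falling back to `default` when unset.

    Hypothetical helper; the ACCELERATOR variable name is an assumption.
    """
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default.
    value = env.get("ACCELERATOR") or default
    if value not in VALID_ACCELERATORS:
        raise ValueError(f"invalid accelerator name: {value!r}")
    return value
```

The returned value would then be passed on as `Trainer(accelerator=resolve_accelerator(), ...)`, so that a missing setting never reaches the trainer as `None`.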

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@pawelpaszki added the test-guided-notebooks (Run PR check to verify Guided notebooks), test-ui-notebooks (Run PR check to verify UI notebooks), and test-additional-notebooks labels Feb 9, 2026
@codecov

codecov Bot commented Feb 9, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.97%. Comparing base (3153fe9) to head (48302fe).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1001      +/-   ##
==========================================
+ Coverage   95.96%   95.97%   +0.01%     
==========================================
  Files          23       23              
  Lines        2203     2211       +8     
==========================================
+ Hits         2114     2122       +8     
  Misses         89       89              
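
The percentages in the diff follow from the hits and lines counts. As a quick check, the sketch below recomputes them; rounding to two decimals is assumed to match how the report displays its values, and `coverage_pct` is a name introduced here for illustration.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as the percentage of lines that are hit."""
    return 100.0 * hits / lines


base = coverage_pct(2114, 2203)  # main
head = coverage_pct(2122, 2211)  # this PR (#1001)

# prints: base=95.96% head=95.97% delta=+0.01%
print(f"base={base:.2f}% head={head:.2f}% delta={head - base:+.2f}%")
```

All 8 added lines are hits, so the misses count stays at 89 and the ratio rises slightly.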


@pawelpaszki

Contributor Author

/retest

@pawelpaszki changed the title from "fix: widget notebook" to "RHOAIENG-48973: fix notebook tests and add required auto param for trainer accelerator" Feb 10, 2026
@openshift-ci-robot

openshift-ci-robot commented Feb 10, 2026

Collaborator

@pawelpaszki: This pull request references RHOAIENG-48973 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Issue link

RHOAIENG-48973

What changes have been made

fix: widget notebook

Verification steps

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change



   self._check_config_and_set_final_flags(
 File "/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 204, in _check_config_and_set_final_flags
   raise ValueError(
ValueError: You selected an invalid accelerator name: `accelerator=None`. Available names are: auto, cuda, tpu, cpu, mps.

Job has completed: 'FAILED'
FAILED
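
The failure above comes from `accelerator=None` reaching the Lightning `Trainer`. A minimal sketch of one way to guard against that follows. It assumes, based only on the `ACCELERATOR: is  None` log line, that the training script reads the value from an `ACCELERATOR` environment variable; the helper name `resolve_accelerator` is hypothetical and is not taken from this PR's diff.

```python
import os
from typing import Mapping, Optional


def resolve_accelerator(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the accelerator name to pass to the trainer.

    Hypothetical helper: falls back to "auto" when the ACCELERATOR
    variable is unset, empty, or the literal string "None", so that
    None is never handed to the trainer.
    """
    env = os.environ if env is None else env
    value = (env.get("ACCELERATOR") or "").strip()
    if not value or value.lower() == "none":
        return "auto"
    # Any other value is passed through unchanged; the trainer
    # itself decides whether the name is valid.
    return value


if __name__ == "__main__":
    # Mirrors the log line printed by the job in this PR.
    print("ACCELERATOR: is ", resolve_accelerator())
```

The result would then be passed as `Trainer(accelerator=resolve_accelerator(), ...)`. Defaulting to `auto` lets Lightning pick the device, which matches the passing run below where `ACCELERATOR: is  auto` and no GPU is available.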

With the accelerator parameter provided, the tests pass:

tests/upgrade/01_raycluster_sdk_upgrade_test.py::TestMnistJobSubmit::test_mnist_job_submission Using kube-authkit auto-detection for authentication
Setting up authentication with kube-authkit auto-detection...
2026-02-10 09:09:10,920 Using authentication strategy: Kubernetes KubeConfig (/tmp/kubeconfig-1)
2026-02-10 09:09:10,921 Authenticating using kubeconfig: /tmp/kubeconfig-1
2026-02-10 09:09:10,924 Successfully authenticated using kubeconfig
✅ Successfully set up authentication with kube-authkit
Using kube-authkit auto-detection for authentication
Using kube-authkit auto-detection for authentication
Setting up authentication with kube-authkit auto-detection...
2026-02-10 09:09:10,925 Using authentication strategy: Kubernetes KubeConfig (/tmp/kubeconfig-1)
2026-02-10 09:09:10,925 Authenticating using kubeconfig: /tmp/kubeconfig-1
2026-02-10 09:09:10,928 Successfully authenticated using kubeconfig
✅ Successfully set up authentication with kube-authkit
Yaml resources loaded for mnist
No BYOIDC OIDC providers found in cluster Authentication resource
No BYOIDC OIDC providers found in cluster Authentication resource
Using legacy authentication for Ray Dashboard job submission...
2026-02-10 09:09:18,760	INFO dashboard_sdk.py:355 -- Uploading package gcs://_ray_pkg_be21486f12aefa83.zip.
2026-02-10 09:09:18,763	INFO packaging.py:588 -- Creating a file package for local module './tests/e2e/'.
Submitted job with ID: raysubmit_FjnqSyCQ43zMrMxM
PENDING
RUNNING
RUNNING
2026-02-10 09:09:21,594	INFO job_manager.py:568 -- Runtime env is setting up.
Running entrypoint for job raysubmit_FjnqSyCQ43zMrMxM: python mnist.py
prior to running the trainer
MASTER_ADDR: is  None
MASTER_PORT: is  None
ACCELERATOR: is  auto
STORAGE_BUCKET_EXISTS:  False



GROUP:  1
LOCAL:  1
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

Downloading MNIST dataset...
Using default MNIST mirror reference to download datasets...

 0%|          | 0.00/9.91M [00:00<?, ?B/s]
 1%|          | 98.3k/9.91M [00:00<00:11, 854kB/s]
 4%|▎         | 360k/9.91M [00:00<00:05, 1.68MB/s]
15%|█▍        | 1.44M/9.91M [00:00<00:01, 5.51MB/s]
59%|█████▉    | 5.83M/9.91M [00:00<00:00, 18.7MB/s]
100%|██████████| 9.91M/9.91M [00:00<00:00, 18.8MB/s]

 0%|          | 0.00/28.9k [00:00<?, ?B/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 497kB/s]

 0%|          | 0.00/1.65M [00:00<?, ?B/s]
 6%|▌         | 98.3k/1.65M [00:00<00:01, 842kB/s]
14%|█▍        | 229k/1.65M [00:00<00:01, 1.01MB/s]
44%|████▎     | 721k/1.65M [00:00<00:00, 2.47MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 4.03MB/s]

 0%|          | 0.00/4.54k [00:00<?, ?B/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 15.3MB/s]
┏━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃   ┃ Name          ┃ Type               ┃ Params ┃ Mode  ┃ FLOPs ┃
┡━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ 0 │ model         │ Sequential         │ 55.1 K │ train │     0 │
│ 1 │ val_accuracy  │ MulticlassAccuracy │      0 │ train │     0 │
│ 2 │ test_accuracy │ MulticlassAccuracy │      0 │ train │     0 │
└───┴───────────────┴────────────────────┴────────┴───────┴───────┘
Trainable params: 55.1 K                                                        
Non-trainable params: 0                                                         
Total params: 55.1 K                                                            
Total estimated model params size (MB): 0                                       
Modules in train mode: 11                                                       
Modules in eval mode: 0                                                         
Total FLOPs: 0                                                                  

Sanity Checking: |          | 0/? [00:00<?, ?it/s]
/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:434: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.

Sanity Checking: |          | 0/? [00:00<?, ?it/s]
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 68.34it/s]
/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:433: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.

                                                                          
/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:434: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/tmp/ray/session_2026-02-10_08-49-19_805412_1/runtime_resources/pip/890d6800adb59f0d7b5775a3f71b22405ddd42a1/virtualenv/lib64/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:317: The number of training batches (16) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.

Training: |          | 0/? [00:00<?, ?it/s]
Training: |          | 0/? [00:00<?, ?it/s]
Epoch 0:   0%|          | 0/16 [00:00<?, ?it/s]
Epoch 0: 100%|██████████| 16/16 [00:00<00:00, 67.05it/s]
Epoch 0: 100%|██████████| 16/16 [00:00<00:00, 67.00it/s, v_num=0]

Validation: |          | 0/? [00:00<?, ?it/s]
Validation: |          | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/79 [00:00<?, ?it/s]
Validation DataLoader 0:  25%|██▌       | 20/79 [00:00<00:00, 78.90it/s]
Validation DataLoader 0:  51%|█████     | 40/79 [00:00<00:00, 76.37it/s]
Validation DataLoader 0:  76%|███████▌  | 60/79 [00:00<00:00, 75.45it/s]
Validation DataLoader 0: 100%|██████████| 79/79 [00:01<00:00, 73.54it/s]
Epoch 0: 100%|██████████| 16/16 [00:01<00:00, 12.02it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 0: 100%|██████████| 16/16 [00:01<00:00, 12.01it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 0:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.180, val_acc=0.347]         
Epoch 1:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 1: 100%|██████████| 16/16 [00:00<00:00, 67.51it/s, v_num=0, val_loss=2.180, val_acc=0.347]
Epoch 1: 100%|██████████| 16/16 [00:00<00:00, 67.44it/s, v_num=0, val_loss=2.180, val_acc=0.347]

Validation: |          | 0/? [00:00<?, ?it/s]
Validation: |          | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/79 [00:00<?, ?it/s]
Validation DataLoader 0:  25%|██▌       | 20/79 [00:00<00:00, 68.90it/s]
Validation DataLoader 0:  51%|█████     | 40/79 [00:00<00:00, 70.67it/s]
Validation DataLoader 0:  76%|███████▌  | 60/79 [00:00<00:00, 70.99it/s]
Validation DataLoader 0: 100%|██████████| 79/79 [00:01<00:00, 71.94it/s]
Epoch 1: 100%|██████████| 16/16 [00:01<00:00, 11.84it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 1: 100%|██████████| 16/16 [00:01<00:00, 11.84it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 1:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.000, val_acc=0.423]         
Epoch 2:   0%|          | 0/16 [00:00<?, ?it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 2: 100%|██████████| 16/16 [00:00<00:00, 63.55it/s, v_num=0, val_loss=2.000, val_acc=0.423]
Epoch 2: 100%|██████████| 16/16 [00:00<00:00, 63.49it/s, v_num=0, val_loss=2.000, val_acc=0.423]

Validation: |          | 0/? [00:00<?, ?it/s]
Validation: |          | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/79 [00:00<?, ?it/s]
Validation DataLoader 0:  25%|██▌       | 20/79 [00:00<00:00, 75.61it/s]
Validation DataLoader 0:  51%|█████     | 40/79 [00:00<00:00, 73.06it/s]
Validation DataLoader 0:  76%|███████▌  | 60/79 [00:00<00:00, 73.20it/s]
Validation DataLoader 0: 100%|██████████| 79/79 [00:01<00:00, 73.65it/s]
Epoch 2: 100%|██████████| 16/16 [00:01<00:00, 11.92it/s, v_num=0, val_loss=1.790, val_acc=0.536]
Epoch 2: 100%|██████████| 16/16 [00:01<00:00, 11.91it/s, v_num=0, val_loss=1.790, val_acc=0.536]
`Trainer.fit` stopped: `max_epochs=3` reached.

Epoch 2: 100%|██████████| 16/16 [00:01<00:00, 11.88it/s, v_num=0, val_loss=1.790, val_acc=0.536]

Job has completed: 'SUCCEEDED'
PASSED

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@chipspeak chipspeak left a comment

Contributor

LGTM Pawel, cheers for this!

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2026
@openshift-ci

openshift-ci Bot commented Feb 10, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chipspeak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 10, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit e3aecbb into project-codeflare:main Feb 10, 2026
26 of 34 checks passed

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference
  • lgtm: Indicates that a PR is ready to be merged.
  • test-additional-notebooks
  • test-guided-notebooks: Run PR check to verify Guided notebooks
  • test-ui-notebooks: Run PR check to verify UI notebooks

3 participants