
[8.0] fix: Singularity issue with non existing SE + JobAgent issue with exception raised during submission #8118

Merged
chrisburr merged 1 commit into DIRACGrid:rel-v8r0 from aldbr:v8.0_FIX_singularity-ce-storage on Apr 7, 2025

Conversation

@aldbr
Contributor

@aldbr aldbr commented Apr 4, 2025

I found these two related issues while experimenting with the Site configuration:

  • Initial issue:
2025-04-03T15:36:40,991592Z WorkloadManagement/JobAgent/Singularity INFO: Creating singularity container
2025-04-03T15:36:40,992786Z WorkloadManagement/JobAgent/WorkloadManagement/JobAgent INFO: Found Job LogLevel JDL parameter with value DEBUG

# The trigger for the first problem: an SE section is not found
2025-04-03T15:36:41,618254Z WorkloadManagement/JobAgent ERROR: StorageFactory._getConfigStorageName: Failed to get storage options Path /Resources/StorageElements/<NOT_EXISTING_SE> does not exist or it's not a section
  • 1st problem: self.storages is not initialized because the StorageElement object is malformed (SE section not found)
2025-04-03T15:36:41,619437Z WorkloadManagement/JobAgent/WorkloadManagement/JobAgent ERROR: Exception occurred when submitting JobID: 2140558
Traceback (most recent call last):
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/Agent/JobAgent.py", line 637, in _submitJob
    result = self.computingElement.submitJob(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/Resources/Computing/SingularityComputingElement.py", line 384, in submitJob
    mountedPath = StorageElement(seName).getStorageParameters(protocol="file")["Value"]["Path"]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/Resources/Storage/StorageElement.py", line 705, in getStorageParameters
    for storage in self.storages.values():
                   ^^^^^^^^^^^^^
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/Resources/Storage/StorageElement.py", line 1368, in __getattr__
    raise AttributeError(f"StorageElement does not have a method '{name}'")
AttributeError: StorageElement does not have a method 'storages'
2025-04-03T15:36:41,620682Z WorkloadManagement/JobAgent/WorkloadManagement/JobAgent ERROR: Job submission failed 2140558
  • 2nd problem: when rescheduling the job, we pass on the exception object itself instead of its message
2025-04-03T15:36:41,620789Z WorkloadManagement/JobAgent/WorkloadManagement/JobAgent ERROR: Error in DIRAC JobWrapper or inner CE execution: StorageElement does not have a method 'storages'
2025-04-03T15:36:41,620842Z WorkloadManagement/JobAgent/WorkloadManagement/JobAgent WARN: Failure ==> rescheduling (during StorageElement does not have a method 'storages')
2025-04-03T15:36:41,620910Z WorkloadManagement/JobAgent/WorkloadManagement/JobAgent ERROR: Agent exception while calling method <bound method JobAgent.execute of <DIRAC.WorkloadManagementSystem.Agent.JobAgent.JobAgent object at 0x40000e892f10>>
Traceback (most recent call last):
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/Core/Base/AgentModule.py", line 314, in am_secureCall
    result = functor(*args)
             ^^^^^^^^^^^^^^
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/Agent/JobAgent.py", line 311, in execute
    result = self._checkSubmittedJobs()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/Agent/JobAgent.py", line 700, in _checkSubmittedJobs
    self._rescheduleFailedJob(jobID, result["Message"])
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/Agent/JobAgent.py", line 828, in _rescheduleFailedJob
    self.jobs[jobID]["JobReport"].setJobStatus(
  File "/cvmfs/lhcbdev.cern.ch/lhcbdirac/versions/v12.0.0a11-1743438349/Linux-aarch64/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/Client/JobReport.py", line 38, in
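
The first traceback can be illustrated with a minimal, self-contained sketch. It does not use DIRAC: `FakeStorageElement`, `known_sections` and `try_get_parameters` are hypothetical names, and the early return in `__init__` is only an assumption about how a malformed instance ends up without a `storages` attribute. It shows why a catch-all `__getattr__` turns the missing attribute into the misleading "does not have a method" message.

```python
class FakeStorageElement:
    """Hypothetical stand-in for illustration; not the DIRAC StorageElement."""

    def __init__(self, name, known_sections):
        self.name = name
        if name not in known_sections:
            # Assumed behaviour: the configuration lookup failed, so the
            # constructor returns before `storages` is ever assigned.
            self.valid = False
            return
        self.valid = True
        self.storages = {"file": {"Path": known_sections[name]}}

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails,
        # so a missing data attribute is reported as a missing method.
        raise AttributeError(f"StorageElement does not have a method '{attr}'")

    def get_storage_parameters(self):
        # Fails on a malformed instance because `storages` was never set.
        return list(self.storages.values())


def try_get_parameters(name, known_sections):
    """Return the storage parameters, or the error text if the lookup fails."""
    se = FakeStorageElement(name, known_sections)
    try:
        return se.get_storage_parameters()
    except AttributeError as exc:
        return str(exc)
```

Calling `try_get_parameters("MISSING", {})` takes the failing path, while a name present in `known_sections` returns its parameters normally.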

BEGINRELEASENOTES
*Resources
FIX: do not try to use a malformed StorageElement instance in SingularityCE
*WorkloadManagement
FIX: report the message of the Exception instead of the Exception itself in JobAgent.submitJob
ENDRELEASENOTES
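
As a rough sketch of the two fixes described in the release notes (not the actual patch): check that the storage element is usable before reading its parameters, and put the text of an exception, not the exception object, into the error result. The helpers `s_ok`, `s_error`, `get_mounted_path` and `submit` below are hypothetical and only mimic the `OK`/`Value`/`Message` result dictionaries seen in the traceback.

```python
def s_ok(value=None):
    """Hypothetical success result, shaped like the dicts in the traceback."""
    return {"OK": True, "Value": value}


def s_error(message):
    """Hypothetical error result; `message` is expected to be a string."""
    return {"OK": False, "Message": message}


def get_mounted_path(se_name, sections):
    """Return an error result instead of raising when the SE is not configured."""
    if se_name not in sections:
        return s_error(f"Storage element {se_name} is not configured")
    return s_ok(sections[se_name]["Path"])


def submit(job, se_name, sections):
    """Run `job` with the mounted path, reporting any failure as a string."""
    try:
        result = get_mounted_path(se_name, sections)
        if not result["OK"]:
            return result
        return s_ok(job(result["Value"]))
    except Exception as exc:
        # Pass str(exc), not exc, so later code receives a plain message.
        return s_error(str(exc))
```

With this shape, a caller that later reads `result["Message"]` always gets a string, whether the failure came from a missing section or from an exception raised during submission.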

@DIRACGridBot DIRACGridBot added the alsoTargeting:integration label (Cherry pick this PR to integration after merge) Apr 4, 2025
@chrisburr chrisburr merged commit 3255c8e into DIRACGrid:rel-v8r0 Apr 7, 2025
26 checks passed
@DIRACGridBot DIRACGridBot added the sweep:done label (All sweeping actions have been done for this PR) Apr 7, 2025
DIRACGridBot pushed a commit to DIRACGridBot/DIRAC that referenced this pull request Apr 7, 2025
…obAgent issue with exception raised during submission
@DIRACGridBot

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/14305428060

Successful:

  • integration

maxnoe pushed a commit to maxnoe/DIRAC that referenced this pull request Feb 5, 2026
…obAgent issue with exception raised during submission