[Bug]: Runpod, Kubernetes: it's possible to delete a volume in use #3789

@un-def

Description

Steps to reproduce

# volume.dstack.yml
type: volume
name: volume-runpod
backend: runpod
region: eu-nl-1
size: 10GB

# run.dstack.yml
name: dev-environment
ide: vscode
volumes:
  - volume-runpod:/volume

1. dstack apply -f volume.dstack.yml
2. dstack apply -f run.dstack.yml
3. dstack volume delete volume-runpod

Actual behaviour

If run status = submitted:

Error (Volume error)
Volume ['volume-runpod'] is marked for deletion and cannot be attached

If run status = provisioning:

Server processing gets stuck in a loop (see the server logs below), and the run stays in the provisioning state indefinitely.
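
In both cases the deletion request is accepted even though a run references the volume. One possible guard, sketched under a simplified in-memory model, is to reject the request while the volume still has users. The names `Volume`, `attached_job_ids`, `VolumeInUseError`, and `mark_for_deletion` are hypothetical and are not dstack's actual API. The report also does not establish whether a provisioning job already counts as an attachment, so that is assumed here.

```python
from dataclasses import dataclass, field
from typing import Set


class VolumeInUseError(Exception):
    """Hypothetical error raised when a volume still has users."""


@dataclass
class Volume:
    name: str
    # Hypothetical field: IDs of jobs that use (or are about to use) the volume.
    attached_job_ids: Set[str] = field(default_factory=set)
    to_be_deleted: bool = False


def mark_for_deletion(volume: Volume) -> None:
    """Refuse to mark a volume for deletion while any job references it."""
    if volume.attached_job_ids:
        jobs = ", ".join(sorted(volume.attached_job_ids))
        raise VolumeInUseError(f"Volume {volume.name} is in use by: {jobs}")
    volume.to_be_deleted = True
```

With a check of this kind, step 3 of the reproduction would fail fast with a clear error instead of leaving the run in an inconsistent state.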

Expected behaviour

No response

dstack version

770eaf8

Server logs

If run status = submitted:

INFO     dstack._internal.server.services.volumes:334 Deleting volumes: ['volume-runpod']
INFO     dstack._internal.server.services.events:205 Emitting event: Volume marked for deletion. Event targets:
         volume(88fa96)volume-runpod. Actor: user(efa6c3)dmitry-local-admin
DEBUG    dstack._internal.server.background.pipeline_tasks.base:357 Processing jobs item fcdb64de-c679-43ec-8057-a084f06510d8
DEBUG    dstack._internal.server.background.pipeline_tasks.jobs_submitted:316 job(fcdb64)dev-environment-0-0: provisioning has
         started
WARNING  dstack._internal.server.background.pipeline_tasks.jobs_submitted:809 job(fcdb64)dev-environment-0-0: failed to prepare run
         volumes: ServerClientError("Volume ['volume-runpod'] is marked for deletion and cannot be attached")
INFO     dstack._internal.server.services.events:205 Emitting event: Job status changed SUBMITTED -> TERMINATING. Termination
         reason: VOLUME_ERROR (Volume ['volume-runpod'] is marked for deletion and cannot be attached). Event targets:
         job(fcdb64)dev-environment-0-0. Actor: system
DEBUG    dstack._internal.server.background.pipeline_tasks.base:364 Processed jobs item fcdb64de-c679-43ec-8057-a084f06510d8 in
         0.029
DEBUG    dstack._internal.server.background.pipeline_tasks.base:357 Processing runs item 18611f36-b953-428b-9465-8fba70d1cebd
INFO     dstack._internal.server.services.events:205 Emitting event: Run status changed SUBMITTED -> TERMINATING. Termination
         reason: JOB_FAILED. Event targets: run(18611f)dev-environment. Actor: system

If run status = provisioning:

INFO     dstack._internal.server.services.volumes:334 Deleting volumes: ['volume-runpod']
INFO     dstack._internal.server.services.events:205 Emitting event: Volume marked for deletion. Event targets:
         volume(0f2037)volume-runpod. Actor: user(efa6c3)dmitry-local-admin
DEBUG    dstack._internal.server.background.pipeline_tasks.base:357 Processing runs item 08b2d619-6350-46fc-9e5e-58f109940397
DEBUG    dstack._internal.server.background.pipeline_tasks.base:364 Processed runs item 08b2d619-6350-46fc-9e5e-58f109940397 in
DEBUG    dstack._internal.server.background.pipeline_tasks.base:357 Processing volumes item 0f2037a5-5e28-4fd4-9a4a-b49019b588b6
ERROR    dstack._internal.server.background.pipeline_tasks.volumes:408 Got exception when deleting volume volume-runpod. Please
         terminate it manually to avoid unexpected charges.
         Traceback (most recent call last):
           File "/home/def/dev/dstack/src/dstack/_internal/server/background/pipeline_tasks/volumes.py", line 402, in
         _process_to_be_deleted_volume
             await run_async(
             ...<2 lines>...
             )
           File "/home/def/dev/dstack/src/dstack/_internal/utils/common.py", line 50, in run_async
             return await asyncio.get_running_loop().run_in_executor(None, func_with_args)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/def/.local/share/uv/python/cpython-3.13.9-linux-x86_64-gnu/lib/python3.13/concurrent/futures/thread.py",
         line 59, in run
             result = self.fn(*self.args, **self.kwargs)
           File "/home/def/dev/dstack/src/dstack/_internal/core/backends/runpod/compute.py", line 435, in delete_volume
             self.api_client.delete_network_volume(volume_id=volume.volume_id)
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/def/dev/dstack/src/dstack/_internal/core/backends/runpod/api_client.py", line 281, in delete_network_volume
             self._make_request(
             ~~~~~~~~~~~~~~~~~~^
                 {
                 ^
             ...<9 lines>...
                 }
                 ^
             )
             ^
           File "/home/def/dev/dstack/src/dstack/_internal/core/backends/runpod/api_client.py", line 366, in _make_request
             raise RunpodApiClientError(errors=response_json["errors"])
         dstack._internal.core.backends.runpod.api_client.RunpodApiClientError: [{'message': 'You must remove this network volume
         from all pods before deleting it.', 'path': ['deleteNetworkVolume'], 'extensions': {'code': 'RUNPOD'}}]
INFO     dstack._internal.server.services.events:205 Emitting event: Volume deleted. Event targets: volume(0f2037)volume-runpod.
         Actor: system
DEBUG    dstack._internal.server.background.pipeline_tasks.base:364 Processed volumes item 0f2037a5-5e28-4fd4-9a4a-b49019b588b6 in
         0.397
DEBUG    dstack._internal.server.background.pipeline_tasks.base:357 Processing jobs item 1c050618-df47-4088-853e-bcfd62c78a42
ERROR    dstack._internal.server.background.pipeline_tasks.base:361 Unexpected exception when processing item
         Traceback (most recent call last):
           File "/home/def/dev/dstack/src/dstack/_internal/server/background/pipeline_tasks/base.py", line 359, in start
             await self.process(item)
           File "/home/def/dev/dstack/src/dstack/_internal/server/utils/sentry_utils.py", line 28, in wrapper
             return await f(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/def/dev/dstack/src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py", line 301, in process
             result = await _process_running_job(context=context)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/def/dev/dstack/src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py", line 424, in
         _process_running_job
             startup_context = await _prepare_startup_context(context=context, result=result)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/def/dev/dstack/src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py", line 477, in
         _prepare_startup_context
             volumes = await get_job_attached_volumes(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
             ...<5 lines>...
             )
             ^
           File "/home/def/dev/dstack/src/dstack/_internal/server/services/jobs/__init__.py", line 475, in get_job_attached_volumes
             job_configured_volumes = await get_job_configured_volumes(
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
             ...<4 lines>...
             )
             ^
           File "/home/def/dev/dstack/src/dstack/_internal/server/services/jobs/__init__.py", line 387, in
         get_job_configured_volumes
             volume_models = await get_job_configured_volume_models(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
             ...<5 lines>...
             )
             ^
           File "/home/def/dev/dstack/src/dstack/_internal/server/services/jobs/__init__.py", line 432, in
         get_job_configured_volume_models
             raise ResourceNotExistsError(f"Volume {mount_point.name} not found")
         dstack._internal.core.errors.ResourceNotExistsError: Volume ['volume-runpod'] not found

The same ResourceNotExistsError is then raised again each time the job is processed.
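
The repetition is consistent with a processing loop in which an unexpected exception leaves the job's status unchanged, so the same item is picked up again on the next pass. The sketch below models that under simplifying assumptions. `Job`, `process_job`, and `process_once` are hypothetical names, `ResourceNotExistsError` is a local stand-in for the error in the logs, and the `VOLUME_ERROR` reason is borrowed from the submitted-state log above. It is not the actual dstack pipeline code.

```python
from dataclasses import dataclass
from typing import Optional, Set


class ResourceNotExistsError(Exception):
    """Local stand-in for the error seen in the server logs."""


@dataclass
class Job:
    name: str
    volume_name: str
    status: str = "provisioning"
    termination_reason: Optional[str] = None


def process_job(job: Job, existing_volumes: Set[str]) -> None:
    """Simplified processing step: look up the job's volume, then proceed."""
    if job.volume_name not in existing_volumes:
        raise ResourceNotExistsError(f"Volume {job.volume_name} not found")
    job.status = "running"


def process_once(job: Job, existing_volumes: Set[str], handle_missing: bool) -> None:
    """One loop iteration over a single job."""
    try:
        process_job(job, existing_volumes)
    except ResourceNotExistsError as e:
        if not handle_missing:
            # Models the reported behaviour: the error is only logged,
            # the status stays "provisioning", and the job is retried forever.
            return
        # Possible alternative: treat a missing volume as a terminal error.
        job.status = "terminating"
        job.termination_reason = f"VOLUME_ERROR ({e})"
```

Calling `process_once(job, set(), handle_missing=False)` any number of times leaves the job in `provisioning`, while a single call with `handle_missing=True` moves it to `terminating`.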

Additional information

No response
