Close qmemman client with correct handler #796

Merged: marmarek merged 1 commit into QubesOS:main from ben-grande:close-qmemman on Apr 23, 2026

Conversation

@ben-grande (Contributor) commented Apr 6, 2026

Comment thread on qubes/vm/dispvm.py (outdated):

    if qmemman_client:
        qmemman_client.close()
    if qmemman_task:
        qmemman_task.close()
@marmarek (Member):
You sure about this? This is just a task that would return the qmemman client when completed. Maybe qmemman_task.result().close() would work, but only if the task was completed already. If it was cancelled, I'm not sure how to get the client reference in this code shape...

@ben-grande (Contributor, Author) Apr 7, 2026:

The upstream docs use create_task() and then task.cancel(). I was reading the break_task...

About result(): I think it works for asyncio.CancelledError, but for other exceptions that didn't cancel the task, it will re-raise the exception.

I am going to test these possibilities to be sure, by delaying the qmemman response with a ten-second sleep and making a call to use the preloaded disposable, which would cancel the call. Then on another run, I can make the call fail by reporting that it did fail to free memory from the qube.
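For reference, a minimal standalone sketch of asyncio's Task.result() semantics (not the PR code): awaiting a cancelled task raises asyncio.CancelledError, and result() on it re-raises the same, while a task that failed with another exception re-raises that exception from result() instead.

import asyncio

async def main():
    task = asyncio.create_task(asyncio.sleep(10))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        print("awaiting a cancelled task raises CancelledError")
    try:
        task.result()
    except asyncio.CancelledError:
        print("result() on a cancelled task also raises CancelledError")

asyncio.run(main())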

@ben-grande (Contributor, Author):

> Maybe qmemman_task.result().close() would work, but only if the task was completed already. If it was cancelled, I'm not sure how to get the client reference in this code shape...

qmemman_task.cancel() is already being used if preload_request_event completes/is set, so I don't think I need to use .result().

About cancelling a task twice, as asyncio.CancelledError was already raised, yes, that seems weird.
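For what it's worth, cancelling a task that has already finished is harmless: a second cancel() is a no-op that returns False. A quick standalone sketch:

import asyncio

async def main():
    task = asyncio.create_task(asyncio.sleep(10))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    print(task.cancel())     # False: the task is already done
    print(task.cancelled())  # True

asyncio.run(main())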

@ben-grande (Contributor, Author):

Hm. Maybe I left the qmemman_client.close() inside asyncio.CancelledError because the cancelled task might be the break_task, so in every failed scenario I want the qmemman_task to be cancelled.

@ben-grande (Contributor, Author):

Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 623, in on_domain_pre_paused
    await earliest_task
  File "/usr/lib64/python3.13/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/qubes/events.py", line 243, in fire_event_async
    effect = task.result()
  File "/usr/lib/python3.13/site-packages/qubes/vm/mix/dvmtemplate.py", line 535, in on_domain_preload_dispvm_used
    await asyncio.gather(
    ...<4 lines>...
    )
  File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 777, in from_appvm
    dispvm = await cls.gen_disposable(appvm, preload=preload, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 822, in gen_disposable
    await dispvm.start()
  File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 985, in start
    await super().start(**kwargs)
  File "/usr/lib/python3.13/site-packages/qubes/vm/qubesvm.py", line 1553, in start
    await self.fire_event_async(
        "domain-start", start_guid=start_guid
    )
  File "/usr/lib/python3.13/site-packages/qubes/events.py", line 243, in fire_event_async
    effect = task.result()
  File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 586, in on_domain_started_dispvm
    await self.pause()
  File "/usr/lib/python3.13/site-packages/qubes/vm/qubesvm.py", line 1754, in pause
    await self.fire_event_async("domain-pre-paused", pre_event=True)
  File "/usr/lib/python3.13/site-packages/qubes/events.py", line 243, in fire_event_async
    effect = task.result()
  File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 631, in on_domain_pre_paused
    qmemman_task.close()
    ^^^^^^^^^^^^^^^^^^
AttributeError: '_asyncio.Future' object has no attribute 'close'

This happens inside the asyncio.CancelledError handler.

@marmarek (Member) Apr 7, 2026:

To be clear: I mean QMemmanClient.close() needs to be called at some point, not just the asyncio task (of calling QMemmanClient.set_mem()) cancelled.
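One shape that would guarantee this, sketched under the assumption that the caller instantiates the client itself (roughly the direction the PR ends up taking): create the QMemmanClient before spawning the task, so close() stays reachable on every exit path.

# Hedged sketch, not the actual dispvm.py code; runs inside an async event
# handler where self is the DispVM. The caller owns the client, so close()
# can run even when the set_mem task is cancelled.
qmemman_client = qubes.qmemman.client.QMemmanClient()
qmemman_task = asyncio.get_running_loop().run_in_executor(
    None, qmemman_client.set_mem, {self.xid: 0}
)
try:
    await qmemman_task
except asyncio.CancelledError:
    pass
finally:
    qmemman_client.close()  # always release the qmemman socket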

@ben-grande (Contributor, Author):

> You sure about this? This is just a task that would return the qmemman client when completed. Maybe qmemman_task.result().close() would work, but only if the task was completed already. If it was cancelled, I'm not sure how to get the client reference in this code shape...

Hm... qmemman_task.result().close() works inside asyncio.CancelledError when cancelling the break_task, and when qmemman_task was already cancelled (by completing the break_task). I will add some debugging steps to qmemman to see if it is fine, but I think it is.

@ben-grande (Contributor, Author):

I ended up using the recommended .result().close(). I tried to break it in a few ways: by not completing the task (raising another exception), and by cancelling an already-cancelled task, but it appears to work (it stops the memory request and releases the qmemman lock).
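A guarded variant of that call, for illustration (assuming qmemman_task is the future returned by run_in_executor):

if qmemman_task and qmemman_task.done() and not qmemman_task.cancelled():
    # result() is only safe once the task completed; it still re-raises
    # if set_mem itself failed with an exception.
    qmemman_task.result().close()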

@ben-grande (Contributor, Author) Apr 8, 2026:

> To be clear: I mean QMemmanClient.close() needs to be called at some point, not just the asyncio task (of calling QMemmanClient.set_mem()) cancelled.

The disconnection also happens when the length of the received data is 0, which is why it wasn't hanging in most cases, both before and with this PR.
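That matches standard stream-socket semantics: a recv() returning zero bytes means the peer closed the connection. A generic illustration (not the qmemman server code):

import socket

server, client = socket.socketpair()
client.close()
data = server.recv(1024)
if len(data) == 0:
    # a zero-length read means the peer disconnected; qmemman reacts to
    # this by logging "client disconnected, resuming membalance"
    print("client disconnected")
server.close()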

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.13%. Comparing base (2488f2b) to head (3930269).
⚠️ Report is 11 commits behind head on main.

Files with missing lines   Patch %   Lines
qubes/vm/dispvm.py         0.00%     16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #796      +/-   ##
==========================================
+ Coverage   70.12%   70.13%   +0.01%     
==========================================
  Files          61       61              
  Lines       14001    13997       -4     
==========================================
- Hits         9818     9817       -1     
+ Misses       4183     4180       -3     
Flag        Coverage Δ
unittests   70.13% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

qubesos-bot commented Apr 7, 2026

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2026042316-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2026032404-devel&flavor=update

  • system_tests_whonix

    • whonixcheck: wait_serial (wait serial expected)
      # wait_serial expected: qr/ki5k_-\d+-/...
  • system_tests_basic_vm_qrexec_gui

    • TC_20_NonAudio_debian-13-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_pvgrub_salt_storage

    • TC_41_HVMGrub_debian-13-xfce: test_001_standalone_vm_dracut (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_fedora-43-xfce: test_000_standalone_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_fedora-43-xfce: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

  • system_tests_usbproxy

    • TC_20_USBProxy_core3_debian-13-xfce: test_090_attach_stubdom (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
  • system_tests_basic_vm_qrexec_gui_zfs

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_audio@hw1

  • system_tests_qwt_win10@hw13

    • windows_clipboard_and_filecopy: unnamed test (unknown)
    • windows_clipboard_and_filecopy: Failed (test died)
      # Test died: no candidate needle with tag(s) 'personal-text-editor'...
  • system_tests_qwt_win11@hw13

    • windows_clipboard_and_filecopy: unnamed test (unknown)
    • windows_clipboard_and_filecopy: Failed (test died)
      # Test died: no candidate needle with tag(s) 'windows-Explorer-empt...
  • system_tests_guivm_gpu_gui_interactive@hw13

    • shutdown: unnamed test (unknown)
    • shutdown: Failed (test died)
      # Test died: no candidate needle with tag(s) 'text-logged-in-root' ...
  • system_tests_whonix@hw1

    • whonixcheck: fail (unknown)
      Whonixcheck for sys-whonix failed...

    • whonixcheck: Failed (test died)
      # Test died: systemcheck failed at qubesos/tests/whonixcheck.pm lin...

  • system_tests_basic_vm_qrexec_gui_btrfs

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_basic_vm_qrexec_gui_ext4

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_basic_vm_qrexec_gui_xfs

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_basic_vm_qrexec_gui@hw7

    • TC_20_NonAudio_debian-13-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

Failed tests

43 failures
  • system_tests_whonix

    • whonixcheck: fail (unknown)
      Whonixcheck for sys-whonix failed...

    • whonixcheck: Failed (test died)
      # Test died: systemcheck failed at qubesos/tests/whonixcheck.pm lin...

    • whonixcheck: wait_serial (wait serial expected)
      # wait_serial expected: qr/ki5k_-\d+-/...

  • system_tests_basic_vm_qrexec_gui

    • TC_20_NonAudio_debian-13-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_pvgrub_salt_storage

    • TC_41_HVMGrub_debian-13-xfce: test_001_standalone_vm_dracut (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_debian-13-xfce: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_debian-13-xfce: test_011_template_based_vm_dracut (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_fedora-43-xfce: test_000_standalone_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_fedora-43-xfce: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

  • system_tests_usbproxy

    • TC_20_USBProxy_core3_debian-13-xfce: test_090_attach_stubdom (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...
  • system_tests_basic_vm_qrexec_gui_zfs

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_audio@hw1

  • system_tests_qwt_win10@hw13

    • windows_clipboard_and_filecopy: unnamed test (unknown)
    • windows_clipboard_and_filecopy: Failed (test died)
      # Test died: no candidate needle with tag(s) 'personal-text-editor'...
  • system_tests_qwt_win10_seamless@hw13

    • windows_install: Failed (test died)
      # Test died: Install failed with code 1 at qubesos/tests/windows_in...
  • system_tests_qwt_win11@hw13

    • windows_clipboard_and_filecopy: unnamed test (unknown)
    • windows_clipboard_and_filecopy: Failed (test died)
      # Test died: no candidate needle with tag(s) 'windows-Explorer-empt...
  • system_tests_guivm_gpu_gui_interactive@hw13

    • shutdown: unnamed test (unknown)
    • shutdown: Failed (test died)
      # Test died: no candidate needle with tag(s) 'text-logged-in-root' ...
  • system_tests_whonix@hw1

    • whonixcheck: fail (unknown)
      Whonixcheck for sys-whonix failed...

    • whonixcheck: Failed (test died)
      # Test died: systemcheck failed at qubesos/tests/whonixcheck.pm lin...

  • system_tests_basic_vm_qrexec_gui_btrfs

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_basic_vm_qrexec_gui_ext4

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_basic_vm_qrexec_gui_xfs

    • TC_20_NonAudio_debian-13-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18-pool: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

  • system_tests_basic_vm_qrexec_gui@hw7

    • TC_20_NonAudio_debian-13-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_fedora-43-xfce: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-gateway-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

    • TC_20_NonAudio_whonix-workstation-18: test_500_gui_agent_env_sync (failure)
      AssertionError: unexpected QUBES_ENV_TEST value from session, got '...

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/170766#dependencies

32 fixed
  • system_tests_network

    • system_tests: Fail (unknown)
      Tests qubes.tests.integ.network failed (exit code 1), details repor...

    • system_tests: Failed (test died)
      # Test died: Some tests failed at qubesos/tests/system_tests.pm lin...

    • VmNetworking_debian-13-xfce: test_203_fake_ip_inter_vm_allow (failure)
      ^... AssertionError: 1 != 0

  • system_tests_pvgrub_salt_storage

    • TC_41_HVMGrub_fedora-42-xfce: test_000_standalone_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_fedora-42-xfce: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

  • system_tests_extra

    • system_tests: Fail (unknown)
      Tests qubes.tests.extra failed (exit code 1), details reported sepa...

    • system_tests: Failed (test died)
      # Test died: Some tests failed at qubesos/tests/system_tests.pm lin...

    • TC_01_InputProxyExclude_debian-13-xfce: test_000_qemu_tablet (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

  • system_tests_gui_interactive

    • collect_logs: wait_serial (wait serial expected)
      # wait_serial expected: qr/Dhelp-\d+-/...

    • collect_logs: Failed (test died + timed out)
      # Test died: command 'curl --form upload=@journalctl.log --form upn...

  • system_tests_network_ipv6

    • system_tests: Fail (unknown)
      Tests qubes.tests.integ.network_ipv6 failed (exit code 1), details ...

    • system_tests: Failed (test died)
      # Test died: Some tests failed at qubesos/tests/system_tests.pm lin...

    • VmIPv6Networking_fedora-42-xfce: test_113_reattach_after_provider_kill (failure)
      ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^... AssertionError: 1 != 0

  • system_tests_network_updates

    • system_tests: Fail (unknown)
      Tests qubes.tests.integ.dom0_update failed (exit code 1), details r...

    • system_tests: Failed (test died)
      # Test died: Some tests failed at qubesos/tests/system_tests.pm lin...

    • TC_00_Dom0Upgrade_whonix-gateway-18: test_020_install_wrong_sign (error)
      subprocess.CalledProcessError: Command 'timeout=120; while ! tor-ci...

    • TC_11_QvmTemplateMgmtVM_debian-13-xfce: test_000_template_list (failure)
      qvm-template: error: No matching templates to list

    • TC_11_QvmTemplateMgmtVM_debian-13-xfce: test_010_template_install (failure)
      qvm-template: error: Template 'debian-12-minimal' not found.

    • TC_11_QvmTemplateMgmtVM_fedora-42-xfce: test_000_template_list (failure)
      qvm-template: error: No matching templates to list

    • TC_11_QvmTemplateMgmtVM_fedora-42-xfce: test_010_template_install (failure)
      qvm-template: error: Template 'debian-12-minimal' not found.

    • TC_11_QvmTemplateMgmtVM_whonix-gateway-18: test_000_template_list (failure)
      qvm-template: error: No matching templates to list

    • TC_11_QvmTemplateMgmtVM_whonix-gateway-18: test_010_template_install (failure)
      qvm-template: error: Template 'debian-12-minimal' not found.

  • system_tests_kde_gui_interactive

    • collect_logs: wait_serial (wait serial expected)
      # wait_serial expected: qr/bMse8-\d+-/...

    • collect_logs: Failed (test died + timed out)
      # Test died: command 'curl --form upload=@journalctl.log --form upn...

  • system_tests_guivm_vnc_gui_interactive

    • collect_logs: wait_serial (wait serial expected)
      # wait_serial expected: qr/C_fDy-\d+-/...

    • collect_logs: Failed (test died + timed out)
      # Test died: command 'curl --form upload=@journalctl.log --form upn...

  • system_tests_audio

    • system_tests: Fail (unknown)
      Tests qubes.tests.integ.audio failed (exit code 1), details reporte...

    • system_tests: Failed (test died)
      # Test died: Some tests failed at qubesos/tests/system_tests.pm lin...

    • TC_20_AudioVM_Pulse_whonix-workstation-18: test_225_audio_rec_unmuted_hvm (failure)
      AssertionError: too short audio, expected 10s, got 7.59433106575963...

    • TC_20_AudioVM_PipeWire_debian-13-xfce: test_251_audio_playback_audiovm_pipewire_late_start (failure)
      AssertionError: pacat for test-inst-vm1 (xid 48) running(False) in ...

  • system_tests_qwt_win10@hw13

    • windows_install: Failed (test died)
      # Test died: Install failed with code 1 at qubesos/tests/windows_in...
  • system_tests_qwt_win11@hw13

    • windows_install: Failed (test died)
      # Test died: Install failed with code 1 at qubesos/tests/windows_in...

Unstable tests

  • system_tests_guivm_vnc_gui_interactive

    collect_logs/Failed (1/5 times with errors)
    • job 172242 # Test died: command 'curl --form upload=@journalctl.log --form upn...
    collect_logs/wait_serial (1/5 times with errors)
    • job 172242 # wait_serial expected: qr/73DgV-\d+-/...
    collect_logs/wait_serial (1/5 times with errors)
    • job 172242 # wait_serial expected: qr/uOleL-\d+-/...
  • system_tests_guivm_gui_interactive

    collect_logs/Failed (1/5 times with errors)
    • job 172256 # Test died: command 'curl --form upload=@journalctl.log --form upn...
    collect_logs/wait_serial (1/5 times with errors)
    • job 172256 # wait_serial expected: qr/v3Six-\d+-/...
    collect_logs/wait_serial (1/5 times with errors)
    • job 172256 # wait_serial expected: qr/6OJPL-\d+-/...

Performance Tests

Performance degradation:

11 performance degradations
  • debian-13-xfce_exec: 8.47 🔻 ( previous job: 7.30, degradation: 116.01%)
  • debian-13-xfce_socket: 8.86 🔻 ( previous job: 8.02, degradation: 110.39%)
  • whonix-workstation-18_exec-data-duplex: 66.09 🔻 ( previous job: 59.45, degradation: 111.16%)
  • whonix-workstation-18_socket-data-duplex: 98.97 🔻 ( previous job: 80.77, degradation: 122.53%)
  • dom0_root_seq1m_q8t1_read 3:read_bandwidth_kb: 199809.00 🔻 ( previous job: 485002.00, degradation: 41.20%)
  • dom0_root_seq1m_q8t1_write 3:write_bandwidth_kb: 142294.00 🔻 ( previous job: 217546.00, degradation: 65.41%)
  • dom0_root_seq1m_q1t1_read 3:read_bandwidth_kb: 60295.00 🔻 ( previous job: 70705.00, degradation: 85.28%)
  • dom0_root_seq1m_q1t1_write 3:write_bandwidth_kb: 21157.00 🔻 ( previous job: 42537.00, degradation: 49.74%)
  • dom0_root_rnd4k_q32t1_write 3:write_bandwidth_kb: 804.00 🔻 ( previous job: 3011.00, degradation: 26.70%)
  • dom0_varlibqubes_seq1m_q8t1_read 3:read_bandwidth_kb: 47892.00 🔻 ( previous job: 233483.00, degradation: 20.51%)
  • dom0_varlibqubes_rnd4k_q32t1_write 3:write_bandwidth_kb: 7065.00 🔻 ( previous job: 8434.00, degradation: 83.77%)

Remaining performance tests:

100 tests
  • debian-13-xfce_exec-root: 26.82 🔻 ( previous job: 26.58, degradation: 100.92%)
  • debian-13-xfce_socket-root: 8.38 🟢 ( previous job: 8.38, improvement: 99.99%)
  • debian-13-xfce_exec-data-simplex: 64.97 🟢 ( previous job: 66.06, improvement: 98.35%)
  • debian-13-xfce_exec-data-duplex: 58.47 🟢 ( previous job: 61.22, improvement: 95.51%)
  • debian-13-xfce_exec-data-duplex-root: 73.46 🔻 ( previous job: 72.95, degradation: 100.69%)
  • debian-13-xfce_socket-data-duplex: 94.35 🔻 ( previous job: 86.03, degradation: 109.67%)
  • fedora-43-xfce_exec: 9.21
  • fedora-43-xfce_exec-root: 71.47
  • fedora-43-xfce_socket: 7.60
  • fedora-43-xfce_socket-root: 7.36
  • fedora-43-xfce_exec-data-simplex: 58.04
  • fedora-43-xfce_exec-data-duplex: 55.71
  • fedora-43-xfce_exec-data-duplex-root: 97.24
  • fedora-43-xfce_socket-data-duplex: 76.22
  • whonix-gateway-18_exec: 7.19 🟢 ( previous job: 7.69, improvement: 93.48%)
  • whonix-gateway-18_exec-root: 126.65 🟢 ( previous job: 132.16, improvement: 95.83%)
  • whonix-gateway-18_socket: 7.52 🟢 ( previous job: 8.05, improvement: 93.41%)
  • whonix-gateway-18_socket-root: 7.73 🔻 ( previous job: 7.16, degradation: 108.01%)
  • whonix-gateway-18_exec-data-simplex: 65.27 🔻 ( previous job: 64.40, degradation: 101.35%)
  • whonix-gateway-18_exec-data-duplex: 66.48 🔻 ( previous job: 63.65, degradation: 104.45%)
  • whonix-gateway-18_exec-data-duplex-root: 121.23 🟢 ( previous job: 123.30, improvement: 98.32%)
  • whonix-gateway-18_socket-data-duplex: 96.67 🟢 ( previous job: 113.72, improvement: 85.01%)
  • whonix-workstation-18_exec: 8.50 🔻 ( previous job: 8.20, degradation: 103.62%)
  • whonix-workstation-18_exec-root: 144.76 🔻 ( previous job: 138.84, degradation: 104.26%)
  • whonix-workstation-18_socket: 8.71 🔻 ( previous job: 8.19, degradation: 106.28%)
  • whonix-workstation-18_socket-root: 8.45 🟢 ( previous job: 8.92, improvement: 94.68%)
  • whonix-workstation-18_exec-data-simplex: 69.29 🔻 ( previous job: 63.54, degradation: 109.05%)
  • whonix-workstation-18_exec-data-duplex-root: 129.93 🟢 ( previous job: 139.63, improvement: 93.06%)
  • dom0_root_rnd4k_q32t1_read 3:read_bandwidth_kb: 13813.00 🟢 ( previous job: 12342.00, improvement: 111.92%)
  • dom0_root_rnd4k_q1t1_read 3:read_bandwidth_kb: 11996.00 🟢 ( previous job: 1182.00, improvement: 1014.89%)
  • dom0_root_rnd4k_q1t1_write 3:write_bandwidth_kb: 1168.00 🟢 ( previous job: 793.00, improvement: 147.29%)
  • dom0_varlibqubes_seq1m_q8t1_write 3:write_bandwidth_kb: 68674.00 🟢 ( previous job: 34913.00, improvement: 196.70%)
  • dom0_varlibqubes_seq1m_q1t1_read 3:read_bandwidth_kb: 432224.00 🟢 ( previous job: 370521.00, improvement: 116.65%)
  • dom0_varlibqubes_seq1m_q1t1_write 3:write_bandwidth_kb: 177446.00 🟢 ( previous job: 154458.00, improvement: 114.88%)
  • dom0_varlibqubes_rnd4k_q32t1_read 3:read_bandwidth_kb: 101747.00 🟢 ( previous job: 27602.00, improvement: 368.62%)
  • dom0_varlibqubes_rnd4k_q1t1_read 3:read_bandwidth_kb: 7921.00 🟢 ( previous job: 7112.00, improvement: 111.38%)
  • dom0_varlibqubes_rnd4k_q1t1_write 3:write_bandwidth_kb: 4294.00 🔻 ( previous job: 4565.00, degradation: 94.06%)
  • fedora-43-xfce_root_seq1m_q8t1_read 3:read_bandwidth_kb: 303583.00
  • fedora-43-xfce_root_seq1m_q8t1_write 3:write_bandwidth_kb: 204312.00
  • fedora-43-xfce_root_seq1m_q1t1_read 3:read_bandwidth_kb: 327270.00
  • fedora-43-xfce_root_seq1m_q1t1_write 3:write_bandwidth_kb: 63260.00
  • fedora-43-xfce_root_rnd4k_q32t1_read 3:read_bandwidth_kb: 88132.00
  • fedora-43-xfce_root_rnd4k_q32t1_write 3:write_bandwidth_kb: 2164.00
  • fedora-43-xfce_root_rnd4k_q1t1_read 3:read_bandwidth_kb: 8169.00
  • fedora-43-xfce_root_rnd4k_q1t1_write 3:write_bandwidth_kb: 669.00
  • fedora-43-xfce_private_seq1m_q8t1_read 3:read_bandwidth_kb: 413639.00
  • fedora-43-xfce_private_seq1m_q8t1_write 3:write_bandwidth_kb: 143003.00
  • fedora-43-xfce_private_seq1m_q1t1_read 3:read_bandwidth_kb: 281496.00
  • fedora-43-xfce_private_seq1m_q1t1_write 3:write_bandwidth_kb: 111597.00
  • fedora-43-xfce_private_rnd4k_q32t1_read 3:read_bandwidth_kb: 71132.00
  • fedora-43-xfce_private_rnd4k_q32t1_write 3:write_bandwidth_kb: 1578.00
  • fedora-43-xfce_private_rnd4k_q1t1_read 3:read_bandwidth_kb: 8409.00
  • fedora-43-xfce_private_rnd4k_q1t1_write 3:write_bandwidth_kb: 518.00
  • fedora-43-xfce_volatile_seq1m_q8t1_read 3:read_bandwidth_kb: 381161.00
  • fedora-43-xfce_volatile_seq1m_q8t1_write 3:write_bandwidth_kb: 74443.00
  • fedora-43-xfce_volatile_seq1m_q1t1_read 3:read_bandwidth_kb: 349176.00
  • fedora-43-xfce_volatile_seq1m_q1t1_write 3:write_bandwidth_kb: 86304.00
  • fedora-43-xfce_volatile_rnd4k_q32t1_read 3:read_bandwidth_kb: 72969.00
  • fedora-43-xfce_volatile_rnd4k_q32t1_write 3:write_bandwidth_kb: 1584.00
  • fedora-43-xfce_volatile_rnd4k_q1t1_read 3:read_bandwidth_kb: 8272.00
  • fedora-43-xfce_volatile_rnd4k_q1t1_write 3:write_bandwidth_kb: 788.00
  • debian-13-xfce_dom0-dispvm-api (mean:6.39): 76.68 🟢 ( previous job: 81.47, improvement: 94.12%)
  • debian-13-xfce_dom0-dispvm-gui-api (mean:7.732): 92.79 🔻 ( previous job: 92.38, degradation: 100.44%)
  • debian-13-xfce_dom0-dispvm-preload-2-api (mean:3.222): 38.67 🟢 ( previous job: 48.28, improvement: 80.09%)
  • debian-13-xfce_dom0-dispvm-preload-2-delay-0-api (mean:2.809): 33.71 🟢 ( previous job: 44.34, improvement: 76.03%)
  • debian-13-xfce_dom0-dispvm-preload-2-delay-minus-1d2-api (mean:3.204): 38.45 🟢 ( previous job: 54.23, improvement: 70.90%)
  • debian-13-xfce_dom0-dispvm-preload-4-api (mean:2.291): 27.49 🟢 ( previous job: 40.37, improvement: 68.09%)
  • debian-13-xfce_dom0-dispvm-preload-4-delay-0-api (mean:2.583): 30.99 🟢 ( previous job: 44.04, improvement: 70.38%)
  • debian-13-xfce_dom0-dispvm-preload-4-delay-minus-1d2-api (mean:2.424): 29.09 🟢 ( previous job: 45.36, improvement: 64.14%)
  • debian-13-xfce_dom0-dispvm-preload-2-gui-api (mean:4.496): 53.95 🟢 ( previous job: 58.18, improvement: 92.73%)
  • debian-13-xfce_dom0-dispvm-preload-4-gui-api (mean:3.838): 46.06 🔻 ( previous job: 43.54, degradation: 105.79%)
  • debian-13-xfce_dom0-dispvm-preload-6-gui-api (mean:3.421): 41.05 🟢 ( previous job: 47.37, improvement: 86.66%)
  • debian-13-xfce_dom0-vm-api (mean:0.039): 0.47 🔻 ( previous job: 0.46, degradation: 102.40%)
  • debian-13-xfce_dom0-vm-gui-api (mean:0.042): 0.50 🟢 ( previous job: 0.51, improvement: 99.80%)
  • fedora-43-xfce_dom0-dispvm-api (mean:7.394): 88.73
  • fedora-43-xfce_dom0-dispvm-gui-api (mean:9.714): 116.57
  • fedora-43-xfce_dom0-dispvm-preload-2-api (mean:3.79): 45.49
  • fedora-43-xfce_dom0-dispvm-preload-2-delay-0-api (mean:3.587): 43.05
  • fedora-43-xfce_dom0-dispvm-preload-2-delay-minus-1d2-api (mean:3.915): 46.98
  • fedora-43-xfce_dom0-dispvm-preload-4-api (mean:3.0): 36.00
  • fedora-43-xfce_dom0-dispvm-preload-4-delay-0-api (mean:2.741): 32.89
  • fedora-43-xfce_dom0-dispvm-preload-4-delay-minus-1d2-api (mean:3.042): 36.51
  • fedora-43-xfce_dom0-dispvm-preload-2-gui-api (mean:5.331): 63.97
  • fedora-43-xfce_dom0-dispvm-preload-4-gui-api (mean:4.25): 51.01
  • fedora-43-xfce_dom0-dispvm-preload-6-gui-api (mean:3.688): 44.26
  • fedora-43-xfce_dom0-vm-api (mean:0.039): 0.47
  • fedora-43-xfce_dom0-vm-gui-api (mean:0.043): 0.52
  • whonix-workstation-18_dom0-dispvm-api (mean:8.259): 99.11 🟢 ( previous job: 114.77, improvement: 86.35%)
  • whonix-workstation-18_dom0-dispvm-gui-api (mean:10.245): 122.94 🟢 ( previous job: 127.27, improvement: 96.59%)
  • whonix-workstation-18_dom0-dispvm-preload-2-api (mean:4.193): 50.32 🟢 ( previous job: 70.96, improvement: 70.91%)
  • whonix-workstation-18_dom0-dispvm-preload-2-delay-0-api (mean:4.102): 49.23 🟢 ( previous job: 65.29, improvement: 75.40%)
  • whonix-workstation-18_dom0-dispvm-preload-2-delay-minus-1d2-api (mean:5.013): 60.16 🟢 ( previous job: 74.32, improvement: 80.95%)
  • whonix-workstation-18_dom0-dispvm-preload-4-api (mean:3.171): 38.06 🟢 ( previous job: 57.74, improvement: 65.91%)
  • whonix-workstation-18_dom0-dispvm-preload-4-delay-0-api (mean:3.494): 41.93 🟢 ( previous job: 65.76, improvement: 63.76%)
  • whonix-workstation-18_dom0-dispvm-preload-4-delay-minus-1d2-api (mean:3.573): 42.87 🟢 ( previous job: 59.80, improvement: 71.69%)
  • whonix-workstation-18_dom0-dispvm-preload-2-gui-api (mean:5.992): 71.91 🟢 ( previous job: 78.19, improvement: 91.97%)
  • whonix-workstation-18_dom0-dispvm-preload-4-gui-api (mean:4.869): 58.43 🟢 ( previous job: 65.73, improvement: 88.90%)
  • whonix-workstation-18_dom0-dispvm-preload-6-gui-api (mean:3.848): 46.17 🟢 ( previous job: 61.35, improvement: 75.26%)
  • whonix-workstation-18_dom0-vm-api (mean:0.031): 0.37 🟢 ( previous job: 0.58, improvement: 63.89%)
  • whonix-workstation-18_dom0-vm-gui-api (mean:0.046): 0.55 🟢 ( previous job: 0.62, improvement: 87.80%)

Comment thread on qubes/vm/dispvm.py (outdated):

    if qmemman_client:
        qmemman_client.close()
    if qmemman_task:
        qmemman_task.result().close()
@marmarek (Member):

So, if earliest_task == break_task, this doesn't work, because qmemman_task is cancelled. With added try/except around this line I get:

Apr 12 16:26:13 dom0 qubesd[10498]: Traceback (most recent call last):
Apr 12 16:26:13 dom0 qubesd[10498]:   File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 634, in on_domain_pre_paused
Apr 12 16:26:13 dom0 qubesd[10498]:     await earliest_task
Apr 12 16:26:13 dom0 qubesd[10498]: asyncio.exceptions.CancelledError
Apr 12 16:26:13 dom0 qubesd[10498]: During handling of the above exception, another exception occurred:
Apr 12 16:26:13 dom0 qubesd[10498]: Traceback (most recent call last):
Apr 12 16:26:13 dom0 qubesd[10498]:   File "/usr/lib/python3.13/site-packages/qubes/vm/dispvm.py", line 643, in on_domain_pre_paused
Apr 12 16:26:13 dom0 qubesd[10498]:     qmemman_task.result().close()
Apr 12 16:26:13 dom0 qubesd[10498]:     ~~~~~~~~~~~~~~~~~~~^^
Apr 12 16:26:13 dom0 qubesd[10498]: asyncio.exceptions.CancelledError

@ben-grande (Contributor, Author):

> So, if earliest_task == break_task, this doesn't work, because qmemman_task is cancelled. With added try/except around this line I get:

On this comment: #796 (comment), I expected it to fail as well, but it succeeded.

So what do you mean by added try/except? Is it this:

        except asyncio.CancelledError:
            if qmemman_task:
                try:
                    qmemman_task.result().close()
                except:
                    raise

@ben-grande (Contributor, Author):

Testing this...

@marmarek (Member):

Well, not just raise; log it. The exception here seems to get lost.
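That is, something along these lines (an illustrative helper with a hypothetical name, showing the "log rather than re-raise" idea):

import asyncio
import logging

log = logging.getLogger("dispvm")

def close_qmemman(qmemman_task):
    # Close the client held by a finished task, logging instead of letting
    # the exception get lost (illustrative, not the PR code).
    if not qmemman_task:
        return
    try:
        qmemman_task.result().close()
    except asyncio.CancelledError:
        log.warning("qmemman task cancelled; no client to close")
    except Exception:
        log.exception("qmemman set_mem failed")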

@marmarek (Member):

Looking at it closer, I think there is an issue with how this is called in the non-requested case. set_mem() will instruct qmemman to reduce the memory of the qube. So far so good. But as soon as the client disconnects, qmemman will start balancing memory again, likely giving the just-reduced qube more memory again. IIUC that disconnect currently happens only implicitly, when the garbage collector releases the qmemman_client object, which is IMO quite fragile. Theoretically, the qmemman socket should be kept open until after the qube is paused. In practice, the qube is paused fast enough that it doesn't change much.

See how similar qmemman handling is done during qube startup: first qubesd requests qmemman to free some memory for the new qube, then starts it, and only then closes the qmemman socket - otherwise qmemman would reuse that freed memory and there wouldn't be space for starting the new qube anymore.

But here, it isn't a single function; it would be two separate event handlers - domain-pre-paused and domain-paused. So, the qmemman_client instance would need to be passed from one to the other somehow. Maybe on the VM object as some attribute? If done in set_mem() itself, it would also allow fixing the other issue, as you wouldn't need the result of a (cancelled) task anymore.
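A rough sketch of that attribute handoff between the two handlers (names and structure are illustrative, not the PR code):

import asyncio
import qubes.events
import qubes.qmemman.client

class DispVMSketch:
    @qubes.events.handler("domain-pre-paused")
    async def on_domain_pre_paused(self, event, **kwargs):
        # Keep the client on the VM object so the paused handler can close it.
        self._qmemman_client = qubes.qmemman.client.QMemmanClient()
        await asyncio.get_running_loop().run_in_executor(
            None, self._qmemman_client.set_mem, {self.xid: 0}
        )

    @qubes.events.handler("domain-paused")
    async def on_domain_paused(self, event, **kwargs):
        client = getattr(self, "_qmemman_client", None)
        if client is not None:
            # Only now let membalance resume: the qube is already paused.
            client.close()
            self._qmemman_client = None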

@marmarek (Member):

But the flip side of "store the qmemman_client reference in an attribute" is that, if something goes wrong and close() isn't called, the garbage collector won't save you anymore. So, it wouldn't be "delayed for 20 min" anymore; qmemman would stop doing anything forever (which includes breaking the startup of further qubes). Or at least until the responsible qube is removed (or maybe until qubesd or qmemman restarts).

@ben-grande (Contributor, Author):

> But here, it isn't a single function; it would be two separate event handlers - domain-pre-paused and domain-paused. So, the qmemman_client instance would need to be passed from one to the other somehow. Maybe on the VM object as some attribute? If done in set_mem() itself, it would also allow fixing the other issue, as you wouldn't need the result of a (cancelled) task anymore.

I remember you voicing your concern about qmemman trying to give memory back to the domain before it is paused when developing this feature last year, so, uh, I was expecting this topic to be revived at some point. I do agree that we are relying on implicit (it just works) behavior rather than something guaranteed, which would be nicer. I think an attribute works well enough for this case.

> But the flip side of "store the qmemman_client reference in an attribute" is that, if something goes wrong and close() isn't called, the garbage collector won't save you anymore. So, it wouldn't be "delayed for 20 min" anymore; qmemman would stop doing anything forever (which includes breaking the startup of further qubes). Or at least until the responsible qube is removed (or maybe until qubesd or qmemman restarts).

The connection can cease when the client sends an empty reply (qmemman then logs "client disconnected, resuming membalance"), which is what happens when the task is cancelled / the connection breaks. So I think an attribute can work, and I also think that taking qmemman "hostage" until the qube is paused is okay, as pausing usually takes ten ms and I don't remember a pause failure at all, so qmemman is not indefinitely locked.

@ben-grande (Contributor, Author) commented Apr 14, 2026:

Set qmemman log level to info.

Letting the preload be paused (not requesting it before pause)

INFO: vm.disp3373: Preload startup completed 'qubes.WaitForRunningSystem'
INFO: vm.disp3373: Setting qube memory to pref mem
do_balloon_dom(dom_memset={'18': 0})
mem requested for dom '18' is 0, using its pref '398119321' while actual mem is '4194304000' (10.54x)
mem-set domain 18 to 398119321
...
mem-set domain 18 to 398119321
WARNING: vm.disp3373: Cancelling wait for preload
WARNING: vm.disp3373: Cancelled qmemman on asyncio.CancelledError
client disconnected, resuming membalance
balance_when_enough_mem(xen_free_mem=36188021536, total_mem_pref=3760358680, total_available_mem=45537320456)
stat: dom '0' act=4294967296 pref=2101949721 last_target=4294967296
stat: dom '5' act=4194304000 pref=475978547 last_target=4194304000
stat: dom '6' act=4194304000 pref=784311091 last_target=4194304000
stat: dom '18' act=426082304 pref=398119321 last_target=398119321
stat: xenfree=36240450336 memset_reqs=[('0', 4294967296), ('5', 4194304000), ('6', 4194304000), ('18', 4194304000)]
mem-set domain 0 to 4294967296
mem-set domain 5 to 4194304000
mem-set domain 6 to 4194304000
mem-set domain 18 to 4194304000
INFO: vm.disp3373: Preloading completed
INFO: vm.disp3373: Paused preloaded qube
  • We can notice that it does try to rebalance domain 18 up to 4194304000 before the pause event signals completion. Yes, this should be guarded against...
  • Cancelling the break task does cancel qmemman

Requesting the preload mid domain-pre-paused, while qmemman is setting pref mem

INFO: vm.disp4946: Preload startup completed 'qubes.WaitForRunningSystem'
INFO: vm.disp4946: Setting qube memory to pref mem
do_balloon_dom(dom_memset={'19': 0})
mem requested for dom '19' is 0, using its pref '399929753' while actual mem is '4194304000' (10.49x)
mem-set domain 19 to 399929753
...
mem-set domain 19 to 399929753
INFO: vm.disp4946: Requesting preloaded qube
INFO: vm.default-dvm: Removing qube(s) from preloaded list because qube was requested: 'disp4946'
INFO: vm.disp4946: Waiting preload completion with '72' seconds timeout
WARNING: vm.disp4946: Cancelling qmemman
INFO: vm.disp4946: Preloading completed
INFO: vm.disp4946: Using preloaded qube
INFO: Updating appmenus for 'disp4946' in 'dom0'
INFO: vm.default-dvm: Received preload event 'used' for dispvm 'disp4946' with a delay of 3.0 second(s)
mem-set domain 19 to 399929753
...
mem-set domain 19 to 399929753
client disconnected, resuming membalance

It just has WARNING: vm.disp4946: Cancelling qmemman, and it doesn't seem to cancel qmemman at that point, just later on, due to the lack of received data.

@marmarek (Member) commented Apr 15, 2026:

> It just has WARNING: vm.disp4946: Cancelling qmemman, and it doesn't seem to cancel qmemman at that point, just later on, due to the lack of received data.

I don't think the current implementation of qmemman checks if the client disconnected mid-action. Not ideal, but also not a huge problem.

@ben-grande (Contributor, Author):

> It just has WARNING: vm.disp4946: Cancelling qmemman, and it doesn't seem to cancel qmemman at that point, just later on, due to the lack of received data.

> I don't think the current implementation of qmemman checks if the client disconnected mid-action. Not ideal, but also not a huge problem.

I moved the instantiation of QMemmanClient to qubes.vm.dispvm to avoid it hanging off QubesVM.set_mem, and self.sock.close() does not forcefully close. I tried self.sock.shutdown(...) with the socket.SHUT_* options, but got the same result: it doesn't forcefully close. Depending on the option, it may close the client early, but the server will still be calling mem-set.

I didn't find a nice solution; something can maybe be done with select or settimeout. setblocking(0) doesn't seem very useful alone; it has to be combined with other options.

> Looking at it closer, I think there is an issue with how this is called in the non-requested case. set_mem() will instruct qmemman to reduce the memory of the qube. So far so good. But as soon as the client disconnects, qmemman will start balancing memory again, likely giving the just-reduced qube more memory again. IIUC that disconnect currently happens only implicitly, when the garbage collector releases the qmemman_client object, which is IMO quite fragile. Theoretically, the qmemman socket should be kept open until after the qube is paused. In practice, the qube is paused fast enough that it doesn't change much.

> See how similar qmemman handling is done during qube startup: first qubesd requests qmemman to free some memory for the new qube, then starts it, and only then closes the qmemman socket - otherwise qmemman would reuse that freed memory and there wouldn't be space for starting the new qube anymore.

> But here, it isn't a single function; it would be two separate event handlers - domain-pre-paused and domain-paused. So, the qmemman_client instance would need to be passed from one to the other somehow. Maybe on the VM object as some attribute? If done in set_mem() itself, it would also allow fixing the other issue, as you wouldn't need the result of a (cancelled) task anymore.

So it seems that the client cannot easily force-close the connection, nor hold it open to delay ballooning.

I am testing using this script:

import asyncio

import qubes
import qubes.qmemman.client

async def main():
    app = qubes.Qubes()
    domains = app.domains
    disp = domains["disp3668"]
    # Race a short "break" timer against the qmemman set_mem request.
    break_task = asyncio.create_task(asyncio.sleep(0.0003))
    qmemman_client = qubes.qmemman.client.QMemmanClient()
    # qmemman_task = asyncio.get_running_loop().run_in_executor(None, disp.set_mem)
    qmemman_task = asyncio.get_running_loop().run_in_executor(
        None, qmemman_client.set_mem, {disp.xid: 0}
    )
    tasks = [break_task, qmemman_task]
    try:
        async for earliest_task in asyncio.as_completed(tasks):
            await earliest_task
            # Whichever task finishes first cancels the other.
            if earliest_task == break_task:
                print("Finished break, cancelling qmemman")
                qmemman_task.cancel()
            else:
                print("Finished qmemman, cancelling break")
                break_task.cancel()
    except asyncio.CancelledError:
        pass
    finally:
        print("Final")
        # The client instance is still in scope here even when the task was
        # cancelled, so the socket can be closed explicitly.
        if qmemman_client.sock:
            qmemman_client.shutdown()

    result = None
    cancelled = False
    try:
        # result = await qmemman_task
        result = qmemman_task.result()
    except asyncio.CancelledError:
        cancelled = True
    finally:
        # qmemman_client.close()
        if not result or cancelled:
            disp.log.warning("Failed to set memory")

if __name__ == "__main__":
    asyncio.run(main())

Where QMemmanClient.shutdown() is shutdown() plus close().

@marmarek (Member):

> does not forcefully close

Well, closing the socket does not interrupt the function on the other end of it. qmemman will notice only when it next interacts with the socket. For interrupting the call, qmemman would need some monitoring of the socket state during the do_balloon_dom call. It's not going to be pretty with the current shape of qmemman (it might be easier if qmemman used asyncio, but that's a big change beyond the scope of this PR). But I think you can ignore this problem and accept that interrupting set_mem isn't possible (maybe come back to this issue later). The critical part is to not leave the socket open after the operation completes, regardless of the exit path (so, either after the dispvm gets requested, or after pausing it).
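Indeed, that's generic stream-socket behavior: close() only releases the local end, and the peer notices nothing until its next read or write. A small standalone demonstration:

import socket
import threading
import time

def server(conn):
    # Simulate qmemman: busy balancing, not touching the socket.
    time.sleep(1)
    # Only the next socket interaction observes the disconnect, as EOF:
    print("server sees:", conn.recv(1024))  # b''
    conn.close()

a, b = socket.socketpair()
t = threading.Thread(target=server, args=(b,))
t.start()
a.close()  # does not interrupt the server mid-work
t.join()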

@ben-grande (Contributor, Author):

> maybe come back to this issue later

Right, it ended up being bigger changes than what I expected, and I got lost trying to fix it all at once. Will focus just on closing the socket properly. I cannot use task.result() because, if the task was cancelled, it raises CancelledError instead of returning the QMemmanClient. This is why I had to instantiate it from the DispVM, so I could get the class instance to close the socket.

@ben-grande (Contributor, Author):

> For interrupting the call, qmemman would need some monitoring of the socket state during the do_balloon_dom call. It's not going to be pretty with the current shape of qmemman (it might be easier if qmemman used asyncio, but that's a big change beyond the scope of this PR).

What about writing to a file or modifying DomainState somehow, so that do_balloon_dom stops looping if a condition is met? This way, we could also prohibit ballooning between domain-pre-paused and domain-paused, and allow ballooning at the end of domain-paused.

ben-grande force-pushed the close-qmemman branch 3 times, most recently from fed104c to ec3965e, on April 21, 2026
@ben-grande (Contributor, Author):

Haven't tested the latest commit. For another day.

@marmarek (Member):

> What about writing to a file or modifying DomainState somehow, so that do_balloon_dom stops looping if a condition is met? This way, we could also prohibit ballooning between domain-pre-paused and domain-paused, and allow ballooning at the end of domain-paused.

We used to do a file-based trigger; it's messy. For example, if two DispVMs preload in parallel.

@ben-grande (Contributor, Author):

> We used to do a file-based trigger; it's messy. For example, if two DispVMs preload in parallel.

What about a file that contains the domid, to remove it from balancing/ballooning? Tested it now and it's working. Currently using two files for different purposes:

  • one for cancelling the current balloon in case the preload was requested
  • one for preventing balance/balloon until the preload is requested, which means it will wait for the pause

@marmarek (Member):

> What about a file

I just gave one example out of many of what could go wrong... No, I don't want a file-based interface for controlling stuff. If you want more examples:

  • you'd need some locking/synchronization to ensure data is not interpreted mid-write
  • restarting (or crashing) a service at the wrong time would leave inconsistent state (possibly preventing further ballooning)

We've been there. I don't want to keep finding new failure modes for the next several months again...

Commit message:

QMemmanClient is instantiated from the same method that calls it as a task, which guarantees access to the close() method; otherwise it may not be able to close the connection properly if the task is cancelled, as result() will not contain the instance but raise CancelledError.

For: QubesOS/qubes-issues#1512
@ben-grande (Contributor, Author):

Dropped the file-based condition. Will leave it for a future PR. Opened separate issues:
