Improve host_list performance #754

Closed
David-5-1 wants to merge 1 commit into furlongm:main from David-5-1:host_list_perf

@David-5-1

We had a crash when accessing the Hosts page on Patchman 4 with a MySQL backend: MySQL tried to write more than 10 GB of temporary tables to disk, filled up /tmp, and the query then failed for lack of free space.

This patch is courtesy of ChatGPT (with minor edits), but it seems correct judging by the results I get when running it on our Patchman instance. Performance also seems decent (the page loads in under 2 seconds with 1000+ hosts).
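The thread does not show the exact SQL behind the Hosts page, but one common way a list view ends up producing gigabytes of temporary tables is join fan-out: several one-to-many relations are joined in the same query and then counted. The sketch below uses Python's built-in `sqlite3` module and a made-up schema (`host`, `host_package`, `host_repo` are hypothetical, not Patchman's real tables) to show the effect. Joining both relations multiplies the rows per host, while correlated subqueries return the same counts without building that product.

```python
import sqlite3


def build_demo_db(hosts):
    """hosts is a list of (n_packages, n_repos) pairs, one per host.

    Hypothetical, simplified schema; not Patchman's real tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE host (id INTEGER PRIMARY KEY);
        CREATE TABLE host_package (host_id INTEGER, package_id INTEGER);
        CREATE TABLE host_repo (host_id INTEGER, repo_id INTEGER);
    """)
    for host_id, (n_packages, n_repos) in enumerate(hosts, start=1):
        conn.execute("INSERT INTO host (id) VALUES (?)", (host_id,))
        conn.executemany("INSERT INTO host_package VALUES (?, ?)",
                         [(host_id, p) for p in range(n_packages)])
        conn.executemany("INSERT INTO host_repo VALUES (?, ?)",
                         [(host_id, r) for r in range(n_repos)])
    return conn


# Joining both one-to-many relations pairs every package row of a host with
# every repo row of that host: n_packages * n_repos intermediate rows per
# host (for hosts that have at least one of each).
JOIN_ROWS = """
    SELECT COUNT(*)
    FROM host h
    LEFT JOIN host_package hp ON hp.host_id = h.id
    LEFT JOIN host_repo hr ON hr.host_id = h.id
"""

# Counting on top of that join needs COUNT(DISTINCT ...) to undo the
# duplication, which typically means building and de-duplicating the larger
# intermediate result (in MySQL, a possible source of big temporary tables).
JOIN_COUNTS = """
    SELECT h.id, COUNT(DISTINCT hp.package_id), COUNT(DISTINCT hr.repo_id)
    FROM host h
    LEFT JOIN host_package hp ON hp.host_id = h.id
    LEFT JOIN host_repo hr ON hr.host_id = h.id
    GROUP BY h.id
    ORDER BY h.id
"""

# Correlated subqueries count each relation independently per host,
# so the cross product is never built.
SUBQUERY_COUNTS = """
    SELECT h.id,
           (SELECT COUNT(*) FROM host_package hp WHERE hp.host_id = h.id),
           (SELECT COUNT(*) FROM host_repo hr WHERE hr.host_id = h.id)
    FROM host h
    ORDER BY h.id
"""

if __name__ == "__main__":
    conn = build_demo_db([(1000, 5), (800, 3)])
    print(conn.execute(JOIN_ROWS).fetchone()[0])     # 7400 = 1000*5 + 800*3
    print(conn.execute(SUBQUERY_COUNTS).fetchall())  # [(1, 1000, 5), (2, 800, 3)]
```

Both count queries return identical numbers; only the amount of intermediate data differs, and it grows with packages × repos per host rather than packages + repos.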

@furlongm
Owner

I tried to fix this more generally in #757 to improve count performance everywhere.

If you want to check if it fixes your issues, there are debs/rpms to test it here: https://github.com/furlongm/patchman/releases/tag/v4.0.10-dev1

It will run DB migrations, though, so consider running it against a test/backup DB, or back up your current DB before installing it.

@David-5-1
Author

Thank you, I can confirm it works fine in our setup with the version you linked, so I am dropping this PR.

David-5-1 closed this on Feb 11, 2026
@furlongm
Owner

Thanks for confirming. Here is another prerelease with even more SQL optimizations and a few bug fixes; let me know if you have any issues with it:

https://github.com/furlongm/patchman/releases/tag/v4.0.10-dev3

@furlongm
Owner

And another release (some more optimizations): https://github.com/furlongm/patchman/releases/tag/v4.0.10-dev4

@David-5-1
Author

David-5-1 commented Mar 10, 2026

Sorry, I only saw yesterday evening that you had further replied to this thread :/

With the latest stable version, a few reports fail to be processed, but I cannot get any logs or details on what went wrong.

We also see poor performance in `Host.get_host_repo_packages`: a very long SQL query takes 105 seconds in one example, run with:

```
patchman-manage shell_plus --print-sql
from hosts.tasks import find_host_updates
find_host_updates(3142)
```
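To time a single call and list the statements it issues, a small stdlib-only helper can hook a `sqlite3` connection's trace callback. This is an illustrative analogue, not how `--print-sql` works internally; `profile_sql` and `lookup_host` are hypothetical names. Inside a Django shell, `django.test.utils.CaptureQueriesContext` records the executed queries together with their timings in a similar way.

```python
import sqlite3
import time


def profile_sql(conn, fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and record the SQL it sends through conn.

    Returns (result, statements, elapsed_seconds). Hypothetical helper: it
    only sees statements executed on this sqlite3 connection, and the
    elapsed time covers the whole call, not individual statements.
    """
    statements = []
    conn.set_trace_callback(statements.append)
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        conn.set_trace_callback(None)  # stop tracing even if fn raised
    return result, statements, elapsed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE host (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO host (name) VALUES ('web01')")

    def lookup_host(host_id):
        # Stand-in for a real task such as find_host_updates(3142).
        return conn.execute(
            "SELECT name FROM host WHERE id = ?", (host_id,)
        ).fetchone()

    result, statements, elapsed = profile_sql(conn, lookup_host, 1)
    print(result)
    for sql in statements:
        print(sql)
    print(f"{elapsed:.6f} s")
```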

I applied a patch from ChatGPT, but I am not completely sure it is correct:

```diff
index 2d8f4f1..9fbc123 100644
--- a/hosts/models.py
+++ b/hosts/models.py
@@ -163,21 +163,45 @@ class Host(models.Model):

     def get_host_repo_packages(self):
-        if self.host_repos_only:
-            hostrepos_q = Q(mirror__repo__in=self.repos.all(),
-                            mirror__enabled=True,
-                            mirror__repo__enabled=True,
-                            mirror__repo__hostrepo__enabled=True)
-        else:
-            hostrepos_q = \
-                Q(mirror__repo__osrelease__osvariant__host=self,
-                  mirror__repo__arch=self.arch,
-                  mirror__enabled=True,
-                  mirror__repo__enabled=True) | \
-                Q(mirror__repo__in=self.repos.all(),
-                  mirror__enabled=True,
-                  mirror__repo__enabled=True)
-        return Package.objects.select_related().filter(hostrepos_q).distinct()
+        """
+        Return packages available in repositories assigned to this host.
+
+        This implementation avoids the very expensive multi-table JOIN +
+        DISTINCT used previously, which could take >100s on large repos.
+        """
+
+        # Determine repository IDs enabled for this host
+        if self.host_repos_only:
+            repo_ids = (
+                self.repos
+                .filter(
+                    hostrepo__enabled=True,
+                    enabled=True,
+                    mirror__enabled=True,
+                )
+                .values_list("id", flat=True)
+            )
+        else:
+            repo_ids = Repository.objects.filter(
+                Q(osrelease__osvariant__host=self, arch=self.arch) |
+                Q(hostrepo__host=self),
+                enabled=True,
+            ).values_list("id", flat=True)
+
+        # Restrict to package names already installed on the host
+        # This dramatically reduces the dataset size.
+        host_package_names = self.packages.values_list("name_id", flat=True)
+
+        return (
+            Package.objects
+            .filter(
+                mirrorpackage__mirror__repo_id__in=repo_ids,
+                mirrorpackage__mirror__enabled=True,
+                name_id__in=host_package_names,
+            )
+            .only(
+                "id", "name_id", "arch_id", "version", "release",
+                "epoch", "packagetype", "category_id"
+            )
+        )
```

It does, however, clearly target the slow query I was seeing.
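One detail worth checking in the patch above: it filters `Package` through the multi-valued `mirrorpackage__mirror__...` relation and drops the previous `.distinct()`, so a package published in more than one enabled mirror can be returned more than once. The sketch below reproduces that query shape with `sqlite3` and a hypothetical `package`/`mirror`/`mirror_package` schema (not Patchman's real tables), then shows a semi-join (`EXISTS`) that returns each package at most once without a `DISTINCT` over the joined rows. In Django the same idea can be expressed with `Exists` and `OuterRef`; the exact model and field names would need to be checked against Patchman's code.

```python
import sqlite3


def build_demo_db():
    """Hypothetical, simplified schema; not Patchman's real tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE package (id INTEGER PRIMARY KEY, name_id INTEGER);
        CREATE TABLE mirror (id INTEGER PRIMARY KEY, repo_id INTEGER,
                             enabled INTEGER);
        CREATE TABLE mirror_package (mirror_id INTEGER, package_id INTEGER);

        -- packages 1 and 2 share name 10; package 3 is name 20; package 4 is name 30
        INSERT INTO package VALUES (1, 10), (2, 10), (3, 20), (4, 30);
        -- repo 100 has two enabled mirrors; repo 200 has one disabled mirror
        INSERT INTO mirror VALUES (1, 100, 1), (2, 100, 1), (3, 200, 0);
        -- package 1 is published in both mirrors of repo 100
        INSERT INTO mirror_package VALUES (1, 1), (2, 1), (1, 2), (2, 3), (3, 4);
    """)
    return conn


def _placeholders(values):
    """Build '?, ?, ...' for a parameterised IN list."""
    return ", ".join("?" for _ in values)


def packages_via_join(conn, repo_ids, name_ids):
    """Same shape as the patched filter(): JOINs and no DISTINCT."""
    sql = f"""
        SELECT p.id
        FROM package p
        JOIN mirror_package mp ON mp.package_id = p.id
        JOIN mirror m ON m.id = mp.mirror_id
        WHERE m.enabled = 1
          AND m.repo_id IN ({_placeholders(repo_ids)})
          AND p.name_id IN ({_placeholders(name_ids)})
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql, [*repo_ids, *name_ids])]


def packages_via_exists(conn, repo_ids, name_ids):
    """Semi-join: each package row is returned at most once."""
    sql = f"""
        SELECT p.id
        FROM package p
        WHERE p.name_id IN ({_placeholders(name_ids)})
          AND EXISTS (
              SELECT 1
              FROM mirror_package mp
              JOIN mirror m ON m.id = mp.mirror_id
              WHERE mp.package_id = p.id
                AND m.enabled = 1
                AND m.repo_id IN ({_placeholders(repo_ids)})
          )
        ORDER BY p.id
    """
    # Parameters follow placeholder order in the SQL text: names, then repos.
    return [row[0] for row in conn.execute(sql, [*name_ids, *repo_ids])]


if __name__ == "__main__":
    conn = build_demo_db()
    print(packages_via_join(conn, [100, 200], [10, 30]))    # [1, 1, 2]
    print(packages_via_exists(conn, [100, 200], [10, 30]))  # [1, 2]
```

Package 4 is excluded because its only mirror is disabled, and package 3 because its name is not in the installed set; package 1 appears twice in the join version only because it sits in two enabled mirrors.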
