Improving Pose.cache dictionary getter and setter performance #658
Open
klimaj wants to merge 3 commits into RosettaCommons:main from
lyskov approved these changes on Apr 29, 2026
ajasja approved these changes on May 4, 2026
Member review comment: LGTM :)
This PR aims to improve the performance of the `Pose.cache` dictionary data accessors. Several code pathways run with O(N^2) (quadratic time complexity) behavior, so new, functionally equivalent fast data accessor methods are introduced that run with O(N) (linear time complexity) behavior:

- `Pose.cache.fast_items()`
- `Pose.cache.fast_values()`
- `Pose.cache.metrics.fast_items()`
- `Pose.cache.metrics.fast_values()`
- `Pose.cache.metrics.real.fast_items()`
- `Pose.cache.metrics.string.fast_values()`
- `Pose.cache.metrics.composite_real.fast_items()`
- `Pose.cache.metrics.composite_real.fast_values()`
- `Pose.cache.metrics.composite_string.fast_items()`
- `Pose.cache.metrics.composite_string.fast_values()`
- `Pose.cache.metrics.per_residue_real.fast_items()`
- `Pose.cache.metrics.per_residue_real.fast_values()`
- `Pose.cache.metrics.per_residue_string.fast_items()`
- `Pose.cache.metrics.per_residue_string.fast_values()`
- `Pose.cache.metrics.per_residue_probabilities.fast_items()`
- `Pose.cache.metrics.per_residue_probabilities.fast_values()`
- `Pose.cache.extra.fast_items()`
- `Pose.cache.extra.fast_values()`
- `Pose.cache.extra.real.fast_items()`
- `Pose.cache.extra.real.fast_values()`
- `Pose.cache.extra.string.fast_items()`
- `Pose.cache.extra.string.fast_values()`
- `Pose.cache.energies.fast_items()`
- `Pose.cache.energies.fast_values()`

Users must update their API calls to take advantage of these upgrades:
`dict(pose.cache)` -> `dict(pose.cache.fast_items())`, etc. These improvements are only really noticeable when hundreds to thousands of scores are cached in the `Pose.cache` dictionary. The basis for the performance improvement is the following:

- `dict(pose.cache)` relies on `__iter__` (returns `pose.cache.all`) + `__getitem__(key)` (returns `maybe_decode(pose.cache.all[key])`), so the full scores dictionary is materialized again for every key (O(N^2)).
- `dict(pose.cache.fast_items())` simply runs `for k, v in pose.cache.all.items(): yield k, maybe_decode(v)`, so the full scores dictionary is materialized once for all keys (O(N)).

Note that the `Pose.scores` dictionary (`scores`, not `cache`) has always performed with quadratic time complexity (O(N^2)), and does not provide `Pose.scores.fast_items()` or `Pose.scores.fast_values()` methods.

This PR also makes the `Pose.cache.all_scores` property run with O(N) behavior, and removes an unnecessary argument from a private method: `self._has_sm_data(pose)` -> `self._has_sm_data()`.

Additionally, this PR provides two new fast setter methods for mappables. They avoid the relatively slow `Pose.cache.metrics` cleanup after each item is set with `__setitem__`, and instead perform only one cleanup at the end:

- `Pose.cache.metrics.real.set_mappable()`
- `Pose.cache.metrics.string.set_mappable()`

Micro-updates to the `PyRosettaCluster` interface are made to take advantage of these performance improvements.
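To make the getter and setter cost models above concrete, here is a minimal sketch using toy stand-ins. This is not PyRosetta code. `ToyCache`, `ToyMetricsStore`, `maybe_decode`, `_cleanup`, and the `materializations`/`cleanups` counters are hypothetical names. The expensive "materialize the full scores dictionary" step is simulated by copying a backing dict and counting how often that happens. `ToyCache` subclasses `collections.abc.Mapping`, so `dict(cache)` goes through `__iter__` plus one `__getitem__` per key, mirroring the slow path described above.

```python
from collections.abc import Mapping


def maybe_decode(value):
    """Hypothetical decoder: stored bytes become str, other values pass through."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


class ToyCache(Mapping):
    """Hypothetical stand-in for Pose.cache; NOT the PyRosetta implementation."""

    def __init__(self, backing):
        self._backing = dict(backing)
        self.materializations = 0  # times the full dict was rebuilt

    @property
    def all(self):
        # Simulates the expensive step that rebuilds the full scores dict.
        self.materializations += 1
        return dict(self._backing)

    def __iter__(self):
        return iter(self.all)

    def __len__(self):
        return len(self.all)

    def __getitem__(self, key):
        # Slow path: rebuilds the whole dict just to read one key.
        return maybe_decode(self.all[key])

    def fast_items(self):
        # Fast path: rebuild once, then stream (key, decoded value) pairs.
        for k, v in self.all.items():
            yield k, maybe_decode(v)

    def fast_values(self):
        for _, v in self.fast_items():
            yield v


class ToyMetricsStore:
    """Hypothetical metrics store, loosely modeled on the set_mappable() idea."""

    def __init__(self):
        self._data = {}
        self.cleanups = 0

    def _cleanup(self):
        # Placeholder for a relatively slow post-write cleanup pass.
        self.cleanups += 1

    def __setitem__(self, key, value):
        self._data[key] = value
        self._cleanup()  # one cleanup per item

    def set_mappable(self, mapping):
        self._data.update(mapping)
        self._cleanup()  # one cleanup for the whole batch


n = 500
backing = {f"score_{i}": float(i) for i in range(n)}
backing["note"] = b"hello"
cache = ToyCache(backing)

slow = dict(cache)  # one rebuild per key, plus iteration
slow_count = cache.materializations

cache.materializations = 0
fast = dict(cache.fast_items())  # exactly one rebuild
fast_count = cache.materializations

print(slow == fast, fast_count)  # True 1
print(slow_count >= n)  # True

metrics = {f"metric_{i}": float(i) for i in range(n)}
per_item = ToyMetricsStore()
for k, v in metrics.items():
    per_item[k] = v
batched = ToyMetricsStore()
batched.set_mappable(metrics)
print(per_item.cleanups, batched.cleanups)  # 500 1
```

Both getter paths return the same dictionary. Only the number of rebuilds differs: it grows with the number of keys for `dict(cache)`, so total work is quadratic, and stays at one for `fast_items()`. The same reasoning applies to batching writes so the cleanup pass runs once rather than once per item.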