This check iterates over the list existing_relationships up to 2 times:
|
def check_if_relationship_exists( |
|
self, relationship: Relationship, existing_relationships: List[Relationship] |
|
) -> bool: |
|
# assume existing relationships are stripped of comments |
|
if relationship in existing_relationships: |
|
return True |
|
relationship_inverted: Relationship = self.invert_relationship(relationship) |
|
if relationship_inverted in existing_relationships: |
|
return True |
|
|
|
return False |
And it's called for every file:
|
for file_spdx_id in contained_files: |
|
try: |
|
contains_relationship = Relationship( |
|
spdx_element_id=package_spdx_id, |
|
relationship_type=RelationshipType.CONTAINS, |
|
related_spdx_element_id=file_spdx_id, |
|
) |
|
except ConstructorTypeErrors as err: |
|
logger.append(err.get_messages()) |
|
continue |
|
if not self.check_if_relationship_exists( |
|
relationship=contains_relationship, existing_relationships=existing_relationships |
|
): |
|
contains_relationships.append(contains_relationship) |
So if F is the number of files and R is the number of relationships, we are doing O(F * R) comparisons of relationships (which seem not cheap individually).
And given that this seems to expect to find a relationship for every file, we know that R is >= F, so this is at least O(F^2).
Looks quadratic to me.
With the caveat that I'm not a python expert, this being a list seems to be the issue:
|
existing_relationships_without_comments: List[Relationship] = self.get_all_relationships_without_comments( |
|
relationships |
|
) |
Is it possible to turn that into a set so these membership tests are O(1) instead of O(N)?
This check iterates over the list
existing_relationshipsup to 2 times:tools-python/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py
Lines 162 to 172 in 8050fd9
And it's called for every file:
tools-python/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py
Lines 144 to 157 in 8050fd9
So if
Fis the number of files andRis the number of relationships, we are doingO(F * R)comparisons of relationships (which seem not cheap individually).And given that this seems to expect to find a relationship for every file, we know that
R is >= F, so this is at leastO(F^2).Looks quadratic to me.
With the caveat that I'm not a python expert, this being a list seems to be the issue:
tools-python/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py
Lines 55 to 57 in 8050fd9
Is it possible to turn that into a set so these membership tests are O(1) instead of O(N)?