Slow for SBOMs with a large number of files + relationships

This check iterates over the list `existing_relationships` up to 2 times: https://github.com/spdx/tools-python/blob/8050fd9c41a92c75ec2ba9eb10ed9a919c375fa9/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py#L162-L172

And it's called for every file: https://github.com/spdx/tools-python/blob/8050fd9c41a92c75ec2ba9eb10ed9a919c375fa9/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py#L144-L157

So if `F` is the number of files and `R` is the number of relationships, we are doing `O(F * R)` relationship comparisons, and each comparison does not look cheap.

And since this seems to expect a relationship for every file, we know that `R >= F`, so this is at least `O(F^2)`.

Looks quadratic to me.
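
To make the growth concrete, here is a minimal sketch that counts equality checks when a list membership test runs once per file. `CountingRel` and `count_comparisons` are hypothetical stand-ins written for this illustration; they are not part of `spdx_tools`, and the sketch performs only one scan per file, whereas the real check may scan twice.

```python
class CountingRel:
    """Hypothetical stand-in for Relationship that counts equality checks."""

    comparisons = 0

    def __init__(self, source, kind, target):
        self.key = (source, kind, target)

    def __eq__(self, other):
        CountingRel.comparisons += 1
        return self.key == other.key


def count_comparisons(num_files):
    """Run one list membership test per file and return the total __eq__ calls."""
    CountingRel.comparisons = 0
    # One pre-existing relationship per file; none match the candidates below,
    # so every membership test scans the whole list.
    existing = [CountingRel("doc", "DESCRIBES", f"file-{i}") for i in range(num_files)]
    for i in range(num_files):
        candidate = CountingRel("pkg", "CONTAINS", f"file-{i}")
        _ = candidate in existing
    return CountingRel.comparisons


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, count_comparisons(n))
```

Under these assumptions, doubling the number of files quadruples the number of comparisons.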

With the caveat that I'm not a python expert, this being a list seems to be the issue: https://github.com/spdx/tools-python/blob/8050fd9c41a92c75ec2ba9eb10ed9a919c375fa9/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py#L55-L57

Is it possible to turn that into a set so these membership tests are O(1) instead of O(N)?
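
A rough sketch of what I have in mind, under assumptions: I have not checked whether `Relationship` is hashable, so this uses a hypothetical hashable key (`RelKey`) built from the three fields that appear in the constructor call. The `INVERSE` table and `invert` helper are also made up for illustration and only cover `CONTAINS` / `CONTAINED_BY`; the real `invert_relationship` may behave differently.

```python
from typing import NamedTuple, Set


class RelKey(NamedTuple):
    """Hypothetical hashable key for a relationship; not part of spdx_tools."""

    spdx_element_id: str
    relationship_type: str
    related_spdx_element_id: str


# Assumed inverse pairs, for illustration only.
INVERSE = {"CONTAINS": "CONTAINED_BY", "CONTAINED_BY": "CONTAINS"}


def invert(key: RelKey) -> RelKey:
    """Swap the two element ids and flip the relationship type."""
    return RelKey(
        key.related_spdx_element_id,
        INVERSE[key.relationship_type],
        key.spdx_element_id,
    )


def relationship_exists(key: RelKey, existing: Set[RelKey]) -> bool:
    """Two average-case O(1) set lookups instead of two list scans."""
    return key in existing or invert(key) in existing


if __name__ == "__main__":
    existing = {RelKey("file-1", "CONTAINED_BY", "pkg")}
    print(relationship_exists(RelKey("pkg", "CONTAINS", "file-1"), existing))
    print(relationship_exists(RelKey("pkg", "CONTAINS", "file-2"), existing))
```

Building the set once costs `O(R)`, after which the per-file check no longer depends on `R`, so the loop over files would be roughly `O(F + R)` overall.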

	def check_if_relationship_exists(
	    self, relationship: Relationship, existing_relationships: List[Relationship]
	) -> bool:
	    # assume existing relationships are stripped of comments
	    if relationship in existing_relationships:
	        return True
	    relationship_inverted: Relationship = self.invert_relationship(relationship)
	    if relationship_inverted in existing_relationships:
	        return True

	    return False

	for file_spdx_id in contained_files:
	    try:
	        contains_relationship = Relationship(
	            spdx_element_id=package_spdx_id,
	            relationship_type=RelationshipType.CONTAINS,
	            related_spdx_element_id=file_spdx_id,
	        )
	    except ConstructorTypeErrors as err:
	        logger.append(err.get_messages())
	        continue
	    if not self.check_if_relationship_exists(
	        relationship=contains_relationship, existing_relationships=existing_relationships
	    ):
	        contains_relationships.append(contains_relationship)


	existing_relationships_without_comments: List[Relationship] = self.get_all_relationships_without_comments(
	    relationships
	)
