Skip to content

Slow component searches - missing database index on lowercase searches #6064

@bergerst

Description

@bergerst

Current Behavior

Currently, there is only an index COMPONENT_PURL_IDX, which is created on the COMPONENT.PURL field.

However, when you do a search for components via a PURL, the SQL query searches for components based on lower("PURL") instead of the raw value.
https://github.com/DependencyTrack/dependency-track/blob/master/src/main/java/org/dependencytrack/persistence/ComponentQueryManager.java#L283

According to the database statistics, a component search triggers the following queries:

SELECT COUNT("A0"."ID") FROM "COMPONENT" "A0" WHERE LOWER("A0"."PURL") LIKE $1 ESCAPE $2;
SELECT $1 AS "DN_TYPE","A0"."AUTHORS","A0"."BLAKE2B_256","A0"."BLAKE2B_384","A0"."BLAKE2B_512","A0"."BLAKE3","A0"."CLASSIFIER","A0"."COPYRIGHT","A0"."CPE","A0"."DESCRIPTION","A0"."DIRECT_DEPENDENCIES","A0"."EXTENSION","A0"."EXTERNAL_REFERENCES","A0"."FILENAME","A0"."GROUP","A0"."ID","A0"."INTERNAL","A0"."LAST_RISKSCORE","A0"."LICENSE","A0"."LICENSE_EXPRESSION","A0"."LICENSE_URL","A0"."MD5","A0"."NAME","A0"."TEXT","B0"."ACTIVE","B0"."AUTHORS","B0"."CLASSIFIER","B0"."COLLECTION_LOGIC","B0"."COLLECTION_TAG","B0"."CPE","B0"."DESCRIPTION","B0"."DIRECT_DEPENDENCIES","B0"."EXTERNAL_REFERENCES","B0"."GROUP","B0"."ID","B0"."IS_LATEST","B0"."LAST_BOM_IMPORTED","B0"."LAST_BOM_IMPORTED_FORMAT","B0"."LAST_RISKSCORE","B0"."LAST_VULNERABILITY_ANALYSIS","B0"."MANUFACTURER","B0"."NAME","B0"."PUBLISHER","B0"."PURL","B0"."SUPPLIER","B0"."SWIDTAGID","B0"."UUID","B0"."VERSION","A0"."PUBLISHER","A0"."PURL","A0"."PURLCOORDINATES","C0"."ISCUSTOMLICENSE","C0"."FSFLIBRE","C0"."ID","C0"."LICENSEID","C0"."NAME","C0"."ISOSIAPPROVED","C0"."UUID","A0"."SHA1","A0"."SHA_256","A0"."SHA_384","A0"."SHA3_256","A0"."SHA3_384","A0"."SHA3_512","A0"."SHA_512","A0"."SUPPLIER","A0"."SWIDTAGID","A0"."UUID","A0"."VERSION","D0"."LICENSEID" AS "NUCORDER0" FROM "COMPONENT" "A0" INNER JOIN "PROJECT" "B0" ON "A0"."PROJECT_ID" = "B0"."ID" LEFT OUTER JOIN "LICENSE" "C0" ON "A0"."LICENSE_ID" = "C0"."ID" LEFT OUTER JOIN "LICENSE" "D0" ON "A0"."LICENSE_ID" = "D0"."ID" WHERE LOWER("A0"."PURL") LIKE $2 ESCAPE $3 ORDER BY "NUCORDER0" FETCH NEXT $4 ROWS ONLY

Adding this custom index helped reduce the query times from several seconds to 100 milliseconds:

CREATE INDEX xt_component_purl2 ON public."COMPONENT" USING gin (lower(("PURL")::text) gin_trgm_ops)

However, the index uses https://www.postgresql.org/docs/current/pgtrgm.html which needs to be enabled on the database server.

Steps to Reproduce

  1. Fill the database with test data
  2. Switch on query statistics for the database server (not sure how, our database guy did it)
  3. Trigger a query for a component by purl on /api/v1/component/identity
  4. Check the query time

Expected Behavior

The database indexes need to be changed/extended to also work on lowercase searches.

lower() is used on a lot of fields, for example also on group, name, version, CPE and SWID. And Component searches are probably not the only ones affected by the incorrect indexes.

Not sure if the existing indexes should be changed to lower() or if there are still VARCHAR searches somewhere in the code which do not use lower(). (.toLowerCase().matches(...))

Dependency-Track Version

4.14.1

Dependency-Track Distribution

Container Image

Database Server

PostgreSQL

Database Server Version

17.5

Browser

N/A

Checklist

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions