
Commit fa63f1a

Improve package scan performance (#4606)
* Simplify GemfileHandler path patterns
* Add multiregex as a dependency
  Reference: https://github.com/Quantco/multiregex
* Add initial multiregex implementation
  Use multiregex to cache the mapping of regex path patterns to datafile handlers, so that package datafiles are detected faster.
  Reference: #4064
  Reference: #4061
* Add minimal tests for package cache
* Cache multiregex matchers instead of patterns
* Restore binary package manifest scanning
* Only scan for binary packages optionally
  Introduce a new option --binary-packages which looks for package/dependency data in binaries.
* Do not set up license index on --package-only
  We do not need the license index in a --package-only scan, as this is designed to be a fast, package-detection-only scan which skips license detection. As license index loading takes a couple of seconds in each case, this makes the package-only scan much faster.
* Add a new console script to build the package patterns cache
* Address review feedback
* Remove deprecated macos runners
* Fix test failures
* Fix misc test failures
* Update typecode to latest v30.1.0
* Avoid using close methods on pdfParser objects

---------

Signed-off-by: Ayan Sinha Mahapatra <asmahapatra@aboutcode.org>
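The core idea of the multiregex change, compiling the datafile path patterns once and reusing them to map each path to its datafile handlers, can be sketched with the standard library alone. This is a simplified stand-in, not ScanCode's code: the commit uses the third-party multiregex library, which is built to match many regexes against one string faster than looping over them, while this sketch just caches one compiled `re` pattern per handler with `functools.lru_cache` and tests each in turn. The `HANDLER_PATTERNS` table, the handler names and both functions are hypothetical.

```python
import fnmatch
import re
from functools import lru_cache

# Hypothetical mapping of datafile handler names to glob-style path patterns.
# These names and patterns are illustrative, not ScanCode's actual handlers.
HANDLER_PATTERNS = {
    "gemfile": ("*/Gemfile", "*/*.gemfile"),
    "gemfile_lock": ("*/Gemfile.lock",),
    "npm_package_json": ("*/package.json",),
    "pypi_setup_py": ("*/setup.py",),
}


@lru_cache(maxsize=None)
def compiled_matchers():
    """Translate each handler's globs to regex and compile them once, as a
    single alternation per handler; later calls reuse the cached tuple."""
    matchers = []
    for handler, globs in HANDLER_PATTERNS.items():
        regex = "|".join(fnmatch.translate(glob) for glob in globs)
        matchers.append((handler, re.compile(regex)))
    return tuple(matchers)


def find_handlers(path):
    """Return the names of all handlers whose patterns match ``path``."""
    return [name for name, rx in compiled_matchers() if rx.match(path)]
```

For example, `find_handlers("project/Gemfile.lock")` returns `["gemfile_lock"]`: the `gemfile` patterns do not match because they require the path to end in `Gemfile` or `.gemfile`. The cost of translating and compiling is paid on the first call only.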
1 parent b2c4bd0 commit fa63f1a

33 files changed, +554 -106 lines changed

Dockerfile

Lines changed: 4 additions & 2 deletions
@@ -38,9 +38,11 @@ WORKDIR /scancode-toolkit
 COPY . /scancode-toolkit
 
 # Initial configuration using ./configure, scancode-reindex-licenses to build
-# the base license index
+# the base license index and scancode-reindex-package-patterns to build the
+# package patterns cache
 RUN ./configure \
-&& ./venv/bin/scancode-reindex-licenses
+&& ./venv/bin/scancode-reindex-licenses \
+&& ./venv/bin/scancode-reindex-package-patterns
 
 # Add scancode to path
 ENV PATH=/scancode-toolkit:$PATH

README.rst

Lines changed: 6 additions & 3 deletions
@@ -2,12 +2,15 @@
 ScanCode Toolkit
 ================
 
-ScanCode Toolkit is a set of code scanning tools that detect the origin (copyrights), license and vulnerabilities of code, packages and dependencies in a codebase. ScanCode Toolkit is an `AboutCode project <https://aboutcode.org>`_.
+ScanCode Toolkit is a set of code scanning tools that detect the origin (copyrights), license and vulnerabilities of code,
+packages and dependencies in a codebase. ScanCode Toolkit is an `AboutCode project <https://aboutcode.org>`_.
 
 Why Use ScanCode Toolkit?
 =========================
 
-ScanCode Toolkit is the leading tool in scanning depth and accuracy, used by hundreds of software teams. You can use ScanCode Toolkit as a command line tool or as a library.
+ScanCode Toolkit is the leading tool in scanning depth and accuracy,
+used by hundreds of software teams. You can use ScanCode Toolkit
+as a command line tool or as a library.
 
 Getting Started
 ===============
@@ -84,7 +87,7 @@ Benefits of ScanCode
 Support
 =======
 
-If you have a specific problem, suggestion or bug, please submit a
+If you have a specific problem, suggestion or bug, please submit a
 `GitHub issue <https://github.com/aboutcode-org/scancode-toolkit/issues>`_.
 
 For quick questions or socializing, join the AboutCode community discussions on `Slack <https://join.slack.com/t/aboutcode-org/shared_invite/zt-3li3bfs78-mmtKG0Qhv~G2dSlNCZW2pA>`_.

azure-pipelines.yml

Lines changed: 0 additions & 16 deletions
@@ -145,14 +145,6 @@ jobs:
 test_suites:
 all: venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py --reruns 2
 
-- template: etc/ci/azure-posix.yml
-parameters:
-job_name: macos13_cpython
-image_name: macOS-13
-python_versions: ['3.10', '3.11', '3.12', '3.13']
-test_suites:
-all: venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py --reruns 2
-
 - template: etc/ci/azure-win.yml
 parameters:
 job_name: win2025_cpython
@@ -220,14 +212,6 @@ jobs:
 test_suites:
 all: venv/bin/pip install --upgrade-strategy eager --force-reinstall --upgrade -e .[testing] && venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py
 
-- template: etc/ci/azure-posix.yml
-parameters:
-job_name: macos13_cpython_latest_from_pip
-image_name: macos-13
-python_versions: ['3.10', '3.11', '3.12', '3.13']
-test_suites:
-all: venv/bin/pip install --upgrade-strategy eager --force-reinstall --upgrade -e .[testing] && venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py
-
 - template: etc/ci/azure-win.yml
 parameters:
 job_name: win2019_cpython_latest_from_pip

docs/source/rst-snippets/cli-basic-options.rst

Lines changed: 6 additions & 0 deletions
@@ -33,6 +33,12 @@ documenting a program's options. For example:
 --system-package Scan ``<input>`` for installed system package
 databases.
 
+--package-in-compiled Scan compiled executable binaries such as ELF,
+WinpE and Mach-O files, looking for structured
+package and dependency metadata. Note that looking for
+packages in binaries makes package scan slower.
+Currently supported compiled binaries: Go, Rust.
+
 --package-only Faster package scan, scanning ``<input>`` for
 system and application packages, only for package
 metadata. This option is skipping
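The `--package-in-compiled` option only applies to compiled executables, so a scanner first has to recognize ELF, PE and Mach-O files. A common way to do that is to check the magic bytes at the start of the file, as in the minimal Python sketch below. It is an illustration, not ScanCode's implementation (ScanCode relies on its own file-type detection libraries such as typecode); the names `is_pe` and `compiled_binary_kind` are hypothetical, and universal (fat) Mach-O files are deliberately left out because their `0xCAFEBABE` magic collides with Java class files.

```python
from typing import Optional

ELF_MAGIC = b"\x7fELF"

# Mach-O headers store MH_MAGIC (0xFEEDFACE) or MH_MAGIC_64 (0xFEEDFACF)
# in the byte order of the target CPU, so both byte orders are checked.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",  # 32-bit, big-endian
    b"\xce\xfa\xed\xfe",  # 32-bit, little-endian
    b"\xfe\xed\xfa\xcf",  # 64-bit, big-endian
    b"\xcf\xfa\xed\xfe",  # 64-bit, little-endian
}


def is_pe(data: bytes) -> bool:
    """A PE file starts with a DOS 'MZ' header whose 4-byte little-endian
    field at offset 0x3C points at the 'PE' signature plus two NUL bytes."""
    if len(data) < 0x40 or not data.startswith(b"MZ"):
        return False
    pe_offset = int.from_bytes(data[0x3C:0x40], "little")
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def compiled_binary_kind(data: bytes) -> Optional[str]:
    """Return 'elf', 'mach-o' or 'pe' for a recognized executable header,
    else None. ``data`` should hold a prefix of the file's bytes."""
    if data.startswith(ELF_MAGIC):
        return "elf"
    if data[:4] in MACHO_MAGICS:
        return "mach-o"
    if is_pe(data):
        return "pe"
    return None
```

A caller would typically read a prefix, for example `with open(path, "rb") as f: kind = compiled_binary_kind(f.read(4096))`, and only pass files with a non-None kind to the slower binary metadata extraction. A PE signature placed beyond that prefix would be missed, so the prefix size is a trade-off.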

etc/release/scancode-create-pypi-wheel.sh

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ set -e
 
 ./configure --dev
 venv/bin/scancode-reindex-licenses
+venv/bin/scancode-reindex-package-patterns
 
 python_tag=$( python -c "import platform;print(f\"cp{''.join(platform.python_version_tuple()[:2])}\")" )
 
etc/release/scancode-create-release-app-linux.sh

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ cp -r etc/thirdparty $release_dir/etc
 # Build the wheel
 ./configure --dev
 venv/bin/scancode-reindex-licenses
+venv/bin/scancode-reindex-package-patterns
 venv/bin/python setup.py --quiet bdist_wheel --python-tag cp$python_version
 
 cp -r \

etc/release/scancode-create-release-app-macos.sh

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ cp -r etc/thirdparty $release_dir/etc
 # Build the wheel
 ./configure --dev
 venv/bin/scancode-reindex-licenses
+venv/bin/scancode-reindex-package-patterns
 venv/bin/python setup.py --quiet bdist_wheel --python-tag cp$python_version
 
 cp -r \

etc/release/scancode-create-release-app-windows.sh

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ cp -r etc/thirdparty $release_dir/etc
 # Build the wheel
 ./configure --dev
 venv/bin/scancode-reindex-licenses
+venv/bin/scancode-reindex-package-patterns
 venv/bin/python setup.py --quiet bdist_wheel --python-tag cp$python_version
 
 cp -r \

requirements.txt

Lines changed: 3 additions & 2 deletions
@@ -11,7 +11,7 @@ charset-normalizer==3.4.2
 click==8.3.0;python_version>='3.10'
 click==8.1.7;python_version<'3.10'
 colorama==0.4.6
-commoncode==32.4.0
+commoncode==32.4.2
 construct==2.10.70
 container-inspector==33.0.0
 cryptography==45.0.4
@@ -40,6 +40,7 @@ license-expression==30.4.4
 lxml==5.4.0
 MarkupSafe==3.0.2
 more-itertools==10.7.0
+multiregex==2.0.3
 normality==2.6.1
 packageurl-python==0.17.1
 packaging==25.0
@@ -70,7 +71,7 @@ soupsieve==2.7
 spdx-tools==0.8.2
 text-unidecode==1.3
 tomli==2.3.0
-typecode==30.0.2
+typecode==30.1.0
 typecode-libmagic==5.39.210531
 typing-extensions==4.14.0
 uritools==5.0.0

setup-mini.cfg

Lines changed: 4 additions & 2 deletions
@@ -89,6 +89,7 @@ install_requires =
 license_expression >= 30.4.4
 lxml >= 5.4.0
 MarkupSafe >= 2.1.2
+multiregex >= 2.0.3
 normality <= 2.6.1
 packageurl_python >= 0.9.0
 packvers >= 21.0.0
@@ -112,7 +113,7 @@ install_requires =
 tomli >= 2; python_version < "3.11"
 urlpy
 xmltodict >= 0.11.0
-typecode >= 30.0.1
+typecode >= 30.1.0
 # typecode[full] >= 30.0.1
 # extractcode[full] >= 31.0.0
 
@@ -123,7 +124,7 @@ where = src
 
 [options.extras_require]
 full =
-typecode[full] >= 30.0.0
+typecode[full] >= 30.1.0
 extractcode[full] >= 31.0.0
 
 dev =
@@ -156,6 +157,7 @@ packages =
 console_scripts =
 scancode = scancode.cli:scancode
 scancode-reindex-licenses = licensedcode.reindex:reindex_licenses
+scancode-reindex-package-patterns = packagedcode.cache:cache_package_patterns
 scancode-license-data = licensedcode.license_db:dump_scancode_license_data
 regen-package-docs = packagedcode.regen_package_docs:regen_package_docs
 add-required-phrases = licensedcode.required_phrases:add_required_phrases
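The new `scancode-reindex-package-patterns` entry point splits the work into a build-time step, run once from the Dockerfile and release scripts, and a scan-time step that only loads what was built. A minimal sketch of that split follows, assuming a JSON file as the cache format. `build_pattern_cache`, `load_pattern_cache`, the handler table and the cache path are all hypothetical, and ScanCode's own `packagedcode.cache:cache_package_patterns` may store something quite different (the commit mentions caching multiregex matchers). Because a console_scripts entry point is called with no arguments, the build function only takes an optional path with a default.

```python
import fnmatch
import json
import os
import re
import tempfile

# Hypothetical handler table; ScanCode's actual handlers and patterns differ.
HANDLER_PATTERNS = {
    "gemfile": ["*/Gemfile", "*/*.gemfile"],
    "npm_package_json": ["*/package.json"],
}

# Hypothetical default cache location, used when no path is given.
DEFAULT_CACHE_PATH = os.path.join(tempfile.gettempdir(), "package_patterns_cache.json")


def build_pattern_cache(cache_path=DEFAULT_CACHE_PATH):
    """Build-time step, callable with no arguments like a console_scripts
    target: translate every glob to regex source once and write it to disk."""
    regex_sources = {
        handler: "|".join(fnmatch.translate(glob) for glob in globs)
        for handler, globs in HANDLER_PATTERNS.items()
    }
    with open(cache_path, "w", encoding="utf-8") as cache_file:
        json.dump(regex_sources, cache_file)
    return cache_path


def load_pattern_cache(cache_path=DEFAULT_CACHE_PATH):
    """Scan-time step: read the cache file, building it first if missing,
    and return a mapping of handler name to compiled regex."""
    if not os.path.exists(cache_path):
        build_pattern_cache(cache_path)
    with open(cache_path, encoding="utf-8") as cache_file:
        regex_sources = json.load(cache_file)
    return {handler: re.compile(src) for handler, src in regex_sources.items()}
```

A real cache would also need invalidation, for example when the handler patterns change or when a different Python version translates globs differently; this sketch only rebuilds when the file is missing.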
