website/docs/getting_started/getting_started-software-identification.md
Lines changed: 50 additions & 7 deletions
Original file line number
Diff line number
Diff line change
@@ -14,6 +14,8 @@ Platform Enumeration) and SWID (SoftWare IDentification). Neither has been
14
14
successful and neither was suitable for open source software which now
15
15
represents approximately 80% of software in use according to most surveys.
16
16
17
+
## Package-URL
18
+
17
19
The AboutCode team identified this problem in 2018 in the context of working
18
20
on our ScanCode and VulnerableCode projects. The solution was and is the PURL (Package-URL) specification which has become the most widely used software
19
21
identifier for open source software. PURL is now an Ecma standard - [ECMA-427](https://ecma-tc54.github.io/ECMA-427/), and it is on a fast track to become
@@ -27,6 +29,10 @@ specification which will be submitted to Ecma as a standard in 2026.
27
29
28
30
See the [Package-URL website](https://package-url.github.io/www.packageurl.org/) for more information about PURL and VERS.
29
31
32
+
See the Package-URL (PURL) projects section of the Home page for more
33
+
information about AboutCode tools that provide PURL- and VERS-specific
34
+
capabilities.
35
+
30
36
## Identify software packages and components
31
37
For the basic use case of identifying software packages and components,
32
38
AboutCode offers the DejaCode and ScanCode tools, the PURLDB database and the PURL standard.
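For readers new to PURL, a minimal Python sketch of how a Package-URL string breaks down into its components may help here. It is a simplified illustration that assumes the general `pkg:type/namespace/name@version?qualifiers#subpath` layout; `SimplePurl` and `parse_purl` are hypothetical names, and the sketch skips the per-type normalization rules of the specification. For real work, a maintained PURL library is the better choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import unquote


@dataclass
class SimplePurl:
    """Hypothetical container for the parts of a Package-URL."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def parse_purl(purl: str) -> SimplePurl:
    """Split a PURL string into components (simplified, no per-type rules)."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a PURL: {purl!r}")

    # Peel off the optional parts from the right: #subpath, then ?qualifiers.
    rest, _, subpath = rest.lstrip("/").partition("#")
    rest, _, query = rest.partition("?")

    # The first path segment is the package type, e.g. "pypi" or "maven".
    pkg_type, sep, path = rest.partition("/")
    path = path.strip("/")
    if not sep or not pkg_type or not path:
        raise ValueError(f"PURL needs a type and a name: {purl!r}")

    # The version follows the last literal "@". An "@" inside a namespace is
    # percent-encoded as "%40", so it is not mistaken for the separator.
    version = ""
    if "@" in path:
        path, _, version = path.rpartition("@")

    # Everything before the last "/" is the namespace; the rest is the name.
    namespace, _, name = path.rpartition("/")
    if not name:
        raise ValueError(f"PURL needs a name: {purl!r}")

    # Qualifiers are "key=value" pairs joined by "&"; empty values are dropped.
    qualifiers: Dict[str, str] = {}
    for pair in query.split("&"):
        key, eq, value = pair.partition("=")
        if key and eq and value:
            qualifiers[key.lower()] = unquote(value)

    return SimplePurl(
        type=pkg_type.lower(),
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=unquote(version) or None,
        qualifiers=qualifiers,
        subpath=unquote(subpath.strip("/")) or None,
    )


if __name__ == "__main__":
    print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0?type=jar"))
```

The point of the sketch is that the package type, namespace, name and version are all recoverable from one string, which is what lets two parties in a supply chain agree on the identity of a package.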
@@ -76,7 +82,7 @@ Dependencies (from package manifest files). The scan data also includes
76
82
detailed information about image layers and their file content.
77
83
78
84
If you conclude that the ScanCode.io inventory is accurate, you can
79
-
export the data CycloneDX or SPDX SBOM format, or in JSON or XLSX format
85
+
export the data in CycloneDX or SPDX SBOM format, or in JSON or XLSX format
80
86
for use in another application.
81
87
82
88
If you need to update or enhance the scan data before you produce an SBOM, DejaCode provides several options.
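As an illustration of how an exported SBOM can be checked before it is passed on, the sketch below lists the PURLs recorded in a CycloneDX JSON document. It assumes the usual CycloneDX JSON layout, in which each entry of the top-level `components` array may carry a `purl` field, and it does not descend into nested components. The function name `list_component_purls` and the sample data are hypothetical and are not actual ScanCode.io output.

```python
import json
from typing import List


def list_component_purls(sbom_text: str) -> List[str]:
    """Return the sorted PURLs of the top-level components in a CycloneDX JSON SBOM."""
    sbom = json.loads(sbom_text)
    purls = []
    for component in sbom.get("components", []):
        purl = component.get("purl")
        if purl:  # components without a PURL (e.g. plain files) are skipped
            purls.append(purl)
    return sorted(purls)


# Hypothetical sample document, trimmed to the fields used above.
SAMPLE_SBOM = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "django", "version": "4.2",
         "purl": "pkg:pypi/django@4.2"},
        {"type": "library", "name": "left-pad", "version": "1.3.0",
         "purl": "pkg:npm/left-pad@1.3.0"},
        {"type": "file", "name": "README.md"},
    ],
})

if __name__ == "__main__":
    for purl in list_component_purls(SAMPLE_SBOM):
        print(purl)
```

Components that come back without a PURL are natural candidates for the update or enhancement step before the SBOM is shared.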
@@ -97,17 +103,54 @@ then:
97
103
used in the ScanCode.io `analyze_docker_image` pipeline for the layer analysis,
98
104
but you can also use it as a command line utility.
99
105
106
+
## Consume or produce SBOMs
107
+
The EU CRA (Cyber Resilience Act) and other regulatory initiatives have
108
+
dramatically raised the importance of SBOMs (Software Bills of Materials) for
109
+
compliance with security risk management laws and regulations. A key challenge in using SBOMs is the reliable identification of software packages so that someone else in your software supply chain (upstream or downstream) will recognize the same package identity. The PURL (Package-URL) standard [ECMA-427](https://ecma-tc54.github.io/ECMA-427/) provides the most popular solution.
110
+
111
+
**DejaCode** and **ScanCode.io** both provide full capabilities to import or export SBOMs in CycloneDX or SPDX format using PURL as the standard software
112
+
identifier.
113
+
114
+
## Match binaries to source
115
+
One of the most difficult software identification tasks is to match the "binary" files that you distribute or deploy (on a device or in the cloud) to the corresponding "source" files from your development/build systems. In the
116
+
AboutCode community we consider binary-source matching to be a subset of the
117
+
much larger domain of matching "deploy" files to "devel" files. AboutCode offers several tools for this matching challenge:
118
+
119
+
- [ScanCode.io](https://scancodeio.readthedocs.io/en/latest/) supports "deploy-to-devel" matching with the `map_deploy_to_develop` pipeline.
120
+
This pipeline currently handles:
121
+
122
+
- Matching Linux ELF, Windows, macOS or Rust binaries to source
123
+
- Matching Go binaries to source
124
+
- Matching Java `jar` or `class` files to corresponding Java, Kotlin or
125
+
Scala source files
126
+
- Matching minified JavaScript to corresponding TS or JS files
127
+
- And other use cases
128
+
129
+
- [MatchCode Toolkit](https://github.com/aboutcode-org/matchcode-toolkit/blob/main/README.rst) is a Python library that provides the file and directory fingerprinting functionality for ScanCode Toolkit and ScanCode.io using
130
+
the HaloHash algorithm. You can use the **MatchCode Toolkit** as a library.
131
+
132
+
- ScanCode uses several AboutCode libraries to analyze "deploy" files
133
+
including:
134
+
- [binary-inspector](https://github.com/aboutcode-org/binary-inspector/blob/main/README.rst) extracts symbols from binaries in ELF, Mach-O, WinPE and
135
+
other formats
136
+
- [elf-inspector](https://github.com/aboutcode-org/elf-inspector/blob/main/README.rst) collects data from ELF binaries
137
+
- [go-inspector](https://github.com/aboutcode-org/go-inspector/blob/main/README.rst) extracts dependencies and symbols from Go binaries
138
+
- [rust-inspector](https://github.com/aboutcode-org/rust-inspector/blob/main/README.rst) extracts dependencies and symbols from Rust binaries
139
+
- [source-inspector](https://github.com/aboutcode-org/source-inspector/blob/main/README.rst) collects code symbols, strings and comments from source files
140
+
141
+
These are all Python utilities that can also be used independently.
100
142
101
-
## Match binaries to sources
102
-
103
-
143
+
## Identify software dependencies
144
+
There are many use cases that include identification of package software