
Commit 49bf1f9

Update getting_started-software-identification.md
Added Analyze Containers section

Signed-off-by: Michael Herzog <mjherzog@nexb.com>
1 parent f5b508b commit 49bf1f9

1 file changed

Lines changed: 59 additions & 11 deletions

website/docs/getting_started/getting_started-software-identification.md

@@ -10,7 +10,7 @@ Commercial software suppliers, of course, each have a set of software
1010
identifiers for their own products but these only work within their
1111
particular customer base. There have been two prominent attempts to
1212
standardize software identifiers for proprietary software - CPE (Common
13-
Platform Ennumeration) and SWID (SoftWare IDentification). Neither has been
13+
Platform Enumeration) and SWID (SoftWare IDentification). Neither has been
1414
successful and neither was suitable for open source software which now
1515
represents approximately 80% of software in use according to most surveys.
1616

@@ -22,11 +22,10 @@ an ISO standard.
2222
Our team also identified a related problem - after you have a standard way
2323
to identify software packages, what is a standard way to compare software
2424
package versions to determine whether a reported vulnerability affects the
25-
version that you use. Our solution is the VERS (VErsion Range Specifier)
25+
version that you use? Our solution is the VERS (VErsion Range Specifier)
2626
specification which will be submitted to Ecma as a standard in 2026.
2727

28-
See https://package-url.github.io/www.packageurl.org/ for more
29-
information about PURL and VERS.
28+
See the [Package-URL website](https://package-url.github.io/www.packageurl.org/) for more information about PURL and VERS.
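
As a rough illustration of what a PURL encodes, the following Python sketch splits a PURL string of the general form `pkg:type/namespace/name@version?qualifiers#subpath` into its components. It is a simplified stand-in and not the reference implementation: `SimplePurl` and `parse_purl` are hypothetical names, validation and type-specific normalization are omitted, and the example PURL is only illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
from urllib.parse import unquote


@dataclass
class SimplePurl:
    """Hypothetical container for the components of a PURL."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def _split_right(text: str, sep: str) -> Tuple[str, Optional[str]]:
    """Split on the last occurrence of sep; the tail is None if sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, None)


def parse_purl(purl: str) -> SimplePurl:
    """Parse a PURL string into its parts (simplified, no type-specific rules)."""
    remainder, subpath = _split_right(purl, "#")
    remainder, query = _split_right(remainder, "?")

    scheme, found, remainder = remainder.partition(":")
    if not found or scheme.lower() != "pkg":
        raise ValueError(f"Not a PURL (missing 'pkg:' scheme): {purl!r}")

    ptype, found, remainder = remainder.strip("/").partition("/")
    if not found or not ptype or not remainder:
        raise ValueError(f"PURL needs both a type and a name: {purl!r}")

    remainder, version = _split_right(remainder, "@")
    head, found, name = remainder.rpartition("/")
    namespace = unquote(head) if found else None

    qualifiers: Dict[str, str] = {}
    for pair in (query or "").split("&"):
        key, has_value, value = pair.partition("=")
        if has_value and value:
            qualifiers[key.lower()] = unquote(value)

    return SimplePurl(
        type=ptype.lower(),
        namespace=namespace,
        name=unquote(name),
        version=unquote(version) if version else None,
        qualifiers=qualifiers,
        subpath=unquote(subpath).strip("/") if subpath else None,
    )


if __name__ == "__main__":
    print(parse_purl("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie"))
```

Splitting from the right for the subpath, qualifiers and version keeps a percent-encoded `@` or `/` inside the namespace or name from being mistaken for a separator.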
3029

3130
## Identify software packages and components
3231
For the basic use case of identifying software packages and components,
@@ -41,27 +40,76 @@ container.
4140

4241
- [ScanCode.io](https://scancodeio.readthedocs.io/en/latest/) is an
4342
application to scan codebases, packages, containers or other software collections. ScanCode.io uses a specific pipeline for scanning or analyzing
44-
each software target and provides a database with UI and API access to your scans. ScanCode.io is usually a good place to get started in the AboutCode ecosystem. You normally run ScanCode.io as a Docker container.
43+
each software target and provides a database with UI and API access to your scans. ScanCode.io is usually a good place to get started in the AboutCode ecosystem. You normally run ScanCode.io as a Docker container. You can export
44+
scan data in many formats including: JSON, XLSX, CycloneDX SBOM, SPDX SBOM, or
45+
an attribution-notice.
4546

4647
- [ScanCode Toolkit](https://scancode-toolkit.readthedocs.io/en/stable/) is a
4748
library (and command line utility) that provides the scanning engine for
4849
ScanCode.io. Its primary functions are to identify:
49-
- Software licenses based on matching license notices and texts to the
50-
[ScanCode LicenseDB](https://scancode-licensedb.aboutcode.org/help.html)
50+
- Software licenses based on matching license notices and texts to ScanCode
51+
license detection rules
5152
- Software origin based on copyright or author notices, email addresses, URLs and other clues
52-
- Software codebase structure including directories and files with exentensive file information such as size, MIME type, file type, programming language,
53+
- Software codebase structure including directories and files with extensive file information such as size, MIME type, file type, programming language,
5354
checksums (MD5,SHA1,SHA256,SHA512) and more
5455

56+
- [ScanCode LicenseDB](https://scancode-licensedb.aboutcode.org/index.html)
57+
provides license text and metadata for 2,470 open source and other third-party
58+
licenses (and growing). Each license has an SPDX license identifier using the `LicenseRef-scancode` namespace for licenses that are not yet included in the
59+
SPDX License List.
60+
5561
- [PURLDB](https://purldb.readthedocs.io/en/stable/) provides tools to create and manage a database of package metadata keyed by PURL. You can use PURLDB
56-
data via API to enrich your package and SBOM data in DejaCode, ScanCode.io.,
62+
data via API to enrich your package and SBOM data in DejaCode, ScanCode.io,
5763
or your own application. The AboutCode team also currently hosts a public [PURLDB](https://public.purldb.io/api/) service with REST API.
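
The file checksums listed above for ScanCode Toolkit (MD5, SHA1, SHA256, SHA512) can be reproduced for a single file with the Python standard library, which helps when cross-checking scan output by hand. This is a minimal sketch, not ScanCode Toolkit's own code; `file_checksums` and `ALGORITHMS` are hypothetical names.

```python
import hashlib
from typing import Dict

# The four digest algorithms named in the ScanCode Toolkit description.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def file_checksums(path: str, chunk_size: int = 65536) -> Dict[str, str]:
    """Return hex digests for one file, reading it once in fixed-size chunks."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


if __name__ == "__main__":
    import sys

    for algorithm, digest in file_checksums(sys.argv[1]).items():
        print(f"{algorithm}: {digest}")
```

Reading the file once and feeding each chunk to all four hashers avoids loading large files into memory.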
5864

65+
## Analyze containers
66+
The analysis of containers to produce inventories or SBOMs for the software
67+
contents has become a very common and high priority task due to the high and
68+
increasing volume of software deployed on containers and the large volume of
69+
software deployed in most containers. For this use case, the primary AboutCode
70+
tools and data are:
71+
72+
- [ScanCode.io](https://scancodeio.readthedocs.io/en/latest/) provides the
73+
`analyze_docker_image` pipeline for container analysis. This will produce a
74+
software inventory for Resources (all files), Packages (package metadata),
75+
and Dependencies (from package manifest files). The scan data also includes
76+
detailed information about image layers and their file content.
77+
78+
If you conclude that the ScanCode.io inventory is accurate, you can
79+
export the data in CycloneDX or SPDX SBOM format, or in JSON or XLSX format
80+
for use in another application.
81+
82+
If you need to update or enhance the scan data before you produce an SBOM, DejaCode provides several options.
83+
84+
- [DejaCode](https://dejacode.readthedocs.io/en/latest/) is highly integrated
85+
with ScanCode so that you can easily import ScanCode scan data from ScanCode
86+
Toolkit or ScanCode.io into DejaCode as a **Product**. In DejaCode, you can
87+
then:
88+
89+
- Enrich the package data from PURLDB
90+
- Apply your license usage policies
91+
- Apply your vulnerability risk policies
92+
- Update the Product package and component data as needed
93+
- Generate an SBOM in CycloneDX or SPDX format
94+
- Generate an attribution notice
95+
96+
- [container-inspector](https://github.com/aboutcode-org/container-inspector/blob/main/README.rst) is a static analysis tool to analyze the structure of software components in a container image. container-inspector is
97+
used in the ScanCode.io `analyze_docker_image` pipeline for the layer analysis,
98+
but you can also use it as a command line utility.
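
For a sense of how a container analysis might be started programmatically, the sketch below builds (but does not send) a request that creates a ScanCode.io project running the `analyze_docker_image` pipeline. Treat most of it as assumptions to check against the ScanCode.io REST API documentation for your version: the `/api/projects/` endpoint, the `name`, `input_urls`, `pipeline` and `execute_now` fields, and the `docker://` input reference. The base URL, project name and image reference are placeholders, and `build_project_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_project_request(base_url: str, project_name: str, image_ref: str) -> Request:
    """Build a POST request that asks ScanCode.io to analyze a container image.

    Assumption: the server accepts a JSON body with these field names on
    <base_url>/api/projects/. Verify against your ScanCode.io version.
    """
    payload = {
        "name": project_name,
        "input_urls": image_ref,             # assumed form, e.g. "docker://alpine:3.19"
        "pipeline": "analyze_docker_image",  # pipeline named in the section above
        "execute_now": True,                 # assumed flag to start the run right away
    }
    return Request(
        url=base_url.rstrip("/") + "/api/projects/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    request = build_project_request("http://localhost", "alpine-demo", "docker://alpine:3.19")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request (for example with `urllib.request.urlopen`) requires a running ScanCode.io instance, plus an API key header if authentication is enabled on that instance.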
99+
100+
101+
## Match binaries to sources
102+
103+
104+
59105

60106
## Identify software dependencies
107+
inspectors
108+
61109

62110
## Consume or produce SBOMs
63111

64-
## Analyze containers
65112

66-
## Match binaries to sources
113+
114+
67115
