- Remove trailing whitespaces
- Ensure one blank line before and after headings/lists
- Enforce one H1 heading in a document, the rest will be H2, H3, etc.
- Standardize the use of bullet markup (using all `-`, instead of mixing `-` and `*`)
Signed-off-by: Arthit Suriyawongkul <arthit@gmail.com>
DOCUMENTATION.md (19 additions & 3 deletions)
@@ -1,22 +1,27 @@
1
1
# Code architecture documentation
2
2
3
3
## Package Overview
4
+
4
5
Beneath the top-level package `spdx_tools` you will find three sub-packages:
6
+
5
7
- `spdx`, which contains the code to create, parse, write and validate SPDX documents of versions 2.2 and 2.3
6
8
- `spdx3`, which will contain the same feature set for versions 3.x once they are released
7
9
- `common`, which contains code that is shared between the different versions, such as type-checking and `spdx_licensing`.
8
10
9
11
## `spdx`
12
+
10
13
The `spdx` package contains the code dealing with SPDX-2 documents.
11
14
The subpackages divide the code into logically independent chunks. Shared code can be found in the top-level modules here.
12
15
`model`, `parser`, `validation` and `writer` constitute the four main components of this library and are further described below.
13
16
`clitools` serves as the entrypoint for the command `pyspdxtools`.
14
17
`jsonschema` and `rdfschema` contain code specific to the corresponding serialization format.
15
18
16
19
### `model`
20
+
17
21
The internal data model closely follows the [official SPDX-2.3 specification](https://spdx.github.io/spdx-spec/v2.3/).
18
22
19
23
The entry point to the model is the `Document` class, which has the following attributes:
24
+
20
25
- `creation_info`: a single instance of the `CreationInfo` class
21
26
- `packages`: a list of `Package` objects
22
27
- `files`: a list of `File` objects
@@ -35,6 +40,7 @@ A custom extension of the `@dataclass` annotation is used that is called `@datac
35
40
Apart from all the usual `dataclass` functionality, this implements fields of a class as properties with their own getter and setter methods.
36
41
This is used in particular to implement type checking when properties are set.
37
42
The source of truth for these checks is the set of attribute definitions at the start of the respective class, which must specify the correct type hints.
43
+
38
44
The `beartype` library is used to check type conformity (`typeguard` was used in the past but has been replaced since due to performance issues).
39
45
In case of a type mismatch a `TypeError` is raised. To ensure that all possible type errors are found during the construction of an object,
40
46
a custom `__init__()` that calls `check_types_and_set_values()` is part of every class.
@@ -43,26 +49,31 @@ This function tries to set all values provided by the constructor and collects a
43
49
For the SPDX values `NONE` and `NOASSERTION` the classes `SpdxNone` and `SpdxNoAssertion` are used, respectively. Both can be instantiated without any arguments.
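
The type checking described above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: `TypeCheckedRecord` and `ToyPackage` are hypothetical names, `isinstance` stands in for `beartype`, and only plain classes are supported as type hints. What it does show is the idea of collecting every type error during construction instead of stopping at the first one.

```python
from typing import Any, List, get_type_hints


class TypeCheckedRecord:
    """Hypothetical base class: checks assigned values against class-level type hints."""

    def __init__(self, **values: Any) -> None:
        errors: List[str] = []
        for name, value in values.items():
            try:
                setattr(self, name, value)  # routed through __setattr__ below
            except TypeError as err:
                errors.append(str(err))  # keep going so all mismatches are reported
        if errors:
            raise TypeError("; ".join(errors))

    def __setattr__(self, name: str, value: Any) -> None:
        # The class-level annotations are the source of truth for the check.
        expected = get_type_hints(type(self)).get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
        super().__setattr__(name, value)


class ToyPackage(TypeCheckedRecord):
    # Hypothetical, heavily reduced stand-in for a model class.
    spdx_id: str
    name: str


def collect_type_errors() -> str:
    """Construct an object with two wrong types and return the combined error text."""
    try:
        ToyPackage(spdx_id=1, name=2)
    except TypeError as err:
        return str(err)
    return ""
```

Because the check lives in `__setattr__`, it also applies to later assignments such as `package.name = 5`, not only to construction.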
44
50
45
51
### `parser`
52
+
46
53
The parsing and writing modules are split into subpackages according to the serialization formats: `json`, `yaml`, `xml`, `tagvalue` and `rdf`.
47
54
As the first three share the same tree structure that can be parsed into a dictionary, their shared logic is contained in the `jsonlikedict` package.
48
55
One overarching concept of all parsers is to keep collecting parsing errors (like faulty types or missing mandatory fields) for as long as possible before failing.
49
56
Thus, the `SPDXParsingError` that is finally raised collects as much information as possible about all parsing errors that occurred.
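
The error-collecting approach can be sketched as follows. This is an illustration under assumptions, not the library's parser: `ToyParsingError` and `parse_toy_package` are hypothetical names, and the two fields checked are chosen only for the example.

```python
from typing import Any, Dict, List


class ToyParsingError(Exception):
    """Hypothetical error type that carries every message collected while parsing."""

    def __init__(self, messages: List[str]) -> None:
        super().__init__("; ".join(messages))
        self.messages = messages


def parse_toy_package(data: Dict[str, Any]) -> Dict[str, str]:
    """Check all fields first, then fail once with the full list of problems."""
    messages: List[str] = []

    name = data.get("name")
    if name is None:
        messages.append("missing mandatory field: name")
    elif not isinstance(name, str):
        messages.append("name must be a string")

    spdx_id = data.get("SPDXID")
    if spdx_id is None:
        messages.append("missing mandatory field: SPDXID")
    elif not isinstance(spdx_id, str):
        messages.append("SPDXID must be a string")

    if messages:
        # Raised only after every field has been inspected.
        raise ToyParsingError(messages)
    return {"name": name, "spdx_id": spdx_id}
```

A caller can then report all problems of an input in one pass instead of fixing and re-running repeatedly.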
50
57
51
58
#### `tagvalue`
59
+
52
60
Since Tag-Value is an SPDX-specific format, there exist no readily available parsers for it.
53
-
This library implements its own deserialization code using the `ply` library's `lex` module for lexing and the `yacc` module for parsing.
61
+
This library implements its own deserialization code using the `ply` library's `lex` module for lexing and the `yacc` module for parsing.
54
62
55
63
#### `rdf`
64
+
56
65
The `rdflib` library is used to deserialize RDF graphs from XML format.
57
-
The graph is then being parsed and translated into the internal data model.
66
+
The graph is then being parsed and translated into the internal data model.
58
67
59
68
#### `json`, `yaml`, `xml`
69
+
60
70
In a first step, all three of JSON, YAML and XML formats are deserialized into a dictionary representing their tree structure.
61
71
This is achieved via the `json`, `yaml` and `xmltodict` packages, respectively.
62
72
Special care must be taken in the XML case, since XML does not natively support lists and numbers.
63
73
The logic concerning the translation from these dicts to the internal data model can be found in the `jsonlikedict` package.
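
To illustrate the XML caveat: a dictionary derived from XML typically holds a single child element as a plain value rather than a one-element list, and numeric values arrive as strings. The sketch below shows one way such input could be normalized before it is translated into a data model. The helper names and the `checksums` and `byteOffset` keys are assumptions for this example and are not taken from the `jsonlikedict` package.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Wrap a lone element in a list; pass real lists through unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def as_int(value: Any) -> int:
    """Convert a numeric string (as produced from XML text) to an int."""
    return int(value)


def normalize_toy_file(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a hypothetical file entry so that JSON, YAML and XML input look alike."""
    return {
        "checksums": as_list(raw.get("checksums")),
        "byte_offset": as_int(raw["byteOffset"]),
    }
```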
64
74
65
75
### `writer`
76
+
66
77
For serialization purposes, only non-null fields are written out.
67
78
All writers expect a valid SPDX document from the internal model as input.
68
79
To ensure this is actually the case, the standard behaviour of every writer function is to call validation before the writing process.
@@ -71,18 +82,21 @@ Also by default, all list properties in the model are scanned for duplicates whi
71
82
This can be disabled by setting the `drop_duplicates` boolean to false.
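
As a rough illustration of duplicate removal on a list property, the following sketch drops repeated entries while keeping the original order. The function name is hypothetical and the approach (comparison by equality, so that unhashable objects also work) is an assumption, not a description of the library's code.

```python
from typing import List, TypeVar

T = TypeVar("T")


def drop_duplicates_preserving_order(items: List[T]) -> List[T]:
    """Return a new list without repeated entries, keeping first occurrences."""
    result: List[T] = []
    for item in items:
        # Equality-based membership test: quadratic, but works for unhashable items.
        if item not in result:
            result.append(item)
    return result
```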
72
83
73
84
#### `tagvalue`
85
+
74
86
The ordering of the tags follows the [example in the official specification](https://github.com/spdx/spdx-spec/blob/development/v2.3.1/examples/SPDXTagExample-v2.3.spdx).
75
87
76
88
#### `rdf`
89
+
77
90
The RDF graph is constructed from the internal data model and serialized to XML format afterward, using the `rdflib` library.
78
91
79
92
#### `json`, `yaml`, `xml`
93
+
80
94
As all three of JSON, YAML and XML formats share the same tree structure, the first step is to generate the dictionary representing that tree.
81
95
This is achieved by the `DocumentConverter` class in the `jsonschema` package.
82
96
Subsequently, the dictionary is serialized using the `json`, `yaml` and `xmltodict` packages, respectively.
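
The two-step approach (build a dictionary, then serialize it) combined with the rule that only non-null fields are written out can be sketched with the standard `json` module. The function names and the `versionInfo` and `comment` keys are chosen for illustration; this is not the `DocumentConverter` class.

```python
import json
from typing import Any, Dict, Optional


def to_toy_dict(name: str, version: Optional[str], comment: Optional[str]) -> Dict[str, Any]:
    """Step 1: build the tree as a dictionary, leaving out fields that are None."""
    candidate = {"name": name, "versionInfo": version, "comment": comment}
    return {key: value for key, value in candidate.items() if value is not None}


def write_toy_json(name: str, version: Optional[str] = None, comment: Optional[str] = None) -> str:
    """Step 2: serialize the dictionary with the standard json module."""
    return json.dumps(to_toy_dict(name, version, comment), indent=2)
```

The same dictionary could be handed to a YAML or XML serializer instead, which is why the dictionary step is shared.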
83
97
84
-
85
98
### `validation`
99
+
86
100
The `validation` package takes care of all nonconformities with the SPDX specification that are not due to incorrect typing.
87
101
This mainly includes checks for correctly formatted strings or the actual existence of referenced SPDXIDs.
88
102
The entry point is the `document_validator` module with the `validate_full_spdx_document()` function.
@@ -93,6 +107,7 @@ Validation and reference checking of SPDXIDs (and possibly external document ref
93
107
For the validation of license expressions we utilise the `license-expression` library's `validate` and `parse` functions, which take care of checking license symbols against the [SPDX license list](https://spdx.org/licenses/).
94
108
95
109
Invalidities are captured in instances of a custom `ValidationMessage` class. This has two attributes:
110
+
96
111
- `validation_message` is a string that describes the actual problem
97
112
- `validation_context` is a `ValidationContext` object that helps to pinpoint the source of the problem by providing the faulty element's SPDXID (if it has one), the parent SPDXID (if that is known), the element's type and finally the full element itself.
98
113
It is left to the implementer to decide which of this information to use when evaluating the validation results.
@@ -101,6 +116,7 @@ Every validation function returns a list of `ValidationMessage` objects, which a
101
116
That is, if an empty list is returned, the document is valid.
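
The "list of messages, empty means valid" convention can be sketched as below. The classes are reduced, hypothetical stand-ins (`ToyValidationMessage`, `ToyValidationContext`) that mirror only the two attributes described above, and the SPDXID pattern reflects the `SPDXRef-[idstring]` format of the SPDX 2.3 specification.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# idstring: letters, digits, "." and "-"
SPDX_ID_PATTERN = re.compile(r"SPDXRef-[A-Za-z0-9.\-]+")


@dataclass
class ToyValidationContext:
    """Hypothetical, reduced context: where did the problem occur?"""
    spdx_id: Optional[str] = None
    element_type: Optional[str] = None


@dataclass
class ToyValidationMessage:
    """Hypothetical message pairing a description with its context."""
    validation_message: str
    validation_context: ToyValidationContext


def validate_toy_spdx_ids(spdx_ids: List[str]) -> List[ToyValidationMessage]:
    """Return one message per malformed SPDXID; an empty list means all are valid."""
    messages: List[ToyValidationMessage] = []
    for spdx_id in spdx_ids:
        if SPDX_ID_PATTERN.fullmatch(spdx_id) is None:
            messages.append(
                ToyValidationMessage(
                    f'spdx_id must match "SPDXRef-[idstring]", but is: {spdx_id}',
                    ToyValidationContext(spdx_id=spdx_id, element_type="package"),
                )
            )
    return messages
```

Since each function returns a plain list, results from several validators can be concatenated and checked once at the end.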
102
117
103
118
## `spdx3`
119
+
104
120
Because the SPDX-3 model is still in development, this package is a work in progress.
105
121
However, as the basic building blocks of parsing, writing, creation and validation are still important in the new version,
106
122
the `spdx3` package is planned to be structured similarly to the `spdx` package.