Commit 263bee2
Support equivalent words in license detection #4190
Handle similar words in license detection by allowing multiple
"legalese words" to have the same token id.
Regenerate the token ids accordingly.
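
The idea of several "legalese words" sharing one token id can be sketched as below. This is a minimal illustration, not ScanCode's actual code: the word groups and the names `EQUIVALENT_WORDS`, `build_dictionary` and `tokenize` are hypothetical.

```python
# Hypothetical word groups: every word in a group is treated as equivalent.
EQUIVALENT_WORDS = [
    ("license", "licence"),
    ("copyright",),
    ("warranty",),
]


def build_dictionary(word_groups):
    """Return a word -> token id mapping; words in one group share an id."""
    dictionary = {}
    for tid, group in enumerate(word_groups):
        for word in group:
            dictionary[word] = tid
    return dictionary


def tokenize(text, dictionary):
    """Map the known words of ``text`` to token ids, skipping unknown words."""
    return [dictionary[w] for w in text.lower().split() if w in dictionary]


dictionary = build_dictionary(EQUIVALENT_WORDS)
```

With such a mapping, two texts that differ only in the spelling of an equivalent word produce the same sequence of token ids, which is why rules differing only in that way become duplicates.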
Convert Index.tokens_by_tid to a computed property, available on demand.
Convert tokens_by_tid to a dictionary from a list.
Ensure that all code relying on the tokens_by_tid is updated as needed.
All such usages were only for testing and debugging.
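
A computed, on-demand `tokens_by_tid` returning a dictionary could look roughly like the sketch below. The `Index` class here is a simplified, hypothetical stand-in, and the shape of the values (a list of words per token id) is an assumption rather than something stated in this commit.

```python
class Index:
    """Hypothetical, simplified stand-in for a license index."""

    def __init__(self, dictionary):
        # word -> token id; several words may share one id
        self.dictionary = dictionary

    @property
    def tokens_by_tid(self):
        """Build the token id -> words mapping on demand; nothing is stored."""
        by_tid = {}
        for word, tid in self.dictionary.items():
            by_tid.setdefault(tid, []).append(word)
        return by_tid


idx = Index({"license": 0, "licence": 0, "warranty": 1})
```

Because the mapping is rebuilt on each access, it costs nothing unless it is actually requested, which fits a value needed only for testing and debugging.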
Deprecate all rules that are duplicated under this new regime, where
tokens like "license" and "licence" are now treated as identical.
Update the test suite to test the detection of all deprecated licenses
and rules as a sanity check. A deprecated rule with "relevance" set to 0
is not tested, as some rules are deprecated because they are false
positives and should no longer be detected. Also improve the validation
and loading of rule relevance, including the case of zero relevance.
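
The relevance handling and the test-selection condition described above might be sketched as follows. The function names are hypothetical, the 0 to 100 range is an assumption, and treating zero as distinct from "not set" is one reading of "the case for zero relevance".

```python
def validate_relevance(value):
    """Return the relevance as an int in 0..100 (assumed range) or raise.

    Zero is a valid value and is kept distinct from "not set" (None).
    """
    if value is None:
        raise ValueError("relevance is not set")
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"relevance must be an integer: {value!r}")
    if not 0 <= value <= 100:
        raise ValueError(f"relevance must be between 0 and 100: {value!r}")
    return value


def should_test_detection(is_deprecated, relevance):
    """Skip only rules that are both deprecated and of zero relevance."""
    return not (is_deprecated and validate_relevance(relevance) == 0)
```

Checking `value is None` instead of relying on truthiness is what keeps a relevance of 0 from being silently treated as missing.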
Update ambiguous or conflicting rules as needed.
In particular ensure that all rules in the style of "MIT or GPL"
without a GPL version are now reported consistently as:
"mit or gpl-1.0-plus"
Add new rules as needed to resolve failing tests and improve accuracy.
Improve deprecated support for rules and licenses, adding a new
"replaced_by" list attribute that lists the new expressions that must be
detected from scanning the deprecated license or rule text.
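
One way to picture the "replaced_by" check is the sketch below. The `Rule` dataclass, `check_deprecated`, `fake_detect` and the rule identifier are all hypothetical; only the `replaced_by` attribute name and the "mit or gpl-1.0-plus" expression come from this commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical, minimal rule record; not ScanCode's Rule class."""
    identifier: str
    text: str
    is_deprecated: bool = False
    # Expressions that must be detected when scanning this rule's text.
    replaced_by: list = field(default_factory=list)


def check_deprecated(rule, detect):
    """Return the replaced_by expressions NOT detected in the rule text.

    ``detect`` is any callable taking a text and returning expressions.
    """
    detected = set(detect(rule.text))
    return [expr for expr in rule.replaced_by if expr not in detected]


def fake_detect(text):
    """Stand-in detector used only for this illustration."""
    return ["mit or gpl-1.0-plus"] if "MIT or GPL" in text else []


rule = Rule(
    identifier="example.RULE",  # hypothetical identifier
    text="Licensed under MIT or GPL",
    is_deprecated=True,
    replaced_by=["mit or gpl-1.0-plus"],
)
missing = check_deprecated(rule, fake_detect)
```

An empty `missing` list means every replacement expression was found when scanning the deprecated text.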
Reference: #4190
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
1 parent e830934 commit 263bee2
1,945 files changed
Lines changed: 16175 additions & 10593 deletions
File tree
- etc/scripts/licenses
- src
- formattedcode
- licensedcode
- data
- licenses
- rules
- packagedcode
- scancode
- tests
- formattedcode/data
- common
- csv/livescan
- yaml
- licensedcode
- data
- cache/data/rules
- datadriven
- external
- fossology-licenses
- fossology-tests
- BSD
- Dual-license
- LGPL
- MirOS
- Non-profit
- UnclassifiedLicense
- glc
- slic-tests
- 6
- lic1
- lic2
- github_keys
- lic3
- lic4
- unknown
- licenses_reference_reporting
- plugin_license
- license_reference
- mock_index
- packagedcode
- data
- debian/copyright
- debian-2019-11-15/main
- a
- appstream
- asterisk
- d
- devscripts
- dovecot
- g
- ghostscript
- glib2.0
- n
- ncbi-tools6
- nextcloud-desktop
- o/open-infrastructure-compute-tools
- p/perl
- s/slirp4netns
- instance
- license_detection
- reference-at-manifest
- reference-to-package
- m2
- c3p0/c3p0/0.9.0.4
- depman
- javassist/javassist/3.4.GA
- javax/persistence/persistence-api/1.0
- jboss/jboss-archive-browsing/5.0.0alpha-200607201-119
- org
- apache
- commons/commons-jaxrs/1.22
- maven/plugins/maven-dependency-plugin/2.0
- codehaus/mojo/maven-buildnumber-plugin/0.9.6
- hibernate
- hibernate-annotations
- 3.2.1.ga
- 3.3.1.GA
- hibernate-commons-annotations/3.0.0.ga
- hibernate-entitymanager
- 3.2.1.ga
- 3.3.2.GA
- hibernate
- 3.2.1.ga
- 3.2.6.ga
- maven2/logback-access
- plugin
- pypi
- metadata/v10
- source-package
- scancode
- summarycode/data
- classify
- plugin_consolidate
- score
- summary
- conflicting_license_categories
- end-2-end
- holders
- license_ambiguity
- multiple_package_data
- summary_without_holder
- with_package_data
- without_package_data
- tallies
- end-2-end
- full_tallies