README.md: 13 additions & 0 deletions
@@ -95,6 +95,19 @@ We recognize the following properties:

We use different supervised classifiers, header analysis, regular expressions, the GitHub/Gitlab API to retrieve all these fields (more than one technique may be used for each field) and language specific metadata parsers (e.g., for package files). Each extraction records its provenance, with the confidence and technique used on each step. For more information check the [output format description](https://somef.readthedocs.io/en/latest/output/)

+### Confidence values in header analysis
+
+When extracting metadata using header analysis, SOMEF assigns a confidence value based on the length
+of the header. Shorter headers are more likely to be a good fit for a category, while longer headers
+may contain additional context that makes the classification less reliable:
+
+| Header length | Confidence |
+|---------------|------------|
+| 1–3 words     | 1.0        |
+| 4–6 words     | 0.8        |
+| 7–10 words    | 0.5        |
+| 11+ words     | 0.1        |
+
## Documentation

See full documentation at [https://somef.readthedocs.io/en/latest/](https://somef.readthedocs.io/en/latest/)
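
The confidence table added in this hunk amounts to a lookup on the header's word count. The following is a minimal sketch of that lookup, not SOMEF's actual implementation: the function name `header_confidence` is hypothetical, and it assumes that words are whitespace-separated tokens and that leading `#` markers are not counted. The table does not cover empty headers, so the sketch rejects them.

```python
def header_confidence(header: str) -> float:
    """Map a header's word count to a confidence value (illustrative sketch)."""
    # Assumption: strip leading markdown '#' markers, then count whitespace-separated words.
    word_count = len(header.lstrip("#").split())
    if word_count == 0:
        # Not covered by the documented table; treated as an error in this sketch.
        raise ValueError("header contains no words")
    if word_count <= 3:
        return 1.0
    if word_count <= 6:
        return 0.8
    if word_count <= 10:
        return 0.5
    return 0.1
```

Under these assumptions, a short header such as `## Installation` would receive 1.0, while a header of eleven or more words would receive 0.1.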
docs/output.md: 14 additions & 0 deletions
@@ -133,6 +133,19 @@ The following table summarized the properties used to describe a `category`:
|**source**| No | Url | URL of the source file used for the extraction. |
|**technique**| Yes | String | Technique used for the extraction. One of the following list: Supervised classification, header analysis, regular expression, GitHub API, File exploration, Code parsing |

+### Confidence values in header analysis
+
+When extracting metadata using header analysis, SOMEF assigns a confidence value based on the length
+of the header. Shorter headers are more likely to be a good fit for a category, while longer headers
+may contain additional context that makes the classification less reliable:
+
+| Header length | Confidence |
+|---------------|------------|
+| 1–3 words     | 1.0        |
+| 4–6 words     | 0.8        |
+| 7–10 words    | 0.5        |
+| 11+ words     | 0.1        |
+
### Result
Field returning the extracted output from the code repository. An example can be seen below for a citation found in BibteX format in a README file of a code repository:

@@ -446,6 +459,7 @@ The table below summarizes the mapping between the SOMEF internal JSON structure
|`logo`|`logo`| Project logo URL |
|`maintainer`|`maintainer`| Project maintainers |
|`name`|`name`| Software name |
+|`schema:owner`|`owner`| Software owner |
|`programmingLanguage`|`programming_languages`| Languages used |
|`readme`|`readme_url`| README file URL |
|`referencePublication`|`citation` (Papers) || References to the main publication associated with this software component (as per author preference) *1*|
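
The rows in this hunk pair an exported property name with a SOMEF category. As a rough illustration, such a mapping can be held in a dictionary and applied to a flat result. The sketch below assumes that the first column is the exported property and the second is the SOMEF category; the hunk header is truncated, so the export target is not named here. It covers only the simple rows visible in this hunk (the `citation` row is left out because it carries extra conditions). The names `EXPORT_MAPPING` and `rename_categories`, and the sample values, are hypothetical.

```python
# Assumed direction: SOMEF category (second column) -> exported property (first column).
EXPORT_MAPPING = {
    "logo": "logo",
    "maintainer": "maintainer",
    "name": "name",
    "owner": "schema:owner",  # row added in this diff
    "programming_languages": "programmingLanguage",
    "readme_url": "readme",
}


def rename_categories(somef_output: dict) -> dict:
    """Return a copy keyed by exported property names, dropping unmapped categories."""
    return {
        EXPORT_MAPPING[category]: value
        for category, value in somef_output.items()
        if category in EXPORT_MAPPING
    }


# Placeholder input; real SOMEF output nests each category with provenance details.
example = {"name": "example-tool", "owner": "example-org", "unmapped": 1}
renamed = rename_categories(example)
```

With the new row in place, an `owner` entry would be exported as `schema:owner`, while categories without a mapping row are dropped in this sketch.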