| title | Contribute to Parsers |
|---|---|
| description | How to contribute to parsers |
| draft | false |
| weight | 1 |
All commands assume that you're located at the root of the django-DefectDojo cloned repo.
- You have forked https://github.com/DefectDojo/django-DefectDojo and cloned locally.
- Checkout `dev` and make sure you're up to date with the latest changes.
- It's advised that you create a dedicated branch for your development, such as `git checkout -b parser-name`.
It is easiest to use the docker compose deployment as it has hot-reload capability for uWSGI. Set up your environment to use the dev environment:
$ docker/setEnv.sh dev
Please have a look at DOCKER.md for more details.
You will want to build your docker images locally, and optionally pass in your local user's uid to be able to write to the image (handy for database migration files). Assuming your user's uid is 1000, then:
{{< highlight bash >}} $ docker compose build --build-arg uid=1000 {{< /highlight >}}
| File | Purpose |
|---|---|
| `dojo/tools/<parser_dir>/__init__.py` | Empty file for class initialization |
| `dojo/tools/<parser_dir>/parser.py` | The meat. This is where you write your actual parser. The class name must be the Python module name without underscores plus `Parser`. Example: when the name of the Python module is `dependency_check`, the class name shall be `DependencyCheckParser` |
| `unittests/scans/<parser_dir>/{many_vulns,no_vuln,one_vuln}.json` | Sample files containing meaningful data for unit tests. The minimal set. |
| `unittests/tools/test_<parser_name>_parser.py` | Unit tests of the parser. |
| `dojo/settings/settings.dist.py` | If you want to use a modern hashcode based deduplication algorithm |
| `docs/content/en/connecting_your_tools/parsers/<file/api>/<parser_file>.md` | Documentation, what kind of file format is required and how it should be obtained |
Parsers are loaded dynamically with a factory pattern. To have your parser loaded and working correctly, you need to implement the contract.
- your parser MUST be in a sub-module of module `dojo.tools`
  - ex: `dojo.tools.my_tool.parser` module
- your parser MUST be a class in this sub-module.
  - ex: `dojo.tools.my_tool.parser.MyToolParser`
- The name of this class MUST be the Python module name without underscores and with `Parser` suffix.
  - ex: `dojo.tools.my_tool.parser.MyToolParser`
- This class MUST have an empty constructor or no constructor
- This class MUST implement 4 methods:
  - `def get_scan_types(self)`: This function returns a list of all the scan_type values supported by your parser. These identifiers are used internally. Your parser can support more than one scan_type. For example, some parsers use different identifiers to modify the behavior of the parser (aggregate, filter, etc.)
  - `def get_label_for_scan_types(self, scan_type)`: This function returns a string used to provide some text in the UI (short label)
  - `def get_description_for_scan_types(self, scan_type)`: This function returns a string used to provide some text in the UI (long description)
  - `def get_findings(self, file, test)`: This function returns a list of findings
- If your parser has more than 1 scan_type (for detailed mode) you MUST implement the `def set_mode(self, mode)` method
- The parser instance is re-used over all imports performed for this scan_type, so do not store any data at class level
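The naming rule in the contract can be sketched as a small function. This is only an illustration: `parser_class_name` is a hypothetical helper, not part of DefectDojo's factory code, and the CamelCase capitalization follows the examples given above.

```python
def parser_class_name(module_name: str) -> str:
    """Derive the expected parser class name from a module name.

    Hypothetical helper for illustration; it only mirrors the naming
    rule stated in the contract and is not DefectDojo's factory code.
    """
    # Drop underscores, capitalize each part, then append the suffix.
    parts = [part for part in module_name.split("_") if part]
    return "".join(part[:1].upper() + part[1:] for part in parts) + "Parser"


print(parser_class_name("dependency_check"))  # DependencyCheckParser
print(parser_class_name("my_tool"))           # MyToolParser
```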
Example:
class MyToolParser(object):
    def get_scan_types(self):
        return ["My Tool Scan", "My Tool Scan detailed"]
    def get_label_for_scan_types(self, scan_type):
        if scan_type == "My Tool Scan":
            return "My Tool XML Scan aggregated by ..."
        else:
            return "My Tool XML Scan"
    def get_description_for_scan_types(self, scan_type):
        return "Aggregates findings per cwe, title, description, file_path. SonarQube output file can be imported in HTML format. Generate with https://github.com/soprasteria/sonar-report version >= 1.1.0"
    def requires_file(self, scan_type):
        return False
    # mode:
    # None (default): aggregates vulnerabilities per sink filename (legacy behavior)
    # 'detailed' : No aggregation
    mode = None
    def set_mode(self, mode):
        self.mode = mode
    def get_findings(self, file, test):
        <...>
DefectDojo has a limited number of API parsers. While we won't remove these connectors, adding API connectors has been problematic, and thus we cannot accept new API parsers / connectors from the community at this time for supportability reasons. To maintain a high quality API connector, it is necessary to have a license to the tool. Getting that license requires partnership with the author or vendor. We're close to announcing a new program to help address this and bring API connectors to DefectDojo.
Use the template parser to quickly generate the files required. To get started you will need to install cookiecutter.
{{< highlight bash >}} $ pip install cookiecutter {{< /highlight >}}
Then generate your scanner parser from the root of django-DefectDojo:
{{< highlight bash >}} $ cookiecutter https://github.com/DefectDojo/cookiecutter-scanner-parser {{< /highlight >}}
Read more on the template configuration variables.
Here is a list of considerations that will make the parser robust for both common cases and edge cases.
We use 2 modules to handle endpoints:
- `hyperlink`
- `dojo.models`, with a specific class, `Endpoint`, to handle processing around URLs to create endpoints.
All existing parsers use the same code to parse URLs and create endpoints.
Using Endpoint.from_uri() is the best way to create endpoints.
If you really need to parse a URL, use the hyperlink module.
Good example:
if "url" in item:
    endpoint = Endpoint.from_uri(item["url"])
    finding.unsaved_endpoints = [endpoint]
Very bad example:
u = urlparse(item["url"])
endpoint = Endpoint(host=u.host)
finding.unsaved_endpoints = [endpoint]
Various file formats are handled through libraries. To keep DefectDojo slim and avoid extending the attack surface, keep the number of libraries used to a minimum and take other parsers as an example.
XML is an insecure format by default, so the information in XML reports has to be parsed in a secure way. After an evaluation, we determined that defusedxml is the library we will use to parse XML files in parsers, as it is rated more secure. Thus, we will only accept PRs that use the defusedxml library.
Parsers may have many fields, out of which many of them may be optional.
It is better not to set an attribute when you have no data than to fill it with placeholder values like NA, No data, etc.
Check the class dojo.models.Finding for the fields you can set.
Always include checks to avoid potential KeyError errors (e.g. a field does not exist) for those fields you are not absolutely certain will always be in the uploaded file. An unhandled KeyError translates to a 500 error, which does not look good.
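When several fields are optional, one way to apply both points at once is a small mapping-driven loop. This is a sketch under assumptions: the report keys in `OPTIONAL_FIELDS` and the helper `apply_optional_fields` are hypothetical, and `SimpleNamespace` stands in for `dojo.models.Finding` so the snippet runs on its own.

```python
from types import SimpleNamespace

# Hypothetical mapping from report keys to finding attributes.
OPTIONAL_FIELDS = {
    "cwe": "cwe",
    "file": "file_path",
    "fix": "mitigation",
}


def apply_optional_fields(finding, item):
    """Copy optional report fields onto the finding, skipping absent ones."""
    for report_key, attribute in OPTIONAL_FIELDS.items():
        # .get() returns None for a missing key, so no KeyError is raised.
        value = item.get(report_key)
        # Leave the attribute untouched instead of writing "NA" or "No data".
        if value not in (None, ""):
            setattr(finding, attribute, value)
    return finding


# SimpleNamespace stands in for dojo.models.Finding so the sketch runs alone.
finding = apply_optional_fields(SimpleNamespace(title="Example"), {"cwe": 79, "file": ""})
print(finding)  # namespace(title='Example', cwe=79)
```

A single membership check remains the simplest form when only one field is involved.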
Good example:
if "mykey" in data:
    finding.cwe = data["mykey"]
Data can have CVSS vectors or scores. Defect Dojo uses the cvss module provided by RedHat Security.
There's also a helper method to validate the vector and extract the base score and severity from it.
from dojo.utils import parse_cvss_data
cvss_vector = <get CVSS3 or CVSS4 vector from the report>
cvss_data = parse_cvss_data(cvss_vector)
if cvss_data:
    finding.severity = cvss_data["severity"]
    finding.cvssv3 = cvss_data["cvssv3"]
    finding.cvssv4 = cvss_data["cvssv4"]
    # we don't set any score fields as those will be overwritten by Defect Dojo
Not all values have to be used, as scan reports usually provide their own value for severity, and sometimes also for cvss_score. Defect Dojo will not overwrite any cvssv3_score or cvssv4_score.
If no score is set, Defect Dojo will use the cvss library to calculate the score.
The response also has the detected major version of the CVSS vector in cvss_data["major_version"].
If you need more manual processing, you can parse the CVSS vector directly.
Example of use:
import cvss.parser
import dojo.utils
from cvss import CVSS2, CVSS3, CVSS4
# TEMPORARY: Use Defect Dojo implementation of `parse_cvss_from_text` while waiting for https://github.com/RedHatProductSecurity/cvss/pull/75 to be released
vectors = dojo.utils.parse_cvss_from_text("CVSS:3.0/S:C/C:H/I:H/A:N/AV:P/AC:H/PR:H/UI:R/E:H/RL:O/RC:R/CR:H/IR:X/AR:X/MAC:H/MPR:X/MUI:X/MC:L/MA:X")
if len(vectors) > 0 and type(vectors[0]) is CVSS3:
    print(vectors[0].severities())  # this is the 3 severities
    cvssv3 = vectors[0].clean_vector()
    severity = vectors[0].severities()[0]
    vectors[0].compute_base_score()
    cvssv3_score = vectors[0].scores()[0]
    finding.severity = severity
    finding.cvssv3_score = cvssv3_score
Do not do something like this:
def get_severity(self, cvss, cvss_version="2.0"):
    cvss = float(cvss)
    cvss_version = float(cvss_version[:1])
    # If CVSS Version 3 and above
    if cvss_version >= 3:
        if cvss > 0 and cvss < 4:
            return "Low"
        elif cvss >= 4 and cvss < 7:
            return "Medium"
        elif cvss >= 7 and cvss < 9:
            return "High"
        elif cvss >= 9:
            return "Critical"
        else:
            return "Informational"
    # If CVSS Version prior to 3
    else:
        if cvss > 0 and cvss < 4:
            return "Low"
        elif cvss >= 4 and cvss < 7:
            return "Medium"
        elif cvss >= 7 and cvss <= 10:
            return "High"
        else:
            return "Informational"
By default a new parser uses the 'legacy' deduplication algorithm documented at https://documentation.defectdojo.com/usage/features/#deduplication-algorithms
Please use a pre-defined deduplication algorithm where applicable. When using the unique_id_from_tool or vuln_id_from_tool fields in the hash code configuration, it's important that these are unique for the finding and constant over time across subsequent scans. If this is not the case, the values can still be useful to set on the finding model without using them for deduplication.
The values must be coming from the report directly and must not be something that is calculated by the parser internally.
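As a rough sketch of what a hash code configuration can look like in dojo/settings/settings.dist.py: the setting and constant names below are written from memory and are an assumption to verify against your checkout, while the scan type "My Tool Scan", the chosen fields, and the `algorithm_for` lookup are purely hypothetical.

```python
# Constants as they are believed to be named in settings.dist.py (assumption).
DEDUPE_ALGO_LEGACY = "legacy"
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL = "unique_id_from_tool"
DEDUPE_ALGO_HASH_CODE = "hash_code"
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE = "unique_id_from_tool_or_hash_code"

# Fields that feed the hash code for the hypothetical "My Tool Scan" scan type.
HASHCODE_FIELDS_PER_SCANNER = {
    "My Tool Scan": ["title", "cwe", "file_path", "line"],
}

# Select the algorithm per scan type.
DEDUPLICATION_ALGORITHM_PER_PARSER = {
    "My Tool Scan": DEDUPE_ALGO_HASH_CODE,
}


def algorithm_for(scan_type):
    """Illustrative lookup: scan types without an entry use the legacy default."""
    return DEDUPLICATION_ALGORITHM_PER_PARSER.get(scan_type, DEDUPE_ALGO_LEGACY)


print(algorithm_for("My Tool Scan"))  # hash_code
print(algorithm_for("Unlisted Scan"))  # legacy
```

In a real change you would add entries to the existing dictionaries rather than redefine them.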
Each parser must have unit tests, at least to test for 0 vuln, 1 vuln and many vulns. You can take a look at how other parsers have them for starters. The more quality tests, the better.
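A minimal sketch of the three baseline tests could look like the following. To keep it runnable without DefectDojo, `MyToolParser` here is a hypothetical stand-in that returns simple objects, and `make_report` builds the report in memory; a real test would import the parser from `dojo.tools` and open the sample files under `unittests/scans/`.

```python
import io
import json
import unittest
from types import SimpleNamespace


class MyToolParser:
    """Hypothetical stand-in parser so this sketch runs without DefectDojo."""

    def get_findings(self, file, test):
        data = json.load(file)
        return [
            SimpleNamespace(title=item["title"], severity=item.get("severity", "Info"))
            for item in data.get("vulnerabilities", [])
        ]


def make_report(count):
    """Build an in-memory report containing `count` vulnerabilities."""
    items = [{"title": f"vuln {i}", "severity": "High"} for i in range(count)]
    return io.StringIO(json.dumps({"vulnerabilities": items}))


class TestMyToolParser(unittest.TestCase):
    def test_no_vuln(self):
        findings = MyToolParser().get_findings(make_report(0), test=None)
        self.assertEqual(0, len(findings))

    def test_one_vuln(self):
        findings = MyToolParser().get_findings(make_report(1), test=None)
        self.assertEqual(1, len(findings))
        self.assertEqual("vuln 0", findings[0].title)

    def test_many_vulns(self):
        findings = MyToolParser().get_findings(make_report(3), test=None)
        self.assertEqual(3, len(findings))
        self.assertEqual("High", findings[2].severity)
```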
It's important to add checks on attributes of findings. For ex:
with self.subTest(i=0):
    finding = findings[0]
    self.assertEqual("test title", finding.title)
    self.assertEqual(True, finding.active)
    self.assertEqual(True, finding.verified)
    self.assertEqual(False, finding.duplicate)
    self.assertIn(finding.severity, Finding.SEVERITIES)
    self.assertEqual("CVE-2020-36234", finding.vulnerability_ids[0])
    self.assertEqual(261, finding.cwe)
    self.assertEqual("CVSS:3.1/AV:N/AC:L/PR:H/UI:R/S:C/C:L/I:L/A:N", finding.cvssv3)
    self.assertIn("security", finding.tags)
    self.assertIn("network", finding.tags)
    self.assertEqual("3287f2d0-554f-491b-8516-3c349ead8ee5", finding.unique_id_from_tool)
    self.assertEqual("TEST1", finding.vuln_id_from_tool)
In order to make certain that file handles are closed properly, please use the with pattern to open files. Instead of:
testfile = open("path_to_file.json")
...
testfile.close()
use:
with open("path_to_file.json") as testfile:
    ...
This ensures the file is closed at the end of the with statement, even if an exception occurs somewhere in the block.
Django uses a separate test database for running unit tests called test_defectdojo. It's automatically created and initialized with a basic set of test data.
This local command will launch the unit test for your new parser
{{< highlight bash >}} $ docker compose exec uwsgi bash -c 'python manage.py test unittests.tools.<your_unittest_py_file>.<main_class_name> -v2' {{< /highlight >}}
or like this:
{{< highlight bash >}} $ ./run-unittest.sh --test-case unittests.tools.<your_unittest_py_file>.<main_class_name> {{< /highlight >}}
Example for the aqua parser:
{{< highlight bash >}} $ docker compose exec uwsgi bash -c 'python manage.py test unittests.tools.test_aqua_parser.TestAquaParser -v2' {{< /highlight >}}
or like this:
{{< highlight bash >}} $ ./run-unittest.sh --test-case unittests.tools.test_aqua_parser.TestAquaParser {{< /highlight >}}
If you want to run all parser unit tests, simply run $ docker compose exec uwsgi bash -c 'python manage.py test -p "test_*_parser.py" -v2'
Some types of parsers create a list of endpoints that are vulnerable (they are stored in finding.unsaved_endpoints). DefectDojo requires storing endpoints in a specific format (which follows the RFCs). Endpoints that do not follow this format can be stored, but they will be marked as broken (red flag 🚩 in the UI). To be sure your parser stores endpoints in the correct format, run the .clean() function for all endpoints in unit tests:
findings = parser.get_findings(testfile, Test())
for finding in findings:
    for endpoint in finding.unsaved_endpoints:
        endpoint.clean()
Not only the parser but also the importer should be tested.
The patch method from unittest.mock is usually useful for simulating API responses.
It is highly recommended to use it.
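The patch approach can be sketched with the standard library alone. Everything specific here is an assumption for illustration: `fetch_findings` is a hypothetical importer helper, `fake_urlopen` is a hypothetical test helper, and the URL is a placeholder; only `unittest.mock.patch` and `MagicMock` are real APIs.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_findings(url):
    """Hypothetical importer helper: download a JSON report and return its items."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())["findings"]


def fake_urlopen(payload):
    """Build a mock that behaves like urlopen and its context-managed response."""
    response = MagicMock()
    response.read.return_value = json.dumps(payload).encode()
    response.__enter__.return_value = response
    return MagicMock(return_value=response)


# Replace urlopen for the duration of the block so no network call is made.
with patch("urllib.request.urlopen", fake_urlopen({"findings": [{"title": "XSS"}]})) as mocked:
    findings = fetch_findings("https://scanner.example/api/report")

mocked.assert_called_once_with("https://scanner.example/api/report")
print(findings)  # [{'title': 'XSS'}]
```

Patching at the point where the importer looks up the function keeps the test independent of the scanner being reachable.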
In the event that you have to change the model, e.g. to increase a database column size to accommodate a longer string of data to be saved:
- Change what you need in `dojo/models.py`
- Create a new migration file in dojo/db_migrations by running the following command, and include it as part of your PR
{{< highlight bash >}} $ docker compose exec uwsgi bash -c 'python manage.py makemigrations -v2' {{< /highlight >}}
If you want to be able to accept a new type of file for your parser, take a look at dojo/forms.py around line 436 (at the time of this writing) or locate the 2 places (for import and re-import) where you find the string attrs={"accept":.
Formats currently accepted: .xml, .csv, .nessus, .json, .html, .js, .zip.
Of course, nothing prevents you from having more files than the parser.py file. It's python :-)
If you want to take a look at previous parsers that are now part of DefectDojo, take a look at https://github.com/DefectDojo/django-DefectDojo/pulls?q=is%3Apr+sort%3Aupdated-desc+label%3A%22Import+Scans%22+is%3Aclosed
Please add a new .md file in [docs/content/en/connecting_your_tools/parsers] with the details of your new parser. Include the following content headings:
- Acceptable File Type(s) - please include how to generate this type of file from the related tool, as some tools have multiple methods or require specific commands.
- An example unit test block, if applicable.
- A link to the relevant unit tests folder so that users can quickly navigate there from Documentation.
- A link to the scanner itself - (e.g. GitHub or vendor link)
Here is an example of a completed Parser documentation page: https://github.com/DefectDojo/django-DefectDojo/blob/master/docs/content/en/connecting_your_tools/parsers/file/acunetix.md