ReversingLabs SpectraAssure rl-json parser for DefectDojo #12579

Merged
mtesauro merged 8 commits into DefectDojo:dev from rl-maartenb:dev
Jun 20, 2025
Conversation

@rl-maartenb

Contributor

Feature: ReversingLabs SpectraAssure rl-json parser for DefectDojo

Documentation and unit tests are provided according to the plugin documentation.

This adds a new parser for the rl-json format produced by the SpectraAssure CLI or portal.

@dryrunsecurity

dryrunsecurity Bot commented Jun 10, 2025


DryRun Security

This pull request contains a potential resource exhaustion risk in the ReversingLabs SpectraAssure JSON parsing method, where large JSON inputs could consume excessive memory during parsing, though the risk is currently classified as non-blocking.

Resource Exhaustion Risk in dojo/tools/reversinglabs_spectraassure/rlJsonInfo/__init__.py
Vulnerability: Resource Exhaustion Risk
Description: Large JSON input could consume excessive memory during parsing. The json.load() method reads the entire file into memory, which could lead to a denial of service if an extremely large file is processed.

import copy
import datetime
import json
import logging
import sys
from typing import Any
from .cve_info_node import CveInfoNode
logger = logging.getLogger(__name__)
"""
# rl-json report
Note:
This is all ReversingLabs terminology.
DefectDojo also has `components`,
but that refers to the purls of rl:components or rl:dependencies, depending on where the cve was detected.
A description of the `rl.json report` but cut up in usable parts.
See also [rl-json-schema](https://docs.secure.software/cli/rl-json-schema) .
## Metadata
The main metadata components in the rl-json report (as of 2025-06) are:
- assessments <br>
A summary of key risks or safety concerns found in your software.
Detected risks are grouped into categories according to their shared characteristics.
Every policy is mapped to a risk category.
When that policy is violated, an issue is reported to cause risk in that category.
- components <br>
Components detected and processed in the analyzed software, represented as a map of unique component IDs.
For every component-id,
the same information is listed as for the whole software package in the report.info.file object.
- cryptography <br>
Cryptographic assets detected in the analyzed software.
- dependencies <br>
Dependencies detected and processed in the analyzed software,
represented as a map of dependency IDs.
For every dependency-id,
the same information is listed as for the whole software package in the report.info.file.identity object.
- indicators <br>
Behavior indicators for the analyzed software as identified by the Spectra Assure engine.
- licenses <br>
A list of licenses found in the analyzed software package.
- ml_models <br>
Machine learning model card with information about the ML models detected in the analyzed software package.
- secrets <br>
Sensitive information (secrets) detected in the analyzed software package.
- services <br>
Networking services detected in the analyzed software package.
In the context of Spectra Assure reports, services are network locations that the analyzed software reaches out to.
- violations <br>
Policy violations detected in the analyzed software package.
- vulnerabilities <br>
Known vulnerabilities affecting analyzed software components and dependencies.
CVE nomenclature is preferred,
but alternatives may be used if the CVE number is not available for the detected vulnerability.
## Chains
Data is chained so that items point to relevant other items like:
digraph "rl-json-report-components" {
rankdir=LR
// the toplevel entrypoint
report
// first level sub keys
info
metadata
// info sub items
detections
disabled
file
inhibitors
properties
statistics
unpacking
warnings
// metadata sub items
assessments
components
cryptography
dependencies
indicators
licenses
secrets
services
violations
vulnerabilities
algorithms
certificates
materials
// EDGES
edge [color=black]
report -> info
report -> metadata
edge [color=blue]
info -> detections
info -> disabled
info -> file
info -> statistics -> quality
info -> properties
info -> inhibitors
info -> unpacking
info -> warnings
edge [color=red]
metadata -> assessments
metadata -> components
metadata -> cryptography
metadata -> dependencies
metadata -> indicators
metadata -> licenses
metadata -> secrets
metadata -> services
metadata -> violations
metadata -> vulnerabilities
edge [color=brown]
cryptography -> algorithms
cryptography -> certificates
cryptography -> materials
edge [color=green,style=dotted]
algorithms -> components
certificates -> components
materials -> components
secrets -> components
services -> components
violations -> components
dependencies -> vulnerabilities
licenses -> violations
components -> dependencies
vulnerabilities -> violations
}
## Extracting Findings
Components are extracted files embedded in the main file that was provided to the scanner.
For example zip archives, iso images, docker images, windows installers, rpm's and so forth
are all files that when scanned produce a collection of components (embedded files in the main file scanned).
The current focus for extracting findings is vulnerabilities (cve's) on items,
where items can be:
1. `component` -> `vulnerability` <br>
In the case of components without dependencies the vulnerability is detected directly on the extracted component file.
2. `component` -> `dependency` -> `vulnerability` <br>
In the case where a vulnerability is detected on a dependency,
we need the full chain in order to preserve the full path of detection.
"""
class RlJsonInfo:
SCAN_TOOL_NAME: str = "ReversingLabs SpectraAssure"
info: dict[str, Any]
# we currently only use components, dependencies and vulnerabilities
known_metadata_sub_keys: list[str] = [
"assessments",
"components", # we use this
"cryptography",
"dependencies", # we use this
"indicators",
"licenses",
"ml_models",
"services",
"secrets",
"violations",
"vulnerabilities", # we use this
]
assessments: dict[str, Any]
components: dict[str, Any]
cryptography: dict[str, Any]
dependencies: dict[str, Any]
indicators: dict[str, Any]
licenses: dict[str, Any]
ml_models: dict[str, Any]
services: dict[str, Any]
secrets: dict[str, Any]
violations: dict[str, Any]
vulnerabilities: dict[str, Any]
_rest: dict[str, Any] # after extracting and removing known sub key data, what remains goes here
sverity_map: dict[int, str] = {
1: "Info",
2: "Low",
3: "Medium",
4: "High",
5: "Critical",
}
common_tags_map: dict[str, str] = {
"FIXABLE": "Fix Available",
"EXISTS": "Exploit Exists",
"MALWARE": "Exploited by Malware",
"MANDATE": "Patching Mandated",
"UNPROVEN": "CVE Discovered",
}
# sort order, to align with Spectra Assure Portal
# 1: Fix Available
# 2: Exploit exists
# 3: Exploited by malware
# 4: Patch mandated
impact_sort_order: list[str] = [
"Fix Available",
"Exploit Exists",
"Exploited by Malware",
"Patching Mandated",
"CVE Discovered",
]
# dict:cve, comp_uuid, dep_uuid | None -> CveInfoNode
# for cve on components we get the info with path: cve.comp_uuid.None
# for cve on dependency on component we get the info with path: cve.comp_uuid.dep_uuid
_results: dict[str, dict[str, dict[str | None, CveInfoNode]]]
def __init__(
self,
file_handle: Any,
) -> None:
self.file_name: str = file_handle.name
logger.debug("file: %s", self.file_name)
self.data: dict[str, Any] = json.load(file_handle)
self._results = {}
self._get_info()
self._get_meta()
self._get_rest()
def _get_info(
self,
) -> None:
logger.debug("")
report = self.data.get("report", {})
key = "info"
if key in report:
self.info = copy.deepcopy(report.get(key, {}))
del report[key]
def _get_meta(
self,
) -> None:
logger.debug("")
report = self.data.get("report", {})
metadata = report.get("metadata", {})
# make all the known meta sub keys into instance dicts
for name in self.known_metadata_sub_keys:
if name in metadata:
setattr(
self,
name,
copy.deepcopy(metadata.get(name, {})),
)
del metadata[name]
if len(metadata) == 0:
del report["metadata"]
if len(report) == 0:
del self.data["report"]
def _get_rest(
self,
) -> None:
logger.debug("")
self._rest = copy.deepcopy(self.data)
self.data = {}
def _find_sha256_in_components(
self,
sha256: str,
) -> bool:
logger.debug("")
for component in self.components.values():
comp_sha256 = self._get_sha256(data=component)
if comp_sha256 == sha256:
return True
return False
def _add_to_results(
self,
cve: str,
comp_uuid: str,
dep_uuid: str | None,
cve_info_node_instance: CveInfoNode | None,
) -> None:
logger.debug("")
if cve_info_node_instance is None:
return
# prep empty keys
if cve not in self._results:
self._results[cve] = {}
if comp_uuid not in self._results[cve]:
self._results[cve][comp_uuid] = {}
# put the data in
if dep_uuid not in self._results[cve][comp_uuid]:
self._results[cve][comp_uuid][dep_uuid] = cve_info_node_instance
def _get_sha256(
self,
data: dict[str, Any],
) -> str:
logger.debug("")
# all components are derived from unpacked files and so have a hash set
# we need the sha256
key = "sha256"
h = data.get("hashes", [])
for item in h:
if item[0] == key:
return str(item[1])
logger.error("no '%s' found for this item %s", key, data)
return ""
def _score_to_severity(
self,
score: float,
) -> str:
logger.debug("")
if score >= 9:
return self.sverity_map[5]
if score >= 7:
return self.sverity_map[4]
if score >= 4:
return self.sverity_map[3]
if score > 0:
return self.sverity_map[2]
return self.sverity_map[1]
def _use_path_or_name(
self,
*,
data: dict[str, Any],
purl: str,
name_first: bool = False,
prefer_path: bool = True,
) -> str:
logger.debug("")
# path or name may be empty so look for the non empty one
# with name_first we first look at the name
# with prefer path we use path if it is not empty
# if we have a valid purl
# prefer to derive the name from the purl
path = data.get("path", "")
name = data.get("name", "")
if name_first and len(name) > 0:
return str(name)
if prefer_path and len(path) > 0:
return str(path)
if purl and len(purl) > 0 and "@" in purl:
s = purl
if "/" in s:
ii = purl.index("/")
s = purl[ii + 1 :]
aa = s.split("@")
name = aa[0]
# version = aa[1]
return str(name)
fallback = ""
if name_first is False:
if path != "":
return str(path)
if name != "":
return str(name)
return fallback
if name != "":
return str(name)
if path != "":
return str(path)
return fallback
def _get_tags_from_cve(self, this_cve: dict[str, Any]) -> list[str]:
tags: list[str] = []
exploit = this_cve.get("exploit", [])
if len(exploit) == 0:
return tags # we have no exploit info so no tags
# turn cve exploit info into tags
for key in exploit:
tag = self.common_tags_map.get(key)
if tag is None:
logger.warning("missing tag for key: %s", key)
continue
tags.append(tag)
return tags
def _make_impact_from_tags(
self,
tags: list[str],
impact: str | None,
) -> str:
if impact is None:
impact = ""
for tag in self.impact_sort_order:
if tag in tags:
impact += tag + "\n"
return impact
def _make_new_cve_info_node(
self,
cve: str,
comp_uuid: str,
dep_uuid: str | None,
active: Any,
) -> CveInfoNode | None:
"""Collect all info we can extract from the cve and put it in the CveInfoNode"""
logger.debug("")
this_cve = self.vulnerabilities.get(cve)
if this_cve is None:
logger.error("missing cve info for: %s", cve)
return None
cve_info_node_instance = CveInfoNode()
cve_info_node_instance.cve = cve
cve_info_node_instance.comp_uuid = comp_uuid
cve_info_node_instance.dep_uuid = dep_uuid
cve_info_node_instance.active = bool(active)
f_info: dict[str, Any] = self.info.get("file", {})
cve_info_node_instance.original_file = str(f_info.get("name", ""))
cve_info_node_instance.original_file_sha256 = self._get_sha256(f_info)
cve_info_node_instance.scan_date = datetime.datetime.fromisoformat(self._rest["timestamp"]).date()
cve_info_node_instance.scan_tool = self.SCAN_TOOL_NAME
cve_info_node_instance.scan_tool_version = self._rest.get("version", "no_scan_tool_version_specified")
cve_info_node_instance.cvss_version = int(this_cve.get("cvss", {}).get("version", "0"))
score = float(this_cve.get("cvss", {}).get("baseScore", "0.0"))
cve_info_node_instance.score = score
cve_info_node_instance.score_severity = self._score_to_severity(score=score)
cve_info_node_instance.tags = self._get_tags_from_cve(this_cve)
cve_info_node_instance.impact = self._make_impact_from_tags(
cve_info_node_instance.tags,
cve_info_node_instance.impact,
)
return cve_info_node_instance
def _get_component_purl(
self,
component: dict[str, Any],
) -> str:
return str(component.get("identity", {}).get("purl", ""))
def _get_dependency_purl(
self,
dependency: dict[str, Any],
) -> str:
return str(dependency.get("purl", ""))
def _do_one_cve_component_without_dependencies(
self,
comp_uuid: str,
component: dict[str, Any],
cve: str,
active: Any,
) -> CveInfoNode | None:
# one: component -> cve
# the cve part (now we have one component and one vulnerability)
logger.debug("comp: %s; cve: %s", comp_uuid, cve)
cve_info_node_instance = self._make_new_cve_info_node(
cve=cve,
active=active,
comp_uuid=comp_uuid,
dep_uuid=None,
)
if cve_info_node_instance is None:
return None
ident = component.get("identity", {})
c_purl = self._get_component_purl(component=component)
cve_info_node_instance.component_file_path = self._use_path_or_name(data=component, purl=c_purl)
cve_info_node_instance.component_file_sha256 = self._get_sha256(data=component)
cve_info_node_instance.component_file_purl = c_purl
cve_info_node_instance.component_file_version = ident.get("version", "")
cve_info_node_instance.component_file_name = component.get("name", "")
cve_info_node_instance.component_type = "component"
cve_info_node_instance.component_name = self._use_path_or_name(data=component, purl=c_purl, name_first=True)
cve_info_node_instance.component_version = ident.get("version", "")
cve_info_node_instance.component_purl = c_purl
cve_info_node_instance.make_title_cin(cve=cve)
cve_info_node_instance.make_description_cin(cve=cve, purl=c_purl)
cve_info_node_instance.vuln_id_from_tool = cve
logger.debug("%s", cve_info_node_instance)
return cve_info_node_instance
def _get_all_active_cve_on_components_without_dependencies(
self,
) -> None:
# all: component -> cve
# the component part, could have many vulnerabilities
logger.debug("")
for comp_uuid, component in self.components.items():
v = component.get("identity", {}).get("vulnerabilities", None)
if v is None:
logger.info("no vulnerabilities for component: %s", comp_uuid)
continue
for cve in v.get("active", []):
cve_info_node_instance = self._do_one_cve_component_without_dependencies(
comp_uuid=comp_uuid,
component=component,
cve=cve,
active=True,
)
self._add_to_results(
cve=cve,
comp_uuid=comp_uuid,
dep_uuid=None,
cve_info_node_instance=cve_info_node_instance,
)
# =========================================================
# component -> dependency -> cve
def _do_one_cve_component_dependency(
self,
comp_uuid: str,
component: dict[str, Any],
dep_uuid: str,
dependency: dict[str, Any],
cve: str,
active: Any,
) -> CveInfoNode | None:
# one: component -> dependency -> cve
# the cve part (now we have one component, one dependency, one vulnerability)
logger.debug("comp: %s; dep: %s; cve: %s", comp_uuid, dep_uuid, cve)
cve_info_node_instance = self._make_new_cve_info_node(
cve=cve,
active=active,
comp_uuid=comp_uuid,
dep_uuid=dep_uuid,
)
if cve_info_node_instance is None:
return None
ident = component.get("identity", {})
c_purl = self._get_component_purl(component=component)
cve_info_node_instance.component_file_path = self._use_path_or_name(data=component, purl=c_purl)
cve_info_node_instance.component_file_sha256 = self._get_sha256(data=component)
cve_info_node_instance.component_file_purl = c_purl
cve_info_node_instance.component_file_version = ident.get("version", "")
cve_info_node_instance.component_file_name = component.get("name", "")
cve_info_node_instance.component_type = "dependency"
cve_info_node_instance.component_name = dependency.get(
"product",
f"no_{cve_info_node_instance.component_type}_product_provided",
)
cve_info_node_instance.component_version = dependency.get(
"version",
f"no_{cve_info_node_instance.component_type}_version_provided",
)
d_purl = self._get_dependency_purl(dependency=dependency)
cve_info_node_instance.component_purl = d_purl
cve_info_node_instance.make_title_cin(cve=cve)
cve_info_node_instance.make_description_cin(cve=cve, purl=d_purl)
cve_info_node_instance.vuln_id_from_tool = cve
dep_purl = dependency.get("purl", "")
dep_name = dependency.get("product", "")
dep_version = dependency.get("version", "")
# if we have a dependency purl then purl, otherwise component product + version
tail = dep_purl
if len(tail) == 0:
tail = f"{dep_name}@{dep_version}"
logger.debug("%s", cve_info_node_instance)
return cve_info_node_instance
def _get_one_active_cve_component_dependency(
self,
comp_uuid: str,
component: dict[str, Any],
dep_uuid: str,
) -> None:
# one: component -> dependency -> cve
# the dependency (could have many vulnerabilities)
logger.debug("")
dependency = self.dependencies.get(dep_uuid)
if dependency is None:
logger.error("missing dependency: %s", dep_uuid)
return
# -------------------------------
v = dependency.get("vulnerabilities")
if v is None:
logger.info("no vulnerabilities for dependency: %s", dep_uuid)
return
# -------------------------------
for cve in v.get("active", []):  # default to an empty list so a missing "active" key does not raise
cve_info_node_instance = self._do_one_cve_component_dependency(
comp_uuid=comp_uuid,
component=component,
dep_uuid=dep_uuid,
dependency=dependency,
cve=cve,
active=True,
)
self._add_to_results(
cve=cve,
comp_uuid=comp_uuid,
dep_uuid=dep_uuid,
cve_info_node_instance=cve_info_node_instance,
)
def _get_all_active_cve_on_components_with_dependencies(
self,
) -> None:
# all: component -> dependency -> cve
# the component part
logger.debug("")
for comp_uuid, component in self.components.items():
d = component.get("identity", {}).get("dependencies", None)
if d is None:
logger.info("no dependencies for component: %s", comp_uuid)
continue
for dep_uuid in d:
# returns one dep_uuid, multiple cve (if any cve)
self._get_one_active_cve_component_dependency(
comp_uuid=comp_uuid,
component=component,
dep_uuid=dep_uuid,
)
def _verify_file_is_also_component(
self,
) -> bool:
logger.debug("")
# this is normally always true, but we verify it anyway.
# the file mentioned in the info part of the report must also be a component.
file_is_component: bool = False
f_info: dict[str, Any] = self.info.get("file", {})
file_sha256 = self._get_sha256(f_info)
file_is_component = self._find_sha256_in_components(file_sha256)
if file_is_component is False:
logger.error("file cannot be found as component: %s", f_info)
return file_is_component
# ==== PUBLIC ======
def get_results_list(self) -> list[CveInfoNode]:
# self._results[cve][comp_uuid][dep_uuid] -> cve_info_node_instance
cve_info_node_list: list[CveInfoNode] = []
for components in self._results.values():
for component in components.values():
for cve_info_node_instance in component.values():
cve_info_node_list.append(cve_info_node_instance)
return cve_info_node_list
def print_results_to_file_or_stdout(
self,
file_handle: Any = sys.stdout,
) -> None:
def default(o: Any) -> Any:
if type(o) is CveInfoNode:
return o.__dict__
if type(o) is datetime.date:
return o.isoformat() # YYYY-MM-DD
if type(o) is datetime.datetime:
return o.isoformat() # YYYY-MM-DD T hh:mm:ss <tz info>
msg: str = f"unsupported type: {type(o)}"
raise Exception(msg)
results: list[Any] = self.get_results_list()
print(
json.dumps(
results,
indent=4,
sort_keys=True,
default=default,
),
file=file_handle,
)
def get_cve_active_all(self) -> None:
"""
0: verify that the info -> file sha256 comes back as a component,
so we can forget about it as it will be processed as a component
A: walk over components with active vulnerabilities
B: walk over components -> dependencies with active vulnerabilities
"""
logger.debug("")
self.file_is_component = self._verify_file_is_also_component()
self._get_all_active_cve_on_components_without_dependencies()
self._get_all_active_cve_on_components_with_dependencies()


All finding details can be found in the DryRun Security Dashboard.
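
As an illustration of the resource-exhaustion note above (not part of this PR), one possible mitigation is to cap how much of the upload is read before handing it to the JSON parser. This is a minimal sketch: the helper name `load_report_with_size_limit` and the `MAX_REPORT_BYTES` value are assumptions made for the example, not DefectDojo settings.

```python
import io
import json
from typing import Any

# Hypothetical limit for this sketch; pick a value that fits the deployment.
MAX_REPORT_BYTES = 50 * 1024 * 1024


def load_report_with_size_limit(
    file_handle: Any,
    max_bytes: int = MAX_REPORT_BYTES,
) -> dict[str, Any]:
    """Parse a JSON report, refusing input larger than max_bytes.

    For text-mode handles read() counts characters rather than bytes,
    so the limit is approximate in that case.
    """
    # read one unit more than allowed so oversized input can be detected
    raw = file_handle.read(max_bytes + 1)
    if len(raw) > max_bytes:
        msg = f"report exceeds {max_bytes} bytes, refusing to parse"
        raise ValueError(msg)
    return json.loads(raw)


# usage with an in-memory file
report = load_report_with_size_limit(io.StringIO('{"report": {}}'))
```

This still builds the whole parsed document in memory; it only bounds the size of the input that is accepted.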

@rl-maartenb

Contributor Author

After the bot complained, I added 'title: ...' to the doc file.

@valentijnscholten valentijnscholten added this to the 2.48.0 milestone Jun 11, 2025
@rl-maartenb

Contributor Author

fixed the readme after the checker complained

@rl-maartenb

Contributor Author

my local hugo extended test now shows no errors


logger = logging.getLogger(__name__)

WHAT = "ReversingLabs Spectra Assure"

Member

Can you give this a clearer name?

Contributor Author

I expect you mean the name on the left side of the '='; yes, I will fix that.

Contributor Author

done

cin.score = score
cin.score_severity = self._score_to_severity(score=score)

# TODO: tags

Member

Is this still a TODO?

Contributor Author

Sorry, my mistake: tags have been added. I will remove the TODO line.
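
For readers following the thread, the tag handling in the listing above can be sketched roughly as follows. The two tables are copied from the `RlJsonInfo` class shown earlier; the standalone function names are made up for this example, and the real parser implements the same idea as methods on the class.

```python
# Tables copied from the RlJsonInfo class in the listing above.
COMMON_TAGS_MAP: dict[str, str] = {
    "FIXABLE": "Fix Available",
    "EXISTS": "Exploit Exists",
    "MALWARE": "Exploited by Malware",
    "MANDATE": "Patching Mandated",
    "UNPROVEN": "CVE Discovered",
}
IMPACT_SORT_ORDER: list[str] = [
    "Fix Available",
    "Exploit Exists",
    "Exploited by Malware",
    "Patching Mandated",
    "CVE Discovered",
]


def tags_from_exploit_keys(exploit_keys: list[str]) -> list[str]:
    """Translate the exploit keys of one cve into readable tags, skipping unknown keys."""
    return [COMMON_TAGS_MAP[key] for key in exploit_keys if key in COMMON_TAGS_MAP]


def impact_from_tags(tags: list[str]) -> str:
    """Build the impact text, one tag per line, in the fixed sort order."""
    return "".join(tag + "\n" for tag in IMPACT_SORT_ORDER if tag in tags)


tags = tags_from_exploit_keys(["MALWARE", "FIXABLE", "SOMETHING_ELSE"])
impact = impact_from_tags(tags)
```

Tags keep the order in which the report lists the exploit keys, while the impact text always follows the fixed sort order.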

Contributor Author

done

if len(tail) == 0:
    tail = f"{dep_name}@{dep_version}"

cin.unique_id_from_tool = f"{cin.component_file_sha256}:{cve}:{tail}"

Member

The unique_id_from_tool value should be something that is in the report, not a computed/constructed value. Can you look at this?

@rl-maartenb rl-maartenb Jun 13, 2025

Contributor Author

This one is more tricky.

In order for unique-id from tool to be valuable for detecting duplicates across future uploads we have to distinguish between what we call components (unpacked-files), vulnerabilities (cve) and component-dependencies in our report.

We detect vulnerabilities (cve) directly on components (unpacked-files) and we detect vulnerabilities (cve) on dependencies of components.
Components (unpacked files) are files in our report and so have a unique sha256. Dependencies may occur on multiple files (components) in the scan (as a result of unpacking zip files, msi installers and such).
Dependencies usually have a package_url, or if they don't, they have a version and a name or product string.
So to uniquely identify one vulnerability (cve) on one item (component or dependency) we need to combine items.

  • for vulnerabilities only on components (without dependencies) we need the sha256 and the cve to uniquely point to the file and the cve.
  • for vulnerabilities on dependencies of components we need the sha256 of the component, the package-url of the dependency and the cve of the vulnerability.
    Once we have that we can fully use deduplication on future scans and imports into DefectDojo.

The report file internally uses 'uuid' to condense the report items, but they are only unique inside this one report file and so have no meaning for deduplication. As such we do not have one single unique id in our report that uniquely identifies one DefectDojo Finding.

A good example is the provided multi-language installer in 'unittests/scans/reversinglabs_spectraassure/HxDSetup_2.5.0.exe-report.rl.json'.
As the installer supports 18 different languages we see 18 embedded files. Each file has dependencies, and many dependencies have vulnerabilities that we report.

I hope this helps to explain why we have to construct a unique id.
I would guess the alternative would be not to provide a unique-id.

Contributor Author

needs review

Contributor

I would guess the alternative would be not to provide a unique-id.

I suspect this may be the case. The unique_id_from_tool is intended to track the same vulnerability from report to report. Ideally, this is handled by the vendor, but in cases where it is not, DefectDojo will essentially construct something similar to what you are doing, but not at the parser level. Instead, those fields will be captured in the deduplication settings through the hash code fields. It is essentially the same thing you are doing, just with DefectDojo fields after the findings are created by the importer/reimporter.
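
To make the idea concrete, here is a generic sketch of hashing a chosen set of finding fields into one deduplication key. It is not DefectDojo's actual implementation: the function name, the separator and the example field list are assumptions for illustration only.

```python
import hashlib
from typing import Any


def hash_code_from_fields(finding: dict[str, Any], fields: list[str]) -> str:
    """Hash only the selected fields, so unrelated fields do not affect the result."""
    joined = "|".join(str(finding.get(name, "")) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# hypothetical field selection for this example
FIELDS = ["title", "component_name", "component_version"]

first_upload = {
    "title": "CVE-2024-0001",
    "component_name": "example",
    "component_version": "1.0.0",
    "scan_date": "2025-06-01",
}
# same finding, reported again in a later scan
second_upload = dict(first_upload, scan_date="2025-06-20")

same_finding = hash_code_from_fields(first_upload, FIELDS) == hash_code_from_fields(second_upload, FIELDS)
```

Because `scan_date` is not in the selected fields, both uploads hash to the same value and would be treated as the same finding.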

Contributor Author

unique_id_from_tool removed

@Maffooch Maffooch left a comment

Contributor

The structure of this parser is quite a bit different from other parsers found in DefectDojo. This may make it a bit more difficult to maintain going forward.

The trivy parser could be a good example to mimic

@rl-maartenb

Contributor Author

Will have a look at the trivy parser, but I already see that it will not match our data: we use a significant number of references to other items in our reports.

I see that most other reports produce a relatively flat list of results; however, we produce compacted dicts with cross references.

The actual parsing happens in:

I can naturally move that code into parser.py if that would suit you better.

In general, our report requires some significant study to see how the network of references works before doing any maintenance on the parser, as it is easy to overlook things or make the wrong assumptions.

@Maffooch

Contributor

In general, our report requires some significant study to see how the network of references works before doing any maintenance on the parser, as it is easy to overlook things or make the wrong assumptions.

Wow, you weren't kidding! 2400 lines for a single vulnerability is quite a bit of data.

This confirms my fear that if, for some reason, you're not around to maintain this parser when something changes in the report format produced by ReversingLabs, making changes will be a heavy task. I suppose the request for simplification is more important than I realized.

Rather than asking you to reengineer the parsing logic, I think we can get away with documenting the parser. How does it sound to heavily comment the code in both files, and then write a short README in the same directory as the parser.py file that explains why things are split and which classes are responsible for what?

@valentijnscholten

Member

I'd like to request to make the variable names more clear as well. I had trouble following the code with variables like r, rr, cin.

@rl-maartenb

Contributor Author

@valentijnscholten: will do

@Maffooch: good idea, will do

@rl-maartenb

Contributor Author

Both done: short names are replaced with more meaningful ones, and the actual parser in rlJsonInfo now better explains what we currently use from the rl-json report and how things are chained inside the report.

@rl-maartenb

Contributor Author

not sure if this is related to my changes
Unit tests, Attempt #2: Some jobs were not successful
https://github.com/DefectDojo/django-DefectDojo/actions/runs/15733336564/job/44360465014

Run helm install \
  helm install \
    --timeout 800s \
    --wait \
    --wait-for-jobs \
    defectdojo \
    ./helm/defectdojo \
    --set django.ingress.enabled=true \
    --set imagePullPolicy=Never \
    --set initializer.keepSeconds="-1" \
     --set postgresql.enabled=true --set createPostgresqlSecret=true  \
     --set redis.enabled=true --set celery.broker=redis --set createRedisSecret=true  \
    --set createSecret=true \
    --set tag=debian
  shell: /usr/bin/bash -e {0}
  env:
    DD_HOSTNAME: defectdojo.default.minikube.local
    HELM_REDIS_BROKER_SETTINGS:  --set redis.enabled=true --set celery.broker=redis --set createRedisSecret=true 
    HELM_PG_DATABASE_SETTINGS:  --set postgresql.enabled=true --set createPostgresqlSecret=true 
    MINIKUBE_HOME: /home/runner/work/_temp
    pgsql:  --set postgresql.enabled=true --set createPostgresqlSecret=true 
    redis:  --set redis.enabled=true --set celery.broker=redis --set createRedisSecret=true 
walk.go:75: found symbolic link in path: /home/runner/work/django-DefectDojo/django-DefectDojo/helm/defectdojo/README.md resolves to /home/runner/work/django-DefectDojo/django-DefectDojo/readme-docs/KUBERNETES.md. Contents of linked file included and used
Error: INSTALLATION FAILED: context deadline exceeded
Error: Process completed with exit code 1.

@rl-maartenb rl-maartenb requested a review from Maffooch June 18, 2025 18:52
@Maffooch Maffooch requested a review from hblankenship June 20, 2025 15:30

@mtesauro mtesauro left a comment

Contributor

Approved

@mtesauro mtesauro merged commit 6e90f26 into DefectDojo:dev Jun 20, 2025
147 of 148 checks passed