Commit 0f3942c

feat(parsers): add Xygeni JSON parser (SAST, SCA, Secrets)
Add a single first-party parser at dojo/tools/xygeni/ that handles three Xygeni JSON report kinds (SAST, SCA, Secrets) by dispatching on metadata.scanType. Mirrors the multi-scan-type pattern of rusty_hog, anchore_grype, checkmarx and sonarqube. Pre-approval: #14755
1 parent a5dd701 commit 0f3942c

14 files changed

Lines changed: 26739 additions & 0 deletions

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
---
title: "Xygeni"
toc_hide: true
---
### About Xygeni
[Xygeni](https://xygeni.io) is a Software Supply Chain Security platform whose
scanners produce JSON reports for code vulnerabilities (SAST), open-source
dependency vulnerabilities (SCA), hard-coded secrets, IaC flaws, web-application
vulnerabilities (DAST), CI/CD and SCM misconfigurations, and malicious or
suspect components.

This parser handles three Xygeni scan kinds in phase 1: **SAST**, **SCA**, and
**Secrets**. All three share a common `metadata` envelope; the parser
dispatches on `metadata.scanType`.

### Scan Types
| Scan type                | `metadata.scanType` | Xygeni CLI command (typical) |
| ------------------------ | ------------------- | ---------------------------- |
| `Xygeni SAST Scan`       | `sast`              | `xygeni scan --scan-type=sast --format=json` |
| `Xygeni SCA Scan`        | `deps`              | `xygeni scan --scan-type=deps --format=json` |
| `Xygeni Secrets Scan`    | `secrets`           | `xygeni scan --scan-type=secrets --format=json` |

See the Xygeni documentation at <https://docs.xygeni.io> for installation and
the full set of CLI options.

### Acceptable JSON Format
All three scan types share the same envelope:

~~~
{
  "metadata": {
    "uuid": "...",
    "timestamp": "2026-04-26T07:08:29Z",
    "projectName": "...",
    "scanType": "sast" | "deps" | "secrets",
    "format": "<scanType>-xygeni",
    "reportProperties": {
      "tool.name": "Xygeni",
      "tool.version": "..."
    }
  },
  ...
}
~~~

The kind-specific payload then follows:

- **SAST** (`vulnerabilities[]`): each entry carries `detector` (the rule id),
  `severity`, `location.{filepath, beginLine, endLine, code}`, `cwe` /
  `cwes[]`, `tags[]`, `explanation`, `uniqueHash`, `issueId`, and an optional
  `codeFlows[]` block describing source / sink frames and the data path.
- **SCA** (`dependencies[]`): each dependency has `name`, `version`,
  `ecosystem`, and a nested `vulnerabilities[]` of CVE/GHSA advisories with
  `cve`, `cwes`, `fixedVersion`, `aliases`, `overallCvssScore`, `references`,
  `description`, `uniqueHash`, `issueId`.
- **Secrets** (`secrets[]`): each entry has `type` (e.g.
  `aws_access_key`), `detector`, `severity`, `location` (same shape as SAST),
  `description`, `tags`, `uniqueHash`, `issueId`. The `secret` value and
  `location.code` are already redacted by the Xygeni CLI before serialisation.

### Sample Scan Data
Sample Xygeni JSON reports can be found
[here](https://github.com/DefectDojo/django-DefectDojo/tree/master/unittests/scans/xygeni).

### Default Deduplication Hashcode Fields
The parser sets `unique_id_from_tool` from each finding's vendor-stable
`uniqueHash`, so re-importing the same Xygeni report does not duplicate
findings. `vuln_id_from_tool` is set from `issueId`.
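
To make the envelope and the SAST payload described above more concrete, here is a minimal sketch that reads a heavily trimmed report using only the standard library. The report content is hypothetical (every value is made up for illustration), and `summarize` is an illustrative helper, not part of the parser.

```python
import json

# Hypothetical, heavily trimmed SAST report; all values are invented.
SAMPLE_REPORT = """
{
  "metadata": {"scanType": "sast", "projectName": "demo"},
  "vulnerabilities": [
    {
      "detector": "sql_injection",
      "severity": "high",
      "location": {"filepath": "app/db.py", "beginLine": 42},
      "cwes": ["CWE-89"],
      "uniqueHash": "abc123",
      "issueId": "ISSUE-1"
    }
  ]
}
"""


def summarize(report_text):
    """Return (scanType, [(detector, filepath, beginLine, uniqueHash), ...])."""
    data = json.loads(report_text)
    scan_type = data["metadata"]["scanType"]
    rows = []
    for vuln in data.get("vulnerabilities") or []:
        location = vuln.get("location") or {}
        rows.append(
            (
                vuln.get("detector"),
                location.get("filepath"),
                location.get("beginLine"),
                vuln.get("uniqueHash"),
            )
        )
    return scan_type, rows


print(summarize(SAMPLE_REPORT))
```

The `uniqueHash` column is the value the docs say ends up in `unique_id_from_tool`.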

dojo/tools/xygeni/__init__.py

Whitespace-only changes.

dojo/tools/xygeni/_common.py

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
"""Shared helpers for the Xygeni multi-scan-type parser."""

import re

SEVERITY_MAP = {
    "critical": "Critical",
    "high": "High",
    "medium": "Medium",
    "low": "Low",
    "info": "Info",
}

_CWE_TAG_RE = re.compile(r"^CWE[:\-]?(\d+)$", re.IGNORECASE)


def map_severity(value):
    """Map a Xygeni lowercase severity to a DefectDojo severity. Unknown values become Info."""
    if value is None:
        return "Info"
    return SEVERITY_MAP.get(str(value).lower(), "Info")


def parse_cwe(cwes=None, cwe=None, tags=None):
    """
    Resolve a CWE integer from any of the Xygeni representations.

    Preference order:
    1. The numeric ``cwe`` field on the finding.
    2. The first ``"CWE-N"`` entry in ``cwes``.
    3. The first ``"CWE:N"`` / ``"cwe:N"`` entry in ``tags``.
    """
    if isinstance(cwe, int):
        return cwe
    for entry in cwes or []:
        match = _CWE_TAG_RE.match(str(entry))
        if match:
            return int(match.group(1))
    for entry in tags or []:
        match = _CWE_TAG_RE.match(str(entry))
        if match:
            return int(match.group(1))
    return None


def extract_scan_type(data):
    """
    Read ``metadata.scanType`` from a Xygeni report.

    Raises ``TypeError`` if the report root is not a JSON object and
    ``ValueError`` if ``metadata.scanType`` is absent or empty.
    """
    if not isinstance(data, dict):
        msg = "Xygeni report root must be a JSON object"
        raise TypeError(msg)
    metadata = data.get("metadata") or {}
    scan_type = metadata.get("scanType")
    if not scan_type:
        msg = "Xygeni report is missing required 'metadata.scanType' field"
        raise ValueError(msg)
    return str(scan_type).lower()
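
As a quick check of what the CWE pattern in `_common.py` accepts, the sketch below recompiles the same regular expression in isolation and runs it over a few sample strings. The sample strings and the helper name `cwe_number` are illustrative assumptions, not values taken from real Xygeni reports.

```python
import re

# Same pattern as _CWE_TAG_RE in _common.py, copied here so the sketch stands alone.
CWE_TAG_RE = re.compile(r"^CWE[:\-]?(\d+)$", re.IGNORECASE)


def cwe_number(text):
    """Return the CWE number encoded in ``text``, or None if it does not match."""
    match = CWE_TAG_RE.match(str(text))
    return int(match.group(1)) if match else None


# Illustrative inputs: the separator is optional and case is ignored,
# but anything after the digits makes the entry fail to match.
for sample in ["CWE-79", "cwe:89", "CWE22", "CWE-79 (XSS)", "sql-injection"]:
    print(sample, "->", cwe_number(sample))
```

Because the pattern is anchored at both ends, decorated entries such as `CWE-79 (XSS)` are skipped rather than partially parsed, and the loop in `parse_cwe` moves on to the next entry.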

dojo/tools/xygeni/parser.py

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
"""
Parser for Xygeni JSON reports.

Xygeni (https://xygeni.io) is a Software Supply Chain Security platform.
It emits a separate JSON report per scanner kind (SAST, SCA, secrets, IaC,
CI/CD misconfig, DAST, suspect dependencies, code tampering). All reports
share a common ``metadata`` envelope with a ``scanType`` discriminator.

Phase 1 of this parser handles SAST, SCA, and Secrets. Additional scan
types can be dispatched the same way and added incrementally.
"""

import json
import logging

from dojo.tools.xygeni._common import extract_scan_type
from dojo.tools.xygeni.sast import parse_sast
from dojo.tools.xygeni.sca import parse_sca
from dojo.tools.xygeni.secrets import parse_secrets

logger = logging.getLogger(__name__)


SCAN_TYPE_SAST = "Xygeni SAST Scan"
SCAN_TYPE_SCA = "Xygeni SCA Scan"
SCAN_TYPE_SECRETS = "Xygeni Secrets Scan"

# Map from the ``metadata.scanType`` value emitted by the Xygeni CLI to the
# per-kind handler. Keys are lowercase, matching ``extract_scan_type``.
_HANDLERS = {
    "sast": parse_sast,
    "deps": parse_sca,
    "secrets": parse_secrets,
}


class XygeniParser:

    """Single parser dispatching on ``metadata.scanType`` across Xygeni scan kinds."""

    def get_scan_types(self):
        return [SCAN_TYPE_SAST, SCAN_TYPE_SCA, SCAN_TYPE_SECRETS]

    def get_label_for_scan_types(self, scan_type):
        return scan_type

    def get_description_for_scan_types(self, scan_type):
        if scan_type == SCAN_TYPE_SAST:
            return "Xygeni SAST JSON report (code vulnerabilities). Generated with 'xygeni scan --scan-type=sast'."
        if scan_type == SCAN_TYPE_SCA:
            return "Xygeni SCA JSON report (open-source dependency vulnerabilities). Generated with 'xygeni scan --scan-type=deps'."
        if scan_type == SCAN_TYPE_SECRETS:
            return "Xygeni Secrets JSON report (hard-coded secrets). Generated with 'xygeni scan --scan-type=secrets'."
        return "Xygeni JSON report."

    def get_findings(self, file, test):
        data = json.load(file)
        kind = extract_scan_type(data)
        handler = _HANDLERS.get(kind)
        if handler is None:
            msg = (
                f"Unsupported Xygeni scanType '{kind}'. "
                f"Phase 1 supports: {sorted(_HANDLERS)}."
            )
            raise ValueError(msg)
        logger.debug("Xygeni parser dispatching on scanType=%s", kind)
        return handler(data, test)
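
The dispatch in `get_findings` can be exercised without a DefectDojo installation by swapping the real handlers for stubs. The sketch below is a simplified stand-in under that assumption: the stub handlers, the `HANDLERS` table, and the free function `get_findings` are hypothetical and only mirror the control flow of the parser above.

```python
import io
import json

# Stub handlers standing in for parse_sast / parse_sca / parse_secrets.
# Each one just reports which branch was taken.
HANDLERS = {
    "sast": lambda data, test: ["sast handler"],
    "deps": lambda data, test: ["sca handler"],
    "secrets": lambda data, test: ["secrets handler"],
}


def get_findings(file, test=None):
    """Load a report, read metadata.scanType, and call the matching handler."""
    data = json.load(file)
    metadata = data.get("metadata") or {}
    kind = str(metadata.get("scanType") or "").lower()
    handler = HANDLERS.get(kind)
    if handler is None:
        msg = f"Unsupported scanType '{kind}'. Supported: {sorted(HANDLERS)}."
        raise ValueError(msg)
    return handler(data, test)


# The lookup is case-insensitive because the key is lowercased first.
report = io.StringIO('{"metadata": {"scanType": "DEPS"}, "dependencies": []}')
print(get_findings(report))
```

One consequence of this design is that the scan type chosen in the DefectDojo import form is not consulted: the handler is picked purely from the report's own `metadata.scanType`.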

dojo/tools/xygeni/sast.py

Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
"""Parse Xygeni SAST reports into DefectDojo Findings."""

from dojo.models import Finding
from dojo.tools.xygeni._common import map_severity, parse_cwe


def parse_sast(data, test):
    """Convert a Xygeni SAST JSON report into a list of Findings."""
    return [_build_finding(vuln, test) for vuln in data.get("vulnerabilities") or []]


def _build_finding(vuln, test):
    location = vuln.get("location") or {}
    file_path = location.get("filepath")
    line = location.get("beginLine")
    code = location.get("code")

    description_parts = []
    if vuln.get("explanation"):
        description_parts.append(str(vuln["explanation"]))
    if code:
        description_parts.append(f"```\n{code}\n```")

    code_flow_text = _render_code_flows(vuln.get("codeFlows") or [])
    if code_flow_text:
        description_parts.append(code_flow_text)

    finding = Finding(
        test=test,
        title=str(vuln.get("detector") or "Xygeni SAST finding"),
        description="\n\n".join(description_parts) if description_parts else "",
        severity=map_severity(vuln.get("severity")),
        file_path=file_path,
        line=line,
        cwe=parse_cwe(cwes=vuln.get("cwes"), cwe=vuln.get("cwe"), tags=vuln.get("tags")),
        static_finding=True,
        dynamic_finding=False,
        unique_id_from_tool=vuln.get("uniqueHash"),
        vuln_id_from_tool=vuln.get("issueId"),
    )

    _apply_code_flow_fields(finding, vuln.get("codeFlows") or [])
    return finding


def _render_code_flows(code_flows):
    """Render Xygeni codeFlows[] into a human-readable markdown block for Finding.description."""
    if not code_flows:
        return ""

    flow = code_flows[0]
    lines = ["**Data flow**"]
    for frame in flow.get("frames") or []:
        kind = frame.get("kind") or "step"
        loc = frame.get("location") or {}
        filepath = loc.get("filepath", "?")
        line = loc.get("beginLine", "?")
        snippet = (loc.get("code") or "").strip()
        lines.append(f"- **{kind}** {filepath}:{line} \u2014 `{snippet}`")
    return "\n".join(lines) if len(lines) > 1 else ""


def _apply_code_flow_fields(finding, code_flows):
    """Populate Finding.sast_source_* / sast_sink_object from the first code flow's first source/sink."""
    if not code_flows:
        return
    frames = code_flows[0].get("frames") or []
    source = next((f for f in frames if f.get("kind") == "source"), None)
    sink = next((f for f in frames if f.get("kind") == "sink"), None)

    if source:
        loc = source.get("location") or {}
        finding.sast_source_file_path = loc.get("filepath")
        finding.sast_source_line = loc.get("beginLine")
        if source.get("injectionPoint"):
            finding.sast_source_object = source["injectionPoint"]

    if sink:
        finding.sast_sink_object = sink.get("injectionPoint") or (sink.get("location") or {}).get("code")
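
To illustrate how the first source and sink frames are picked out of a code flow, the sketch below applies the same selection logic to a plain object. `types.SimpleNamespace` stands in for DefectDojo's `Finding`, and the frame contents are hypothetical sample data; only the key names (`frames`, `kind`, `location`, `injectionPoint`) come from the module above.

```python
from types import SimpleNamespace


def apply_code_flow_fields(finding, code_flows):
    """Copy the first source/sink of the first code flow onto ``finding``."""
    if not code_flows:
        return
    frames = code_flows[0].get("frames") or []
    # next() with a default returns the first matching frame, or None.
    source = next((f for f in frames if f.get("kind") == "source"), None)
    sink = next((f for f in frames if f.get("kind") == "sink"), None)
    if source:
        loc = source.get("location") or {}
        finding.sast_source_file_path = loc.get("filepath")
        finding.sast_source_line = loc.get("beginLine")
    if sink:
        # Fall back to the sink's code snippet when no injection point is given.
        sink_loc = sink.get("location") or {}
        finding.sast_sink_object = sink.get("injectionPoint") or sink_loc.get("code")


# Hypothetical code flow: one source, one intermediate step, one sink.
flows = [
    {
        "frames": [
            {"kind": "source", "location": {"filepath": "views.py", "beginLine": 10}},
            {"kind": "propagation", "location": {"filepath": "views.py", "beginLine": 12}},
            {"kind": "sink", "location": {"filepath": "db.py", "beginLine": 30, "code": "cursor.execute(q)"}},
        ]
    }
]

finding = SimpleNamespace()
apply_code_flow_fields(finding, flows)
print(finding)
```

Only the first entry of `codeFlows[]` is considered, so any additional flows in a report do not affect these fields.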

dojo/tools/xygeni/sca.py

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
"""Parse Xygeni SCA (dependency-vulnerability) reports into DefectDojo Findings."""

from dojo.models import Finding
from dojo.tools.xygeni._common import map_severity, parse_cwe


def parse_sca(data, test):
    """
    Convert a Xygeni SCA JSON report into a list of Findings.

    The Xygeni SCA report stores findings nested inside ``dependencies[]``:
    each dependency may carry a ``vulnerabilities[]`` array of CVE/GHSA
    advisories. This parser emits one Finding per nested vulnerability.
    """
    findings = []
    for dep in data.get("dependencies") or []:
        findings.extend(
            _build_finding(dep, vuln, test) for vuln in dep.get("vulnerabilities") or []
        )
    return findings


def _build_finding(dep, vuln, test):
    component_name = dep.get("name")
    component_version = dep.get("version")

    title = str(vuln.get("cve") or vuln.get("id") or "Xygeni SCA finding")

    fixed_version = vuln.get("fixedVersion")
    mitigation = None
    if fixed_version and component_name:
        mitigation = f"Upgrade {component_name} to version {fixed_version} or later."
    elif fixed_version:
        mitigation = f"Upgrade to version {fixed_version} or later."

    references = "\n".join(str(r) for r in (vuln.get("references") or []) if r) or None

    cvss_score = vuln.get("overallCvssScore")
    if cvss_score is None or cvss_score < 0:
        cvss_score = None

    finding = Finding(
        test=test,
        title=title,
        description=str(vuln.get("description") or ""),
        severity=map_severity(vuln.get("severity")),
        cwe=parse_cwe(cwes=vuln.get("cwes")),
        cvssv3_score=cvss_score,
        mitigation=mitigation,
        references=references,
        component_name=component_name,
        component_version=component_version,
        static_finding=True,
        dynamic_finding=False,
        unique_id_from_tool=vuln.get("uniqueHash"),
        vuln_id_from_tool=vuln.get("issueId"),
    )

    if vuln.get("cve"):
        finding.cve = vuln["cve"]

    finding.unsaved_vulnerability_ids = _collect_vulnerability_ids(vuln)
    return finding


def _collect_vulnerability_ids(vuln):
    """Return a deduplicated list of CVE/GHSA-style aliases for a Xygeni SCA vulnerability."""
    ids = []
    seen = set()
    for value in (vuln.get("cve"), *(vuln.get("aliases") or [])):
        if not value:
            continue
        token = str(value)
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids
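
The order-preserving deduplication in `_collect_vulnerability_ids` can also be written with `dict.fromkeys`, which keeps the first occurrence of each key in insertion order. The sketch below is an alternative formulation for illustration, not what the commit uses, and the identifiers in the sample are made up.

```python
def collect_vulnerability_ids(vuln):
    """Return the CVE followed by its aliases, without duplicates or empty values."""
    candidates = (vuln.get("cve"), *(vuln.get("aliases") or []))
    # dict.fromkeys drops repeats while preserving first-seen order.
    return list(dict.fromkeys(str(value) for value in candidates if value))


# Hypothetical advisory: the CVE is repeated in the aliases and one alias is null.
sample = {
    "cve": "CVE-2024-0001",
    "aliases": ["GHSA-aaaa-bbbb-cccc", "CVE-2024-0001", None],
}
print(collect_vulnerability_ids(sample))
```

Putting `cve` first means the primary identifier always leads the list, with aliases following in report order.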
