Commit 3867313

fix: use trailing-quote heuristic for multi-line record boundary detection
Replace field-count-based record boundary detection in the Qualys VMDR nonstandard CSV parser with a trailing-quote heuristic. The old approach re-parsed accumulated rows each iteration and failed on malformed quote patterns (e.g. #table cols=""3"") that produce incorrect field counts.

The new _is_record_end_line() helper counts trailing quotes: exactly 3 means record end, 4+ means record end only if preceded by a comma (empty field). This is O(1) per line and correctly handles all known Qualys export patterns.

Also fixes pre-existing ruff lint issues in the state machine parser.

Authored by T. Walker - DefectDojo
1 parent 1fbd35d commit 3867313
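
The trailing-quote rule described in the commit message can be sketched as below. This is a minimal illustration only, not the helper from the commit: the function name `is_record_end_line` (without the leading underscore) is a stand-in, and treating lines with fewer than three trailing quotes as "not a record end" is an assumption, since the message does not state that case.

```python
def is_record_end_line(line: str) -> bool:
    """Sketch of the trailing-quote heuristic (hypothetical, not the committed code)."""
    stripped = line.rstrip("\r\n")
    body = stripped.rstrip('"')             # text before the trailing run of quotes
    trailing = len(stripped) - len(body)    # length of that trailing run

    if trailing == 3:
        # Exactly three trailing quotes: record end.
        return True
    if trailing >= 4:
        # Four or more: record end only when the run follows a comma (empty field).
        return body.endswith(",")
    # Assumption: fewer than three trailing quotes means the record continues.
    return False
```

Because the check looks only at the current line, nothing accumulated so far has to be re-parsed, which is the cost the commit message attributes to the old field-count approach.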

10 files changed

Lines changed: 278 additions & 500 deletions

docs/content/supported_tools/parsers/file/qualys_vmdr.md

Lines changed: 54 additions & 108 deletions
@@ -21,13 +21,13 @@ To generate these files from Qualys VMDR:
 5. Choose either QID-centric or CVE-centric export option
 6. Upload the downloaded CSV file to DefectDojo
 
-## Default Deduplication Hashcode Fields
+## Default Deduplication
 
-By default, DefectDojo identifies duplicate Findings using these [hashcode fields](https://docs.defectdojo.com/en/working_with_findings/finding_deduplication/about_deduplication/):
+The parser uses `DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE`, which tries `unique_id_from_tool` (populated with the Qualys QID) first and falls back to hashcode deduplication.
 
-- title
-- severity
-- unique_id_from_tool (QID)
+**Hashcode fields:** `title`, `component_name`, `vuln_id_from_tool`
+
+For more information, see [About Deduplication](https://docs.defectdojo.com/en/working_with_findings/finding_deduplication/about_deduplication/).
 
 ### Sample Scan Data

@@ -40,102 +40,77 @@ Sample Qualys VMDR scans can be found in the [sample scan data folder](https://g
 
 ## QID Format (Primary Export)
 
-### Total Fields in QID CSV
-
-- Total data fields: 41
-- Total data fields parsed: 14
-- Total data fields NOT parsed: 27
-
-### QID Format Field Mapping Details
+### QID Format Field Mapping
 
 <details>
 <summary>Click to expand Field Mapping Table</summary>
 
-| Source Field | DefectDojo Field | Parser File | Notes |
-| ------------ | ---------------- | ----------- | ----- |
-| Title | title | qid_parser.py | Truncated to 150 characters |
-| Severity | severity | qid_parser.py | Mapped: 1=Info, 2=Low, 3=Medium, 4=High, 5=Critical |
-| Severity | severity_justification | qid_parser.py | Preserved as "Qualys Severity: X" |
-| QID | unique_id_from_tool | qid_parser.py | Qualys vulnerability identifier for deduplication |
-| First Detected | date | qid_parser.py | Parsed to date object |
-| Status | active | qid_parser.py | True if "ACTIVE", False otherwise |
-| Solution | mitigation | qid_parser.py | Remediation guidance |
-| Threat | impact | qid_parser.py | Threat description |
-| Asset Name | component_name | qid_parser.py | Asset/server name |
-| Category | service | qid_parser.py | Vulnerability category |
-| Asset IPV4 | unsaved_endpoints | qid_parser.py | Multiple endpoints if comma-separated |
-| Asset IPV6 | unsaved_endpoints | qid_parser.py | Fallback if no IPv4 |
-| Asset Tags | unsaved_tags | qid_parser.py | Split on comma |
-| Results | description | qid_parser.py | Included in description |
+| Source Field | DefectDojo Field | Notes |
+| ------------ | ---------------- | ----- |
+| Title | title | Truncated to 500 characters |
+| Severity | severity | Mapped: 1=Info, 2=Low, 3=Medium, 4=High, 5=Critical |
+| Severity | severity_justification | Preserved as "Qualys Severity: X" |
+| QID | unique_id_from_tool | Native Qualys vulnerability identifier |
+| QID | vuln_id_from_tool | Also used as vulnerability ID |
+| First Detected | date | Parsed to date object |
+| Status | active | True if "ACTIVE", False otherwise |
+| Solution | mitigation | Remediation guidance |
+| Threat | impact | Threat description |
+| Asset Name | component_name | Asset/server name |
+| Category | service | Vulnerability category |
+| Asset IPV4 | unsaved_endpoints | Multiple endpoints if comma-separated |
+| Asset IPV6 | unsaved_endpoints | Fallback if no IPv4 |
+| Asset Tags | unsaved_tags | Split on comma |
+| Results | description | Included in structured description |
 
 </details>
 
-### Additional Finding Field Settings (QID Format)
-
-<details>
-<summary>Click to expand Additional Settings Table</summary>
-
-| Finding Field | Default Value | Parser File | Notes |
-|---------------|---------------|-------------|-------|
-| static_finding | True | qid_parser.py | Vulnerability scan data |
-| dynamic_finding | False | qid_parser.py | Not dynamic testing |
+### Additional Finding Settings (QID Format)
 
-</details>
+| Finding Field | Default Value | Notes |
+|---------------|---------------|-------|
+| static_finding | True | Vulnerability scan data |
+| dynamic_finding | False | Not dynamic testing |
 
 ## CVE Format (Extended Export)
 
-### Total Fields in CVE CSV
-
-- Total data fields: 41
-- Total data fields parsed: 17
-- Total data fields NOT parsed: 24
-
-### CVE Format Field Mapping Details
+### CVE Format Field Mapping
 
 <details>
 <summary>Click to expand Field Mapping Table</summary>
 
-| Source Field | DefectDojo Field | Parser File | Notes |
-| ------------ | ---------------- | ----------- | ----- |
-| CVE | vuln_id_from_tool | cve_parser.py | CVE identifier (e.g., CVE-2021-44228) |
-| CVE | unsaved_vulnerability_ids | cve_parser.py | Also added to Vulnerability IDs for CVE tracking |
-| CVE-Description | description | cve_parser.py | Prepended to description |
-| CVSSv3.1 Base (nvd) | cvssv3_score | cve_parser.py | Numeric CVSS score |
-| Title | title | cve_parser.py | Truncated to 150 characters |
-| Severity | severity | cve_parser.py | Mapped: 1=Info, 2=Low, 3=Medium, 4=High, 5=Critical |
-| Severity | severity_justification | cve_parser.py | Preserved as "Qualys Severity: X" |
-| QID | unique_id_from_tool | cve_parser.py | Qualys vulnerability identifier for deduplication |
-| First Detected | date | cve_parser.py | Parsed to date object |
-| Status | active | cve_parser.py | True if "ACTIVE", False otherwise |
-| Solution | mitigation | cve_parser.py | Remediation guidance |
-| Threat | impact | cve_parser.py | Threat description |
-| Asset Name | component_name | cve_parser.py | Asset/server name |
-| Category | service | cve_parser.py | Vulnerability category |
-| Asset IPV4 | unsaved_endpoints | cve_parser.py | Multiple endpoints if comma-separated |
-| Asset IPV6 | unsaved_endpoints | cve_parser.py | Fallback if no IPv4 |
-| Asset Tags | unsaved_tags | cve_parser.py | Split on comma |
-| Results | description | cve_parser.py | Included in description |
+| Source Field | DefectDojo Field | Notes |
+| ------------ | ---------------- | ----- |
+| CVE | vuln_id_from_tool | CVE identifier (e.g., CVE-2021-44228) |
+| CVE | unsaved_vulnerability_ids | Also added for CVE tracking |
+| CVE-Description | description | Prepended to structured description |
+| CVSSv3.1 Base (nvd) | cvssv3_score | Numeric CVSS score |
+| Title | title | Truncated to 500 characters |
+| Severity | severity | Mapped: 1=Info, 2=Low, 3=Medium, 4=High, 5=Critical |
+| Severity | severity_justification | Preserved as "Qualys Severity: X" |
+| QID | unique_id_from_tool | Native Qualys vulnerability identifier |
+| First Detected | date | Parsed to date object |
+| Status | active | True if "ACTIVE", False otherwise |
+| Solution | mitigation | Remediation guidance |
+| Threat | impact | Threat description |
+| Asset Name | component_name | Asset/server name |
+| Category | service | Vulnerability category |
+| Asset IPV4 | unsaved_endpoints | Multiple endpoints if comma-separated |
+| Asset IPV6 | unsaved_endpoints | Fallback if no IPv4 |
+| Asset Tags | unsaved_tags | Split on comma |
+| Results | description | Included in structured description |
 
 </details>
 
-### Additional Finding Field Settings (CVE Format)
-
-<details>
-<summary>Click to expand Additional Settings Table</summary>
-
-| Finding Field | Default Value | Parser File | Notes |
-|---------------|---------------|-------------|-------|
-| static_finding | True | cve_parser.py | Vulnerability scan data |
-| dynamic_finding | False | cve_parser.py | Not dynamic testing |
+### Additional Finding Settings (CVE Format)
 
-</details>
+| Finding Field | Default Value | Notes |
+|---------------|---------------|-------|
+| static_finding | True | Vulnerability scan data |
+| dynamic_finding | False | Not dynamic testing |
 
 ## Special Processing Notes
 
-### Date Processing
-
-The parser uses dateutil.parser to handle Qualys date formats (e.g., "Feb 03, 2026 07:00 AM"). The First Detected field is used for the finding date.
-
 ### Severity Conversion
 
 Qualys severity levels (1-5 numeric scale) are converted to DefectDojo severity levels:
@@ -147,40 +122,11 @@ Qualys severity levels (1-5 numeric scale) are converted to DefectDojo severity
 
 The original Qualys severity is preserved in the severity_justification field as "Qualys Severity: X".
 
-### Description Construction
-
-The parser combines multiple fields to create a comprehensive markdown description:
-- Title
-- QID
-- Category
-- Threat
-- RTI (Real-Time Intelligence)
-- Operating System
-- Results
-- Last Detected
-
-For CVE format, the description also includes:
-- CVE identifier
-- CVE Description from NVD
-
-### Title Format
-
-Finding titles use the vulnerability name directly from the Title field, truncated to 150 characters with "..." suffix if longer.
-
 ### Endpoint Handling
 
 The parser creates Endpoint objects from IP addresses:
 - Multiple IPv4 addresses (comma-separated) create multiple endpoints
 - Falls back to IPv6 if no IPv4 address is present
-- Each endpoint represents an affected asset
-
-### Deduplication
-
-DefectDojo uses the `unique_id_from_tool` field populated with the Qualys QID for deduplication. This ensures the same vulnerability type is deduplicated within an asset's scope.
-
-### Tags Handling
-
-Asset Tags are extracted and split by commas. Each tag is added to the finding's unsaved_tags list for categorization and filtering in DefectDojo.
 
 ### Format Detection

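The severity conversion documented in the field mapping tables and the Severity Conversion section above (1=Info, 2=Low, 3=Medium, 4=High, 5=Critical, with the original value kept as "Qualys Severity: X") can be sketched as follows. This is an illustrative sketch, not the parser's `map_qualys_severity` or `build_severity_justification` helpers: the `sketch_` names are hypothetical, and falling back to "Info" for missing or unparseable values is an assumption that the docs do not confirm.

```python
# Mapping copied from the "Mapped: 1=Info, ..." note in the docs tables.
QUALYS_SEVERITY_TO_DOJO = {1: "Info", 2: "Low", 3: "Medium", 4: "High", 5: "Critical"}


def sketch_map_severity(raw, default="Info"):
    """Convert a Qualys 1-5 severity (int or str) to a DefectDojo severity name."""
    try:
        level = int(str(raw).strip())
    except (TypeError, ValueError):
        # Assumption: unparseable input falls back to the default.
        return default
    # Assumption: out-of-range levels also fall back to the default.
    return QUALYS_SEVERITY_TO_DOJO.get(level, default)


def sketch_severity_justification(raw):
    """Preserve the original Qualys value in the documented format."""
    return f"Qualys Severity: {raw}"
```

For example, `sketch_map_severity("4")` gives `"High"` and `sketch_severity_justification("4")` gives `"Qualys Severity: 4"`.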
dojo/settings/settings.dist.py

Lines changed: 2 additions & 0 deletions
@@ -1513,6 +1513,7 @@ def saml2_attrib_map_format(din):
     "n0s1 Scanner": ["description"],
     "IriusRisk Threats Scan": ["title", "component_name"],
     "Orca Security Alerts": ["title", "component_name"],
+    "Qualys VMDR": ["title", "component_name", "vuln_id_from_tool"],
 }
 
 # Override the hardcoded settings here via the env var
@@ -1782,6 +1783,7 @@ def saml2_attrib_map_format(din):
     "OpenReports": DEDUPE_ALGO_HASH_CODE,
     "IriusRisk Threats Scan": DEDUPE_ALGO_HASH_CODE,
     "Orca Security Alerts": DEDUPE_ALGO_HASH_CODE,
+    "Qualys VMDR": DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE,
 }
 
 # Override the hardcoded settings here via the env var
Lines changed: 10 additions & 39 deletions
@@ -1,74 +1,47 @@
-"""
-CVE format parser for Qualys VMDR exports.
-
-This module handles the CVE-centric CSV export format where findings
-include CVE identifiers and CVSS scores from NVD.
-"""
-
 from dojo.models import Finding
 from dojo.tools.qualys_vmdr.helpers import (
     build_description_cve,
     build_severity_justification,
+    is_qualys_null,
     map_qualys_severity,
     parse_cvss_score,
     parse_endpoints,
     parse_qualys_csv_content,
     parse_qualys_date,
     parse_tags,
+    strip_html,
     truncate_title,
 )
 
 
 class QualysVMDRCVEParser:
 
-    """Parse Qualys VMDR CVE format exports."""
-
     def parse(self, content):
-        """
-        Parse CVE format CSV content and return findings.
-
-        Args:
-            content: String containing the full CSV content
-
-        Returns:
-            list[Finding]: List of DefectDojo Finding objects
-
-        """
         findings = []
-
-        rows = parse_qualys_csv_content(content, skip_metadata_lines=3)
-
+        rows = parse_qualys_csv_content(content)
         for row in rows:
             finding = self._create_finding(row)
             if finding:
                 findings.append(finding)
-
         return findings
 
     def _create_finding(self, row):
-        """
-        Create a Finding object from a CSV row.
-
-        Args:
-            row: Dictionary containing CSV row data
-
-        Returns:
-            Finding: DefectDojo Finding object
-
-        """
         title = truncate_title(row.get("Title", ""))
         severity = map_qualys_severity(row.get("Severity"))
         severity_justification = build_severity_justification(row.get("Severity"))
 
+        cve = row.get("CVE", "")
+        qid = row.get("QID", "")
+
         finding = Finding(
             title=title,
             severity=severity,
             severity_justification=severity_justification,
             description=build_description_cve(row),
             mitigation=row.get("Solution", ""),
-            impact=row.get("Threat", ""),
-            unique_id_from_tool=row.get("QID", ""),
-            vuln_id_from_tool=row.get("CVE", ""),
+            impact=strip_html(row.get("Threat", "")),
+            unique_id_from_tool="" if is_qualys_null(qid) else qid,
+            vuln_id_from_tool="" if is_qualys_null(cve) else cve,
             date=parse_qualys_date(row.get("First Detected")),
             active=(row.get("Status", "").upper() == "ACTIVE"),
             component_name=row.get("Asset Name", ""),
@@ -87,9 +60,7 @@ def _create_finding(self, row):
         )
         finding.unsaved_tags = parse_tags(row.get("Asset Tags", ""))
 
-        # Add CVE to vulnerability_ids for proper CVE tracking
-        cve = row.get("CVE", "")
-        if cve:
+        if not is_qualys_null(cve):
             finding.unsaved_vulnerability_ids = [cve]
 
         return finding

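The null-handling pattern this diff introduces for the QID and CVE columns can be illustrated with a small standalone sketch. The diff does not show how `is_qualys_null` decides that a value is null, so the `sketch_is_null` function and its `null_markers` parameter below are hypothetical stand-ins; only the "blank the field and skip `unsaved_vulnerability_ids` when the CVE is null" flow is taken from the diff.

```python
def sketch_is_null(value, null_markers=("",)):
    """Hypothetical stand-in for is_qualys_null; the real marker set is not shown in the diff."""
    if value is None:
        return True
    return str(value).strip() in null_markers


def sketch_clean_ids(row, null_markers=("",)):
    """Mirror the diff's flow using a plain dict instead of a Finding object."""
    cve = row.get("CVE", "")
    qid = row.get("QID", "")
    cleaned = {
        # Null-like identifiers are stored as empty strings.
        "unique_id_from_tool": "" if sketch_is_null(qid, null_markers) else qid,
        "vuln_id_from_tool": "" if sketch_is_null(cve, null_markers) else cve,
    }
    # Vulnerability IDs are only attached when the CVE is not null-like.
    if not sketch_is_null(cve, null_markers):
        cleaned["unsaved_vulnerability_ids"] = [cve]
    return cleaned
```

Compared with the previous `if cve:` check, routing every identifier through a single null test keeps placeholder strings from the export out of the deduplication and vulnerability ID fields, assuming the real helper recognizes those placeholders.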