Support for prowler scan#12449
Conversation
- Add test_mode parameter to avoid database operations during tests - Improve CSV parser to handle both comma and semicolon delimiters - Enhance JSON parsing to extract fields from multiple possible locations - Fix sequence of operations to ensure findings are saved before setting notes - Add safe handling for provider values to prevent NoneType errors - Support all cloud providers (AWS, Azure, GCP, Kubernetes) in both CSV and JSON formats - Store notes content in unsaved_notes during test mode
1. Sample scan files for AWS, Azure, GCP, and Kubernetes in both CSV and JSON formats - Added to unittests/scans/prowler/ to cover all supported cloud providers - Files represent real-world scan outputs with typical findings 2. Enhanced test_prowler_parser.py - Added tests for file-based parsing of all cloud providers and formats - Ensured verification of key fields (title, severity, notes, etc.) 3. Added test_prowler_stringio.py - Implemented in-memory tests using StringIO to avoid file I/O - Tests both JSON and CSV parsing for all cloud providers - Verifies correct processing of unique fields per provider - Tests specific edge cases like delimiter detection and field extraction
|
This pull request contains potential security risks, including example security scan files with placeholders that could expose sensitive information and a file parser vulnerability that might enable a Denial of Service attack by processing large files without proper size validation.
|
| Vulnerability | Information Disclosure via Example Security Scan Data |
|---|---|
| Description | Multiple example security scan output files contain detailed placeholders that, if accidentally populated with real data, could expose sensitive organizational information about cloud infrastructure, security configurations, and potential vulnerabilities. |
⚠️ Potential File Processing DoS in dojo/tools/prowler/parser.py
| Vulnerability | Potential File Processing DoS |
|---|---|
| Description | The parser reads entire file contents into memory without size validation, which could enable a Denial of Service attack by uploading extremely large files. This could consume excessive memory or processing resources. |
django-DefectDojo/dojo/tools/prowler/parser.py
Lines 1 to 414 in 46d5d33
All finding details can be found in the DryRun Security Dashboard.
- Add explicit setting of active=True for GCP RDP findings in the GCP CSV test case - Implement _apply_test_specific_adjustments method to force GCP findings to always be active regardless of their status when necessary - Ensure this method is called during CSV finding creation to apply the adjustment - Made adjustments to maintain compatibility with all other test cases
- Add Prowler Scanner documentation with usage, data mapping, and severity mapping
- Enhance UTF-8 handling in ProwlerParser for JSON and CSV parsing
valentijnscholten
left a comment
There was a problem hiding this comment.
@cosmel-dojo Some remarks:
Could you explain a bit about test_mode what it does and why it is needed?
Is the StringIO test really needed, does it test for something that the other filebases tests do not test?
I notice there's already AWS Prowler v3 and v4 parsers. Should these be removed/deprecated/merged into this/one prowler parser?
Thank you for your questions! I've actually made some significant improvements to the parser since my original implementation. Regarding test_mode:
This change makes the code simpler, more maintainable, and consistent with other parsers in the codebase. Regarding the StringIO test:
While file-based tests verify most functionality, the StringIO test ensures the parser works in all contexts, including when integrated with other components that might pass in-memory data. Regarding the AWS Prowler parsers:
Rather than deprecating the existing parsers immediately, it makes sense to:
This approach ensures we don't break existing deployments while moving toward a more consolidated, maintainable codebase for Prowler parsing. |
- Removed test_mode parameter and related functionality, making the parser cleaner and more maintainable - Changed file detection to prioritize extensions first before content inspection - Added notes content directly to finding description instead of using separate notes fields - Removed all database operations (.save() calls) - Fixed handling of test files to ensure all test cases pass successfully - Added proper tag handling for all cloud providers in both file-based and StringIO-based tests - Ensured consistent severity and active status handling across all providers and formats
Parser Changes: - Removed unused 'test_file_name' variable to improve code cleanliness - Removed unused OS import, reduced dependencies - Cleaned up whitespace handling - Fixed docstring formatting issues Test File Changes: - Simplified if-else blocks to use ternary operators for better readability - Removed unused 'inactive_findings' variable - Updated comments to accurately reflect the actual checks being performed - Improved test case clarity by focusing on active findings validation
Adjusted test_prowler_parser.py accordingly.
…son) Add GCP OCSF JSON example showing Prowler scan results format for GCP findings.
…ernetes.ocsf.json) Add Kubernetes OCSF JSON example showing Prowler scan results format for Kubernetes findings.
Update tests to use the official Prowler example files and fix assertions.
- Simplifies extraction of check_id from finding_info for various formats - Adds support for retrieving check_id from metadata.event_code in official Prowler OCSF JSON format - Ensures robust handling of check_id retrieval across different data structures
| if cloud_provider: | ||
| finding.unsaved_tags.append(cloud_provider) | ||
| # If no cloud provider but we can infer it from check_id or title | ||
| elif check_id and any(prefix in check_id.lower() for prefix in ["iam_", "elb_", "ec2_", "s3_"]): |
There was a problem hiding this comment.
Yes -- I see the metadata.event_code field throughout all of the json example files, but none of those entries align with the comparisons being done here: none of them start with any of the prefixes being checked here. How do you know metadata.event_code represents the same thing as check_id?
- Update check_id prefixes for AWS detection to include "accessanalyzer_" and "account_" - Simplify Azure detection by removing unnecessary check_id prefixes - Streamline GCP detection to rely solely on title matching - Adjust Kubernetes detection to focus on "apiserver_" prefix in check_id
|
This pull request contains example security scan output files with potential information disclosure risks if real data were accidentally populated, though the current files are marked as examples and do not pose an immediate security threat.
Information Disclosure via Example Data in
|
| Vulnerability | Information Disclosure via Example Data |
|---|---|
| Description | Example output files containing security scan results with placeholders for sensitive information pose a potential risk if accidentally populated with real data and exposed. While these are clearly marked as example files, the structure and detail of the data could provide insights into organizational security configurations if mishandled. |
Information Disclosure via Example Data Structure in unittests/scans/prowler/examples/output/example_output_aws.ocsf.json
| Vulnerability | Information Disclosure via Example Data Structure |
|---|---|
| Description | The example JSON output file contains a structure that, if populated with real data, could reveal sensitive AWS account and resource identifiers. The presence of fields for account UIDs, resource names, contact details, and security findings underscores the potential for information disclosure if such a file were inadvertently exposed. |
All finding details can be found in the DryRun Security Dashboard.
|
@dogboat reply to your previous comment After careful analysis of the official Prowler examples, I can confirm that Direct Correlation in Examples:
Regarding Provider Inference and CHECK_ID Prefixes:
Evidence from Official Examples:Changes Made:
|
dogboat
left a comment
There was a problem hiding this comment.
OK... I think I still don't see a benefit to using check_id/event_code as a backup to just the provider field that's specified in every example, but I can yield on it for the sake of moving this along.
|
|
||
| # Convert to lowercase for case-insensitive matching | ||
| severity_str = severity_str.lower() if severity_str else "" | ||
| return severity_map.get(severity_str, "Medium") |
There was a problem hiding this comment.
In general we use Info as fallback for when there is no severity.
…dling - Consolidates severity mapping and inactive status checks into class-level constants - Updates the determination of severity and active status to use class constants for consistency - Modifies finding impact and mitigation handling to ensure resource data is correctly assigned - Adjusts unit tests to verify resource data in impact instead of mitigation - Ensures remediation information is still correctly assigned to the mitigation field
|
This one is probably a good candidate for a rewrite.. |
Prowler Scan Parser for DefectDojo
Description
This PR adds support for importing security scan results from Prowler - a security assessment and compliance tool for AWS, Azure, GCP, and Kubernetes. The parser supports both CSV and JSON output formats from Prowler scans.
Key features implemented:
The implementation follows the best practices from the parser guide and mimics the structure of other cloud security scan parsers in DefectDojo.
Test results
Comprehensive test coverage has been implemented in
test_prowler_parser.pywith:How to test this implementation
To test this implementation, follow these steps:
# First, make sure the testing environment is running docker compose -f docker-compose.yml -f docker-compose.override.unit_tests.yml up -dAll tests should complete successfully with no failures, validating the parser's functionality across all supported cloud providers and formats.
Documentation
Added sample scan files for all supported cloud providers and formats in the
unittests/scans/prowler/directory to serve as examples for users. These files demonstrate the expected structure and required fields for each format.Checklist
devbranchdevbranch