Task 3: Compliance Infrastructure Discovery #28

- Add HostComplianceDiscoveryService for detecting compliance tooling
- Discover Python environments with pip and venv support
- Detect OpenSCAP tools and SCAP content capabilities
- Check privilege escalation (sudo, su, doas) availability
- Discover compliance scanners (Lynis, InSpec, AIDE, etc.)
- Assess filesystem capabilities (extended attributes, SELinux labels)
- Identify audit tools (auditd, rsyslog, journalctl) and their status
- Compile supported compliance frameworks list
- Add compliance capability assessment endpoint
- Create REST API endpoints for individual and bulk discovery
- Include supported compliance frameworks reference endpoint
- Add database fields for enhanced host compatibility tracking

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
| from .routes import auth, hosts, scans, content, scap_content, monitoring, users, audit, host_groups, scan_templates, webhooks, mfa | ||
| from .routes.system_settings_unified import router as system_settings_router | ||
| from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, capabilities | ||
| from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, capabilities, host_compliance_discovery |
Check notice
Code scanning / CodeQL
Unused import
Copilot Autofix
AI 10 months ago
To fix this, remove the unused capabilities import from the import statement on line 22 in backend/app/main.py. Carefully edit only the import line, preserving the rest of the imports. If any other names on the line are also unused and flagged by CodeQL, they should be removed as well, but for this task, focus only on removing capabilities. No new imports or code changes are needed. The edit should only affect the line containing the import statement.
| @@ -19,7 +19,7 @@ | ||
| from .database import engine, create_tables, get_db | ||
| from .routes import auth, hosts, scans, content, scap_content, monitoring, users, audit, host_groups, scan_templates, webhooks, mfa | ||
| from .routes.system_settings_unified import router as system_settings_router | ||
| from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, capabilities, host_compliance_discovery | ||
| from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, host_compliance_discovery | ||
| # Import security routes only if available | ||
| try: | ||
| from .routes import automated_fixes |
| ComplianceDiscoveryResponse containing discovered compliance information | ||
| """ | ||
| # Check permissions | ||
| check_permission(current_user, "hosts:read") |
Check failure
Code scanning / CodeQL
Wrong number of arguments in a call
Copilot Autofix
AI 10 months ago
To fix the error, we must ensure the call to check_permission supplies the required third argument (in addition to current_user and the permission string "hosts:read"). Typically, a per-resource permission check will require the resource in question, in this case likely the host (whose permission is being checked). In the code, host is assigned after the permission check. The best solution is to first retrieve the host object, check if it exists and raise 404 if not, then call check_permission passing current_user, the permission string, and the host object. The rest of the code can remain unchanged, preserving existing functionality.
Required changes:
- Move the permission check to after the host is retrieved (after line 89, once host is assured to exist).
- Update the call to check_permission to include the host as the third argument.
| @@ -73,8 +73,6 @@ | ||
| Returns: | ||
| ComplianceDiscoveryResponse containing discovered compliance information | ||
| """ | ||
| # Check permissions | ||
| check_permission(current_user, "hosts:read") | ||
|
|
||
| try: | ||
| # Convert string UUID to UUID object | ||
| @@ -88,6 +86,9 @@ | ||
| detail=f"Host with ID {host_id} not found" | ||
| ) | ||
|
|
||
| # Check permissions | ||
| check_permission(current_user, "hosts:read", host) | ||
|
|
||
| # Perform compliance discovery | ||
| compliance_service = HostComplianceDiscoveryService() | ||
| discovery_results = compliance_service.discover_compliance_infrastructure(host) |
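The reordering in this suggestion (resolve the host, return 404 if it is missing, then run a per-resource permission check) can be sketched outside FastAPI. This is only an illustration: the three-argument `check_permission` signature is the one the autofix inferred, not something the PR confirms, and `Host`, `HOSTS`, `NotFound`, `Forbidden`, and `discover` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


@dataclass
class Host:
    id: str
    owner: str


# Hypothetical in-memory host table; the real code queries a database.
HOSTS: Dict[str, Host] = {"h1": Host(id="h1", owner="alice")}


def check_permission(user: str, permission: str, resource: Host) -> None:
    # Assumed three-argument signature; the real rule set is not shown in the PR.
    if permission == "hosts:read" and resource.owner != user:
        raise Forbidden(f"{user} may not read host {resource.id}")


def discover(user: str, host_id: str) -> str:
    host: Optional[Host] = HOSTS.get(host_id)
    if host is None:
        # Resolve the resource first, so a missing host is reported as
        # "not found" before any permission logic runs.
        raise NotFound(f"Host with ID {host_id} not found")
    # Only now is there a concrete resource to pass as the third argument.
    check_permission(user, "hosts:read", host)
    return f"discovery started for {host.id}"
```

One trade-off of this ordering: an unauthorized caller can tell existing host IDs from missing ones by the status code, so it is worth confirming that this is acceptable before adopting the autofix.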
| detail=f"Invalid host ID format: {str(e)}" | ||
| ) | ||
| except Exception as e: | ||
| logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}") |
Check failure
Code scanning / CodeQL
Log Injection
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 10 months ago
To fix this problem, we should sanitize the user-provided value before logging it. For log files that are plain text, we should remove any newline characters (\n, \r, \r\n) from host_id before including it in any log data, following the recommendations. The fix should only affect the log entry at line 111, not affect the functionality of the API or the returned error message. We simply sanitize host_id for logging. The most robust fix is to define a helper function (such as sanitize_for_log) to centralize the sanitization logic, ensuring consistency and maintainability. This function can be placed within this file near the top or close to where it's used.
Required changes:
- Add a small utility function to sanitize log inputs.
- In the exception handler on line 111, before using host_id in the log message, sanitize it using the helper.
| @@ -17,6 +17,12 @@ | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| def sanitize_for_log(value: str) -> str: | ||
| """Sanitize user-provided input before logging to prevent log injection.""" | ||
| if not isinstance(value, str): | ||
| value = str(value) | ||
| return value.replace("\r", "").replace("\n", "") | ||
|
|
||
| router = APIRouter(prefix="/host-compliance-discovery", tags=["Host Compliance Discovery"]) | ||
|
|
||
|
|
||
| @@ -108,7 +114,8 @@ | ||
| detail=f"Invalid host ID format: {str(e)}" | ||
| ) | ||
| except Exception as e: | ||
| logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}") | ||
| safe_host_id = sanitize_for_log(host_id) | ||
| logger.error(f"Compliance discovery failed for host {safe_host_id}: {str(e)}") | ||
| raise HTTPException( | ||
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | ||
| detail=f"Compliance discovery failed: {str(e)}" |
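To see what the helper buys, the sketch below reuses the `sanitize_for_log` logic from the diff and feeds it a hypothetical malicious `host_id` that tries to forge a second log entry. The sample payload and the `unsafe_line`/`safe_line` names are illustrative only.

```python
def sanitize_for_log(value: object) -> str:
    """Strip CR and LF so one value cannot span multiple log lines."""
    text = value if isinstance(value, str) else str(value)
    return text.replace("\r", "").replace("\n", "")


# Hypothetical attacker-controlled input carrying a forged log entry.
malicious = "abc\r\n2024-01-01 00:00:00 ERROR forged entry"

unsafe_line = f"Compliance discovery failed for host {malicious}"
safe_line = f"Compliance discovery failed for host {sanitize_for_log(malicious)}"

print(len(unsafe_line.splitlines()))  # 2: the forged entry starts its own line
print(len(safe_line.splitlines()))    # 1: the payload stays inside one entry
```

The helper only removes `\r` and `\n`. The exception text `str(e)` in the same log call is not sanitized by this change, so if it can echo user input it may need the same treatment.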
| BulkComplianceDiscoveryResponse with results for all hosts | ||
| """ | ||
| # Check permissions | ||
| check_permission(current_user, "hosts:read") |
Check failure
Code scanning / CodeQL
Wrong number of arguments in a call
Copilot Autofix
AI 10 months ago
To fix this error, we must ensure that the call to check_permission at line 134 supplies all required positional arguments. We can infer from the error that check_permission expects at least three required arguments. The first two are already being provided: current_user and the string "hosts:read". The third argument should be provided based on the intended usage, which likely relates to the resource being accessed, such as a host ID. In this context, we're checking permissions for bulk compliance discovery over a set of hosts, so it makes sense for the third argument to be request.host_ids or possibly the list of IDs being operated upon or the request object itself, depending on the definition of check_permission.
The single best fix is to add the third required argument to the check_permission call, choosing the most logical value based on context. Since the operation is bulk read of hosts, passing the request.host_ids (which is a list of host IDs) is likely correct. Thus, change line 134 to: check_permission(current_user, "hosts:read", request.host_ids). This addresses the required argument count and aligns with the apparent usage.
Only line 134 needs changing. No import, method, or definition updates are required, as the intended third argument exists as request.host_ids.
| @@ -131,7 +131,7 @@ | ||
| BulkComplianceDiscoveryResponse with results for all hosts | ||
| """ | ||
| # Check permissions | ||
| check_permission(current_user, "hosts:read") | ||
| check_permission(current_user, "hosts:read", request.host_ids) | ||
|
|
||
| if not request.host_ids: | ||
| raise HTTPException( |
| errors[host_id] = f"Invalid host ID format: {str(e)}" | ||
| failed_discoveries += 1 | ||
| except Exception as e: | ||
| logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}") |
Check failure
Code scanning / CodeQL
Log Injection
Copilot Autofix
AI 10 months ago
To mitigate log injection, any user-controlled value written to logs must be sanitized. The best practice for plain-text logs is to strip or replace linebreaks (\r, \n) from the input, preventing a malicious user from creating log entries with forged lines or confusing formatting. In this case, before logging host_id (originating from user input), we should sanitize it by replacing or removing any line breaks. We'll do this immediately before the logging call on line 186. We should also mark user input clearly in the log entry so it's distinguishable from legitimate log structure.
Required elements:
- On line 186, before logging, create a sanitized version of host_id by replacing/removing any linebreaks (\r\n, \n, and \r).
- Use the sanitized host_id in all log output.
- No external packages are required; use Python's string replace method(s).
- Only edit the block directly surrounding line 186.
| @@ -183,9 +183,9 @@ | ||
| errors[host_id] = f"Invalid host ID format: {str(e)}" | ||
| failed_discoveries += 1 | ||
| except Exception as e: | ||
| logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}") | ||
| sanitized_host_id = host_id.replace('\r\n', '').replace('\n', '').replace('\r', '') | ||
| logger.error(f"Compliance discovery failed for host [user input: {sanitized_host_id}]: {str(e)}") | ||
| errors[host_id] = f"Compliance discovery failed: {str(e)}" | ||
| failed_discoveries += 1 | ||
|
|
||
| logger.info(f"Bulk compliance discovery completed: {successful_discoveries} successful, " | ||
| f"{failed_discoveries} failed out of {len(request.host_ids)} total hosts") |
| List of supported compliance frameworks with descriptions | ||
| """ | ||
| # Check permissions | ||
| check_permission(current_user, "hosts:read") |
Check failure
Code scanning / CodeQL
Wrong number of arguments in a call
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 10 months ago
To fix the problem, we need to update the call to check_permission at line 265 in backend/app/routes/host_compliance_discovery.py to pass the required third argument. Since only current_user and a permission string are given, and the third required argument is not shown, its expected value must be inferred. In RBAC contexts, the missing argument is often a request object, resource, or database session. Nearby endpoints in FastAPI typically receive a database session (db: Session = Depends(get_db)), but this handler does not declare one. The safest and most minimal change is to add a db: Session = Depends(get_db) parameter to get_supported_compliance_frameworks, then pass it as the third argument to check_permission. This mirrors conventional patterns elsewhere, doesn't change business logic, and satisfies the argument requirements.
Changes required:
- Update the function signature to accept db: Session = Depends(get_db)
- Pass db as the third argument to check_permission.
No new imports are needed, as get_db and Session are already imported.
| @@ -253,7 +253,8 @@ | ||
|
|
||
| @router.get("/compliance-frameworks") | ||
| async def get_supported_compliance_frameworks( | ||
| current_user=Depends(get_current_user) | ||
| current_user=Depends(get_current_user), | ||
| db: Session = Depends(get_db) | ||
| ): | ||
| """ | ||
| Get list of compliance frameworks that can be discovered and supported | ||
| @@ -262,7 +263,7 @@ | ||
| List of supported compliance frameworks with descriptions | ||
| """ | ||
| # Check permissions | ||
| check_permission(current_user, "hosts:read") | ||
| check_permission(current_user, "hosts:read", db) | ||
|
|
||
| frameworks = { | ||
| "NIST 800-53": { |
| if version_output and version_output['success']: | ||
| version_text = version_output['stdout'].strip() | ||
| # Extract version number | ||
| version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text) |
Check warning
Code scanning / CodeQL
Duplication in regular expression character class
Copilot Autofix
AI 10 months ago
To fix the issue, the duplicate backslash (\\) in the character class [\\.\\d] should be removed. The intended match seems to be either a period (.) or a digit (\d), i.e., [.\d]. In the code, this pattern is part of the regular expression at line 179 of backend/app/services/host_compliance_discovery.py: re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text). Only this line needs changing. The fix is simply to change [\\.\\d]* to [.\d]*, i.e., the regular expression should now read r'(\\d+\\.\\d+[.\d]*)' (escaped appropriately for Python raw strings).
No imports or other code changes are required.
| @@ -176,7 +176,7 @@ | ||
| if version_output and version_output['success']: | ||
| version_text = version_output['stdout'].strip() | ||
| # Extract version number | ||
| version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text) | ||
| version_match = re.search(r'(\\d+\\.\\d+[.\d]*)', version_text) | ||
| if version_match: | ||
| version = version_match.group(1) | ||
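A quick check of how these patterns behave, assuming the source really contains doubled backslashes inside a raw string as the diff shows. In a raw string, `\\d` is a literal backslash followed by the letter `d`, not a digit class, so the fix shown above (which keeps `\\d+\\.\\d+` at the front) would still fail on real version output. The sample `text` is an illustrative version banner, not captured tool output.

```python
import re

# Illustrative version banner; real output varies by tool and release.
text = "OpenSCAP command line tool (oscap) 1.3.5"

original = r'(\\d+\\.\\d+[\\.\\d]*)'    # pattern before the autofix
partial_fix = r'(\\d+\\.\\d+[.\d]*)'    # pattern as written in the autofix
single_escaped = r'(\d+\.\d+[.\d]*)'    # backslashes un-doubled throughout

print(re.search(original, text))                  # None
print(re.search(partial_fix, text))               # None
print(re.search(single_escaped, text).group(1))   # 1.3.5

# The doubled form only matches text containing literal backslashes.
print(re.search(original, r'\d\.\d') is not None)  # True
```

If that assumption holds, un-doubling every backslash in the pattern, as the later suggestions in this review do, is the change that makes version extraction work.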
|
|
| sudo_version = 'Unknown' | ||
| if sudo_version_output and sudo_version_output['success']: | ||
| version_text = sudo_version_output['stdout'].strip() | ||
| version_match = re.search(r'version (\\d+\\.\\d+[\\.\\d]*)', version_text) |
Check warning
Code scanning / CodeQL
Duplication in regular expression character class
Copilot Autofix
AI 10 months ago
To fix this problem, remove the duplicate or unnecessary characters from the character class. The goal of the regular expression on line 255 is to extract the version number following "version " in the output string. The pattern is currently: 'version (\\d+\\.\\d+[\\.\\d]*)'. The problematic part is [\\.\\d]* - inside a character class, \. matches a literal dot, and \d matches a digit. However, by double-escaping in Python strings, this becomes '[\\.\\d]*' as the raw regex, which would actually match a backslash, a dot, or a 'd' (since Python does not process escapes in raw strings inside character classes the same way). The correct way is to match zero or more dots or digits; so [.\d]* will match any dots or digits.
The direct fix is to:
- Replace [\\.\\d]* with [.\d]* inside the regex string.
- This makes the pattern: r'version (\d+\.\d+[.\d]*)'
- This works in a raw string and will behave as expected.
Change only line 255 in backend/app/services/host_compliance_discovery.py accordingly.
No imports or additional dependencies are required.
| @@ -252,7 +252,7 @@ | ||
| sudo_version = 'Unknown' | ||
| if sudo_version_output and sudo_version_output['success']: | ||
| version_text = sudo_version_output['stdout'].strip() | ||
| version_match = re.search(r'version (\\d+\\.\\d+[\\.\\d]*)', version_text) | ||
| version_match = re.search(r'version (\d+\.\d+[.\d]*)', version_text) | ||
| if version_match: | ||
| sudo_version = version_match.group(1) | ||
|
|
| if version_output and version_output['success']: | ||
| version_text = version_output['stdout'].strip() | ||
| # Extract version number from output | ||
| version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text) |
Check warning
Code scanning / CodeQL
Duplication in regular expression character class
Copilot Autofix
AI 10 months ago
The best way to fix this problem is to replace the faulty character class [\\.\\d] with a pattern that correctly matches version numbers in the typical format "1.2.3", namely using \d+(\.\d+)*. On line 330, change the regular expression from r'(\\d+\\.\\d+[\\.\\d]*)' to r'(\d+\.\d+(\.\d+)*)'. This will correctly match one or more digits, a dot, one or more digits, and then any number of additional dot-number segments—all commonly seen in version strings. This change will be made in backend/app/services/host_compliance_discovery.py, in the _discover_compliance_scanners method, replacing line 330.
| @@ -327,7 +327,7 @@ | ||
| if version_output and version_output['success']: | ||
| version_text = version_output['stdout'].strip() | ||
| # Extract version number from output | ||
| version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text) | ||
| version_match = re.search(r'(\d+\.\d+(\.\d+)*)', version_text) | ||
| if version_match: | ||
| version = version_match.group(1) | ||
|
|
| version = 'Unknown' | ||
| if version_output and version_output['success']: | ||
| version_text = version_output['stdout'].strip() | ||
| version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text) |
Check warning
Code scanning / CodeQL
Duplication in regular expression character class
Copilot Autofix
AI 10 months ago
To fix this, we need to rewrite the regular expression to match the intended version pattern without unnecessary duplication or mistakes in the character class. The likely wanted pattern is to match semantic version numbers, such as 1.11.3, 2.0, possibly with more sub-parts. This is often expressed as a group of digits, followed by one or more instances of a period and more digits.
The current pattern is meant to be (\d+\.\d+[\.\d]*): a digit sequence, a dot, a digit sequence, followed by zero or more characters that are each either a dot or a digit. The trailing * is a quantifier on the character class, not a member of it. That class is loose enough to accept malformed tails such as consecutive dots or a trailing dot.
A more precise pattern is (\d+\.\d+(?:\.\d+)*), which matches e.g. 1.2, 1.2.3, 1.2.3.4, etc.
Apply this new regex in place of the old one on line 434.
| @@ -431,7 +431,7 @@ | ||
| version = 'Unknown' | ||
| if version_output and version_output['success']: | ||
| version_text = version_output['stdout'].strip() | ||
| version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text) | ||
| version_match = re.search(r'(\d+\.\d+(?:\.\d+)*)', version_text) | ||
| if version_match: | ||
| version = version_match.group(1) | ||
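The suggestions in this review end up with three different tails for the same version pattern: `[.\d]*`, `(\.\d+)*`, and `(?:\.\d+)*`. The sketch below compares them on a few hypothetical inputs; the variable names and sample strings are illustrative only.

```python
import re

loose = r'(\d+\.\d+[.\d]*)'            # any run of dots and digits
capturing = r'(\d+\.\d+(\.\d+)*)'      # dot-number segments, extra group
strict = r'(\d+\.\d+(?:\.\d+)*)'       # dot-number segments, one group

# A trailing sentence period is swallowed by the loose tail only.
sentence = "Lynis release 1.11.3."
print(re.search(loose, sentence).group(1))   # 1.11.3.
print(re.search(strict, sentence).group(1))  # 1.11.3

# Consecutive dots are accepted by the loose tail only.
malformed = "1.2..3"
print(re.search(loose, malformed).group(1))   # 1.2..3
print(re.search(strict, malformed).group(1))  # 1.2

# group(1) is identical for the two segment-based forms; the capturing
# variant just exposes the last repeated segment as a second group.
print(re.search(capturing, "1.2.3.4").groups())  # ('1.2.3.4', '.4')
print(re.search(strict, "1.2.3.4").groups())     # ('1.2.3.4',)
```

Since every call site reads only `group(1)`, the non-capturing form looks like the cleanest candidate to apply consistently across all four locations, though that choice is a suggestion rather than something the PR settles.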
|
|




Summary
Features
API Endpoints
- POST /api/host-compliance-discovery/hosts/{host_id}/compliance-discovery - Full compliance discovery
- POST /api/host-compliance-discovery/bulk-compliance-discovery - Bulk discovery (max 20 hosts)
- GET /api/host-compliance-discovery/hosts/{host_id}/compliance-assessment - Capability assessment
- GET /api/host-compliance-discovery/compliance-frameworks - Supported frameworks reference
Compliance Frameworks Supported
Assessment Features
Test Plan
🤖 Generated with Claude Code