Task 3: Compliance Infrastructure Discovery by remyluslosius · Pull Request #28 · Hanalyx/OpenWatch

remyluslosius · 2025-09-02T18:47:43Z

Summary

Implements comprehensive compliance infrastructure discovery for hosts
Detects Python environments, OpenSCAP tools, privilege escalation, audit tools
Provides compliance capability assessment and framework recommendations

Features

Python Environment Detection: Discovers Python 2.x/3.x installations with pip and venv support
OpenSCAP Tools Discovery: Identifies oscap, scap-workbench, and SCAP content directories
Privilege Escalation Assessment: Checks sudo, su, and doas availability and configuration
Compliance Scanners: Detects Lynis, InSpec, AIDE, Tripwire, ClamAV and other security tools
Filesystem Capabilities: Assesses extended attributes, SELinux labels, file capabilities
Audit Tools: Identifies auditd, rsyslog, journalctl and their operational status
Framework Support: Compiles list of supported compliance frameworks (NIST, STIG, CIS, etc.)

API Endpoints

POST /api/host-compliance-discovery/hosts/{host_id}/compliance-discovery - Full compliance discovery
POST /api/host-compliance-discovery/bulk-compliance-discovery - Bulk discovery (max 20 hosts)
GET /api/host-compliance-discovery/hosts/{host_id}/compliance-assessment - Capability assessment
GET /api/host-compliance-discovery/compliance-frameworks - Supported frameworks reference

Compliance Frameworks Supported

Federal: NIST 800-53, FISMA, DISA STIG
Industry: CIS Controls, PCI DSS
Regulatory: HIPAA, SOX, GDPR
Security: File Integrity Monitoring, System Hardening

Assessment Features

Overall compliance readiness scoring (0.0-1.0)
SCAP capability assessment (full/limited/none)
Python environment evaluation
Privilege escalation availability
Audit capability assessment
Missing tools identification
Framework recommendations based on discovered tools

Test Plan

Test individual host compliance discovery
Verify bulk discovery operations with rate limiting
Test compliance capability assessment accuracy
Validate framework recommendations logic
Confirm OpenSCAP tool detection across distributions
Test Python environment discovery for multiple versions

🤖 Generated with Claude Code

- Add HostComplianceDiscoveryService for detecting compliance tooling
- Discover Python environments with pip and venv support
- Detect OpenSCAP tools and SCAP content capabilities
- Check privilege escalation (sudo, su, doas) availability
- Discover compliance scanners (Lynis, InSpec, AIDE, etc.)
- Assess filesystem capabilities (extended attributes, SELinux labels)
- Identify audit tools (auditd, rsyslog, journalctl) and their status
- Compile supported compliance frameworks list
- Add compliance capability assessment endpoint
- Create REST API endpoints for individual and bulk discovery
- Include supported compliance frameworks reference endpoint
- Add database fields for enhanced host compatibility tracking

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

sonarqubecloud · 2025-09-02T18:48:29Z

Quality Gate failed

Failed conditions
Reliability Rating on New Code: E (required ≥ A)

See analysis details on SonarQube Cloud


 from .routes import auth, hosts, scans, content, scap_content, monitoring, users, audit, host_groups, scan_templates, webhooks, mfa
 from .routes.system_settings_unified import router as system_settings_router
-from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, capabilities
+from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, capabilities, host_compliance_discovery


To fix this, remove the unused capabilities name from the import statement on line 22 of backend/app/main.py. Edit only that import line and preserve the other imports. If CodeQL also flags other names on the line as unused, they could be removed as well, but this fix removes only capabilities. No new imports or other code changes are needed.
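Before deleting a name from a long import line, it can help to confirm locally that nothing references it. The sketch below is a rough, hypothetical check built on Python's standard ast module: it collects names bound by import statements and subtracts every name that appears as an ast.Name node (which also covers attribute access such as capabilities.router). The helper name unused_imports and the sample source are illustrative assumptions, not code from the PR.

```python
import ast


def unused_imports(source: str) -> set:
    """Return names bound by import statements that are never referenced.

    A name counts as used if it appears anywhere as an ast.Name node, which
    also covers attribute access such as capabilities.router.
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return imported - used


# Illustrative source modeled on the main.py import line.
source = (
    "from .routes import compliance, capabilities, host_compliance_discovery\n"
    "app.include_router(compliance.router)\n"
    "app.include_router(host_compliance_discovery.router)\n"
)
print(unused_imports(source))  # {'capabilities'}
```

This only sees direct name references; a module used through a string (for example in __all__) or via getattr would be reported as unused, so it is a quick sanity check rather than a replacement for CodeQL.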

+        ComplianceDiscoveryResponse containing discovered compliance information
+    """
+    # Check permissions
+    check_permission(current_user, "hosts:read")


To fix the error, the call to check_permission must supply the required third argument in addition to current_user and the permission string "hosts:read". A per-resource permission check typically needs the resource itself, here the host being read, but in the current code host is only assigned after the permission check. The fix is therefore to retrieve the host first, raise a 404 if it does not exist, and then call check_permission with current_user, the permission string, and the host object. The rest of the handler stays unchanged.

Required changes:

Move the permission check to after the host is retrieved (after line 89, once the host is known to exist).

Update the call to check_permission to include the host as the third argument.
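The reordered flow can be sketched in plain Python, without FastAPI, to show why the host must be loaded before the check. Everything here is a hypothetical stand-in: HTTPError mimics an HTTP exception, HOSTS is an in-memory table, and check_permission is assumed to take the host as its third argument and compare an owner field. The real signature of check_permission is not shown in the PR.

```python
import uuid


class HTTPError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Hypothetical in-memory host table keyed by UUID.
HOSTS = {
    uuid.UUID("12345678-1234-5678-1234-567812345678"): {"hostname": "web01", "owner": "alice"},
}


def check_permission(user: dict, permission: str, host: dict) -> None:
    """Hypothetical 3-argument check: permission string plus host ownership."""
    if permission not in user["permissions"] or host["owner"] != user["name"]:
        raise HTTPError(403, "Insufficient permissions")


def discover_host_compliance(host_id: str, current_user: dict) -> dict:
    try:
        host_uuid = uuid.UUID(host_id)  # raises ValueError for malformed IDs
    except ValueError as e:
        raise HTTPError(400, f"Invalid host ID format: {e}") from e
    host = HOSTS.get(host_uuid)
    if host is None:
        raise HTTPError(404, f"Host with ID {host_id} not found")
    # The permission check runs only after the host is known to exist,
    # so the host object can be passed as the third argument.
    check_permission(current_user, "hosts:read", host)
    # Placeholder for the actual compliance discovery call.
    return {"hostname": host["hostname"]}
```

One consequence of this ordering is that a caller without access can distinguish a missing host (404) from an existing one it may not read (403); whether that matters depends on the deployment.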

+            detail=f"Invalid host ID format: {str(e)}"
+        )
+    except Exception as e:
+        logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}")


To fix this problem, sanitize the user-provided value before logging it. For plain-text log files, remove any newline characters (\n, \r, \r\n) from host_id before including it in log output. The fix affects only the log entry on line 111; the API's behavior and the returned error message are unchanged. Defining a small helper (such as sanitize_for_log) centralizes the sanitization logic and keeps it consistent wherever host_id is logged. It can be placed near the top of this file or close to where it is used.

Required changes:

Add a small utility function to sanitize log inputs.

In the exception handler on line 111, before using host_id in the log message, sanitize it using the helper.
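A minimal sketch of the helper, assuming the same strip-CR/LF approach described above, plus a small demonstration that a crafted host_id containing a newline still produces a single log line. The logger name compliance_demo, the demo function, and the malicious value are illustrative only.

```python
import io
import logging


def sanitize_for_log(value: object) -> str:
    """Remove CR and LF so user input cannot start a new log line."""
    text = value if isinstance(value, str) else str(value)
    return text.replace("\r", "").replace("\n", "")


def demo() -> str:
    """Log a crafted host_id through an in-memory handler and return the output."""
    stream = io.StringIO()
    logger = logging.getLogger("compliance_demo")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    # Without sanitization this would appear as a second, forged INFO line.
    malicious = "1234\nINFO Compliance discovery succeeded for host 1234"
    logger.error(f"Compliance discovery failed for host {sanitize_for_log(malicious)}: timeout")
    logger.removeHandler(handler)
    return stream.getvalue()
```

Removing the characters keeps the record on one line; an alternative design is to replace them with visible escapes (for example "\\n") so the tampering attempt remains evident to whoever reads the log.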

+        BulkComplianceDiscoveryResponse with results for all hosts
+    """
+    # Check permissions
+    check_permission(current_user, "hosts:read")


To fix this error, the call to check_permission on line 134 must supply all required positional arguments. The error implies that check_permission takes at least three required arguments; the first two, current_user and the string "hosts:read", are already provided. The third most likely identifies the resource being accessed. Because this endpoint performs bulk compliance discovery over a set of hosts, a reasonable candidate is request.host_ids (the list of host IDs being operated on), or possibly the request object itself, depending on how check_permission is defined.

The proposed fix passes request.host_ids as the third argument, changing line 134 to: check_permission(current_user, "hosts:read", request.host_ids). This satisfies the argument count and matches the apparent intent of a bulk read.

Only line 134 needs changing. No imports, methods, or definitions need updating, since request.host_ids already exists.
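If check_permission does accept a list of host IDs, one plausible behavior is to verify the permission string and then every ID in the batch. The sketch below is entirely hypothetical: the user dictionary layout, the host_scope field, and the PermissionDenied exception are assumptions for illustration, since the real check_permission is not shown.

```python
from typing import Iterable, Optional, Union


class PermissionDenied(Exception):
    """Hypothetical error raised when a permission check fails."""


def check_permission(user: dict, permission: str, host_ids: Union[str, Iterable[str]]) -> None:
    """Hypothetical check: the user needs `permission` and, if scoped, access to every host."""
    # Accept either a single host ID or a list of them.
    ids = [host_ids] if isinstance(host_ids, str) else list(host_ids)
    if permission not in user.get("permissions", set()):
        raise PermissionDenied(f"missing {permission}")
    scope: Optional[set] = user.get("host_scope")  # None means all hosts
    if scope is not None:
        denied = [h for h in ids if h not in scope]
        if denied:
            raise PermissionDenied(f"{permission} denied for {len(denied)} host(s)")
```

This rejects the whole batch if any single host is out of scope; an alternative would be to filter the list and report per-host permission errors alongside the discovery errors.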

+            errors[host_id] = f"Invalid host ID format: {str(e)}"
+            failed_discoveries += 1
+        except Exception as e:
+            logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}")


To mitigate log injection, any user-controlled value written to logs must be sanitized. The best practice for plain-text logs is to strip or replace linebreaks (\r, \n) from the input, preventing a malicious user from creating log entries with forged lines or confusing formatting. In this case, before logging host_id (originating from user input), we should sanitize it by replacing or removing any line breaks. We'll do this immediately before the logging call on line 186. We should also mark user input clearly in the log entry so it's distinguishable from legitimate log structure.

Required elements:

On line 186, before logging, create a sanitized version of host_id by replacing/removing any linebreaks (\r\n, \n, and \r).

Use the sanitized host_id in all log output.

No external packages are required; use Python's string replace method(s).

Only edit the block directly surrounding line 186.
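The loop below sketches how the inline sanitization fits into the bulk discovery block while keeping both failure branches counted. The discover callable is a hypothetical stand-in for the per-host discovery (assumed to raise ValueError for malformed IDs); the variable names mirror the snippet above, but the function itself is illustrative.

```python
import logging
from typing import Callable, Dict, List, Tuple


def bulk_discover(
    host_ids: List[str],
    discover: Callable[[str], dict],
    logger: logging.Logger,
) -> Tuple[Dict[str, dict], Dict[str, str], int, int]:
    """Run discovery per host, logging failures with a newline-free host ID."""
    results: Dict[str, dict] = {}
    errors: Dict[str, str] = {}
    successful_discoveries = 0
    failed_discoveries = 0
    for host_id in host_ids:
        try:
            results[host_id] = discover(host_id)
            successful_discoveries += 1
        except ValueError as e:
            # Hypothetical: discover() raises ValueError for malformed IDs.
            errors[host_id] = f"Invalid host ID format: {e}"
            failed_discoveries += 1
        except Exception as e:
            # Strip CR/LF so a crafted ID cannot start a forged log line.
            sanitized_host_id = host_id.replace("\r\n", "").replace("\n", "").replace("\r", "")
            logger.error(
                f"Compliance discovery failed for host [user input: {sanitized_host_id}]: {e}"
            )
            errors[host_id] = f"Compliance discovery failed: {e}"
            failed_discoveries += 1
    return results, errors, successful_discoveries, failed_discoveries
```

Note that the original, unsanitized host_id is still used as the key in errors, since that value goes back to the API caller rather than into the log file.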

+        List of supported compliance frameworks with descriptions
+    """
+    # Check permissions
+    check_permission(current_user, "hosts:read")


To fix the problem, update the call to check_permission on line 265 of backend/app/routes/host_compliance_discovery.py to pass the required third argument. Only current_user and a permission string are passed, and the expected third argument is not shown, so its value must be inferred. In RBAC code the missing argument is often a request object, a resource, or a database session. Nearby FastAPI endpoints typically receive a database session (db: Session = Depends(get_db)), but this handler does not declare one. The most minimal change is therefore to add a db: Session = Depends(get_db) parameter to get_supported_compliance_frameworks and pass it as the third argument to check_permission. This mirrors the pattern used elsewhere, leaves the business logic unchanged, and satisfies the argument requirement.

Changes required:

Update the function signature to accept db: Session = Depends(get_db)

Pass db as the third argument to check_permission.

No new imports are needed, as get_db and Session are already imported.

+                    if version_output and version_output['success']:
+                        version_text = version_output['stdout'].strip()
+                        # Extract version number
+                        version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text)


To fix the issue, remove the duplicated backslash (\\) from the character class [\\.\\d]. The intended match appears to be either a period (.) or a digit (\d), i.e., [.\d]. The class is part of the regular expression on line 179 of backend/app/services/host_compliance_discovery.py: re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text). Because the pattern is a raw string, every doubled backslash reaches the regex engine as an escaped literal backslash, so the escapes outside the class must be halved as well. Changing only the class would leave r'(\\d+\\.\\d+[.\d]*)', which still requires a literal backslash and never matches a plain version string. The corrected expression is r'(\d+\.\d+[.\d]*)'. Only this line needs changing.

No imports or other code changes are required.

+                sudo_version = 'Unknown'
+                if sudo_version_output and sudo_version_output['success']:
+                    version_text = sudo_version_output['stdout'].strip()
+                    version_match = re.search(r'version (\\d+\\.\\d+[\\.\\d]*)', version_text)


To fix this problem, remove the duplicate characters from the character class. The regular expression on line 255 is meant to extract the version number that follows "version " in the output string. The pattern is currently 'version (\\d+\\.\\d+[\\.\\d]*)'. The problematic part is [\\.\\d]*: the intent is for \. to match a literal dot and \d to match a digit, but because the pattern is a raw string, both backslashes of each \\ reach the regex engine, which reads \\ as an escaped literal backslash. The class therefore matches a backslash, a dot, or the letter d, and lists the backslash twice. The same double escaping outside the class means the pattern as a whole never matches a plain version string. The correct way to match zero or more dots or digits is [.\d]*.

The direct fix is to:

Replace [\\.\\d]* with [.\d]* inside the regex string.

This makes the pattern: r'version (\d+\.\d+[.\d]*)'

This works in a raw string and will behave as expected.

Change only line 255 in backend/app/services/host_compliance_discovery.py accordingly.

No imports or additional dependencies are required.
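With the corrected pattern, the sudo version extraction can be sketched as below. The sample text imitates typical sudo -V output and is illustrative; the function name parse_sudo_version is hypothetical, while the 'Unknown' default mirrors the snippet above.

```python
import re

SUDO_VERSION_RE = re.compile(r'version (\d+\.\d+[.\d]*)')


def parse_sudo_version(text: str) -> str:
    """Return the first dotted version after 'version ', or 'Unknown'."""
    match = SUDO_VERSION_RE.search(text)
    return match.group(1) if match else 'Unknown'


# Illustrative sudo -V style output; the first matching line wins.
sample = "Sudo version 1.9.5p2\nSudoers policy plugin version 1.9.5p2\n"
print(parse_sudo_version(sample))  # 1.9.5
```

Because the class only admits dots and digits, a patch-level suffix such as p2 is not captured; if that detail matters, the pattern would need to allow it explicitly.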

+                    if version_output and version_output['success']:
+                        version_text = version_output['stdout'].strip()
+                        # Extract version number from output
+                        version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text)


The best way to fix this problem is to replace the faulty character class [\\.\\d] with a pattern that matches version numbers in the typical "1.2.3" form, namely \d+(\.\d+)*. On line 330, in the _discover_compliance_scanners method of backend/app/services/host_compliance_discovery.py, change the regular expression from r'(\\d+\\.\\d+[\\.\\d]*)' to r'(\d+\.\d+(\.\d+)*)'. This matches one or more digits, a dot, one or more digits, and then any number of additional dot-and-digits segments, as commonly seen in version strings.
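One practical difference between the dot-or-digit class and the segment form shows up when a version ends a sentence. The sample text below is illustrative: the class-based pattern (even with correct single escapes) swallows the trailing period, while the segment pattern stops at the last digit group.

```python
import re

# Illustrative output where the version ends a sentence.
text = "Installed version: 3.0.8."

class_pattern = re.search(r'(\d+\.\d+[.\d]*)', text).group(1)
segment_pattern = re.search(r'(\d+\.\d+(\.\d+)*)', text).group(1)

print(class_pattern)    # 3.0.8.  (trailing dot captured)
print(segment_pattern)  # 3.0.8
```

Since group(1) wraps the whole expression, the inner (\.\d+) group does not change what version_match.group(1) returns; it only adds a second capture group.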

+                    version = 'Unknown'
+                    if version_output and version_output['success']:
+                        version_text = version_output['stdout'].strip()
+                        version_match = re.search(r'(\\d+\\.\\d+[\\.\\d]*)', version_text)


To fix this, rewrite the regular expression so that it matches the intended version pattern without the duplicated characters in the class. The likely target is a dotted version number such as 1.11.3 or 2.0, possibly with more parts: a group of digits followed by one or more instances of a period and more digits.
As written, r'(\\d+\\.\\d+[\\.\\d]*)' is a raw string, so the regex engine reads each \\ as a literal backslash, and the class [\\.\\d] matches a backslash, a period, or the letter d. Even the intended single-escaped form, (\d+\.\d+[\.\d]*), matches a digit sequence, a dot, a digit sequence, and then zero or more characters that are each a dot or a digit; the trailing * is a quantifier on the class, not a literal asterisk, and the class also accepts stray trailing dots.
A more precise pattern is r'(\d+\.\d+(?:\.\d+)*)', which matches, e.g., 1.2, 1.2.3, and 1.2.3.4.
Apply this new regex in place of the old one on line 434.
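Since the same version-extraction logic appears at several call sites in the service, one option is to centralize it. The helper below is a hypothetical sketch (extract_version and VERSION_RE are not names from the PR) using the non-capturing pattern and the 'Unknown' default seen in the snippet above; the ClamAV-style sample line is illustrative.

```python
import re

# Non-capturing group: group(1) is the whole version, no extra groups.
VERSION_RE = re.compile(r'(\d+\.\d+(?:\.\d+)*)')


def extract_version(output: str, default: str = 'Unknown') -> str:
    """Return the first dotted version number in `output`, or `default`."""
    match = VERSION_RE.search(output)
    return match.group(1) if match else default


print(extract_version("ClamAV 0.103.8/26870/Mon Apr  3 07:38:59 2023"))  # 0.103.8
print(extract_version("lynis: command not found"))                       # Unknown
```

A single compiled pattern also means any future adjustment (for example, allowing suffixes like p2) happens in one place rather than four.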

github-advanced-security AI found potential problems Sep 2, 2025


remyluslosius closed this Sep 2, 2025

remyluslosius deleted the feature/host-compliance-discovery branch September 11, 2025 00:54

@@ -19,7 +19,7 @@
             from .database import engine, create_tables, get_db
             from .routes import auth, hosts, scans, content, scap_content, monitoring, users, audit, host_groups, scan_templates, webhooks, mfa
             from .routes.system_settings_unified import router as system_settings_router
-            from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, capabilities, host_compliance_discovery
+            from .routes import credentials, api_keys, remediation_callback, integration_metrics, bulk_operations, compliance, rule_scanning, host_compliance_discovery
             # Import security routes only if available
             try:
                 from .routes import automated_fixes

@@ -73,8 +73,6 @@
                 Returns:
                     ComplianceDiscoveryResponse containing discovered compliance information
                 """
-                # Check permissions
-                check_permission(current_user, "hosts:read")
                 try:
                     # Convert string UUID to UUID object
@@ -88,6 +86,9 @@
                             detail=f"Host with ID {host_id} not found"
                         )
+                    # Check permissions
+                    check_permission(current_user, "hosts:read", host)
                     # Perform compliance discovery
                     compliance_service = HostComplianceDiscoveryService()
                     discovery_results = compliance_service.discover_compliance_infrastructure(host)

@@ -17,6 +17,12 @@
             logger = logging.getLogger(__name__)
+            def sanitize_for_log(value: str) -> str:
+                """Sanitize user-provided input before logging to prevent log injection."""
+                if not isinstance(value, str):
+                    value = str(value)
+                return value.replace("\r", "").replace("\n", "")
             router = APIRouter(prefix="/host-compliance-discovery", tags=["Host Compliance Discovery"])
@@ -108,7 +114,8 @@
                         detail=f"Invalid host ID format: {str(e)}"
                     )
                 except Exception as e:
-                    logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}")
+                    safe_host_id = sanitize_for_log(host_id)
+                    logger.error(f"Compliance discovery failed for host {safe_host_id}: {str(e)}")
                     raise HTTPException(
                         status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                         detail=f"Compliance discovery failed: {str(e)}"

@@ -183,9 +183,10 @@
                        errors[host_id] = f"Invalid host ID format: {str(e)}"
                        failed_discoveries += 1
                    except Exception as e:
-                        logger.error(f"Compliance discovery failed for host {host_id}: {str(e)}")
+                        sanitized_host_id = host_id.replace('\r\n', '').replace('\n', '').replace('\r', '')
+                        logger.error(f"Compliance discovery failed for host [user input: {sanitized_host_id}]: {str(e)}")
                        errors[host_id] = f"Compliance discovery failed: {str(e)}"
                        failed_discoveries += 1
                 logger.info(f"Bulk compliance discovery completed: {successful_discoveries} successful, "
                            f"{failed_discoveries} failed out of {len(request.host_ids)} total hosts")

@@ -253,7 +253,8 @@
             @router.get("/compliance-frameworks")
             async def get_supported_compliance_frameworks(
-                current_user=Depends(get_current_user)
+                current_user=Depends(get_current_user),
+                db: Session = Depends(get_db)
             ):
                 """
                 Get list of compliance frameworks that can be discovered and supported
@@ -262,7 +263,7 @@
                     List of supported compliance frameworks with descriptions
                 """
                 # Check permissions
-                check_permission(current_user, "hosts:read")
+                check_permission(current_user, "hosts:read", db)
                 frameworks = {
                     "NIST 800-53": {

remyluslosius wants to merge 1 commit into main from feature/host-compliance-discovery
