added extra unit tests for filehelpfunctions

nocomplexity · nocomplexity · commit 65b9cfbd9939 · 2026-04-23T21:20:13.000+02:00
diff --git a/docs/validatetips.md b/docs/validatetips.md
@@ -0,0 +1,95 @@
+# Checklist for Python Security Applications
+
+Use this checklist to carry out a risk assessment before adopting any security application developed in Python. It is designed to help identify potential weaknesses, evaluate code quality, and verify that the project meets the minimum essential security standards. Applying this checklist consistently supports informed decision-making and reduces the risk of introducing security weaknesses into your environment.
+
+
+The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
+
+A security project MUST meet the highest security quality standards. No tool is perfect, and the way a design is translated into a program is always partly subjective. There is no single "best" way to create and implement a design; there are always multiple options. However, there are fundamentally wrong ways to develop software that introduce weaknesses and lead to vulnerabilities. These MUST be avoided in security-critical software. A security program that others rely on MUST be inherently secure and MUST NOT introduce vulnerabilities.
+
+Few specific evaluation criteria exist that can be applied directly to Python security applications.
+Transparency, comprehensive documentation, and the application of proven security-by-design principles MUST be considered the bare minimum.
+
+While security projects vary, the documentation MUST clearly state when a requirement is not applicable. This prevents ambiguity during security reviews and assures users that even trivial security aspects have been addressed.
+To evaluate or create Python security applications, the following minimal security requirements MUST be met:
+
+1. **Security Policy**
+
+A security project MUST have a security policy, such as a [SECURITY.md](https://github.com/nocomplexity/codeaudit/blob/main/SECURITY.md) file, to ensure that potential security issues can be reported securely and effectively.
+
++++
+
+2. **OpenSSF Best Practices**
+
+
+A security project MUST maintain an [OpenSSF Best Practices](https://www.bestpractices.dev/en) badge to demonstrate that the project satisfies the fundamental open-source security best-practice criteria.
+
++++
+
+3. **Fuzzing**
+
+
+A fuzzer SHOULD be incorporated into the testing workflow where appropriate to identify unexpected behaviours or vulnerabilities.
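+
+As an illustration only, the sketch below shows the basic idea of random (black-box) fuzzing using nothing but the standard library. It assumes the function under test is `parse_source`, a hypothetical stand-in for the AST check a scanner performs before analysing a file. Any exception other than the expected `SyntaxError` or `ValueError` counts as a finding. In practice, a coverage-guided fuzzer such as Google's Atheris is usually far more effective.
+
+```python
+import ast
+import random
+import string
+import warnings
+
+# Rejections that are expected for malformed input. ValueError covers
+# null bytes on Python versions before 3.12; later versions raise SyntaxError.
+EXPECTED_ERRORS = (SyntaxError, ValueError)
+
+
+def parse_source(text: str) -> bool:
+    """Hypothetical target: return True if text parses as Python, else False."""
+    try:
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")  # silence SyntaxWarning noise
+            ast.parse(text)
+        return True
+    except EXPECTED_ERRORS:
+        return False
+
+
+def fuzz(target, iterations=1000, max_len=64, seed=0):
+    """Feed random strings to target and collect the inputs that crash it."""
+    rng = random.Random(seed)  # fixed seed: every finding is reproducible
+    alphabet = string.printable + "\x00"
+    crashes = []
+    for _ in range(iterations):
+        data = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
+        try:
+            target(data)
+        except Exception as exc:  # anything unexpected is a finding
+            crashes.append((data, exc))
+    return crashes
+
+
+if __name__ == "__main__":
+    print(f"{len(fuzz(parse_source))} crashing inputs found")
+```
+
+Because the seed is fixed, every crashing input is reproducible. Once the underlying bug is fixed, the input can be turned into a regular unit test.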
+
++++
+
+
+
+4. **Architecture and Design**
+
+
+A formal architecture or design document MUST exist, outlining key principles and documenting the design decisions that guided the implementation. It is RECOMMENDED that a "Security by Design" approach be followed so that security principles are integrated into every stage of the software development life cycle (SDLC).
+A common practice is to maintain an [ARCHITECTURE.md](architecture) file within the repository.
+This document SHOULD be released under a Creative Commons licence (or equivalent) and MUST be available without limitation to allow for public review and improvement.
+
++++
+
+
+5. **Dependency Validation**
+
+
+All dependencies SHOULD be validated against known security vulnerabilities.
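+
+A minimal sketch of such a check for a single pinned package. It assumes the public OSV.dev query API (`POST https://api.osv.dev/v1/query`) as the vulnerability source, and the helper names are illustrative. Dedicated tools such as pip-audit automate this check for a complete environment.
+
+```python
+import json
+import urllib.request
+
+OSV_QUERY_URL = "https://api.osv.dev/v1/query"
+
+
+def build_osv_query(name: str, version: str) -> dict:
+    """Build the OSV request body for one exact PyPI release."""
+    return {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}
+
+
+def vulnerability_ids(response: dict) -> list[str]:
+    """Extract advisory IDs from an OSV response ({} means no known vulns)."""
+    return [vuln["id"] for vuln in response.get("vulns", [])]
+
+
+def query_osv(name: str, version: str, timeout: float = 10.0) -> list[str]:
+    """Ask OSV which known vulnerabilities affect name==version (needs network)."""
+    body = json.dumps(build_osv_query(name, version)).encode("utf-8")
+    request = urllib.request.Request(
+        OSV_QUERY_URL,
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    with urllib.request.urlopen(request, timeout=timeout) as response:
+        return vulnerability_ids(json.load(response))
+```
+
+Running `query_osv` for every pinned dependency, for example in CI, turns this requirement into an automated gate.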
+
+
++++
+
+
+6. **Dependency Versioning**
+
+
+The `pyproject.toml` file MUST use exact version identifiers (e.g., `==1.2.3` rather than `>1.2`). This is a critical measure against dependency confusion, compromised new releases of a dependency, and other supply-chain weaknesses. While [PEP 508](https://peps.python.org/pep-0508/) allows comparison operators such as `>=` and `~=`, this security standard requires pinning to a specific version. An additional advantage is that every installation resolves to exactly the same dependency versions, which keeps the Python part of the package reproducible.
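+
+The sketch below shows one way to automate this rule. It assumes Python 3.11+ for the standard-library `tomllib` module, and the function name is hypothetical. The regular expression is a deliberately simple approximation of an exact PEP 508 pin rather than a full parser. If a complete parser is needed, the `packaging` library provides `packaging.requirements.Requirement`.
+
+```python
+import re
+import tomllib  # standard library since Python 3.11
+
+# Name, optional [extras], "==", a version without wildcards or ranges,
+# and optionally an environment marker after ";".
+EXACT_PIN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s,*;]+\s*(;.*)?$")
+
+
+def unpinned_dependencies(pyproject_text: str) -> list[str]:
+    """Return the requirements in [project] that are not pinned with ==."""
+    project = tomllib.loads(pyproject_text).get("project", {})
+    requirements = list(project.get("dependencies", []))
+    for group in project.get("optional-dependencies", {}).values():
+        requirements.extend(group)
+    return [req for req in requirements if not EXACT_PIN.match(req.strip())]
+```
+
+A non-empty result means the file violates this requirement. The function is therefore easy to call from a pre-commit hook or CI job.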
+
+
++++
+
+
+7. **Reproducible Builds**
+
+
+A package SHOULD be built reproducibly: anyone who rebuilds it from the same source obtains bit-for-bit identical artifacts. A build tool for PyPI distribution that supports reproducible builds by default is preferred.
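+
+A reproducible build removes every input that varies between runs, such as timestamps, file ordering, and host-specific metadata. The simplified sketch below applies this to a plain ZIP archive (a wheel is a ZIP file). It then checks reproducibility by comparing the SHA-256 digests of two builds. The function names are hypothetical, and the code illustrates the technique rather than replacing a real build backend.
+
+```python
+import hashlib
+import zipfile
+from pathlib import Path
+
+# 1980-01-01 is the earliest timestamp the ZIP format can store.
+FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)
+
+
+def build_archive(source_dir: Path, output: Path) -> None:
+    """Archive every file in source_dir with normalised metadata."""
+    # Sorting removes any dependence on the filesystem's listing order.
+    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
+    with zipfile.ZipFile(output, "w") as archive:
+        for path in files:
+            arcname = path.relative_to(source_dir).as_posix()
+            info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE_TIME)
+            info.compress_type = zipfile.ZIP_DEFLATED
+            info.create_system = 3            # always record "Unix" as the host OS
+            info.external_attr = 0o644 << 16  # fixed permissions, not the real mode
+            archive.writestr(info, path.read_bytes())
+
+
+def sha256_of(path: Path) -> str:
+    return hashlib.sha256(path.read_bytes()).hexdigest()
+
+
+def is_reproducible(source_dir: Path, out_a: Path, out_b: Path) -> bool:
+    """Build twice and report whether both artifacts are bit-for-bit identical."""
+    build_archive(source_dir, out_a)
+    build_archive(source_dir, out_b)
+    return sha256_of(out_a) == sha256_of(out_b)
+```
+
+Real build tools commonly take the fixed timestamp from the `SOURCE_DATE_EPOCH` environment variable defined by the Reproducible Builds project, instead of hard-coding it.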
+
+
++++
+
+
+8. **Principle of Least Privilege**
+
+
+The program SHOULD require minimal privileges to execute. Administrative or high-privilege accounts MUST NOT be used if they could compromise the system. If specific authorizations are required, a separate service account MUST be created to facilitate clear security logging. The system documentation MUST clearly outline all required authorizations.
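+
+One possible runtime guard is sketched below. It assumes a POSIX system, where the program refuses to start if its effective user ID is 0 (root). The function names are hypothetical. `os.geteuid()` does not exist on Windows, so an equivalent Windows check would have to be written separately.
+
+```python
+import os
+import sys
+
+
+def is_privileged() -> bool:
+    """Return True when running with effective UID 0 (root) on POSIX."""
+    geteuid = getattr(os, "geteuid", None)  # absent on Windows
+    return geteuid is not None and geteuid() == 0
+
+
+def refuse_privileged_execution() -> None:
+    """Exit early instead of running with more rights than needed."""
+    if is_privileged():
+        # sys.exit with a string prints it to stderr and exits with status 1.
+        sys.exit("Refusing to run as root: use a dedicated unprivileged service account.")
+```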
+
+
++++
+
+
+9. **Defence in Depth**
+
+The Defence in Depth principle MUST be practised across design, implementation, and testing. For instance, multiple SAST tools SHOULD be used for code validation, because each tool detects a different set of weaknesses and their combined coverage is broader.
+
+
++++
+
+
+10. **SAST Scanning**
+
+
+All Python source code MUST be validated using a trusted open-source SAST scanner, such as [Python Code Audit](https://github.com/nocomplexity/codeaudit). Where a weakness is mitigated but still triggers a finding, it MUST be marked in the code with a comment that explains the mitigation.
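+
+The example below shows such a clarifying comment on a hypothetical `cache_key` helper. Scanners typically report MD5 as a weak hash. Here the finding is mitigated because the digest is never used for security, and both the comment and the `usedforsecurity=False` argument (available since Python 3.9) make that explicit. The example deliberately assumes no tool-specific suppression marker, such as Bandit's `# nosec`.
+
+```python
+import hashlib
+
+
+def cache_key(source: bytes) -> str:
+    """Derive a cache key for a file that has already been scanned."""
+    # SAST finding: weak hash (MD5). Mitigated: the digest is only a
+    # non-security cache key, never used for integrity or authentication.
+    return hashlib.md5(source, usedforsecurity=False).hexdigest()
+```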
diff --git a/tests/unit_tests/test_collectsourcefiles.py b/tests/unit_tests/test_collectsourcefiles.py
@@ -0,0 +1,96 @@
+# SPDX-FileCopyrightText: 2025-present Maikel Mardjan(https://nocomplexity.com/) and all contributors!
+# SPDX-License-Identifier: GPL-3.0-or-later
+"""
+Validation on correct behaviour of collect_python_source_files function.
+
+"""
+import pytest
+import os
+from pathlib import Path
+
+from codeaudit.filehelpfunctions import collect_python_source_files  
+
+def test_underscore_file_inclusion(tmp_path):
+    """
+    Verifies that files starting with underscores (like __init__.py) 
+    are INCLUDED in the current os.walk implementation.
+    """
+    # 1. Setup: Create a mix of files
+    (tmp_path / "normal.py").write_text("x = 1")
+    (tmp_path / "_private.py").write_text("x = 2")
+    (tmp_path / "__init__.py").write_text("# init file")
+    (tmp_path / ".hidden.py").write_text("x = 3")  # Should be skipped
+
+    # 2. Action
+    results = collect_python_source_files(str(tmp_path))
+    
+    # Convert results to filenames for easy comparison
+    found_filenames = [os.path.basename(p) for p in results]
+
+    # 3. Assertions
+    assert "normal.py" in found_filenames
+    assert "_private.py" in found_filenames    # Single leading underscore: included
+    assert "__init__.py" in found_filenames    # Dunder files such as __init__.py: included
+    assert ".hidden.py" not in found_filenames  # This confirms dot-files are skipped
+
+def test_collect_skips_invalid_ast(tmp_path, capsys):
+    # Create one valid file and one file with a syntax error
+    valid_file = tmp_path / "good.py"
+    valid_file.write_text("x = 10")
+    
+    invalid_file = tmp_path / "bad.py"
+    invalid_file.write_text("this is definitely not python syntax :::")
+
+    results = collect_python_source_files(str(tmp_path))
+    
+    assert any("good.py" in r for r in results)
+    
+    # The invalid file should NOT be there
+    assert not any("bad.py" in r for r in results)
+    
+    # Verify the error message was printed to the console
+    captured = capsys.readouterr()
+    assert "skipped due to syntax error" in captured.out
+
+
+def test_collect_python_files_full_logic(tmp_path):
+    """
+    Tests the three main behaviors of collect_python_source_files:
+    1. It skips excluded directories (tests, docs, etc.)
+    2. It skips hidden files (starting with '.')
+    3. It filters out files that are not AST parsable.
+    """
+        
+    valid_file = tmp_path / "main.py"
+    valid_file.write_text("def hello(): print('world')", encoding="utf-8")
+    
+    underscore_file = tmp_path / "__init__.py"
+    underscore_file.write_text("# package init", encoding="utf-8")
+
+    # File in an EXCLUDED directory (Should be skipped)
+    docs_dir = tmp_path / "docs"
+    docs_dir.mkdir()
+    excluded_file = docs_dir / "setup.py"
+    excluded_file.write_text("print('skip me')", encoding="utf-8")
+
+    # Hidden file (Should be skipped by default exclude filter)
+    hidden_file = tmp_path / ".config.py"
+    hidden_file.write_text("secret = True", encoding="utf-8")
+
+    # Invalid Python file (AST Syntax Error - Should be skipped)
+    invalid_file = tmp_path / "broken.py"
+    invalid_file.write_text("if True: \n    print('Missing closing parenthesis'", encoding="utf-8")
+    
+    results = collect_python_source_files(str(tmp_path))        
+    found_filenames = [os.path.basename(p) for p in results]
+    
+    assert "main.py" in found_filenames
+    assert "__init__.py" in found_filenames
+    
+    # These should be filtered out
+    assert "setup.py" not in found_filenames      # Directory excluded
+    assert ".config.py" not in found_filenames    # Hidden file
+    assert "broken.py" not in found_filenames     # AST Parse error
+    
+    # Ensure we got exactly 2 files
+    assert len(found_filenames) == 2
diff --git a/tests/unit_tests/test_readinsourcefile.py b/tests/unit_tests/test_readinsourcefile.py
@@ -0,0 +1,97 @@
+# SPDX-FileCopyrightText: 2025-present Maikel Mardjan(https://nocomplexity.com/) and all contributors!
+# SPDX-License-Identifier: GPL-3.0-or-later
+
+import pytest
+from pathlib import Path
+import sys
+
+from codeaudit.filehelpfunctions import read_in_source_file 
+
+
+def test_read_valid_python_file(tmp_path):
+    file = tmp_path / "test.py"
+    file.write_text("print('hello')", encoding="utf-8")
+
+    result = read_in_source_file(file)
+
+    assert result == "print('hello')"
+
+
+def test_reject_directory(tmp_path):
+    with pytest.raises(SystemExit) as exc:
+        read_in_source_file(tmp_path)
+
+    assert exc.value.code == 1
+
+
+def test_reject_non_py_file(tmp_path):
+    file = tmp_path / "test.txt"
+    file.write_text("not python", encoding="utf-8")
+
+    with pytest.raises(SystemExit) as exc:
+        read_in_source_file(file)
+
+    assert exc.value.code == 1
+
+
+def test_file_read_error(monkeypatch, tmp_path):
+    file = tmp_path / "test.py"
+    file.write_text("content", encoding="utf-8")
+
+    def mock_open(*args, **kwargs):
+        raise OSError("boom")
+
+    monkeypatch.setattr(Path, "open", mock_open)
+
+    with pytest.raises(SystemExit) as exc:
+        read_in_source_file(file)
+
+    assert exc.value.code == 1
+
+
+def test_read_in_source_file_success(tmp_path):
+    # Setup: Create a dummy .py file
+    test_file = tmp_path / "script.py"
+    content = "print('hello world')"
+    test_file.write_text(content, encoding="utf-8")
+
+    # Action
+    result = read_in_source_file(test_file)
+
+    # Assert
+    assert result == content
+
+def test_read_in_source_file_is_directory(tmp_path, capsys):
+    # Setup: Use the tmp_path itself (which is a directory)
+    with pytest.raises(SystemExit) as excinfo:
+        read_in_source_file(tmp_path)
+
+    # Assert
+    assert excinfo.value.code == 1
+    captured = capsys.readouterr()
+    assert "Error: The given path is a directory" in captured.out
+
+def test_read_in_source_file_wrong_extension(tmp_path, capsys):
+    # Setup: Create a text file instead of .py
+    test_file = tmp_path / "notes.txt"
+    test_file.write_text("not a python file", encoding="utf-8")
+
+    with pytest.raises(SystemExit) as excinfo:
+        read_in_source_file(test_file)
+
+    # Assert
+    assert excinfo.value.code == 1
+    captured = capsys.readouterr()
+    assert "Error: The given file is not a Python (.py) file" in captured.out
+
+def test_read_in_source_file_not_found(tmp_path, capsys):
+    # Setup: A path that doesn't exist
+    missing_file = tmp_path / "ghost.py"
+
+    with pytest.raises(SystemExit) as excinfo:
+        read_in_source_file(missing_file)
+
+    # Assert
+    assert excinfo.value.code == 1
+    captured = capsys.readouterr()
+    assert "Failed to read file" in captured.out