Saharsh1123/SentinelScan
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
# SentinelScan SentinelScan is a Python CLI static-analysis tool that scans Python codebases for hardcoded secrets. It uses Python AST parsing to inspect real assignment statements, extracts candidate values, applies a modular rule engine, and reports findings with structured metadata such as rule ID, severity, reason, variable name, file path, and line number. SentinelScan is a lightweight, learning-focused static-analysis project. It demonstrates AST parsing, modular rule evaluation, structured findings, CLI output, JSON output, redaction, testing, linting, and CI. > SentinelScan is not a replacement for mature secret-scanning tools. It is an educational static-analysis project intended to build and demonstrate security engineering concepts. --- ## Features - Recursively scans directories for Python files (`.py`) - Uses Python AST parsing to analyze real assignment statements - Extracts candidates from source code before rule evaluation - Uses structured dataclass models: - `Rule` - `Candidate` - `Finding` - Uses a modular rule engine for scalable detection logic - Detects hardcoded secrets assigned to: - Simple variables - Object attributes - Nested attributes - Multiple assignment targets - Detects common secret types: - Passwords - API keys - Tokens - AWS access keys - Generic secrets - Supports value-pattern-based detection for structured secrets such as AWS access keys - Supports variable-name-based detection for names such as `password`, `pwd`, `passwd`, `api_key`, `apikey`, `token`, and `secret` - Applies minimum-length filtering to reduce obvious false positives - Assigns severity levels such as `HIGH` and `MEDIUM` - Provides detection reasons explaining why each finding was flagged - Supports human-readable CLI output - Supports machine-readable JSON output with `--json` - Supports severity filtering with `--severity` - Supports secret redaction with `--redact` - Supports combined flags such as `--json --severity HIGH --redact` - Handles syntax errors gracefully - Handles file encoding issues using UTF-8 with ignored decode errors - Includes pytest-based unit and CLI tests - Uses Ruff for linting/code quality - Uses GitHub Actions CI to run automated checks --- ## Requirements - Python 3.11+ - `pytest` for testing - `ruff` for linting and formatting - No third-party runtime dependencies --- ## Installation Clone the repository: ```bash git clone https://github.com/Saharsh1123/SentinelScan.git cd SentinelScan ``` Optional: create and activate a virtual environment. ```bash python3 -m venv venv source venv/bin/activate ``` Install test/development dependencies: ```bash python3 -m pip install pytest ruff ``` --- ## Quick Start Scan a directory: ```bash python3 main.py ./your_directory ``` Scan the included test fixture directory: ```bash python3 main.py test_dirs ``` Output JSON: ```bash python3 main.py test_dirs --json ``` Filter by severity: ```bash python3 main.py test_dirs --severity HIGH ``` Redact detected values: ```bash python3 main.py test_dirs --redact ``` Use JSON, severity filtering, and redaction together: ```bash python3 main.py test_dirs --json --severity HIGH --redact ``` For full usage documentation, see [`docs/USAGE.md`](docs/USAGE.md). --- ## Example CLI Output ```bash python3 main.py test_dirs ``` ```text Scanning 4 Python files... --- Findings --- [HIGH] test_dirs/edge_repo/edge_test.py:1 Password → hjkl Reason: variable name matched password/pwd/passwd pattern and value met minimum length [HIGH] test_dirs/test_repo/open_vulns.py:4 AWS Access Key → AKIAEXAMPLE123456789 Reason: value matched AKIA-prefixed AWS access key pattern [MEDIUM] test_dirs/test_repo/open_vulns.py:5 Token → xyzttttggfdddf Reason: variable name matched token pattern and value met minimum length Total findings: 6 ``` --- ## Example JSON Output ```bash python3 main.py test_dirs --json ``` ```json [ { "line": 1, "file": "test_dirs/edge_repo/edge_test.py", "var_name": "password", "rule_id": "PASSWORD", "rule": "Password", "severity": "HIGH", "value": "hjkl", "reason": "variable name matched password/pwd/passwd pattern and value met minimum length" } ] ``` JSON mode prints only valid JSON, without CLI headers, summaries, or human-readable formatting. --- ## Documentation Detailed documentation is split into separate files: | Document | Purpose | |---|---| | [`docs/USAGE.md`](docs/USAGE.md) | CLI flags, examples, JSON output, redaction, and severity filtering | | [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | Project pipeline, module responsibilities, and dataclass model flow | | [`docs/DETECTION_RULES.md`](docs/DETECTION_RULES.md) | Built-in rules, assignment support, detection reasons, and limitations | | [`docs/TESTING.md`](docs/TESTING.md) | Test coverage, pytest usage, CLI tests, and CI behavior | | [`docs/DEVELOPMENT.md`](docs/DEVELOPMENT.md) | Setup, Ruff, local checks, project structure, and contribution workflow | | [`docs/ROADMAP.md`](docs/ROADMAP.md) | Current limitations and future improvements | --- ## Project Structure ```text SENTINELSCAN/ ├── .github/ │ └── workflows/ │ └── tests.yaml # GitHub Actions workflow for Ruff and pytest ├── detectors/ │ ├── __init__.py # Makes detectors an importable package │ ├── ast_analyzer.py # AST parsing and candidate extraction │ ├── find_secrets.py # High-level detection orchestration │ ├── models.py # Rule, Candidate, and Finding dataclasses │ ├── rule_engine.py # Applies rules to candidates │ └── rules.py # Built-in rule definitions ├── test_dirs/ │ ├── edge_repo/ │ │ └── edge_test.py # Edge-case fixture file │ └── test_repo/ │ ├── embedded_test/ │ │ └── embedded_hello.py # Nested fixture file │ ├── hello.py # Benign fixture file │ └── open_vulns.py # Fixture file containing sample vulnerabilities ├── tests/ │ ├── test_apply_rules.py # Unit tests for rule engine behavior │ ├── test_ast.py # Tests for AST-based detection │ └── test_cli.py # CLI integration tests ├── .gitignore ├── cli.py # CLI argument parsing ├── LICENSE ├── main.py # Application entry point ├── output.py # JSON/text output formatting, filtering, and redaction ├── pytest.ini # Pytest import path configuration ├── README.md # Project overview └── scanner.py # Path validation, file discovery, file reading, and scan orchestration ``` Generated local files such as `__pycache__/`, `.pytest_cache/`, `.ruff_cache/`, and `venv/` may appear during development but should not be committed. --- ## Testing and Linting Run tests: ```bash pytest ``` Run Ruff: ```bash ruff check . ``` Format code: ```bash ruff format . ``` Recommended local check before committing: ```bash ruff check . pytest python3 main.py test_dirs python3 main.py test_dirs --json python3 main.py test_dirs --json --severity HIGH python3 main.py test_dirs --json --severity HIGH --redact ``` --- ## Current Limitations - Only scans Python files (`.py`) - Only analyzes assignment statements supported by the current AST logic - Does not currently support dictionary/subscript assignments such as `config["password"] = "value"` - Does not currently detect secrets passed directly into function calls - Does not currently perform entropy scoring, confidence scoring, data-flow analysis, or taint analysis - Detection rules may still produce false positives or false negatives - Redaction is optional, so use `--redact` when sharing scan output For more detail, see [`docs/ROADMAP.md`](docs/ROADMAP.md). --- ## License This project is for educational and demonstration purposes.