diff --git a/CHANGELOG.md b/CHANGELOG.md index 91284e7..6b1a4ce 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,6 +15,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Adds fixture coverage for both the ACF-style Launchpad-shaped case and a standalone baseline `_()` call - Adds safe fixture coverage proving standard WordPress i18n helpers do not trigger the rule +- Added Path B observability for Magic String Detector aggregated patterns + - Tracks per-pattern phase timings for grep, extraction, and aggregation in `process_aggregated_pattern()` + - Tracks quality metrics for raw matches, extracted strings, unique strings, filtered strings, and emitted violations + - Logs lightweight state transitions (`GREP`, `EXTRACT`, `AGGREGATE`, `COMPLETE`) in debug output + - Shows per-pattern metrics in text output when `--verbose` or `PROFILE=1` is enabled + - Exposes cumulative `magic_string_metrics` in JSON output for downstream reporting + - Verified in text mode and JSON mode against `/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service` + ### Documentation - Added `PROJECT/1-INBOX/PATTERN-PROPOSAL-LAUNCHPAD-CRASH.md` @@ -24,6 +32,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed +- Fixed stale-registry pattern-loader fallback that could make the Magic String Detector appear stuck + - Root cause: stale registry state forced the fallback parser path, pattern extraction failed there, `pattern_search` could become empty, and the first simple Magic String rule then looked like a hang + - Switched the fallback Python heredocs in `dist/lib/pattern-loader.sh` to tab-stripping heredocs so indented closing markers terminate correctly when the registry is stale + - Added a guard in `dist/bin/check-performance.sh` to skip simple rules with empty search patterns and emit an explicit warning instead of broad-matching the entire scan set + - Verified against 
`/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service` with `--verbose`, profiling, debug tracing, and AI triage flags enabled + - Result: the scan now completes through Magic String Detector and Function Clone Detector; the verification run finished in 35s with `MAGIC_STRING_DETECTOR` profiled at 21311ms + #### Phase 0: Timeout Guards for Unprotected Recursive Grep Calls - **Wrapped 8 raw `grep -r` file-discovery calls with `run_with_timeout "$MAX_SCAN_TIME"`** diff --git a/FEEDBACK-CR-SELF-SERVICE.md b/FEEDBACK-CR-SELF-SERVICE.md new file mode 100644 index 0000000..472c884 --- /dev/null +++ b/FEEDBACK-CR-SELF-SERVICE.md @@ -0,0 +1,91 @@ +# WPCC Pattern Library — False Positive Review +**Source:** AI review of creditconnection2-self-service scan +**Date:** 2026-03-23 +**Scan findings:** 99 total | **Estimated true positives after fixes:** ~40 + +--- + +## Action Items + +### ✅ Fix Now — High Confidence, Low Effort + +- [ ] **FIX `php-shell-exec-functions.json` — `exec-call` pattern matches `curl_exec()`** + **Pattern:** `exec[[:space:]]*\(` has no word boundary → matches `curl_exec(`. + **Fix:** Change to `\bexec[[:space:]]*\(` in the `exec-call` sub-pattern. + **File:** `dist/patterns/php-shell-exec-functions.json` + **FPs eliminated:** 8 (all CRITICAL — all were `curl_exec($curl)` calls) + +- [ ] **FIX `php-dynamic-include.json` — WP-CLI bootstrap scripts flagged as LFI** + **Finding:** `check-user-meta.php:13` and `test-alternate-registry-id.php:24` — `$path` is iterated from a hardcoded static array, never user-controlled. + **Fix:** Add `*-user-meta.php`, `*-registry-id.php`, or more broadly `*/scripts/*` / `*/cli/*` to `exclude_files` in the pattern. 
+ **File:** `dist/patterns/php-dynamic-include.json` + **FPs eliminated:** 2 (both CRITICAL) + +--- + +### 🔍 Investigate Before Acting — Diagnosis Uncertain + +- [ ] **INVESTIGATE "Direct superglobal manipulation" (~17 findings on `CURLOPT_POST`)** + **Reviewer claim:** `curl_setopt($curl, CURLOPT_POST, true)` is matched as superglobal manipulation. + **Our assessment:** The `spo-002-superglobals` pattern requires `$_` prefix in all branches — `CURLOPT_POST` cannot match it. + **Action:** Re-examine the actual line numbers in the scan log for these 17 findings. Determine which rule is actually firing and why. Do NOT apply the reviewer's suggested fix (`$_` anchoring) — it's already implemented. + **File:** `dist/patterns/spo-002-superglobal-manipulation.json` (likely not the culprit) + +- [ ] **INVESTIGATE "Sanitized reads flagged as unsanitized" for `sanitize_text_field($_GET[...])`** + **Finding:** `class-cr-rest-api.php` and `class-cr-business-rest-api.php` — `sanitize_text_field($_GET['registry_id'])` being flagged. + **Our assessment:** `sanitize_` is already in `exclude_patterns` for `unsanitized-superglobal-read`. This is likely a **multiline case** — `$_GET` on its own line while the sanitizer wraps on another. + **Action:** Confirm by inspecting actual flagged lines. If multiline, document as a known structural limitation (grep is line-scoped; lookbehinds won't help here). If same-line, there's a bug in the exclude logic. + **File:** `dist/patterns/unsanitized-superglobal-read.json` + +--- + +### 📋 Deferred — Valid Issues, Structural Effort Required + +- [ ] **DEFERRED: Add admin-only hook whitelist for capability check false positives** + **Finding:** `credit-registry-forms.php:48` — `add_action('admin_notices', ...)` flagged for missing capability check. `admin_notices` only fires for authenticated admin users. + **Reviewer recommendation:** Whitelist inherently-admin-only hooks (`admin_notices`, `admin_init`, `admin_menu`, etc.) 
+ **Our assessment:** Correct diagnosis. Not fixable with regex alone — requires a hook whitelist in the scanner logic. Downgrade severity to INFO as interim. + **Effort:** Low–Medium | **FPs eliminated:** 1 per occurrence + +- [ ] **DEFERRED: Strengthen N+1 loop detection to verify lexical containment** + **Finding 1:** `check-user-meta.php:23` — `get_user_meta()` called sequentially for a single user, not inside a user loop. + **Finding 2:** `class-cr-business-rest-api.php:245` — single `get_user_meta()` re-read after processing. + **Reviewer recommendation:** Confirm the meta call is lexically inside a loop body (`{...}`), not just nearby by line count. + **Our assessment:** The scanner has `is_iterating_over_multiple_objects()` heuristics. These may be gaps in that logic. Review and tighten the "loop containment" check. + **Effort:** Medium | **FPs eliminated:** 2 + +--- + +### ✔️ No Action Required — Already Handled or Misdiagnosed + +- [x] **SKIP — `isset()` exclusion for superglobal reads** + `isset\(` is already in `exclude_patterns` for `unsanitized-superglobal-read.json`. Reviewer's suggestion is already implemented. + +- [x] **SKIP — `$_` prefix anchoring for superglobal manipulation** + All three sub-patterns in `spo-002-superglobals` already require `$_` prefix. The reviewer's suggested fix is already in place. + +- [x] **SKIP — Sanitizer negative-lookbehind regex for `unsanitized-superglobal-read`** + The `exclude_patterns` list already handles same-line sanitizer wrapping. The multiline case is a structural grep limitation, not addressable by the proposed regex. 
+ +--- + +## Valid Issues Found (Not FPs — Tracker for Plugin Owner) + +| # | File | Line | Issue | Risk | +|---|------|------|-------|------| +| 6 | `admin-test-page.php` | 191 | `$_GET['view_file']` used without `sanitize_file_name()`; `strpos($view_file, '..')` bypass-able via encoding | HIGH | +| 7 | `admin-test-page.php` | 145 | `$_GET['view_dir']` displayed with `esc_html()` before `sanitize_file_name()` on line 147 — safe but misordered | LOW (confusing) | +| 8 | `api-functions.php` | 1014 | `$_POST['force_refresh']` in AJAX handler — strict `=== 'true'` comparison limits injection, but verify nonce upstream | LOW–MEDIUM | + +--- + +## Impact Summary + +| Fix | File to Edit | Effort | FPs Eliminated | +|-----|-------------|--------|---------------| +| `\b` word boundary on `exec-call` | `php-shell-exec-functions.json` | 1 line | 8 | +| Add WP-CLI scripts to `exclude_files` | `php-dynamic-include.json` | 2 lines | 2 | +| Investigate superglobal 17-finding cluster | Scan log + `spo-002` | Investigation | Up to ~17 | +| Investigate multiline sanitizer FPs | Scan log + `unsanitized-superglobal-read` | Investigation | Up to ~20 | +| Admin-only hook whitelist | `check-performance.sh` | Medium | 1+ per scan | +| N+1 loop containment tightening | `check-performance.sh` | Medium | 2+ per scan | diff --git a/PROJECT/2-WORKING/FEEDBACK-CR-SELF-SERVICE.md b/PROJECT/2-WORKING/FEEDBACK-CR-SELF-SERVICE.md new file mode 100644 index 0000000..0cd798e --- /dev/null +++ b/PROJECT/2-WORKING/FEEDBACK-CR-SELF-SERVICE.md @@ -0,0 +1,72 @@ +Verdict: Mostly False Positives / Scanner Noise + +"Shell command execution" (CRITICAL) — False Positive +All 8 findings flag curl_exec($curl). This is PHP's cURL library function, not shell execution. It's the standard way to make HTTP requests in PHP without WordPress's wp_remote_* wrappers. No shell is involved — completely safe. 
+ +"Direct superglobal manipulation" on CURLOPT_POST/CURLOPT_POSTFIELDS (HIGH) — False Positive +The scanner is incorrectly matching curl_setopt($curl, CURLOPT_POST, true) as "superglobal manipulation." These are cURL options, not $_POST superglobal writes. This accounts for ~17 of the findings. + +"Dynamic PHP include/require" (CRITICAL) — False Positive +Both check-user-meta.php:13 and test-alternate-registry-id.php:24 are WP-CLI test scripts that locate wp-load.php from a hardcoded relative path array. The $path variable is never user-controlled — it's iterated from a static array. No risk. + +"N+1 query pattern" (CRITICAL) — False Positive +check-user-meta.php:23 — This is a flat script calling get_user_meta() sequentially for a single user, not inside a loop over users. +class-cr-business-rest-api.php:245 — This is a single get_user_meta() re-read after processing, not an N+1 pattern. + +"Admin function missing capability check" (HIGH) — False Positive +credit-registry-forms.php:48 — add_action('admin_notices', ...) is a standard WordPress pattern for showing a dependency notice when a plugin is deactivated. The admin_notices hook itself only fires in the admin panel for authenticated users. The unset($_GET['activate']) on line 51 is also a standard WP pattern to suppress the "Plugin activated" message after forced deactivation. + +6. Unsanitized $_GET['view_file'] (HIGH) — Valid Issue +At admin-test-page.php:191, $_GET['view_file'] is used without sanitize_file_name(). The strpos($view_file, '..') === false check on line 193 is a weak directory traversal guard (can be bypassed with encodings). Should use sanitize_file_name() like view_dir does on line 147. + +7. Unsanitized $_GET['view_dir'] display before sanitization (HIGH) — Valid Issue +At admin-test-page.php:145, $_GET['view_dir'] is output with esc_html() (safe for XSS), but the sanitization via sanitize_file_name() happens on line 147 — after the display. 
The display itself is safe due to esc_html(), but the order is confusing. + +8. $_POST['force_refresh'] (HIGH) — Low Risk +At api-functions.php:1014, this is compared strictly to the string 'true', so it can only ever be a boolean. No injection vector. However, this runs inside a WP AJAX handler — verify nonce checks exist upstream. + +Many $_GET reads with sanitize_text_field() — False Positive +Lines in class-cr-rest-api.php and class-cr-business-rest-api.php that do sanitize_text_field($_GET['registry_id']) are already properly sanitized. The scanner flags the raw $_GET access but ignores the wrapping sanitization. + +--- + +Easy wins +curl_exec() flagged as shell execution — The regex is likely matching /\b(exec|shell_exec|system|passthru)\s*\(/. Just add a negative lookbehind for curl_: + + +/\b(? **55 detection patterns** | **7 with AI mitigation** | **60-70% fewer false positives** | **Multi-platform: PHP, Headless, Node.js, JS** +> **56 detection patterns** | **7 with AI mitigation** | **60-70% fewer false positives** | **Multi-platform: PHP, Headless, Node.js, JS** ### Feature Highlights - ✅ **20 CRITICAL** OOM and security patterns - ✅ **17 HIGH** performance and security patterns - ✅ **7 patterns** with context-aware severity adjustment -- ✅ **17 heuristic** patterns for code quality insights +- ✅ **18 heuristic** patterns for code quality insights - ✅ **Multi-platform:** WordPress, Headless, Node.js, JavaScript --- -**Generated:** 2026-01-27 22:31:24 UTC +**Generated:** 2026-03-24 01:34:38 UTC **Version:** 1.0.0 **Tool:** Pattern Library Manager diff --git a/dist/bin/check-performance.sh b/dist/bin/check-performance.sh index 100d608..e026888 100755 --- a/dist/bin/check-performance.sh +++ b/dist/bin/check-performance.sh @@ -81,7 +81,7 @@ source "$REPO_ROOT/lib/pattern-loader.sh" # This is the ONLY place the version number should be defined. # All other references (logs, JSON, banners) use this variable. 
# Update this ONE line when bumping versions - never hardcode elsewhere. -SCRIPT_VERSION="2.2.7" +SCRIPT_VERSION="2.2.9" # Get the start/end line range for the enclosing function/method. # @@ -414,6 +414,33 @@ declare -a JSON_CHECKS=() # Note: Variable names kept as DRY_VIOLATIONS for backward compatibility declare -a DRY_VIOLATIONS=() DRY_VIOLATIONS_COUNT=0 + +# Magic String Detector observability metrics (Path B) +MAGIC_PATTERNS_PROCESSED=0 +MAGIC_TOTAL_RAW_MATCHES=0 +MAGIC_TOTAL_EXTRACTED_STRINGS=0 +MAGIC_TOTAL_UNIQUE_STRINGS=0 +MAGIC_TOTAL_FILTERED_STRINGS=0 +MAGIC_TOTAL_FILTERED_MIN_FILES=0 +MAGIC_TOTAL_FILTERED_MIN_MATCHES=0 +MAGIC_TOTAL_VIOLATIONS=0 +MAGIC_TOTAL_GREP_TIME=0 +MAGIC_TOTAL_EXTRACT_TIME=0 +MAGIC_TOTAL_AGGREGATE_TIME=0 + +LAST_MAGIC_PATTERN_ID="" +LAST_MAGIC_PATTERN_TITLE="" +LAST_MAGIC_STATE="IDLE" +LAST_MAGIC_RAW_MATCHES=0 +LAST_MAGIC_EXTRACTED_STRINGS=0 +LAST_MAGIC_UNIQUE_STRINGS=0 +LAST_MAGIC_FILTERED_STRINGS=0 +LAST_MAGIC_FILTERED_MIN_FILES=0 +LAST_MAGIC_FILTERED_MIN_MATCHES=0 +LAST_MAGIC_VIOLATIONS_ADDED=0 +LAST_MAGIC_GREP_TIME=0 +LAST_MAGIC_EXTRACT_TIME=0 +LAST_MAGIC_AGGREGATE_TIME=0 CLONE_DETECTION_RAN=false # Track whether clone detection was executed # Show enhanced help message @@ -1600,6 +1627,40 @@ EOF ((DRY_VIOLATIONS_COUNT++)) || true } +reset_magic_pattern_metrics() { + LAST_MAGIC_PATTERN_ID="${pattern_id:-}" + LAST_MAGIC_PATTERN_TITLE="${pattern_title:-}" + LAST_MAGIC_STATE="IDLE" + LAST_MAGIC_RAW_MATCHES=0 + LAST_MAGIC_EXTRACTED_STRINGS=0 + LAST_MAGIC_UNIQUE_STRINGS=0 + LAST_MAGIC_FILTERED_STRINGS=0 + LAST_MAGIC_FILTERED_MIN_FILES=0 + LAST_MAGIC_FILTERED_MIN_MATCHES=0 + LAST_MAGIC_VIOLATIONS_ADDED=0 + LAST_MAGIC_GREP_TIME=0 + LAST_MAGIC_EXTRACT_TIME=0 + LAST_MAGIC_AGGREGATE_TIME=0 +} + +set_magic_pattern_state() { + LAST_MAGIC_STATE="$1" + debug_echo " [STATE] ${LAST_MAGIC_PATTERN_ID:-unknown}: ${LAST_MAGIC_STATE}" +} + +accumulate_magic_pattern_metrics() { + MAGIC_TOTAL_RAW_MATCHES=$((MAGIC_TOTAL_RAW_MATCHES + 
LAST_MAGIC_RAW_MATCHES)) + MAGIC_TOTAL_EXTRACTED_STRINGS=$((MAGIC_TOTAL_EXTRACTED_STRINGS + LAST_MAGIC_EXTRACTED_STRINGS)) + MAGIC_TOTAL_UNIQUE_STRINGS=$((MAGIC_TOTAL_UNIQUE_STRINGS + LAST_MAGIC_UNIQUE_STRINGS)) + MAGIC_TOTAL_FILTERED_STRINGS=$((MAGIC_TOTAL_FILTERED_STRINGS + LAST_MAGIC_FILTERED_STRINGS)) + MAGIC_TOTAL_FILTERED_MIN_FILES=$((MAGIC_TOTAL_FILTERED_MIN_FILES + LAST_MAGIC_FILTERED_MIN_FILES)) + MAGIC_TOTAL_FILTERED_MIN_MATCHES=$((MAGIC_TOTAL_FILTERED_MIN_MATCHES + LAST_MAGIC_FILTERED_MIN_MATCHES)) + MAGIC_TOTAL_VIOLATIONS=$((MAGIC_TOTAL_VIOLATIONS + LAST_MAGIC_VIOLATIONS_ADDED)) + MAGIC_TOTAL_GREP_TIME=$((MAGIC_TOTAL_GREP_TIME + LAST_MAGIC_GREP_TIME)) + MAGIC_TOTAL_EXTRACT_TIME=$((MAGIC_TOTAL_EXTRACT_TIME + LAST_MAGIC_EXTRACT_TIME)) + MAGIC_TOTAL_AGGREGATE_TIME=$((MAGIC_TOTAL_AGGREGATE_TIME + LAST_MAGIC_AGGREGATE_TIME)) +} + # Output final JSON output_json() { local exit_code="$1" @@ -1654,6 +1715,25 @@ output_json() { fi done + local magic_string_metrics_json=$(cat </dev/null || echo "0") + set_magic_pattern_state "GREP" + phase_start_time=$(date +%s%N 2>/dev/null || echo "0") # Run grep to find all matches using the pattern's search pattern # Note: pattern_search is set by load_pattern @@ -2659,7 +2754,9 @@ process_aggregated_pattern() { # Check for timeout (exit code 124) if [ "$grep_exit_code" -eq 124 ]; then + set_magic_pattern_state "COMPLETE" text_echo " ${RED}⚠ Scan timeout after ${MAX_SCAN_TIME}s - skipping pattern${NC}" + accumulate_magic_pattern_metrics rm -f "$temp_matches" return 1 fi @@ -2669,27 +2766,30 @@ process_aggregated_pattern() { local match_count=$(echo "$matches" | grep -c . 
2>/dev/null) # Ensure match_count is a valid integer (default to 0 if empty/invalid) match_count=${match_count:-0} + LAST_MAGIC_RAW_MATCHES=$match_count debug_echo "Found $match_count raw matches" - # PROFILING: Log grep time - if [ "$PROFILE" = "1" ] && [ "$step_start_time" != "0" ]; then - local grep_end_time=$(date +%s%N 2>/dev/null || echo "0") - if [ "$grep_end_time" != "0" ]; then - local grep_duration=$(( (grep_end_time - step_start_time) / 1000000 )) - debug_echo " [PROFILE] Grep step: ${grep_duration}ms" - fi - step_start_time=$(date +%s%N 2>/dev/null || echo "0") + phase_end_time=$(date +%s%N 2>/dev/null || echo "0") + if [ "$phase_start_time" != "0" ] && [ "$phase_end_time" != "0" ]; then + LAST_MAGIC_GREP_TIME=$(( (phase_end_time - phase_start_time) / 1000000 )) + fi + if [ "$PROFILE" = "1" ]; then + debug_echo " [PROFILE] Grep step: ${LAST_MAGIC_GREP_TIME}ms" fi # SAFETY: Check if match count exceeds file limit (rough proxy for file count) if [ "$MAX_FILES" -gt 0 ] && [ "$match_count" -gt "$((MAX_FILES * 10))" ]; then + set_magic_pattern_state "COMPLETE" text_echo " ${RED}⚠ Match count ($match_count) suggests excessive file processing - skipping pattern${NC}" + accumulate_magic_pattern_metrics rm -f "$temp_matches" return 1 fi # Extract captured groups and aggregate if [ -n "$matches" ]; then + set_magic_pattern_state "EXTRACT" + phase_start_time=$(date +%s%N 2>/dev/null || echo "0") local iteration=0 local last_progress_time=$(date +%s 2>/dev/null || echo "0") @@ -2724,26 +2824,31 @@ process_aggregated_pattern() { local captured=$(echo "$code" | grep -oE "$pattern_search" | sed -E "s/.*['\"]([a-z0-9_]+)['\"].*/\1/" | head -1) if [ -n "$captured" ]; then + extracted_count=$((extracted_count + 1)) # Escape pipe characters in the captured string for safe storage local escaped_captured=$(echo "$captured" | sed 's/|/\\|/g') echo "$escaped_captured|$file|$line" >> "$temp_matches" fi done <<< "$matches" - # PROFILING: Log extraction time - if [ "$PROFILE" = 
"1" ] && [ "$step_start_time" != "0" ]; then - local extract_end_time=$(date +%s%N 2>/dev/null || echo "0") - if [ "$extract_end_time" != "0" ]; then - local extract_duration=$(( (extract_end_time - step_start_time) / 1000000 )) - debug_echo " [PROFILE] String extraction: ${extract_duration}ms" - fi - step_start_time=$(date +%s%N 2>/dev/null || echo "0") + LAST_MAGIC_EXTRACTED_STRINGS=$extracted_count + + phase_end_time=$(date +%s%N 2>/dev/null || echo "0") + if [ "$phase_start_time" != "0" ] && [ "$phase_end_time" != "0" ]; then + LAST_MAGIC_EXTRACT_TIME=$(( (phase_end_time - phase_start_time) / 1000000 )) + fi + if [ "$PROFILE" = "1" ]; then + debug_echo " [PROFILE] String extraction: ${LAST_MAGIC_EXTRACT_TIME}ms" fi # Aggregate by captured string if [ -f "$temp_matches" ] && [ -s "$temp_matches" ]; then + set_magic_pattern_state "AGGREGATE" + phase_start_time=$(date +%s%N 2>/dev/null || echo "0") local unique_strings=$(cut -d'|' -f1 "$temp_matches" | sort -u) - local total_unique_strings=$(echo "$unique_strings" | wc -l | tr -d ' ') + total_unique_strings=$(echo "$unique_strings" | wc -l | tr -d ' ') + total_unique_strings=${total_unique_strings:-0} + LAST_MAGIC_UNIQUE_STRINGS=$total_unique_strings local string_iteration=0 local last_string_progress_time=$(date +%s 2>/dev/null || echo "0") @@ -2795,20 +2900,36 @@ process_aggregated_pattern() { # Add to magic string violations add_dry_violation "$pattern_title" "$pattern_severity" "$unescaped_string" "$file_count" "$total_count" "$locations_json" + violations_added=$((violations_added + 1)) + else + filtered_strings=$((filtered_strings + 1)) + if [ "$file_count" -lt "$min_files" ]; then + filtered_min_files=$((filtered_min_files + 1)) + fi + if [ "$total_count" -lt "$min_matches" ]; then + filtered_min_matches=$((filtered_min_matches + 1)) + fi fi done <<< "$unique_strings" - # PROFILING: Log aggregation time - if [ "$PROFILE" = "1" ] && [ "$step_start_time" != "0" ]; then - local agg_end_time=$(date +%s%N 
2>/dev/null || echo "0") - if [ "$agg_end_time" != "0" ]; then - local agg_duration=$(( (agg_end_time - step_start_time) / 1000000 )) - debug_echo " [PROFILE] Aggregation: ${agg_duration}ms" - fi + LAST_MAGIC_FILTERED_STRINGS=$filtered_strings + LAST_MAGIC_FILTERED_MIN_FILES=$filtered_min_files + LAST_MAGIC_FILTERED_MIN_MATCHES=$filtered_min_matches + LAST_MAGIC_VIOLATIONS_ADDED=$violations_added + + phase_end_time=$(date +%s%N 2>/dev/null || echo "0") + if [ "$phase_start_time" != "0" ] && [ "$phase_end_time" != "0" ]; then + LAST_MAGIC_AGGREGATE_TIME=$(( (phase_end_time - phase_start_time) / 1000000 )) + fi + if [ "$PROFILE" = "1" ]; then + debug_echo " [PROFILE] Aggregation: ${LAST_MAGIC_AGGREGATE_TIME}ms" fi fi fi + set_magic_pattern_state "COMPLETE" + accumulate_magic_pattern_metrics + # Cleanup rm -f "$temp_matches" } @@ -5825,6 +5946,12 @@ if [ -n "$SIMPLE_PATTERNS" ]; then text_echo "${BLUE}▸ $pattern_title ${check_color}[$check_severity]${NC}" + if [ -z "$pattern_search" ]; then + text_echo " ${YELLOW}⚠ Empty search pattern for $pattern_id - skipping rule${NC}" + debug_echo "Skipping simple pattern with empty search: $pattern_id ($pattern_file)" + continue + fi + # Build --include flags from pattern_file_patterns include_args="" if [ -n "$pattern_file_patterns" ]; then @@ -6289,6 +6416,12 @@ else process_aggregated_pattern "$pattern_file" + if [ "$VERBOSE" = "true" ] || [ "$PROFILE" = "1" ]; then + text_echo " Metrics: ${LAST_MAGIC_RAW_MATCHES} raw → ${LAST_MAGIC_EXTRACTED_STRINGS} extracted → ${LAST_MAGIC_UNIQUE_STRINGS} unique → ${LAST_MAGIC_VIOLATIONS_ADDED} violations" + text_echo " Filtered: ${LAST_MAGIC_FILTERED_STRINGS} strings (min_files=${LAST_MAGIC_FILTERED_MIN_FILES}, min_matches=${LAST_MAGIC_FILTERED_MIN_MATCHES})" + text_echo " Timing: grep=${LAST_MAGIC_GREP_TIME}ms extract=${LAST_MAGIC_EXTRACT_TIME}ms agg=${LAST_MAGIC_AGGREGATE_TIME}ms" + fi + # Check if new violations were added violations_after=$DRY_VIOLATIONS_COUNT 
new_violations=$((violations_after - violations_before)) diff --git a/dist/lib/pattern-loader.sh b/dist/lib/pattern-loader.sh index 1ec1ff4..b7a825d 100644 --- a/dist/lib/pattern-loader.sh +++ b/dist/lib/pattern-loader.sh @@ -1,7 +1,7 @@ #!/usr/bin/env bash # # Pattern Loader Library -# Version: 1.1.0 +# Version: 1.1.1 # # Loads pattern definitions from JSON files and makes them available to the scanner # @@ -394,7 +394,7 @@ load_pattern() { # per-file extraction. if [ -z "$pattern_search" ] && [ "$pattern_detection_type" != "clone_detection" ]; then if command -v python3 &> /dev/null; then - pattern_search=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_search=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json import sys try: @@ -430,7 +430,7 @@ load_pattern() { EOFPYTHON ) elif command -v python &> /dev/null; then - pattern_search=$(PYTHONSTARTUP= python <<EOFPYTHON 2>/dev/null + pattern_search=$(PYTHONSTARTUP= python <<-EOFPYTHON 2>/dev/null import json import sys try: @@ -476,7 +476,7 @@ load_pattern() { # provided file_patterns, we respect that and skip the JSON lookup. 
if [ -z "$pattern_file_patterns" ]; then if command -v python3 &> /dev/null; then - pattern_file_patterns=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_file_patterns=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json try: with open('$pattern_file', 'r') as f: @@ -502,7 +502,7 @@ load_pattern() { if [ "$pattern_detection_type" = "scripted" ]; then if [ -z "$pattern_validator_script" ]; then if command -v python3 &> /dev/null; then - pattern_validator_script=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_validator_script=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json try: with open('$pattern_file', 'r') as f: @@ -513,7 +513,7 @@ load_pattern() { print('') EOFPYTHON ) - pattern_validator_args=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_validator_args=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json try: with open('$pattern_file', 'r') as f: @@ -540,7 +540,7 @@ load_pattern() { # provided mitigation wiring, we prefer that and avoid reopening the JSON. 
if [ -z "$pattern_mitigation_enabled" ] && [ -z "$pattern_mitigation_script" ] && [ -z "$pattern_severity_downgrade" ]; then if command -v python3 &> /dev/null; then - pattern_mitigation_enabled=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_mitigation_enabled=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json try: with open('$pattern_file', 'r') as f: @@ -552,7 +552,7 @@ load_pattern() { print('false') EOFPYTHON ) - pattern_mitigation_script=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_mitigation_script=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json try: with open('$pattern_file', 'r') as f: @@ -564,7 +564,7 @@ load_pattern() { print('') EOFPYTHON ) - pattern_mitigation_args=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_mitigation_args=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json try: with open('$pattern_file', 'r') as f: @@ -576,7 +576,7 @@ load_pattern() { print('') EOFPYTHON ) - pattern_severity_downgrade=$(PYTHONSTARTUP= python3 -S <<EOFPYTHON 2>/dev/null + pattern_severity_downgrade=$(PYTHONSTARTUP= python3 -S <<-EOFPYTHON 2>/dev/null import json try: with open('$pattern_file', 'r') as f:
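
Reviewer note: the `<<-EOFPYTHON` lines in the `dist/lib/pattern-loader.sh` hunks above rely on the shell's tab-stripping heredoc form. The sketch below is not taken from the repository; `EOFDEMO`, `demo_dir`, `plain.sh`, and `strip.sh` are hypothetical names used only for illustration. It shows, under the assumption that the closing markers in the loader are indented with tabs, why a plain `<<` heredoc would not terminate at an indented marker while `<<-` does.

```shell
#!/usr/bin/env bash
# Sketch only: EOFDEMO, demo_dir, plain.sh and strip.sh are hypothetical names,
# not taken from pattern-loader.sh.
demo_dir=$(mktemp -d)

# Write two one-command scripts with printf so the leading tabs (\t) are real
# tab characters regardless of how this example is copied or re-indented.
printf 'cat <<EOFDEMO\n\thello\n\tEOFDEMO\n'  > "$demo_dir/plain.sh"
printf 'cat <<-EOFDEMO\n\thello\n\tEOFDEMO\n' > "$demo_dir/strip.sh"

# Plain form: the tab-indented marker is not a terminator, so the heredoc runs
# to end of file and the marker line becomes part of the body.
plain_out=$(bash "$demo_dir/plain.sh" 2>/dev/null)

# Tab-stripping form: leading tabs are removed from the body and from the
# marker line, so the heredoc ends where intended.
strip_out=$(bash "$demo_dir/strip.sh" 2>/dev/null)

printf 'plain: [%s]\n' "$plain_out"
printf 'strip: [%s]\n' "$strip_out"

rm -rf "$demo_dir"
```

Note that `<<-` strips leading tabs only, not spaces, so the fix holds only as long as the heredoc bodies and closing markers stay tab-indented. An editor setting that converts tabs to spaces in `pattern-loader.sh` would plausibly reintroduce the unterminated-heredoc behavior described in the CHANGELOG entry.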