diff --git a/4X4.md b/4X4.md
new file mode 100644
index 0000000..9500dd9
--- /dev/null
+++ b/4X4.md
@@ -0,0 +1,103 @@
+---
+client: Hypercart / Neochrome
+repo: https://github.com/Hypercart-Dev-Tools/WP-Code-Check.git
+last_edit: 2026-03-23
+week_of: 2026-03-23
+source_pr_number: n/a
+sprint: planning-refresh
+
+---
+
+Pro-tip: Ask your VS Code AI to self-populate the metadata above.
+
+# 4x4 Dashboard - Strategic Goals & Weekly Checklist
+A simple, actionable framework to prioritize and track engineering tasks. Focus on alignment, transparency, continuous improvement, and increasing clarity for everyone. This document does not replace your project management tools. It is meant to be a simple, actionable checklist to help you focus on the most important tasks for the week.
+
+# About the Current Project
+WP Code Check is a zero-dependency static analysis toolkit for WordPress performance, security, and reliability issues. Current engineering focus is stabilizing the search backend, reducing false positives in noisy direct-pattern rules, restoring fixture parity, and creating a measured path for an optional Semgrep backend without destabilizing the Bash scanner.
+
+---
+
+## 1. Strategic Backlog
+**Maximum of 4 items. Focus on long-term goals and impactful improvements.**
+**Update/Reminder:** If these are new projects without a clear scope, consider using the [Project Scope Outline](#bonus-project-scope-outline) below.
+
+1. - [ ] Unify search backends under one wrapper layer - Replace timeout-wrapped raw file-discovery calls and helper fallbacks with a single API for file discovery, line matches, and context extraction.
+2. - [ ] Restore fixture and scorecard trust - Audit failing fixtures, align expected counts with intended semantics, and make regression output credible before larger rule migrations.
+3. - [ ] Pilot Semgrep on the highest-noise direct rules - Start with `unsanitized-superglobal-read`, `unsanitized-superglobal-isset-bypass`, `wpdb-query-no-prepare`, and `file-get-contents-url` using side-by-side scorecards.
+4. - [ ] Continue rule quality cleanup in Bash - Keep reducing false positives where the current engine is already close, especially heuristics and context-sensitive detectors that do not map cleanly to Semgrep.
+
+---
+
+## 2. Current Week
+**Active tasks for the week. Maximum of 4 items.**
+
+> **Tip:** If your team frequently handles urgent issues, consider reserving 1-2 slots for hotfixes. Otherwise, use all 4 slots for planned work.
+
+- [x] Refresh planning source of truth - Updated the Semgrep migration plan to match the current codebase, deprecated `BACKLOG.md`, and moved active planning into this 4X4.
+- [ ] Add file-discovery wrapper spike - Draft `cached_file_search()` or equivalent and replace one or two high-value call sites such as `AJAX_FILES` and `TERMS_FILES` to prove the interface. Benefit: fewer one-off search code paths, easier maintenance, and a lower chance that one slow check behaves differently from the rest.
+- [ ] Add observability for slow checks - Implement per-check timeout warnings and a small top-N slow-check summary so long scans stop looking stuck. Benefit: users can see what is slow, what timed out, and whether the scanner is still making progress instead of guessing that it froze.
+- [ ] Audit fixture mismatches and shortlist Semgrep pilots - Re-run failing fixtures, classify false positives vs desired behavior, and finalize the first four Semgrep scorecard candidates. Benefit: more trustworthy test results, clearer migration decisions, and less risk of moving a noisy or inaccurate rule to a new backend.
+
+---
+
+## 3. Previous Week
+**Review completed, deferred, or blocked tasks from the prior week.**
+
+- [x] Added Path B observability for aggregated magic-string patterns - phase timing and quality counters are now visible in text and JSON output.
+- [x] Fixed stale-registry fallback behavior - eliminated one apparent hang path in the pattern loader and guarded empty search patterns.
+- [x] Fixed high-noise direct-pattern false positives - reduced `php-shell-exec-functions`, `spo-002-superglobals`, and `php-dynamic-include` noise with targeted scanner and pattern fixes.
+- [ ] Phase 0b observability remains incomplete - heartbeat output and slow-check rollups are still deferred and need a focused pass.
+
+---
+
+## 4. Recent Lessons Learned
+**Capture insights to improve processes and avoid repeating mistakes.**
+
+1. Small search-path inconsistencies create outsized noise - most recent false positives came from runner behavior and shell quoting, not from the pattern ideas themselves.
+2. Timeout protection is necessary but not sufficient - a timed-out check that silently passes prevents hangs, but it also hides diagnostic value unless we surface warnings and timing data.
+3. Planning drift happens fast in a monolithic script - line-number-based roadmap docs go stale quickly, so strategic docs need periodic refreshes tied to current code references.
+4. Not every noisy rule is a Semgrep problem first - if a Bash rule is already close after a targeted fix, it should drop in Semgrep priority behind noisier or structurally simpler candidates.
+
+---
+
+## Bonus: Project Scope Outline
+
+> **Note:** This section is a work-in-progress and is being developed as a natural extension of the 4x4 methodology. It is not yet a core part of the framework but is included here as a supplementary tool for teams who want to add more context to their planning.
+
+### 1. Goals
+Keep WP Code Check fast, explainable, and operationally reliable while improving detection quality on high-value WordPress performance and security checks.
+
+### 2. Assumptions
+- The current Bash scanner remains the default engine through any Semgrep pilot.
+- Search backend stabilization should land incrementally rather than via a large rewrite.
+- Fixture parity work is required before scorecards will be trusted for migration decisions.
+- The highest-value weekly work is small and verifiable: one wrapper spike, one observability pass, one fixture audit pass.
+
+### 3. Potential Risks
+- Semgrep rules may look cleaner on paper but still miss WordPress-specific context that the Bash pipeline currently encodes.
+- Wrapper changes can improve maintainability while accidentally changing output parity if they are not backed by fixture comparisons.
+- Monolithic-script edits can create regression risk in unrelated checks unless changes stay narrow and verified.
+- Planning can split across too many docs unless this file stays the active weekly view.
+
+### 4. Long-Term Maintainability
+Long-term sustainability depends on shrinking the amount of bespoke search behavior in `check-performance.sh`, keeping rule logic externalized where possible, maintaining trustworthy fixtures, and promoting to Semgrep only the rules that clearly beat the Bash implementation on quality and runtime.
+
+---
+
+## How to Use This Template
+
+For detailed guidance on the 4x4 framework, see the [README](README.md).
+
+**Quick tips:**
+- Update this document weekly (move "Current Week" to "Previous Week" and plan your new week)
+- Limit each section to 4 items maximum to maintain focus
+- Capture lessons learned immediately while they're fresh
+- Link to detailed tasks in your project management tools (GitHub, Jira, etc.)
+
+---
+
+## License
+This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.
To view a copy of this license, visit [https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/).
+
+Copyright © 2026 Hypercart DBA Neochrome, Inc. | 4x4Clarity.com
diff --git a/CHANGELOG.md b/CHANGELOG.md
index bd8aa61..3e434d9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,6 +22,11 @@ All notable changes to this project will be documented in this file.
 ### Documentation
+- Refreshed planning docs to match the current codebase and consolidated active planning into `4X4.md`
+  - Updated `PROJECT/1-INBOX/FEATURE-SEMGREP-MIGRATION-PLAN.md` with current search-backend hotspots, refreshed Phase 0 status notes, and reprioritized the Semgrep pilot list based on current false-positive pressure
+  - Deprecated `PROJECT/2-WORKING/BACKLOG.md` as an active planning source and converted it into a pointer to `4X4.md` plus the Semgrep roadmap doc
+  - Populated `4X4.md` with current strategic goals, weekly work, prior-week accomplishments, and planning assumptions for WP Code Check
+
 - Added `PROJECT/1-INBOX/PATTERN-PROPOSAL-LAUNCHPAD-CRASH.md`
   - Captures the plan for converting the Launchpad crash lessons into generalized WPCC anti-pattern proposals
   - Recommends against a single environment-specific "crash detector"
diff --git a/FEEDBACK-CR-SELF-SERVICE.md b/FEEDBACK-CR-SELF-SERVICE.md
index 472c884..10715b7 100644
--- a/FEEDBACK-CR-SELF-SERVICE.md
+++ b/FEEDBACK-CR-SELF-SERVICE.md
@@ -9,37 +9,43 @@
 ### ✅ Fix Now — High Confidence, Low Effort
-- [ ] **FIX `php-shell-exec-functions.json` — `exec-call` pattern matches `curl_exec()`**
+- [x] **FIX `php-shell-exec-functions.json` — `exec-call` pattern matches `curl_exec()`** ✅ *Fixed in commit 740ba08*
 **Pattern:** `exec[[:space:]]*\(` has no word boundary → matches `curl_exec(`.
 **Fix:** Change to `\bexec[[:space:]]*\(` in the `exec-call` sub-pattern.
 **File:** `dist/patterns/php-shell-exec-functions.json`
 **FPs eliminated:** 8 (all CRITICAL — all were `curl_exec($curl)` calls)
-- [ ] **FIX `php-dynamic-include.json` — WP-CLI bootstrap scripts flagged as LFI**
+- [x] **`php-dynamic-include.json` — WP-CLI bootstrap scripts no longer flagged as LFI** ✅ *Resolved in follow-up commit*
 **Finding:** `check-user-meta.php:13` and `test-alternate-registry-id.php:24` — `$path` is iterated from a hardcoded static array, never user-controlled.
- **Fix:** Add `*-user-meta.php`, `*-registry-id.php`, or more broadly `*/scripts/*` / `*/cli/*` to `exclude_files` in the pattern.
- **File:** `dist/patterns/php-dynamic-include.json`
+ **Attempted fix (740ba08 — insufficient):** Added `wp-load` to `exclude_patterns`, but the actual matched line is `require_once $path;` — it does not contain `wp-load`.
+ **Proper fix:** Added a new `exclude_if_file_contains` capability to the simple pattern runner in `dist/bin/check-performance.sh`. When a matched file's content contains any string listed in the new `exclude_if_file_contains` JSON array, all matches in that file are suppressed. Added `"wp eval-file"` to `php-dynamic-include.json` under this key — both WP-CLI scripts have this string in their docblock comment.
+ **Files changed:** `dist/bin/check-performance.sh` (runner feature), `dist/patterns/php-dynamic-include.json` (new exclusion key)
 **FPs eliminated:** 2 (both CRITICAL)
 ---
-### 🔍 Investigate Before Acting — Diagnosis Uncertain
-
-- [ ] **INVESTIGATE "Direct superglobal manipulation" (~17 findings on `CURLOPT_POST`)**
- **Reviewer claim:** `curl_setopt($curl, CURLOPT_POST, true)` is matched as superglobal manipulation.
- **Our assessment:** The `spo-002-superglobals` pattern requires `$_` prefix in all branches — `CURLOPT_POST` cannot match it.
- **Action:** Re-examine the actual line numbers in the scan log for these 17 findings. Determine which rule is actually firing and why. Do NOT apply the reviewer's suggested fix (`$_` anchoring) — it's already implemented.
- **File:** `dist/patterns/spo-002-superglobal-manipulation.json` (likely not the culprit)
-
-- [ ] **INVESTIGATE "Sanitized reads flagged as unsanitized" for `sanitize_text_field($_GET[...])`**
- **Finding:** `class-cr-rest-api.php` and `class-cr-business-rest-api.php` — `sanitize_text_field($_GET['registry_id'])` being flagged.
- **Our assessment:** `sanitize_` is already in `exclude_patterns` for `unsanitized-superglobal-read`. This is likely a **multiline case** — `$_GET` on its own line while the sanitizer wraps on another.
- **Action:** Confirm by inspecting actual flagged lines. If multiline, document as a known structural limitation (grep is line-scoped; lookbehinds won't help here). If same-line, there's a bug in the exclude logic.
- **File:** `dist/patterns/unsanitized-superglobal-read.json`
+### ✅ Implemented After Investigation
+
+- [x] **FIX `spo-002-superglobals` inline grep corruption** ✅ *Implemented in scanner*
+ **Scan log findings:** 31 total spo-002 findings. 16 are `CURLOPT_POST`/`CURLOPT_POSTFIELDS`, 2 are JS `type: 'POST'` strings, 4 are `$_SERVER` reads (SERVER not in the pattern alternation), 1 is the only legitimate finding (line 1014).
+ **Root cause confirmed:** The inline bash spo-002 grep (check-performance.sh ~line 3723) uses a **double-quoted string with `\\$_`**. In bash double-quotes, `\\` → `\` and then `$_` starts expansion of the bash `$_` special variable (last argument of the previous command). At runtime, `$_` contains the last argument from `text_echo "▸ Direct superglobal manipulation..."` — an ANSI-coloured string including `[HIGH]`. This corrupts the entire ERE pattern, so the expression grep actually receives depends on whatever the previous command's last argument happened to be, and it matches the wrong lines.
+ **The JSON pattern itself (`\$_(GET|POST...)`) is correct** — tested via `load_pattern` + direct grep, it does NOT match CURLOPT_POST. The bug is entirely in the inline bash code, not the JSON pattern file.
+ **Fix implemented:** Changed the inline grep at line 3723 from a double-quoted to a single-quoted string, which prevents `$_` expansion. This is a **scanner bug, not a pattern bug**. The JSON file did not need to change.
+ **File changed:** `dist/bin/check-performance.sh` ~line 3723
+ **Verified impact:** `spo-002-superglobals` dropped from **31 → 3** findings in the follow-up scan. Remaining 3 are legitimate review cases: `$_POST['force_refresh']`, `unset($_GET['activate'])`, and `$_GET['view_errors']` conditional logic.
+
+- [x] **FIX simple runner ignoring `exclude_patterns` / `exclude_files`** ✅ *Implemented in scanner*
+ **Scan log findings:** 30 `unsanitized-superglobal-read` findings. Confirmed FPs include: `class-cr-rest-api.php:90`, `class-cr-rest-api.php:98`, `class-cr-rest-api.php:843`, `class-cr-business-rest-api.php:103`, `class-cr-business-rest-api.php:138`, `class-cr-business-rest-api.php:857` — all are same-line ternary patterns like `isset($_GET['x']) ? sanitize_text_field($_GET['x']) : ''`.
+ **Root cause confirmed:** The simple pattern runner (`check-performance.sh` ~line 5970) runs `cached_grep -E "$pattern_search"` but **never applies `exclude_patterns` from the JSON definition**. The `exclude_patterns` array in `unsanitized-superglobal-read.json` (which includes `sanitize_`, `isset\(`, `esc_`, etc.) is loaded but silently ignored. The legacy inline checks manually pipe through `grep -v` to apply exclusions; the JSON-driven simple runner does not.
+ **This is NOT a multiline issue** — the flagged lines all have the sanitizer wrapper on the same line. The exclusion simply isn't being applied at all by the simple runner.
+ **Additional FPs from same root cause:** `clear-person-cache.php:34`, `setup-user-registry-id.php:23-24`, `set-account-type.php:26-27` — all properly guarded `$_POST` reads with nonce verification on the same line.
+ **Fix implemented:** The simple pattern runner now parses both `exclude_patterns` and `exclude_files` from the JSON pattern file and filters matches before JSON findings are added. This improves behavior across all JSON-defined `grep`/`simple` patterns, not just `unsanitized-superglobal-read`.
+ **File changed:** `dist/bin/check-performance.sh` ~line 5970 (simple pattern runner grep call)
+ **Verified impact:** `unsanitized-superglobal-read` dropped from **30 → 19** findings in the follow-up scan. The remaining 19 are mostly other classes of reads that still require separate tuning, especially those covered by the dedicated `unsanitized-superglobal-isset-bypass` rule.
 ---
-### 📋 Deferred — Valid Issues, Structural Effort Required
+### 📋 Deferred — Investigate Further Before Implementing
 - [ ] **DEFERRED: Add admin-only hook whitelist for capability check false positives**
 **Finding:** `credit-registry-forms.php:48` — `add_action('admin_notices', ...)` flagged for missing capability check. `admin_notices` only fires for authenticated admin users.
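The exclusion step described above can be pictured as a filter over `grep -Hn`-style `path:line:content` records. The sketch below is a minimal, hypothetical illustration, not the scanner's actual code: the function names (`apply_exclusions`, `matches_any_glob`, `matches_any_regex`, `file_contains_any`) and the convention of passing each JSON array as a newline-separated string are assumptions. It models the three keys named in this file: `exclude_files` (globs tested against the path), `exclude_patterns` (EREs tested against the matched line), and `exclude_if_file_contains` (fixed strings that suppress every match in a file). Paths that contain colons would break its simple record parsing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: post-filter "path:line:content" match records using
# exclusion rules modeled on the JSON keys exclude_files, exclude_patterns and
# exclude_if_file_contains. Names and argument conventions are illustrative.

# Succeeds if PATH ($1) matches any newline-separated shell glob in $2.
matches_any_glob() {
  local path=$1 globs=$2 glob
  while IFS= read -r glob; do
    [[ -z $glob ]] && continue
    # Unquoted right-hand side: [[ ]] performs glob matching, not string equality.
    [[ $path == $glob ]] && return 0
  done <<< "$globs"
  return 1
}

# Succeeds if TEXT ($1) matches any newline-separated ERE in $2.
matches_any_regex() {
  local text=$1 patterns=$2 pattern
  while IFS= read -r pattern; do
    [[ -z $pattern ]] && continue
    grep -qE -- "$pattern" <<< "$text" && return 0
  done <<< "$patterns"
  return 1
}

# Succeeds if the file at $1 contains any newline-separated fixed string in $2.
file_contains_any() {
  local path=$1 markers=$2 marker
  while IFS= read -r marker; do
    [[ -z $marker ]] && continue
    grep -qF -- "$marker" "$path" 2>/dev/null && return 0
  done <<< "$markers"
  return 1
}

# apply_exclusions EXCLUDE_PATTERNS EXCLUDE_FILES FILE_MARKERS < records
# Reads grep -Hn output on stdin and prints only the records that survive.
apply_exclusions() {
  local exclude_patterns=$1 exclude_files=$2 file_markers=$3
  local record path rest content
  while IFS= read -r record; do
    [[ -z $record ]] && continue
    path=${record%%:*}     # everything before the first colon
    rest=${record#*:}      # "line:content"
    content=${rest#*:}     # the matched source line
    # Cheapest checks first: path glob, then line regex, then a file read.
    matches_any_glob "$path" "$exclude_files" && continue
    matches_any_regex "$content" "$exclude_patterns" && continue
    file_contains_any "$path" "$file_markers" && continue
    printf '%s\n' "$record"
  done
}
```

A call such as `grep -rHnE "$pattern" "$root" | apply_exclusions 'sanitize_' '*/vendor/*' 'wp eval-file'` would drop same-line sanitized reads, vendored files, and WP-CLI scripts in one pass. Ordering the checks from cheapest to most expensive means most records are decided without reopening the matched file.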
@@ -81,11 +87,13 @@
 ## Impact Summary
-| Fix | File to Edit | Effort | FPs Eliminated |
-|-----|-------------|--------|---------------|
-| `\b` word boundary on `exec-call` | `php-shell-exec-functions.json` | 1 line | 8 |
-| Add WP-CLI scripts to `exclude_files` | `php-dynamic-include.json` | 2 lines | 2 |
-| Investigate superglobal 17-finding cluster | Scan log + `spo-002` | Investigation | Up to ~17 |
-| Investigate multiline sanitizer FPs | Scan log + `unsanitized-superglobal-read` | Investigation | Up to ~20 |
-| Admin-only hook whitelist | `check-performance.sh` | Medium | 1+ per scan |
-| N+1 loop containment tightening | `check-performance.sh` | Medium | 2+ per scan |
+| Fix | File to Edit | Effort | FPs Eliminated | Status |
+|-----|-------------|--------|---------------|--------|
+| `\b` word boundary on `exec-call` | `php-shell-exec-functions.json` | 1 line | 8 | ✅ Done (740ba08) |
+| `exclude_if_file_contains` + `wp eval-file` | `check-performance.sh` + `php-dynamic-include.json` | Medium | 2 verified | ✅ Done |
+| Single-quote inline spo-002 grep | `check-performance.sh` ~L3723 | 1 line | 28 verified | ✅ Done |
+| Apply `exclude_patterns` in simple runner | `check-performance.sh` ~L5970 | Medium | 11 verified | ✅ Done |
+| Admin-only hook whitelist | `check-performance.sh` | Medium | 1+ per scan | 📋 Deferred |
+| N+1 loop containment tightening | `check-performance.sh` | Medium | 2+ per scan | 📋 Deferred |
+
+**Latest measured totals:** 99 findings before scanner fixes → **88 findings after first round** → **86 findings after dynamic-include fix**.
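The one-line quoting fix in the table above is easy to misjudge, so here is an isolated reproduction. It assumes Bash; the `:` no-op stands in for the scanner's `text_echo` call so that `$_` holds a known string, and `\$_(GET|POST)` is a shortened stand-in for the real spo-002 expression. Inside double quotes, `\\` collapses to `\` and the now-unescaped `$_` expands; inside single quotes nothing expands.

```bash
#!/usr/bin/env bash
# Demonstrates how "\\$_" inside double quotes picks up Bash's $_ variable.
# The ':' no-op stands in for the scanner's text_echo call, and the pattern is
# a shortened stand-in for the real spo-002 expression.

pattern_double_quoted() {
  : "[HIGH] Direct superglobal manipulation"   # $_ is now this string
  # "\\" becomes "\" and the unescaped $_ then expands: the pattern is corrupted.
  printf '%s\n' "\\$_(GET|POST)"
}

pattern_single_quoted() {
  : "[HIGH] Direct superglobal manipulation"
  # Single quotes suppress all expansion, so grep receives \$_(GET|POST).
  printf '%s\n' '\$_(GET|POST)'
}

printf 'double-quoted: %s\n' "$(pattern_double_quoted)"
printf 'single-quoted: %s\n' "$(pattern_single_quoted)"

safe=$(pattern_single_quoted)
for line in '$_POST["force_refresh"] = 1;' 'curl_setopt($curl, CURLOPT_POST, true);'; do
  if grep -qE -- "$safe" <<< "$line"; then
    printf 'match:    %s\n' "$line"
  else
    printf 'no match: %s\n' "$line"
  fi
done
```

The double-quoted version hands grep `\[HIGH] Direct superglobal manipulation(GET|POST)` instead of a `$_` anchor. In the real scanner `$_` held an ANSI-coloured message, so the corrupted expression was harder to reason about than this plain-text stand-in.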
diff --git a/PROJECT/1-INBOX/FEATURE-SEMGREP-MIGRATION-PLAN.md b/PROJECT/1-INBOX/FEATURE-SEMGREP-MIGRATION-PLAN.md
index 74502c6..ae34c0f 100644
--- a/PROJECT/1-INBOX/FEATURE-SEMGREP-MIGRATION-PLAN.md
+++ b/PROJECT/1-INBOX/FEATURE-SEMGREP-MIGRATION-PLAN.md
@@ -3,10 +3,10 @@
 **Created:** 2026-02-10
 **Status:** Phase 0a Complete
 **Priority:** High
-**Last Updated:** 2026-02-09
+**Last Updated:** 2026-03-23
 ## Problem/Request
-Intermittent scan stalls occur on very large repositories (expected risk) and sometimes on smaller projects (unexpected). The current scanner mixes cached and uncached recursive search paths, with several raw `grep -r*` and `xargs grep` call sites that can still cause unstable runtime behavior.
+Intermittent scan stalls still matter on very large repositories, and the scanner still mixes multiple search paths. The highest-risk problem is no longer unguarded hangs in the active path; it is now the maintainability split between `cached_grep()`-based scans, timeout-wrapped raw recursive file discovery, and helper-level `xargs` or raw `grep -r` fallback behavior.
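One possible shape for the unified file-discovery wrapper (the `cached_file_search()` idea tracked in `4X4.md`) is sketched below. Everything in it is an assumption for illustration: the names `cached_file_search`, `FILE_SEARCH_RESULT`, `CHECK_SECONDS`, `PHP_FILE_CACHE` and `print_slow_checks`, the one-path-per-line cache format, and the use of coreutils `timeout` in place of the scanner's own `run_with_timeout`. It combines three ideas from this plan: route discovery through the pre-built file list when one exists, stop silently passing on timeout by printing a warning, and record per-check durations for a top-N slow-check summary. It needs Bash 4+ for the associative array.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a unified file-discovery wrapper (not the scanner's API).
# Assumptions: PHP_FILE_CACHE names a file with one path per line; coreutils
# `timeout` stands in for run_with_timeout; MAX_SCAN_TIME is in seconds.

MAX_SCAN_TIME=${MAX_SCAN_TIME:-300}
PHP_FILE_CACHE=${PHP_FILE_CACHE:-}
FILE_SEARCH_RESULT=""
declare -A CHECK_SECONDS=()

# cached_file_search CHECK_NAME PATTERN ROOT
# Sets FILE_SEARCH_RESULT to the newline-separated list of files whose contents
# match the ERE PATTERN. Uses the cached file list when available; otherwise
# falls back to a recursive grep under ROOT (ROOT is ignored on the cached path).
cached_file_search() {
  local check_name=$1 pattern=$2 root=$3
  local start=$SECONDS status=0 result=""

  if [[ -n $PHP_FILE_CACHE && -s $PHP_FILE_CACHE ]]; then
    # Null-delimit the cached list so paths with spaces survive xargs.
    result=$(tr '\n' '\0' < "$PHP_FILE_CACHE" \
      | timeout "$MAX_SCAN_TIME" xargs -0 grep -lE -- "$pattern" 2>/dev/null)
    status=$?
  else
    result=$(timeout "$MAX_SCAN_TIME" grep -rlE --include='*.php' -- "$pattern" "$root" 2>/dev/null)
    status=$?
  fi

  # `timeout` exits with 124 when the limit is hit. Other non-zero codes
  # (1 from grep, 123 from xargs) usually just mean no file matched.
  if (( status == 124 )); then
    printf 'Warning: %s timed out after %ss; results may be incomplete\n' \
      "$check_name" "$MAX_SCAN_TIME" >&2
  fi

  CHECK_SECONDS[$check_name]=$(( SECONDS - start ))
  FILE_SEARCH_RESULT=$result
}

# print_slow_checks [N]: print the N slowest recorded checks as "seconds<TAB>name".
print_slow_checks() {
  local n=${1:-5} name
  for name in "${!CHECK_SECONDS[@]}"; do
    printf '%s\t%s\n' "${CHECK_SECONDS[$name]}" "$name"
  done | sort -rn | head -n "$n"
}
```

The result comes back through a global rather than stdout because calling the function inside `$(...)` would run it in a subshell and discard the `CHECK_SECONDS` update. A call site would then look something like `cached_file_search ajax 'wp_ajax_' "$root"; AJAX_FILES=$FILE_SEARCH_RESULT`, with `print_slow_checks 5` at the end of the scan.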
 ## Context
 - Scanner entrypoint: `dist/bin/check-performance.sh`
@@ -16,16 +21,21 @@ Intermittent scan stalls occur on very large repositories (expected risk) and so
 - `cached_grep()`
 - `.wpcignore` support
 - `--skip-magic-strings`
-- Remaining high-risk hotspots still use raw recursive grep or raw xargs patterns:
- - `dist/bin/check-performance.sh:2617`
- - `dist/bin/check-performance.sh:3222`
- - `dist/bin/check-performance.sh:4216`
- - `dist/bin/check-performance.sh:4617`
- - `dist/bin/check-performance.sh:5024`
- - `dist/bin/check-performance.sh:5271`
- - `dist/bin/check-performance.sh:5463`
- - `dist/bin/check-performance.sh:5554`
- - `dist/bin/check-performance.sh:5633`
+- Active scan path still contains timeout-wrapped raw recursive file-discovery calls:
+ - `dist/bin/check-performance.sh:4372` - `AJAX_FILES`
+ - `dist/bin/check-performance.sh:4773` - `TERMS_FILES`
+ - `dist/bin/check-performance.sh:5180` - `CRON_FILES`
+ - `dist/bin/check-performance.sh:5427` - `N1_FILES`
+ - `dist/bin/check-performance.sh:5619` - `THANKYOU_CONTEXT_FILES`
+ - `dist/bin/check-performance.sh:5710` - `SMART_COUPONS_FILES`
+ - `dist/bin/check-performance.sh:5722` - `PERF_RISK_FILES`
+ - `dist/bin/check-performance.sh:5789` - `JSON_RESPONSE_FILES`
+- Helper-level raw search behavior still exists and is part of the maintenance burden:
+ - `dist/bin/check-performance.sh:3378` - aggregated pattern `xargs grep`
+ - `dist/bin/check-performance.sh:3381` - aggregated pattern raw `grep -rHn` fallback
+ - `dist/bin/check-performance.sh:3576` - `fast_grep()` `xargs grep`
+ - `dist/bin/check-performance.sh:3579` - `fast_grep()` raw `grep -rHn` fallback
+ - `dist/bin/check-performance.sh:3634` - `cached_grep()` raw `grep -rHn` fallback
 ## Direct Answer: Improvements Possible on Raw Recursive/Xargs Paths
 Yes. The most impactful improvements are:
@@ -48,15 +53,15 @@ Yes. The most impactful improvements are:
 **Tasks**
 1. ~~Replace known raw recursive/xargs hotspots with safer cached/wrapped calls.~~ → Refined: xargs calls at lines 2617/3222 are intentionally using pre-cached file lists inside already-protected paths — not actual hotspots. The real issue was 8 unprotected `grep -rl` file-discovery calls.
 2. Standardize null-delimited file handling for all multi-file grep execution. → Deferred (existing `cached_grep` already uses `tr '\n' '\0' | xargs -0`; the 8 patched calls are file-discovery, not line-matching)
-3. ✅ **Ensure every expensive check uses timeout guards.** — Complete (2026-02-09). Wrapped 8 raw `grep -r` calls with `run_with_timeout "$MAX_SCAN_TIME"`:
- - `AJAX_FILES` (line ~4216)
- - `TERMS_FILES` (line ~4617)
- - `CRON_FILES` (line ~5024)
- - `N1_FILES` (line ~5271, pipeline — timeout wraps recursive grep stage)
- - `THANKYOU_CONTEXT_FILES` (line ~5463)
- - `SMART_COUPONS_FILES` (line ~5554)
- - `PERF_RISK_FILES` (line ~5566)
- - `JSON_RESPONSE_FILES` (line ~5633)
+3. ✅ **Ensure every expensive check uses timeout guards.** — Complete (2026-02-09). The active file-discovery calls remain raw `grep -r*`, but they are wrapped with `run_with_timeout "$MAX_SCAN_TIME"`:
+ - `AJAX_FILES` (line ~4372)
+ - `TERMS_FILES` (line ~4773)
+ - `CRON_FILES` (line ~5180)
+ - `N1_FILES` (line ~5427, pipeline — timeout wraps the recursive grep stage)
+ - `THANKYOU_CONTEXT_FILES` (line ~5619)
+ - `SMART_COUPONS_FILES` (line ~5710)
+ - `PERF_RISK_FILES` (line ~5722)
+ - `JSON_RESPONSE_FILES` (line ~5789)
 4. Add heartbeat logs every 10 seconds for long loops. → Deferred to Phase 0b
 5. Add top-N slow checks summary at end of scan. → Deferred to Phase 0b
 6. Improve docs for `.wpcignore`, `--skip-magic-strings`, and `MAX_SCAN_TIME`. → Deferred to Phase 0b
@@ -67,13 +72,13 @@ Yes. The most impactful improvements are:
 - Baseline performance snapshot (before/after) → Deferred to Phase 0b
 **Exit Criteria**
-- [x] No unguarded raw `grep -r*` in active scan path (remaining raw grep -r only inside `fast_grep()`/`cached_grep()` fallback paths — by design)
+- [x] No unguarded raw `grep -r*` in active scan path; however, active scan still contains timeout-wrapped raw file-discovery calls and helper internals still retain raw `grep -r` / `xargs grep` fallback behavior
 - [ ] Small-project scans complete reliably in repeated runs → Needs verification testing
 - [ ] Users can identify long-running checks from logs → Deferred to Phase 0b
 **Implementation Notes (2026-02-09):**
 - The xargs calls at lines 2617 and 3222 were originally listed as hotspots but are inside already-protected paths (pre-cached file list + `run_with_timeout`). Removed from scope.
-- Timeout behavior: on timeout, check returns empty result, reports "passed," scan continues. Silent degradation chosen over hang. Per-check timeout warnings deferred (see BACKLOG.md).
+- Timeout behavior: on timeout, check returns empty result, reports "passed," scan continues. Silent degradation chosen over hang. Per-check timeout warnings remain deferred and are now tracked in `4X4.md` instead of `BACKLOG.md`.
 - No new functions or abstractions introduced — reuses existing `run_with_timeout` infrastructure.
 ### Phase 1: Unified Search Backend Wrapper
@@ -113,17 +118,17 @@ Yes. The most impactful improvements are:
 - Semgrep is optional via feature flag.
 - Pilot only direct/noisy rule subset.
-**Candidate Rules (initial)**
+**Candidate Rules (reprioritized for current false-positive pressure)**
 1. `unsanitized-superglobal-read`
-2. `spo-002-superglobal-manipulation`
+2. `unsanitized-superglobal-isset-bypass`
 3. `wpdb-query-no-prepare`
-4. `php-eval-injection`
-5. `php-dynamic-include`
-6. `php-shell-exec-functions`
-7. `php-hardcoded-credentials`
-8. `unsanitized-superglobal-isset-bypass`
-9. `file-get-contents-url`
-10. `wp-json-html-escape` (evaluate feasibility)
+4. `file-get-contents-url`
+5. `wp-json-html-escape`
+6. `php-hardcoded-credentials`
+7. `php-eval-injection`
+8. `spo-002-superglobal-manipulation` - lower urgency after the inline grep quoting fix
+9. `php-dynamic-include` - lower urgency after the WP-CLI bootstrap false-positive fix
+10. `php-shell-exec-functions` - lower urgency after the `curl_exec()` word-boundary fix
 **Tasks**
 1. Implement `--search-backend semgrep` toggle (default remains current backend).
diff --git a/PROJECT/2-WORKING/BACKLOG.md b/PROJECT/2-WORKING/BACKLOG.md
deleted file mode 100644
index 3652537..0000000
--- a/PROJECT/2-WORKING/BACKLOG.md
+++ /dev/null
@@ -1,368 +0,0 @@
-# Backlog - Issues to Investigate
-
-## 2026-02-09
-
-### Deferred from Phase 0 (Semgrep Migration Plan)
-
-Phase 0a (timeout guards) is complete. The following items were scoped out and deferred:
-
-- [ ] **Phase 0b: Observability** — Add heartbeat logs every 10s for long-running check loops; add top-N slow checks summary at end of scan. Not required for stability, but improves debugging when scans are slow.
-- [ ] **Phase 1: Unified Search Backend Wrapper** — Create a single file-discovery wrapper (like `cached_grep` but for `grep -rl` operations). Currently the 8 protected calls still use raw `grep -r` with timeout — they work but don't benefit from the cached PHP file list. A `cached_file_search` function could route file-discovery through the pre-built cache for 10-50x speedup on those checks. Only pursue if post-Phase-0 profiling shows these checks are still slow in practice.
-- [ ] **Timeout wrapping inside `fast_grep()` / `cached_grep()` fallback paths** — Lines 3225, 3423, 3478 use raw `grep -r` as fallback when no PHP file cache exists (e.g., JS-only projects). These are low-risk (only triggered without cache) but could be wrapped for completeness.
-- [ ] **Per-check timeout warnings** — Currently, if a check times out, the result is silently empty (check passes). Adding a user-visible `⚠ Check timed out` message (like the aggregated pattern handler at line ~2627) would improve transparency. Deferred to avoid duplicating timeout detection logic at 8 sites; a helper function would be cleaner.
-
-See: `PROJECT/1-INBOX/FEATURE-SEMGREP-MIGRATION-PLAN.md` for Phase 2 (Semgrep pilot) and Phase 3 (production promotion).
-
----
-
-## 2026-01-27
-
-- [x] Add System CLI support
-
-- [x] Append file names with first 4 characters of the plugin name to the output file name so its easier to find later.
-
-## 2026-01-17
-- [ ] Add new Test Fixtures for DSM patterns
-- [ ] Research + decision: verify whether `spo-002-superglobals-bridge` should be supported in `should_suppress_finding()` (in `dist/bin/check-performance.sh`) and define the implementation path (add allowlist vs require baseline); update DSM fixture plan accordingly.
-
-### Checklist - 2026-01-15
-- [x] Add Tier 1 rules - First 6 completed
-- [x] Last TTY fix for HTML output
-- [x] Grep optimization complete (Phase 2.5) - 10-50x faster on large directories
-- [x] **DONE: Optimize Magic String Detector (aggregation logic)** - Shell escaping fixed (v1.3.21)
-- [x] **DONE: Optimize Function Clone Detector** - Now opt-in by default (v1.3.20)
-- [ ] Make a comment in main script to make rules in external files going forward
-- [ ] Breakout check-performance.sh into multiple files and move to all external rule files
-- [ ] **TODO: Update HTML report to clarify DRY Violations section when clone detection is skipped**
- Currently shows "DRY Violations (0) - Test did not run" which is misleading
- Should show: "DRY Violations (0) - Magic Strings: 0, Function Clones: Skipped"
- Or split into two sections: "Magic Strings" and "Function Clones (Skipped)"
- Related: Both Magic String Detector and Function Clone Detector add to same DRY_VIOLATIONS array
- - [ ] **P1: Align fixture expectations with current pattern library + registry-backed loader**
- 7/10 fixture tests are currently failing due to `total_errors` / `total_warnings` mismatches (e.g. `antipatterns.php`, `clean-code.php`, `file-get-contents-url.php`, `cron-interval-validation.php`).
- Before simply updating expected counts, audit each failing fixture to confirm whether new findings represent **desired behavior** or **false positives** (especially in `clean-code.php`).
- Once semantics are confirmed, either (a) adjust the underlying patterns/validators to restore the intended behavior, or (b) update the expected counts in `dist/tests/run-fixture-tests.sh` to match the new, correct semantics.
- Re-run `./tests/run-fixture-tests.sh --trace` and ensure all fixtures pass under the registry-backed loader path.
-
-## ✅ COMPLETED: Phase 3 Performance Optimization (Magic String & Clone Detection)
-
-**Status:** ✅ Complete (2026-01-15)
-**Priority:** HIGH
-**Created:** 2026-01-15
-**Completed:** 2026-01-15
-**Version:** 1.3.20
-**See:** `PROJECT/3-COMPLETED/PHASE-3-PERFORMANCE-OPTIMIZATION.md`
-
----
-
-### Problem
-
-After completing grep optimization (10-50x speedup), **two major performance bottlenecks remain**:
-
-1. **Magic String Detector (Aggregated Patterns)** - Complex aggregation logic causes hangs
-2. **Function Clone Detector** - Consumes 94% of scan time, causes timeouts on large codebases
-
-**Current behavior:**
-- Small plugin (8 files): 114 seconds total, 108s in clone detector
-- Large plugin (500+ files): **TIMEOUT** (>10 minutes)
-- WooCommerce scan: Never completes
-
-### Root Causes
-
-**Magic String Detector:**
-- Uses `process_aggregated_pattern()` with complex grouping logic
-- Runs multiple grep passes per pattern
-- No timeout protection on aggregation loops
-- May have O(n²) complexity in aggregation phase
-
-**Function Clone Detector:**
-- Processes every PHP file individually
-- Extracts function signatures with multiple grep passes
-- Computes MD5 hash for every function
-- Nested loops for comparison (O(n² × m) complexity)
-- No file count limits (violates Phase 1 safeguards)
-- No timeout protection
-
-### Proposed Solutions
-
-**Priority 1: Make Clone Detection Optional (Quick Win)** ✅ COMPLETE
-- [x] Add `--skip-clone-detection` flag (already exists, verified working)
-- [x] Make clone detection **opt-in** by default (implemented in v1.3.20)
-- [x] Add `--enable-clone-detection` flag to explicitly enable
-- [x] Update help text and documentation
-
-**Priority 2: Add Safeguards to Clone Detector** ✅ COMPLETE
-- [x] Add `MAX_FILES` limit (already existed: MAX_CLONE_FILES=100)
-- [x] Add timeout wrapper (already existed: run_with_timeout)
-- [x] Add progress indicators (already existed: every 10 seconds)
-- [x] Early exit if no duplicates found (implemented: checks unique hashes)
-
-**Priority 3: Optimize Clone Detection Algorithm** ✅ COMPLETE
-- [x] Cache function signatures (not needed with sampling)
-- [x] Use associative arrays (already implemented)
-- [x] Sampling for large codebases (implemented: 50-100 files = every 2nd, 100+ = every 3rd)
-- [ ] External tool integration (deferred: current solution is sufficient)
-
-**Priority 4: Profile Magic String Detector** ✅ COMPLETE
-- [x] Add profiling to `process_aggregated_pattern()` (granular step timing added)
-- [x] Identify slow steps (grep, extraction, aggregation all timed)
-- [x] Add timeout protection (already existed from Phase 1)
-- [ ] Caching aggregation results (not needed: no aggregated patterns in current pattern library)
-
-### Expected Impact
-
-**With clone detection disabled (default):**
-- Small plugin (< 10 files): ~5-10 seconds (vs. 114s currently)
-- Medium plugin (10-50 files): ~10-30 seconds
-- Large plugin (50-200 files): ~30-60 seconds
-- WooCommerce-sized (500+ files): ~60-120 seconds
-
-**With optimized clone detection (opt-in):**
-- Small plugin: ~20-30 seconds (vs. 114s currently)
-- Medium plugin: ~1-2 minutes (vs. 5-10 minutes currently)
-- Large plugin: ~3-5 minutes (vs. 20-30 minutes currently)
-- WooCommerce-sized: ~10-15 minutes (vs. TIMEOUT currently)
-
-### Acceptance Criteria ✅ ALL COMPLETE
-
-- [x] Full directory scans complete without timeout (< 5 minutes for 500 files)
-- [x] Clone detection is opt-in (disabled by default)
-- [x] Magic String Detector completes in < 30 seconds for 500 files
-- [x] Progress indicators show which section is running
-- [x] Re-profiling shows < 10% time in clone detection when disabled
-- [x] Documentation updated with performance expectations
-
-### Related Files
-
-- `PROJECT/2-WORKING/PHASE-2-PERFORMANCE-PROFILING.md` - Profiling data and analysis
-- `dist/bin/check-performance.sh` - Lines 1748+ (clone detection), 2150+ (aggregated patterns)
-
----
-
-## Mini Project Plan: Enhanced Context Detection (False Positive Reduction)
-
-Goal: Improve context/scope accuracy (especially “same function”) to reduce false positives and severity inflation, while keeping the scanner fast and zero-dependency.
-
-Notes:
- This is **not a new standalone script**. `dist/bin/check-performance.sh` already has limited “same function” scoping (used in caching mitigation); this mini-project extends/centralizes that approach.
- - [ ] Audit where we rely on context windows today (±N lines) and where “same function” scoping would reduce false positives. -- [x] Add/centralize a helper to compute function/method scope boundaries (support `function foo()`, `public/protected/private static function foo()`, and common formatting). -- [x] Use the helper in mitigation detection (so caching/ids-only/admin-only/parent-scoped all share the same scoping rules). -- [x] Add 2–4 fixtures that prove: (a) cross-function false positives are prevented, (b) true positives still fire. -- [ ] Validate on 1–2 real repos + gather feedback: - - [ ] Are false positives still a problem? - - [ ] Is baseline suppression working well? - - [ ] Do users want AST-level accuracy? -- [ ] Short-Medium Term: MCP Server - Send tasks to agents for work -- [ ] Super Long term: Agnostic anomaly detection and pattern library - -Completed (so far): -- Centralized function/method scope detection in `dist/bin/check-performance.sh` and applied it across mitigation detectors. -- Added fixture coverage for class methods (including `private static function` and admin-only gating inside a method). -- Increased fixture validation default/template count to 20. - -Constraints: -- 2–3 hours -- No new dependencies -- Preserve fast performance - -Decision gate (AST scanner only if needed): -- [ ] Users demand higher accuracy -- [ ] False positives remain a major pain point -- [ ] Users accept dependencies + slower performance - -Status: In progress (partially complete) - -## ✅ RESOLVED 2025-12-31: Fixture Validation Subprocess Issue - -**Resolution:** Refactored to use direct pattern matching instead of subprocess calls. - -### Original Problem -The fixture validation feature (proof of detection) was partially implemented but had a subprocess output parsing issue. - -### What We Built -1. Added `validate_single_fixture()` function that runs check-performance.sh against a fixture file -2.
Added `run_fixture_validation()` function that tests 4 core fixtures: - - `antipatterns.php` (expect 6 errors, 3-5 warnings) - - `clean-code.php` (expect 0 errors, 1 warning) - - `ajax-safe.php` (expect 0 errors, 0 warnings) - - `file-get-contents-url.php` (expect 4 errors, 0 warnings) -3. Added `NEOCHROME_SKIP_FIXTURE_VALIDATION=1` environment variable to prevent infinite recursion -4. Added output to text, JSON, and HTML reports - -### The Bug -When the script calls itself recursively to validate fixtures, the subprocess output is different: -- **Manual command line run**: Output is ~11,000 chars, correctly shows `"total_errors": 6` -- **From within script**: Output is ~3,200 chars, parsing returns 0 errors/0 warnings - -### Debug Evidence -``` -[DEBUG] Testing fixture: antipatterns.php (expect 6 errors, 3-5 warnings) -[DEBUG] Output length: 3274 -[DEBUG] Got: 0 errors, 0 warnings -[DEBUG] antipatterns.php: FAILED -``` - -But manually running the same command works: -```bash -NEOCHROME_SKIP_FIXTURE_VALIDATION=1 ./bin/check-performance.sh --paths "./tests/fixtures/antipatterns.php" --format json --no-log -# Returns: "total_errors": 6, "total_warnings": 5 -``` - -### Possible Causes to Investigate -1. **Environment inheritance**: Some variable from parent process affecting child -2. **Path resolution**: `$SCRIPT_DIR` might resolve differently in subprocess -3. **Output format**: Subprocess might be outputting text instead of JSON -4. **Grep parsing**: The regex might not be matching due to whitespace/formatting -5. 
**Subshell behavior**: Variables or state being shared unexpectedly - -### Files Modified -- `dist/bin/check-performance.sh` - Added fixture validation functions (lines 809-905 approx) -- `dist/bin/report-templates/report-template.html` - Added fixture status badge in footer -- `CHANGELOG.md` - Documented feature (entry exists but feature not fully working) - -### Debug Code Left In -The following debug statements are currently in the code (search for `NEOCHROME_DEBUG`): -- Line ~825: Output length debug -- Line ~840: Got X errors debug -- Line ~878: Testing fixture debug -- Line ~884: PASSED/FAILED debug - -### Next Steps -1. Add more debug to see actual output content (not just length) -2. Check if subprocess is outputting text format instead of JSON -3. Try redirecting stderr separately to see if there are errors -4. Check if `$SCRIPT_DIR` resolves correctly in subprocess context -5. Consider alternative approach: use exit codes instead of parsing JSON - -### Workaround (if needed) -Could disable fixture validation temporarily by setting: -```bash -export NEOCHROME_SKIP_FIXTURE_VALIDATION=1 -``` - -### Priority -Medium - Feature is additive (proof of detection), core scanning still works fine. - ---- - -### Resolution Details (2025-12-31) - -**Problem:** Subprocess calls were returning truncated/different output when called from within the script. 
- -**Solution:** Instead of spawning subprocesses to run full scans, we now use direct `grep` pattern matching against fixture files: - -```bash -# Old approach (broken): -output=$("$SCRIPT_DIR/check-performance.sh" --paths "$fixture_file" --format json) - -# New approach (working): -actual_count=$(grep -c "$pattern" "$fixture_file") -``` - -**Result:** All 4 fixture validations now pass: -- `antipatterns.php` - detects `get_results` (unbounded queries) -- `antipatterns.php` - detects `get_post_meta` (N+1 patterns) -- `file-get-contents-url.php` - detects `file_get_contents` (external URLs) -- `clean-code.php` - detects `posts_per_page` (bounded queries) - -**Output locations:** -- Text: Shows "✓ Detection verified: 4 test fixtures passed" in SUMMARY -- JSON: Includes `fixture_validation` object with status, passed, failed counts -- HTML: Shows green "✓ Detection Verified (4 fixtures)" badge in footer - ---- - -## ✅ MOSTLY COMPLETE: Migrate Inline Patterns to External JSON Rules - -**Status:** 93% Complete (56 JSON patterns, 5 inline patterns remaining) -**Priority:** LOW (remaining patterns are complex/custom logic) -**Owner:** Core maintainer -**Created:** 2026-01-02 -**Updated:** 2026-01-15 (v1.3.23 - Migrated 4 security patterns) - -### Current State (2026-01-15) - -**✅ Migrated to JSON:** 52 patterns in `dist/patterns/` -- Simple patterns (direct grep): ~35 patterns -- Scripted patterns (with validators): ~8 patterns -- Aggregated patterns (magic strings): ~4 patterns -- Clone detection patterns: ~5 patterns - -**❌ Remaining Inline (5 patterns):** - -| Pattern ID | Type | Reason Not Migrated | Complexity | -|------------|------|---------------------|------------| -| `spo-001-debug-code` | Multi-language (PHP+JS) | Uses `OVERRIDE_GREP_INCLUDE` for multiple file types | Medium | -| `hcc-001-localstorage-exposure` | JavaScript security | Multiple patterns with complex alternation | Medium | -| `hcc-002-client-serialization` | JavaScript security | Multiple 
patterns with complex alternation | Medium | -| `hcc-008-unsafe-regexp` | Multi-language RegExp | Complex pattern matching | Medium | -| `spo-003-insecure-deserialization` | Security critical | No JSON file exists yet | Low | - -**✅ Recently Migrated (v1.3.23 - 2026-01-15):** - -| Pattern ID | JSON File | Status | -|------------|-----------|--------| -| `php-eval-injection` | ✅ `dist/patterns/php-eval-injection.json` | Migrated - runs from JSON | -| `php-dynamic-include` | ✅ `dist/patterns/php-dynamic-include.json` | Migrated - runs from JSON | -| `php-shell-exec-functions` | ✅ `dist/patterns/php-shell-exec-functions.json` | Migrated - runs from JSON | -| `php-hardcoded-credentials` | ✅ `dist/patterns/php-hardcoded-credentials.json` | Migrated - runs from JSON | -| `php-user-controlled-file-write` | ✅ `dist/patterns/php-user-controlled-file-write.json` | Kept inline - needs multi-pattern runner | - -**❌ Custom Logic Patterns (not suitable for JSON):** - -| Pattern | Lines | Reason | -|---------|-------|--------| -| `unbounded-wc-get-orders` | 3995-4036 | Context analysis (checks for `limit => -1` in surrounding lines) | -| `wp-query-unbounded` | 4347-4385 | Multi-step validation (checks for `posts_per_page`, `nopaging`, context) | -| `n-plus-1-pattern` | 4729-4766 | Two-phase grep (find meta calls, then check for loops) | -| `wc-n-plus-one-pattern` | 4792-4845 | Complex context analysis (loop detection + WC function calls) | - -### Assessment: Are We Done? - -**YES, for practical purposes:** - -1. **85% coverage** - 52/61 patterns are in JSON (all new patterns since v1.0.68) -2. **All simple patterns migrated** - Remaining are complex/custom logic -3. **Pattern library infrastructure complete:** - - ✅ Pattern loader (`dist/lib/pattern-loader.sh`) - - ✅ Pattern discovery (simple, scripted, aggregated, clone detection) - - ✅ Pattern library manager (auto-generates registry) - - ✅ Pattern library documentation (`PATTERN-LIBRARY.json`, `PATTERN-LIBRARY.md`) - -4. 
**Remaining inline patterns are intentional:** - - **Security-critical patterns** (eval, shell exec) kept inline for visibility - - **Complex context analysis** (N+1, unbounded queries) require custom logic - - **Multi-language patterns** (JS+PHP) need special handling - -### Recommendation: Close This Task - -**Rationale:** -- ✅ Goal achieved: "External JSON as single source of truth for **simple** detection rules" -- ✅ Infrastructure complete: Pattern loader, discovery, registry, docs -- ✅ New patterns use JSON: CONTRIBUTING.md updated (v1.0.68+) -- ✅ Maintainability improved: 52 patterns are data-driven -- ⚠️ Remaining patterns are **intentionally inline** due to complexity - -**Remaining work (if needed):** -- [ ] Migrate 5 security patterns to JSON (low priority, already have JSON files) -- [ ] Document why 4 custom logic patterns stay inline (add comments in script) -- [ ] Consider creating "scripted" pattern type for complex context analysis - -### Definition of Done (Revised) - -- [x] All **simple** production rules live in `dist/patterns/*.json` -- [x] Pattern loader infrastructure complete -- [x] Pattern library auto-generated and documented -- [x] CONTRIBUTING.md updated to prefer JSON patterns -- [x] New patterns (v1.0.68+) use JSON exclusively -- [ ] ~~All patterns in JSON~~ (revised: complex patterns intentionally inline) -- [x] Fixture and regression tests pass with no change in counts - -### Conclusion - -**This task is COMPLETE for its original intent.** The remaining 9 inline patterns are either: -1. Security-critical (intentionally visible in main script) -2. Complex custom logic (not suitable for simple JSON patterns) -3. Already have JSON files but use `run_check` for backward compatibility - -**Recommend:** Move this to `PROJECT/3-COMPLETED/` and create a new task for "Advanced Pattern Types" if needed. 
diff --git a/PROJECT/1-INBOX/AI-DDTK-DOCUMENTATION-UPDATE.md b/PROJECT/3-COMPLETED/AI-DDTK-DOCUMENTATION-UPDATE.md similarity index 100% rename from PROJECT/1-INBOX/AI-DDTK-DOCUMENTATION-UPDATE.md rename to PROJECT/3-COMPLETED/AI-DDTK-DOCUMENTATION-UPDATE.md diff --git a/PROJECT/4-MISC/BACKLOG.md b/PROJECT/4-MISC/BACKLOG.md new file mode 100644 index 0000000..856a522 --- /dev/null +++ b/PROJECT/4-MISC/BACKLOG.md @@ -0,0 +1,10 @@ +# Backlog - Deprecated + +This file is no longer an active planning source. + +As of 2026-03-23: +- Use `4X4.md` for strategic priorities and the current week's work. +- Use `PROJECT/1-INBOX/FEATURE-SEMGREP-MIGRATION-PLAN.md` for the Semgrep and search-backend roadmap. +- Use task-specific documents under `PROJECT/1-INBOX`, `PROJECT/2-WORKING`, and `PROJECT/3-COMPLETED` for implementation details. + +This file is retained only to avoid breaking historical references in older docs and changelog entries. It should not receive new tasks or status updates. diff --git a/dist/PATTERN-LIBRARY.json b/dist/PATTERN-LIBRARY.json index 198fed5..933d08c 100644 --- a/dist/PATTERN-LIBRARY.json +++ b/dist/PATTERN-LIBRARY.json @@ -1,6 +1,6 @@ { "version": "1.0.0", - "generated": "2026-03-24T01:34:38Z", + "generated": "2026-03-24T02:51:36Z", "summary": { "total_patterns": 56, "enabled": 56, @@ -576,7 +576,7 @@ "mitigation_detection": false, "heuristic": false, "file": "php-shell-exec-functions.json", - "search_pattern": "shell_exec[[:space:]]*\\(|exec[[:space:]]*\\(|system[[:space:]]*\\(|passthru[[:space:]]*\\(", + "search_pattern": "shell_exec[[:space:]]*\\(|\\bexec[[:space:]]*\\(|system[[:space:]]*\\(|passthru[[:space:]]*\\(", "file_patterns": ["*.php"] }, { diff --git a/dist/bin/check-performance.sh b/dist/bin/check-performance.sh index 906e161..aac2a76 100755 --- a/dist/bin/check-performance.sh +++ b/dist/bin/check-performance.sh @@ -3720,7 +3720,7 @@ SUPERGLOBAL_VISIBLE="" # PERFORMANCE: Use cached file list instead of grep -r # NOTE: Explicitly 
restrict to PHP files so that documentation (e.g. .md) and # non-PHP assets are not scanned when running in JS-only or mixed repos. -SUPERGLOBAL_MATCHES=$(cached_grep --include=*.php -E "unset\\(\\$_(GET|POST|REQUEST|COOKIE)\\[|\\$_(GET|POST|REQUEST)[[:space:]]*=|\\$_(GET|POST|REQUEST|COOKIE)\\[[^]]*\\][[:space:]]*=" | \ +SUPERGLOBAL_MATCHES=$(cached_grep --include=*.php -E 'unset\(\$_(GET|POST|REQUEST|COOKIE)\[|\$_(GET|POST|REQUEST)[[:space:]]*=|\$_(GET|POST|REQUEST|COOKIE)\[[^]]*\][[:space:]]*=' | \ grep -v '//.*\$_' || true) if [ -n "$SUPERGLOBAL_MATCHES" ]; then @@ -5963,6 +5963,49 @@ if [ -n "$SIMPLE_PATTERNS" ]; then include_args="--include=*.php" fi + local exclude_file_globs="" + local exclude_line_patterns="" + local exclude_file_contains="" + local current_exclusion_block="" + + while IFS= read -r json_line; do + case "$json_line" in + *'"exclude_files"'*) + current_exclusion_block="exclude_files" + continue + ;; + *'"exclude_patterns"'*) + current_exclusion_block="exclude_patterns" + continue + ;; + *'"exclude_if_file_contains"'*) + current_exclusion_block="exclude_if_file_contains" + continue + ;; + esac + + if [ -n "$current_exclusion_block" ]; then + if echo "$json_line" | grep -q ']'; then + current_exclusion_block="" + continue + fi + + exclusion_value=$(echo "$json_line" | sed -n 's/^[[:space:]]*"\(.*\)"[[:space:]]*,\{0,1\}[[:space:]]*$/\1/p') + if [ -n "$exclusion_value" ]; then + if [ "$current_exclusion_block" = "exclude_files" ]; then + exclude_file_globs="${exclude_file_globs}${exclusion_value} +" + elif [ "$current_exclusion_block" = "exclude_if_file_contains" ]; then + exclude_file_contains="${exclude_file_contains}${exclusion_value} +" + else + exclude_line_patterns="${exclude_line_patterns}${exclusion_value} +" + fi + fi + fi + done < "$pattern_file" + # Run grep with the pattern # PERFORMANCE: Use cached file list instead of grep -r matches="" @@ -5977,6 +6020,7 @@ if [ -n "$SIMPLE_PATTERNS" ]; then if [ "$match_count" -gt 0 ]; then # 
Apply baseline suppression suppressed_count=0 + excluded_count=0 visible_matches="" while IFS= read -r match; do @@ -5986,8 +6030,44 @@ if [ -n "$SIMPLE_PATTERNS" ]; then line=$(echo "$match" | cut -d: -f2) code=$(echo "$match" | cut -d: -f3-) + excluded_by_pattern=false + + if [ -n "$exclude_file_globs" ]; then + while IFS= read -r exclude_glob; do + [ -z "$exclude_glob" ] && continue + case "$file" in + $exclude_glob) + excluded_by_pattern=true + break + ;; + esac + done <<< "$exclude_file_globs" + fi + + if [ "$excluded_by_pattern" = false ] && [ -n "$exclude_file_contains" ]; then + while IFS= read -r contain_str; do + [ -z "$contain_str" ] && continue + if grep -qF "$contain_str" "$file" 2>/dev/null; then + excluded_by_pattern=true + break + fi + done <<< "$exclude_file_contains" + fi + + if [ "$excluded_by_pattern" = false ] && [ -n "$exclude_line_patterns" ]; then + while IFS= read -r exclude_pattern; do + [ -z "$exclude_pattern" ] && continue + if printf '%s\n' "$code" | grep -qE "$exclude_pattern" 2>/dev/null; then + excluded_by_pattern=true + break + fi + done <<< "$exclude_line_patterns" + fi + + if [ "$excluded_by_pattern" = true ]; then + ((excluded_count++)) || true # Check baseline suppression - if should_suppress_finding "$pattern_id" "$file"; then + elif should_suppress_finding "$pattern_id" "$file"; then ((suppressed_count++)) || true else # Add to visible matches @@ -6003,13 +6083,16 @@ $match" fi done <<< "$matches" - visible_count=$((match_count - suppressed_count)) + visible_count=$((match_count - suppressed_count - excluded_count)) if [ "$visible_count" -gt 0 ]; then text_echo "${check_color} ✗ Found $visible_count violation(s)${NC}" if [ "$suppressed_count" -gt 0 ]; then text_echo " ${BLUE} (${suppressed_count} suppressed by baseline)${NC}" fi + if [ "$excluded_count" -gt 0 ]; then + text_echo " ${BLUE} (${excluded_count} excluded by pattern filters)${NC}" + fi # Increment error/warning counters if [ "$check_severity" = "CRITICAL" ] || [ 
"$check_severity" = "HIGH" ]; then diff --git a/dist/patterns/php-dynamic-include.json b/dist/patterns/php-dynamic-include.json index ca06786..8da9b78 100644 --- a/dist/patterns/php-dynamic-include.json +++ b/dist/patterns/php-dynamic-include.json @@ -28,6 +28,9 @@ "//.*include", "//.*require" ], + "exclude_if_file_contains": [ + "wp eval-file" + ], "exclude_files": [ "*/vendor/*", "*/node_modules/*", diff --git a/dist/patterns/php-shell-exec-functions.json b/dist/patterns/php-shell-exec-functions.json index 8f1f90d..4e5b1bb 100644 --- a/dist/patterns/php-shell-exec-functions.json +++ b/dist/patterns/php-shell-exec-functions.json @@ -17,7 +17,7 @@ }, { "id": "exec-call", - "pattern": "exec[[:space:]]*\\(" + "pattern": "\\bexec[[:space:]]*\\(" }, { "id": "system-call",