Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Adds fixture coverage for both the ACF-style Launchpad-shaped case and a standalone baseline `_()` call
- Adds safe fixture coverage proving standard WordPress i18n helpers do not trigger the rule

- Added Path B observability for Magic String Detector aggregated patterns
- Tracks per-pattern phase timings for grep, extraction, and aggregation in `process_aggregated_pattern()`
- Tracks quality metrics for raw matches, extracted strings, unique strings, filtered strings, and emitted violations
- Logs lightweight state transitions (`GREP`, `EXTRACT`, `AGGREGATE`, `COMPLETE`) in debug output
- Shows per-pattern metrics in text output when `--verbose` or `PROFILE=1` is enabled
- Exposes cumulative `magic_string_metrics` in JSON output for downstream reporting
- Verified in text mode and JSON mode against `/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service`

### Documentation

- Added `PROJECT/1-INBOX/PATTERN-PROPOSAL-LAUNCHPAD-CRASH.md`
Expand All @@ -24,6 +32,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

- Fixed stale-registry pattern-loader fallback that could make the Magic String Detector appear stuck
- Root cause: stale registry state forced the fallback parser path, pattern extraction failed there, `pattern_search` could become empty, and the first simple Magic String rule then looked like a hang
- Switched the fallback Python heredocs in `dist/lib/pattern-loader.sh` to tab-stripping heredocs so indented closing markers terminate correctly when the registry is stale
- Added a guard in `dist/bin/check-performance.sh` to skip simple rules with empty search patterns and emit an explicit warning instead of broad-matching the entire scan set
- Verified against `/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service` with `--verbose`, profiling, debug tracing, and AI triage flags enabled
- Result: the scan now completes through Magic String Detector and Function Clone Detector; the verification run finished in 35s with `MAGIC_STRING_DETECTOR` profiled at 21311ms

#### Phase 0: Timeout Guards for Unprotected Recursive Grep Calls

- **Wrapped 8 raw `grep -r` file-discovery calls with `run_with_timeout "$MAX_SCAN_TIME"`**
Expand Down
91 changes: 91 additions & 0 deletions FEEDBACK-CR-SELF-SERVICE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
# WPCC Pattern Library β€” False Positive Review
**Source:** AI review of creditconnection2-self-service scan
**Date:** 2026-03-23
**Scan findings:** 99 total | **Estimated true positives after fixes:** ~40

---

## Action Items

### βœ… Fix Now β€” High Confidence, Low Effort

- [ ] **FIX `php-shell-exec-functions.json` β€” `exec-call` pattern matches `curl_exec()`**
**Pattern:** `exec[[:space:]]*\(` has no word boundary β†’ matches `curl_exec(`.
**Fix:** Change to `\bexec[[:space:]]*\(` in the `exec-call` sub-pattern.
**File:** `dist/patterns/php-shell-exec-functions.json`
**FPs eliminated:** 8 (all CRITICAL β€” all were `curl_exec($curl)` calls)

- [ ] **FIX `php-dynamic-include.json` β€” WP-CLI bootstrap scripts flagged as LFI**
**Finding:** `check-user-meta.php:13` and `test-alternate-registry-id.php:24` β€” `$path` is iterated from a hardcoded static array, never user-controlled.
**Fix:** Add `*-user-meta.php`, `*-registry-id.php`, or more broadly `*/scripts/*` / `*/cli/*` to `exclude_files` in the pattern.
**File:** `dist/patterns/php-dynamic-include.json`
**FPs eliminated:** 2 (both CRITICAL)

---

### πŸ” Investigate Before Acting β€” Diagnosis Uncertain

- [ ] **INVESTIGATE "Direct superglobal manipulation" (~17 findings on `CURLOPT_POST`)**
**Reviewer claim:** `curl_setopt($curl, CURLOPT_POST, true)` is matched as superglobal manipulation.
**Our assessment:** The `spo-002-superglobals` pattern requires `$_` prefix in all branches β€” `CURLOPT_POST` cannot match it.
**Action:** Re-examine the actual line numbers in the scan log for these 17 findings. Determine which rule is actually firing and why. Do NOT apply the reviewer's suggested fix (`$_` anchoring) β€” it's already implemented.
**File:** `dist/patterns/spo-002-superglobal-manipulation.json` (likely not the culprit)

- [ ] **INVESTIGATE "Sanitized reads flagged as unsanitized" for `sanitize_text_field($_GET[...])`**
**Finding:** `class-cr-rest-api.php` and `class-cr-business-rest-api.php` β€” `sanitize_text_field($_GET['registry_id'])` being flagged.
**Our assessment:** `sanitize_` is already in `exclude_patterns` for `unsanitized-superglobal-read`. This is likely a **multiline case** β€” `$_GET` on its own line while the sanitizer wraps on another.
**Action:** Confirm by inspecting actual flagged lines. If multiline, document as a known structural limitation (grep is line-scoped; lookbehinds won't help here). If same-line, there's a bug in the exclude logic.
**File:** `dist/patterns/unsanitized-superglobal-read.json`

---

### πŸ“‹ Deferred β€” Valid Issues, Structural Effort Required

- [ ] **DEFERRED: Add admin-only hook whitelist for capability check false positives**
**Finding:** `credit-registry-forms.php:48` β€” `add_action('admin_notices', ...)` flagged for missing capability check. `admin_notices` only fires for authenticated admin users.
**Reviewer recommendation:** Whitelist inherently-admin-only hooks (`admin_notices`, `admin_init`, `admin_menu`, etc.)
**Our assessment:** Correct diagnosis. Not fixable with regex alone β€” requires a hook whitelist in the scanner logic. Downgrade severity to INFO as interim.
**Effort:** Low–Medium | **FPs eliminated:** 1 per occurrence

- [ ] **DEFERRED: Strengthen N+1 loop detection to verify lexical containment**
**Finding 1:** `check-user-meta.php:23` β€” `get_user_meta()` called sequentially for a single user, not inside a user loop.
**Finding 2:** `class-cr-business-rest-api.php:245` β€” single `get_user_meta()` re-read after processing.
**Reviewer recommendation:** Confirm the meta call is lexically inside a loop body (`{...}`), not just nearby by line count.
**Our assessment:** The scanner has `is_iterating_over_multiple_objects()` heuristics. These may be gaps in that logic. Review and tighten the "loop containment" check.
**Effort:** Medium | **FPs eliminated:** 2

---

### βœ”οΈ No Action Required β€” Already Handled or Misdiagnosed

- [x] **SKIP β€” `isset()` exclusion for superglobal reads**
`isset\(` is already in `exclude_patterns` for `unsanitized-superglobal-read.json`. Reviewer's suggestion is already implemented.

- [x] **SKIP β€” `$_` prefix anchoring for superglobal manipulation**
All three sub-patterns in `spo-002-superglobals` already require `$_` prefix. The reviewer's suggested fix is already in place.

- [x] **SKIP β€” Sanitizer negative-lookbehind regex for `unsanitized-superglobal-read`**
The `exclude_patterns` list already handles same-line sanitizer wrapping. The multiline case is a structural grep limitation, not addressable by the proposed regex.

---

## Valid Issues Found (Not FPs β€” Tracker for Plugin Owner)

| # | File | Line | Issue | Risk |
|---|------|------|-------|------|
| 6 | `admin-test-page.php` | 191 | `$_GET['view_file']` used without `sanitize_file_name()`; `strpos($view_file, '..')` bypass-able via encoding | HIGH |
| 7 | `admin-test-page.php` | 145 | `$_GET['view_dir']` displayed with `esc_html()` before `sanitize_file_name()` on line 147 β€” safe but misordered | LOW (confusing) |
| 8 | `api-functions.php` | 1014 | `$_POST['force_refresh']` in AJAX handler β€” strict `=== 'true'` comparison limits injection, but verify nonce upstream | LOW–MEDIUM |

---

## Impact Summary

| Fix | File to Edit | Effort | FPs Eliminated |
|-----|-------------|--------|---------------|
| `\b` word boundary on `exec-call` | `php-shell-exec-functions.json` | 1 line | 8 |
| Add WP-CLI scripts to `exclude_files` | `php-dynamic-include.json` | 2 lines | 2 |
| Investigate superglobal 17-finding cluster | Scan log + `spo-002` | Investigation | Up to ~17 |
| Investigate multiline sanitizer FPs | Scan log + `unsanitized-superglobal-read` | Investigation | Up to ~20 |
| Admin-only hook whitelist | `check-performance.sh` | Medium | 1+ per scan |
| N+1 loop containment tightening | `check-performance.sh` | Medium | 2+ per scan |
72 changes: 72 additions & 0 deletions PROJECT/2-WORKING/FEEDBACK-CR-SELF-SERVICE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
Verdict: Mostly False Positives / Scanner Noise

"Shell command execution" (CRITICAL) β€” False Positive
All 8 findings flag curl_exec($curl). This is PHP's cURL library function, not shell execution. It's the standard way to make HTTP requests in PHP without WordPress's wp_remote_* wrappers. No shell is involved β€” completely safe.

"Direct superglobal manipulation" on CURLOPT_POST/CURLOPT_POSTFIELDS (HIGH) β€” False Positive
The scanner is incorrectly matching curl_setopt($curl, CURLOPT_POST, true) as "superglobal manipulation." These are cURL options, not $_POST superglobal writes. This accounts for ~17 of the findings.

"Dynamic PHP include/require" (CRITICAL) β€” False Positive
Both check-user-meta.php:13 and test-alternate-registry-id.php:24 are WP-CLI test scripts that locate wp-load.php from a hardcoded relative path array. The $path variable is never user-controlled β€” it's iterated from a static array. No risk.

"N+1 query pattern" (CRITICAL) β€” False Positive
check-user-meta.php:23 β€” This is a flat script calling get_user_meta() sequentially for a single user, not inside a loop over users.
class-cr-business-rest-api.php:245 β€” This is a single get_user_meta() re-read after processing, not an N+1 pattern.

"Admin function missing capability check" (HIGH) β€” False Positive
credit-registry-forms.php:48 β€” add_action('admin_notices', ...) is a standard WordPress pattern for showing a dependency notice when a plugin is deactivated. The admin_notices hook itself only fires in the admin panel for authenticated users. The unset($_GET['activate']) on line 51 is also a standard WP pattern to suppress the "Plugin activated" message after forced deactivation.

6. Unsanitized $_GET['view_file'] (HIGH) β€” Valid Issue
At admin-test-page.php:191, $_GET['view_file'] is used without sanitize_file_name(). The strpos($view_file, '..') === false check on line 193 is a weak directory traversal guard (can be bypassed with encodings). Should use sanitize_file_name() like view_dir does on line 147.

7. Unsanitized $_GET['view_dir'] display before sanitization (HIGH) β€” Valid Issue
At admin-test-page.php:145, $_GET['view_dir'] is output with esc_html() (safe for XSS), but the sanitization via sanitize_file_name() happens on line 147 β€” after the display. The display itself is safe due to esc_html(), but the order is confusing.

8. $_POST['force_refresh'] (HIGH) β€” Low Risk
At api-functions.php:1014, this is compared strictly to the string 'true', so it can only ever be a boolean. No injection vector. However, this runs inside a WP AJAX handler β€” verify nonce checks exist upstream.

Many $_GET reads with sanitize_text_field() β€” False Positive
Lines in class-cr-rest-api.php and class-cr-business-rest-api.php that do sanitize_text_field($_GET['registry_id']) are already properly sanitized. The scanner flags the raw $_GET access but ignores the wrapping sanitization.

---

Easy wins
curl_exec() flagged as shell execution β€” The regex is likely matching /\b(exec|shell_exec|system|passthru)\s*\(/. Just add a negative lookbehind for curl_:


/\b(?<!curl_)(exec|shell_exec|system|passthru)\s*\(/
CURLOPT_POST flagged as superglobal manipulation β€” The regex is probably matching POST in any context. If the rule targets $_POST, ensure the pattern anchors on the $_ prefix:


/\$_(POST|GET|REQUEST|SERVER)\s*\[/
That alone would eliminate ~25 false positives from this report.

Sanitized reads flagged as unsanitized β€” This is the biggest noise source. A single-line lookahead can catch the most common WordPress pattern where the $_GET/$_POST access is wrapped in a sanitization call:


/(?<!sanitize_text_field\()(?<!sanitize_file_name\()(?<!esc_html\()(?<!esc_attr\()(?<!wp_verify_nonce\()(?<!absint\()\$_(GET|POST|REQUEST)\s*\[/
Or invert it β€” match the line, then exclude if the $_GET[...] appears as an argument to a known sanitizer on the same line. A two-pass approach is cleaner:

Match $_GET['foo']
Skip if the match is inside sanitize_text_field(, absint(, wp_verify_nonce(, etc.
isset($_GET[...]) / isset($_POST[...]) flagged β€” isset() checks don't read the value. These should be excluded entirely:


/(?<!isset\()\$_(GET|POST|REQUEST)\[/
Harder but doable
Dynamic include with hardcoded paths β€” The scanner can't easily tell that $path comes from a static array vs user input. But you could reduce noise by checking if the variable was assigned from a literal array within N lines above. Alternatively, skip flagging when the require is inside a file_exists() guard on the same variable β€” that pattern almost always indicates a bootstrap loader, not user-controlled inclusion.

N+1 detection β€” The heuristic "meta call inside loop" needs to actually verify there is a loop. If the scanner is matching get_user_meta anywhere near a foreach/for/while, it should confirm the meta call is lexically inside the loop body (between { and }), not just nearby by line count.

Not really fixable with regex
Admin capability check β€” Determining whether a current_user_can() check exists somewhere in the call chain before an add_action('admin_notices', ...) requires control-flow analysis. A regex scanner can't do this reliably. Best approach: downgrade this to INFO severity, or maintain a whitelist of hooks that are inherently admin-only (admin_notices, admin_init, etc.).

Summary of impact
Fix Effort False positives eliminated
curl_exec negative lookbehind 1 line 8
Anchor superglobal on $_ prefix 1 line ~17
Skip isset() wrapping 1 line ~10
Skip known sanitizer wrapping Medium ~20
Loop verification for N+1 Medium 2
Whitelist admin-only hooks Low 1
The first three fixes alone would cut this report from 99 findings to roughly 40, with almost no loss of true positives.
68 changes: 68 additions & 0 deletions PROJECT/3-COMPLETED/MAGIC-STRINGS-OBSERVABILITY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# Magic Strings Detector - Observability Enhancement

**Created:** 2026-03-24
**Completed:** 2026-03-24
**Status:** Completed
**Shipped In:** 2.2.9
**Selected Path:** B (Timing + Quality + State)

## Summary

Implemented Path B observability for the Magic String Detector after completing a technical spike that removed a stale-registry fallback defect.

The detector now reports per-pattern timing and quality metrics in verbose/profile text output and emits cumulative `magic_string_metrics` in JSON output.

## Implementation

- Completed the technical spike that fixed the stale-registry fallback parser in `dist/lib/pattern-loader.sh` and guarded empty search patterns in `dist/bin/check-performance.sh`.
- Added cumulative Magic String Detector observability counters in `dist/bin/check-performance.sh`.
- Added per-pattern runtime metrics to `process_aggregated_pattern()`:
- raw matches
- extracted strings
- unique strings
- filtered strings
- violations added
- grep/extract/aggregate timings in milliseconds
- Added lightweight state transition logging for:
- `GREP`
- `EXTRACT`
- `AGGREGATE`
- `COMPLETE`
- Added text output for per-pattern metrics when `--verbose` or `PROFILE=1` is enabled.
- Added root-level JSON output block:

```json
"magic_string_metrics": {
"patterns_processed": 4,
"total_raw_matches": 7,
"total_extracted_strings": 7,
"total_unique_strings": 4,
"total_filtered_strings": 4,
"filtered_below_min_files": 4,
"filtered_below_min_matches": 4,
"total_violations": 0,
"timing_ms": {"grep": 130, "extract": 160, "aggregate": 101}
}
```

## Results

- Verification target: `/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service`
- Text-mode verification completed successfully with the new per-pattern metrics visible in the Magic String Detector section.
- JSON-mode verification completed successfully with `magic_string_metrics` present in the output payload.
- Example text-mode output now includes lines like:
- `Metrics: 4 raw β†’ 4 extracted β†’ 1 unique β†’ 0 violations`
- `Filtered: 1 strings (min_files=1, min_matches=1)`
- `Timing: grep=34ms extract=89ms agg=32ms`

## Lessons Learned

- Fixing the stale-registry fallback first was necessary; otherwise the new observability would have described a broken execution path.
- The existing profiling and debug hooks were sufficient for Path B, so the implementation stayed small and localized.
- There are still unrelated issues outside this task, including a missing validator script for `wp-template-tags-in-loops` and baseline noise from historical paths.

## Related

- `CHANGELOG.md`
- `dist/bin/check-performance.sh`
- `dist/lib/pattern-loader.sh`
30 changes: 23 additions & 7 deletions dist/PATTERN-LIBRARY.json
Original file line number Diff line number Diff line change
@@ -1,27 +1,27 @@
{
"version": "1.0.0",
"generated": "2026-02-07T05:07:35Z",
"generated": "2026-03-24T01:34:38Z",
"summary": {
"total_patterns": 55,
"enabled": 55,
"total_patterns": 56,
"enabled": 56,
"disabled": 0,
"by_severity": {
"CRITICAL": 20,
"HIGH": 17,
"MEDIUM": 13,
"MEDIUM": 14,
"LOW": 4
},
"by_category": {
"performance": 22,"Performance": 5,"duplication": 5,"reliability": 5,"security": 14
"performance": 22,"Performance": 5,"duplication": 5,"reliability": 6,"security": 14
},
"by_pattern_type": {
"php": 44,
"php": 45,
"headless": 6,
"nodejs": 4,
"javascript": 1
},
"mitigation_detection_enabled": 7,
"heuristic_patterns": 17,
"heuristic_patterns": 18,
"definitive_patterns": 38
},
"patterns": [
Expand Down Expand Up @@ -467,6 +467,22 @@
"search_pattern": "\\.then[[:space:]]*\\([^)]+\\)[[:space:]]*;[[:space:]]*$|Promise\\.all[[:space:]]*\\([^)]+\\)\\.then",
"file_patterns": ["*.js", "*.jsx", "*.ts", "*.tsx"]
},
{
"id": "nonstandard-wordpress-translation-alias",
"version": "1.0.0",
"enabled": true,
"category": "reliability",
"severity": "MEDIUM",
"title": "Non-standard WordPress translation alias `_()`",
"description": "Detects `_()` calls in WordPress PHP code. WordPress standard translation helpers are `__()`, `esc_html__()`, `esc_attr__()`, and related native APIs. Using `_()` introduces ambiguity, reduces readability, and can cause compatibility or bootstrap fragility in real-world theme/plugin environments.",
"detection_type": "simple",
"pattern_type": "php",
"mitigation_detection": false,
"heuristic": true,
"file": "nonstandard-wordpress-translation-alias.json",
"search_pattern": "(^|[^[:alnum:]_])_[[:space:]]*\\(",
"file_patterns": ["*.php"]
},
{
"id": "nopaging-true",
"version": "1.0.0",
Expand Down
Loading
Loading