Skip to content

Commit b5e48d4

Browse files
authored
Merge pull request #120 from Hypercart-Dev-Tools/development
Development to Main
2 parents 8c3466d + 289ad5a commit b5e48d4

File tree

8 files changed

+458
-62
lines changed

8 files changed

+458
-62
lines changed

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1515
- Adds fixture coverage for both the ACF-style Launchpad-shaped case and a standalone baseline `_()` call
1616
- Adds safe fixture coverage proving standard WordPress i18n helpers do not trigger the rule
1717

18+
- Added Path B observability for Magic String Detector aggregated patterns
19+
- Tracks per-pattern phase timings for grep, extraction, and aggregation in `process_aggregated_pattern()`
20+
- Tracks quality metrics for raw matches, extracted strings, unique strings, filtered strings, and emitted violations
21+
- Logs lightweight state transitions (`GREP`, `EXTRACT`, `AGGREGATE`, `COMPLETE`) in debug output
22+
- Shows per-pattern metrics in text output when `--verbose` or `PROFILE=1` is enabled
23+
- Exposes cumulative `magic_string_metrics` in JSON output for downstream reporting
24+
- Verified in text mode and JSON mode against `/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service`
25+
1826
### Documentation
1927

2028
- Added `PROJECT/1-INBOX/PATTERN-PROPOSAL-LAUNCHPAD-CRASH.md`
@@ -24,6 +32,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
2432

2533
### Fixed
2634

35+
- Fixed stale-registry pattern-loader fallback that could make the Magic String Detector appear stuck
36+
- Root cause: stale registry state forced the fallback parser path, pattern extraction failed there, `pattern_search` could become empty, and the first simple Magic String rule then looked like a hang
37+
- Switched the fallback Python heredocs in `dist/lib/pattern-loader.sh` to tab-stripping heredocs so indented closing markers terminate correctly when the registry is stale
38+
- Added a guard in `dist/bin/check-performance.sh` to skip simple rules with empty search patterns and emit an explicit warning instead of broad-matching the entire scan set
39+
- Verified against `/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service` with `--verbose`, profiling, debug tracing, and AI triage flags enabled
40+
- Result: the scan now completes through Magic String Detector and Function Clone Detector; the verification run finished in 35s with `MAGIC_STRING_DETECTOR` profiled at 21311ms
41+
2742
#### Phase 0: Timeout Guards for Unprotected Recursive Grep Calls
2843

2944
- **Wrapped 8 raw `grep -r` file-discovery calls with `run_with_timeout "$MAX_SCAN_TIME"`**

FEEDBACK-CR-SELF-SERVICE.md

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
# WPCC Pattern Library — False Positive Review
2+
**Source:** AI review of creditconnection2-self-service scan
3+
**Date:** 2026-03-23
4+
**Scan findings:** 99 total | **Estimated true positives after fixes:** ~40
5+
6+
---
7+
8+
## Action Items
9+
10+
### ✅ Fix Now — High Confidence, Low Effort
11+
12+
- [ ] **FIX `php-shell-exec-functions.json``exec-call` pattern matches `curl_exec()`**
13+
**Pattern:** `exec[[:space:]]*\(` has no word boundary → matches `curl_exec(`.
14+
**Fix:** Change to `\bexec[[:space:]]*\(` in the `exec-call` sub-pattern.
15+
**File:** `dist/patterns/php-shell-exec-functions.json`
16+
**FPs eliminated:** 8 (all CRITICAL — all were `curl_exec($curl)` calls)
17+
18+
- [ ] **FIX `php-dynamic-include.json` — WP-CLI bootstrap scripts flagged as LFI**
19+
**Finding:** `check-user-meta.php:13` and `test-alternate-registry-id.php:24``$path` is iterated from a hardcoded static array, never user-controlled.
20+
**Fix:** Add `*-user-meta.php`, `*-registry-id.php`, or more broadly `*/scripts/*` / `*/cli/*` to `exclude_files` in the pattern.
21+
**File:** `dist/patterns/php-dynamic-include.json`
22+
**FPs eliminated:** 2 (both CRITICAL)
23+
24+
---
25+
26+
### 🔍 Investigate Before Acting — Diagnosis Uncertain
27+
28+
- [ ] **INVESTIGATE "Direct superglobal manipulation" (~17 findings on `CURLOPT_POST`)**
29+
**Reviewer claim:** `curl_setopt($curl, CURLOPT_POST, true)` is matched as superglobal manipulation.
30+
**Our assessment:** The `spo-002-superglobals` pattern requires `$_` prefix in all branches — `CURLOPT_POST` cannot match it.
31+
**Action:** Re-examine the actual line numbers in the scan log for these 17 findings. Determine which rule is actually firing and why. Do NOT apply the reviewer's suggested fix (`$_` anchoring) — it's already implemented.
32+
**File:** `dist/patterns/spo-002-superglobal-manipulation.json` (likely not the culprit)
33+
34+
- [ ] **INVESTIGATE "Sanitized reads flagged as unsanitized" for `sanitize_text_field($_GET[...])`**
35+
**Finding:** `class-cr-rest-api.php` and `class-cr-business-rest-api.php``sanitize_text_field($_GET['registry_id'])` being flagged.
36+
**Our assessment:** `sanitize_` is already in `exclude_patterns` for `unsanitized-superglobal-read`. This is likely a **multiline case**`$_GET` on its own line while the sanitizer wraps on another.
37+
**Action:** Confirm by inspecting actual flagged lines. If multiline, document as a known structural limitation (grep is line-scoped; lookbehinds won't help here). If same-line, there's a bug in the exclude logic.
38+
**File:** `dist/patterns/unsanitized-superglobal-read.json`
39+
40+
---
41+
42+
### 📋 Deferred — Valid Issues, Structural Effort Required
43+
44+
- [ ] **DEFERRED: Add admin-only hook whitelist for capability check false positives**
45+
**Finding:** `credit-registry-forms.php:48``add_action('admin_notices', ...)` flagged for missing capability check. `admin_notices` only fires for authenticated admin users.
46+
**Reviewer recommendation:** Whitelist inherently-admin-only hooks (`admin_notices`, `admin_init`, `admin_menu`, etc.)
47+
**Our assessment:** Correct diagnosis. Not fixable with regex alone — requires a hook whitelist in the scanner logic. Downgrade severity to INFO as interim.
48+
**Effort:** Low–Medium | **FPs eliminated:** 1 per occurrence
49+
50+
- [ ] **DEFERRED: Strengthen N+1 loop detection to verify lexical containment**
51+
**Finding 1:** `check-user-meta.php:23``get_user_meta()` called sequentially for a single user, not inside a user loop.
52+
**Finding 2:** `class-cr-business-rest-api.php:245` — single `get_user_meta()` re-read after processing.
53+
**Reviewer recommendation:** Confirm the meta call is lexically inside a loop body (`{...}`), not just nearby by line count.
54+
**Our assessment:** The scanner has `is_iterating_over_multiple_objects()` heuristics. These may be gaps in that logic. Review and tighten the "loop containment" check.
55+
**Effort:** Medium | **FPs eliminated:** 2
56+
57+
---
58+
59+
### ✔️ No Action Required — Already Handled or Misdiagnosed
60+
61+
- [x] **SKIP — `isset()` exclusion for superglobal reads**
62+
`isset\(` is already in `exclude_patterns` for `unsanitized-superglobal-read.json`. Reviewer's suggestion is already implemented.
63+
64+
- [x] **SKIP — `$_` prefix anchoring for superglobal manipulation**
65+
All three sub-patterns in `spo-002-superglobals` already require `$_` prefix. The reviewer's suggested fix is already in place.
66+
67+
- [x] **SKIP — Sanitizer negative-lookbehind regex for `unsanitized-superglobal-read`**
68+
The `exclude_patterns` list already handles same-line sanitizer wrapping. The multiline case is a structural grep limitation, not addressable by the proposed regex.
69+
70+
---
71+
72+
## Valid Issues Found (Not FPs — Tracker for Plugin Owner)
73+
74+
| # | File | Line | Issue | Risk |
75+
|---|------|------|-------|------|
76+
| 6 | `admin-test-page.php` | 191 | `$_GET['view_file']` used without `sanitize_file_name()`; `strpos($view_file, '..')` bypass-able via encoding | HIGH |
77+
| 7 | `admin-test-page.php` | 145 | `$_GET['view_dir']` displayed with `esc_html()` before `sanitize_file_name()` on line 147 — safe but misordered | LOW (confusing) |
78+
| 8 | `api-functions.php` | 1014 | `$_POST['force_refresh']` in AJAX handler — strict `=== 'true'` comparison limits injection, but verify nonce upstream | LOW–MEDIUM |
79+
80+
---
81+
82+
## Impact Summary
83+
84+
| Fix | File to Edit | Effort | FPs Eliminated |
85+
|-----|-------------|--------|---------------|
86+
| `\b` word boundary on `exec-call` | `php-shell-exec-functions.json` | 1 line | 8 |
87+
| Add WP-CLI scripts to `exclude_files` | `php-dynamic-include.json` | 2 lines | 2 |
88+
| Investigate superglobal 17-finding cluster | Scan log + `spo-002` | Investigation | Up to ~17 |
89+
| Investigate multiline sanitizer FPs | Scan log + `unsanitized-superglobal-read` | Investigation | Up to ~20 |
90+
| Admin-only hook whitelist | `check-performance.sh` | Medium | 1+ per scan |
91+
| N+1 loop containment tightening | `check-performance.sh` | Medium | 2+ per scan |
Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
Verdict: Mostly False Positives / Scanner Noise
2+
3+
"Shell command execution" (CRITICAL) — False Positive
4+
All 8 findings flag curl_exec($curl). This is PHP's cURL library function, not shell execution. It's the standard way to make HTTP requests in PHP without WordPress's wp_remote_* wrappers. No shell is involved — completely safe.
5+
6+
"Direct superglobal manipulation" on CURLOPT_POST/CURLOPT_POSTFIELDS (HIGH) — False Positive
7+
The scanner is incorrectly matching curl_setopt($curl, CURLOPT_POST, true) as "superglobal manipulation." These are cURL options, not $_POST superglobal writes. This accounts for ~17 of the findings.
8+
9+
"Dynamic PHP include/require" (CRITICAL) — False Positive
10+
Both check-user-meta.php:13 and test-alternate-registry-id.php:24 are WP-CLI test scripts that locate wp-load.php from a hardcoded relative path array. The $path variable is never user-controlled — it's iterated from a static array. No risk.
11+
12+
"N+1 query pattern" (CRITICAL) — False Positive
13+
check-user-meta.php:23 — This is a flat script calling get_user_meta() sequentially for a single user, not inside a loop over users.
14+
class-cr-business-rest-api.php:245 — This is a single get_user_meta() re-read after processing, not an N+1 pattern.
15+
16+
"Admin function missing capability check" (HIGH) — False Positive
17+
credit-registry-forms.php:48 — add_action('admin_notices', ...) is a standard WordPress pattern for showing a dependency notice when a plugin is deactivated. The admin_notices hook itself only fires in the admin panel for authenticated users. The unset($_GET['activate']) on line 51 is also a standard WP pattern to suppress the "Plugin activated" message after forced deactivation.
18+
19+
6. Unsanitized $_GET['view_file'] (HIGH) — Valid Issue
20+
At admin-test-page.php:191, $_GET['view_file'] is used without sanitize_file_name(). The strpos($view_file, '..') === false check on line 193 is a weak directory traversal guard (can be bypassed with encodings). Should use sanitize_file_name() like view_dir does on line 147.
21+
22+
7. Unsanitized $_GET['view_dir'] display before sanitization (HIGH) — Valid Issue
23+
At admin-test-page.php:145, $_GET['view_dir'] is output with esc_html() (safe for XSS), but the sanitization via sanitize_file_name() happens on line 147 — after the display. The display itself is safe due to esc_html(), but the order is confusing.
24+
25+
8. $_POST['force_refresh'] (HIGH) — Low Risk
26+
At api-functions.php:1014, this is compared strictly to the string 'true', so it can only ever be a boolean. No injection vector. However, this runs inside a WP AJAX handler — verify nonce checks exist upstream.
27+
28+
Many $_GET reads with sanitize_text_field() — False Positive
29+
Lines in class-cr-rest-api.php and class-cr-business-rest-api.php that do sanitize_text_field($_GET['registry_id']) are already properly sanitized. The scanner flags the raw $_GET access but ignores the wrapping sanitization.
30+
31+
---
32+
33+
Easy wins
34+
curl_exec() flagged as shell execution — The regex is likely matching /\b(exec|shell_exec|system|passthru)\s*\(/. Just add a negative lookbehind for curl_:
35+
36+
37+
/\b(?<!curl_)(exec|shell_exec|system|passthru)\s*\(/
38+
CURLOPT_POST flagged as superglobal manipulation — The regex is probably matching POST in any context. If the rule targets $_POST, ensure the pattern anchors on the $_ prefix:
39+
40+
41+
/\$_(POST|GET|REQUEST|SERVER)\s*\[/
42+
That alone would eliminate ~25 false positives from this report.
43+
44+
Sanitized reads flagged as unsanitized — This is the biggest noise source. A single-line lookahead can catch the most common WordPress pattern where the $_GET/$_POST access is wrapped in a sanitization call:
45+
46+
47+
/(?<!sanitize_text_field\()(?<!sanitize_file_name\()(?<!esc_html\()(?<!esc_attr\()(?<!wp_verify_nonce\()(?<!absint\()\$_(GET|POST|REQUEST)\s*\[/
48+
Or invert it — match the line, then exclude if the $_GET[...] appears as an argument to a known sanitizer on the same line. A two-pass approach is cleaner:
49+
50+
Match $_GET['foo']
51+
Skip if the match is inside sanitize_text_field(, absint(, wp_verify_nonce(, etc.
52+
isset($_GET[...]) / isset($_POST[...]) flagged — isset() checks don't read the value. These should be excluded entirely:
53+
54+
55+
/(?<!isset\()\$_(GET|POST|REQUEST)\[/
56+
Harder but doable
57+
Dynamic include with hardcoded paths — The scanner can't easily tell that $path comes from a static array vs user input. But you could reduce noise by checking if the variable was assigned from a literal array within N lines above. Alternatively, skip flagging when the require is inside a file_exists() guard on the same variable — that pattern almost always indicates a bootstrap loader, not user-controlled inclusion.
58+
59+
N+1 detection — The heuristic "meta call inside loop" needs to actually verify there is a loop. If the scanner is matching get_user_meta anywhere near a foreach/for/while, it should confirm the meta call is lexically inside the loop body (between { and }), not just nearby by line count.
60+
61+
Not really fixable with regex
62+
Admin capability check — Determining whether a current_user_can() check exists somewhere in the call chain before an add_action('admin_notices', ...) requires control-flow analysis. A regex scanner can't do this reliably. Best approach: downgrade this to INFO severity, or maintain a whitelist of hooks that are inherently admin-only (admin_notices, admin_init, etc.).
63+
64+
Summary of impact
65+
Fix Effort False positives eliminated
66+
curl_exec negative lookbehind 1 line 8
67+
Anchor superglobal on $_ prefix 1 line ~17
68+
Skip isset() wrapping 1 line ~10
69+
Skip known sanitizer wrapping Medium ~20
70+
Loop verification for N+1 Medium 2
71+
Whitelist admin-only hooks Low 1
72+
The first three fixes alone would cut this report from 99 findings to roughly 40, with almost no loss of true positives.
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
# Magic Strings Detector - Observability Enhancement
2+
3+
**Created:** 2026-03-24
4+
**Completed:** 2026-03-24
5+
**Status:** Completed
6+
**Shipped In:** 2.2.9
7+
**Selected Path:** B (Timing + Quality + State)
8+
9+
## Summary
10+
11+
Implemented Path B observability for the Magic String Detector after completing a technical spike that removed a stale-registry fallback defect.
12+
13+
The detector now reports per-pattern timing and quality metrics in verbose/profile text output and emits cumulative `magic_string_metrics` in JSON output.
14+
15+
## Implementation
16+
17+
- Completed the technical spike that fixed the stale-registry fallback parser in `dist/lib/pattern-loader.sh` and guarded empty search patterns in `dist/bin/check-performance.sh`.
18+
- Added cumulative Magic String Detector observability counters in `dist/bin/check-performance.sh`.
19+
- Added per-pattern runtime metrics to `process_aggregated_pattern()`:
20+
- raw matches
21+
- extracted strings
22+
- unique strings
23+
- filtered strings
24+
- violations added
25+
- grep/extract/aggregate timings in milliseconds
26+
- Added lightweight state transition logging for:
27+
- `GREP`
28+
- `EXTRACT`
29+
- `AGGREGATE`
30+
- `COMPLETE`
31+
- Added text output for per-pattern metrics when `--verbose` or `PROFILE=1` is enabled.
32+
- Added root-level JSON output block:
33+
34+
```json
35+
"magic_string_metrics": {
36+
"patterns_processed": 4,
37+
"total_raw_matches": 7,
38+
"total_extracted_strings": 7,
39+
"total_unique_strings": 4,
40+
"total_filtered_strings": 4,
41+
"filtered_below_min_files": 4,
42+
"filtered_below_min_matches": 4,
43+
"total_violations": 0,
44+
"timing_ms": {"grep": 130, "extract": 160, "aggregate": 101}
45+
}
46+
```
47+
48+
## Results
49+
50+
- Verification target: `/Users/noelsaw/Documents/GH Repos/creditconnection2-self-service`
51+
- Text-mode verification completed successfully with the new per-pattern metrics visible in the Magic String Detector section.
52+
- JSON-mode verification completed successfully with `magic_string_metrics` present in the output payload.
53+
- Example text-mode output now includes lines like:
54+
- `Metrics: 4 raw → 4 extracted → 1 unique → 0 violations`
55+
- `Filtered: 1 strings (min_files=1, min_matches=1)`
56+
- `Timing: grep=34ms extract=89ms agg=32ms`
57+
58+
## Lessons Learned
59+
60+
- Fixing the stale-registry fallback first was necessary; otherwise the new observability would have described a broken execution path.
61+
- The existing profiling and debug hooks were sufficient for Path B, so the implementation stayed small and localized.
62+
- There are still unrelated issues outside this task, including a missing validator script for `wp-template-tags-in-loops` and baseline noise from historical paths.
63+
64+
## Related
65+
66+
- `CHANGELOG.md`
67+
- `dist/bin/check-performance.sh`
68+
- `dist/lib/pattern-loader.sh`

dist/PATTERN-LIBRARY.json

Lines changed: 23 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,27 +1,27 @@
11
{
22
"version": "1.0.0",
3-
"generated": "2026-02-07T05:07:35Z",
3+
"generated": "2026-03-24T01:34:38Z",
44
"summary": {
5-
"total_patterns": 55,
6-
"enabled": 55,
5+
"total_patterns": 56,
6+
"enabled": 56,
77
"disabled": 0,
88
"by_severity": {
99
"CRITICAL": 20,
1010
"HIGH": 17,
11-
"MEDIUM": 13,
11+
"MEDIUM": 14,
1212
"LOW": 4
1313
},
1414
"by_category": {
15-
"performance": 22,"Performance": 5,"duplication": 5,"reliability": 5,"security": 14
15+
"performance": 22,"Performance": 5,"duplication": 5,"reliability": 6,"security": 14
1616
},
1717
"by_pattern_type": {
18-
"php": 44,
18+
"php": 45,
1919
"headless": 6,
2020
"nodejs": 4,
2121
"javascript": 1
2222
},
2323
"mitigation_detection_enabled": 7,
24-
"heuristic_patterns": 17,
24+
"heuristic_patterns": 18,
2525
"definitive_patterns": 38
2626
},
2727
"patterns": [
@@ -467,6 +467,22 @@
467467
"search_pattern": "\\.then[[:space:]]*\\([^)]+\\)[[:space:]]*;[[:space:]]*$|Promise\\.all[[:space:]]*\\([^)]+\\)\\.then",
468468
"file_patterns": ["*.js", "*.jsx", "*.ts", "*.tsx"]
469469
},
470+
{
471+
"id": "nonstandard-wordpress-translation-alias",
472+
"version": "1.0.0",
473+
"enabled": true,
474+
"category": "reliability",
475+
"severity": "MEDIUM",
476+
"title": "Non-standard WordPress translation alias `_()`",
477+
"description": "Detects `_()` calls in WordPress PHP code. WordPress standard translation helpers are `__()`, `esc_html__()`, `esc_attr__()`, and related native APIs. Using `_()` introduces ambiguity, reduces readability, and can cause compatibility or bootstrap fragility in real-world theme/plugin environments.",
478+
"detection_type": "simple",
479+
"pattern_type": "php",
480+
"mitigation_detection": false,
481+
"heuristic": true,
482+
"file": "nonstandard-wordpress-translation-alias.json",
483+
"search_pattern": "(^|[^[:alnum:]_])_[[:space:]]*\\(",
484+
"file_patterns": ["*.php"]
485+
},
470486
{
471487
"id": "nopaging-true",
472488
"version": "1.0.0",

0 commit comments

Comments
 (0)