You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The AI triage script (`dist/bin/ai-triage.py`) generates **hardcoded recommendations and narrative** that don't validate against actual findings. This causes hallucinations where recommendations are made for issues that don't exist in the scan results.
**Remember:** The HTML converter reads the JSON file at the time it runs. If you regenerate HTML before updating the JSON with AI triage data, the HTML will not include the triage information.
736
736
737
+
### AI Triage Hallucinations: Recommendations for Non-Existent Issues
738
+
739
+
**Symptom:** AI triage recommendations mention issues (e.g., "Remove debugger statements") that don't appear in the actual findings list.
740
+
741
+
**Root Cause:** The AI triage script was generating hardcoded recommendations that didn't validate against actual findings. This has been fixed in v1.1+.
742
+
743
+
**How to Detect:**
744
+
1. Review the recommendations in the HTML report
745
+
2. Search the findings list for the recommended issue
746
+
3. If no findings match the recommendation → it's a hallucination
747
+
748
+
**Example (Fixed in v1.1):**
749
+
```
750
+
❌ OLD (v1.0): Recommendation: "Remove debugger; statements from shipped JS"
751
+
But: Zero findings for debugger statements in the scan
752
+
753
+
✅ NEW (v1.1): Only recommendations for issues actually found in triaged findings
754
+
```
755
+
756
+
**Prevention (v1.1+):**
757
+
- AI triage now builds recommendations dynamically from actual findings
758
+
- Each recommendation is validated against the triaged findings set
759
+
- Validation step logs: `✅ Validation passed: N recommendations match actual findings`
760
+
- If no actionable findings exist, a generic guidance recommendation is provided instead
761
+
762
+
**For AI Agents (v1.1+):**
763
+
- The script automatically validates recommendations before writing JSON
764
+
- Look for this log message: `[AI Triage] ✅ Validation passed: N recommendations match actual findings`
765
+
- If you see warnings about mismatched recommendations, investigate the triaged findings
766
+
- Never manually add hardcoded recommendations; always derive them from actual findings
767
+
768
+
**If You Encounter Hallucinations:**
769
+
1. Check the AI triage script version: `grep "version.*:" dist/bin/ai-triage.py | head -1`
770
+
2. If version < 1.1, update the script from the main branch
# Minimal executive summary tailored to what we observed in the sample.
330
+
# Build dynamic narrative and recommendations from actual findings
331
331
narrative_parts= []
332
332
narrative_parts.append(
333
333
"This Phase 2 triage pass reviews a subset of findings to separate likely true issues from policy/heuristic noise (especially in vendored/minified assets)."
narrative_parts.append("No findings were triaged in this pass.")
359
+
335
360
narrative_parts.append(
336
-
"Key confirmed items in the reviewed set include shipped `debugger;` statements and missing explicit HTTP timeouts. Several REST and admin capability findings appear to be heuristic/policy-driven and may be acceptable when endpoints are not list-based or when capabilities are enforced by WordPress menu APIs."
337
-
)
338
-
narrative_parts.append(
339
-
"A large portion of findings come from bundled/minified JavaScript or third-party libraries; these are difficult to validate from pattern matching alone and are therefore marked as Needs Review unless a clear mitigation is visible (e.g., regex escaping before `new RegExp()`)."
361
+
"Findings in vendored/minified code are difficult to validate from pattern matching alone and are marked as Needs Review unless a clear mitigation is visible."
340
362
)
341
363
342
-
recommendations= [
343
-
'Remove/strip `debugger;` statements from shipped JS assets (or upgrade/patch the vendored library that contains them).',
344
-
'Add explicit `timeout` arguments to `wp_remote_get/wp_remote_post/wp_remote_request` calls where missing.',
345
-
'For REST endpoints, confirm which routes return potentially large collections; add `per_page`/limit constraints there (action/single-item routes may not need pagination).',
346
-
'For superglobal reads, ensure values are validated/sanitized before use and that nonce/capability checks exist on the request path.',
347
-
]
364
+
# Build recommendations only for issues actually found
365
+
recommendations= []
366
+
367
+
# Recommendation templates mapped to finding IDs
368
+
recommendation_map= {
369
+
'spo-001-debug-code': 'Remove/strip `debugger;` statements from shipped JS assets (or upgrade/patch the vendored library that contains them).',
370
+
'http-no-timeout': 'Add explicit `timeout` arguments to `wp_remote_get/wp_remote_post/wp_remote_request` calls where missing.',
371
+
'rest-no-pagination': 'For REST endpoints, confirm which routes return potentially large collections; add `per_page`/limit constraints there (action/single-item routes may not need pagination).',
372
+
'spo-002-superglobals': 'For superglobal reads, ensure values are validated/sanitized before use and that nonce/capability checks exist on the request path.',
373
+
'unsanitized-superglobal-read': 'Sanitize all superglobal reads ($_GET, $_POST, $_REQUEST) before use in sensitive operations.',
374
+
'spo-004-missing-cap-check': 'Add capability checks to admin functions and hooks using current_user_can().',
375
+
'wpdb-query-no-prepare': 'Use $wpdb->prepare() for all database queries with external input.',
376
+
}
377
+
378
+
# Only add recommendations for issues that were actually found
0 commit comments