WEB-4679: declare a scan manifest so the backend can prune uninstalled tools by anonpran · Pull Request #187 · websentry-ai/coding-discovery-tool

anonpran · 2026-06-08T13:08:19Z

Summary

The discovery agent never told the backend which tools were still installed, so an uninstalled tool's row persisted on the dashboard forever. The completed scan event now carries a manifest of the (home_user, tool_name) pairs successfully scanned this run, plus covered_home_users, so the backend can reconcile by set-difference and soft-delete what's gone.
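The backend side of this reconciliation lives in the paired ai-gateway-data PR and is not shown here. As a rough sketch of the set-difference idea only, assuming hypothetical names (`rows_to_soft_delete`, `existing_rows`) and assuming that rows for users outside `covered_home_users` are left untouched:

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (home_user, tool_name)


def rows_to_soft_delete(
    existing_rows: Iterable[Pair],
    manifest: Iterable[Pair],
    covered_home_users: Iterable[str],
) -> Set[Pair]:
    """Return stored pairs that are in scope for this scan but absent from the manifest."""
    seen = set(manifest)
    covered = set(covered_home_users)
    # Only users the agent enumerated this run are in scope; rows that belong
    # to any other user are never candidates for pruning.
    in_scope = {row for row in existing_rows if row[0] in covered}
    return in_scope - seen


# Hypothetical stored rows: carol was not enumerated this run, so her row is kept.
existing = {
    ("alice", "Cursor"),
    ("alice", "Augment"),
    ("bob", "Cursor"),
    ("carol", "Cursor"),
}
manifest = {("alice", "Cursor")}
stale = rows_to_soft_delete(existing, manifest, covered_home_users=["alice", "bob"])
```

Here bob has no manifest entries but is still covered, which is why `covered_home_users` has to travel alongside the manifest: without it, a user who removed their last tool would never be reconciled.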

Changes

ai_tools_discovery.py: accumulate scanned_manifest on the send-success and dedup hash-match paths only — a tool that errored on read is deliberately left out so it's never mistaken for uninstalled. Pass manifest + covered_home_users (= all enumerated OS users, so a user who removed their last tool is still in scope) to the completed send_scan_event.
utils.py: send_scan_event gains optional manifest / covered_home_users, inserted into the payload only when present (backward compatible).

Stdlib-only; the accumulation is pure in-memory and cannot raise (runs on customer machines).
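The backward-compatible payload shaping can be illustrated with a minimal sketch. The function name `build_scan_event_payload` and the `scan_event` key are assumptions for illustration; only the `manifest` and `covered_home_users` field names come from this PR.

```python
from typing import Any, Dict, List, Optional


def build_scan_event_payload(
    event: str,
    manifest: Optional[List[Dict[str, str]]] = None,
    covered_home_users: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an event payload, adding the new fields only when they are supplied."""
    payload: Dict[str, Any] = {"scan_event": event}  # key name is an assumption
    # Compare against None rather than truthiness: an empty manifest is a
    # meaningful statement ("nothing was seen"), not the same as "not provided".
    if manifest is not None:
        payload["manifest"] = manifest
    if covered_home_users is not None:
        payload["covered_home_users"] = covered_home_users
    return payload


legacy = build_scan_event_payload("completed")
with_manifest = build_scan_event_payload(
    "completed", manifest=[], covered_home_users=["alice"]
)
```

A legacy call produces a payload with neither key, so an older backend sees exactly what it saw before.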

Cross-repo

Pairs with the ai-gateway-data WEB-4679 PR (reconcile + removed_at soft-delete). Forward/backward compatible: an old backend simply ignores the new fields, so the deploy order does not matter.

Test plan

tests/test_scan_completed_manifest.py (8 tests): manifest carried on the completed event; success + hash-match included, errored read excluded; covered_home_users includes a zero-tool user; legacy call omits both keys; non-completed events carry no manifest.

🤖 Generated with Claude Code

Note

Medium Risk
Changes backend reconciliation semantics: incorrect manifest entries could soft-delete live tools or leave stale rows, though the PR deliberately errs toward keeping tools on ambiguous failures.

Overview
WEB-4679 lets the backend stop showing tools that were uninstalled by reconciling against what the agent actually saw this run. The completed scan event now includes a manifest of (home_user, tool_name) pairs and covered_home_users (all enumerated OS users), so the gateway can set-diff and soft-delete rows outside that scope.

Manifest entries are driven by detection/presence, not whether config read or upload succeeded. Each per-user pass records the tool before filtering/extraction; Copilot/Augment ownership skips discard the optimistic entry. Detector exceptions add the tool via a new failures set on detect_all_tools; device-wide process_single_tool failures add the tool for every covered user. Upload/hash-match outcomes no longer gate inclusion.
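A simplified sketch of the presence-first rule described above, not the code in ai_tools_discovery.py. The helper `record_user_pass` and its `is_owned_by` / `extract` callbacks are hypothetical stand-ins for the real ownership check and extraction steps:

```python
from typing import Callable, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (home_user, tool_name)


def record_user_pass(
    user: str,
    detected_tools: Iterable[str],
    is_owned_by: Callable[[str, str], bool],
    extract: Callable[[str, str], None],
    scanned_manifest: Set[Pair],
) -> None:
    """Record presence first, then attempt extraction for each detected tool."""
    for tool_name in detected_tools:
        pair = (user, tool_name)
        # Optimistic entry: the tool was detected, so it counts as present.
        scanned_manifest.add(pair)
        if not is_owned_by(user, tool_name):
            # Ownership skip: this user does not own the tool after all.
            scanned_manifest.discard(pair)
            continue
        try:
            extract(user, tool_name)
        except Exception:
            # A read error must not drop the entry; the tool is still installed.
            pass


def flaky_extract(user: str, tool_name: str) -> None:
    if tool_name == "Cursor":
        raise OSError("config unreadable")


manifest: Set[Pair] = set()
record_user_pass(
    "alice",
    ["Cursor", "Augment", "Claude Code"],
    is_owned_by=lambda user, tool: tool != "Augment",
    extract=flaky_extract,
    scanned_manifest=manifest,
)
```

In this run the unreadable tool stays in the manifest, while the tool that fails the ownership check is removed again.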

send_scan_event accepts optional manifest and covered_home_users and only adds them when not None (legacy callers unchanged). tests/test_scan_completed_manifest.py covers payload shaping, CLI completed events, and in-process presence semantics (read errors, upload failures, detector errors).

Reviewed by Cursor Bugbot for commit e7e7a9e.

Greptile Summary

This PR adds scan manifests so the backend can prune removed discovery tools. The main changes are:

Tracks detected (home_user, tool_name) pairs during discovery.
Sends manifest and covered_home_users on completed scan events.
Keeps detected tools in the manifest across read, upload, and detector error paths.
Adds tests for completed-event payload shape and manifest membership rules.

Confidence Score: 4/5

This should be fixed before merging.

Manifest entries can be assigned to users who did not detect the tool.
Detector failures can protect the wrong tool name when one detector owns multiple concrete rows.
Both cases can make backend pruning keep stale rows or remove live rows.

scripts/coding_discovery_tools/ai_tools_discovery.py

Important Files Changed

Filename	Overview
scripts/coding_discovery_tools/ai_tools_discovery.py	Builds and sends the completed scan manifest, but some membership paths can still produce incorrect prune inputs.
scripts/coding_discovery_tools/utils.py	Adds optional manifest fields to scan event payloads while preserving legacy calls.
tests/test_scan_completed_manifest.py	Covers the new payload fields and several presence-based manifest paths.

Comments Outside Diff (2)

scripts/coding_discovery_tools/ai_tools_discovery.py, line 2646-2649 (link)

Upload failure silently drops tool from manifest — may trigger false soft-delete

When send_report_to_backend returns (False, retryable=True), the report is queued for the next run but the tool is never appended to scanned_manifest. The backend then receives a covered_home_users entry for the user but no manifest entry for this tool, which it may interpret as "tool uninstalled" and issue a soft-delete — even though the tool is still present; the upload just failed transiently.

The PR description states the exclusion criterion as "a tool that errored on read", and the comment on line 2644 echoes "Successfully read and uploaded." Upload failures are not read failures, so by the stated design intent the tool should still be manifested. The fix is to record the tool in scanned_manifest as soon as filter_tool_projects_by_user and generate_single_tool_report succeed (i.e., after the report is built), independent of whether the HTTP upload succeeds.
scripts/coding_discovery_tools/ai_tools_discovery.py, line 2712-2736 (link)

No metrics instrumented for the new manifest/pruning flow

The existing sentry_metrics_payload (sent via send_discovery_metrics) tracks tool_count and user_count, but adds nothing for the new manifest. Without manifest-specific metrics it will be impossible to diagnose backend pruning anomalies in production. Concrete metrics to add to the metadata block:
- manifest_size: len(scanned_manifest) — how many (tool, user) pairs were recorded this run.
- manifest_excluded_reads: tools that errored on read (the key correctness signal for the errored-read exclusion logic).

Reviews (6): Last reviewed commit: "Merge origin/staging into nanda/web-4679..."

Greptile also left 2 inline comments on this PR.

…d tools

The discovery agent never told the backend which tools were still installed, so an uninstalled tool's row persisted on the dashboard forever. The completed scan event now carries a manifest of the (home_user, tool_name) pairs successfully scanned this run plus covered_home_users, letting the backend reconcile by set-difference and soft-delete what's gone.

- ai_tools_discovery.py: accumulate scanned_manifest on the send-success and dedup hash-match paths only (a tool that errored on read is left out so it is never mistaken for uninstalled); pass manifest + covered_home_users (= all enumerated OS users, so a user who removed their last tool is still in scope) to the completed send_scan_event.
- utils.py: send_scan_event gains optional manifest / covered_home_users, inserted into the payload only when present (backward compatible).

Stdlib-only; the accumulation is pure in-memory and cannot raise. Pairs with the ai-gateway-data WEB-4679 reconcile change (forward/backward compatible: an old backend simply ignores the new fields).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…covery-never-prunes-uninstalled-tools-removed-tools

Addresses Greptile P2: accumulate the manifest as a set of tuples (a pair can never be double-recorded) and serialize to a sorted list of dicts at the send site. Functionally equivalent today (the success and hash-match branches are mutually exclusive, one entry per (tool, user)), but removes the latent duplicate risk and makes the output deterministic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
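The set-then-serialize approach in this commit might look like the following sketch. The helper name `serialize_manifest` and the dict keys `home_user` / `tool_name` are assumptions based on the pair naming used in this PR, not confirmed wire-format details.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (home_user, tool_name)


def serialize_manifest(scanned_manifest: Set[Pair]) -> List[Dict[str, str]]:
    """Convert the in-memory set into a deterministic, JSON-friendly list."""
    # sorted() orders tuples lexicographically: by user first, then by tool.
    return [
        {"home_user": home_user, "tool_name": tool_name}
        for home_user, tool_name in sorted(scanned_manifest)
    ]


scanned_manifest: Set[Pair] = set()
scanned_manifest.add(("bob", "Cursor"))
scanned_manifest.add(("alice", "Cursor"))
scanned_manifest.add(("alice", "Augment"))
scanned_manifest.add(("alice", "Cursor"))  # duplicate add is a no-op on a set

serialized = serialize_manifest(scanned_manifest)
```

Using a set makes double-recording impossible by construction, and sorting at the send site keeps payloads stable across runs, which helps when diffing or hashing them.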

…tighten comments

Greptile flag: a tool whose READ succeeded but whose upload failed transiently was dropped from the manifest, so the backend could mistake a network blip for an uninstall and prune a live tool (enforce mode). Record it in the send-failure branch too: the manifest tracks what was SEEN, not what uploaded. Adds a regression test (read-success + upload-failure stays in the manifest). Also condense the WEB-4679 comments to concise one-liners.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ve tool

A tool whose read errors without a scan_event=failed (the generic per-user except, the per-tool except, or a PermissionError whose failed-event send itself fails) leaves the manifest possibly missing an installed tool; the backend would then set-diff it as uninstalled and wrongly prune it. Track scanned_manifest_complete and send manifest=None (backend treats as legacy = no prune) when the run wasn't fully read, deferring cleanup to a clean run. Adds a test forcing a generic (non-Permission) read error -> manifest=None.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… success

A read/extraction error no longer drops a live tool from the manifest (presence is recorded before extraction) and never fail-closes the whole manifest to None, so one tool's failure can't block pruning every other tool on the device. A detector that errors is kept in the manifest (presence unknown, not an uninstall). Removes the global scanned_manifest_complete fail-close. Tests rewritten to the new contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.


Reviewed by Cursor Bugbot for commit bcb482e.

vigneshsubbiah16 · 2026-06-26T18:00:50Z

🛡️ Automated Security Review (consensus)

1 finding — 0 high-confidence, 1 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

Findings

🟡 TRIAGE: Device-wide error over-expands manifest across all users

scripts/coding_discovery_tools/ai_tools_discovery.py:2696

Impact: On a device-wide processing exception, the handler adds (u, tool_name) for every enumerated user, even if only one user had the tool detected — stale inventory rows may persist and backend pruning is blocked for unaffected users.

Fix: On device-wide failure, add manifest entries only for users where the tool was actually detected this run (e.g., track per-user detection before the outer try, or re-use per-user failure sets), rather than iterating all_users.

Flagged by: Cursor
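One way to read the suggested fix, as a sketch under assumptions rather than the change actually made in this PR: keep a per-user record of what was detected (here a hypothetical `detected_by_user` mapping) and consult it when a device-wide failure occurs, instead of iterating over every enumerated user.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (home_user, tool_name)


def entries_for_device_wide_failure(
    tool_name: str,
    detected_by_user: Dict[str, Set[str]],
) -> Set[Pair]:
    """Protect the failed tool only for users whose detection pass found it."""
    return {
        (user, tool_name)
        for user, detected in detected_by_user.items()
        if tool_name in detected
    }


# Hypothetical detection results: bob is enumerated but has no tools.
detected_by_user = {
    "alice": {"Cursor", "Augment"},
    "bob": set(),
    "carol": {"Cursor"},
}
protected = entries_for_device_wide_failure("Cursor", detected_by_user)
```

With this scoping, bob's stale row for the tool remains prunable even though extraction failed device-wide.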

Notes

Claude, Semgrep, Gitleaks: no traditional security issues (no secrets, injection, auth bypass, or new sensitive data exposure).
Greptile upload-failure / dedup comments: superseded by the current diff — manifest is recorded at detection time (before read/upload) and scanned_manifest is a set; covered by test_upload_failure_keeps_tool_in_manifest.

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head bcb482e2 · 2026-06-26T18:00Z

…talled-tools-removed-tools

Conflicts were purely additive; union both sides:
- utils.send_scan_event: keep staging's system_user param alongside WEB-4679's manifest + covered_home_users (and their docstrings); body already adds all three.
- ai_tools_discovery completed event: pass system_user + manifest + covered_home_users.

Manifest-from-presence fix auto-merged cleanly. Manifest tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vigneshsubbiah16 · 2026-06-26T18:20:57Z

🛡️ Automated Security Review (consensus)

1 finding — 0 high-confidence, 1 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

Findings

🟡 TRIAGE: Device-wide `process_single_tool` failure expands manifest to all users

scripts/coding_discovery_tools/ai_tools_discovery.py:3217

Impact: On a device-wide extraction exception, the manifest adds (user, tool_name) for every entry in covered_home_users, which can block backend pruning for users who no longer have that tool (stale inventory persists on the dashboard).

Fix: Scope manifest entries to users where per-user detection actually found the tool, or send a separate “presence unknown” signal for device-wide failures instead of manifesting the tool for all covered users.

Reviewers: Cursor (Bugbot), Lead

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head e7e7a9e5 · 2026-06-26T18:20Z

greptile-apps · 2026-06-26T18:21:54Z


                    try:
+                        # Record presence before extraction so a read error below can't drop a live tool.
+                        scanned_manifest.add((user_name, tool_name))


Tracks phantom ownership
This adds the tool for every user in all_users, but the tool list was deduped globally by tool name and path before this loop. If Alice still has a user-scoped tool and Bob removed it, Alice's detection can keep one global tool entry, then this line adds (Bob, tool_name) even though Bob did not detect it. Only Copilot CLI and Augment have a later ownership discard, so other user-scoped tools can remain in Bob's manifest and the backend will never prune Bob's stale row.

greptile-apps · 2026-06-26T18:21:55Z

+                )
+            # Detector errored -> presence unknown -> keep it in the manifest (don't treat as uninstalled).
+            for failed_tool_name in user_detect_failures:
+                scanned_manifest.add((user, failed_tool_name))


Uses umbrella names
This records detector failures with detector.tool_name, but some detectors protect rows whose successful tool['name'] values are different. For example, a Copilot detector failure can add GitHub Copilot, while existing rows may be GitHub Copilot (VS Code) or other surface-specific names. The backend set-diffs exact (home_user, tool_name) pairs, so a transient detector error can still prune the real surface row and keep only a phantom umbrella-name entry. The failure path needs to protect the concrete row names that the detector owns, or skip pruning that detector namespace for the user.
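The first remedy mentioned above (protecting the concrete row names a detector owns) could be sketched as follows. The `DETECTOR_ROW_NAMES` mapping and the `protected_pairs` helper are hypothetical; the only row name taken from this thread is `GitHub Copilot (VS Code)`, and a real mapping would need every surface-specific name the detector can emit.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (home_user, tool_name)

# Hypothetical mapping from a detector's umbrella name to the concrete row
# names it can produce on success. Other surfaces would be listed here too.
DETECTOR_ROW_NAMES: Dict[str, List[str]] = {
    "GitHub Copilot": ["GitHub Copilot (VS Code)"],
}


def protected_pairs(
    user: str,
    failed_detector_names: Iterable[str],
    row_names: Dict[str, List[str]] = DETECTOR_ROW_NAMES,
) -> Set[Pair]:
    """Expand failed detectors into the exact pairs the backend set-diffs on."""
    pairs: Set[Pair] = set()
    for detector_name in failed_detector_names:
        # Detectors without a mapping are assumed to emit their own name.
        for row_name in row_names.get(detector_name, [detector_name]):
            pairs.add((user, row_name))
    return pairs


pairs = protected_pairs("alice", ["GitHub Copilot", "Cursor"])
```

Because the backend compares exact `(home_user, tool_name)` pairs, the expansion has to produce the same strings that a successful detection would have reported.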

anonpran and others added 2 commits June 8, 2026 18:35

Merge remote-tracking branch 'origin/staging' into nanda/web-4679-dis…

6402e54

…covery-never-prunes-uninstalled-tools-removed-tools

anonpran requested a review from a team June 8, 2026 13:08

greptile-apps Bot reviewed Jun 8, 2026


Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

anonpran and others added 4 commits June 8, 2026 19:08

cursor Bot reviewed Jun 26, 2026


Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py

greptile-apps Bot reviewed Jun 26, 2026

anonpran wants to merge 7 commits into staging from nanda/web-4679-discovery-never-prunes-uninstalled-tools-removed-tools


2 participants
