Skip to content

Drop CI/local-run noise from production Sentry (DISCOVERY-TOOL-SCRIPT-17/-13/-12/-D)#216

Merged
anonpran merged 2 commits into
mainfrom
fix/sentry-ci-noise-filter
Jun 26, 2026
Merged

Drop CI/local-run noise from production Sentry (DISCOVERY-TOOL-SCRIPT-17/-13/-12/-D)#216
anonpran merged 2 commits into
mainfrom
fix/sentry-ci-noise-filter

Conversation

@anonpran

@anonpran anonpran commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

The discovery tool's CI suite runs the real script against a loopback mock gateway (--domain http://127.0.0.1:<ephemeral port>) on clean GitHub-hosted runners. With no AI tools installed, every scan finds zero tools and the previously log-only report_to_sentry() paths self-report to the production Sentry project — no_tools_found / send-failure / timeout / signal events with zero customer impact.

Verified in Sentry (last 24h): 346 such events across 4 issuesDISCOVERY-TOOL-SCRIPT-D (222), -17 (71), -13 (49), -12 (4). Every one carries a loopback domain tag; users_impacted: 0.

This adds a before_send-style guard at the single Sentry choke point (report_to_sentry() — there is no Sentry SDK in this zero-dependency tool) that drops an event whose report target is a loopback host.

Changes

  • scripts/coding_discovery_tools/utils.py: new _event_domain_is_loopback() / _ip_is_loopback() / _is_ci_or_local_event() helpers + an early-return guard at the top of report_to_sentry() (before the dedup set / per-run counters / circuit breaker, so dropped events have zero side effects).

Drop signal: loopback domain (false-positive-free)

The host is normalized through the stdlib IP parsers (socket.inet_aton / inet_pton), so every spelling is caught: IPv4 dotted quad + shorthand (127.0.0.1, 127.1), ::1 and its expanded form, IPv4-mapped ::ffff:127.0.0.1, plus localhost / 0.0.0.0. A real install always points at the customer gateway URL, never loopback; an FQDN like 127.example.com is not a valid IP literal so it is never matched.

Why only this signal: an earlier draft also keyed on a CI env var and on a runner/runneradmin system_user. Both were dropped — the env var broke report_to_sentry()'s own dedup/cap/breaker tests (which run under CI), and the account-name check would false-positive on a real customer named runner/runneradmin (per @greptile-apps), suppressing even their priority/terminal reports. The loopback domain already covers 100% of the 346 observed events, so it is the complete and FP-free signal. Bare-context extract/detect emits are out of scope for this in-function filter; the fix for those is source-side (don't point the production DSN at CI runs).

The guard never raises and fails open (defaults to keeping the event), so a malformed context can never suppress a genuine customer error.

Test plan

  • Matcher verified: loopback spellings (127.0.0.1, 127.1, ::1, [0:0:0:0:0:0:0:1], ::ffff:127.0.0.1, localhost) match; real gateways/FQDNs (api.getunbound.ai, 127.example.com, 127.0.0.1.evil.com, 10.0.0.5, 8.8.8.8) do not. A runner user with a real domain is not suppressed.
  • Existing report_to_sentry() transport tests (TestSentryRunGuards, TestSentryDebugLogging, TestSentryPriorityBypassesCap) still pass under GITHUB_ACTIONS=true.
  • Full suite (python -m unittest discover, CI's runner): all test jobs green on macOS + Windows × py 3.9/3.11/3.12.
  • elite-pr-reviewer: APPROVE. /security-review + Cursor Security + Greptile: addressed; no high-confidence findings.

Notes

  • Staging sync: the no_tools_found instrumentation (PR [WEB-4956] Add no_tools_found Sentry event to diagnose empty discovery scans #215) currently lives on staging, not main, so this guard must ride the normal main→staging merge-commit sync to fully silence -17. The guard is phase-agnostic and lands cleanly on both.
  • Once deployed + synced, resolve DISCOVERY-TOOL-SCRIPT-17 / -13 / -12 / -D in Sentry (intentionally not using a Fixes <ISSUE> auto-close keyword, since events can keep arriving until staging carries the guard).

🤖 Generated with Claude Code


Note

Low Risk
Single choke-point filter with fail-open semantics; only suppresses events whose report target is loopback, not production customer domains.

Overview
Stops discovery-tool CI and local test runs from polluting the production Sentry project. CI exercises the real script against a loopback mock gateway (--domain http://127.0.0.1:…), which previously triggered the same report_to_sentry() paths as customer installs.

In utils.py, report_to_sentry() now returns early when the event context’s domain host is loopback—before dedup, per-run caps, or the circuit breaker, so skipped events have no side effects. New helpers normalize the domain string (no urllib, for Zscaler compatibility) and detect loopback via localhost, 0.0.0.0, and stdlib IP parsing for IPv4 shorthand, ::1, and IPv4-mapped forms. The check fails open (malformed context still reports). Real customer gateways are unchanged.

Reviewed by Cursor Bugbot for commit 2030019. Bugbot is set up for automated code reviews on this repo. Configure here.

Greptile Summary

This PR reduces production Sentry noise from local discovery-tool runs. The main changes are:

  • Adds loopback host detection for discovery report domains.
  • Filters loopback-domain events at report_to_sentry().
  • Keeps malformed or missing context on the normal reporting path.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Important Files Changed

Filename Overview
scripts/coding_discovery_tools/utils.py Adds loopback-domain filtering around Sentry reporting while keeping non-loopback events reportable.

Reviews (7): Last reviewed commit: "Drop CI/local-run noise from production ..." | Re-trigger Greptile

The discovery CI suite runs the real script against a loopback mock
gateway on clean GitHub-hosted runners, so every scan finds zero tools
and the previously log-only Sentry paths self-report to the production
project: no_tools_found / send-failure / timeout / signal events with
zero customer impact (DISCOVERY-TOOL-SCRIPT-17 / -13 / -12 / -D; 346
events observed in 24h).

Add a before_send-style guard at the top of report_to_sentry() that
drops an event carrying a CI fingerprint, using three signals, each
~0-false-positive on customer machines:
  - a CI environment marker (GITHUB_ACTIONS / CI) -- ground truth, and
    the only one that also covers the bare-context extract/detect emits;
  - the report target (domain) is a loopback host (127.0.0.0/8 dotted
    quad / localhost / ::1 / 0.0.0.0) -- a real install always points at
    the customer gateway URL, never loopback;
  - the OS account (system_user) is a GitHub-hosted-runner account
    (runner / runneradmin).

The guard never raises and defaults to keeping the event, so a malformed
context can never suppress a genuine customer error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@anonpran anonpran requested a review from a team June 26, 2026 05:33
Comment thread scripts/coding_discovery_tools/utils.py Outdated
Comment thread scripts/coding_discovery_tools/utils.py Outdated
@vigneshsubbiah16

Copy link
Copy Markdown
Contributor

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks, Greptile.

No exploitable vulnerabilities, secret exposure, or unsafe parsing/injection paths in this diff. The change is an observability filter at the Sentry choke point that fails open on errors and only suppresses events matching CI/local fingerprints from trusted context fields (domain, system_user, process env).

Previously acknowledged (not re-flagged)

  • Blanket drop when CI / GITHUB_ACTIONS is set — intentional signal [WEB-2984] Accepts, verifies api key and send report to backend #1 in the PR; author notes customer MDM/LaunchAgent installs are not expected to set these vars, and this is the only way to filter bare-context extract/detect emits.
  • Drop for system_user runner / runneradmin — accepted tradeoff per PR description: vanishingly rare on customer machines, and only suppresses error telemetry (not customer discovery data).
  • Alternate loopback spellings (127.1, expanded IPv6 forms) may bypass the domain check — incomplete noise filtering at worst leaks more CI test events to Sentry; not a customer-data or exploit path, and standard loopback forms used by this repo's CI are covered.

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 5df9fc28 · 2026-06-26T05:41Z

@anonpran anonpran force-pushed the fix/sentry-ci-noise-filter branch from 5df9fc2 to acfacee Compare June 26, 2026 05:42
Comment thread scripts/coding_discovery_tools/utils.py Outdated

@vigneshsubbiah16 vigneshsubbiah16 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

✅ Security consensus: no issues found. (reviewers: Cursor, Claude, Semgrep, Gitleaks)


🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head acfaceeb · 2026-06-26T05:51Z

@anonpran anonpran force-pushed the fix/sentry-ci-noise-filter branch from acfacee to e93bf06 Compare June 26, 2026 05:52
@anonpran

Copy link
Copy Markdown
Contributor Author

Thanks @greptile-apps — addressed both findings:

1. "Generic CI Env Suppresses Telemetry" (P1, on 5df9fc2) — fixed by removing the GITHUB_ACTIONS/CI env signal entirely. You were right: an env-based unconditional drop suppressed every report under CI, including bare extract/detect errors and the priority no_tools_found diagnostic — and it broke report_to_sentry()'s own dedup/cap/breaker tests (which run under CI). The filter is now keyed purely on the event (loopback domain + runner system_user). Bare-context emits are explicitly out of scope for the in-function filter; the right fix for those is source-side (don't point the production DSN at CI), noted in the PR body.

2. "Loopback aliases bypass" (P1, on acfacee) — fixed in e93bf06 exactly as you suggested: the host is now normalized through the stdlib IP parsers (socket.inet_aton / inet_pton) instead of a literal dotted-quad string check. This catches http://127.1, the expanded http://[0:0:0:0:0:0:0:1], and IPv4-mapped http://[::ffff:127.0.0.1] — all added as test cases — while still rejecting FQDNs like 127.example.com / 127.0.0.1.evil.com and real gateways (10.0.0.5, 8.8.8.8, 2001:db8::1) so customer telemetry is never suppressed.

@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Both fixes look correct. Here's a quick trace through the key paths:

Fix 1 (CI env removed): _is_ci_or_local_event now only touches ctx.get("domain") and ctx.get("system_user") — no os.environ anywhere in the filter. Dedup/cap/breaker tests that run under CI are unaffected.

Fix 2 (loopback aliases): The _ip_is_loopback logic handles all the tricky cases cleanly:

  • http://127.1inet_aton("127.1")[0] == 127
  • http://[0:0:0:0:0:0:0:1]:8080 → brackets stripped → inet_pton(AF_INET6, "0:0:0:0:0:0:0:1")b"\x00"*15 + b"\x01"
  • http://[::ffff:127.0.0.1]:9000 → IPv4-mapped prefix check → packed[12] == 127
  • bare "::1"count(":") is 2, so the elif <= 1 branch is skipped, host stays intact → inet_pton matches ✓
  • 127.example.com → both parsers reject it (not a valid IP literal) → False ✓

The 0.0.0.0 and localhost string checks before _ip_is_loopback are also correct since inet_aton("0.0.0.0")[0] would be 0, not 127, so without the explicit string check it would slip through.

The AttributeError guard in the inet_pton except clause correctly handles Windows where socket.inet_pton may not exist. Test coverage for the new edge cases (127.1, expanded IPv6, IPv4-mapped) matches exactly what was added.

This LGTM. Both P1 findings are fully addressed.

Comment thread scripts/coding_discovery_tools/utils.py Outdated

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using high effort and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Want reviews to match your repository better? Bugbot Learning can learn team-specific rules from PR activity. A team admin can enable Learning in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e93bf06. Configure here.

Comment thread scripts/coding_discovery_tools/utils.py Outdated
Comment thread scripts/coding_discovery_tools/utils.py
@anonpran anonpran force-pushed the fix/sentry-ci-noise-filter branch from e93bf06 to 2da1687 Compare June 26, 2026 05:59
@vigneshsubbiah16

Copy link
Copy Markdown
Contributor

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks, Greptile.

No exploitable vulnerabilities, secret exposure, injection surface, or auth/crypto weaknesses. The change only suppresses outbound Sentry telemetry using trusted run context (domain from --domain, system_user from the local OS account), normalizes loopback hosts via stdlib IP parsers, and fails open so parse errors never drop genuine customer reports.

Previously acknowledged (not re-flagged)

  • Generic CI / GITHUB_ACTIONS env-based suppression — Removed after maintainer fix; filter is keyed on event context only so CI transport tests and real-gateway CI runs are not blanket-dropped (Greptile P1, addressed in follow-up commits).
  • Loopback alias bypass (127.1, expanded ::1, IPv4-mapped forms) — Fixed via socket.inet_aton / inet_pton normalization with test coverage (Greptile P1, confirmed LGTM).
  • Username-only runner-account drop (runner / runneradmin) — Accepted by design in the PR: vanishingly rare on customer machines; worst case is loss of Unbound error telemetry, not customer report delivery or data exposure (Greptile P1).

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 2da1687e · 2026-06-26T06:01Z

Comment thread scripts/coding_discovery_tools/utils.py Outdated
@anonpran anonpran force-pushed the fix/sentry-ci-noise-filter branch from 2da1687 to 4077ede Compare June 26, 2026 06:13
Comment thread scripts/coding_discovery_tools/utils.py Outdated
@vigneshsubbiah16

Copy link
Copy Markdown
Contributor

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks, Greptile.

✅ Security consensus: no issues found. (reviewers: Cursor, Claude, Semgrep, Gitleaks, Greptile)

Previously acknowledged (not re-flagged)

  • Generic CI env suppresses all telemetry — Addressed: maintainer removed GITHUB_ACTIONS/CI env checks; filter is keyed only on event context (domain, system_user) so transport tests under CI still exercise the HTTP path (confirmed by Greptile on follow-up).
  • Loopback alias bypass (127.1, expanded IPv6, IPv4-mapped) — Addressed: host normalization now uses stdlib socket.inet_aton / inet_pton with test coverage (commit e93bf06; Greptile LGTM).
  • Username-only drop on runner / runneradmin — Accepted by design: intentional CI fingerprint documented in the PR; domain and system_user are trusted operator-controlled inputs, and the fail-open guard never suppresses on malformed context.

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 4077edeb · 2026-06-26T06:21Z

@anonpran anonpran force-pushed the fix/sentry-ci-noise-filter branch from 4077ede to 43c870d Compare June 26, 2026 06:30
@anonpran

Copy link
Copy Markdown
Contributor Author

Thanks @greptile-apps — tightened exactly as recommended in 43c870d. Removed the runner/runneradmin system_user signal entirely; the filter now keys solely on the loopback domain.

This resolves both points together:

  1. A real customer whose OS account is literally named runner/runneradmin is no longer suppressed.
  2. Because the loopback-domain check is false-positive-free (a real install always points at the customer gateway URL, never 127.x), no genuine event can be dropped — including the priority no_tools_found, send-failure, and terminal abort/timeout reports the early-return precedes.

It also costs zero coverage: all 346 observed noise events already carry a loopback domain, so the user-account signal was catching nothing the domain check didn't. Net result is simpler and strictly safer.

@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

That's the right call. Removing the system_user signal is strictly better — the loopback domain check is already sufficient to cover all 346 observed noise events, and it carries zero false-positive risk since a real production install can never have a loopback domain. Dropping the user-account check eliminates the (admittedly edge-case) suppression risk for accounts named runner/runneradmin at no coverage cost.

Both original concerns are resolved. LGTM.

The discovery CI suite runs the real script against a loopback mock
gateway on clean GitHub-hosted runners, so every scan finds zero tools
and the previously log-only Sentry paths self-report to the production
project: no_tools_found / send-failure / timeout / signal events with
zero customer impact (DISCOVERY-TOOL-SCRIPT-17 / -13 / -12 / -D; 346
events in 24h, all with a loopback report domain).

Add a guard at the top of report_to_sentry() that drops an event whose
report target (domain) is a loopback host. The host is normalized via the
stdlib socket parsers, so every spelling is caught (IPv4 dotted quad +
shorthand, ::1 and its expanded form, IPv4-mapped ::ffff:127.0.0.1),
plus localhost, *.localhost, and 0.0.0.0. A real install always points at
the customer gateway URL, never loopback, so a genuine customer event is
never suppressed.

Keyed on the event domain, not a CI env var or the OS account name: an
env var would also drop report_to_sentry()'s own dedup/cap/breaker tests
(which run under CI), and the runner/runneradmin account name would
false-positive on a real customer using that service account. The guard
never raises and defaults to keeping the event.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@anonpran

Copy link
Copy Markdown
Contributor Author

Cleaned up the comments (diff is now comment-free apart from three one-line docstrings) and addressed the open review threads in 2030019:

  • Username-only drop (r3479491309 / Greptile): already resolved in 43c870d — the runner/runneradmin system_user signal was removed entirely. The filter keys only on the loopback domain now, so a customer running under one of those service accounts is never suppressed.
  • Hash fragments break loopback matching (Cursor, low): fixed — host extraction now also strips the URL fragment (...split("#", 1)[0]), so http://127.0.0.1#… is matched.
  • Reserved .localhost subdomains (Cursor, low): fixed — now matches localhost and any *.localhost host (RFC 6761 reserved → always loopback, FP-safe), so http://app.localhost is matched.

Verified: loopback spellings (incl. #-fragment and *.localhost) match; real gateways/FQDNs (api.getunbound.ai, localhost.evil.com, 10.0.0.5) do not; existing report_to_sentry() transport tests pass under GITHUB_ACTIONS=true.

@anonpran anonpran force-pushed the fix/sentry-ci-noise-filter branch from 43c870d to 2030019 Compare June 26, 2026 06:41
@vigneshsubbiah16

Copy link
Copy Markdown
Contributor

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

✅ Security consensus: no issues found. (reviewers: Cursor, Claude, Semgrep, Gitleaks)

Previously acknowledged (not re-flagged)

  • CI env-based suppression (GITHUB_ACTIONS/CI) — Removed entirely; maintainer noted it dropped all telemetry under CI (including real gateway domains) and broke report_to_sentry() transport tests; filter is now event-scoped only.
  • Loopback alias bypass (127.1, expanded IPv6, IPv4-mapped) — Fixed via socket.inet_aton / inet_pton normalization; Greptile confirmed coverage and no customer-domain false positives.
  • Username-only suppression (runner/runneradmin) — Removed; maintainer/Greptile agreed loopback domain alone covers observed CI noise with zero FP risk for real installs.

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 43c870d2 · 2026-06-26T06:41Z

@vigneshsubbiah16

Copy link
Copy Markdown
Contributor

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

✅ Security consensus: no issues found. (reviewers: Cursor, Claude, Semgrep, Gitleaks)

Previously acknowledged (not re-flagged)

  • Generic CI env suppresses telemetryCI/GITHUB_ACTIONS gating removed; filter keys only on loopback domain (@anonpran).
  • Loopback alias bypass — Host normalized via socket.inet_aton/inet_pton to catch shorthand IPv4, expanded IPv6, and IPv4-mapped forms (@anonpran, confirmed by Greptile).
  • Username-only drop (runner/runneradmin)system_user signal removed entirely; loopback domain alone covers observed noise with zero FP risk (@anonpran, confirmed by Greptile).
  • Hash fragments break loopback matching — Host extraction now strips URL fragments (#…) (@anonpran, 2030019).
  • Reserved .localhost subdomainslocalhost and *.localhost now matched as loopback (@anonpran, 2030019).

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 20300198 · 2026-06-26T06:50Z

@anonpran anonpran merged commit e85dd50 into main Jun 26, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants