Commit 0b209d4

Delqhi and v0-agent authored
feat(reliability+network): SR-174 — network-state pre-click gate
Adds three new modules + extends runner_policy with PROVIDER_NETWORK_TUNING.

- survey/network/beacon_filter.py: regex-based filter for analytics beacons (Google Analytics, DoubleClick, Segment, Mixpanel, etc.). Configurable, extensible, with negative tests against survey-tracking URLs to prevent over-matching (e.g. 'survey-analytics-provider.com' must NOT match).
- survey/network/cdp_network_tracker.py: Playwright-CDPSession-based network monitor. Subscribes to Network.requestWillBeSent / responseReceived / loadingFailed / loadingFinished. Per-page-scoped, detaches cleanly on close (try/finally + async context manager). Beacons filtered out of pending count. Snapshot is monotonic-clock based, immutable, safe to take during traffic.
- survey/reliability/network_gate.py: async wait_for_network_quiet() that polls the tracker on a 10 ms cadence until pending_count <= max_pending AND last_response_age >= quiet_ms. On timeout, emits a 'network_never_quiet' event (NOT a hang) and returns force_proceed=True; the caller decides escalation. Strict caps: max_wait_ms is a hard upper bound, so no infinite waits are possible.
- survey/runner_policy.py: PROVIDER_NETWORK_TUNING table with profiles for pollfish (strict), cint (relaxed for beacons), lucid (tight), qualtrics (medium). Default fallback. get_network_tuning() is a pure function.

Tests (45 cases, all green on Python 3.13):
- test_beacon_filter.py (8 cases): default beacon match, survey-tracking preservation, custom-pattern injection, regex-compile-once perf.
- test_cdp_network_tracker.py (17 cases): full CDP event lifecycle, beacon-exclusion, finish + fail event paths, monotonic ordering, context-manager attach/detach, idempotent close, race-condition detach-during-event, snapshot-immutability.
- test_network_gate.py (20 cases): immediate-pass when already quiet, wait-for-pending-to-finish, beacon-doesn't-block, timeout + force-proceed semantics, EventEmitter integration, provider-tuning end-to-end (pollfish vs cint vs lucid behavior demonstrated).

Latency (micro-benchmark, in-process FakePage):
  best-case (already quiet) : p50=0.00ms p95=0.02ms
  after 1 request settles   : p50=30.78ms p95=30.93ms
  (quiet_ms=10, single 20ms request -> ~30ms total = correct)

Path-doctrine: 100% inside survey-cli/survey/. No new top-level dirs.

Scope intentionally NOT included (separate work):
- safe_executor.py integration: deferred until SR-169 lands (composes DOM-stability + network-quiet behind one stability.py facade). This PR ships the gate as a primitive, ready to be wired in.
- proxy hot-rotation telemetry: out of scope.

Refs: #179 (SR-174), depends-on #175 (SR-169) for full integration.

Co-authored-by: v0-agent <v0-agent@stealth-runner.local>
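The gate semantics described in the commit message (poll until `pending_count <= max_pending` and `last_response_age >= quiet_ms`, otherwise give up at `max_wait_ms` and report `force_proceed=True`) can be sketched roughly as follows. This is a minimal illustration, not the code shipped in `network_gate.py`, which is not shown on this page. `Snapshot`, `GateResult`, `FakeTracker`, `poll_ms`, and the `on_event` callback are hypothetical names introduced only for this example.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical immutable view of the tracker state."""
    pending_count: int
    last_response_age_ms: float


@dataclass(frozen=True)
class GateResult:
    """Hypothetical return value of the gate."""
    quiet: bool
    force_proceed: bool
    waited_ms: float


class FakeTracker:
    """Stand-in tracker: a single request that settles after `settle_ms`."""

    def __init__(self, settle_ms: float) -> None:
        self._start = time.monotonic()
        self._settle_s = settle_ms / 1000.0

    def snapshot(self) -> Snapshot:
        elapsed = time.monotonic() - self._start
        if elapsed < self._settle_s:
            return Snapshot(pending_count=1, last_response_age_ms=0.0)
        return Snapshot(0, (elapsed - self._settle_s) * 1000.0)


async def wait_for_network_quiet(
    tracker: FakeTracker,
    *,
    quiet_ms: float = 10.0,
    max_pending: int = 0,
    max_wait_ms: float = 2000.0,
    poll_ms: float = 10.0,  # assumption: mirrors the 10 ms cadence in the message
    on_event: Optional[Callable[[str], None]] = None,
) -> GateResult:
    start = time.monotonic()
    while True:
        snap = tracker.snapshot()
        waited_ms = (time.monotonic() - start) * 1000.0
        # Quiet means: few enough pending requests AND a long enough silence.
        if snap.pending_count <= max_pending and snap.last_response_age_ms >= quiet_ms:
            return GateResult(quiet=True, force_proceed=False, waited_ms=waited_ms)
        # Hard cap: report the timeout and let the caller decide what to do.
        if waited_ms >= max_wait_ms:
            if on_event is not None:
                on_event("network_never_quiet")
            return GateResult(quiet=False, force_proceed=True, waited_ms=waited_ms)
        await asyncio.sleep(poll_ms / 1000.0)
```

Because the timeout branch returns a result instead of raising, a caller can treat `force_proceed` as a signal to log and continue, or to escalate, without wrapping the call in exception handling.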
1 parent 0a7d965 commit 0b209d4

9 files changed

Lines changed: 1595 additions & 21 deletions


Lines changed: 29 additions & 21 deletions
@@ -1,38 +1,46 @@
 """
-╔══════════════════════════════════════════════════════════════════════════════╗
-║ ║
-║ STEALTH-RUNNER — Network Module (SR-151) ║
-║ ║
-╠══════════════════════════════════════════════════════════════════════════════╣
-║ ║
-║ Package marker and public exports for the network module. ║
-║ ║
-║ EXPORTS: ║
-║ ──────── ║
-║ ProxyEntry - Dataclass representing a single proxy ║
-║ ProxyPool - Thread-safe pool manager with score-based selection ║
-║ get_proxy_pool - Singleton getter for global pool instance ║
-║ score - Calculate IP quality score ║
-║ persist_event - Log proxy event to JSONL ║
-║ is_cold - Check if score is below cold threshold ║
-║ ║
-╚══════════════════════════════════════════════════════════════════════════════╝
+Stealth-Runner — Network Module.

-Closes #151
+SR-151: Proxy pool, IP-quality scoring.
+SR-174: Per-page CDP network tracker + beacon filter (pre-click gate input).
+
+Exports:
+    Proxy pool (SR-151):
+        ProxyEntry, ProxyPool, get_proxy_pool
+        score, persist_event, is_cold, load_events, aggregate_stats
+
+    Pre-click gate inputs (SR-174):
+        BeaconFilter, DEFAULT_BEACON_PATTERNS, get_default_filter, is_beacon
+        CdpNetworkTracker, NetworkActivity
 """

 from .proxy_pool import ProxyEntry, ProxyPool, get_proxy_pool
 from .ip_quality import score, persist_event, is_cold, load_events, aggregate_stats
+from .beacon_filter import (
+    BeaconFilter,
+    DEFAULT_BEACON_PATTERNS,
+    get_default_filter,
+    is_beacon,
+)
+from .cdp_network_tracker import CdpNetworkTracker, NetworkActivity

 __all__ = [
-    # proxy_pool.py exports
+    # proxy_pool.py exports (SR-151)
     "ProxyEntry",
     "ProxyPool",
     "get_proxy_pool",
-    # ip_quality.py exports
+    # ip_quality.py exports (SR-151)
     "score",
     "persist_event",
     "is_cold",
     "load_events",
     "aggregate_stats",
+    # beacon_filter.py exports (SR-174)
+    "BeaconFilter",
+    "DEFAULT_BEACON_PATTERNS",
+    "get_default_filter",
+    "is_beacon",
+    # cdp_network_tracker.py exports (SR-174)
+    "CdpNetworkTracker",
+    "NetworkActivity",
 ]
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+"""
+Beacon filter — URL classification for non-essential network requests.
+
+SR-174: Pre-Click Network Gate.
+
+Analytics pings, telemetry beacons, and similar fire-and-forget requests must
+NOT count toward "pending request" totals — otherwise chatty providers like Cint
+would never satisfy a `pending == 0` gate.
+
+Design notes:
+- Patterns are anchored to specific known-analytics signatures, not greedy
+  substring matches. The regex ``analytics`` alone would erroneously match
+  ``survey-analytics-provider.com`` and break survey progress-tracking.
+- All patterns are precompiled at module import (no per-request compile cost).
+- The filter is conservative: when in doubt, count the request. False
+  positives (beacons treated as real) only slow us down. False negatives
+  (real requests treated as beacons) silently break the gate.
+"""
+
+from __future__ import annotations
+
+import re
+from typing import Iterable, Pattern
+
+# Public default patterns. Order does not matter (any match -> beacon).
+# Each pattern is anchored to a domain or path fragment that we have observed
+# in production logs across pollfish/cint/lucid/qualtrics traffic.
+DEFAULT_BEACON_PATTERNS: tuple[str, ...] = (
+    # Major analytics providers (full domain match).
+    r"^https?://(?:[a-z0-9-]+\.)*google-analytics\.com/",
+    r"^https?://(?:[a-z0-9-]+\.)*googletagmanager\.com/",
+    r"^https?://(?:[a-z0-9-]+\.)*doubleclick\.net/",
+    r"^https?://(?:[a-z0-9-]+\.)*facebook\.com/tr[/?]",
+    r"^https?://(?:[a-z0-9-]+\.)*hotjar\.com/",
+    r"^https?://(?:[a-z0-9-]+\.)*mixpanel\.com/",
+    r"^https?://(?:[a-z0-9-]+\.)*segment\.(?:io|com)/",
+    r"^https?://(?:[a-z0-9-]+\.)*amplitude\.com/",
+    r"^https?://(?:[a-z0-9-]+\.)*sentry\.io/api/[0-9]+/(?:envelope|store)",
+    r"^https?://(?:[a-z0-9-]+\.)*bugsnag\.com/",
+    r"^https?://(?:[a-z0-9-]+\.)*newrelic\.com/",
+    # Path fragments that are conventionally beacons regardless of host.
+    # Anchored to "/beacon", "/telemetry", "/_/log" etc. — must be a path
+    # segment, not a substring.
+    r"(?:^|/)beacon(?:/|$|\?)",
+    r"(?:^|/)telemetry(?:/|$|\?)",
+    r"(?:^|/)_/log(?:/|$|\?)",
+    r"(?:^|/)collect(?:/|$|\?)",  # GA4 endpoint
+    r"(?:^|/)gtag/js",
+    # Pixel-style trackers and click pixels with utm/track/click query keys.
+    r"\.gif\?(?:[a-z0-9_=&%-]*&)*(?:utm|track|click)",
+)
+
+
+class BeaconFilter:
+    """Classify request URLs as 'beacon' (analytics) or 'real' (foreground).
+
+    Patterns are compiled once at construction. The instance is thread-safe
+    because compiled regex objects are immutable.
+
+    Args:
+        patterns: Iterable of regex pattern strings. If omitted, uses
+            ``DEFAULT_BEACON_PATTERNS``. Patterns are matched case-insensitively.
+        extra: Additional patterns to extend the default set. Used by
+            provider-specific configuration without losing the defaults.
+    """
+
+    __slots__ = ("_compiled",)
+
+    def __init__(
+        self,
+        patterns: Iterable[str] | None = None,
+        *,
+        extra: Iterable[str] | None = None,
+    ) -> None:
+        base = tuple(patterns) if patterns is not None else DEFAULT_BEACON_PATTERNS
+        if extra:
+            base = base + tuple(extra)
+        self._compiled: tuple[Pattern[str], ...] = tuple(
+            re.compile(p, re.IGNORECASE) for p in base
+        )
+
+    def is_beacon(self, url: str) -> bool:
+        """Return True if ``url`` matches any beacon pattern.
+
+        Empty/None-like URLs are treated as non-beacon to avoid silently
+        absorbing malformed events.
+        """
+        if not url:
+            return False
+        for pat in self._compiled:
+            if pat.search(url):
+                return True
+        return False
+
+
+# Module-level singleton for callers that don't need custom patterns.
+_default_filter: BeaconFilter | None = None
+
+
+def get_default_filter() -> BeaconFilter:
+    """Return a shared :class:`BeaconFilter` using ``DEFAULT_BEACON_PATTERNS``.
+
+    Constructed lazily on first call. Subsequent calls return the same
+    instance (compiled patterns are immutable and reusable).
+    """
+    global _default_filter
+    if _default_filter is None:
+        _default_filter = BeaconFilter()
+    return _default_filter
+
+
+def is_beacon(url: str) -> bool:
+    """Convenience: classify ``url`` using the default beacon filter."""
+    return get_default_filter().is_beacon(url)
+
+
+__all__ = [
+    "BeaconFilter",
+    "DEFAULT_BEACON_PATTERNS",
+    "get_default_filter",
+    "is_beacon",
+]
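To illustrate the anchoring behaviour the docstring describes, here is a small standalone sketch that copies four of the patterns from `DEFAULT_BEACON_PATTERNS` and runs them against sample URLs. The sample URLs and the names `PATTERNS`, `COMPILED`, `classify`, and `SAMPLES` are made up for this example and are not part of the commit. The sketch compiles at import time for brevity, whereas the diff's `BeaconFilter` compiles at construction and builds the default instance lazily on first use.

```python
import re

# Subset of DEFAULT_BEACON_PATTERNS, copied verbatim from the diff.
PATTERNS = (
    r"^https?://(?:[a-z0-9-]+\.)*google-analytics\.com/",
    r"^https?://(?:[a-z0-9-]+\.)*segment\.(?:io|com)/",
    r"(?:^|/)beacon(?:/|$|\?)",
    r"(?:^|/)collect(?:/|$|\?)",
)
COMPILED = tuple(re.compile(p, re.IGNORECASE) for p in PATTERNS)


def classify(url: str) -> bool:
    """Return True if `url` matches any pattern; empty URLs are non-beacons."""
    return bool(url) and any(p.search(url) for p in COMPILED)


# Hypothetical sample URLs mapped to the classification we expect.
SAMPLES = {
    "https://www.google-analytics.com/g/collect?v=2": True,   # host match
    "https://api.segment.io/v1/t": True,                      # host match
    "https://cdn.example.com/beacon?id=1": True,              # path segment
    "https://survey-analytics-provider.com/progress": False,  # no over-match
    "https://example.com/beacons/list": False,                # not a whole segment
    "": False,                                                # malformed event
}

for url, expected in SAMPLES.items():
    assert classify(url) is expected, url
```

One consequence worth keeping in mind: the path-fragment patterns are host-agnostic, so a first-party survey endpoint whose path contains a `/collect` or `/beacon` segment would also be classified as a beacon. Callers who hit that case can pass their own `patterns` to `BeaconFilter` instead of relying on the defaults.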
