Commit 48cd891

fix(relay): classify script quarantine by failure type

A single cooldown duration is too coarse for Apps Script deployment failures. Quota exhaustion and account-level authorization failures recover on a much longer cadence than transient Google edge or Apps Script backend failures. Treating both classes the same either probes exhausted deployments too aggressively or removes transiently unhealthy deployments for longer than necessary.

Relay failure handling now classifies script failures into two explicit quarantine classes. HTTP 429, HTTP 403, and response bodies that match quota or service-invocation limit text are treated as hard quota/account failures and quarantined for 24 hours. Google or Apps Script transient 5xx responses are treated as temporary relay failures and use the existing short cooldown window.

The transient class is deliberately narrow. Generic upstream 5xx bodies such as a destination-origin bad gateway do not quarantine a script ID by themselves; the body must look like a Google, Apps Script, GFE, backend, service-unavailable, temporary, or timeout failure. This avoids punishing healthy deployments for ordinary origin-side errors that Apps Script relayed correctly.

The same classifier is used across the direct relay path, h1 fallback path, tunnel single-operation path, and tunnel batch path. Quota-like errors returned inside the Apps Script JSON envelope still force the hard quarantine path even when the outer HTTP status is 200.

The English and Persian guides now describe auto-quarantine as two failure classes instead of a single ten-minute blacklist.

Unit coverage verifies hard quota/account classification, transient Google-edge classification, ordinary upstream 5xx pass-through, and the quarantine durations for both classes.
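
The classification rules in this message can be sketched as a small standalone snippet. This is a minimal sketch under stated assumptions, not the code from `src/domain_fronter.rs`: the marker lists in `looks_like_quota` and `looks_like_transient` are reduced, assumed subsets of the real ones, and `classify_envelope_error` is a hypothetical helper name for the JSON-envelope path, where the outer HTTP status is 200.

```rust
use std::time::Duration;

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum ScriptQuarantine {
    Hard,
    Transient,
}

impl ScriptQuarantine {
    /// 24 hours for quota/account failures, 10 minutes for transient ones.
    fn cooldown(self) -> Duration {
        match self {
            ScriptQuarantine::Hard => Duration::from_secs(24 * 60 * 60),
            ScriptQuarantine::Transient => Duration::from_secs(600),
        }
    }
}

/// Assumed, reduced marker list; the real quota heuristic matches more phrases.
fn looks_like_quota(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    lower.contains("quota")
        || lower.contains("too many times")
        || lower.contains("limit exceeded")
}

/// Assumed, reduced marker list for Google-side transient failures.
fn looks_like_transient(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    lower.contains("google")
        || lower.contains("apps script")
        || lower.contains("backend error")
        || lower.contains("service unavailable")
        || lower.contains("timeout")
}

/// Outer HTTP response: status first, then quota text, then narrow 5xx match.
fn classify_script_failure(status: u16, body: &str) -> Option<ScriptQuarantine> {
    if status == 429 || status == 403 {
        return Some(ScriptQuarantine::Hard);
    }
    if looks_like_quota(body) {
        return Some(ScriptQuarantine::Hard);
    }
    if matches!(status, 500 | 502 | 503 | 504) && looks_like_transient(body) {
        return Some(ScriptQuarantine::Transient);
    }
    None
}

/// Hypothetical helper: an error inside the JSON envelope (outer status 200)
/// can only ever escalate to the hard class.
fn classify_envelope_error(msg: &str) -> Option<ScriptQuarantine> {
    if looks_like_quota(msg) {
        Some(ScriptQuarantine::Hard)
    } else {
        None
    }
}
```

Under these assumptions, a plain `502` with body `bad gateway` yields `None`, so the script ID stays in rotation, while the same status with `Google backend error` yields `Transient`. Returning `Option<ScriptQuarantine>` rather than `bool` lets the caller take the cooldown from the class instead of from a single shared constant.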
1 parent 6c839f7 commit 48cd891

3 files changed

Lines changed: 101 additions & 24 deletions

docs/guide.fa.md

Lines changed: 1 addition & 1 deletion
@@ -350,7 +350,7 @@ sni_hosts = ["www.google.com", "drive.google.com", "docs.google.com"]
 | Connection pool | TTL ۴۵ ثانیه، حداکثر ۲۰ idle |
 | رمزگشایی gzip | اتوماتیک |
 | چند اسکریپت | چرخش round-robin |
-| Blacklist خودکار | روی خطای 429 / quota، با cooldown ۱۰ دقیقه |
+| قرنطینهٔ خودکار اسکریپت | خطاهای quota/account برای ۲۴ ساعت؛ خطاهای گذرای relay با cooldown کوتاه |
 | کش پاسخ | ۵۰ مگابایت، FIFO + TTL، آگاه از `Cache-Control: max-age`، heuristic برای static asset |
 | Coalescing | GETهای یکسان همزمان یک fetch upstream را به اشتراک می‌گذارند |
 | تونل بازنویسی SNI | مستقیم به لبهٔ گوگل (بدون رله) برای `google.com`، `youtube.com`، `youtu.be`، `youtube-nocookie.com`، `fonts.googleapis.com` — دامنه‌های اضافی از فیلد `hosts` |

docs/guide.md

Lines changed: 1 addition & 1 deletion
@@ -346,7 +346,7 @@ This port focuses on the **`apps_script` mode** — the only one that reliably w
 - [x] Connection pooling (45 s TTL, max 20 idle)
 - [x] Gzip response decoding
 - [x] Multi-script round-robin
-- [x] Auto-blacklist failing scripts on 429 / quota errors (10 min cooldown)
+- [x] Auto-quarantine failing scripts: quota/account failures for 24 h, transient relay failures for a short cooldown
 - [x] Response cache (50 MB, FIFO + TTL, `Cache-Control: max-age` aware, heuristics for static assets)
 - [x] Request coalescing: concurrent identical GETs share one upstream fetch
 - [x] SNI-rewrite tunnels for `google.com`, `youtube.com`, `youtu.be`, `youtube-nocookie.com`, `fonts.googleapis.com`, configurable via `hosts` map

src/domain_fronter.rs

Lines changed: 99 additions & 22 deletions
@@ -471,7 +471,30 @@ impl HostStat {
     }
 }
 
-const BLACKLIST_COOLDOWN_SECS: u64 = 600;
+const TRANSIENT_SCRIPT_COOLDOWN_SECS: u64 = 600;
+const HARD_SCRIPT_QUARANTINE_SECS: u64 = 24 * 60 * 60;
+
+#[derive(Debug, Copy, Clone, PartialEq, Eq)]
+enum ScriptQuarantine {
+    Hard,
+    Transient,
+}
+
+impl ScriptQuarantine {
+    fn cooldown(self) -> Duration {
+        match self {
+            ScriptQuarantine::Hard => Duration::from_secs(HARD_SCRIPT_QUARANTINE_SECS),
+            ScriptQuarantine::Transient => Duration::from_secs(TRANSIENT_SCRIPT_COOLDOWN_SECS),
+        }
+    }
+
+    fn label(self) -> &'static str {
+        match self {
+            ScriptQuarantine::Hard => "hard quota/account quarantine",
+            ScriptQuarantine::Transient => "transient relay cooldown",
+        }
+    }
+}
 
 /// Auto-blacklist defaults are now per-instance fields on `DomainFronter`,
 /// driven by `Config::auto_blacklist_strikes` / `_window_secs` /
@@ -893,11 +916,11 @@ impl DomainFronter {
         picked
     }
 
-    fn blacklist_script(&self, script_id: &str, reason: &str) {
+    fn quarantine_script(&self, script_id: &str, quarantine: ScriptQuarantine, reason: &str) {
         self.blacklist_script_for(
             script_id,
-            Duration::from_secs(BLACKLIST_COOLDOWN_SECS),
-            reason,
+            quarantine.cooldown(),
+            &format!("{}: {}", quarantine.label(), reason),
         );
     }
 
@@ -2547,8 +2570,12 @@ impl DomainFronter {
         .chars()
         .take(200)
         .collect::<String>();
-    if should_blacklist(status, &body_txt) {
-        self.blacklist_script(&script_id, &format!("HTTP {}", status));
+    if let Some(quarantine) = classify_script_failure(status, &body_txt) {
+        self.quarantine_script(
+            &script_id,
+            quarantine,
+            &format!("HTTP {}", status),
+        );
     }
     return Err(FronterError::Relay(format!(
         "Apps Script HTTP {}: {}",
@@ -2558,7 +2585,7 @@ impl DomainFronter {
 return parse_relay_json(&resp_body).map_err(|e| {
     if let FronterError::Relay(ref msg) = e {
         if looks_like_quota_error(msg) {
-            self.blacklist_script(&script_id, msg);
+            self.quarantine_script(&script_id, ScriptQuarantine::Hard, msg);
         }
     }
     e
@@ -2656,8 +2683,12 @@ impl DomainFronter {
         .chars()
         .take(200)
         .collect::<String>();
-    if should_blacklist(status, &body_txt) {
-        self.blacklist_script(&script_id, &format!("HTTP {}", status));
+    if let Some(quarantine) = classify_script_failure(status, &body_txt) {
+        self.quarantine_script(
+            &script_id,
+            quarantine,
+            &format!("HTTP {}", status),
+        );
     }
     return Err(FronterError::Relay(format!(
         "Apps Script HTTP {}: {}",
@@ -2669,7 +2700,11 @@ impl DomainFronter {
 Err(e) => {
     if let FronterError::Relay(ref msg) = e {
         if looks_like_quota_error(msg) {
-            self.blacklist_script(&script_id, msg);
+            self.quarantine_script(
+                &script_id,
+                ScriptQuarantine::Hard,
+                msg,
+            );
         }
     }
     Err(e)
@@ -3066,8 +3101,8 @@ impl DomainFronter {
         .chars()
         .take(200)
         .collect::<String>();
-    if should_blacklist(status, &body_txt) {
-        self.blacklist_script(script_id, &format!("HTTP {}", status));
+    if let Some(quarantine) = classify_script_failure(status, &body_txt) {
+        self.quarantine_script(script_id, quarantine, &format!("HTTP {}", status));
     }
     return Err(FronterError::Relay(format!(
         "tunnel HTTP {}: {}",
@@ -3256,8 +3291,8 @@ impl DomainFronter {
         .chars()
         .take(200)
         .collect::<String>();
-    if should_blacklist(status, &body_txt) {
-        self.blacklist_script(script_id, &format!("HTTP {}", status));
+    if let Some(quarantine) = classify_script_failure(status, &body_txt) {
+        self.quarantine_script(script_id, quarantine, &format!("HTTP {}", status));
     }
     return Err(FronterError::Relay(format!(
         "batch tunnel HTTP {}: {}",
@@ -4987,11 +5022,17 @@ impl StatsSnapshot {
     }
 }
 
-fn should_blacklist(status: u16, body: &str) -> bool {
+fn classify_script_failure(status: u16, body: &str) -> Option<ScriptQuarantine> {
     if status == 429 || status == 403 {
-        return true;
+        return Some(ScriptQuarantine::Hard);
+    }
+    if looks_like_quota_error(body) {
+        return Some(ScriptQuarantine::Hard);
     }
-    looks_like_quota_error(body)
+    if matches!(status, 500 | 502 | 503 | 504) && looks_like_transient_script_error(body) {
+        return Some(ScriptQuarantine::Transient);
+    }
+    None
 }
 
 fn looks_like_quota_error(msg: &str) -> bool {
@@ -5008,6 +5049,20 @@ fn looks_like_quota_error(msg: &str) -> bool {
         || lower.contains("limit exceeded")
 }
 
+fn looks_like_transient_script_error(msg: &str) -> bool {
+    let lower = msg.to_ascii_lowercase();
+    lower.contains("google")
+        || lower.contains("apps script")
+        || lower.contains("script.google")
+        || lower.contains("googleusercontent")
+        || lower.contains("gfe")
+        || lower.contains("backend error")
+        || lower.contains("service unavailable")
+        || lower.contains("temporary")
+        || lower.contains("timeout")
+        || lower.contains("timed out")
+}
+
 fn mask_script_id(id: &str) -> String {
     let n = id.chars().count();
     if n <= 8 {
@@ -6447,18 +6502,40 @@ hello";
 
     #[test]
     fn blacklist_heuristics() {
-        assert!(should_blacklist(429, ""));
-        assert!(should_blacklist(403, "quota"));
-        assert!(should_blacklist(500, "Service invoked too many times per day: urlfetch"));
-        assert!(!should_blacklist(200, ""));
-        assert!(!should_blacklist(502, "bad gateway"));
+        assert_eq!(classify_script_failure(429, ""), Some(ScriptQuarantine::Hard));
+        assert_eq!(
+            classify_script_failure(403, "quota"),
+            Some(ScriptQuarantine::Hard)
+        );
+        assert_eq!(
+            classify_script_failure(500, "Service invoked too many times per day: urlfetch"),
+            Some(ScriptQuarantine::Hard)
+        );
+        assert_eq!(classify_script_failure(502, "bad gateway"), None);
+        assert_eq!(
+            classify_script_failure(502, "Google backend error"),
+            Some(ScriptQuarantine::Transient)
+        );
+        assert_eq!(classify_script_failure(200, ""), None);
         assert!(looks_like_quota_error("Exception: Service invoked too many times per day"));
         assert!(looks_like_quota_error(
             "Exception: Bandbreitenkontingent überschritten: https://example.com. Verringern Sie die Datenübertragungsrate."
         ));
         assert!(!looks_like_quota_error("bad url"));
     }
 
+    #[test]
+    fn script_quarantine_durations_match_failure_class() {
+        assert_eq!(
+            ScriptQuarantine::Hard.cooldown(),
+            Duration::from_secs(HARD_SCRIPT_QUARANTINE_SECS)
+        );
+        assert_eq!(
+            ScriptQuarantine::Transient.cooldown(),
+            Duration::from_secs(TRANSIENT_SCRIPT_COOLDOWN_SECS)
+        );
+    }
+
     #[test]
     fn mask_script_id_hides_middle() {
         assert_eq!(mask_script_id("short"), "***");
