Commit 5d25c74

docs: soften the no-duplicate-scans claim to cover the 504 race
In the investigated failure mode (backend stops reading the body mid-upload) no scan is created server-side, so a retry cannot duplicate one. But a gateway timeout can in principle race a request the origin later completes, in which case a retry creates a second scan. That case is benign - the orphaned scan is superseded by the retried one as pending head, the same outcome as running the CLI twice - but the comments, docstring, and changelog should not claim duplicates are impossible.
1 parent 320dcb8 commit 5d25c74

2 files changed

Lines changed: 19 additions & 13 deletions

CHANGELOG.md

Lines changed: 5 additions & 2 deletions
@@ -9,8 +9,11 @@
 request timeouts — up to 3 total attempts with increasing waits (~10s, then ~30s, plus
 jitter). Production gateways occasionally drop an upload mid-request when a backend pod
 stalls and stops reading the body (the client sees a 502 after ~30s); these episodes are
-transient and a retried upload almost always succeeds. Since the server never finished
-reading the request body, no scan was created, so retrying cannot duplicate a scan.
+transient and a retried upload almost always succeeds. In this failure mode the server
+never finished reading the request body, so no scan was created and a retry does not
+duplicate one; in the rare case where a gateway timeout races a request the server later
+completes, the extra scan is benign and superseded by the retried one (as if the CLI had
+run twice).
 Non-transient errors (400/401/403/404/429 and error payloads) are never retried. Each
 retry logs a warning explaining what failed and when the next attempt happens.

socketsecurity/core/__init__.py

Lines changed: 14 additions & 11 deletions
@@ -85,8 +85,10 @@
 # Full scan upload retry policy. Production gateways occasionally drop an upload mid-request
 # (a backend pod stalls and stops reading the body; the client then sees a 502/408 or a reset
 # connection). Those episodes are transient and pod-local: a retried upload routed to another
-# backend almost always succeeds, and because the server never finished reading the request
-# body, no scan was created, so retrying cannot duplicate a scan.
+# backend almost always succeeds. In this failure mode the server never finished reading the
+# request body, so no scan was created and a retry does not duplicate one. (A duplicate is
+# possible only if a gateway timeout races a request the server later completes; that is
+# benign - the retried scan supersedes the orphaned one, same as running the CLI twice.)
 FULL_SCAN_UPLOAD_MAX_ATTEMPTS = 3
 # Wait before retry attempt 2 and attempt 3 respectively (plus a little jitter so a fleet of
 # CI jobs hitting the same episode doesn't retry in lock-step).
@@ -104,10 +106,10 @@
 def _is_transient_full_scan_upload_error(error: Exception) -> bool:
     """Whether a full-scan upload failure is transient and safe to retry.

-    Transient means the failure happened at the gateway/connection level before the server
-    finished reading the request body (so no scan was created server-side): HTTP 502/503/504/408,
-    client-side timeouts, and dropped/reset connections. 4xx client errors (400/401/403/404/429)
-    and success responses carrying an error payload are never retried.
+    Transient means the failure happened at the gateway/connection level, normally before the
+    server finished reading the request body (so no scan was created server-side): HTTP
+    502/503/504/408, client-side timeouts, and dropped/reset connections. 4xx client errors
+    (400/401/403/404/429) and success responses carrying an error payload are never retried.
     """
     if isinstance(error, (APIBadGateway, APIConnectionError, APITimeout)):
         # 502 / connection reset-dropped / request timeout - the SDK raises dedicated classes.
@@ -835,11 +837,12 @@ def create_full_scan(self, files: List[str], params: FullScanParams, base_paths:
         try:
             # Retry transient gateway/timeout failures (502/503/504/408, dropped connections,
             # timeouts) with increasing waits; a stalled backend pod recovers or gets routed
-            # around within minutes, and since it never finished reading the request body no
-            # scan was created, so a retry cannot duplicate one. fullscans.post() rebuilds its
-            # lazy file loaders from the plain paths in upload_files on every call, so simply
-            # calling it again per attempt is safe. The loop must stay inside this try so the
-            # temp .br files (cleaned up in the finally below) outlive every attempt.
+            # around within minutes, and in this failure mode the server never finished reading
+            # the request body, so no scan was created and a retry does not duplicate one (see
+            # the retry-policy comment above FULL_SCAN_UPLOAD_MAX_ATTEMPTS). fullscans.post()
+            # rebuilds its lazy file loaders from the plain paths in upload_files on every call,
+            # so simply calling it again per attempt is safe. The loop must stay inside this try
+            # so the temp .br files (cleaned up in the finally below) outlive every attempt.
             for attempt in range(1, FULL_SCAN_UPLOAD_MAX_ATTEMPTS + 1):
                 try:
                     res = self.sdk.fullscans.post(upload_files, params, use_types=True, use_lazy_loading=True, max_open_files=50, base_paths=base_paths)
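
For readers who want the overall shape of the policy without the surrounding SDK code, here is a minimal, self-contained sketch of a bounded retry loop with increasing waits and jitter. It is not the code from this commit: `UploadError`, `is_transient`, `upload_with_retry`, and the 2-second jitter cap are hypothetical names and values chosen for illustration. Only the attempt count (3), the approximate waits (~10s, then ~30s), and the transient status set (502/503/504/408, timeouts, dropped connections) come from the diff above.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Statuses the diff describes as transient gateway-level failures.
TRANSIENT_STATUSES = frozenset({408, 502, 503, 504})


class UploadError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status."""

    def __init__(self, status: Optional[int] = None) -> None:
        super().__init__(f"upload failed (status={status})")
        self.status = status


def is_transient(error: Exception) -> bool:
    """Return True for timeouts, dropped connections, and gateway statuses."""
    if isinstance(error, (TimeoutError, ConnectionError)):
        return True
    return isinstance(error, UploadError) and error.status in TRANSIENT_STATUSES


def upload_with_retry(
    upload: Callable[[], T],
    max_attempts: int = 3,
    base_delays: Sequence[float] = (10.0, 30.0),  # waits before attempts 2 and 3
    max_jitter: float = 2.0,  # assumed value; the commit only says "a little jitter"
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call upload() up to max_attempts times, retrying only transient errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except Exception as error:
            # Give up on the last attempt or on any non-transient failure.
            if attempt == max_attempts or not is_transient(error):
                raise
            # Reuse the last delay if there are more attempts than delays.
            base = base_delays[min(attempt - 1, len(base_delays) - 1)]
            sleep(base + jitter(0.0, max_jitter))
    raise AssertionError("unreachable: the loop either returns or raises")
```

Passing `sleep` and `jitter` as parameters keeps the loop testable without real waiting. As the commit message notes, a retry after a gateway timeout can still create a second scan if the origin later completes the first request, so a loop like this assumes the operation tolerates an occasional duplicate.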
