Commit 743ff24

Author: Gereon Elvers
Radboud: treat HTTP 500 as fail-fast not-found, skip retry chain
The Radboud WebDAV server returns HTTP 500 (with empty body) when any intermediate folder on a requested path does not exist, instead of the expected 404. Combined with the dataset H5/FIF fallback chain, which *always* probes derivative paths ("./derivatives/serialised/...") that do not exist on the read-only remote, this caused two ~60-second silent retry chains before the raw .ds download could even start.

Fix:
- treat 500 the same as 404 (FileNotFoundError) in both _propfind and the GET worker; only true transient errors (502/503/504, network errors) still hit the retry path.
- _schedule_download now PROPFINDs the file before queueing the GET, so a missing path raises FileNotFoundError immediately and the dataset class falls through to local creation without waiting.

Verified end-to-end on the VM: Schoffelen sub-A2002 task=rest cold-build now takes 87 s (was 260 s); no retry log lines emitted.
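
The status handling described in the fix can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in `radboud_download.py`: `TransientServerError`, `check_status`, and `fetch_with_retry` are hypothetical names, and the `fetch` callable stands in for the real HTTP request.

```python
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for a retryable server-side failure."""


def check_status(status_code: int, url: str) -> None:
    """Raise for failure statuses; return None otherwise."""
    if status_code in (404, 500):
        # Assumption taken from the commit message: this server answers 500
        # when a parent folder is missing, so 500 is read as "not found".
        raise FileNotFoundError(f"WebDAV {status_code}: {url}")
    if status_code >= 500:
        # Remaining 5xx codes are treated as transient and may be retried.
        raise TransientServerError(f"WebDAV server error {status_code}")


def fetch_with_retry(fetch, url: str, attempts: int = 3, delay: float = 0.0):
    """Call fetch(url) -> (status_code, body); retry only transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            status_code, body = fetch(url)
            check_status(status_code, url)
            return body
        except TransientServerError:
            if attempt == attempts:
                raise
            time.sleep(delay)
        # FileNotFoundError is deliberately not caught here, so a missing
        # path propagates to the caller on the first attempt.
```

With this shape, a 500 response costs one request instead of a full retry chain, while a 503 is still retried up to `attempts` times.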
1 parent 5cb554f commit 743ff24

1 file changed

Lines changed: 26 additions & 10 deletions

pnpl/datasets/mixins/radboud_download.py

@@ -221,10 +221,17 @@ def _schedule_download(self, fpath: str):
 
         with self._lock:
             if fpath not in self._download_futures:
-                # Resolve size lazily so we have a content length up front
-                # for the progress bar (requests' Content-Length header is
-                # also a fallback inside the worker).
-                size = type(self)._lookup_remote_size(rel_path)
+                # PROPFIND first: raises FileNotFoundError immediately if
+                # the remote path doesn't exist (including Radboud's
+                # quirky 500-on-missing-parent), so dataset classes can
+                # fall back to local processing without waiting through
+                # the worker's retry chain.
+                stat = type(self)._stat(rel_path)
+                size = stat.get("size")
+                try:
+                    size = int(size) if size is not None else None
+                except (TypeError, ValueError):
+                    size = None
                 self._download_futures[fpath] = self._executor.submit(
                     self._download_with_retry_static,
                     fpath=fpath,
@@ -333,10 +340,14 @@ def _propfind(cls, url: str, auth: HTTPBasicAuth, depth: str) -> list[dict]:
                 f"Check {cls.RADBOUD_USERNAME_ENV} / "
                 f"{cls.RADBOUD_PASSWORD_ENV}."
             )
-        if resp.status_code == 404:
-            raise FileNotFoundError(
-                f"Radboud WebDAV 404: {url}"
-            )
+        if resp.status_code == 404 or resp.status_code == 500:
+            # Radboud's WebDAV returns 500 (not 404) when any
+            # parent folder on the path is missing. Both mean
+            # "the path does not exist" for our purposes — fail
+            # fast so dataset classes can fall back to the
+            # local-creation path instead of waiting through a
+            # 60+ s retry chain.
+            raise FileNotFoundError(f"Radboud WebDAV {resp.status_code}: {url}")
         if resp.status_code == 207:
             return cls._parse_multistatus(resp.text)
         if resp.status_code >= 500:
@@ -433,8 +444,13 @@ def _download_with_retry_static(
                             f"Check {cls.RADBOUD_USERNAME_ENV} / "
                             f"{cls.RADBOUD_PASSWORD_ENV}."
                         )
-                    if r.status_code == 404:
-                        raise FileNotFoundError(f"Radboud WebDAV 404: {download_url}")
+                    if r.status_code == 404 or r.status_code == 500:
+                        # See note in _propfind: Radboud's WebDAV returns
+                        # 500 (not 404) when an intermediate folder
+                        # doesn't exist. Treat both as fail-fast not-found.
+                        raise FileNotFoundError(
+                            f"Radboud WebDAV {r.status_code}: {download_url}"
+                        )
                     if r.status_code >= 500:
                         raise HTTPError(f"WebDAV server error {r.status_code}")
                     r.raise_for_status()
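
The "PROPFIND before queueing the GET" pattern from the first hunk can be illustrated in isolation. The sketch below is an assumption-laden stand-in for the mixin, not its actual implementation: `Scheduler`, `coerce_size`, and the injected `stat` and `worker` callables are hypothetical, and only the control flow (stat first, tolerate a malformed size, then submit) mirrors the diff.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def coerce_size(raw):
    """Return raw as an int, or None if it is missing or malformed."""
    try:
        return int(raw) if raw is not None else None
    except (TypeError, ValueError):
        return None


class Scheduler:
    """Hypothetical scheduler: stat the remote path, then queue the download."""

    def __init__(self, stat, worker, max_workers: int = 2):
        self._stat = stat      # callable: path -> dict; raises FileNotFoundError
        self._worker = worker  # callable: (path, size) -> result
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def schedule(self, path: str):
        with self._lock:
            if path not in self._futures:
                # If the path is missing, this raises before anything is
                # queued, so the caller can fall back without waiting.
                info = self._stat(path)
                size = coerce_size(info.get("size"))
                self._futures[path] = self._executor.submit(
                    self._worker, path, size
                )
            return self._futures[path]
```

One trade-off of this design is an extra round trip per file that does exist; in exchange, a missing file is reported synchronously to the caller instead of surfacing later from the worker thread.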
