Commit 7c2ed4b

Improve documentation and clarity in fetch pipeline
While exploring the fetch pipeline in ScanCode.io for GSoC 2026, I reviewed the data fetching logic and identified areas where documentation and clarity could be improved.

This PR:
- Enhances docstrings for key functions
- Improves inline comments for better readability

These improvements help in understanding how external data is fetched and processed, especially in the context of the advisory-based migration in VulnerableCode.

Signed-off-by: Sujay Barui <sujaybarui4679@gmail.com>
1 parent dce8dbd commit 7c2ed4b

1 file changed

Lines changed: 23 additions & 5 deletions

scanpipe/pipes/fetch.py

@@ -63,7 +63,15 @@


 def get_request_session(uri):
-    """Return a Requests session setup with authentication and headers."""
+    """
+    Return a configured Requests session for a given URI.
+
+    This includes:
+    - Applying authentication (basic or digest) if configured
+    - Attaching custom headers based on the target domain
+
+    The configuration is derived from ScanCode.io settings.
+    """
     session = requests.Session()
     netloc = urlparse(uri).netloc
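
The new docstring describes per-domain session configuration. A minimal sketch of that idea follows; it is not the ScanCode.io implementation. The function name `build_session` and the three dictionaries are hypothetical stand-ins for whatever the project settings actually provide, and the choice to check basic auth before digest auth is an assumption.

```python
from urllib.parse import urlparse

import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

# Hypothetical per-host configuration, standing in for ScanCode.io settings.
BASIC_AUTH = {"basic.example.com": ("user", "secret")}
DIGEST_AUTH = {"digest.example.com": ("user", "secret")}
EXTRA_HEADERS = {"api.example.com": {"Authorization": "token abc123"}}


def build_session(uri):
    """Return a Session configured for the host part of `uri`."""
    session = requests.Session()
    netloc = urlparse(uri).netloc

    # Assumed precedence: basic auth first, then digest auth.
    if netloc in BASIC_AUTH:
        session.auth = HTTPBasicAuth(*BASIC_AUTH[netloc])
    elif netloc in DIGEST_AUTH:
        session.auth = HTTPDigestAuth(*DIGEST_AUTH[netloc])

    # Headers are only attached when the host has an entry.
    session.headers.update(EXTRA_HEADERS.get(netloc, {}))
    return session
```

Keying everything on `netloc` means credentials configured for one host are never sent to another, which is presumably why the function takes the URI rather than returning a single shared session.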

@@ -81,14 +89,23 @@ def get_request_session(uri):


 def fetch_http(uri, to=None):
     """
-    Download a given `uri` in a temporary directory and return the directory's
-    path.
+    Download content from a given URI and store it locally.
+
+    - Uses a configured request session for authentication and headers
+    - Determines filename from content-disposition or URL
+    - Saves file to a temporary or provided directory
+    - Computes file checksums (md5, sha1)
+
+    Returns:
+        Download: metadata about the downloaded file
     """
     request_session = get_request_session(uri)
     response = request_session.get(uri, timeout=HTTP_REQUEST_TIMEOUT)

     if response.status_code != 200:
-        raise requests.RequestException
+        # Raise exception if the request did not succeed
+        # (non-200 HTTP response from server)
+        raise requests.RequestException

     content_disposition = response.headers.get("content-disposition", "")
     _, params = parse_header_parameters(content_disposition)
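
Two of the steps listed in the new `fetch_http` docstring, choosing a filename and computing checksums, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `fetch.py`: the diff parses the header with `parse_header_parameters`, whereas this sketch swaps in `email.message.Message`, and the helper names `filename_from_response` and `checksums` are hypothetical.

```python
import hashlib
from email.message import Message
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_response(url, content_disposition=""):
    """Pick a filename from the header when present, else from the URL path."""
    if content_disposition:
        # Let the stdlib email parser extract the `filename` parameter.
        message = Message()
        message["content-disposition"] = content_disposition
        filename = message.get_filename()
        if filename:
            return filename
    # Fall back to the last segment of the URL path.
    return PurePosixPath(unquote(urlparse(url).path)).name


def checksums(content):
    """Return md5 and sha1 hex digests for the downloaded bytes."""
    return {
        "md5": hashlib.md5(content).hexdigest(),
        "sha1": hashlib.sha1(content).hexdigest(),
    }


print(filename_from_response("https://example.com/files/pkg-1.0.tar.gz"))
print(filename_from_response("https://example.com/dl", 'attachment; filename="report.zip"'))
```

A filename taken from a response header is server-controlled input, so a real implementation would likely want to sanitize it before joining it to a directory path.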
@@ -397,7 +414,8 @@ def fetch_urls(urls):
         try:
             downloaded = fetch_url(url)
         except Exception:
-            errors.append(url)
+            # Capture failed URL fetch attempts for reporting
+            errors.append(url)
         else:
             downloads.append(downloaded)
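
The `try`/`except`/`else` pattern in this hunk collects failures instead of aborting on the first one. A self-contained sketch of the same pattern follows. The name `fetch_all`, the injected `fetch` callable, the fake fetcher, and the `(downloads, errors)` return value are assumptions made for illustration, since the diff does not show the rest of `fetch_urls`.

```python
def fetch_all(urls, fetch):
    """Call `fetch` on each URL, separating successes from failed URLs."""
    downloads, errors = [], []
    for url in urls:
        try:
            downloaded = fetch(url)
        except Exception:
            # Record the failing URL and keep going with the next one.
            errors.append(url)
        else:
            # Only reached when `fetch` did not raise.
            downloads.append(downloaded)
    return downloads, errors


def fake_fetch(url):
    """Hypothetical fetcher: fails for any URL containing 'missing'."""
    if "missing" in url:
        raise ValueError(url)
    return f"downloaded:{url}"


downloads, errors = fetch_all(
    ["https://example.com/a.zip", "https://example.com/missing.zip"],
    fake_fetch,
)
print(downloads, errors)
```

Putting the `append` in the `else` branch keeps the `try` body limited to the one call that is expected to fail, so an unrelated bug in the bookkeeping would not be swallowed by the broad `except`.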
