fix(sources/virustotal): cap pagination + clearer 429 message#1788
fix(sources/virustotal): cap pagination + clearer 429 message#1788ChrisJr404 wants to merge 1 commit into
Conversation
The free VirusTotal API returns 40 results per page and grants 500 requests/day. Following the cursor blindly on a single popular root domain can therefore exhaust the daily quota in one invocation, after which every other VT call in the run gets HTTP 429 wrapped as a generic "unexpected status code 429" with no hint that the cause is the free tier (projectdiscovery#1718). Cap the pagination at 10 pages (~400 subdomains) by default so a single domain can no longer monopolise the quota, and surface a dedicated "virustotal free-tier quota exhausted" error when 429 is observed so operators know to either supply an enterprise key or accept the cap.
Neo - PR Security ReviewNo security issues found Comment |
WalkthroughThe VirusTotal subdomain source now implements bounded pagination to prevent quota exhaustion on the free tier API. A fixed maximum of 10 pages (with 40 results per page) replaces the previous unbounded loop, and HTTP 429 responses are handled explicitly with improved messaging. ChangesVirusTotal Pagination Bounds
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/subscraping/sources/virustotal/virustotal.go (1)
88-93:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
defer resp.Body.Close()inside aforloop accumulates open bodies until the goroutine exits.Defers run at function exit, not at end of each loop iteration. With
defaultMaxPages = 10and a long cursor chain you can hold up to 10 HTTP response bodies (and their underlying TCP connections) open simultaneously, even though only one is being read at any moment. Close per iteration instead.♻️ Proposed fix — close the body explicitly at the end of each iteration
resp, err := session.Get(ctx, url, "", map[string]string{"x-apikey": randomApiKey}) if err != nil { if resp != nil && resp.StatusCode == http.StatusTooManyRequests { results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("virustotal free-tier quota exhausted (HTTP 429); some subdomains for %s may be missing", domain)} } else { results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} } s.errors++ session.DiscardHTTPResponse(resp) return } - defer func() { - if err := resp.Body.Close(); err != nil { - results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} - s.errors++ - } - }() var data response err = jsoniter.NewDecoder(resp.Body).Decode(&data) + if cerr := resp.Body.Close(); cerr != nil { + results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: cerr} + s.errors++ + } if err != nil { results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} s.errors++ return }Alternatively, wrap each iteration in an anonymous function so the
deferfires per page.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/subscraping/sources/virustotal/virustotal.go` around lines 88 - 93, The defer resp.Body.Close() inside the loop (see resp and resp.Body.Close() in virustotal.go) holds response bodies until the goroutine exits; replace the deferred close with an explicit resp.Body.Close() call at the end of each loop iteration (after you've finished reading/parsing the response) or wrap the per-page logic in an anonymous function so the defer runs per-iteration; ensure you still send a subscraping.Result error on close failure (the current results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err} handling should remain).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/subscraping/sources/virustotal/virustotal.go`:
- Around line 88-93: The defer resp.Body.Close() inside the loop (see resp and
resp.Body.Close() in virustotal.go) holds response bodies until the goroutine
exits; replace the deferred close with an explicit resp.Body.Close() call at the
end of each loop iteration (after you've finished reading/parsing the response)
or wrap the per-page logic in an anonymous function so the defer runs
per-iteration; ensure you still send a subscraping.Result error on close failure
(the current results <- subscraping.Result{Source: s.Name(), Type:
subscraping.Error, Error: err} handling should remain).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fa375abc-affc-4d3d-8f2b-d16780f49539
📒 Files selected for processing (2)
pkg/subscraping/sources/virustotal/virustotal.gopkg/subscraping/sources/virustotal/virustotal_test.go
Closes #1718.
What the OP saw
Running
subfinder -nW -dL FILE -s virustotal -statsagainst ~110 hosts ran the VT free-tier 500-req/day quota dry after roughly twenty hosts. Every subsequent VT call returned HTTP 429, which the source surfaced asunexpected status code 429 received from ...with no indication that the free tier was the cause.The reason a single host can chew through so many requests is the cursor loop:
limit=40per page with no upper bound on pagination. A single popular root domain with thousands of subdomains will issue 50+ requests on its own.Fix
defaultMaxPages), giving up to 400 subdomains per domain on the default code path. That puts a single host at well under 1% of the daily quota even in the worst case.virustotal free-tier quota exhausted (HTTP 429); some subdomains for <domain> may be missingso the operator knows what happened and can either provide an enterprise key or accept the cap.Mirrors the pattern used in #1776 for netlas's Community-tier cap.
Tests
pkg/subscraping/sources/virustotal/virustotal_test.go(5 cases):Metadata/NoApiKeybaseline coveragePaginationCappedOnFreeTiermocks an endpoint that always returns a non-empty cursor and asserts the source stops at exactlydefaultMaxPagesrequestsStopsWhenCursorEmptyasserts the cursor terminator still wins when reached before the capRateLimitedReturnsActionableErrormocks a 429 response and asserts the new "quota exhausted" message is surfaced and pagination stops on the first 429Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests