Skip to content

fix(sources/virustotal): cap pagination + clearer 429 message#1788

Open
ChrisJr404 wants to merge 1 commit into
projectdiscovery:devfrom
ChrisJr404:fix/virustotal-free-tier-limits-1718
Open

fix(sources/virustotal): cap pagination + clearer 429 message#1788
ChrisJr404 wants to merge 1 commit into
projectdiscovery:devfrom
ChrisJr404:fix/virustotal-free-tier-limits-1718

Conversation

@ChrisJr404
Copy link
Copy Markdown

@ChrisJr404 ChrisJr404 commented May 6, 2026

Closes #1718.

What the OP saw

Running subfinder -nW -dL FILE -s virustotal -stats against ~110 hosts ran the VT free-tier 500-req/day quota dry after roughly twenty hosts. Every subsequent VT call returned HTTP 429, which the source surfaced as unexpected status code 429 received from ... with no indication that the free tier was the cause.

The reason a single host can chew through so many requests is the cursor loop: limit=40 per page with no upper bound on pagination. A single popular root domain with thousands of subdomains will issue 50+ requests on its own.

Fix

  • Bound pagination at 10 pages (defaultMaxPages), giving up to 400 subdomains per domain on the default code path. That puts a single host at well under 1% of the daily quota even in the worst case.
  • Detect HTTP 429 explicitly and emit virustotal free-tier quota exhausted (HTTP 429); some subdomains for <domain> may be missing so the operator knows what happened and can either provide an enterprise key or accept the cap.

Mirrors the pattern used in #1776 for netlas's Community-tier cap.

Tests

pkg/subscraping/sources/virustotal/virustotal_test.go (5 cases):

  • Metadata / NoApiKey baseline coverage
  • PaginationCappedOnFreeTier mocks an endpoint that always returns a non-empty cursor and asserts the source stops at exactly defaultMaxPages requests
  • StopsWhenCursorEmpty asserts the cursor terminator still wins when reached before the cap
  • RateLimitedReturnsActionableError mocks a 429 response and asserts the new "quota exhausted" message is surfaced and pagination stops on the first 429
$ go test ./pkg/subscraping/sources/virustotal/...
ok    github.com/projectdiscovery/subfinder/v2/pkg/subscraping/sources/virustotal    0.023s
$ go build ./... && go vet ./...
$

Summary by CodeRabbit

Release Notes

  • New Features

    • VirusTotal subdomain lookups now support paginated requests with built-in quota protection and request limits.
    • Added tracking for request counts and performance metrics.
  • Bug Fixes

    • Improved error handling for rate-limited API responses with actionable messaging.
  • Tests

    • Added comprehensive test suite covering pagination, rate limiting, and API integration scenarios.

The free VirusTotal API returns 40 results per page and grants 500
requests/day. Following the cursor blindly on a single popular root
domain can therefore exhaust the daily quota in one invocation, after
which every other VT call in the run gets HTTP 429 wrapped as a generic
"unexpected status code 429" with no hint that the cause is the free
tier (projectdiscovery#1718).

Cap the pagination at 10 pages (~400 subdomains) by default so a single
domain can no longer monopolise the quota, and surface a dedicated
"virustotal free-tier quota exhausted" error when 429 is observed so
operators know to either supply an enterprise key or accept the cap.
@neo-by-projectdiscovery-dev
Copy link
Copy Markdown

neo-by-projectdiscovery-dev Bot commented May 6, 2026

Neo - PR Security Review

No security issues found

Comment @pdneo help for available commands. · Open in Neo

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

Walkthrough

The VirusTotal subdomain source now implements bounded pagination to prevent quota exhaustion on the free tier API. A fixed maximum of 10 pages (with 40 results per page) replaces the previous unbounded loop, and HTTP 429 responses are handled explicitly with improved messaging.

Changes

VirusTotal Pagination Bounds

Layer / File(s) Summary
Constants & Imports
pkg/subscraping/sources/virustotal/virustotal.go
New constants pageLimit (40) and defaultMaxPages (10) introduced; imports updated to include net/http and time for request and timing support.
Core Pagination Logic
pkg/subscraping/sources/virustotal/virustotal.go
Subdomain enumeration loop replaced with bounded pagination: iterates up to defaultMaxPages, constructs per-page URLs with cursor support, sends requests, handles HTTP 429 (quota exhausted) with actionable error messaging, and decodes/streams subdomains while tracking cursor and respecting context cancellation.
Test Coverage
pkg/subscraping/sources/virustotal/virustotal_test.go
Comprehensive test suite added covering metadata validation, missing API key behavior, pagination cap enforcement, cursor exhaustion handling, and rate-limit error messaging; includes test helpers (rewriteTransport, newTestSession, runSource) for simulating VirusTotal API interactions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Ten pages, forty items deep,
With cursors tracked and quotas we keep,
No endless loops to drain away,
The free tier's precious requests per day!
Rate limits caught with grace so clear,
A bounded feast for all to share. 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main changes: capping pagination at 10 pages and improving the HTTP 429 error message to clarify quota exhaustion.
Linked Issues check ✅ Passed The PR fully addresses the requirements from issue #1718: caps pagination (10 pages = ~400 subdomains) to prevent quota exhaustion, detects HTTP 429 explicitly, and emits a clear message indicating free-tier quota exhaustion.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing VirusTotal pagination and 429 error handling; no extraneous modifications to unrelated code or functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/subscraping/sources/virustotal/virustotal.go (1)

88-93: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

defer resp.Body.Close() inside a for loop accumulates open bodies until the goroutine exits.

Defers run at function exit, not at end of each loop iteration. With defaultMaxPages = 10 and a long cursor chain you can hold up to 10 HTTP response bodies (and their underlying TCP connections) open simultaneously, even though only one is being read at any moment. Close per iteration instead.

♻️ Proposed fix — close the body explicitly at the end of each iteration
 			resp, err := session.Get(ctx, url, "", map[string]string{"x-apikey": randomApiKey})
 			if err != nil {
 				if resp != nil && resp.StatusCode == http.StatusTooManyRequests {
 					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: fmt.Errorf("virustotal free-tier quota exhausted (HTTP 429); some subdomains for %s may be missing", domain)}
 				} else {
 					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 				}
 				s.errors++
 				session.DiscardHTTPResponse(resp)
 				return
 			}
-			defer func() {
-				if err := resp.Body.Close(); err != nil {
-					results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
-					s.errors++
-				}
-			}()
 
 			var data response
 			err = jsoniter.NewDecoder(resp.Body).Decode(&data)
+			if cerr := resp.Body.Close(); cerr != nil {
+				results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: cerr}
+				s.errors++
+			}
 			if err != nil {
 				results <- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
 				s.errors++
 				return
 			}

Alternatively, wrap each iteration in an anonymous function so the defer fires per page.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/subscraping/sources/virustotal/virustotal.go` around lines 88 - 93, The
defer resp.Body.Close() inside the loop (see resp and resp.Body.Close() in
virustotal.go) holds response bodies until the goroutine exits; replace the
deferred close with an explicit resp.Body.Close() call at the end of each loop
iteration (after you've finished reading/parsing the response) or wrap the
per-page logic in an anonymous function so the defer runs per-iteration; ensure
you still send a subscraping.Result error on close failure (the current results
<- subscraping.Result{Source: s.Name(), Type: subscraping.Error, Error: err}
handling should remain).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@pkg/subscraping/sources/virustotal/virustotal.go`:
- Around line 88-93: The defer resp.Body.Close() inside the loop (see resp and
resp.Body.Close() in virustotal.go) holds response bodies until the goroutine
exits; replace the deferred close with an explicit resp.Body.Close() call at the
end of each loop iteration (after you've finished reading/parsing the response)
or wrap the per-page logic in an anonymous function so the defer runs
per-iteration; ensure you still send a subscraping.Result error on close failure
(the current results <- subscraping.Result{Source: s.Name(), Type:
subscraping.Error, Error: err} handling should remain).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fa375abc-affc-4d3d-8f2b-d16780f49539

📥 Commits

Reviewing files that changed from the base of the PR and between 0ed1b0b and 85e2354.

📒 Files selected for processing (2)
  • pkg/subscraping/sources/virustotal/virustotal.go
  • pkg/subscraping/sources/virustotal/virustotal_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Limiting Results With the Free Virustotal API

1 participant