Sitemap Extractor: Sitemap discovery fails when proxy is blocked; add fallback without proxy #240

@nikitachapovskii-dev

When running the actor with Apify Proxy enabled for some domains (for example https://www.ekklisiaonline.gr/), the run fails early with "No valid sitemaps were discovered from the provided startUrls." even though the site may have valid sitemap endpoints.

The likely cause is that sitemap discovery is performed through a proxy URL, and some sites block proxy IPs, HEAD requests, or access to common sitemap paths from that network path. In that scenario, discovery returns an empty result and the actor fails immediately. When I run the actor without the proxy, the run passes.

Suggested fix: if discovery through proxy returns empty (or times out due to network/proxy-related conditions), retry sitemap discovery once without proxy before failing.
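
A rough sketch of what that fallback could look like, not taken from the actor's code. The names `DiscoverFn` and `discoverWithProxyFallback` are hypothetical, and the real discovery routine is passed in as a parameter because its actual signature is not known here. As a simplification, any error from the proxied attempt is treated as a reason to fall back; a real implementation would probably want to narrow that to network/proxy-related errors and timeouts.

```typescript
// Hypothetical signature for the existing discovery routine (an assumption):
// it takes the start URLs plus an optional proxy URL and resolves to the
// list of sitemap URLs it found.
type DiscoverFn = (startUrls: string[], proxyUrl?: string) => Promise<string[]>;

// Try discovery through the proxy first; if that yields nothing or throws,
// retry exactly once without the proxy before failing the run.
async function discoverWithProxyFallback(
    startUrls: string[],
    discover: DiscoverFn,
    proxyUrl?: string,
): Promise<string[]> {
    if (proxyUrl) {
        let viaProxy: string[] = [];
        try {
            viaProxy = await discover(startUrls, proxyUrl);
        } catch {
            // Simplification: every error is treated like an empty result.
            viaProxy = [];
        }
        if (viaProxy.length > 0) {
            return viaProxy;
        }
    }

    // Direct attempt: either the single fallback after an empty/failed proxied
    // attempt, or the only attempt when no proxy was configured.
    const direct = await discover(startUrls);
    if (direct.length === 0) {
        throw new Error('No valid sitemaps were discovered from the provided startUrls.');
    }
    return direct;
}
```

With this shape the existing error is still raised, but only after both the proxied and the direct attempt have come back empty. Logging which path succeeded would help users notice that their proxy is being blocked.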
