Commit 8b2260c
feat(seo-client): fan-out queries across markets using site region (#1522)
## Summary
The SEO provider requires a `database` (country) parameter on every API
call — unlike the previous provider (Ahrefs), there is no "global" mode.
Until now, all queries were hardcoded to `database=us`, which meant
non-US domains (e.g. `www.deceuninck.es`) returned little to no data.
This PR makes the SEO client locale-aware by accepting the site's
`region` (from `Site.getRegion()`, ISO 3166-1 alpha-2) and fanning out
queries across multiple markets to build a global traffic picture.
## Why fan-out instead of single-region queries?
The site's region tells us _where the site primarily operates_, but
organic traffic is not confined to a single country. A `.es` domain may
rank in `es`, `fr`, `it`, and `br` simultaneously. Querying only the
site's region would undercount traffic just as badly as querying only
`us`.
**Smoke test data for `adobe.com` (region=US):**
| Method | Single DB (us) | Fan-out (12 DBs) |
|--------|---------------|-------------------|
| `getTopPages` top traffic | 72,331,413 | 72,331,413 |
| `getMetrics` org_traffic | ~48M | **153,060,534** |
| `getMetrics` org_keywords | ~11.5M | **26,101,114** |
| `getPaidPages` top_keyword_country | always `US` | `UK`, `US`, `IN` (actual source) |
For `www.deceuninck.es` (region=ES), the previous `database=us` returned
**zero results**. With fan-out, the ES database returns 10 pages with
traffic data, and the other markets gracefully return nothing.
## What changed
### New: `fanOut(items, fn, operation)` resilience primitive
A single batched fan-out method (batch size 10) that all multi-market
methods now use — including `getBrokenBacklinks`, which previously had
its own inline batching loop. Each call to `fn(item)` already has
per-request retry with exponential backoff via `sendRawRequest`;
`fanOut` adds:
- Batched `Promise.allSettled` to respect rate limits
- Consistent `log.warn` for items that fail after all retries
- Fulfilled results collected with their key for downstream merge
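The behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the standalone function form, the `log` parameter, the `{ key, value }` result shape, and the warning text are all assumptions made for the example.

```js
// Hypothetical sketch of a batched fan-out; names and shapes are assumptions.
const BATCH_SIZE = 10; // batch size stated in the PR description

async function fanOut(items, fn, operation, log = console) {
  const results = [];
  for (let i = 0; i < items.length; i += BATCH_SIZE) {
    const batch = items.slice(i, i + BATCH_SIZE);
    // allSettled never rejects, so one failing market cannot abort the batch.
    // The async wrapper also turns a synchronous throw in fn into a rejection.
    const settled = await Promise.allSettled(batch.map(async (item) => fn(item)));
    settled.forEach((outcome, idx) => {
      const key = batch[idx];
      if (outcome.status === 'fulfilled') {
        // Keep the key so the caller knows which item produced which value.
        results.push({ key, value: outcome.value });
      } else {
        const reason = outcome.reason?.message ?? String(outcome.reason);
        log.warn(`${operation}: request for "${key}" failed: ${reason}`);
      }
    });
  }
  return results;
}
```

Batches run one after another, so at most `BATCH_SIZE` requests are in flight at a time. Failed items are logged and dropped rather than rethrown, which is what lets the remaining markets still contribute to the merge.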
### New: `getDatabases(region)` helper
Builds the query set: `BIG_MARKETS` + site region if not already
present.
`BIG_MARKETS = ['us', 'uk', 'de', 'fr', 'es', 'it', 'br', 'ca', 'au',
'in', 'jp', 'nl']` — 12 major SEO provider databases by search volume.
If the site's region is already in `BIG_MARKETS` (e.g. `ES`), it is not
added again. If it is not (e.g. `CZ`), it is appended as a 13th database.
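A minimal sketch of this helper, assuming the region is lowercased before comparison (the examples use uppercase regions and lowercase database codes) and that a missing region falls back to the big markets only. The actual normalization in the PR may differ.

```js
// Market list copied from the PR description.
const BIG_MARKETS = ['us', 'uk', 'de', 'fr', 'es', 'it', 'br', 'ca', 'au', 'in', 'jp', 'nl'];

// Hypothetical sketch: lowercasing and the null fallback are assumptions.
function getDatabases(region) {
  const db = typeof region === 'string' ? region.trim().toLowerCase() : '';
  if (!db || BIG_MARKETS.includes(db)) {
    // No region, or region already covered: query the big markets only.
    return [...BIG_MARKETS];
  }
  // Region outside the big markets: append it as one extra database.
  return [...BIG_MARKETS, db];
}
```

Under these assumptions, `getDatabases('ES')` yields the 12 big markets and `getDatabases('CZ')` yields 13 entries ending in `'cz'`. The sketch does not translate ISO codes that differ from database codes (for example ISO `GB` versus the `uk` database); how the PR handles that case is not stated here.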
### Refactored: positional params → options bags
All updated methods now use a clean options bag instead of positional
parameters. Callers no longer need to pass `undefined` placeholders to
reach later parameters:
| Method | Signature |
|--------|-----------|
| `getTopPages` | `(url, { limit, region })` |
| `getPaidPages` | `(url, { date, limit, region })` |
| `getMetrics` | `(url, { date, region })` |
| `getOrganicTraffic` | `(url, { startDate, endDate, region })` |
| `getBrokenBacklinks` | No signature change (refactored to use `fanOut`) |
| `getOrganicKeywords` | No change needed (already uses options bag) |
### Merge strategies
| Method | Merge strategy |
|--------|----------------|
| `getTopPages` | Sum `sum_traffic` per URL across DBs, first keyword wins |
| `getPaidPages` | Sum traffic per URL, `top_keyword_country` reflects actual DB |
| `getMetrics` | Sum all numeric fields across DBs |
| `getOrganicTraffic` | Group by date, sum all fields across DBs |
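Two of these strategies can be sketched as follows. This is illustrative only: the input shape (`{ key, value }` entries, one per database), the `url` and `top_keyword` field names, and the final sort are assumptions. Only `sum_traffic`, `org_traffic`, and `org_keywords` appear in this PR description.

```js
// Hypothetical sketch of the getTopPages merge: sum traffic per URL,
// keep every other field from the first database that reported the URL.
function mergeTopPages(resultsByDb) {
  const byUrl = new Map();
  for (const { value: pages } of resultsByDb) {
    for (const page of pages) {
      const existing = byUrl.get(page.url);
      if (existing) {
        existing.sum_traffic += page.sum_traffic; // first keyword stays as-is
      } else {
        byUrl.set(page.url, { ...page }); // copy so inputs are not mutated
      }
    }
  }
  // Sorting by merged traffic is an assumption, not stated in the PR.
  return [...byUrl.values()].sort((a, b) => b.sum_traffic - a.sum_traffic);
}

// Hypothetical sketch of the getMetrics merge: sum every numeric field.
function sumMetrics(resultsByDb) {
  const totals = {};
  for (const { value: metrics } of resultsByDb) {
    for (const [field, v] of Object.entries(metrics)) {
      if (typeof v === 'number' && Number.isFinite(v)) {
        totals[field] = (totals[field] ?? 0) + v;
      }
    }
  }
  return totals;
}
```

Because the first-seen entry wins for non-numeric fields, the order in which databases are merged determines which keyword is kept for a URL.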
### Fixed: `lastMonthISO()` default date
`getMetrics` and `getPaidPages` previously defaulted to `todayISO()`,
but the SEO provider publishes monthly snapshots with a delay — the
current month has no data yet. Changed default to `lastMonthISO()` (1st
of previous month) so callers without an explicit date get the most
recent available data.
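A minimal sketch of such a helper, assuming a `YYYY-MM-DD` output and UTC arithmetic. Neither detail is confirmed by this PR description, and the optional `now` parameter is added here only to make the example testable.

```js
// Hypothetical sketch: first day of the previous month as YYYY-MM-DD (UTC).
function lastMonthISO(now = new Date()) {
  // Date.UTC normalizes a month index of -1 to December of the previous year.
  const firstOfPrevMonth = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - 1, 1));
  return firstOfPrevMonth.toISOString().slice(0, 10);
}
```

Building the date from the year and month with the day fixed at 1 avoids the rollover that happens when subtracting a month from a date such as the 31st.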
## How callers use the region
```js
const site = await dataAccess.getSiteById(siteId);
const region = site.getRegion(); // ISO 3166-1 alpha-2, e.g. 'ES', 'CZ', or null
const topPages = await seoClient.getTopPages(url, { limit: 200, region });
const metrics = await seoClient.getMetrics(url, { region });
const traffic = await seoClient.getOrganicTraffic(url, { startDate, endDate, region });
const paid = await seoClient.getPaidPages(url, { limit: 200, region });
```
All methods are backward-compatible — the options bag is optional and
defaults to querying only big markets.
## Test plan
- [x] Unit tests: 140 passing, `client.js` at 100%
lines/statements/functions, 97.5% branches
- [x] Smoke tested against live API with `adobe.com` (US) and
`www.deceuninck.es` (ES)
- [ ] Verify `getMetrics`/`getPaidPages` return data without explicit
date (lastMonthISO fix)
- [ ] Verify non-US domain returns data (was zero before)
- [ ] Verify `top_keyword_country` in `getPaidPages` reflects actual
market, not hardcoded US
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>