feat(llmo): route URL Inspector citation data through Semrush BP API (LLMO-5180)#2588
feat(llmo): route URL Inspector citation data through Semrush BP API (LLMO-5180)#2588claudiaboldis wants to merge 7 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
This PR will trigger a minor release when merged. |
There was a problem hiding this comment.
Hey @claudiaboldis,
Verdict: Request changes - solid integration with clean fallback semantics, but the owned-urls fan-out has an error isolation gap and the domain filter parameter looks semantically wrong.
Changes: Routes URL Inspector citation data (stats, owned-urls, url-prompts) through the Semrush BP reporting API via Serenity proxy, with graceful fallback to Mysticat for non-onboarded orgs (6 files).
Must fix before merge
- [Important] Unbounded fan-out with no per-item error isolation in owned-urls handler -
src/controllers/llmo/llmo-url-inspector.js:370(details inline) - [Important] Full URL passed as
domainto aCBF_domain eqfilter - potential semantic mismatch -src/controllers/llmo/llmo-url-inspector.js:226(details inline) - [Important] Broad catch swallows configuration errors (503 from missing env var) alongside transient failures -
src/controllers/llmo/llmo-url-inspector.js:235(details inline) - [Important] Missing integration tests for modified endpoints - per CLAUDE.md rule: "New or modified endpoints must include integration tests in
test/it/". The three URL Inspector endpoints now have a conditional Semrush data-sourcing path that is not exercised by any integration test.
Non-blocking (2): minor issues and suggestions
- suggestion: Extract a shared helper for the repeated Semrush integration block (workspace check, token extraction, transport creation, query, catch/fallback) to reduce the three-location maintenance surface -
src/controllers/llmo/llmo-url-inspector.js - nit: The new
queryBrandPresenceResultsmethod skipsAPI_PREFIXand uses a hardcoded/apis/v4-raw/external-api/v1/path unlike all other transport methods. Add a named constant so the divergence is visible when proxy routing changes -src/support/serenity/rest-transport.js:304
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 5s | Cost: $4.20 | Commit: 4e7c47ada72fa4013f842cfefd36ee40e5d28f6a
If this code review was useful, please react with 👍. Otherwise, react with 👎.
| try { | ||
| const imsToken = ctx?.pathInfo?.headers?.authorization?.replace(/^Bearer\s+/i, ''); | ||
| const transport = createSerenityTransport({ env: ctx.env, imsToken }); | ||
| const urlsWithSemrush = await Promise.all( |
There was a problem hiding this comment.
issue (blocking): Unbounded fan-out with no per-item error isolation. The Promise.all fires one queryBpCitationsByUrl call per URL row, and each of those internally fans out across all BrandSemrushProject slices. For N URLs with M market slices, this is N*M concurrent HTTP calls per request. More critically, if ANY single URL's call fails, the entire Promise.all rejects and the outer catch discards ALL Semrush data for every URL - including the ones that succeeded.
How to fix: Add per-URL error handling inside the .map() callback so one failure preserves results for other URLs:
urlsFromMysticat.map(async (urlRow) => {
try {
const bpResult = await queryBpCitationsByUrl(...);
return bpResult ? { ...urlRow, citations: bpResult.citations, promptsCited: bpResult.promptsCited } : urlRow;
} catch {
return urlRow;
}
})Also consider adding a concurrency limiter (e.g. p-limit) for large URL sets.
There was a problem hiding this comment.
Fixed: added per-URL try/catch inside the .map() callback so one failed URL preserves Mysticat data for that row while others proceed. Re-throws status===503 errors upward. Also moved imsToken/transport construction outside the Promise.all (shared across rows).
There was a problem hiding this comment.
Fixed: added per-URL try/catch inside the .map() callback so one failed URL preserves Mysticat data for that row while others proceed. Re-throws status===503 errors upward. Also moved imsToken/transport construction outside Promise.all (shared across rows).
| filterByBrandId, | ||
| semrushWorkspaceId, | ||
| ctx.env, | ||
| { |
There was a problem hiding this comment.
issue (blocking): Full URL passed as domain to a CBF_domain eq filter. In all three handlers, the domain field receives a full URL (e.g. https://example.com/page) or a site ID - not a bare domain. This value flows into buildRenderData as { op: "eq", val: domain, col: "CBF_domain" }. If the upstream Semrush API's CBF_domain column stores actual domains, this equality filter will never match and citations will silently return zero (no error thrown, just empty results).
The PR description confirms the source contains filter was validated, but does not mention verifying the CBF_domain eq filter.
How to fix: Verify with the Semrush API what CBF_domain expects. If it is a bare domain, extract it: new URL(urlRow.url).hostname. If it expects a full URL, rename the parameter to avoid confusion.
There was a problem hiding this comment.
Fixed: now extracting hostname via new URL(url).hostname for the CBF_domain eq filter in all three handlers. Falls back to the raw value if the URL parse fails (e.g. for non-URL siteIds). urlFragment still passes the full URL for the source contains filter.
| ); | ||
| if (bpResult) { | ||
| stats.totalCitations = bpResult.citations; | ||
| stats.totalPromptsCited = bpResult.promptsCited; |
There was a problem hiding this comment.
issue (blocking): Catch block swallows configuration errors alongside transient failures. The ErrorWithStatusCode(503) thrown by resolveElementId when SEMRUSH_BP_ELEMENT_ID is missing signals a permanent deployment misconfiguration - not a transient upstream failure. Catching it here means every request from a Semrush-onboarded org silently returns stale Mysticat data with only a log line distinguishing it from a network glitch.
How to fix: Either check for config errors and re-throw:
} catch (e) {
if (e instanceof ErrorWithStatusCode && e.status === 503) throw e;
ctx.log.error(...);
}Or call resolveElementId(ctx.env) outside the try/catch so it fails fast before attempting the query. The same pattern applies to the other two handlers.
There was a problem hiding this comment.
Fixed: added 'if (e?.status === 503) throw e;' before the log+fallback in all three handlers, so missing SEMRUSH_BP_ELEMENT_ID surfaces as a 503 rather than silently returning Mysticat data.
40eb5f2 to
c688399
Compare
…(LLMO-5180) - Add queryBrandPresenceResults to SerenityTransport for the BP reporting endpoint - Add src/support/serenity/handlers/brand-presence.js with queryBpCitationsByUrl that fans out across all BrandSemrushProject slices and merges citation/prompt results - Modify stats, owned-urls, and url-prompts handlers: when the org has a Semrush workspace and a brand is scoped, fetch citation data from Semrush; fall back to Mysticat RPCs when workspace is null or Semrush throws - Agentic + referral traffic columns remain sourced from Mysticat throughout - 11 new esmock-based integration tests covering Semrush path, null-workspace fallback, and error fallback for all three modified handlers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sh integration - Extract hostname from full URL for CBF_domain eq filter (domain != full URL) - Add per-URL error isolation in owned-urls fan-out so one failure preserves other rows - Re-throw ErrorWithStatusCode(503) config errors instead of falling back silently - Add tests: 503 re-throw, per-URL isolation, hostname extraction verification Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
c770e8c to
6bb2e28
Compare
…ved enc/request helpers The file was refactored to use typed openapi-fetch clients, removing the hand-rolled enc() and request() helpers. The v4-raw endpoint is outside the typed client's path space so use createTimeoutFetch + native fetch directly instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
col: "source", op: "contains") that was the open blocker for this ticketChanges
src/support/serenity/rest-transport.js— newqueryBrandPresenceResults()transport methodsrc/support/serenity/handlers/brand-presence.js(new) — fans out BP citation queries across all project slices, merges resultssrc/controllers/llmo/llmo-url-inspector.js— stats, owned-urls, and url-prompts handlers now source citation data from Semrush whensemrushWorkspaceIdis presenttest/controllers/llmo/llmo-url-inspector.test.js— 11 new tests; all 97 passEnv var required
SEMRUSH_BP_ELEMENT_IDmust be set in Vault (dx_mysticat/<env>/api-service) before deploying — identifies the BP reporting element in the Semrush workspace. Obtain from Semrush team.Test plan
npm test)Jira: LLMO-5180
🤖 Generated with Claude Code