Skip to content

feat(llmo): route URL Inspector citation data through Semrush BP API (LLMO-5180)#2588

Open
claudiaboldis wants to merge 7 commits into
mainfrom
feat/llmo-5180-url-inspector-semrush-bp
Open

feat(llmo): route URL Inspector citation data through Semrush BP API (LLMO-5180)#2588
claudiaboldis wants to merge 7 commits into
mainfrom
feat/llmo-5180-url-inspector-semrush-bp

Conversation

@claudiaboldis

Copy link
Copy Markdown

Summary

  • Routes URL Inspector citation data (citations, promptsCited) through Semrush BP reporting API via the Serenity proxy, while agentic and referral traffic stay in Mysticat
  • Falls back to existing Mysticat RPC behaviour for orgs without a Semrush workspace (non-Semrush customers unaffected)
  • Confirms the URL-contains filter (col: "source", op: "contains") that was the open blocker for this ticket

Changes

  • src/support/serenity/rest-transport.js — new queryBrandPresenceResults() transport method
  • src/support/serenity/handlers/brand-presence.js (new) — fans out BP citation queries across all project slices, merges results
  • src/controllers/llmo/llmo-url-inspector.js — stats, owned-urls, and url-prompts handlers now source citation data from Semrush when semrushWorkspaceId is present
  • test/controllers/llmo/llmo-url-inspector.test.js — 11 new tests; all 97 pass

Env var required

SEMRUSH_BP_ELEMENT_ID must be set in Vault (dx_mysticat/<env>/api-service) before deploying — identifies the BP reporting element in the Semrush workspace. Obtain from Semrush team.

Test plan

  • All 97 existing + new unit tests pass (npm test)
  • Null-workspace path verified — non-Semrush customers see identical behaviour
  • Manual test against dev with a Semrush-onboarded org: stats, owned-urls, and URL drilldown show Semrush-sourced citation counts
  • Agentic + referral columns on owned-urls table unchanged

Jira: LLMO-5180

🤖 Generated with Claude Code

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@github-actions

Copy link
Copy Markdown

This PR will trigger a minor release when merged.

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @claudiaboldis,

Verdict: Request changes - solid integration with clean fallback semantics, but the owned-urls fan-out has an error isolation gap and the domain filter parameter looks semantically wrong.
Changes: Routes URL Inspector citation data (stats, owned-urls, url-prompts) through the Semrush BP reporting API via Serenity proxy, with graceful fallback to Mysticat for non-onboarded orgs (6 files).

Must fix before merge

  1. [Important] Unbounded fan-out with no per-item error isolation in owned-urls handler - src/controllers/llmo/llmo-url-inspector.js:370 (details inline)
  2. [Important] Full URL passed as domain to a CBF_domain eq filter - potential semantic mismatch - src/controllers/llmo/llmo-url-inspector.js:226 (details inline)
  3. [Important] Broad catch swallows configuration errors (503 from missing env var) alongside transient failures - src/controllers/llmo/llmo-url-inspector.js:235 (details inline)
  4. [Important] Missing integration tests for modified endpoints - per CLAUDE.md rule: "New or modified endpoints must include integration tests in test/it/". The three URL Inspector endpoints now have a conditional Semrush data-sourcing path that is not exercised by any integration test.
Non-blocking (2): minor issues and suggestions
  • suggestion: Extract a shared helper for the repeated Semrush integration block (workspace check, token extraction, transport creation, query, catch/fallback) to reduce the three-location maintenance surface - src/controllers/llmo/llmo-url-inspector.js
  • nit: The new queryBrandPresenceResults method skips API_PREFIX and uses a hardcoded /apis/v4-raw/external-api/v1/ path unlike all other transport methods. Add a named constant so the divergence is visible when proxy routing changes - src/support/serenity/rest-transport.js:304

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 5s | Cost: $4.20 | Commit: 4e7c47ada72fa4013f842cfefd36ee40e5d28f6a
If this code review was useful, please react with 👍. Otherwise, react with 👎.

try {
const imsToken = ctx?.pathInfo?.headers?.authorization?.replace(/^Bearer\s+/i, '');
const transport = createSerenityTransport({ env: ctx.env, imsToken });
const urlsWithSemrush = await Promise.all(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): Unbounded fan-out with no per-item error isolation. The Promise.all fires one queryBpCitationsByUrl call per URL row, and each of those internally fans out across all BrandSemrushProject slices. For N URLs with M market slices, this is N*M concurrent HTTP calls per request. More critically, if ANY single URL's call fails, the entire Promise.all rejects and the outer catch discards ALL Semrush data for every URL - including the ones that succeeded.

How to fix: Add per-URL error handling inside the .map() callback so one failure preserves results for other URLs:

urlsFromMysticat.map(async (urlRow) => {
  try {
    const bpResult = await queryBpCitationsByUrl(...);
    return bpResult ? { ...urlRow, citations: bpResult.citations, promptsCited: bpResult.promptsCited } : urlRow;
  } catch {
    return urlRow;
  }
})

Also consider adding a concurrency limiter (e.g. p-limit) for large URL sets.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: added per-URL try/catch inside the .map() callback so one failed URL preserves Mysticat data for that row while others proceed. Re-throws status===503 errors upward. Also moved imsToken/transport construction outside the Promise.all (shared across rows).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: added per-URL try/catch inside the .map() callback so one failed URL preserves Mysticat data for that row while others proceed. Re-throws status===503 errors upward. Also moved imsToken/transport construction outside Promise.all (shared across rows).

filterByBrandId,
semrushWorkspaceId,
ctx.env,
{

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): Full URL passed as domain to a CBF_domain eq filter. In all three handlers, the domain field receives a full URL (e.g. https://example.com/page) or a site ID - not a bare domain. This value flows into buildRenderData as { op: "eq", val: domain, col: "CBF_domain" }. If the upstream Semrush API's CBF_domain column stores actual domains, this equality filter will never match and citations will silently return zero (no error thrown, just empty results).

The PR description confirms the source contains filter was validated, but does not mention verifying the CBF_domain eq filter.

How to fix: Verify with the Semrush API what CBF_domain expects. If it is a bare domain, extract it: new URL(urlRow.url).hostname. If it expects a full URL, rename the parameter to avoid confusion.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: now extracting hostname via new URL(url).hostname for the CBF_domain eq filter in all three handlers. Falls back to the raw value if the URL parse fails (e.g. for non-URL siteIds). urlFragment still passes the full URL for the source contains filter.

);
if (bpResult) {
stats.totalCitations = bpResult.citations;
stats.totalPromptsCited = bpResult.promptsCited;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): Catch block swallows configuration errors alongside transient failures. The ErrorWithStatusCode(503) thrown by resolveElementId when SEMRUSH_BP_ELEMENT_ID is missing signals a permanent deployment misconfiguration - not a transient upstream failure. Catching it here means every request from a Semrush-onboarded org silently returns stale Mysticat data with only a log line distinguishing it from a network glitch.

How to fix: Either check for config errors and re-throw:

} catch (e) {
  if (e instanceof ErrorWithStatusCode && e.status === 503) throw e;
  ctx.log.error(...);
}

Or call resolveElementId(ctx.env) outside the try/catch so it fails fast before attempting the query. The same pattern applies to the other two handlers.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: added 'if (e?.status === 503) throw e;' before the log+fallback in all three handlers, so missing SEMRUSH_BP_ELEMENT_ID surfaces as a 503 rather than silently returning Mysticat data.

@MysticatBot MysticatBot added the ai-reviewed Reviewed by AI label Jun 11, 2026
@claudiaboldis claudiaboldis force-pushed the feat/llmo-5180-url-inspector-semrush-bp branch from 40eb5f2 to c688399 Compare June 18, 2026 08:44
cboldis and others added 6 commits July 2, 2026 19:37
…(LLMO-5180)

- Add queryBrandPresenceResults to SerenityTransport for the BP reporting endpoint
- Add src/support/serenity/handlers/brand-presence.js with queryBpCitationsByUrl that
  fans out across all BrandSemrushProject slices and merges citation/prompt results
- Modify stats, owned-urls, and url-prompts handlers: when the org has a Semrush
  workspace and a brand is scoped, fetch citation data from Semrush; fall back to
  Mysticat RPCs when workspace is null or Semrush throws
- Agentic + referral traffic columns remain sourced from Mysticat throughout
- 11 new esmock-based integration tests covering Semrush path, null-workspace
  fallback, and error fallback for all three modified handlers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sh integration

- Extract hostname from full URL for CBF_domain eq filter (domain != full URL)
- Add per-URL error isolation in owned-urls fan-out so one failure preserves other rows
- Re-throw ErrorWithStatusCode(503) config errors instead of falling back silently
- Add tests: 503 re-throw, per-URL isolation, hostname extraction verification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claudiaboldis claudiaboldis force-pushed the feat/llmo-5180-url-inspector-semrush-bp branch from c770e8c to 6bb2e28 Compare July 2, 2026 16:38
…ved enc/request helpers

The file was refactored to use typed openapi-fetch clients, removing the
hand-rolled enc() and request() helpers. The v4-raw endpoint is outside
the typed client's path space so use createTimeoutFetch + native fetch
directly instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed Reviewed by AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants