feat: add URL Inspector PG API endpoints | LLMO-4030#2113

Open
JayKid wants to merge 19 commits into main from feat/find-way-to-store-retrieve-LLMO-4030

Conversation

JayKid (Contributor) commented Apr 2, 2026

Summary

  • 6 API endpoints calling URL Inspector RPCs in mysticat-data-service:
    • GET .../url-inspector/stats — aggregate stats + weekly sparklines
    • GET .../url-inspector/owned-urls — paginated owned URL citations
    • GET .../url-inspector/trending-urls — paginated non-owned URLs (groups flat RPC rows by URL)
    • GET .../url-inspector/cited-domains — domain-level aggregations
    • GET .../url-inspector/domain-urls — paginated URLs within a domain (with urlId, promptsCited, categories, regions)
    • GET .../url-inspector/url-prompts — prompt breakdown for a specific URL
  • Exports 5 shared utilities from llmo-brand-presence.js for reuse
  • Unit tests for all handlers

Key changes since initial PR

  • domain-urls response now includes urlId, promptsCited, categories, and regions fields from the enriched RPC
  • url-prompts endpoint added for Phase 3 drilldown (prompt analysis per URL)

Related PRs

Test plan

  • Unit tests pass (lint + syntax checks confirmed; full test suite requires Node 24)
  • Verified end-to-end via local PostgREST → dev Aurora
  • Full-stack validation: UI → API → PostgREST → Aurora with enriched domain-urls response
  • Deploy to dev and verify via deployed UI

JayKid and others added 2 commits April 2, 2026 14:29
4 new endpoints calling the URL Inspector RPCs in mysticat-data-service:
- GET .../url-inspector/stats — aggregate stats + weekly sparklines
- GET .../url-inspector/owned-urls — paginated owned URL citations
- GET .../url-inspector/trending-urls — paginated non-owned URLs
- GET .../url-inspector/cited-domains — domain-level aggregations

Exports shared utilities from llmo-brand-presence.js for reuse.

LLMO-4030

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

This PR will trigger a minor release when merged.

JayKid added 6 commits April 15, 2026 15:29
…e-retrieve-LLMO-4030

Made-with: Cursor

# Conflicts:
#	src/controllers/llmo/llmo-brand-presence.js
#	src/controllers/llmo/llmo-mysticat-controller.js
…O-4030

Add urlId field to domain-urls handler response mapping so the UI can
pass it to the url-prompts endpoint for Phase 3 drilldown.

Made-with: Cursor
… LLMO-4030

Replace broken parseBody(response) helper with standard response.json()
to match the Web Response API used by spacecat-shared-http-utils. Add
comprehensive tests for all error paths (model validation, RPC errors,
missing params, site-org validation) and null-field handling to reach
100% line/branch/statement/function coverage.
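
A minimal sketch of what reading a handler result through the standard Web Response API can look like in a test. `fakeHandler` and its payload are hypothetical stand-ins, not code from this PR, and the sketch assumes Node 18+ where `Response` is a global.

```javascript
// Hypothetical stand-in for a handler that returns a Web Response.
async function fakeHandler() {
  return new Response(JSON.stringify({ urls: [], totalCount: 0 }), {
    status: 200,
    headers: { 'content-type': 'application/json' },
  });
}

// Read the JSON body with the standard Response#json() method
// instead of a custom body-parsing helper.
async function readBody(response) {
  return response.json();
}
```

Note that a Response body can be consumed only once, so a test should call `json()` a single time per response.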

Made-with: Cursor
@codecov

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

…- LLMO-4030

Map the new prompts_cited, categories, and regions fields from the
enriched rpc_url_inspector_domain_urls RPC into the API response.

Made-with: Cursor
@calvarezg
Contributor

Code Review

Nice work on this PR — the architecture is clean, the plan doc is excellent, and the test coverage is thorough. A few items to flag:

Issues

1. Stale JSDoc on createUrlInspectorCitedDomainsHandler

The comment says "No pagination — domain count per site is bounded" but the handler actually implements pagination via parsePaginationParams and passes p_limit/p_offset to the RPC. The plan doc (section 5) confirms this was intentionally changed to be paginated. The JSDoc should be updated to match.
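
To make the pagination path concrete, here is a rough sketch of how query parameters might be turned into `p_limit`/`p_offset` and how `totalCount` might be derived from the rows. `parsePagination`, its defaults and cap, the `p_site_id` argument, and the "read `total_count` from the first row" rule are all assumptions for illustration; the real `parsePaginationParams` may behave differently.

```javascript
// Hypothetical pagination parser; default limit and cap are assumptions.
function parsePagination(query = {}, { defaultLimit = 50, maxLimit = 500 } = {}) {
  const rawLimit = Number.parseInt(query.limit, 10);
  const rawOffset = Number.parseInt(query.offset, 10);
  // Fall back to the default when the limit is missing, non-numeric, or not positive.
  const limit = Number.isInteger(rawLimit) && rawLimit > 0
    ? Math.min(rawLimit, maxLimit)
    : defaultLimit;
  const offset = Number.isInteger(rawOffset) && rawOffset >= 0 ? rawOffset : 0;
  return { limit, offset };
}

// Build the RPC arguments; p_site_id is an assumed parameter name.
function buildCitedDomainsArgs(siteId, query) {
  const { limit, offset } = parsePagination(query);
  return { p_site_id: siteId, p_limit: limit, p_offset: offset };
}

// Assumes every row repeats the overall total in total_count.
function totalCountFrom(rows) {
  return rows.length > 0 ? Number(rows[0].total_count ?? 0) : 0;
}
```

With this shape, rows that lack `total_count` yield a `totalCount` of 0 even when domains are returned, which is why test fixtures need the field to exercise the pagination metadata.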

2. domain-urls and url-prompts don't pass brandId filter

The domain-urls handler and url-prompts handler only destructure spaceCatId from ctx.params — they skip brandId entirely and don't pass p_brand_id to the RPC. However, both have routes registered with :brandId variants. The other 4 handlers all pass p_brand_id. If the RPCs accept this parameter, this is a bug that would return unfiltered-by-brand data during drilldown. If the RPCs don't support it, the :brandId route variants are misleading.

3. domain-urls doesn't pass p_category / p_region filters

The domain-urls RPC call omits category and region filters that the parent endpoints (cited-domains, stats) do pass. If a user has filtered by category/region at the domain level and then drills into a domain, those filters would silently be dropped. Same applies to url-prompts which also omits category/region. This could cause user confusion when drilldown results don't match the parent view. If the RPCs don't support these params, a comment explaining why would help.

4. RPC errors exposed to clients

All handlers return badRequest(error.message) for PostgREST/RPC errors. This could leak internal PostgreSQL error details to API consumers. Consider returning a generic "Internal error processing request" message and keeping the detailed error only in the log.
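
One possible shape for this, sketched under assumptions: `internalServerError` below is a local stand-in built on the global `Response` (Node 18+), not the actual shared helper, and `respondToRpcError` is a hypothetical name.

```javascript
// Hypothetical stand-in for a 500 response helper.
function internalServerError(message) {
  return new Response(JSON.stringify({ message }), {
    status: 500,
    headers: { 'content-type': 'application/json' },
  });
}

// Keep the detailed RPC error in the server log only;
// the client sees a generic message.
function respondToRpcError(error, log) {
  log.error(`URL Inspector RPC failed: ${error.message}`);
  return internalServerError('Internal error processing request');
}
```

The trade-off is that API consumers lose the specific failure reason, so the log line needs enough context to correlate a client report with the underlying error.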

Minor

  • eslint-disable-next-line max-len added for withBrandPresenceAuth export — consider splitting the function signature across lines instead of suppressing the lint rule.
  • cited-domains test happy path: the test data doesn't include a total_count field in the RPC response rows, so totalCount will be 0 even though 2 domains are returned. Add total_count: 2 to the test data to verify the pagination metadata path.
  • Trending URLs grouping with null URL: if the RPC returns rows where url is null, they all group under a single null key in the Map, producing one URL entry with url: "" that aggregates all null rows. Verify this is desired vs. filtering them out.
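
For the last point, a sketch of the "filter them out" option. The input field names (`url`, `prompt`, `citation_count`) follow the fields mentioned in this PR, but the output shape and the `groupTrendingRows` name are assumptions, not the handler's actual code.

```javascript
// Group flat RPC rows by URL, skipping rows that have no URL
// so they never collapse into a single empty-URL entry.
function groupTrendingRows(rows) {
  const byUrl = new Map();
  for (const row of rows) {
    if (row.url == null || row.url === '') continue;
    let entry = byUrl.get(row.url);
    if (!entry) {
      entry = { url: row.url, totalCitations: 0, prompts: [] };
      byUrl.set(row.url, entry);
    }
    // Null numeric and text fields fall back to 0 and ''.
    entry.totalCitations += row.citation_count ?? 0;
    entry.prompts.push({
      prompt: row.prompt ?? '',
      citationCount: row.citation_count ?? 0,
    });
  }
  // Map preserves insertion order, so URLs keep the RPC's row order.
  return [...byUrl.values()];
}
```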

JayKid added 6 commits April 16, 2026 16:14
- Fix stale JSDoc on cited-domains handler (now paginated)
- Document why domain-urls and url-prompts don't pass brandId/category/region
  (underlying RPCs don't accept these params; filtering is at parent level)
- Return internalServerError for RPC failures instead of leaking raw
  PostgreSQL error messages to clients (details remain in server logs)
- Add total_count to cited-domains test happy path data
- Filter out null URL rows in trending-urls grouping

Made-with: Cursor
Summary-table RPCs no longer accept p_brand_id (brand_id is not in
the summary table). Update handlers and tests accordingly.

Made-with: Cursor
Add test for trending URL rows with valid url but null content_type,
prompt, category, region, topics, citation_count, execution_count,
and total_non_owned_urls to reach 100% branch coverage.

Made-with: Cursor
Document the 6 URL Inspector API endpoints under the org-scoped
brand-presence path:
- stats: aggregate citation stats + weekly sparklines
- owned-urls: paginated owned URL citations with WoW trends
- trending-urls: paginated non-owned URLs grouped by URL
- cited-domains: domain-level citation aggregations
- domain-urls: Phase 2 drilldown into domain URLs
- url-prompts: Phase 3 drilldown into prompts per URL

Adds response schemas and path references. Validated with docs:lint
and docs:build.

Made-with: Cursor
…e-retrieve-LLMO-4030

Resolve merge conflict in docs/index.html by regenerating from
merged OpenAPI sources.

Made-with: Cursor
Addresses review feedback asking for brandalf-style .md docs for the new
URL Inspector endpoints so AI / on-call engineers have a reliable reference.

Adds one doc per endpoint (stats, owned-urls, trending-urls, cited-domains,
domain-urls, url-prompts) plus a consolidated url-inspector-apis-overview.md,
and links the new overview from brand-presence-apis-overview.md. Each doc
covers path shape, query params (incl. aliases and defaults), RPC signature
with conceptual SQL, response shape, sample URLs, error responses, and
auth/access expectations.

Made-with: Cursor
JayKid added 4 commits April 21, 2026 16:37
…-4030

Replace the single rpc_url_inspector_stats call with a Promise.all fanout
across the four per-KPI RPCs that land in mysticat-data-service
(total_prompts, total_prompts_cited, unique_urls, total_citations). The
response shape is unchanged; the controller unions weeks across the four
streams and keeps missing metrics at 0.

Thread ctx.params.brandId through as p_brand_id on every call (with
'all' -> NULL), matching the pattern the other URL Inspector handlers
already use. This enables proper brand scoping, which the old
summary-table RPC could not support.

End-to-end latency is now max-of-four across the RPCs instead of sum-of-
four (or the 60s+ of the broken summary RPC), so ~1.5s warm on adobe.com
28-day without a brand filter, ~3.5s with one (bounded by total_prompts).
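
The fanout and reassembly described above could be sketched roughly as follows. The four metric names come from this commit message; the RPC names built from them, the `p_site_id` argument, the `{ week, value }` row shape, and the injected `callRpc` client are assumptions for illustration only.

```javascript
// Metric names from the commit message.
const METRICS = ['total_prompts', 'total_prompts_cited', 'unique_urls', 'total_citations'];

// 'all' (or a missing brand) maps to null so the RPC applies no brand filter.
function toBrandParam(brandId) {
  return !brandId || brandId === 'all' ? null : brandId;
}

// callRpc(name, args) is a hypothetical client resolving to [{ week, value }] rows.
async function fetchWeeklyStats(callRpc, siteId, brandId) {
  const args = { p_site_id: siteId, p_brand_id: toBrandParam(brandId) };
  // Run the four RPCs in parallel: latency is the slowest call, not the sum.
  // The RPC names here are assumed, not taken from mysticat-data-service.
  const results = await Promise.all(
    METRICS.map((metric) => callRpc(`rpc_url_inspector_${metric}`, args)),
  );
  const weeks = new Map();
  results.forEach((rows, i) => {
    for (const { week, value } of rows) {
      if (!weeks.has(week)) {
        // Start every metric at 0 so a gap in one stream stays 0.
        weeks.set(week, Object.fromEntries(METRICS.map((m) => [m, 0])));
      }
      weeks.get(week)[METRICS[i]] = value;
    }
  });
  // Union of weeks across the four streams, sorted by week label.
  return [...weeks.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([week, metrics]) => ({ week, ...metrics }));
}
```

Because `Promise.all` rejects as soon as any one call rejects, a caller would typically wrap `fetchWeeklyStats` in try/catch to log the failure and return a 500.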

Also:
- Wrap Promise.all in try/catch so thrown exceptions log and return 500
  cleanly instead of crashing the handler.
- Enrich the RPC-error log with PostgREST code/details/hint so the next
  failure is easier to diagnose.
- Rewrite the stats tests to target the four split RPCs, add a
  statsRpcResults() helper, and add coverage for brandId threading,
  parallel fanout, and the union-of-weeks reassembly.
- Rewrite the URL Inspector Stats API doc to describe the fanout, the
  p_brand_id semantics, and the plan-shape rule.

See mysticat-data-service/docs/plans/2026-04-02-url-inspector-performance.md
Experiment 6 for the rationale and benchmarks.

Made-with: Cursor
…dler - LLMO-4030

The stats handler's try/catch around Promise.all and the
code/details/hint string-building in the RPC-error log were uncovered,
dropping coverage below the 100% global threshold. Add four focused
tests: one for message-only PostgREST errors (exercises the falsy
branches of the code/details/hint ternaries), one with all three
populated (truthy branches), one for a rejected Promise with an Error
instance, and one for a bare-string rejection (exercises the
`e?.message || e` fallback).
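
A small sketch of the two branches these tests exercise. The `code`, `details`, and `hint` fields and the `e?.message || e` fallback come from the commit message; the function names and the exact log format are assumptions.

```javascript
// Build a log line from a PostgREST-style error object.
// Optional fields are appended only when present (the ternary branches).
function formatRpcError(error) {
  const code = error.code ? ` code=${error.code}` : '';
  const details = error.details ? ` details=${error.details}` : '';
  const hint = error.hint ? ` hint=${error.hint}` : '';
  return `${error.message}${code}${details}${hint}`;
}

// Normalize a rejection value: an Error instance yields its message,
// a bare string is returned as-is.
function thrownMessage(e) {
  return e?.message || e;
}
```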

Made-with: Cursor
Resolve docs/index.html conflict by regenerating via npm run docs:build
against the merged docs/openapi/ sources (auto-merged cleanly). All
other files auto-merged without conflicts.

Made-with: Cursor
The pinned v1.56.0 tag has been evicted from ECR (ECR lifecycle
retention), causing ci/it-postgres to fail on both main and this branch
with "manifest unknown". Bump to v1.67.8 (released 2026-04-21, latest
available image) to restore integration tests.

This is a fix for a pre-existing main-branch issue that surfaced on
this PR after the last merge; landing it here unblocks both branches.

LLMO-4030

Made-with: Cursor

2 participants