19 commits
041d166
feat: add URL Inspector PG API endpoints
JayKid Apr 2, 2026
a2548dd
feat: add domain URL drilldown and URL prompt breakdown endpoints - L…
JayKid Apr 15, 2026
d1ab5e4
feat: add pagination to cited-domains handler - LLMO-4030
JayKid Apr 15, 2026
7ac2d81
Merge remote-tracking branch 'origin/main' into feat/find-way-to-stor…
JayKid Apr 15, 2026
b2eaeed
fix: review
JayKid Apr 15, 2026
25e7823
fix: add curly braces to satisfy ESLint curly rule
JayKid Apr 15, 2026
4e622b7
fix: include urlId in domain-urls response for prompt drilldown - LLM…
JayKid Apr 16, 2026
4ca6921
fix: repair URL Inspector test assertions and achieve 100% coverage -…
JayKid Apr 16, 2026
8dbf8e1
feat: pass promptsCited, categories, regions in domain-urls response …
JayKid Apr 16, 2026
25e5919
fix: address PR review feedback from calvarezg
JayKid Apr 16, 2026
43b467d
fix: remove p_brand_id from stats and cited-domains RPC calls
JayKid Apr 16, 2026
8a52464
fix: cover null-field branches in trending-urls handler - LLMO-4030
JayKid Apr 16, 2026
801fe2b
feat: add URL Inspector endpoints to OpenAPI specification - LLMO-4030
JayKid Apr 16, 2026
73ea9f2
Merge remote-tracking branch 'origin/main' into feat/find-way-to-stor…
JayKid Apr 16, 2026
ac497ef
docs: add URL Inspector API reference docs - LLMO-4030
JayKid Apr 20, 2026
c39d694
refactor: fan out URL Inspector stats to four RPCs in parallel - LLMO…
JayKid Apr 21, 2026
59c9466
test: cover thrown-reject and code/details/hint branches in stats han…
JayKid Apr 21, 2026
3474af7
Merge branch 'main' into feat/find-way-to-store-retrieve-LLMO-4030
JayKid Apr 21, 2026
857212b
fix(ci): bump mysticat-data-service IT image from v1.56.0 to v1.67.8
JayKid Apr 21, 2026
470 changes: 421 additions & 49 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions docs/llmo-brandalf-apis/brand-presence-apis-overview.md
@@ -11,6 +11,8 @@ Parameters are typically supplied as **query string** fields (merged into reques

Deep-dive docs: [filter-dimensions](filter-dimensions-api.md), [weeks](brand-presence-weeks-api.md), [sentiment-overview](sentiment-overview-api.md), [market-tracking-trends](market-tracking-trends-api.md), [topics & prompts](topics-api.md), [search](search-api.md), [topic detail](topic-detail-api.md), [prompt detail](prompt-detail-api.md), [sentiment-movers](sentiment-movers-api.md), [share-of-voice](share-of-voice-api.md), [stats](brand-presence-stats-api.md), [execution sources](execution-sources-api.md).

**URL Inspector APIs** (site-scoped, separate suite): see [url-inspector-apis-overview.md](url-inspector-apis-overview.md) — covers `stats`, `owned-urls`, `trending-urls`, `cited-domains`, `domain-urls`, `url-prompts`.

---

## Master table
181 changes: 181 additions & 0 deletions docs/llmo-brandalf-apis/url-inspector-apis-overview.md
@@ -0,0 +1,181 @@
# URL Inspector APIs — consolidated reference

Single entry point for all URL Inspector HTTP APIs backed by mysticat-data-service (PostgREST). These endpoints power the **URL Inspector** dashboard (Owned URLs, Trending URLs, Domains tabs, plus domain and URL drill-downs).

**Path pattern:** `GET /org/:spaceCatId/brands/{all|:brandId}/brand-presence/url-inspector/...`

- `:spaceCatId` — organization UUID.
- `all` — aggregate across brands; `:brandId` — filter to one brand (**only** applied by `owned-urls` and `trending-urls`; other endpoints ignore it because their RPC or summary table does not carry `brand_id`).

All URL Inspector endpoints are **site-scoped**. `siteId` is **required** on every call and is validated against the organization. Unlike the sibling `/brand-presence/*` endpoints, the model (`model` / `platform`) has **no default** — when omitted, no model filter is applied.
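
As a rough illustration of the path pattern and the required `siteId`, the sketch below assembles a request URL. The helper name `build_url_inspector_url` and the placeholder IDs are hypothetical and not part of the service; only the path segments and parameter names come from this document.

```python
from urllib.parse import urlencode

ENDPOINTS = {"stats", "owned-urls", "trending-urls", "cited-domains", "domain-urls", "url-prompts"}


def build_url_inspector_url(space_cat_id, endpoint, site_id, brand_id=None, **query):
    """Hypothetical helper: build a URL Inspector request path with its query string."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    if not site_id:
        # siteId is required on every URL Inspector call.
        raise ValueError("siteId is required")
    brand_segment = brand_id if brand_id else "all"
    # Drop unset optional parameters; an omitted model means no model filter.
    params = {"siteId": site_id, **{k: v for k, v in query.items() if v is not None}}
    return (
        f"/org/{space_cat_id}/brands/{brand_segment}"
        f"/brand-presence/url-inspector/{endpoint}?{urlencode(params)}"
    )


print(build_url_inspector_url("org-1", "cited-domains", "site-1", channel="earned"))
```

Leaving `model` out of the keyword arguments (or passing `None`) keeps it out of the query string, which matches the "no default model" behaviour described above.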

Deep-dive docs: [stats](url-inspector-stats-api.md), [owned-urls](url-inspector-owned-urls-api.md), [trending-urls](url-inspector-trending-urls-api.md), [cited-domains](url-inspector-cited-domains-api.md), [domain-urls](url-inspector-domain-urls-api.md), [url-prompts](url-inspector-url-prompts-api.md).

---

## Master table

| # | Method | API path | Purpose | Required query params | Optional query params | Detail doc |
|---|--------|----------|---------|-----------------------|-----------------------|------------|
| 1 | GET | `/org/:spaceCatId/brands/{all\|:brandId}/brand-presence/url-inspector/stats` | Aggregate citation stats + per-ISO-week sparkline trends from the summary table. | `siteId` | `startDate`, `endDate`, `model`, `categoryId`, `regionCode` | [url-inspector-stats-api.md](url-inspector-stats-api.md) |
| 2 | GET | `/org/:spaceCatId/brands/{all\|:brandId}/brand-presence/url-inspector/owned-urls` | Paginated owned URLs with per-URL citations, prompts, products, regions, and weekly arrays. | `siteId` | `startDate`, `endDate`, `model`, `categoryId`, `regionCode`, `page`, `pageSize` (default `50`). `:brandId` applied. | [url-inspector-owned-urls-api.md](url-inspector-owned-urls-api.md) |
| 3 | GET | `/org/:spaceCatId/brands/{all\|:brandId}/brand-presence/url-inspector/trending-urls` | Paginated non-owned URLs grouped by URL with a `prompts[]` breakdown per URL. | `siteId` | `startDate`, `endDate`, `model`, `categoryId`, `regionCode`, `channel`, `page`, `pageSize` (default `50`). `:brandId` applied. | [url-inspector-trending-urls-api.md](url-inspector-trending-urls-api.md) |
| 4 | GET | `/org/:spaceCatId/brands/{all\|:brandId}/brand-presence/url-inspector/cited-domains` | Paginated domain aggregation (total citations / URLs / prompts + dominant content type). | `siteId` | `startDate`, `endDate`, `model`, `categoryId`, `regionCode`, `channel`, `page`, `pageSize` (default `50`). `:brandId` **not** applied. | [url-inspector-cited-domains-api.md](url-inspector-cited-domains-api.md) |
| 5 | GET | `/org/:spaceCatId/brands/{all\|:brandId}/brand-presence/url-inspector/domain-urls` | Paginated URLs inside a single hostname. | `siteId`, `hostname` (alias `domain`) | `startDate`, `endDate`, `model`, `channel`, `page`, `pageSize` (default `50`). `:brandId` / category / region **not** applied. | [url-inspector-domain-urls-api.md](url-inspector-domain-urls-api.md) |
| 6 | GET | `/org/:spaceCatId/brands/{all\|:brandId}/brand-presence/url-inspector/url-prompts` | All prompts (with category, region, topics, citations) that cited a single URL. Unpaginated. | `siteId`, `urlId` (alias `url_id`) | `startDate`, `endDate`, `model`. `:brandId` / category / region **not** applied. | [url-inspector-url-prompts-api.md](url-inspector-url-prompts-api.md) |

---

## Data sources

Two backing stores, selected per RPC for performance reasons:

| Store | Used by | Why |
|-------|---------|-----|
| `url_inspector_domain_stats` (summary table, keyed by `site_id, execution_date, model, hostname, content_type`) | `stats`, `cited-domains` | 100× faster than raw tables on large sites. Carries pre-aggregated `unique_prompts`, `unique_urls`, `citation_count`, plus `categories TEXT[]` and `regions TEXT[]`. **No `brand_id`** — that's why brand filtering is not available on these endpoints. |
| `brand_presence_sources` + `brand_presence_executions` + `source_urls` (raw tables) | `owned-urls`, `trending-urls`, `domain-urls`, `url-prompts` | Need exact per-URL counts, per-prompt breakdowns, or ISO-week arrays that the summary does not carry. |

Summary-table endpoints return **approximate** `unique_urls` / `prompts_cited` — a URL or prompt appearing across multiple (hostname, date, model, content_type) groups is counted once per group. This is an accepted trade-off documented in migration `20260428120100_url_inspector_rpcs_summary_table.sql`.
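
A small worked example may make the overcounting concrete. The rows are invented for illustration, and each group holds a set of URLs only so that the exact answer can be computed alongside; the real summary table stores the per-group counts, not the sets.

```python
# Invented sample: each summary group records the distinct URLs it saw.
groups = [
    {"execution_date": "2026-03-02", "model": "chatgpt", "urls": {"/a", "/b"}},
    {"execution_date": "2026-03-03", "model": "chatgpt", "urls": {"/a", "/c"}},
    {"execution_date": "2026-03-03", "model": "gemini", "urls": {"/a"}},
]

# What a summary-table endpoint can offer: the sum of per-group distinct counts.
approximate_unique_urls = sum(len(g["urls"]) for g in groups)

# What an exact raw-table query would return: distinct URLs overall.
exact_unique_urls = len(set().union(*(g["urls"] for g in groups)))

print(approximate_unique_urls, exact_unique_urls)  # 5 3
```

Here `/a` appears in all three groups, so it contributes three times to the approximate figure and once to the exact one.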

---

## Common parameter semantics

- `model` / `platform` — when present, validated against the `llmo_execution_model` enum (`chatgpt`, `gemini`, `claude`, etc.) via `map_llmo_execution_model_input()`. When absent, the RPC applies **no** model filter (the SQL pattern is `v_platform IS NULL OR model = v_platform`).
- `channel` / `selectedChannel` — exact match on `brand_presence_sources.content_type` (`owned`, `earned`, `paid`, `partner`). `trending-urls` further hardcodes `content_type != 'owned'`.
- `categoryId` — summary-table endpoints match via `= ANY(categories)` (array membership: the value must equal one element of the row's `categories` array); raw-table endpoints match exactly on `brand_presence_executions.category_name`.
- `regionCode` — analogous (`ANY(regions)` vs exact `region_code`).
- Pagination — `parsePaginationParams(ctx, { defaultPageSize: 50 })`; `pageSize` is clamped to `[1, 1000]`. All paginated RPCs return a `total_count` (or `total_non_owned_urls` for trending) on every row; the controller reads it from the first row.
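
The pagination rules above can be sketched as follows. This is not the service's `parsePaginationParams` implementation; the function names and the fallback to the default for unparsable input are assumptions made for illustration.

```python
DEFAULT_PAGE_SIZE = 50
MAX_PAGE_SIZE = 1000


def clamp_page_size(raw, default=DEFAULT_PAGE_SIZE):
    """Clamp a requested pageSize to [1, 1000]; unparsable input falls back to the default (assumption)."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(1, min(MAX_PAGE_SIZE, value))


def read_total_count(rows, key="total_count"):
    """Every row repeats the total, so read it from the first row; an empty page yields 0."""
    return rows[0][key] if rows else 0


print(clamp_page_size("5000"))  # 1000
print(read_total_count([{"total_count": 412}, {"total_count": 412}]))  # 412
```

For `trending-urls`, the same reader would be called with `key="total_non_owned_urls"`.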

---

## Example responses

### 1. Stats

```json
{
  "stats": { "totalPromptsCited": 312, "totalPrompts": 1250, "uniqueUrls": 187, "totalCitations": 964 },
  "weeklyTrends": [
    { "week": "2026-W10", "totalPromptsCited": 48, "totalPrompts": 180, "uniqueUrls": 42, "totalCitations": 155 }
  ]
}
```
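
If the `week` labels need to be placed on a date axis, one option is to convert each label to the Monday of that ISO week. The sketch assumes the labels always follow the `YYYY-Www` form shown above; `week_start` is a hypothetical helper name.

```python
from datetime import date


def week_start(label):
    """Return the Monday of an ISO week label such as '2026-W10'."""
    year_part, week_part = label.split("-W")
    return date.fromisocalendar(int(year_part), int(week_part), 1)


print(week_start("2026-W10"))  # 2026-03-02
```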

### 2. Owned URLs

```json
{
  "urls": [
    {
      "url": "https://www.example.com/pdf-editor",
      "citations": 42,
      "promptsCited": 18,
      "products": ["Acrobat"],
      "regions": ["US", "GB"],
      "weeklyCitations": [{ "week": "2026-W10", "value": 15 }],
      "weeklyPromptsCited": [{ "week": "2026-W10", "value": 7 }]
    }
  ],
  "totalCount": 187
}
```

### 3. Trending URLs

```json
{
  "urls": [
    {
      "url": "https://review-site.example.com/pdf-editors",
      "contentType": "earned",
      "totalCitations": 57,
      "prompts": [
        {
          "prompt": "best pdf editor for mac",
          "category": "Acrobat",
          "region": "US",
          "topics": "PDF Editing",
          "citationCount": 32,
          "executionCount": 4
        }
      ]
    }
  ],
  "totalNonOwnedUrls": 412
}
```

### 4. Cited Domains

```json
{
  "domains": [
    {
      "domain": "www.example.com",
      "totalCitations": 128,
      "totalUrls": 17,
      "promptsCited": 63,
      "contentType": "earned",
      "categories": "Acrobat,Analytics",
      "regions": "US,GB,DE"
    }
  ],
  "totalCount": 412
}
```

### 5. Domain URLs

```json
{
  "urls": [
    {
      "urlId": "019cba12-b404-7077-9aa1-2992346a1767",
      "url": "https://www.example.com/pdf-editor",
      "contentType": "earned",
      "citations": 42,
      "promptsCited": 18,
      "categories": "Acrobat,Analytics",
      "regions": "US,GB"
    }
  ],
  "totalCount": 17
}
```

### 6. URL Prompts

```json
{
  "prompts": [
    {
      "prompt": "best pdf editor for mac",
      "category": "Acrobat",
      "region": "US",
      "topics": "PDF Editing",
      "citations": 32
    }
  ]
}
```

---

## Authentication & errors

- Protected by `withBrandPresenceAuth` + `getOrgAndValidateAccess`; requires LLMO product entitlement and org access.
- All endpoints require a `siteId` that belongs to the organization.
- Common errors: **400** (missing `siteId` / `hostname` / `urlId`, invalid `model`, RPC error); **403** (site not in org or no org access); **500** (RPC exception, logged with endpoint-specific message).
- Routes are defined in `INTERNAL_ROUTES` — not exposed to S2S consumers.

---

## Ticket context

Introduced to solve the "brand_presence_sources too big to retrieve" problem from the URL Inspector dashboard (JIRA: LLMO-4030). The summary table + dedicated RPCs replace previous direct PostgREST queries that timed out on large sites.

See related PRs:
- mysticat-data-service RPCs: [#194](https://github.com/adobe/mysticat-data-service/pull/194)
- spacecat-api-service backend: [#2012](https://github.com/adobe/spacecat-api-service/pull/2012) (+ follow-up)
- project-elmo-ui frontend: [#1304](https://github.com/adobe/project-elmo-ui/pull/1304) / [#1429](https://github.com/adobe/project-elmo-ui/pull/1429)
147 changes: 147 additions & 0 deletions docs/llmo-brandalf-apis/url-inspector-cited-domains-api.md
@@ -0,0 +1,147 @@
# URL Inspector Cited Domains API

Paginated domain-level citation aggregation: one row per hostname with total citations, distinct URLs, distinct prompts, dominant content type, plus comma-separated category/region breakdowns. Backs the "Domains" tab in the URL Inspector dashboard.

Computed by `rpc_url_inspector_cited_domains` against the `url_inspector_domain_stats` summary table (~1 s on large sites, same source as `stats`). Category and region strings are resolved via LATERAL joins on the **paginated** result rows only, so the cost of building those strings scales with `pageSize`, not with the total domain count.

---

## API Paths

| Method | Path | Description |
|--------|------|-------------|
| GET | `/org/:spaceCatId/brands/all/brand-presence/url-inspector/cited-domains` | All brands for the site |
| GET | `/org/:spaceCatId/brands/:brandId/brand-presence/url-inspector/cited-domains` | Brand UUID accepted but **not applied** (summary table has no `brand_id`) |

---

## Scope

Like `stats`, this endpoint is **site-scoped**. The summary table does not carry `brand_id`, so the `:brandId` path segment does not filter results.

---

## Query Parameters

| Parameter | Aliases | Type | Required | Default | Description |
|-----------|---------|------|----------|---------|-------------|
| `siteId` | `site_id` | string (UUID) | **yes** | — | Site UUID |
| `startDate` | `start_date` | string | no | 28 days ago | |
| `endDate` | `end_date` | string | no | today | |
| `model` | `platform` | string | no | unset | LLM model enum |
| `categoryId` | `category_id`, `category` | string | no | — | Matches via `ANY(categories)` on the summary row |
| `regionCode` | `region_code`, `region` | string | no | — | Matches via `ANY(regions)` on the summary row |
| `channel` | `selectedChannel` | string | no | — | Exact match on `content_type` (`owned`, `earned`, `paid`, `partner`) |
| `page` | — | integer ≥ 0 | no | `0` | |
| `pageSize` | — | integer 1–1000 | no | `50` | |

---

## RPC Usage

**Function:** `rpc_url_inspector_cited_domains(UUID, DATE, DATE, TEXT, TEXT, TEXT, TEXT, INTEGER, INTEGER)`

| RPC Parameter | API Source |
|---------------|------------|
| `p_site_id` | `siteId` |
| `p_start_date` | `startDate` |
| `p_end_date` | `endDate` |
| `p_category` | `categoryId` |
| `p_region` | `regionCode` |
| `p_channel` | `channel` |
| `p_platform` | `model` |
| `p_limit` | `pageSize` |
| `p_offset` | `page * pageSize` |
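
The mapping above, together with the aliases from the query-parameter table, can be sketched as a small translation step. This is an illustration only: `to_rpc_params` and `pick` are hypothetical names, the alias precedence (first match wins) and the ISO `YYYY-MM-DD` date format are assumptions, and validation of `model` against the enum is omitted.

```python
from datetime import date, timedelta

# Canonical name first, then the documented aliases.
ALIASES = {
    "siteId": ("siteId", "site_id"),
    "startDate": ("startDate", "start_date"),
    "endDate": ("endDate", "end_date"),
    "model": ("model", "platform"),
    "categoryId": ("categoryId", "category_id", "category"),
    "regionCode": ("regionCode", "region_code", "region"),
    "channel": ("channel", "selectedChannel"),
}


def pick(query, name):
    """Return the first value present under the canonical name or one of its aliases."""
    for key in ALIASES[name]:
        if query.get(key) is not None:
            return query[key]
    return None


def to_rpc_params(query, page=0, page_size=50, today=None):
    """Translate API query parameters into the RPC arguments listed in the table."""
    today = today or date.today()
    site_id = pick(query, "siteId")
    if site_id is None:
        raise ValueError("siteId is required")
    return {
        "p_site_id": site_id,
        # Defaults from the query-parameter table: 28 days ago and today.
        "p_start_date": pick(query, "startDate") or (today - timedelta(days=28)).isoformat(),
        "p_end_date": pick(query, "endDate") or today.isoformat(),
        "p_category": pick(query, "categoryId"),
        "p_region": pick(query, "regionCode"),
        "p_channel": pick(query, "channel"),
        "p_platform": pick(query, "model"),  # None means no model filter
        "p_limit": page_size,
        "p_offset": page * page_size,
    }
```

Passing `today` explicitly keeps the date defaults deterministic, which is convenient when testing the translation.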

**Conceptual SQL:**
```sql
WITH grouped AS (
  SELECT hostname,
         SUM(citation_count) AS total_citations,
         SUM(unique_urls) AS total_urls,
         SUM(unique_prompts) AS prompts_cited,
         MODE() WITHIN GROUP (ORDER BY content_type::TEXT) AS dom_content_type
  FROM url_inspector_domain_stats
  WHERE site_id = p_site_id
    AND execution_date BETWEEN p_start_date AND p_end_date
    AND (v_platform IS NULL OR model = v_platform)
    AND (p_channel IS NULL OR content_type::TEXT = p_channel)
    AND (p_category IS NULL OR p_category = ANY(categories))
    AND (p_region IS NULL OR p_region = ANY(regions))
  GROUP BY hostname
),
total AS (SELECT COUNT(*) AS cnt FROM grouped),
ranked AS (SELECT *, (SELECT cnt FROM total) AS total_cnt FROM grouped
           ORDER BY total_citations DESC LIMIT p_limit OFFSET p_offset)
SELECT hostname AS domain, total_citations, total_urls, prompts_cited,
       dom_content_type AS content_type,
       cat_lateral AS categories, reg_lateral AS regions, total_cnt AS total_count
FROM ranked
LEFT JOIN LATERAL (...) cat_lateral ON true
LEFT JOIN LATERAL (...) reg_lateral ON true;
```

**Notes:**
- `total_urls` and `prompts_cited` are **approximate** for the same reason as `stats`: they sum per-group distinct counts from the summary table. Exact counts require the raw-table path.
- `content_type` per domain is the **statistical mode** across that domain's rows.
- `categories` and `regions` strings are comma-joined distinct tokens from all summary rows for each paginated hostname.
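
To make the grouping step concrete, the sketch below mimics the conceptual SQL in plain Python over in-memory rows. It is not the RPC: `aggregate_domains` is a hypothetical name, only the channel and category filters are shown, the LATERAL category/region joins are left out, and ties for the mode are resolved as `statistics.mode` does (first value encountered), which may differ from PostgreSQL.

```python
from collections import defaultdict
from statistics import mode


def aggregate_domains(rows, limit=50, offset=0, channel=None, category=None):
    """Group summary rows by hostname, sum the counters, and page the result."""
    grouped = defaultdict(list)
    for row in rows:
        # Same pattern as the SQL: a filter that is None is not applied.
        if channel is not None and row["content_type"] != channel:
            continue
        if category is not None and category not in row["categories"]:
            continue
        grouped[row["hostname"]].append(row)

    domains = [
        {
            "domain": hostname,
            "total_citations": sum(r["citation_count"] for r in group),
            "total_urls": sum(r["unique_urls"] for r in group),  # approximate
            "prompts_cited": sum(r["unique_prompts"] for r in group),  # approximate
            "content_type": mode([r["content_type"] for r in group]),
        }
        for hostname, group in grouped.items()
    ]
    domains.sort(key=lambda d: d["total_citations"], reverse=True)
    page = domains[offset:offset + limit]
    # The total is attached to every returned row, as the RPC does.
    return [{**d, "total_count": len(domains)} for d in page]
```

Because the total is computed before slicing, `total_count` reflects all matching hostnames, not only those on the returned page.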

---

## Response Shape

```json
{
"domains": [
{
"domain": "www.example.com",
"totalCitations": 128,
"totalUrls": 17,
"promptsCited": 63,
"contentType": "earned",
"categories": "Acrobat,Analytics",
"regions": "US,GB,DE"
}
],
"totalCount": 412
}
```

- `domains[]` — up to `pageSize` entries, ordered by `totalCitations DESC`.
- `totalCount` — total distinct hostnames matching the filters across the full date window, not just the current page; lifted from the first RPC row (`0` for empty pages).
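
A client that wants every domain can keep requesting pages until `page * pageSize` reaches `totalCount`. The sketch below assumes a caller-supplied `fetch_page(page, page_size)` function that returns the parsed response body (a hypothetical stand-in for the HTTP call), and it also splits the comma-separated `categories` / `regions` strings into lists, assuming individual names contain no commas.

```python
def split_tokens(value):
    """Split a comma-separated string such as 'US,GB,DE'; empty or missing values give []."""
    return [token for token in (value or "").split(",") if token]


def iter_domains(fetch_page, page_size=50):
    """Yield every domain row by walking pages until totalCount is exhausted."""
    page = 0
    while True:
        body = fetch_page(page, page_size)
        for domain in body["domains"]:
            yield {
                **domain,
                "categories": split_tokens(domain.get("categories")),
                "regions": split_tokens(domain.get("regions")),
            }
        page += 1
        # Stop when the next offset is past the total, or the page came back empty.
        if not body["domains"] or page * page_size >= body["totalCount"]:
            break
```

The empty-page check guards against looping forever if `totalCount` changes between requests.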

---

## Sample URLs

```
GET /org/44568c3e-.../brands/all/brand-presence/url-inspector/cited-domains?siteId=c2473d89-...&page=0&pageSize=50
```

```
GET /org/44568c3e-.../brands/all/brand-presence/url-inspector/cited-domains?siteId=c2473d89-...&channel=earned&regionCode=US&model=chatgpt
```

---

## Error Responses

| Status | Condition |
|--------|-----------|
| 400 | `siteId` missing; invalid `model`; RPC error |
| 403 | Site not in organization; no org access |
| 500 | RPC exception |

---

## Authentication & Access

Standard URL Inspector auth pipeline — `withBrandPresenceAuth`, `getOrgAndValidateAccess`, site–org validation.

---

## Related APIs

- [URL Inspector APIs Overview](./url-inspector-apis-overview.md)
- [URL Inspector Domain URLs API](./url-inspector-domain-urls-api.md) — drill down into a single domain
- [URL Inspector Stats API](./url-inspector-stats-api.md) — shares the summary table