feat(url-inspector-refresh): worker handler + PostgREST client (LLMO-4563) by JayKid · Pull Request #280 · adobe/spacecat-task-processor

JayKid · 2026-05-18T14:30:18Z

Summary

Adds the task-processor worker side of the LLMO-4563 ongoing-refresh strategy: a new url-inspector-refresh handler that, given { type, siteId }, calls the per-site staleness RPC and refreshes any stale (site, month) slices via the existing wrpc_refresh_url_inspector_domain_stats. The companion every-30-min dispatcher (which feeds this queue) and the SQL migration land in separate PRs:

Dispatcher (one-per-site SQS fan-out on the every-30-min schedule): adobe/spacecat-jobs-dispatcher#714
SQL migration (rpc_url_inspector_stale_slices_for_site + supporting index + dev runbook): adobe/mysticat-data-service#611 — must merge first or this handler 404s on the staleness RPC

What's in here

| File | Purpose |
| --- | --- |
| `src/utils/postgrest-client.js` (new) | ~180-line `fetch`-based PostgREST client with a single `.rpc(name, params)` method |
| `src/tasks/url-inspector-refresh/handler.js` (new) | The worker handler: staleness query + per-month refresh loop |
| `src/index.js` | Registers `'url-inspector-refresh': urlInspectorRefresh` in `HANDLERS` |
| `test/utils/postgrest-client.test.js` (new) | 24 unit tests, 100% coverage on the client |
| `test/tasks/url-inspector-refresh/url-inspector-refresh.test.js` (new) | 16 unit tests, 100% coverage on the handler |

Design notes

Why a new mini PostgREST client instead of `@adobe/spacecat-shared-data-access` v3

The shared v3 package is activated by setting DATA_SERVICE_PROVIDER=postgres in the runtime env, but task-processor today is in DynamoDB mode for every other handler — disable-import-audit-processor, opportunity-status-processor, agent-executor, etc. all read dataAccess.Site/dataAccess.Configuration from DDB via src/support/data-access.js. Flipping the global flag would change all of them in one step.

postgrest-client.js lets a single handler opt into PostgREST without that side effect. It exposes the subset of the supabase-js shape that the api-service already uses (const { data, error } = await client.rpc(...)), so a future global migration to v3 is a one-line swap.
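A minimal sketch of the client shape described here and in the commit message further down: a single `.rpc()`, a supabase-style `{ data, error }` return, `Content-Profile` / `Accept-Profile` / `Authorization: Bearer` headers, a per-request `AbortController` composed with an optional external signal, and a never-throws contract. This is not the PR's actual `postgrest-client.js`; the factory name `createPostgrestClient` and the options `baseUrl`, `apiKey`, `schema`, `timeoutMs`, and `fetchImpl` are assumptions, and `fetchImpl` exists only so the sketch can be exercised without a network.

```javascript
// Hypothetical sketch of a fetch-based PostgREST client exposing a single
// rpc(name, params) method that never throws and returns { data, error }.
function createPostgrestClient({
  baseUrl,
  apiKey,
  schema = 'public',
  timeoutMs = 30000, // assumed default timeout
  fetchImpl = globalThis.fetch, // injectable for tests
}) {
  const root = baseUrl.replace(/\/+$/, '');

  async function rpc(name, params = {}, { signal } = {}) {
    // Per-request controller so each call gets its own timeout.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(new Error('timeout')), timeoutMs);
    // Compose an optional caller signal: aborting it aborts this request too.
    const onExternalAbort = () => controller.abort(signal.reason);
    if (signal?.aborted) controller.abort(signal.reason);
    else signal?.addEventListener('abort', onExternalAbort, { once: true });

    try {
      const res = await fetchImpl(`${root}/rpc/${encodeURIComponent(name)}`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          // PostgREST reads Content-Profile on POST; Accept-Profile is sent too.
          'Content-Profile': schema,
          'Accept-Profile': schema,
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify(params),
        signal: controller.signal,
      });
      const text = await res.text();
      let body = null;
      if (text) {
        try {
          body = JSON.parse(text);
        } catch {
          body = text; // non-JSON body: keep it as a string
        }
      }
      if (!res.ok) {
        // PostgREST error bodies are usually { code, message, details, hint }.
        const details = body && typeof body === 'object'
          ? body
          : { message: body ?? `HTTP ${res.status}` };
        return { data: null, error: { ...details, status: res.status } };
      }
      return { data: body, error: null };
    } catch (err) {
      // Network failures and aborts become an error value instead of a throw.
      return { data: null, error: { status: 0, message: err?.message ?? String(err) } };
    } finally {
      clearTimeout(timer);
      signal?.removeEventListener('abort', onExternalAbort);
    }
  }

  return { rpc };
}
```

Because every failure path resolves to `{ data: null, error }`, callers only ever branch on `error`, which is what lets the handler built on top of it stay throw-free.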

Failure model: no throws, no DLQ — leans on the next 30-min tick instead

The spacecat-task-processor-jobs queue runs with maxReceiveCount=1 (spacecat-infrastructure/modules/sqs/queues.tf:31). Combined with the fact that processTask in src/index.js catches all handler throws and returns internalServerError() (which from SQS's perspective is a successful Lambda invocation), there is no path by which a thrown error here would actually reach the DLQ.
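To make that argument concrete, here is a hypothetical, heavily simplified model of the dispatch in `src/index.js`, including the `HANDLERS` registration this PR adds. `ok()` and `internalServerError()` are local stand-ins for the real response helpers, the stubbed handlers are placeholders, and the real `processTask` may differ in detail; the point is only that a caught throw still ends the Lambda invocation successfully.

```javascript
// Hypothetical, simplified model of task dispatch. ok() and
// internalServerError() are local stand-ins for the real response helpers.
const ok = (body) => ({ status: 200, body });
const internalServerError = (message) => ({ status: 500, body: { message } });

const HANDLERS = {
  // Registration added by this PR (handler body stubbed for the sketch).
  'url-inspector-refresh': async (message) => ok({ siteId: message.siteId }),
  // Stand-in for any handler that throws.
  'always-throws': async () => { throw new Error('boom'); },
};

async function processTask(message, log = console) {
  try {
    const handler = HANDLERS[message.type];
    if (!handler) throw new Error(`no handler for type ${message.type}`);
    return await handler(message);
  } catch (err) {
    // Swallowed: the Lambda invocation itself still returns normally, so the
    // SQS event source mapping deletes the message rather than leaving it
    // to be redelivered or moved to the DLQ.
    log.error(`task ${message.type} failed: ${err.message}`);
    return internalServerError(err.message);
  }
}
```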

So the handler is built to never throw to processTask:

Per-RPC retry: the staleness query and every per-month refresh call are attempted up to PER_RPC_ATTEMPTS=2 times in-handler, with backoff between attempts.
Per-month isolation: a failed month is logged, counted, and skipped; the loop continues with the next month. The failed month stays "stale" in url_inspector_domain_stats (the refresh RPC DELETEs and then INSERTs inside one transaction under pg_advisory_xact_lock, so an error inside the RPC rolls back and leaves the slice untouched), so the next 30-min schedule tick sees it again and retries it naturally.
Catastrophic staleness failure: same shape — log error, return ok({ stalenessFailed: true }), let the next tick re-detect and re-attempt.
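The per-RPC retry above can be sketched as a small wrapper over the client's `{ data, error }` contract. `PER_RPC_ATTEMPTS` comes from the notes; the helper name `callWithRetry`, the 500 ms exponential backoff, and the injectable `sleep` (in the spirit of the "default-sleep fallback" the tests mention) are assumptions.

```javascript
// Sketch of per-RPC retry: up to `attempts` tries with exponential backoff.
// Assumes `call` follows the never-throw { data, error } contract.
const PER_RPC_ATTEMPTS = 2;
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function callWithRetry(
  call,
  { attempts = PER_RPC_ATTEMPTS, baseDelayMs = 500, sleep = defaultSleep } = {},
) {
  let last = { data: null, error: { message: 'not attempted' } };
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    last = await call();
    if (!last.error) return { ...last, attempts: attempt };
    // Back off before the next try (500 ms, 1000 ms, ...), never after the last one.
    if (attempt < attempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  // Retries exhausted: hand back the last error instead of throwing.
  return { ...last, attempts };
}
```

The returned `attempts` count is what a per-month log line can report.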

The whole pipeline is idempotent and self-healing: each failed attempt costs an affected site one 30-min tick before it catches up. CloudWatch alarms on the structured per-month error log lines (see below) cover the "we silently stopped refreshing" failure mode.

Per-invocation budget

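SQS visibility timeout on this queue is 900s (15 min). We cap wall time at PER_INVOCATION_BUDGET_MS = 12 min, leaving about 3 min of headroom, and defer any remaining stale months to the next schedule tick. This bounds the worst case (a very stale site after a long outage): the site catches up over several ticks, each invocation handling at most (staleness RPC limit) months within the budget, instead of one unbounded run that overruns the visibility timeout.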
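A sketch of the per-month loop, combining per-month isolation with the wall-time budget. `PER_INVOCATION_BUDGET_MS` is from the PR; `refreshStaleMonths`, the injected `refreshMonth` and `now`, and representing months as `month_start` strings are hypothetical. In the real handler each `refreshMonth` call would presumably go through the retry wrapper and call `wrpc_refresh_url_inspector_domain_stats`.

```javascript
// Sketch: refresh each stale month independently, count failures instead of
// throwing, and stop starting new months once the wall-time budget is spent.
const PER_INVOCATION_BUDGET_MS = 12 * 60 * 1000; // 12 min, under the 900 s visibility timeout

async function refreshStaleMonths(
  months,
  refreshMonth,
  { budgetMs = PER_INVOCATION_BUDGET_MS, now = Date.now } = {},
) {
  const startedAt = now();
  const summary = { refreshed: 0, failed: 0, deferred: 0 };
  for (let i = 0; i < months.length; i += 1) {
    if (now() - startedAt >= budgetMs) {
      // Remaining months are still stale in the DB; the next tick picks them up.
      summary.deferred = months.length - i;
      break;
    }
    const { error } = await refreshMonth(months[i]);
    if (error) {
      summary.failed += 1; // isolated: count it and move on to the next month
    } else {
      summary.refreshed += 1;
    }
  }
  return summary;
}
```

In this sketch the budget is checked before each month starts, so the gap between the 12-min budget and the 900 s timeout is the margin for the last in-flight refresh. Deferred months need no bookkeeping: they are still stale, so the next tick's staleness query returns them again.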

Observability via structured log lines, not a CloudWatch SDK call

Each per-month outcome emits a single JSON-stringified log line:

{"event":"url-inspector-refresh.refresh","siteId":"...","month_start":"2026-04-01","status":"ok","attempts":1,"durationMs":2150}

A CloudWatch metric filter (next infra PR, tf_alarms todo) will turn these into RefreshCalls{result=ok|error} counters and RefreshDurationMs distributions without adding an @aws-sdk/client-cloudwatch dependency on the hot path. The staleness-failed and dispatch-summary log lines use the same approach.
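One way to emit the per-month line shown above, as a sketch: `logRefreshOutcome` is a hypothetical helper, and routing `status: "error"` lines to `log.error` is an assumption rather than something the PR states. Building the object in a fixed key order keeps the output byte-identical to the example line.

```javascript
// Sketch: one JSON-stringified log line per month outcome.
function logRefreshOutcome(log, { siteId, monthStart, status, attempts, durationMs }) {
  const line = JSON.stringify({
    event: 'url-inspector-refresh.refresh',
    siteId,
    month_start: monthStart,
    status, // 'ok' | 'error': the field a metric filter would key on
    attempts,
    durationMs,
  });
  // Assumed: failures also go to the error level so either signal can be matched.
  if (status === 'ok') log.info(line);
  else log.error(line);
  return line;
}
```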

Runtime config required

These env vars must be present in the task-processor Lambda for this handler to function. They are NOT currently set (task-processor has no /helix-deploy/spacecat-services/task-processor/latest secret at all — task-processor reads everything from Lambda env + the shared catalog via vaultSecrets). They will be provisioned in a separate ops step (tp_provision_secret):

| Var | Source | Why |
| --- | --- | --- |
| `POSTGREST_URL` | `/helix-deploy/spacecat-services/all` | Where to send `/rpc/...` POSTs |
| `POSTGREST_API_KEY` | `/helix-deploy/spacecat-services/all` (writer JWT) | Auth: `wrpc_refresh_*` requires the `postgrest_writer` role |
| `POSTGREST_SCHEMA` | `/helix-deploy/spacecat-services/all` (default `public`) | Schema selector |
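A sketch of reading these three variables from the Lambda env, with `POSTGREST_SCHEMA` defaulting to `public` as in the table. `readPostgrestConfig` is a hypothetical name, and reporting missing variables as a value instead of throwing is an assumption chosen to fit the handler's no-throw model.

```javascript
// Sketch: resolve the PostgREST trio from the environment without throwing.
function readPostgrestConfig(env = process.env) {
  const baseUrl = env.POSTGREST_URL;
  const apiKey = env.POSTGREST_API_KEY;
  const schema = env.POSTGREST_SCHEMA || 'public'; // documented default

  const missing = [];
  if (!baseUrl) missing.push('POSTGREST_URL');
  if (!apiKey) missing.push('POSTGREST_API_KEY');

  // Caller decides how to report a misconfigured Lambda (e.g. log and return).
  if (missing.length > 0) return { config: null, missing };
  return { config: { baseUrl, apiKey, schema }, missing };
}
```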

⚠️ Deliberately not setting DATA_SERVICE_PROVIDER=postgres — that would flip every existing handler to v3 PostgREST and is out of scope.

Test plan

npm run lint clean
npm test — 482 passing (up from 466), 100/100/100/100 coverage on both new files
Probe invocation in dev with a synthetic SQS payload { type: 'url-inspector-refresh', siteId: '9ae8877a-bbf3-407d-9adb-d6a72ce3c5e3' } once the dev secret is provisioned + the SQL migration is applied → verify a successful run end-to-end against real Aurora
CI deploy-dev succeeds
Post-deploy: trigger a synthetic BP ingest on adobe.com and verify <30min freshness on url_inspector_domain_stats

Out of scope (follow-ups, in separate PRs)

Every-30-min dispatcher in spacecat-jobs-dispatcher that fans this queue out per-site
aws_scheduler_schedule (every-30-min cron) + task-processor secret shell in spacecat-infrastructure — adobe/spacecat-infrastructure#531
New secrets/secrets.env for task-processor + npm run deploy-secrets to expose the PostgREST trio above
CloudWatch metric filters + alarms on the structured log lines (DispatchRuns, RefreshFailures, StalenessQueryFailures, RefreshSuccesses) — adobe/spacecat-infrastructure#531
Seeding feature_flags(product=LLMO, flag_name=url_inspector_pg, flag_value=true) for the adobe.com pilot org

…4563) Implements the task-processor side of LLMO-4563 Strategy B: a new url-inspector-refresh handler that, given { type, siteId }, calls the per-site staleness RPC and refreshes any stale (site, month) slices via the existing wrpc_refresh_url_inspector_domain_stats. The companion dispatcher (every-30-min fan-out from spacecat-jobs-dispatcher) and SQL migration land in separate PRs. Depends on the staleness RPC introduced in adobe/mysticat-data-service#611 (rpc_url_inspector_stale_slices_for_site).

src/utils/postgrest-client.js (new): minimal ~180-line fetch-based client with a single .rpc(name, params) method. Supabase-shaped { data, error } return, Content-Profile / Accept-Profile / Authorization: Bearer headers, per-request AbortController with composable external signal, never-throws contract. Built this instead of importing @adobe/spacecat-shared-data-access v3 because flipping DATA_SERVICE_PROVIDER=postgres would change every other handler's data backend from DynamoDB to PostgREST.

src/tasks/url-inspector-refresh/handler.js (new): validates siteId UUID, queries staleness with 2x retry, loops stale months with per-month isolation and a 12-min wall-time budget (SQS visibility timeout is 900s), and emits one structured log line per outcome for downstream CloudWatch metric filters. Does NOT throw on per-month failures or on staleness errors: the queue runs maxReceiveCount=1 and processTask in src/index.js swallows handler throws anyway, so throwing would not produce DLQ messages. Instead, failed months stay stale in the DB and the next 30-min schedule tick retries them naturally, leaning on the per-site advisory lock in wrpc_refresh_url_inspector_domain_stats for idempotency.

src/index.js: registers url-inspector-refresh in HANDLERS.

Tests: unit tests for the client (24 tests) and handler (16 tests) cover happy path, retry-then-success, retry-exhausted (no throw), per-month failure isolation, budget-exhausted deferral, structured log line emission, and the default-sleep fallback. 100% statements / branches / functions / lines on both new files; full task-processor suite up from 466 to 482 passing.
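The commit message says the handler validates `siteId` as a UUID before any RPC. A minimal sketch of that check, assuming a plain 8-4-4-4-12 hex pattern (it does not check UUID version or variant bits); `isValidSiteId` is a hypothetical name, and the real handler may use a shared validator instead. The dev probe `siteId` from the test plan passes it.

```javascript
// Sketch: reject payloads whose siteId is not UUID-shaped before calling PostgREST.
// Pattern is 8-4-4-4-12 hex digits, case-insensitive; version bits are not checked.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidSiteId(siteId) {
  return typeof siteId === 'string' && UUID_RE.test(siteId);
}
```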

codecov · 2026-05-18T14:38:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


JayKid temporarily deployed to dev-branches on May 18, 2026 14:31 with GitHub Actions (deployment now inactive)

JayKid wants to merge 1 commit into main from feat/ongoing-refresh-strategy-for-url-LLMO-4563
