Skip to content

docs: ADR-002 preflight API REST redesign#2377

Open
GeezerPelletier wants to merge 19 commits into
mainfrom
sites-44290-preflight-api-rest-redesign
Open

docs: ADR-002 preflight API REST redesign#2377
GeezerPelletier wants to merge 19 commits into
mainfrom
sites-44290-preflight-api-rest-redesign

Conversation

@GeezerPelletier
Copy link
Copy Markdown
Contributor

@GeezerPelletier GeezerPelletier commented May 8, 2026

Summary

  • Adds docs/decisions/002-preflight-api-rest-redesign.md capturing the decision to replace /preflight/beta/jobs with site-scoped REST endpoints under /sites/:siteId/preflights, and to deprecate both legacy endpoint pairs
  • No code changes — this is a decision record for team review before implementation begins

What's being proposed (SITES-44290)

POST /sites/:siteId/preflights

Request body:

{
  "url": "https://main--site--org.hlx.page/some-path"
}

url stays in the body — Mysticat requires it to identify the specific page to analyze. promiseToken passed via cookie for authenticated CMS pages. createdBy is derived server-side from the caller's IMS profile and never supplied by the client. step is removed — Mysticat's agent always runs the full identify/suggest flow.

Response 202 Accepted:

Headers:

Location: https://spacecat.experiencecloud.live/api/v1/sites/{siteId}/preflights/{preflightId}

Body:

{
  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "IN_PROGRESS",
  "createdAt": "2026-05-11T10:00:00.000Z",
  "createdBy": "user@example.com"
}

Error responses:

Status errorCode Condition
400 Bad Request PREFLIGHT_INVALID_REQUEST url is missing or invalid
403 Forbidden PREFLIGHT_ACCESS_DENIED Caller does not have access to the site
403 Forbidden PREFLIGHT_NOT_ENABLED Preflight is not enabled for the site
404 Not Found PREFLIGHT_SITE_NOT_FOUND siteId does not exist
500 Internal Server Error PREFLIGHT_INTERNAL_ERROR Mysticat call failed or unexpected error

Error response body:

{
  "errorCode": "PREFLIGHT_NOT_ENABLED",
  "message": "Preflight is not enabled for this site"
}

No job record is created for 400, 403, or 404 responses.

GET /sites/:siteId/preflights

Response 200 OK — grouped by URL (a site can have preflights across multiple pages):

[
  {
    "url": "https://main--site--org.hlx.page/some-path",
    "preflights": [
      {
        "preflightId": "3fa85f64-...",
        "status": "COMPLETED",
        "createdAt": "2026-05-11T10:00:00.000Z",
        "updatedAt": "2026-05-11T10:00:05.000Z",
        "createdBy": "user@example.com"
      },
      {
        "preflightId": "7c9b1e32-...",
        "status": "COMPLETED",
        "createdAt": "2026-05-11T10:05:00.000Z",
        "updatedAt": "2026-05-11T10:05:04.000Z",
        "createdBy": "user@example.com"
      }
    ]
  }
]

GET /sites/:siteId/preflights/:preflightId

Response 200 OK — full detail:

{
  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "COMPLETED",
  "url": "https://main--site--org.hlx.page/some-path",
  "createdAt": "2026-05-11T10:00:00.000Z",
  "createdBy": "user@example.com",
  "updatedAt": "2026-05-11T10:00:05.000Z",
  "startedAt": "2026-05-11T10:00:01.000Z",
  "endedAt": "2026-05-11T10:00:05.000Z",
  "result": {},
  "error": null
}

Data model

AsyncJob will be extended in spacecat-shared-data-access with siteId as a top-level indexed attribute and an allBySiteIdAndJobType(siteId, jobType) collection method. A purpose-built Preflight entity is not needed — AsyncJob records are TTL-based (~7 days) and inherently transient, so a separate model would just add cleanup complexity for no gain.

createdBy (IMS email) is stored in job metadata at creation time and surfaces in all three endpoint responses for audit purposes.

Current endpoint fate

Both legacy endpoint pairs are deprecated and remain functional in parallel with the new endpoints until consumers have migrated and a deletion milestone is agreed upon.

Endpoint Action
POST /preflight/beta/jobs Deprecated — internal use only; no external consumers
GET /preflight/beta/jobs/:jobId Deprecated — internal use only; no external consumers
POST /preflight/jobs Deprecated — queue-based; migration timeline to be coordinated with consumers
GET /preflight/jobs/:jobId Deprecated — queue-based; migration timeline to be coordinated with consumers

Test plan

  • Read the ADR and validate the design rationale
  • Confirm the proposed endpoint shape matches team expectations before implementation starts

🤖 Generated with Claude Code

Guy Pelletier and others added 2 commits May 8, 2026 14:25
Captures the decision to replace /preflight/jobs and /preflight/beta/jobs
with site-scoped endpoints under /sites/:siteId/preflights.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

This PR will trigger no release when merged.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Guy Pelletier and others added 2 commits May 11, 2026 08:54
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier GeezerPelletier marked this pull request as draft May 11, 2026 13:07
Guy Pelletier and others added 4 commits May 11, 2026 09:09
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AsyncJob has no siteId index; GET /sites/:siteId/preflights requires
either extending AsyncJob or a new Preflight entity in shared-data-access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eId index

AsyncJob TTL (~7 days) makes a purpose-built Preflight entity unnecessary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Existing job creation paths must not be affected; new endpoints
populate siteId at creation time and query by it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guy Pelletier and others added 2 commits May 11, 2026 09:54
Capture IMS caller identity server-side at job creation for audit purposes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sponse

Per RFC 7231 §6.3.3 the created resource URL belongs in the Location
response header, not a custom body field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guy Pelletier and others added 3 commits May 11, 2026 10:26
- Add error response table to POST endpoint (400/403/404/500)
- Disabled-site case returns 403 immediately; no job record created
- Previous CANCELLED-job behavior on disabled preflight is explicitly removed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both /preflight/beta/jobs and /preflight/jobs are deprecated and run in
parallel with the new endpoints until consumers migrate and a deletion
milestone is agreed upon.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier GeezerPelletier marked this pull request as ready for review May 11, 2026 14:42
Body:
```json
{
"preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this not just be id ? Is this not the id of the aysnc job?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rename is intentional — jobId leaks the AsyncJob backing model to consumers, which is an implementation detail they shouldn't need to know about. preflightId is the domain name for this resource. Internally it maps 1:1 to the AsyncJob ID.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, preflightId is fine, I figured that id would be enough. The response is a preflight response and here's the id.

{
"preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"status": "IN_PROGRESS",
"createdAt": "2026-05-11T10:00:00.000Z",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include last updatedAt as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — updatedAt is already on the GET by ID response. Worth adding to the list items too so clients can detect state changes without fetching each one individually.

**Request body** (`application/json`):
```json
{
"url": "https://main--site--org.hlx.page/some-path",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today you can pass a list of urls.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that's an intentional change — one URL per request, one preflightId per URL. Each preflight is an independently pollable resource. If the MFE needs bulk in future, clients fire parallel requests. This avoids partial-failure complexity and keeps each job clean. See the key changes section for the full rationale.

Copy link
Copy Markdown
Contributor

@bhellema bhellema May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client's could select hundreds of pages. This is too much load to manage. Use case, customer is ready to run a promotion and they have many different areas of the site they want to run preflight on. Allowing them to send 1 job with many urls makes sense. It could take a while, but running in bulk this way would be expected to be slower.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as above — see the thread on line 44 for the full proposal.

```json
{
"url": "https://main--site--org.hlx.page/some-path",
"step": "identify",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept of step should be going away as we always will perform identify/suggest as part of the mysticat agent step.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point and worth nailing down now. If the Mysticat agent always runs both identify and suggest as a single flow, step is redundant and we should drop it from the new endpoint entirely rather than carry it forward and deprecate it later. Can you confirm that's the direction? If so we'll remove it from the design.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the direction that we want to go.

{
"url": "https://main--site--org.hlx.page/some-path",
"step": "identify",
"mystiqueUrl": "optional-ephemeral-host.stage.cloud.adobe.io"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the value of sending this back to the client.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify — mystiqueUrl is a request-only field, never returned in any response. It lets developers route a preflight to an ephemeral Mysticat instance for testing without hitting prod, and is blocked in non-dev environments. If you're questioning whether it belongs in the API at all that's a fair debate — it could move to an env-level config instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it from the API response.

|-------|------|----------|-------------|
| `url` | string (URI) | Yes | The single page URL to analyze |
| `step` | enum: `identify` \| `suggest` | Yes | Audit step to perform |
| `mystiqueUrl` | string | No | Dev-only override for the Mysticat service URL |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url i don't think needs to be exposed here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The url field is required in the request body because Mysticat needs it to scrape the specific page via DRS headless browser — siteId alone isn't enough since a site has many pages and preflight is per-page. As for the response, it is useful in GET by ID so the consumer knows what was analyzed. Could reasonably be omitted from the 202 POST response body given the Location header already points to the resource.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this should be exposed as part of the public api.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the misread — we interpreted this as being about the url field rather than mystiqueUrl. Agreed, mystiqueUrl is removed from the public documentation. It stays in the implementation as an undocumented dev override for testing against ephemeral Mysticat instances, just not part of the public API contract.

| `step` | enum: `identify` \| `suggest` | Yes | Audit step to perform |
| `mystiqueUrl` | string | No | Dev-only override for the Mysticat service URL |

`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a fix that should be set via the header not cookie.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed a header would be cleaner. The cookie approach is inherited from the /auth/promise browser flow where it's set automatically. Switching to a header would require the MFE to read and forward it explicitly — worth a separate cleanup ticket but out of scope for this redesign.

`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.

`createdBy` is derived server-side from the caller's IMS profile (`authInfo.getProfile().email`) and is never supplied by the client.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createdAt - would be good to report too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createdAt is already in the 202 response body — you may have been looking at an earlier version of the doc before we expanded the response shapes.

| `500 Internal Server Error` | Mysticat call failed or unexpected error |

No job record is created for `400`, `403`, or `404` responses. The current `/preflight/beta/jobs`
behavior of creating a job and immediately setting it to `CANCELLED` when preflight is disabled
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancelled is not the correct status here it feels like cancelled is a user action. Maybe something like ABORTED.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the semantics, but we've resolved this differently — the new design returns 403 Forbidden immediately when preflight is disabled, without creating a job record at all. No phantom CANCELLED or ABORTED record is created. See the error responses table in the POST section.

Copy link
Copy Markdown
Contributor

@bhellema bhellema May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like it. However as part of the 403, what does the response body look like to allow the UI to provide a meaningful message to the user as if they do not have access to the site, or that the audit was disabled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. The error body includes an errorCode for machine-readable handling and a message as a human-readable hint:

{
  "errorCode": "PREFLIGHT_NOT_ENABLED",
  "message": "Preflight is not enabled for this site"
}

All codes are PREFLIGHT_-prefixed for consistency: PREFLIGHT_ACCESS_DENIED, PREFLIGHT_NOT_ENABLED, PREFLIGHT_INVALID_REQUEST, PREFLIGHT_SITE_NOT_FOUND, PREFLIGHT_INTERNAL_ERROR. Consumers translate errorCode to their own messaging — message is informational only. ADR updated.

Open to other suggestions on the error code style if there's a preferred convention.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy that at least there's an error code! Using a numeric value just requires another lookup, I don't mind the string.

Comment on lines +116 to +134
{
"url": "https://main--site--org.hlx.page/some-path",
"preflights": [
{
"preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"status": "COMPLETED",
"step": "identify",
"createdAt": "2026-05-11T10:00:00.000Z",
"createdBy": "user@example.com"
},
{
"preflightId": "7c9b1e32-1234-4abc-b3fc-9f8a7c6d5e4f",
"status": "COMPLETED",
"step": "suggest",
"createdAt": "2026-05-11T10:05:00.000Z",
"createdBy": "user@example.com"
}
]
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure is not what I would expect, but rather a list of jobs with some minor details:

[
  {
      "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "status": "COMPLETED",
      "urlCount": 12,   <----- maybe show the number of urls they ran with
      "opportunities": 5,  <------ the number of opportunities that were created
      "createdAt": "2026-05-11T10:00:00.000Z",
      "createdBy": "user@example.com"
  }
]

Listing the preflight jobs should be lightweight and fast. Details of the jobs can be fetched with a follow up with the id of the preflight job.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flat list is simpler and we can drop the URL grouping — agreed. The opportunities count is an interesting addition; does Mysticat write that back to the job result today? If so it's worth surfacing in the list.

One question on urlCount: that field implies a job can span multiple URLs, but the new design is one URL per preflight. With that model urlCount would always be 1. Was the intent to keep multi-URL jobs, or is urlCount a carry-over from the old design?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we must consider the ability for multiple urls. If we get to the point of providing a different experience where they can select many pages and initiate a preflight request, we need a way to make this experience simply for the client, being a headless agent or client ui.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the thread on line 44 — the proposal is to keep this endpoint single-URL for now since it was never actually used with multiple URLs in practice, and add a dedicated POST /sites/:siteId/preflights/bulk endpoint when that use case is confirmed. Adding it later is cheap and keeps the design clean in the meantime.

Copy link
Copy Markdown
Contributor

@bhellema bhellema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to consider what the bulk use case looks like.

Guy Pelletier and others added 3 commits May 11, 2026 13:34
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Confirmed by Ben: identify/suggest are no longer separate steps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dev-only override, not part of the public contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guy Pelletier and others added 2 commits May 11, 2026 13:45
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants