Skip to content

fix: image-load retry policy + typed errors#343

Merged
appflowy merged 7 commits into
mainfrom
fix/image-load-retry-policy
May 15, 2026
Merged

fix: image-load retry policy + typed errors#343
appflowy merged 7 commits into
mainfrom
fix/image-load-retry-policy

Conversation

@appflowy
Copy link
Copy Markdown
Contributor

@appflowy appflowy commented May 14, 2026

Summary

User reports: images uploaded from desktop sometimes don't appear on web until the page is refreshed multiple times. This PR addresses the web client half of the problem (server-side cache-control fix is in AppFlowy-Cloud-Premium#TBD).

Root causes addressed

  1. Unauthenticated fallback poisons the browser cache. When the authenticated fetch to AppFlowy storage fails, checkImage falls through to validateImageLoad — a plain new Image() load with no auth headers. AppFlowy storage requires a Bearer token, so this fallback is guaranteed to 401/403 and the browser caches that failure under the URL. Subsequent polling retries short-circuit to the cached failure.

  2. Polling gives up after 30 s with no recovery. Img.tsx polls 5× with a 6 s interval, then shows a permanent "Image not found" error. If the upload pipeline takes longer than 30 s (common for first-render right after a desktop upload), the only fix is a page refresh — and Cloudflare's negative cache often extends the broken window past a single refresh.

Changes

src/utils/image.ts

  • Drop the no-auth validateImageLoad fallback for AppFlowy URLs — return the real status instead.
  • Typed CheckImageErrorKind: no-auth | auth-rejected | forbidden | not-ready | not-found | server-error | network | format. Lets callers pick a per-category backoff.
  • cache: 'no-store' on the auth fetch as defense-in-depth.

src/components/editor/components/blocks/image/Img.tsx

  • Replace 5×6s polling with exponential backoff ([0.5s, 1s, 2s, 4s, 8s] fast / [2s, 5s, 10s, 20s, 40s, 80s] slow), ±20% jitter, ~5 min wall-clock budget.
  • Three-state UI: loading (spinner) → pending after 10 s (spinner + "waiting for upload…") → failed (manual retry button). The pending state matters because "we're still waiting" is very different from "this is broken."
  • visibilitychange + focus listeners re-check when the user returns to the tab in pending or failed state — matches the user's mental model ("I came back and it appeared").
  • Skip retries when URL is empty (YJS sync in flight).
  • Per-run "generation" counter cancels in-flight retry chains when URL changes, component unmounts, or user manually retries.

Companion server PR

This is the consumer side. The producer side — Cache-Control: no-store on 4xx, 425 Too Early for the S3-vs-metadata race window, structured warn! logging — lives in AppFlowy-Cloud-Premium. The web client's not-ready retry category exists specifically to consume the new 425 signal.

Test plan

  • pnpm run lint clean
  • playwright test playwright/e2e/embeded/image/ — all 5 image tests pass (download, copy, toolbar hover, regression)
  • Manual: upload via desktop, open web → confirm image appears within ~5s instead of needing refresh
  • Manual: switch tabs during slow upload, return → confirm visibility re-check fires
  • Manual: paste image, click "Retry" button after a failed load

🤖 Generated with Claude Code

Summary by Sourcery

Improve reliability and user feedback for loading editor images from AppFlowy storage by introducing typed image load errors and a more robust, long-lived retry flow.

New Features:

  • Add typed image load error categories to support differentiated retry policies and user messaging.
  • Introduce a multi-phase image loading UI with loading, pending, and failed states including a manual retry action and permission-specific messaging.
  • Re-check image loads when the tab regains focus or visibility while an image is pending or failed.

Bug Fixes:

  • Prevent unauthenticated image load fallbacks from poisoning browser cache for AppFlowy storage URLs by relying solely on authenticated fetch with no-store caching.
  • Extend and harden image load retry behavior with exponential backoff and jitter to better handle slow or delayed uploads without requiring page refresh.

Enhancements:

  • Refine image loading logic to cancel stale retry chains when URLs change, components unmount, or users trigger a manual retry, and to clean up blob URLs more reliably.
  • Classify HTTP error responses and client-side failures into structured error kinds to drive more appropriate retry backoff and error handling.

Tests:

  • Preserve existing end-to-end image tests that validate common image interactions and regression coverage.

Users report that images uploaded from desktop sometimes don't appear on
web until the page is refreshed multiple times. Two web-side causes:

1. `checkImage` for AppFlowy URLs falls back to `validateImageLoad`
   (an unauthenticated `<img>` load) when the authenticated fetch fails.
   AppFlowy storage *requires* a Bearer token, so this fallback is
   guaranteed to 401/403 and the browser caches that failure under the
   URL. The polling loop then burns its retry attempts on a request
   that physically cannot succeed.

2. `Img.tsx` polls only 5× over 30 seconds with a single error state.
   If the upload pipeline takes longer than that (which it often does
   for first-render right after desktop uploads), polling exhausts and
   the user sees a permanent "Image not found" until they refresh.

Changes:

- `image.ts`: drop the no-auth fallback for AppFlowy URLs; return a
  typed `CheckImageErrorKind` (`no-auth`, `auth-rejected`, `forbidden`,
  `not-ready`, `not-found`, `server-error`, `network`, `format`) so the
  caller can pick a backoff per category. Add `cache: 'no-store'` on
  the fetch as defense-in-depth against stale CDN/browser entries.

- `Img.tsx`: replace 5×6s polling with a typed-error-driven exponential
  backoff (~5 min wall-clock budget, ±20% jitter), three-state UI
  (loading → pending after 10 s → failed after budget), and a
  visibilitychange/focus listener that re-checks when the user returns
  to the tab. Skip retries entirely when the URL is empty (YJS sync
  hasn't delivered it yet). Manual retry button when finally failed.

Together with the server-side cache-control fix (no-store on 4xx,
425 on metadata/S3 race), this turns "refresh five times" into
"wait two seconds, image appears" for the common slow-upload case.
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 14, 2026

Reviewer's Guide

Refactors image loading and retry behavior for editor image blocks to avoid poisoned browser cache and add typed error handling with richer, longer-lived retry/backoff logic and user-visible states, while tightening authenticated image fetching for AppFlowy storage URLs.

Sequence diagram for updated image load and retry flow

sequenceDiagram
  actor User
  participant ImgComponent as Img
  participant checkImage
  participant checkAppFlowyImage
  participant validateImageLoad
  participant AppFlowyStorage

  User->>ImgComponent: mount Img(url)
  ImgComponent->>checkImage: runCheck(url)
  alt isAppFlowyFileStorageUrl
    checkImage->>checkAppFlowyImage: checkAppFlowyImage(url)
    checkAppFlowyImage->>AppFlowyStorage: fetch(fullUrl, Authorization, cache no-store)
    alt response.ok
      AppFlowyStorage-->>checkAppFlowyImage: 200 + image blob
      checkAppFlowyImage-->>checkImage: { ok true, validatedUrl }
    else response !ok or network error
      AppFlowyStorage-->>checkAppFlowyImage: 4xx/5xx
      checkAppFlowyImage-->>checkImage: { ok false, errorKind }
    end
  else external URL
    checkImage->>validateImageLoad: validateImageLoad(url)
    validateImageLoad-->>checkImage: { ok | errorKind network }
  end

  checkImage-->>ImgComponent: CheckImageResult
  alt ok
    ImgComponent->>ImgComponent: setPhase loading
    ImgComponent->>ImgComponent: setLocalUrl(validatedUrl)
    ImgComponent-->>User: image visible after <img> onLoad
  else !ok
    ImgComponent->>ImgComponent: backoffSchedule(errorKind)
    ImgComponent->>ImgComponent: schedule jittered retry
    loop until success or budget exhausted
      ImgComponent->>checkImage: runCheck(url)
    end
    alt retries exhausted
      ImgComponent-->>User: phase failed + Retry button
    end
  end

  User->>ImgComponent: click Retry or tab focus/visible
  ImgComponent->>checkImage: runCheck(url) (new generation)
Loading

File-Level Changes

Change Details Files
Introduce typed image load error classification and centralize AppFlowy-auth image fetching, removing the unauthenticated fallback that could poison browser cache.
  • Add CheckImageErrorKind union and extend CheckImageResult with errorKind for categorized failures.
  • Refactor AppFlowy storage image handling into checkAppFlowyImage with authenticated fetch, blob validation, and no-store caching behavior.
  • Remove unauthenticated validateImageLoad fallback for AppFlowy URLs and instead return typed error results; keep validateImageLoad for external URLs with normalized error handling.
src/utils/image.ts
Redesign Img component’s polling and UI state to use an exponential backoff with jitter, a multi-phase load state, and focus/visibility-triggered rechecks, while managing blob URLs safely.
  • Replace fixed-count polling with backoffSchedule-based exponential retry sequences keyed to CheckImageErrorKind, with jitter and a 5-minute wall-clock cap.
  • Add LoadPhase state (loading/pending/failed), pending-after timeout, and manual retry handler, wiring them into a new three-state UI including a retry button and forbidden-specific messaging.
  • Track run generations and mounted state to cancel obsolete retry loops, clean up blob URLs via a revoke helper, and re-trigger checks on URL changes, tab visibility, and window focus events; adjust visibility and onLoad handling accordingly.
src/components/editor/components/blocks/image/Img.tsx

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The LoadPhase state never leaves 'loading' on successful load (the <img> onLoad handler calls setPhase('loading') again and there is no 'loaded' phase), so hideImage will remain true and keep the image hidden; consider introducing a distinct success state or setting phase to something non-hidden on successful load.
  • In runCheck, the success path creates a blob URL but relies on the caller to revoke it; it might be safer to centralize blob URL lifecycle management (creation and revocation) in one place to avoid future leaks when checkAppFlowyImage is reused.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `LoadPhase` state never leaves `'loading'` on successful load (the `<img>` onLoad handler calls `setPhase('loading')` again and there is no `'loaded'` phase), so `hideImage` will remain true and keep the image hidden; consider introducing a distinct success state or setting `phase` to something non-hidden on successful load.
- In `runCheck`, the success path creates a blob URL but relies on the caller to revoke it; it might be safer to centralize blob URL lifecycle management (creation and revocation) in one place to avoid future leaks when `checkAppFlowyImage` is reused.

## Individual Comments

### Comment 1
<location path="src/utils/image.ts" line_range="126-129" />
<code_context>
     const img = new Image();

     // Set a timeout to handle very slow loads
     const timeoutId = setTimeout(() => {
-      resolve({
-        ok: false,
-        status: 408,
-        statusText: 'Request Timeout',
-        error: 'Image loading timed out',
-      });
+      resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));
     }, 10000); // 10 second timeout

</code_context>
<issue_to_address>
**suggestion (performance):** Timeouts for external image loads are classified as 'not-ready', which may cause overly aggressive fast retries.

In `validateImageLoad`, the timeout path currently resolves with:

```ts
resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));
```

For external URLs, a timeout is more likely a slow/unstable host than an in-progress upload, so treating it as `'not-ready'` drives `backoffSchedule` to use `FAST_BACKOFF_MS` and retry too aggressively.

Consider classifying timeouts as `'network'` instead (or using `classifyHttpStatus(408)` after updating its mapping), so these failures follow the slower backoff path and avoid unnecessary pressure on third-party hosts.

```suggestion
    // Set a timeout to handle very slow loads
    // External timeouts are treated as 'network' failures so retries follow
    // the slower backoff path rather than the aggressive 'not-ready' schedule.
    const timeoutId = setTimeout(() => {
      resolve(errorResult(408, 'Request Timeout', 'network', 'Image loading timed out'));
    }, 10000); // 10 second timeout
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/utils/image.ts
Comment on lines 126 to 129
// Set a timeout to handle very slow loads
const timeoutId = setTimeout(() => {
resolve({
ok: false,
status: 408,
statusText: 'Request Timeout',
error: 'Image loading timed out',
});
resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));
}, 10000); // 10 second timeout
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): Timeouts for external image loads are classified as 'not-ready', which may cause overly aggressive fast retries.

In validateImageLoad, the timeout path currently resolves with:

resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));

For external URLs, a timeout is more likely a slow/unstable host than an in-progress upload, so treating it as 'not-ready' drives backoffSchedule to use FAST_BACKOFF_MS and retry too aggressively.

Consider classifying timeouts as 'network' instead (or using classifyHttpStatus(408) after updating its mapping), so these failures follow the slower backoff path and avoid unnecessary pressure on third-party hosts.

Suggested change
// Set a timeout to handle very slow loads
const timeoutId = setTimeout(() => {
resolve({
ok: false,
status: 408,
statusText: 'Request Timeout',
error: 'Image loading timed out',
});
resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));
}, 10000); // 10 second timeout
// Set a timeout to handle very slow loads
// External timeouts are treated as 'network' failures so retries follow
// the slower backoff path rather than the aggressive 'not-ready' schedule.
const timeoutId = setTimeout(() => {
resolve(errorResult(408, 'Request Timeout', 'network', 'Image loading timed out'));
}, 10000); // 10 second timeout

appflowy added 6 commits May 14, 2026 21:41
Previously, `checkAppFlowyImage` set `cache: 'no-store'` on every fetch
as defense-in-depth against stale negative cache entries. But that
disables the browser HTTP cache for *all* responses — including the
200s that the server now tags `Cache-Control: public, immutable,
max-age=31536000`. Result: every component mount re-downloaded the
image even though the server told the browser it could keep the bytes
for a year.

Switch to a per-attempt cache policy:

- First attempt: `cache: 'default'` — honor server headers. Successful
  fetches enter the HTTP cache; the next mount of the same Img is free.
  Failures (no-store from the server) still skip the cache, so no
  poisoning risk.
- Retry attempts: `cache: 'reload'` — force a server round-trip to
  defeat any stale cached response from a misbehaving proxy. Unlike
  `no-store`, `reload` still lets the *new* response enter the cache,
  so once we finally succeed we stop paying network for re-renders.

Adds a `CheckImageOptions { retry?: boolean }` parameter to `checkImage`
and threads it from the polling loop in Img.tsx (`{ retry: attempt > 0 }`).
Six fixes to Img.tsx + image.ts, in order of severity:

1. Image was hidden forever after successful load. `phase === 'loading'`
   left `visibility: hidden` on the <img>, and the <img>'s own onLoad
   handler also wrote 'loading'. Split into two state machines: `phase`
   tracks the fetch lifecycle (loading|pending|failed), `isImageReady`
   tracks the <img>'s decode lifecycle (set by native onLoad). Visibility
   keys on `isImageReady` only.

2. Inline `onLoad={() => ...}` props from parent re-mounted on every
   render churned `runCheck`'s identity, which made the URL-effect cancel
   the in-flight retry chain and start fresh — so every parent re-render
   re-fetched the image. Now `onLoad` is read through `onLoadRef` and
   `runCheck` has a stable identity (deps: [cancelAllInflight, scheduleTimer]).

3. The visibilitychange/focus listener effect re-subscribed every time
   phase or url changed. Now attached once; reads via `phaseRef`, `urlRef`.

4. The auth fetch in `image.ts::checkAppFlowyImage` ignored cancellation —
   the polling loop checked refs between awaits, but the fetch itself ran
   to completion. Added `signal?: AbortSignal` to `CheckImageOptions` and
   plumbed an `AbortController` per `runCheck` invocation. AbortError is
   classified separately so it doesn't log as a network error.

5. Untracked `setTimeout`s (pendingTimer, retry-delay) now go through a
   `timersRef` Set. On cancellation `cancelAllInflight()` aborts every
   controller and clears every pending timer in one call. Retry-delay
   `sleep` resolves early when the abort signal fires.

6. Free perf wins on the rendered <img>:
   - `loading="lazy"` — defer offscreen images until they near viewport
   - `decoding="async"` — let the browser decode off-thread
   - Added `onError` handler that flips to 'failed' on decode failures
     (rare but previously silent).

Skipped from the review:
- Better `alt` text (would require parent prop plumbing; pre-existing).
- Extracting state machine into a `useImageWithRetry` hook (worthwhile
  but a structural change; tracked for a follow-up).
Two follow-ups to the React best-practices review (PR comments #7, #8):

**#8 — Extract `useImageWithRetry` hook.** The fetch state machine
(backoff schedules, abort controllers, timer tracking, ref-stable
listener attachment) was ~200 lines of imperative async logic awkwardly
inlined in a presentation component. Moved to
`src/components/editor/components/blocks/image/useImageWithRetry.ts`,
which returns `{ src, phase, isImageReady, lastError, retry,
onImageLoaded, onImageError }`. `Img.tsx` is now ~100 lines of pure
rendering. The hook is unit-testable in isolation with a fetch mock,
and the JSX no longer hides correctness concerns behind a wall of
useCallback/useEffect plumbing.

**#7 — Accessibility.** Pre-existing issue: every image rendered with
`alt={''}`, which tells assistive tech "this is decorative" — wrong
for content images in a document editor. `Img` now accepts an `alt`
prop. When omitted (current callers), it derives a label from the URL
filename — strips extension, percent-encoding, AppFlowy upload prefix,
collapses separators. Falls back to "Embedded image" if nothing
sensible can be extracted. Future work: thread an explicit `alt` from
ImageBlockData once the data model gains a caption field.

Also tightened the failure/loading overlays with `role="status"`,
`aria-live="polite"`, `aria-label`, and `aria-hidden` on the
decorative error icon — screen readers now announce loading state
transitions instead of going silent.

Behavior unchanged; lint + type-check clean. Image E2E tests not
re-run in this commit (local docker stack down — port 8000 not
listening); they passed against the previous commit on this branch
and the refactor preserves all observable behavior.
Five small fixes from the React best-practices re-review:

1. Pending-timer false flash (#1): the loading→pending promotion timer
   could fire AFTER fetch success but BEFORE <img> decode, briefly
   showing "Waiting for upload to finish…" when the upload was in fact
   finished. Track a local `settled` flag for each run; the timer
   short-circuits once settled.

2. Rename `previousBlobUrlRef` → `currentBlobUrlRef` (#2). It holds the
   current rendered blob URL, not a previous one — name now matches
   meaning. Pure cosmetics, no behavior change.

3. Smarter alt-text fallback (#3): AppFlowy storage URLs end in opaque
   hashes (file_id) that produce nonsense alts like "bGTk 4nxVz" — no
   better than empty alt for screen-reader users. Short-circuit those
   URLs to a generic "Image" via `isAppFlowyFileStorageUrl`. Also reject
   non-word-looking segments from external URLs (heuristic: needs a
   vowel and 3+ chars).

4. Collapse three ref-mirror useEffects into one `useLatest` helper (#4),
   matching Vercel's `advanced-use-latest` pattern. One effect per ref
   instead of three; reusable if more refs are needed later.

5. Simplify hook signature (#5): `useImageWithRetry(url, onReady?)`
   instead of `useImageWithRetry(url, { onReady })`. Drops the per-render
   options-object allocation and the indirection inside the hook.

Lint, type-check, and 5 image E2E tests pass.
@appflowy appflowy merged commit ac6dd52 into main May 15, 2026
13 checks passed
@appflowy appflowy deleted the fix/image-load-retry-policy branch May 15, 2026 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant