fix: image-load retry policy + typed errors#343
Conversation
Users report that images uploaded from desktop sometimes don't appear on web until the page is refreshed multiple times. Two web-side causes: 1. `checkImage` for AppFlowy URLs falls back to `validateImageLoad` (an unauthenticated `<img>` load) when the authenticated fetch fails. AppFlowy storage *requires* a Bearer token, so this fallback is guaranteed to 401/403 and the browser caches that failure under the URL. The polling loop then burns its retry attempts on a request that physically cannot succeed. 2. `Img.tsx` polls only 5× over 30 seconds with a single error state. If the upload pipeline takes longer than that (which it often does for first-render right after desktop uploads), polling exhausts and the user sees a permanent "Image not found" until they refresh. Changes: - `image.ts`: drop the no-auth fallback for AppFlowy URLs; return a typed `CheckImageErrorKind` (`no-auth`, `auth-rejected`, `forbidden`, `not-ready`, `not-found`, `server-error`, `network`, `format`) so the caller can pick a backoff per category. Add `cache: 'no-store'` on the fetch as defense-in-depth against stale CDN/browser entries. - `Img.tsx`: replace 5×6s polling with a typed-error-driven exponential backoff (~5 min wall-clock budget, ±20% jitter), three-state UI (loading → pending after 10 s → failed after budget), and a visibilitychange/focus listener that re-checks when the user returns to the tab. Skip retries entirely when the URL is empty (YJS sync hasn't delivered it yet). Manual retry button when finally failed. Together with the server-side cache-control fix (no-store on 4xx, 425 on metadata/S3 race), this turns "refresh five times" into "wait two seconds, image appears" for the common slow-upload case.
Reviewer's GuideRefactors image loading and retry behavior for editor image blocks to avoid poisoned browser cache and add typed error handling with richer, longer-lived retry/backoff logic and user-visible states, while tightening authenticated image fetching for AppFlowy storage URLs. Sequence diagram for updated image load and retry flowsequenceDiagram
actor User
participant ImgComponent as Img
participant checkImage
participant checkAppFlowyImage
participant validateImageLoad
participant AppFlowyStorage
User->>ImgComponent: mount Img(url)
ImgComponent->>checkImage: runCheck(url)
alt isAppFlowyFileStorageUrl
checkImage->>checkAppFlowyImage: checkAppFlowyImage(url)
checkAppFlowyImage->>AppFlowyStorage: fetch(fullUrl, Authorization, cache no-store)
alt response.ok
AppFlowyStorage-->>checkAppFlowyImage: 200 + image blob
checkAppFlowyImage-->>checkImage: { ok true, validatedUrl }
else response !ok or network error
AppFlowyStorage-->>checkAppFlowyImage: 4xx/5xx
checkAppFlowyImage-->>checkImage: { ok false, errorKind }
end
else external URL
checkImage->>validateImageLoad: validateImageLoad(url)
validateImageLoad-->>checkImage: { ok | errorKind network }
end
checkImage-->>ImgComponent: CheckImageResult
alt ok
ImgComponent->>ImgComponent: setPhase loading
ImgComponent->>ImgComponent: setLocalUrl(validatedUrl)
ImgComponent-->>User: image visible after <img> onLoad
else !ok
ImgComponent->>ImgComponent: backoffSchedule(errorKind)
ImgComponent->>ImgComponent: schedule jittered retry
loop until success or budget exhausted
ImgComponent->>checkImage: runCheck(url)
end
alt retries exhausted
ImgComponent-->>User: phase failed + Retry button
end
end
User->>ImgComponent: click Retry or tab focus/visible
ImgComponent->>checkImage: runCheck(url) (new generation)
File-Level ChangesTips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
LoadPhasestate never leaves'loading'on successful load (the<img>onLoad handler callssetPhase('loading')again and there is no'loaded'phase), sohideImagewill remain true and keep the image hidden; consider introducing a distinct success state or settingphaseto something non-hidden on successful load. - In
runCheck, the success path creates a blob URL but relies on the caller to revoke it; it might be safer to centralize blob URL lifecycle management (creation and revocation) in one place to avoid future leaks whencheckAppFlowyImageis reused.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `LoadPhase` state never leaves `'loading'` on successful load (the `<img>` onLoad handler calls `setPhase('loading')` again and there is no `'loaded'` phase), so `hideImage` will remain true and keep the image hidden; consider introducing a distinct success state or setting `phase` to something non-hidden on successful load.
- In `runCheck`, the success path creates a blob URL but relies on the caller to revoke it; it might be safer to centralize blob URL lifecycle management (creation and revocation) in one place to avoid future leaks when `checkAppFlowyImage` is reused.
## Individual Comments
### Comment 1
<location path="src/utils/image.ts" line_range="126-129" />
<code_context>
const img = new Image();
// Set a timeout to handle very slow loads
const timeoutId = setTimeout(() => {
- resolve({
- ok: false,
- status: 408,
- statusText: 'Request Timeout',
- error: 'Image loading timed out',
- });
+ resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));
}, 10000); // 10 second timeout
</code_context>
<issue_to_address>
**suggestion (performance):** Timeouts for external image loads are classified as 'not-ready', which may cause overly aggressive fast retries.
In `validateImageLoad`, the timeout path currently resolves with:
```ts
resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));
```
For external URLs, a timeout is more likely a slow/unstable host than an in-progress upload, so treating it as `'not-ready'` drives `backoffSchedule` to use `FAST_BACKOFF_MS` and retry too aggressively.
Consider classifying timeouts as `'network'` instead (or using `classifyHttpStatus(408)` after updating its mapping), so these failures follow the slower backoff path and avoid unnecessary pressure on third-party hosts.
```suggestion
// Set a timeout to handle very slow loads
// External timeouts are treated as 'network' failures so retries follow
// the slower backoff path rather than the aggressive 'not-ready' schedule.
const timeoutId = setTimeout(() => {
resolve(errorResult(408, 'Request Timeout', 'network', 'Image loading timed out'));
}, 10000); // 10 second timeout
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // Set a timeout to handle very slow loads | ||
| const timeoutId = setTimeout(() => { | ||
| resolve({ | ||
| ok: false, | ||
| status: 408, | ||
| statusText: 'Request Timeout', | ||
| error: 'Image loading timed out', | ||
| }); | ||
| resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out')); | ||
| }, 10000); // 10 second timeout |
There was a problem hiding this comment.
suggestion (performance): Timeouts for external image loads are classified as 'not-ready', which may cause overly aggressive fast retries.
In validateImageLoad, the timeout path currently resolves with:
resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out'));For external URLs, a timeout is more likely a slow/unstable host than an in-progress upload, so treating it as 'not-ready' drives backoffSchedule to use FAST_BACKOFF_MS and retry too aggressively.
Consider classifying timeouts as 'network' instead (or using classifyHttpStatus(408) after updating its mapping), so these failures follow the slower backoff path and avoid unnecessary pressure on third-party hosts.
| // Set a timeout to handle very slow loads | |
| const timeoutId = setTimeout(() => { | |
| resolve({ | |
| ok: false, | |
| status: 408, | |
| statusText: 'Request Timeout', | |
| error: 'Image loading timed out', | |
| }); | |
| resolve(errorResult(408, 'Request Timeout', 'not-ready', 'Image loading timed out')); | |
| }, 10000); // 10 second timeout | |
| // Set a timeout to handle very slow loads | |
| // External timeouts are treated as 'network' failures so retries follow | |
| // the slower backoff path rather than the aggressive 'not-ready' schedule. | |
| const timeoutId = setTimeout(() => { | |
| resolve(errorResult(408, 'Request Timeout', 'network', 'Image loading timed out')); | |
| }, 10000); // 10 second timeout |
Previously, `checkAppFlowyImage` set `cache: 'no-store'` on every fetch
as defense-in-depth against stale negative cache entries. But that
disables the browser HTTP cache for *all* responses — including the
200s that the server now tags `Cache-Control: public, immutable,
max-age=31536000`. Result: every component mount re-downloaded the
image even though the server told the browser it could keep the bytes
for a year.
Switch to a per-attempt cache policy:
- First attempt: `cache: 'default'` — honor server headers. Successful
fetches enter the HTTP cache; the next mount of the same Img is free.
Failures (no-store from the server) still skip the cache, so no
poisoning risk.
- Retry attempts: `cache: 'reload'` — force a server round-trip to
defeat any stale cached response from a misbehaving proxy. Unlike
`no-store`, `reload` still lets the *new* response enter the cache,
so once we finally succeed we stop paying network for re-renders.
Adds a `CheckImageOptions { retry?: boolean }` parameter to `checkImage`
and threads it from the polling loop in Img.tsx (`{ retry: attempt > 0 }`).
Six fixes to Img.tsx + image.ts, in order of severity:
1. Image was hidden forever after successful load. `phase === 'loading'`
left `visibility: hidden` on the <img>, and the <img>'s own onLoad
handler also wrote 'loading'. Split into two state machines: `phase`
tracks the fetch lifecycle (loading|pending|failed), `isImageReady`
tracks the <img>'s decode lifecycle (set by native onLoad). Visibility
keys on `isImageReady` only.
2. Inline `onLoad={() => ...}` props from parent re-mounted on every
render churned `runCheck`'s identity, which made the URL-effect cancel
the in-flight retry chain and start fresh — so every parent re-render
re-fetched the image. Now `onLoad` is read through `onLoadRef` and
`runCheck` has a stable identity (deps: [cancelAllInflight, scheduleTimer]).
3. The visibilitychange/focus listener effect re-subscribed every time
phase or url changed. Now attached once; reads via `phaseRef`, `urlRef`.
4. The auth fetch in `image.ts::checkAppFlowyImage` ignored cancellation —
the polling loop checked refs between awaits, but the fetch itself ran
to completion. Added `signal?: AbortSignal` to `CheckImageOptions` and
plumbed an `AbortController` per `runCheck` invocation. AbortError is
classified separately so it doesn't log as a network error.
5. Untracked `setTimeout`s (pendingTimer, retry-delay) now go through a
`timersRef` Set. On cancellation `cancelAllInflight()` aborts every
controller and clears every pending timer in one call. Retry-delay
`sleep` resolves early when the abort signal fires.
6. Free perf wins on the rendered <img>:
- `loading="lazy"` — defer offscreen images until they near viewport
- `decoding="async"` — let the browser decode off-thread
- Added `onError` handler that flips to 'failed' on decode failures
(rare but previously silent).
Skipped from the review:
- Better `alt` text (would require parent prop plumbing; pre-existing).
- Extracting state machine into a `useImageWithRetry` hook (worthwhile
but a structural change; tracked for a follow-up).
Two follow-ups to the React best-practices review (PR comments #7, #8): **#8 — Extract `useImageWithRetry` hook.** The fetch state machine (backoff schedules, abort controllers, timer tracking, ref-stable listener attachment) was ~200 lines of imperative async logic awkwardly inlined in a presentation component. Moved to `src/components/editor/components/blocks/image/useImageWithRetry.ts`, which returns `{ src, phase, isImageReady, lastError, retry, onImageLoaded, onImageError }`. `Img.tsx` is now ~100 lines of pure rendering. The hook is unit-testable in isolation with a fetch mock, and the JSX no longer hides correctness concerns behind a wall of useCallback/useEffect plumbing. **#7 — Accessibility.** Pre-existing issue: every image rendered with `alt={''}`, which tells assistive tech "this is decorative" — wrong for content images in a document editor. `Img` now accepts an `alt` prop. When omitted (current callers), it derives a label from the URL filename — strips extension, percent-encoding, AppFlowy upload prefix, collapses separators. Falls back to "Embedded image" if nothing sensible can be extracted. Future work: thread an explicit `alt` from ImageBlockData once the data model gains a caption field. Also tightened the failure/loading overlays with `role="status"`, `aria-live="polite"`, `aria-label`, and `aria-hidden` on the decorative error icon — screen readers now announce loading state transitions instead of going silent. Behavior unchanged; lint + type-check clean. Image E2E tests not re-run in this commit (local docker stack down — port 8000 not listening); they passed against the previous commit on this branch and the refactor preserves all observable behavior.
Five small fixes from the React best-practices re-review: 1. Pending-timer false flash (#1): the loading→pending promotion timer could fire AFTER fetch success but BEFORE <img> decode, briefly showing "Waiting for upload to finish…" when the upload was in fact finished. Track a local `settled` flag for each run; the timer short-circuits once settled. 2. Rename `previousBlobUrlRef` → `currentBlobUrlRef` (#2). It holds the current rendered blob URL, not a previous one — name now matches meaning. Pure cosmetics, no behavior change. 3. Smarter alt-text fallback (#3): AppFlowy storage URLs end in opaque hashes (file_id) that produce nonsense alts like "bGTk 4nxVz" — no better than empty alt for screen-reader users. Short-circuit those URLs to a generic "Image" via `isAppFlowyFileStorageUrl`. Also reject non-word-looking segments from external URLs (heuristic: needs a vowel and 3+ chars). 4. Collapse three ref-mirror useEffects into one `useLatest` helper (#4), matching Vercel's `advanced-use-latest` pattern. One effect per ref instead of three; reusable if more refs are needed later. 5. Simplify hook signature (#5): `useImageWithRetry(url, onReady?)` instead of `useImageWithRetry(url, { onReady })`. Drops the per-render options-object allocation and the indirection inside the hook. Lint, type-check, and 5 image E2E tests pass.
Summary
User reports: images uploaded from desktop sometimes don't appear on web until the page is refreshed multiple times. This PR addresses the web client half of the problem (server-side cache-control fix is in AppFlowy-Cloud-Premium#TBD).
Root causes addressed
Unauthenticated fallback poisons the browser cache. When the authenticated
fetchto AppFlowy storage fails,checkImagefalls through tovalidateImageLoad— a plainnew Image()load with no auth headers. AppFlowy storage requires a Bearer token, so this fallback is guaranteed to 401/403 and the browser caches that failure under the URL. Subsequent polling retries short-circuit to the cached failure.Polling gives up after 30 s with no recovery.
Img.tsxpolls 5× with a 6 s interval, then shows a permanent "Image not found" error. If the upload pipeline takes longer than 30 s (common for first-render right after a desktop upload), the only fix is a page refresh — and Cloudflare's negative cache often extends the broken window past a single refresh.Changes
src/utils/image.tsvalidateImageLoadfallback for AppFlowy URLs — return the real status instead.CheckImageErrorKind:no-auth|auth-rejected|forbidden|not-ready|not-found|server-error|network|format. Lets callers pick a per-category backoff.cache: 'no-store'on the auth fetch as defense-in-depth.src/components/editor/components/blocks/image/Img.tsx[0.5s, 1s, 2s, 4s, 8s]fast /[2s, 5s, 10s, 20s, 40s, 80s]slow), ±20% jitter, ~5 min wall-clock budget.loading(spinner) →pendingafter 10 s (spinner + "waiting for upload…") →failed(manual retry button). The pending state matters because "we're still waiting" is very different from "this is broken."visibilitychange+focuslisteners re-check when the user returns to the tab inpendingorfailedstate — matches the user's mental model ("I came back and it appeared").Companion server PR
This is the consumer side. The producer side —
Cache-Control: no-storeon 4xx,425 Too Earlyfor the S3-vs-metadata race window, structuredwarn!logging — lives in AppFlowy-Cloud-Premium. The web client'snot-readyretry category exists specifically to consume the new 425 signal.Test plan
pnpm run lintcleanplaywright test playwright/e2e/embeded/image/— all 5 image tests pass (download, copy, toolbar hover, regression)🤖 Generated with Claude Code
Summary by Sourcery
Improve reliability and user feedback for loading editor images from AppFlowy storage by introducing typed image load errors and a more robust, long-lived retry flow.
New Features:
Bug Fixes:
Enhancements:
Tests: