🤖 core: Render Arabic / RTL text in TextField #23625
Library::default_font was returning at most one font per chain because of a stale `// TODO: Return multiple fonts when it's needed.` `break`. The downstream FontSet treats element 0 as the main font and the rest as fallbacks, so discarding the rest meant glyph misses on the primary font had nowhere to fall through to even when the caller had configured a multi-name chain via `set_default_font(_, vec![a, b, ...])`. Both passes (exact match, then compatible match) now collect the full chain. The compatible-match pass deduplicates against the exact-match pass by FontDescriptor so the same font isn't listed twice. Single-element chains behave identically to before, so no caller that configured one font per default sees a behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
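The two-pass collection described above can be sketched roughly as follows. This is a minimal illustration, not the code in `library.rs`: the `Descriptor` struct, the slice-based registry and the `collect_chain` function are hypothetical stand-ins, and "compatible" is assumed here to mean "same name, any style".

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a font descriptor (name plus style flags).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Descriptor {
    name: String,
    bold: bool,
    italic: bool,
}

/// Collects every matching font for every name in the chain, instead of
/// stopping after the first hit. `registry` is a hypothetical flat list of
/// registered fonts.
fn collect_chain(
    names: &[&str],
    registry: &[Descriptor],
    bold: bool,
    italic: bool,
) -> Vec<Descriptor> {
    let mut seen: HashSet<Descriptor> = HashSet::new();
    let mut chain = Vec::new();

    // Pass 1: exact style match for each name, with no early `break`.
    for name in names {
        for font in registry {
            let exact = font.name == *name && font.bold == bold && font.italic == italic;
            if exact && seen.insert(font.clone()) {
                chain.push(font.clone());
            }
        }
    }

    // Pass 2: compatible match (assumed: same name, any style), skipping
    // descriptors that pass 1 already collected.
    for name in names {
        for font in registry {
            if font.name == *name && seen.insert(font.clone()) {
                chain.push(font.clone());
            }
        }
    }

    chain
}
```

With a single-name chain the result has the same first element as before, which matches the claim that single-font defaults see no behavior change.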
/// fall through to it — in particular the Arabic Presentation Forms that
/// `font::evaluate` produces after reshaping base Arabic letters.
#[cfg(feature = "default_font")]
pub const FALLBACK_DEVICE_FONT_ARABIC: &[u8] =
We don't want to support Arabic in the fallback font. As the name suggests, it's a fallback: you should provide Arabic fonts yourself if you need them. The default font has to be small in order to be included everywhere.
This should get resolved automatically when we finish implementing the canvas font renderer, though.
/// Arabic strings). Bundled separately from the main Noto Sans fallback so
/// the ~76 KB Arabic glyph table is only carried when the `default_font`
/// feature is enabled. Registered as a second device font and appended to
Bundled separately from the main Noto Sans fallback so the ~76 KB Arabic glyph table is only carried when the `default_font` feature is enabled.
Bundled separately from the main fallback so the Arabic glyph table is only carried... exactly when the main fallback is?
Arabic in any path that flows through `font::FontLike::evaluate` (plain
TextField, EditText, FTE TextLine since `text_block::create_text_line`
synthesises an EditText, and TLF since it uses FTE) was unreadable
because:
- `font::evaluate` walks codepoints one by one through ttf_parser's raw
cmap lookup, with no GSUB shaping. Even when a font ships isolated
forms in cmap (Tahoma, Segoe UI, etc.) the text appeared as detached
letters in logical (LTR) order.
- Other RTL scripts (Hebrew, Thaana, etc.) didn't need shaping but
suffered the same logical-order problem. Syriac and N'Ko, which also
join cursively, get the reordering only.
This is a pragmatic fix that does not introduce a full shaping engine:
modern Arabic-aware fonts carry the Arabic Presentation Forms-A and -B
blocks (U+FB50..U+FDFF, U+FE70..U+FEFF) in their cmap as a compatibility
encoding for exactly this case. The joined initial/medial/final/isolated
shapes are addressable directly by codepoint, so reshaping base Arabic
letters into Presentation Forms makes them resolvable through the
existing cmap-only lookup.
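As a small illustration of "addressable directly by codepoint", the sketch below checks the two ranges quoted above and lists the four Presentation Forms-B codepoints for one letter, BEH (U+0628). The function and enum names are hypothetical and are not part of the PR.

```rust
/// True if `c` lies in Arabic Presentation Forms-A (U+FB50..=U+FDFF) or
/// Forms-B (U+FE70..=U+FEFF), the ranges quoted in the commit message.
fn is_arabic_presentation_form(c: char) -> bool {
    matches!(c, '\u{FB50}'..='\u{FDFF}' | '\u{FE70}'..='\u{FEFF}')
}

/// Joining position of a letter inside a word (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum JoinPosition {
    Isolated,
    Final,
    Initial,
    Medial,
}

/// Presentation Forms-B codepoints for ARABIC LETTER BEH (U+0628).
/// Only one letter is shown; a real joiner covers the whole alphabet.
fn beh_form(pos: JoinPosition) -> char {
    match pos {
        JoinPosition::Isolated => '\u{FE8F}',
        JoinPosition::Final => '\u{FE90}',
        JoinPosition::Initial => '\u{FE91}',
        JoinPosition::Medial => '\u{FE92}',
    }
}
```

A cmap-only lookup can resolve `beh_form(JoinPosition::Medial)` the same way it resolves any other codepoint, provided the font maps that block.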
* core/src/font.rs: new `maybe_reshape_rtl`, called from `evaluate`
before the per-codepoint loop. Detects RTL codepoints; for runs with
base Arabic letters, runs `arabic_reshaper::arabic_reshape` to map
each base letter to its Presentation Forms-B codepoint (and produce
the lam-alef ligatures, U+FEF5..U+FEFC, also in Forms-B). Runs `unicode_bidi::BidiInfo`
and reverses RTL runs into visual order, mirroring paired ASCII
punctuation. Returns None (fast path) for text without RTL
codepoints, so the vast majority of content pays only an O(n) scan.
Returns None for FontType::Embedded — embedded SWF fonts typically
ship only the base Arabic block, so substituting Presentation Forms
there would resolve to nothing. Skips re-running the joiner on text
that is already in Presentation Forms (some SWF-side helpers emit
those directly; re-shaping corrupts them).
* core/Cargo.toml: adds `arabic_reshaper = "0.4.2"` and
`unicode-bidi = "0.3.18"`. unicode-bidi was already an indirect
dependency via the desktop egui chrome.
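The fast path and the visual reordering can be sketched with the standard library alone. This is a deliberately simplified illustration: it treats the whole string as one RTL run, uses a rough range check instead of `unicode_bidi`, and performs no Arabic joining. The names `is_rtl`, `mirror`, `maybe_reorder_rtl` and `visual_text` are hypothetical.

```rust
/// Rough RTL test covering Hebrew through Arabic Extended-A plus the Hebrew
/// and Arabic presentation forms. An approximation for illustration only.
fn is_rtl(c: char) -> bool {
    matches!(
        c,
        '\u{0590}'..='\u{08FF}' | '\u{FB1D}'..='\u{FDFF}' | '\u{FE70}'..='\u{FEFF}'
    )
}

/// Mirrors paired ASCII punctuation so brackets still face their content
/// after the run is reversed.
fn mirror(c: char) -> char {
    match c {
        '(' => ')',
        ')' => '(',
        '[' => ']',
        ']' => '[',
        '{' => '}',
        '}' => '{',
        '<' => '>',
        '>' => '<',
        other => other,
    }
}

/// Returns `None` when no RTL codepoint is present (one linear scan), else
/// the text reversed into visual order. Assumes a single RTL run.
fn maybe_reorder_rtl(text: &str) -> Option<String> {
    if !text.chars().any(is_rtl) {
        return None; // fast path: caller keeps using the original text
    }
    Some(text.chars().rev().map(mirror).collect())
}

/// Shows how a caller can fall through to the original text on `None`.
fn visual_text(text: &str) -> String {
    let reordered = maybe_reorder_rtl(text);
    reordered.as_deref().unwrap_or(text).to_owned()
}
```

Returning `Option` keeps the common case allocation-free: text without RTL codepoints never produces a new string.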
The actual font with Presentation Forms in its cmap is not bundled. On
desktop the OS fonts (Tahoma, Segoe UI on Windows; equivalents
elsewhere) cover them. On web, host pages can register an Arabic font
themselves via the `addFont` + `setDefaultFont` JS APIs — the
fallback-chain fix in the previous commit makes a multi-font default
chain actually fall through on glyph misses.
Out of scope for this change, deliberately kept small:
* No real OpenType shaper (rustybuzz). The right long-term answer but
requires reworking Glyph/GlyphSource to be keyed by glyph ID rather
than `char`.
* No GPOS positioning of combining marks (harakat). Vowel marks still
get positive advance and lay out as separate spacing characters.
* No RTL paragraph alignment. Lines of Arabic now read correctly but a
default-aligned (left) paragraph stays left-anchored. That's a
layout-engine concern, not a font concern.
* `pos` in `evaluate`'s callback no longer round-trips for shaped
runs. Cursor positioning, hit testing and selection highlighting in
`display_object::edit_text` will be off for Arabic spans. Editing
Arabic in a TextField is rare; preserving correctness here would
need a source<->shaped position map that arabic_reshaper doesn't
expose.
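For reference, the kind of source-to-shaped position map mentioned in the last item could look like the sketch below. It is purely illustrative and not part of this change: it only reverses a run, works in `char` indices rather than the offsets `evaluate` actually passes, and `reverse_with_map` is a hypothetical name.

```rust
/// Reverses `text` into visual order and records, for each output char,
/// the char index in the source it came from.
fn reverse_with_map(text: &str) -> (String, Vec<usize>) {
    let indexed: Vec<(usize, char)> = text.chars().enumerate().collect();
    let mut shaped = String::with_capacity(text.len());
    let mut source_index = Vec::with_capacity(indexed.len());
    for &(i, c) in indexed.iter().rev() {
        shaped.push(c);
        source_index.push(i);
    }
    (shaped, source_index)
}
```

A caller doing hit testing would look up `source_index[visual_position]` to recover the logical position. Ligatures that merge two source letters into one glyph would need a range per entry rather than a single index.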
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 5562f1d to d82f0f9.
Thanks for the review @kjarosh — agreed on both points. Pushed a force-update that drops the bundled Noto Sans Arabic, the build script and the player.rs registration. The PR is now just the shaping/bidi logic plus the Rationale for keeping the rest:
Also slimmed the doc comments to match the surrounding file's density (the long block on |
Description
Arabic in any path that flows through `font::FontLike::evaluate` (plain `TextField`, `EditText`, FTE `TextLine` since `text_block::create_text_line` synthesises an `EditText`, and TLF since it uses FTE) is unreadable today: `font::evaluate` walks codepoints one by one through `ttf_parser`'s raw cmap lookup, with no GSUB shaping. Even when a font ships isolated forms in cmap (Tahoma, Segoe UI, etc.) the text appears as detached letters in logical (LTR) order.
This is a pragmatic fix that does not introduce a full shaping engine. Modern Arabic-aware fonts carry the Arabic Presentation Forms-A and -B blocks in their cmap as a compatibility encoding for exactly this case: the joined initial/medial/final/isolated shapes are addressable directly by codepoint. Reshaping base Arabic letters into Presentation Forms makes them resolvable through the existing cmap-only lookup.
Two commits, both load-bearing
- `core: Collect every font in the default_font fallback chain`: `Library::default_font` had a stale `// TODO: Return multiple fonts when it's needed.` `break;` that returned at most one font per chain. The downstream `FontSet` treats element 0 as the main font and the rest as fallbacks, so discarding the rest meant glyph misses on the primary font had nowhere to fall through to even when the caller had configured a multi-name chain via `set_default_font(_, vec![a, b, ...])`. Single-element chains behave identically to before: no caller that configured one font per default sees a behavior change. This is a real latent bug independent of the Arabic work, but the Arabic work needs it (so a host page's Arabic font becomes a real fallback).
- `core: Reshape Arabic and reorder RTL runs in font::evaluate`: the rendering fix. New `maybe_reshape_rtl` in `core/src/font.rs` runs from `evaluate` before the per-codepoint loop. For text without RTL codepoints it's a single linear scan that returns `None` and the existing fast path runs unchanged. For RTL text it reshapes Arabic via `arabic_reshaper`, runs `unicode_bidi`, and mirrors paired ASCII punctuation in RTL runs. Returns `None` for `FontType::Embedded`: embedded SWF fonts typically ship only the base Arabic block, so substituting Presentation Forms there would resolve to nothing. Skips re-running the joiner on text already in Presentation Forms. Adds `arabic_reshaper = "0.4.2"` and `unicode-bidi = "0.3.18"` to `core/Cargo.toml` (unicode-bidi was already an indirect dependency).
Where Arabic-bearing glyphs come from
On desktop the OS fonts (Tahoma, Segoe UI on Windows; equivalents elsewhere) cover the Presentation Forms. On web, host pages can register an Arabic font themselves (via the `addFont` + `setDefaultFont` JS APIs). The `library.rs` chain fix above is what makes a multi-name default chain actually fall through on glyph misses.
This PR does not bundle a font itself; that would inflate the universal fallback for every user. (An earlier revision of this PR did bundle Noto Sans Arabic as a separate `FALLBACK_DEVICE_FONT_ARABIC` const; dropped on review.)
Out of scope (deliberate)
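- No real OpenType shaper (rustybuzz). The right long-term answer, but it requires reworking `Glyph`/`GlyphSource` to be keyed by glyph ID rather than `char`, a much larger change.
- No GPOS positioning of combining marks (harakat). Vowel marks still get positive advance and lay out as separate spacing characters.
- No RTL paragraph alignment. Lines of Arabic now read correctly but a default-aligned (`left`) paragraph stays left-anchored. That's a layout-engine concern (`core/src/html/layout.rs`), not a font concern.
- `pos` no longer round-trips for shaped runs. The `pos` argument `evaluate` passes to its glyph callback is the byte offset into the reshaped `WString`, not the source `WStr`. Cursor positioning, hit testing and selection highlighting in `display_object::edit_text` will be off for Arabic spans. Editing Arabic in a `TextField` is rare in practice; preserving correctness here would need a source↔shaped position map that `arabic_reshaper` doesn't expose.
Transparency for non-RTL content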
The `evaluate` change is two lines: `maybe_reshape_rtl` returns `None` for any text without RTL codepoints, and the rest of `evaluate` runs against the original `text`. So English/Latin/CJK content goes through one extra `O(n)` codepoint scan and otherwise hits the existing path identically.
Testing
Verified against `safari2025.swf` from cdn.safariislandsgame.com (Cocolani), which uses TLF with `Direction.RTL` and feeds raw logical-order Arabic from a runtime `Language` table. Before: empty rectangles where Arabic should be. After: correctly joined, right-to-left Arabic on desktop (using OS Tahoma/Segoe UI) and on web with a host-page-supplied Arabic font.
`cargo fmt --all` clean. `cargo clippy -p ruffle_core --features default_font --tests` produces zero warnings.
I have not added an automated visual regression test in `tests/tests/swfs/`. Happy to add one if reviewers point me at an existing visual-text fixture I can adapt.
Notes for reviewers
Why `arabic_reshaper` over rolling our own joining table? The crate is a Python port and adds little to the WASM binary (~30 KB). The alternative (an in-tree joining table) would duplicate Unicode property data we'd then need to keep current.
The `break;` in `library.rs::default_font` is a real latent bug, not just a fix in service of this PR. Anyone who configures `set_default_font` with a multi-name chain on `master` today is silently losing every name after the first. The TODO comment from the original author suggests this was planned work; this PR is the smallest patch that does it.
Checklist