Fuseji lookup (eg. マ○ドナ○ド) #2437

Open
noatdk wants to merge 11 commits into yomidevs:master from noatdk:fuseji-lookup

@noatdk noatdk commented Jun 3, 2026

Summary

Adds support for looking up fuseji (伏せ字), words in which one or more characters are masked with a placeholder such as ◯ (per #2436).

マ◯ド◯ルド    → マクドナルド
打〇込む      → 打ち込む
〇ち込む      → 打ち込む   (leading mask)
打ち込〇      → 打ち込む   (trailing mask)

Screenshots

[Five screenshots omitted.]

[fuseji] anchor "マ" (prefix) skip-scan steps expr=323 read=300; 171 survivors; scan 56.7ms, gets 6.0ms
[fuseji] "マ○ド」 は おそらく 「マッド" via prefix "マ": 171 match(es) in 64.30ms

[fuseji] anchor "マ" (prefix) skip-scan steps expr=337 read=315; 172 survivors; scan 31.3ms, gets 6.0ms
[fuseji] "マ○ドナ○ドよりモスバーガ○のが" via prefix "マ": 172 match(es) in 37.70ms

[fuseji] anchor "マ" (prefix) skip-scan steps expr=2840 read=2720; 1031 survivors; scan 362.9ms, gets 34.0ms
[fuseji] "マ〇〇ド」 は おそらく 「マイ" via prefix "マ": 1031 match(es) in 399.90ms

[fuseji] anchor "お" (prefix) skip-scan steps expr=633 read=272; 256 survivors; scan 59.1ms, gets 8.6ms
[fuseji] "お◯子" via prefix "お": 256 match(es) in 69.10ms

[fuseji] anchor "お" (prefix) skip-scan steps expr=1954 read=3431; 1183 survivors; scan 342.9ms, gets 35.3ms
[fuseji] "お◯◯こ◯◯の中に文字を入 .." via prefix "お": 1183 match(es) in 382.50ms

...
[fuseji] anchor "ちゃん" (suffix) skip-scan steps expr=2120 read=1987; 1092 survivors; scan 232.1ms, gets 34.0ms
[fuseji] "〇〇ちゃんとお ...Read" via suffix "ちゃん": 1092 match(es) in 276.70ms

...
[fuseji] anchor "ちゃん" (suffix) skip-scan steps expr=2120 read=1987; 1092 survivors; scan 283.4ms, gets 63.5ms
[fuseji] "〇〇ちゃん•君」と呼ぶのは普通な" via suffix "ちゃん": 1092 match(es) in 348.60ms

Environment: Chrome on macOS, with only the PixivLight dictionary installed.

How it works

Triggers are single-character wildcards. IndexedDB can only scan contiguous key ranges (in effect, prefix scans), so the lookup is anchored on the unmasked text:

  1. Anchor on the unmasked run before the first trigger (prefix scan) or, if the text starts with a trigger, on the unmasked run after the last trigger (suffix scan via the reversed indices).
  2. Scan for matches with a skip-scan (loose index scan) driven by the masked pattern: it seeks straight to the required character at literal positions, lets the cursor enumerate only the characters that actually occur at mask positions, and skips subtrees that can't match. Cost therefore scales with the number of distinct mask-position characters plus the number of matches, not with the size of the (low-selectivity) anchor range. Suffix anchors run the same scan over the reversed indices.
  3. Build only the surviving records into entries, matching each against the pattern to record the matched length (code-point matcher, one wildcard per trigger).

Video: skip_scan_flow_rect.mp4
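
Steps 1 and 2 can be sketched roughly as follows. This is not the PR's code: a sorted in-memory array stands in for an IndexedDB index, binary search stands in for cursor seeks, and `pickAnchor`, `lowerBound`, `seekPast` and `skipScan` are hypothetical names. It also assumes that a key "survives" when it is no longer than the pattern and matches it at every position, and it only implements the prefix case (the suffix case would run the same scan over reversed strings).

```javascript
// Default trigger set from the PR description; everything else is illustrative.
const TRIGGERS = new Set([...'◯○〇●']);
const isMask = (ch) => TRIGGERS.has(ch);

// Step 1: choose the anchor for a single pattern.
function pickAnchor(pattern) {
    const chars = [...pattern];
    const first = chars.findIndex(isMask);
    if (first < 0) { return null; } // nothing is masked
    if (first > 0) { return {mode: 'prefix', anchor: chars.slice(0, first).join('')}; }
    let last = chars.length - 1;
    while (!isMask(chars[last])) { last--; } // terminates: chars[0] is a mask
    const anchor = chars.slice(last + 1).join('');
    return anchor.length > 0 ? {mode: 'suffix', anchor} : null;
}

// First index whose key is >= target (stand-in for a cursor seek).
function lowerBound(keys, target) {
    let lo = 0;
    let hi = keys.length;
    while (lo < hi) {
        const mid = (lo + hi) >>> 1;
        if (keys[mid] < target) { lo = mid + 1; } else { hi = mid; }
    }
    return lo;
}

// First index past every key that starts with prefix (skips a whole subtree).
function seekPast(keys, prefix) {
    let lo = 0;
    let hi = keys.length;
    while (lo < hi) {
        const mid = (lo + hi) >>> 1;
        if (keys[mid] < prefix || keys[mid].startsWith(prefix)) { lo = mid + 1; } else { hi = mid; }
    }
    return lo;
}

// Step 2 (prefix mode): skip-scan over keys sorted with the default string order.
function skipScan(sortedKeys, pattern) {
    const pat = [...pattern];
    const anchorLen = pat.findIndex(isMask);
    if (anchorLen <= 0) { throw new Error('pattern must start with a literal and contain a mask'); }
    const anchor = pat.slice(0, anchorLen).join('');
    const survivors = [];
    let steps = 0;
    let i = lowerBound(sortedKeys, anchor);
    while (i < sortedKeys.length && sortedKeys[i].startsWith(anchor)) {
        steps++;
        const key = [...sortedKeys[i]];
        let p = anchorLen; // first position that fails to match
        while (p < key.length && p < pat.length && (isMask(pat[p]) || key[p] === pat[p])) { p++; }
        if (p === key.length) {
            if (p > anchorLen) { survivors.push(sortedKeys[i]); } // covers at least one mask
            i++;
            continue;
        }
        const head = key.slice(0, p).join('');
        if (p < pat.length && key[p] < pat[p]) {
            i = lowerBound(sortedKeys, head + pat[p]); // seek straight to the required char
        } else {
            i = seekPast(sortedKeys, head); // nothing else under this head can match
        }
    }
    return {survivors, steps};
}

const demoKeys = ['マイク', 'マイナス', 'マイルド', 'マイルール', 'マクド', 'マクドナルド',
    'マグカップ', 'マッド', 'マッハ', 'マニア', 'ミルク'].sort();
console.log(pickAnchor('マ○ドナ○ド'), skipScan(demoKeys, 'マ○ドナ○ド'));
```

In this toy data the scan visits fewer keys than the anchor range contains, because a mismatch at a literal position either jumps forward to the required character or leaves the current subtree entirely.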

Settings

Two new rows under Advanced translation settings:

  • Enable fuseji lookup - translation.enableFusejiLookup (toggle, default false)
  • Fuseji trigger characters - translation.fusejiTriggers (text, default ◯○〇●)

Trigger characters act as one-character wildcards during lookup; the default set covers the common circle variants used as masks.
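
As an illustration of "one-character wildcard", here is a minimal code-point matcher built from the default trigger string. `makeFusejiMatcher` and `matchedLength` are hypothetical names, not the PR's API, and the convention of returning the matched length in code points (0 for no match) is an assumption of this sketch.

```javascript
// Builds a matcher from a trigger string such as the default "◯○〇●".
function makeFusejiMatcher(triggers = '◯○〇●') {
    const triggerSet = new Set([...triggers]); // iterate by code point, not UTF-16 unit
    // Returns how many code points of `pattern` the dictionary `term` covers,
    // or 0 when the term does not fit the pattern.
    return function matchedLength(pattern, term) {
        const p = [...pattern];
        const t = [...term];
        if (t.length === 0 || t.length > p.length) { return 0; }
        for (let i = 0; i < t.length; i++) {
            // A trigger matches any single code point; a literal must match exactly.
            if (!triggerSet.has(p[i]) && p[i] !== t[i]) { return 0; }
        }
        return t.length;
    };
}

const matchedLength = makeFusejiMatcher();
console.log(matchedLength('マ◯ド◯ルド', 'マクドナルド'));
console.log(matchedLength('打〇込む', '打ち込む'));
console.log(matchedLength('〇る', '𠮟る')); // 𠮟 is outside the BMP but is still one wildcard
```

Splitting with the string iterator rather than indexing by UTF-16 unit is what keeps a single trigger aligned with a single character when the masked character is a surrogate pair.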

Limitations

  • Consecutive masks before a literal (e.g. マ◯◯ド) are slower and noisier: every word that fits the all-masked span counts as a partial match, so each extra adjacent trigger multiplies the number of results (and the time to build them).
  • Deinflection is skipped. When the mask covers the stem (e.g. ◯んでる for 死んでる), the visible tail (んでる) is not a dictionary-form headword, and the anchor we look up is that stripped tail, so it can't resolve to 死ぬ. Supporting this would require deinflecting the visible tail and reconstructing the pattern (◯んでる → ◯ぬ/◯む) before lookup.
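
To make the second limitation concrete, the pattern reconstruction it describes might look like the sketch below. This is not implemented in the PR. The rule table is a hypothetical, deliberately tiny stand-in for a real deinflector, and `reconstructPatterns` is an invented name. The table includes ぶ alongside ぬ and む because godan verbs with any of those three endings share the んで form.

```javascript
const MASKS = new Set([...'◯○〇●']);

// Hypothetical rule table: visible inflected tail -> candidate dictionary-form endings.
const TAIL_RULES = [
    {tail: 'んでる', endings: ['ぬ', 'む', 'ぶ']},
    {tail: 'んだ', endings: ['ぬ', 'む', 'ぶ']},
];

// Rewrites a stem-masked pattern into dictionary-form patterns to look up instead.
function reconstructPatterns(pattern) {
    const chars = [...pattern];
    let maskCount = 0;
    while (maskCount < chars.length && MASKS.has(chars[maskCount])) { maskCount++; }
    if (maskCount === 0) { return []; } // the stem is not masked; nothing to do
    const masks = chars.slice(0, maskCount).join('');
    const tail = chars.slice(maskCount).join('');
    const out = [];
    for (const rule of TAIL_RULES) {
        if (rule.tail !== tail) { continue; }
        for (const ending of rule.endings) { out.push(masks + ending); }
    }
    return out;
}

console.log(reconstructPatterns('◯んでる'));
```

Each reconstructed pattern ends in a short literal, so it would go through the suffix-anchored scan with a very unselective anchor, which is part of why this case is costly to support.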

@noatdk noatdk marked this pull request as ready for review June 4, 2026 13:09
@noatdk noatdk requested a review from a team as a code owner June 4, 2026 13:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a09aec3e7

Comment thread ext/js/language/translator.js