Bug: Handle Non Chinese simplified form in CJKRadicals-15.1.0.txt #10

@russcam

CJKRadicals-15.1.0.txt uses apostrophes after the radical number to indicate that the ideograph uses a standard simplification. From Unicode® Standard Annex #38 UNICODE HAN DATABASE (UNIHAN):

A single apostrophe indicates the Chinese simplified form of the radical (for example, U+9F7F 齿 for U+9F52 齒) and two apostrophes indicate the non-Chinese simplified form of the radical (for example, U+6B6F 歯 for U+9F52 齒).

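The ProcessCjkRadicalsFile method handles the single-apostrophe case but throws on the two-apostrophe case at the line below, because Substring strips only one trailing character and the remaining apostrophe reaches int.Parse: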

int radicalIndex = int.Parse(isSimplified ? radicalIndexText.Substring(0, radicalIndexText.Length - 1) : radicalIndexText);
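One possible fix is to count the trailing apostrophes rather than test for a single one. The following is only a sketch: RadicalFormSketch and RadicalIndexSketch are hypothetical names that do not exist in the library, and mapping the apostrophe count onto an enum is an assumption about how the result might be represented.

```csharp
using System;
using System.Globalization;

// Hypothetical enum (not part of the library): the numeric value equals the
// number of trailing apostrophes on the radical number.
public enum RadicalFormSketch
{
    Traditional = 0,          // e.g. "211"
    ChineseSimplified = 1,    // e.g. "211'"
    NonChineseSimplified = 2  // e.g. "211''"
}

// Hypothetical helper (not part of the library).
public static class RadicalIndexSketch
{
    public static (int Index, RadicalFormSketch Form) Parse(string radicalIndexText)
    {
        // Strip every trailing apostrophe, then count how many were removed.
        string digits = radicalIndexText.TrimEnd('\'');
        int apostrophes = radicalIndexText.Length - digits.Length;

        if (apostrophes > 2)
        {
            throw new FormatException(
                "Unexpected radical number: " + radicalIndexText);
        }

        // NumberStyles.None accepts decimal digits only.
        int index = int.Parse(digits, NumberStyles.None, CultureInfo.InvariantCulture);
        return (index, (RadicalFormSketch)apostrophes);
    }
}
```

With this helper, RadicalIndexSketch.Parse("211''") yields index 211 with the NonChineseSimplified form instead of throwing.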

Note also that a non-Chinese simplified form can have an empty CJK radical character field. This happens when the radical character is not included in either the Kangxi Radicals block or the CJK Radicals Supplement block. The following line would therefore also need to handle an empty field:

char radicalCodePoint = checked((char)int.Parse(reader.ReadTrimmedField(), NumberStyles.HexNumber));
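As a sketch of how an empty field could be tolerated, the helper below returns a nullable char. It assumes that char? is an acceptable representation, which is exactly the open question in this issue, and RadicalCodePointSketch is a hypothetical name rather than an existing type. It takes the already-read field as a string so it does not depend on the reader used by the library.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of the library).
public static class RadicalCodePointSketch
{
    // Returns null when the field is empty or whitespace only;
    // otherwise parses the field as a hexadecimal code point.
    public static char? Parse(string field)
    {
        string trimmed = field.Trim();
        if (trimmed.Length == 0)
        {
            return null;
        }

        // Same parsing as the original line; the checked cast throws
        // OverflowException if the value does not fit in a char.
        int value = int.Parse(trimmed, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return checked((char)value);
    }
}
```

Here RadicalCodePointSketch.Parse("2FD2") returns '\u2FD2', while RadicalCodePointSketch.Parse(" ") returns null instead of throwing.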

I'd be happy to add support for the non-Chinese simplified form. How would you prefer to represent an empty character on CjkRadicalData: as a nullable char (char?)?
