pywubi — Chinese Character to Wubi Encoding

Chinese documentation (中文文档)

A Python library for converting Chinese characters to Wubi (五笔) input method codes. It currently supports only the 86-version scheme, with a built-in dictionary of roughly 21,000 characters.

Features

Single-character encoding — convert individual Chinese characters to Wubi codes
Phrase encoding — generate codes following Wubi phrase rules (2-char, 3-char, 4+ char)
Multi-code query — return all possible encodings for a character
Reverse lookup — find characters by Wubi code
Fuzzy reverse lookup — use z in place of unknown radicals to guess characters
Brief code query — get the shortest code and its level (1st / 2nd / 3rd / full)
Mixed text — automatically split Chinese and non-Chinese; punctuation is preserved as-is
Zero dependencies — no third-party packages required

Installation

pip install pywubi

Quick Start

from pywubi import wubi

# Character-by-character (default)
wubi('我爱你')
# ['trnt', 'epdc', 'wqiy']

# Return all possible codes
wubi('我爱你', multicode=True)
# [['trnt', 'trn', 'q'], ['epdc', 'epd', 'ep'], ['wqiy', 'wqi', 'wq']]

# Phrase mode
wubi('我爱你', single=False)
# ['tewq']

# Mixed text — punctuation preserved
wubi('天气不错，出去走走!')
# ['gdi', 'rnb', 'gii', 'qajg', '，', 'bmt', 'fcu', 'tfht', 'tfht', '!']
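
The splitting of mixed text can be pictured with a small regular-expression sketch. This is not pywubi's actual implementation: the helper name `split_mixed` and the choice of the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) are assumptions made for illustration, and the library may treat runs of non-Chinese text differently.

```python
import re

# Assumption: "Chinese" means the basic CJK Unified Ideographs block.
# pywubi's own dictionary may cover characters outside this range.
_SEGMENT = re.compile(r"[\u4e00-\u9fff]|[^\u4e00-\u9fff]+")


def split_mixed(text):
    """Split text into single Chinese characters and runs of anything else."""
    return _SEGMENT.findall(text)


# Each Chinese character becomes its own segment; punctuation stays intact.
segments = split_mixed("天气不错，出去走走!")
print(segments)
```

Each Chinese segment could then be looked up in the dictionary, while the remaining segments are passed through unchanged.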

API Reference

`wubi(hans, multicode=False, single=True)`

Convert a Chinese string to Wubi encodings.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `hans` | `str` | required | Chinese character string |
| `multicode` | `bool` | `False` | Return all possible codes |
| `single` | `bool` | `True` | `True` for char-by-char, `False` for phrase mode |

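Returns: list of Wubi codes, one per character (or one per phrase when single=False). With multicode=True, each character's entry is a list of all its codes. Non-Chinese text such as punctuation is returned unchanged.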

`single_wubi(han, multicode=False)`

Convert a single Chinese character to Wubi encoding.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `han` | `str` | required | A single Chinese character |
| `multicode` | `bool` | `False` | Return all possible codes |

Returns: str (a single code) when multicode=False, or list[str] (all codes) when multicode=True

`combine_wubi(hans)`

Convert a phrase to Wubi encoding.

| Parameter | Type | Description |
| --- | --- | --- |
| `hans` | `str` | Chinese phrase |

Returns: str — Wubi code for the phrase

Encoding rules:

2-char phrase: first 2 codes of each character (4 codes total)
3-char phrase: 1st code of char 1 & 2 + first 2 codes of char 3 (4 codes total)
4+ char phrase: 1st code of char 1, 2, 3, and last (4 codes total)
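
The three rules above can be sketched as a small standalone function. This is an illustration only, not the library's source: the name `phrase_code` is hypothetical, and it assumes the caller already has the full (not brief) code of every character in the phrase.

```python
def phrase_code(full_codes):
    """Build a 4-key phrase code from the full codes of each character.

    `full_codes` is a list such as ['trnt', 'epdc', 'wqiy'], one full
    code per character, in phrase order.
    """
    n = len(full_codes)
    if n < 2:
        raise ValueError("a phrase needs at least two characters")
    if n == 2:
        # First two keys of each character.
        return full_codes[0][:2] + full_codes[1][:2]
    if n == 3:
        # First key of characters 1 and 2, first two keys of character 3.
        return full_codes[0][0] + full_codes[1][0] + full_codes[2][:2]
    # Four or more: first key of characters 1, 2, 3 and of the last one.
    return full_codes[0][0] + full_codes[1][0] + full_codes[2][0] + full_codes[-1][0]


# Full codes for 我, 爱, 你 taken from the Quick Start output.
print(phrase_code(["trnt", "epdc", "wqiy"]))  # tewq
```

The result for 我爱你 agrees with the phrase-mode output shown in Quick Start.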

`lookup(char)`

Look up all Wubi codes for a single character.

from pywubi import lookup

lookup('为')   # ['ylyi', 'yly', 'yl', 'o']
lookup('?')    # []

`reverse_lookup(code)`

Reverse-lookup characters by Wubi code.

from pywubi import reverse_lookup

reverse_lookup('trnt')  # ['我']
reverse_lookup('q')     # ['我']
reverse_lookup('ggll')  # ['一']
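
One plausible way to support reverse lookup is to invert a character-to-codes mapping once and then query the inverted index. The sketch below assumes such a mapping; `CODES` is a three-entry toy table built from the Quick Start output, and `build_reverse_index` is a hypothetical helper, not part of pywubi.

```python
from collections import defaultdict

# Toy table: character -> all of its codes (values from the Quick Start).
CODES = {
    "我": ["trnt", "trn", "q"],
    "爱": ["epdc", "epd", "ep"],
    "你": ["wqiy", "wqi", "wq"],
}


def build_reverse_index(codes_by_char):
    """Invert {char: [codes]} into {code: [chars]}."""
    index = defaultdict(list)
    for char, codes in codes_by_char.items():
        for code in codes:
            index[code].append(char)
    return dict(index)


REVERSE = build_reverse_index(CODES)
print(REVERSE.get("trnt", []))  # ['我']
print(REVERSE.get("q", []))     # ['我']
print(REVERSE.get("zzzz", []))  # []
```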

`fuzzy_reverse_lookup(code, limit=10)`

Fuzzy reverse-lookup characters by Wubi code; use z for unknown radical keys.

Wubi 86 only uses the keys a-y, so z is free to serve as a wildcard that matches any radical key. When the input contains no z, the function behaves the same as an exact reverse lookup. Only codes of the same length as the input are matched.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `code` | `str` | required | Wubi code; use `z`/`Z` for unknown positions |
| `limit` | `int` | `10` | Max results to return; `0` for unlimited |

Returns: list[tuple[str, str]] — [(character, matched_code), ...] sorted by code

from pywubi import fuzzy_reverse_lookup

fuzzy_reverse_lookup('vz')       # [('姑', 'vd'), ('灵', 'vo'), ...]
fuzzy_reverse_lookup('zzzg')     # only the last key is known: 4-key codes ending in g (first 10 by default)
fuzzy_reverse_lookup('trnt')     # no z: same as an exact reverse lookup
fuzzy_reverse_lookup('zz', limit=5)  # limit to 5 results
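
The wildcard behaviour described above can be approximated with a regular expression in which every z becomes the character class `[a-y]`. This is a sketch under stated assumptions rather than pywubi's implementation: `fuzzy_match` and the toy table `TABLE` are hypothetical, with entries taken from examples elsewhere in this README.

```python
import re

# Toy code -> characters table, using entries that appear in this README.
TABLE = {
    "vd": ["姑"],
    "vo": ["灵"],
    "trnt": ["我"],
    "ggll": ["一"],
}


def fuzzy_match(code, table, limit=10):
    """Return (character, matched_code) pairs sorted by code.

    'z' or 'Z' matches any key a-y; limit=0 means no limit.
    """
    parts = ("[a-y]" if ch == "z" else re.escape(ch) for ch in code.lower())
    pattern = re.compile("".join(parts))
    hits = [
        (char, candidate)
        for candidate in sorted(table)
        if pattern.fullmatch(candidate)  # fullmatch enforces equal length
        for char in table[candidate]
    ]
    return hits if limit == 0 else hits[:limit]


print(fuzzy_match("vz", TABLE))    # [('姑', 'vd'), ('灵', 'vo')]
print(fuzzy_match("trnt", TABLE))  # [('我', 'trnt')]
```

Using `fullmatch` is what ties the matched code length to the input length: a two-key pattern never matches a four-key code.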

`brief_code(char)`

Get the shortest (brief) code for a character.

from pywubi import brief_code

brief_code('我')  # 'q'
brief_code('一')  # 'g'
brief_code('?')   # None

`brief_level(char)`

Get the brief-code level (1 = 1st-level, 2 = 2nd-level, 3 = 3rd-level, 4 = full code).

from pywubi import brief_level

brief_level('我')  # 1
brief_level('一')  # 1
brief_level('〇')  # 4
brief_level('?')   # None
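
For intuition, the brief code can be thought of as the shortest entry among a character's codes. The sketch below is hypothetical (`brief_from_codes` is not a pywubi function) and assumes the level equals the length of that shortest code. That assumption may not hold for characters whose full code has fewer than four keys, where the library may still report level 4.

```python
def brief_from_codes(codes):
    """Return (shortest_code, level) for a list of codes, or (None, None).

    Assumption: level == number of keys in the shortest code.
    """
    if not codes:
        return None, None
    shortest = min(codes, key=len)
    return shortest, len(shortest)


# Codes for 为 as returned by lookup('为') in the example above.
print(brief_from_codes(["ylyi", "yly", "yl", "o"]))  # ('o', 1)
print(brief_from_codes([]))                          # (None, None)
```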

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

Changelog

0.2.0

Dictionary storage changed from Python source to JSON — faster loading, smaller size
Added lazy-loading: import pywubi no longer loads the full dictionary immediately
Added lookup() to query all codes for a character
Added reverse_lookup() to find characters by code
Added brief_code() to get the shortest code
Added brief_level() to get the brief-code level
Added comprehensive unit tests

0.1.0

Fixed single_seg bug where trailing non-Chinese characters were lost
Fixed typos (utlis → utils, conbin_wubi → combine_wubi)
Switched to relative imports within the package
Added type hints
Added .gitignore, removed .idea/ from tracking
Fixed README typos

0.0.2

Initial release

PyPI Account Verification

I am the owner of the PyPI account "sfyc23" and the maintainer of this repository: https://github.com/sfyc23/python-wubi

I am currently requesting account recovery for the PyPI project/package "pywubi".

This note is added to help PyPI administrators verify that I still control the source repository associated with the package.

GitHub profile: https://github.com/sfyc23
PyPI project: https://pypi.org/project/pywubi/
Date: 2026-03-30

License

MIT License — see LICENSE for details.
