A Python library for converting Chinese characters to Wubi (五笔) input method encoding. Currently supports the 86-version scheme with a built-in dictionary of ~21,004 characters.
- Single-character encoding — convert individual Chinese characters to Wubi codes
- Phrase encoding — generate codes following Wubi phrase rules (2-char, 3-char, 4+ char)
- Multi-code query — return all possible encodings for a character
- Reverse lookup — find characters by Wubi code
- Brief code query — get the shortest code and its level (1st / 2nd / 3rd / full)
- Mixed text — automatically split Chinese and non-Chinese; punctuation is preserved as-is
- Zero dependencies — no third-party packages required
pip install pywubi

from pywubi import wubi
# Character-by-character (default)
wubi('我爱你')
# ['trnt', 'epdc', 'wqiy']
# Return all possible codes
wubi('我爱你', multicode=True)
# [['trnt', 'trn', 'q'], ['epdc', 'epd', 'ep'], ['wqiy', 'wqi', 'wq']]
# Phrase mode
wubi('我爱你', single=False)
# ['tewq']
# Mixed text — punctuation preserved
wubi('天气不错,出去走走!')
# ['gdi', 'rnb', 'gii', 'qajg', ',', 'bmt', 'fcu', 'tfht', 'tfht', '!']

Convert a Chinese string to Wubi encodings.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `hans` | `str` | (required) | Chinese character string |
| `multicode` | `bool` | `False` | Return all possible codes |
| `single` | `bool` | `True` | `True` for char-by-char, `False` for phrase mode |
Returns: list — list of Wubi codes
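The mixed-text behavior shown in the example above (Chinese characters encoded, everything else passed through) depends on separating Chinese runs from non-Chinese runs. The following is a minimal sketch of one way to do that split, not the library's actual implementation. The name `split_mixed` is hypothetical, and the sketch assumes "Chinese" means the CJK Unified Ideographs block U+4E00 to U+9FFF; the library's own test may cover more characters (for instance, `〇` is U+3007 and falls outside this range).

```python
import re

# Assumption: a "Chinese" character is one in U+4E00..U+9FFF.
# Each match is either a maximal Chinese run or a maximal non-Chinese run.
_RUNS = re.compile(r'[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+')


def split_mixed(text: str) -> list[tuple[bool, str]]:
    """Split text into (is_chinese, run) pairs, preserving order."""
    return [
        ('\u4e00' <= run[0] <= '\u9fff', run)
        for run in _RUNS.findall(text)
    ]


print(split_mixed('天气不错,出去走走!'))
# [(True, '天气不错'), (False, ','), (True, '出去走走'), (False, '!')]
```

A caller could then encode the runs flagged `True` and copy the others into the result unchanged.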
Convert a single Chinese character to Wubi encoding.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `han` | `str` | (required) | A single Chinese character |
| `multicode` | `bool` | `False` | Return all possible codes |
Returns: str (single code) or list[str] (multiple codes)
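Because the return type depends on `multicode`, calling code that accepts either form may want to normalize the result first. A small sketch, assuming only the two return shapes described above; `codes_as_list` is a hypothetical helper and is not part of pywubi:

```python
from typing import Union


def codes_as_list(result: Union[str, list[str]]) -> list[str]:
    """Wrap a lone code in a list; return a copy of an existing list."""
    return [result] if isinstance(result, str) else list(result)


print(codes_as_list('trnt'))                # ['trnt']
print(codes_as_list(['trnt', 'trn', 'q']))  # ['trnt', 'trn', 'q']
```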
Convert a phrase to Wubi encoding.
| Parameter | Type | Description |
|---|---|---|
| `hans` | `str` | Chinese phrase |
Returns: str — Wubi code for the phrase
Encoding rules:
- 2-char phrase: first 2 codes of each character (4 codes total)
- 3-char phrase: 1st code of char 1 & 2 + first 2 codes of char 3 (4 codes total)
- 4+ char phrase: 1st code of char 1, 2, 3, and last (4 codes total)
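The three rules above can be sketched as a function that combines per-character full codes. This is an illustration of the rules as stated, not the library's code: `phrase_code` is a hypothetical name, and it assumes the caller already has one non-empty full code per character.

```python
def phrase_code(full_codes: list[str]) -> str:
    """Combine per-character full codes into one phrase code.

    `full_codes` holds one full Wubi code per character, in phrase order.
    """
    n = len(full_codes)
    if n < 2:
        raise ValueError('a phrase needs at least two characters')
    if n == 2:
        # First two codes of each character.
        return full_codes[0][:2] + full_codes[1][:2]
    if n == 3:
        # First code of characters 1 and 2, first two codes of character 3.
        return full_codes[0][0] + full_codes[1][0] + full_codes[2][:2]
    # Four or more: first code of characters 1, 2, 3 and of the last one.
    return ''.join(code[0] for code in (*full_codes[:3], full_codes[-1]))


# Full codes for 我, 爱, 你 taken from the quick-start example.
print(phrase_code(['trnt', 'epdc', 'wqiy']))  # tewq
print(phrase_code(['trnt', 'epdc']))          # trep
```

The three-character result matches the `'tewq'` produced by `wubi('我爱你', single=False)` in the quick-start example.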
Look up all Wubi codes for a single character.
from pywubi import lookup
lookup('为') # ['ylyi', 'yly', 'yl', 'o']
lookup('?') # []

Reverse-lookup characters by Wubi code.
from pywubi import reverse_lookup
reverse_lookup('trnt') # ['我']
reverse_lookup('q') # ['我']
reverse_lookup('ggll') # ['一']

Get the shortest (brief) code for a character.
from pywubi import brief_code
brief_code('我') # 'q'
brief_code('一') # 'g'
brief_code('?') # None

Get the brief-code level (1 = 1st-level, 2 = 2nd-level, 3 = 3rd-level, 4 = full code).
from pywubi import brief_level
brief_level('我') # 1
brief_level('一') # 1
brief_level('〇') # 4
brief_level('?') # None

# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest

- Dictionary storage changed from Python source to JSON for faster loading and smaller size
- Added lazy-loading: `import pywubi` no longer loads the full dictionary immediately
- Added `lookup()` to query all codes for a character
- Added `reverse_lookup()` to find characters by code
- Added `brief_code()` to get the shortest code
- Added `brief_level()` to get the brief-code level
- Added comprehensive unit tests
- Fixed `single_seg` bug where trailing non-Chinese characters were lost
- Fixed typos (`utlis` → `utils`, `conbin_wubi` → `combine_wubi`)
- Switched to relative imports within the package
- Added type hints
- Added `.gitignore`, removed `.idea/` from tracking
- Fixed README typos
- Initial release
I am the owner of the PyPI account "sfyc23" and the maintainer of this repository: https://github.com/sfyc23/python-wubi
I am currently requesting account recovery for the PyPI project/package "pywubi".
This note is added to help PyPI administrators verify that I still control the source repository associated with the package.
GitHub profile: https://github.com/sfyc23
PyPI project: https://pypi.org/project/pywubi/
Date: 2026-03-30
MIT License — see LICENSE for details.