pywubi — Chinese Character to Wubi Encoding

A Python library for converting Chinese characters to Wubi (五笔) input method codes. It currently supports the 86-version scheme and ships with a built-in dictionary of roughly 21,000 characters.

Features

Single-character encoding — convert individual Chinese characters to Wubi codes
Phrase encoding — generate codes following Wubi phrase rules (2-char, 3-char, 4+ char)
Multi-code query — return all possible encodings for a character
Reverse lookup — find characters by Wubi code
Brief code query — get the shortest code and its level (1st / 2nd / 3rd / full)
Mixed text — automatically split Chinese and non-Chinese; punctuation is preserved as-is
Zero dependencies — no third-party packages required

Installation

pip install pywubi

Quick Start

from pywubi import wubi

# Character-by-character (default)
wubi('我爱你')
# ['trnt', 'epdc', 'wqiy']

# Return all possible codes
wubi('我爱你', multicode=True)
# [['trnt', 'trn', 'q'], ['epdc', 'epd', 'ep'], ['wqiy', 'wqi', 'wq']]

# Phrase mode
wubi('我爱你', single=False)
# ['tewq']

# Mixed text — punctuation preserved
wubi('天气不错，出去走走!')
# ['gdi', 'rnb', 'gii', 'qajg', '，', 'bmt', 'fcu', 'tfht', 'tfht', '!']

API Reference

`wubi(hans, multicode=False, single=True)`

Convert a Chinese string to Wubi encodings.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `hans` | `str` | required | Chinese character string |
| `multicode` | `bool` | `False` | Return all possible codes |
| `single` | `bool` | `True` | `True` for char-by-char, `False` for phrase mode |

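Returns: list. In char-by-char mode it holds one code per character, with non-Chinese text passed through unchanged; with `multicode=True` each character's entry is itself a list of codes; in phrase mode it holds one code per phrase.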
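
To make the return shapes concrete, here is a minimal sketch of the char-by-char behavior described above. It does not use pywubi's dictionary or internals: `TOY_TABLE` and `encode_chars` are hypothetical names, the table holds only the codes shown in Quick Start, and treating the first entry as the default code is an assumption.

```python
# Toy table with the codes shown in Quick Start; the real dictionary is far larger.
TOY_TABLE = {
    "我": ["trnt", "trn", "q"],
    "爱": ["epdc", "epd", "ep"],
    "你": ["wqiy", "wqi", "wq"],
}


def encode_chars(text, multicode=False):
    """Hypothetical stand-in for char-by-char mode, not pywubi's implementation."""
    result = []
    for ch in text:
        codes = TOY_TABLE.get(ch)
        if codes is None:
            # Characters missing from the table are passed through unchanged.
            result.append(ch)
        elif multicode:
            # Return a copy of every known code for this character.
            result.append(list(codes))
        else:
            # Assumption: the first entry is the default (full) code.
            result.append(codes[0])
    return result


print(encode_chars("我爱你!"))               # ['trnt', 'epdc', 'wqiy', '!']
print(encode_chars("我爱", multicode=True))  # [['trnt', 'trn', 'q'], ['epdc', 'epd', 'ep']]
```

The sketch shows why callers should check `multicode` before indexing into the result: the entries are strings in one mode and lists in the other.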

`single_wubi(han, multicode=False)`

Convert a single Chinese character to Wubi encoding.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `han` | `str` | required | A single Chinese character |
| `multicode` | `bool` | `False` | Return all possible codes |

Returns: str (a single code) by default, or list[str] (all codes) when `multicode=True`

`combine_wubi(hans)`

Convert a phrase to Wubi encoding.

| Parameter | Type | Description |
| --- | --- | --- |
| `hans` | `str` | Chinese phrase |

Returns: str — Wubi code for the phrase

Encoding rules:

2-char phrase: first 2 codes of each character (4 codes total)
3-char phrase: 1st code of char 1 & 2 + first 2 codes of char 3 (4 codes total)
4+ char phrase: 1st code of char 1, 2, 3, and last (4 codes total)
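
The three rules above can be sketched as a small function over per-character full codes. This is an illustration only, not the library's `combine_wubi` source: `phrase_code` is a hypothetical helper, and raising an error for fewer than two characters is an assumption, since the rules do not cover that case.

```python
def phrase_code(full_codes):
    """Combine per-character full codes into one phrase code (hypothetical helper)."""
    n = len(full_codes)
    if n < 2:
        # Assumption: the phrase rules start at two characters.
        raise ValueError("a phrase needs at least two characters")
    if n == 2:
        # First two letters of each character.
        return full_codes[0][:2] + full_codes[1][:2]
    if n == 3:
        # First letter of characters 1 and 2, first two letters of character 3.
        return full_codes[0][0] + full_codes[1][0] + full_codes[2][:2]
    # Four or more: first letter of characters 1, 2, 3, and the last one.
    return full_codes[0][0] + full_codes[1][0] + full_codes[2][0] + full_codes[-1][0]


# Full codes for 我, 爱, 你 as listed in Quick Start.
print(phrase_code(["trnt", "epdc", "wqiy"]))  # tewq
```

The result agrees with the phrase-mode output shown in Quick Start for '我爱你'.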

`lookup(char)`

Look up all Wubi codes for a single character.

from pywubi import lookup

lookup('为')   # ['ylyi', 'yly', 'yl', 'o']
lookup('?')    # []

`reverse_lookup(code)`

Reverse-lookup characters by Wubi code.

from pywubi import reverse_lookup

reverse_lookup('trnt')  # ['我']
reverse_lookup('q')     # ['我']
reverse_lookup('ggll')  # ['一']
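
One common way to support this kind of query is an inverted index built once from the character-to-codes table. The sketch below assumes that approach; it is not taken from pywubi's source. `TOY_TABLE` and `build_reverse_index` are hypothetical names, and the table is a partial one holding only codes that appear in this README.

```python
from collections import defaultdict

# Partial toy table; only codes shown elsewhere in this README are included.
TOY_TABLE = {
    "我": ["trnt", "trn", "q"],
    "一": ["ggll", "g"],
}


def build_reverse_index(table):
    """Map each code to the list of characters that can be typed with it."""
    index = defaultdict(list)
    for char, codes in table.items():
        for code in codes:
            index[code].append(char)
    # Convert to a plain dict so unknown codes do not create empty entries.
    return dict(index)


REVERSE = build_reverse_index(TOY_TABLE)
print(REVERSE.get("trnt", []))  # ['我']
print(REVERSE.get("zzzz", []))  # []
```

Because several characters can share a code, each index entry is a list rather than a single character.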

`brief_code(char)`

Get the shortest (brief) code for a character.

from pywubi import brief_code

brief_code('我')  # 'q'
brief_code('一')  # 'g'
brief_code('?')   # None

`brief_level(char)`

Get the brief-code level (1 = 1st-level, 2 = 2nd-level, 3 = 3rd-level, 4 = full code).

from pywubi import brief_level

brief_level('我')  # 1
brief_level('一')  # 1
brief_level('〇')  # 4
brief_level('?')   # None
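
The relationship between a character's code list, its brief code, and its level can be sketched as follows. This rests on a simplifying assumption that the level equals the length of the shortest code; the library may define levels differently, particularly for characters whose full code has fewer than four letters. `brief_info` is a hypothetical helper, not part of pywubi.

```python
def brief_info(codes):
    """Return (shortest_code, level) for a list of codes, or (None, None) if empty."""
    if not codes:
        # Mirrors the None results shown above for unknown characters.
        return None, None
    shortest = min(codes, key=len)
    # Assumption: level is the number of letters in the shortest code.
    return shortest, len(shortest)


print(brief_info(["trnt", "trn", "q"]))  # ('q', 1)
print(brief_info([]))                    # (None, None)
```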

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

Changelog

0.2.0

Dictionary storage changed from Python source to JSON — faster loading, smaller size
Added lazy loading: `import pywubi` no longer loads the full dictionary immediately
Added `lookup()` to query all codes for a character
Added `reverse_lookup()` to find characters by code
Added `brief_code()` to get the shortest code
Added `brief_level()` to get the brief-code level
Added comprehensive unit tests

0.1.0

Fixed single_seg bug where trailing non-Chinese characters were lost
Fixed typos (utlis → utils, conbin_wubi → combine_wubi)
Switched to relative imports within the package
Added type hints
Added .gitignore, removed .idea/ from tracking
Fixed README typos

0.0.2

Initial release

PyPI Account Verification

I am the owner of the PyPI account "sfyc23" and the maintainer of this repository: https://github.com/sfyc23/python-wubi

I am currently requesting account recovery for the PyPI project/package "pywubi".

This note is added to help PyPI administrators verify that I still control the source repository associated with the package.

GitHub profile: https://github.com/sfyc23
PyPI project: https://pypi.org/project/pywubi/
Date: 2026-03-30

License

MIT License — see LICENSE for details.
