Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

Features
Prerequisites
Installation
Configuration
Usage
Development
Testing
Contributing
How to get help
Terms of use

Features

Tibetan word tokenization using statistical and rule-based methods
Support for both classical and modern Tibetan
Integration with PyBo (Python Buddhist)
Customizable segmentation rules
Dictionary-based word lookup
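
To give a feel for what dictionary-based word lookup means for Tibetan, here is a minimal sketch of greedy longest-match segmentation over syllables. It is an illustration only, not Botok's actual implementation: the function names `split_syllables` and `longest_match`, the `max_len` parameter, and the two-word lexicon are all assumptions made for this example.

```python
TSEK = "\u0f0b"  # Tibetan syllable delimiter (tsek)


def split_syllables(text):
    """Split text into syllables, keeping the trailing tsek on each one."""
    syllables, current = [], ""
    for ch in text:
        current += ch
        if ch == TSEK:
            syllables.append(current)
            current = ""
    if current:  # last syllable without a closing tsek
        syllables.append(current)
    return syllables


def longest_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation over syllables (hypothetical helper).

    `lexicon` holds words written without a trailing tsek.
    Syllables not covered by any lexicon entry become one-syllable tokens.
    """
    syllables = split_syllables(text)
    tokens, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + size])
            if size == 1 or candidate.rstrip(TSEK) in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon, assumed for this example only.
lexicon = {"བོད་ཡིག", "དཔེ་ཆ"}
for token in longest_match("བོད་ཡིག་གི་དཔེ་ཆ་", lexicon):
    print(token)
```

With this toy lexicon the five-syllable input is grouped into three tokens: two dictionary words and the single-syllable particle between them. A real tokenizer also has to handle affixed particles, punctuation, and non-Tibetan text, which this sketch ignores.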

Prerequisites

Python 3.8+
pip

Installation

# Clone the repository
git clone https://github.com/OpenPecha/Botok.git
cd Botok

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

Configuration

Botok can be configured via:

Environment variables
YAML configuration files in config/
Python API

Usage

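from botok import WordTokenizer

# Create a tokenizer instance
# (the first run may need to fetch the default dialect pack)
tokenizer = WordTokenizer()

# Tokenize Tibetan text
text = "བོད་ཡིག་གི་དཔེ་ཆ་"
tokens = tokenizer.tokenize(text)

# Each token is an object; print its surface text
for token in tokens:
    print(token.text)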

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
flake8 botok/

Testing

pytest tests/

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a Pull Request

Please read CONTRIBUTING.md for details.

How to get help

File an issue.
Join our Discord.

Terms of use

Botok is licensed under the Apache-2.0 License.
