Draft
Member
@copilot Example: |
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/de31de74-3c85-4d05-bd45-6f719e129da4 Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Member
|
@copilot Include model files to |
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/de31de74-3c85-4d05-bd45-6f719e129da4 Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add ThaiG2P v3 model file to PyThaiNLP" to "Add ThaiG2P v3 ONNX engine to transliterate" on Apr 4, 2026
Agent-Logs-Url: https://github.com/PyThaiNLP/pythainlp/sessions/95ab0043-37c8-4570-94e3-92d0467ac853 Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Contributor
Author
Done in commit 7749340. The three model files ( |
ThaiG2P v3 is a character-level Transformer grapheme-to-phoneme (G2P) model exported to ONNX. Compared with v2, it is trained on improved data and has a smaller model footprint.
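For readers unfamiliar with how an encoder/decoder pair like this is usually driven at inference time, the following is a minimal sketch of a greedy decoding loop. It is not the code from this PR: the decoder is replaced by a toy stand-in function so the snippet runs without `onnxruntime`, and the names `greedy_decode`, `toy_step`, `BOS_ID`, and `EOS_ID` (including the id values) are hypothetical. In a real module the `step` callable would presumably run the decoder ONNX session on the encoder output plus the tokens produced so far.

```python
from typing import Callable, List, Sequence

# Hypothetical special-token ids; the real vocabulary file may use other values.
BOS_ID = 1
EOS_ID = 2


def greedy_decode(
    step: Callable[[Sequence[int]], Sequence[float]],
    max_len: int = 50,
) -> List[int]:
    """Pick the highest-scoring next token until EOS or max_len is reached."""
    tokens = [BOS_ID]
    for _ in range(max_len):
        scores = step(tokens)  # one score per vocabulary entry
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == EOS_ID:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the leading BOS token


def toy_step(tokens: Sequence[int]) -> List[float]:
    """Stand-in for a decoder call: emits ids 3, 4, 5 and then EOS."""
    target = [3, 4, 5, EOS_ID]
    scores = [0.0] * 6
    scores[target[len(tokens) - 1]] = 1.0
    return scores


decoded = greedy_decode(toy_step)
```

Greedy decoding keeps only the single best token at each step, which keeps the loop simple and cheap compared with beam search, at the cost of possibly missing a better overall sequence.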
What do these changes do
- `pythainlp/corpus/thaig2p_v3_encoder.onnx`: Bundled encoder ONNX model
- `pythainlp/corpus/thaig2p_v3_decoder.onnx`: Bundled decoder ONNX model
- `pythainlp/corpus/thaig2p_v3_vocab.json`: Bundled vocabulary (character-to-index mapping)
- `pythainlp/corpus/default_db.json`: Added entries for the three bundled model files
- `pythainlp/transliterate/thaig2p_v3.py`: New module providing ONNX inference via `onnxruntime` with a greedy decoder loop; loads model files from `pythainlp/corpus` via `get_corpus_path()`
- `pythainlp/transliterate/core.py`: Added `thaig2p_v3` engine dispatch + docstring entry
- `tests/extra/testx_transliterate.py`: Added `thaig2p_v3` test cases alongside existing v2/umt5 tests
- `docs/api/transliterate.rst`: Documented the new engine
- `CHANGELOG.md`: Logged under `[Unreleased]`

Usage
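A minimal sketch of how the new engine would be selected, assuming a PyThaiNLP build that includes this PR and an installed `onnxruntime`. The wrapper name `thai_g2p_v3` is hypothetical and only there for illustration; the engine name `thaig2p_v3` and the `transliterate()` entry point come from the description of this change.

```python
def thai_g2p_v3(text: str) -> str:
    """Convert Thai text to a phonetic transcription with the thaig2p_v3 engine."""
    # Imported lazily so this helper can be defined even where PyThaiNLP
    # (or a build containing the thaig2p_v3 engine) is not installed.
    from pythainlp.transliterate import transliterate

    return transliterate(text, engine="thaig2p_v3")
```

Calling `thai_g2p_v3("สวัสดี")` should then return the phonetic transcription produced by the v3 model.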
What was wrong
ThaiG2P v3 existed upstream (`wannaphong/thai-g2p-v3`) but was not integrated into PyThaiNLP.

How this fixes it
Adds `thaig2p_v3` as a first-class `transliterate()` engine. Unlike v2 (HuggingFace Transformers pipeline), v3 uses `onnxruntime` directly, matching the ONNX-first design of the upstream model. The three model files (`thaig2p_v3_encoder.onnx`, `thaig2p_v3_decoder.onnx`, `thaig2p_v3_vocab.json`) are bundled in `pythainlp/corpus/` and registered in `default_db.json`, consistent with how `thai2rom_onnx` is packaged. The module loads them via `get_corpus_path()` with no custom download logic required.

Your checklist for this pull request