Releases: PyThaiNLP/spaCy-PyThaiNLP

spaCy-PyThaiNLP v1.0

@wannaphong wannaphong released this 05 Feb 09:26
27ae9a9

What's Changed

  • Fix dependency parsing ValueError on variable-length CoNLL-U output by @Copilot in #5
  • Add type hints, docstrings, and improve code quality by @Copilot in #6
  • Enhance README with comprehensive examples and structured documentation by @Copilot in #7

New Contributors

  • @Copilot made their first contribution in #5

Full Changelog: v0.1...v1.0

spaCy-PyThaiNLP v0.1

@wannaphong wannaphong released this 03 Jan 15:29

spaCy-PyThaiNLP

This package wraps the PyThaiNLP library to add Thai support to spaCy.

Supported features

  • Word segmentation
  • Part-of-speech tagging
  • Named entity recognition
  • Sentence segmentation
  • Dependency parsing
  • Word vectors

Install

pip install spacy-pythainlp

How to use

Example

import spacy
import spacy_pythainlp.core

nlp = spacy.blank("th")
# Add the PyThaiNLP component, which segments the Doc into sentences
nlp.add_pipe("pythainlp")

data = nlp("ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  ผมอยากไปเที่ยว")
print(list(data.sents))
# output: [ผมเป็นคนไทย   แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน  , ผมอยากไปเที่ยว]
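The sentences printed above keep their trailing whitespace. A minimal sketch for collecting clean sentence strings follows. `sentence_texts` is a hypothetical helper, not part of spaCy-PyThaiNLP, and it assumes only that the Doc exposes spaCy's `Doc.sents` and `Span.text`.

```python
def sentence_texts(doc):
    """Return each sentence of a spaCy Doc as a whitespace-stripped string.

    Hypothetical helper: it only assumes that `doc.sents` yields spans
    with a `.text` attribute, as spaCy's Doc and Span do.
    """
    return [sent.text.strip() for sent in doc.sents]
```

With the `data` object from the example, `sentence_texts(data)` would return plain Python strings instead of spaCy spans.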

You can configure the component through the config argument of nlp.add_pipe:

nlp.add_pipe(
    "pythainlp", 
    config={
        "pos_engine": "perceptron",
        "pos": True,
        "pos_corpus": "orchid_ud",
        "sent_engine": "crfcut",
        "sent": True,
        "ner_engine": "thainer",
        "ner": True,
        "tokenize_engine": "newmm",
        "tokenize": False,
        "dependency_parsing": False,
        "dependency_parsing_engine": "esupar",
        "dependency_parsing_model": None,
        "word_vector": True,
        "word_vector_model": "thai2fit_wv"
    }
)
  • tokenize: Bool (True or False). Set to True to replace the word tokenizer. (spaCy's default tokenizer for Thai is newmm from PyThaiNLP.)
  • tokenize_engine: The word tokenization engine. You can read more: Options for engine
  • sent: Bool (True or False). Turns the sentence tokenizer on or off.
  • sent_engine: The sentence tokenizer engine. You can read more: Options for engine
  • pos: Bool (True or False). Turns part-of-speech tagging on or off.
  • pos_engine: The part-of-speech tagging engine. You can read more: Options for engine
  • ner: Bool (True or False). Turns named entity recognition (NER) on or off.
  • ner_engine: The NER engine. You can read more: Options for engine
  • dependency_parsing: Bool (True or False). Turns dependency parsing on or off.
  • dependency_parsing_engine: The dependency parsing engine. You can read more: Options for engine
  • dependency_parsing_model: The dependency parsing model. You can read more: Options for model
  • word_vector: Bool (True or False). Turns word vectors on or off.
  • word_vector_model: The word vector model. You can read more: Options for model
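The options above can be combined into a config with a small amount of validation. The sketch below is illustrative only: `EXAMPLE_CONFIG`, `build_config`, and `make_nlp` are hypothetical names, the values are copied from the configuration example above (whether they match the library defaults is not stated here), and the spaCy imports are deferred so that the config helper can be used without spaCy installed.

```python
# Values copied from the configuration example above (not confirmed defaults).
EXAMPLE_CONFIG = {
    "pos_engine": "perceptron",
    "pos": True,
    "pos_corpus": "orchid_ud",
    "sent_engine": "crfcut",
    "sent": True,
    "ner_engine": "thainer",
    "ner": True,
    "tokenize_engine": "newmm",
    "tokenize": False,
    "dependency_parsing": False,
    "dependency_parsing_engine": "esupar",
    "dependency_parsing_model": None,
    "word_vector": True,
    "word_vector_model": "thai2fit_wv",
}


def build_config(**overrides):
    """Merge overrides into EXAMPLE_CONFIG, rejecting unknown option names."""
    unknown = sorted(set(overrides) - set(EXAMPLE_CONFIG))
    if unknown:
        raise KeyError(f"unknown pythainlp option(s): {', '.join(unknown)}")
    return {**EXAMPLE_CONFIG, **overrides}


def make_nlp(**overrides):
    """Create a blank Thai pipeline with the "pythainlp" component added."""
    # Imported lazily; the import of spacy_pythainlp.core mirrors the example above.
    import spacy
    import spacy_pythainlp.core  # noqa: F401

    nlp = spacy.blank("th")
    nlp.add_pipe("pythainlp", config=build_config(**overrides))
    return nlp
```

For instance, `make_nlp(ner=False)` would build a pipeline with NER switched off, while a misspelled option such as `build_config(nerr=True)` raises a `KeyError` before spaCy is ever touched.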

Note: If you turn on dependency parsing, word segmentation and sentence segmentation are turned off, and the word and sentence boundaries produced by the dependency parser are used instead.
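The rule in the note can be expressed as a small function. This is a sketch of the documented behavior only, not the library's internal logic, and `effective_stages` is a hypothetical name.

```python
def effective_stages(config):
    """Report which stages run on their own, following the note above.

    Hypothetical helper: it models the documented rule, not the
    implementation inside spaCy-PyThaiNLP.
    """
    dep = bool(config.get("dependency_parsing", False))
    return {
        "dependency_parsing": dep,
        # When the parser runs, it supplies word and sentence boundaries itself.
        "tokenize": bool(config.get("tokenize", False)) and not dep,
        "sent": bool(config.get("sent", False)) and not dep,
    }
```

Passing a config with `"dependency_parsing": True` reports both `tokenize` and `sent` as off, regardless of their own values.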

License

   Copyright 2016-2023 PyThaiNLP Project

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

v0.1dev8

v0.1dev8 Pre-release

@wannaphong wannaphong released this 03 Jan 15:22

Full Changelog: v0.1dev7...v0.1dev8

v0.1dev7

v0.1dev7 Pre-release

@wannaphong wannaphong released this 03 Jan 06:14
f6fd89b

Full Changelog: v0.1dev6...v0.1dev7

v0.1dev6

v0.1dev6 Pre-release

@wannaphong wannaphong released this 01 Jan 14:51
  • Add Word vector

Full Changelog: v0.1dev5...v0.1dev6

v0.1dev5

v0.1dev5 Pre-release

@wannaphong wannaphong released this 01 Jan 05:33

Full Changelog: v0.1dev4...v0.1dev5

v0.1dev4

v0.1dev4 Pre-release

@wannaphong wannaphong released this 31 Dec 04:02

Full Changelog: v0.1dev3...v0.1dev4

v0.1dev3

v0.1dev3 Pre-release

@wannaphong wannaphong released this 30 Dec 17:12

Full Changelog: v0.1dev2...v0.1dev3

v0.1dev2

v0.1dev2 Pre-release

@wannaphong wannaphong released this 30 Dec 17:06

Full Changelog: v0.1dev1...v0.1dev2

v0.1dev1

v0.1dev1 Pre-release

@wannaphong wannaphong released this 30 Dec 17:00

Full Changelog: v0.1dev0...v0.1dev1