You can customize the PyThaiNLP pipeline component by passing a configuration dictionary to `nlp.add_pipe()`:
+
 ```python
 nlp.add_pipe(
-    "pythainlp",
+    "pythainlp",
     config={
         "pos_engine": "perceptron",
         "pos": True,
@@ -56,21 +181,39 @@ nlp.add_pipe(
 )
 ```

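For orientation, here is a minimal sketch of the complete call that the hunk above only shows in part. The option names come from this README; the values are illustrative, `THAI_CONFIG` and `build_pipeline` are hypothetical names, and the sketch assumes that importing `spacy_pythainlp.core` is what registers the `"pythainlp"` factory.

```python
# Option names are the ones documented in this README; values are illustrative.
THAI_CONFIG = {
    "pos": True,                 # turn on part-of-speech tagging
    "pos_engine": "perceptron",  # engine used in the snippet above
    "sent": True,                # turn on the sentence tokenizer
    "word_vector": False,        # skip word vectors in this example
}


def build_pipeline(config=None):
    """Create a blank Thai pipeline and add the PyThaiNLP component (sketch)."""
    # Imports are deferred so this module loads even without spaCy installed.
    import spacy
    import spacy_pythainlp.core  # noqa: F401  (assumed to register "pythainlp")

    nlp = spacy.blank("th")
    # Copy the dict so the caller's config is not shared with the pipeline.
    nlp.add_pipe("pythainlp", config=dict(THAI_CONFIG if config is None else config))
    return nlp
```

Calling `build_pipeline()` requires both spaCy and the PyThaiNLP integration to be installed; any option left out of the dictionary falls back to the component's default.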
-- tokenize: Bool (True or False) to change the word tokenize. (the default spaCy is newmm of PyThaiNLP)
-- tokenize_engine: The tokenize engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tokenize.html#pythainlp.tokenize.word_tokenize)
-- sent: Bool (True or False) to turn on the sentence tokenizer.
-- sent_engine: The sentence tokenizer engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tokenize.html#pythainlp.tokenize.sent_tokenize)
-- pos: Bool (True or False) to turn on the part-of-speech.
-- pos_engine: The part-of-speech engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tag.html#pythainlp.tag.pos_tag)
-- ner: Bool (True or False) to turn on the NER.
-- ner_engine: The NER engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tag.html#pythainlp.tag.NER)
-- dependency_parsing: Bool (True or False) to turn on the Dependency parsing.
-- dependency_parsing_engine: The Dependency parsing engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/parse.html#pythainlp.parse.dependency_parsing)
-- dependency_parsing_model: The Dependency parsing model. You can read more: [Options for model](https://pythainlp.github.io/docs/3.1/api/parse.html#pythainlp.parse.dependency_parsing)
-- word_vector: Bool (True or False) to turn on the word vector.
-- word_vector_model: The word vector model. You can read more: [Options for model](https://pythainlp.github.io/docs/3.1/api/word_vector.html#pythainlp.word_vector.WordVector)
-
-**Note: If you turn on Dependency parsing, word segmentation and sentence segmentation are turn off to use word segmentation and sentence segmentation from Dependency parsing.**
+### Configuration Options
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+|`tokenize`|`bool`|`False`| Enable/disable word tokenization (spaCy uses PyThaiNLP's newmm by default) |
+|`word_vector`|`bool`|`True`| Enable/disable word vectors |
+|`word_vector_model`|`str`|`"thai2fit_wv"`| Word vector model. [See options](https://pythainlp.github.io/docs/3.1/api/word_vector.html#pythainlp.word_vector.WordVector)|
+
+**Important Notes:**
+- When `dependency_parsing` is enabled, word segmentation and sentence segmentation are automatically disabled to use the tokenization from the dependency parser.
+- All configuration options are optional and have sensible defaults.
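The dependency-parsing note is easy to overlook when assembling a configuration by hand. The sketch below is one way to make it explicit on the caller's side. `make_config` and `KNOWN_OPTIONS` are hypothetical helpers, not part of the library; the option names are taken from the parameter list in this README, and the component itself may apply the rule differently.

```python
from typing import Any, Dict

# Option names taken from the parameter list in this README.
KNOWN_OPTIONS = frozenset({
    "tokenize", "tokenize_engine",
    "sent", "sent_engine",
    "pos", "pos_engine",
    "ner", "ner_engine",
    "dependency_parsing", "dependency_parsing_engine", "dependency_parsing_model",
    "word_vector", "word_vector_model",
})


def make_config(**options: Any) -> Dict[str, Any]:
    """Hypothetical helper: reject unknown option names and mirror the
    documented dependency-parsing rule in the returned dictionary."""
    unknown = sorted(set(options) - KNOWN_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown option(s): {', '.join(unknown)}")

    config = dict(options)
    if config.get("dependency_parsing"):
        # Per the note above, the dependency parser supplies its own word and
        # sentence segmentation, so both switches are forced off here.
        config["tokenize"] = False
        config["sent"] = False
    return config
```

The result can be passed as `config=` to `nlp.add_pipe("pythainlp", ...)`. Catching a misspelled key such as `tokenise` before the pipeline is built tends to give a clearer error than a failure inside the component.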
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.