Commit bedeee0

re add
1 parent 019d303 commit bedeee0

1 file changed

Lines changed: 81 additions & 0 deletions

@@ -0,0 +1,81 @@
---
title: "Remove remote stop words"
description: "Removes stop words using a stop word list retrieved from a remote URL."
icon: octicons/cross-reference-24
tags:
- TransformOperator
---
# Remove remote stop words
<!-- This file was generated - DO NOT CHANGE IT MANUALLY -->

The stop word list is retrieved from a remote URL, such as
[this German stop word list](https://raw.githubusercontent.com/stopwords-iso/stopwords-de/refs/heads/master/stopwords-de.txt).

A custom stop word list can be used, for instance, to remove the stop words of a different
language, such as German, instead of relying on the
[default stop word list](https://gist.githubusercontent.com/rg089/35e00abf8941d72d419224cfd5b5925d/raw/12d899b70156fd0041fa9778d657330b024b959c/stopwords.txt)
for the English language.

Regardless of the stop word list used, the following applies:

* Each line in the stop word list should contain a single stop word.
* The removal of stop words is case-insensitive. For example, 'The' and 'the' are considered the same.
* For German words, note that the upper-case form of the lower-case 'ß' is 'ẞ', not 'SS'.
* The separator is a regular expression (regex) that is used for detecting words.
* By default, the separator matches one or more whitespace characters or hyphens (`[\s-]+`).
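
The points above can be sketched in Python. This is an illustration under stated assumptions, not the operator's implementation: the function names `parse_stop_words`, `load_stop_words` and `remove_stop_words` are hypothetical, values are assumed to be split on the separator regex, and the remaining words are assumed to be re-joined with a single space. Case-insensitivity is modelled with `str.lower()`, under which 'ẞ' maps to 'ß' while 'SS' does not.

```python
import re
from urllib.request import urlopen


def parse_stop_words(text: str) -> set:
    """Parse a stop word list with one word per line; blank lines are ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_stop_words(url: str) -> set:
    """Download a stop word list from a remote URL (assumed to be UTF-8 text)."""
    with urlopen(url) as response:
        return parse_stop_words(response.read().decode("utf-8"))


def remove_stop_words(value: str, stop_words: set, separator: str = r"[\s-]+") -> str:
    """Drop every word whose lower-case form is in the stop word set.

    Assumption: words are obtained by splitting on the separator regex,
    and the kept words are joined with a single space.
    """
    words = [word for word in re.split(separator, value) if word]
    kept = [word for word in words if word.lower() not in stop_words]
    return " ".join(kept)
```

With a small in-memory list, `remove_stop_words("The question is open", parse_stop_words("the\nis"))` yields `"question open"` in this sketch. The actual operator may treat individual tokens differently.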

See also the simpler filter `removeDefaultStopWords`, which uses a default stop word list.

## Examples

**Notation:** Lists of values are represented via square brackets. Example: `[first, second]` represents a list of two values "first" and "second".

---
**Example 1:**

* Input values:
    1. `[To be or not to be, that is the question]`

* Returns: `[To, question]`


---
**Example 2:**

* Input values:
    1. `[It always seems impossible, until it's done]`

* Returns: `[It impossible, ]`

## Parameter

### Stop word list url

URL of the stop word list

- ID: `stopWordListUrl`
- Datatype: `string`
- Default Value: `https://gist.githubusercontent.com/rg089/35e00abf8941d72d419224cfd5b5925d/raw/12d899b70156fd0041fa9778d657330b024b959c/stopwords.txt`

### Separator

RegEx for detecting words

- ID: `separator`
- Datatype: `string`
- Default Value: `[\s-]+`
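
To illustrate the default value, the following sketch shows how `[\s-]+` would divide a string into words. It assumes that the separator is applied as a split pattern, which this page does not confirm; `split_words` is a hypothetical helper, not part of the operator.

```python
import re

# Default separator from the parameter table: runs of whitespace and/or hyphens.
DEFAULT_SEPARATOR = r"[\s-]+"


def split_words(value: str, separator: str = DEFAULT_SEPARATOR) -> list:
    """Split a value on the separator regex and drop empty fragments."""
    return [word for word in re.split(separator, value) if word]


if __name__ == "__main__":
    print(split_words("state-of-the-art  text"))
    print(split_words("it's done"))
```

Under this assumption, hyphenated compounds are broken into their parts, while apostrophes stay inside a word, so `it's` is compared against the stop word list as a single token.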

## Advanced Parameter

`None`
