-|[And](min.md)|None |All input scores must be within the threshold. Selects the minimum score. |
-|[Average](average.md)|None |Computes the weighted average. |
-|[Euclidian distance](quadraticMean.md)|None |Calculates the Euclidian distance. |
-|[First non-empty score](firstNonEmpty.md)|None |Forwards the first input that provides a non-empty similarity score. |
-|[Geometric mean](geometricMean.md)|None |Compute the (weighted) geometric mean. |
-|[Handle missing values](handleMissingValues.md)|None |Generates a default similarity score, if no similarity score is provided (e.g., due to missing values). Using this operator can have a performance impact, since it lowers the efficiency of the underlying computation. |
-|[Negate](negate.md)|None |Negates the result of the input comparison. A single input is expected. Using this operator can have a performance impact, since it lowers the efficiency of the underlying computation. |
-|[Or](max.md)|None |At least one input score must be within the threshold. Selects the maximum score. |
-|[Scale](scale.md)|None |Scales a similarity score by a factor. |
+| Name | Description |
+|-------------:|:-------------------------|
+|[And](min.md)| All input scores must be within the threshold. Selects the minimum score. |
+|[Average](average.md)| Computes the weighted average. |
+|[Euclidian distance](quadraticMean.md)| Calculates the Euclidian distance. |
+|[First non-empty score](firstNonEmpty.md)| Forwards the first input that provides a non-empty similarity score. |
+|[Geometric mean](geometricMean.md)| Compute the (weighted) geometric mean. |
+|[Handle missing values](handleMissingValues.md)| Generates a default similarity score, if no similarity score is provided (e.g., due to missing values). Using this operator can have a performance impact, since it lowers the efficiency of the underlying computation. |
+|[Negate](negate.md)| Negates the result of the input comparison. A single input is expected. Using this operator can have a performance impact, since it lowers the efficiency of the underlying computation. |
+|[Or](max.md)| At least one input score must be within the threshold. Selects the maximum score. |
+|[Scale](scale.md)| Scales a similarity score by a factor. |
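
To make the aggregator descriptions in this table more concrete, here is a minimal Python sketch of how such score aggregations might be computed. All function names and signatures are hypothetical, scores are assumed to be plain floats, and reading "Euclidian distance" as a weighted quadratic mean is only an assumption based on the linked file name `quadraticMean.md`; the actual operators may handle thresholds and normalization differently.

```python
import math
from typing import Optional, Sequence


def agg_and(scores: Sequence[float]) -> float:
    """'And': every input must pass, so the weakest (minimum) score decides."""
    return min(scores)


def agg_or(scores: Sequence[float]) -> float:
    """'Or': one passing input suffices, so the strongest (maximum) score decides."""
    return max(scores)


def weighted_average(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of the input scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def quadratic_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted quadratic mean (assumed meaning of the 'Euclidian distance' row)."""
    return math.sqrt(sum(w * s * s for s, w in zip(scores, weights)) / sum(weights))


def geometric_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted geometric mean; assumes non-negative scores."""
    product = math.prod(s ** w for s, w in zip(scores, weights))
    return product ** (1.0 / sum(weights))


def first_non_empty(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Forward the first score that is present; None models an empty score."""
    for s in scores:
        if s is not None:
            return s
    return None
```

For example, `agg_and([0.2, 0.8])` yields the lower score and `agg_or([0.2, 0.8])` the higher one, which mirrors the "Selects the minimum score" and "Selects the maximum score" wording above.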
-|[Alignment](alignment.md)|None |Writes the alignment format specified at http://alignapi.gforge.inria.fr/format.html.|
-|[Avro](avro.md)|None |Read from or write to an Apache Avro file. |
-|[Binary file](binaryFile.md)|None |Reads and writes binary files. A typical use-case for this dataset is to process PDF documents or images. |
-|[CSV](csv.md)|None |Read from or write to an CSV file. |
-|[Excel](excel.md)|None |Read from or write to an Excel workbook in Open XML format (XLSX). |
-|[Excel (Google Drive)](googlespreadsheet.md)|None |Read data from a remote Google Spreadsheet. |
-|[Excel (OneDrive, Office365)](office365preadsheet.md)|None |Read data from a remote onedrive or Office365 Spreadsheet. |
-|[Hive database](Hive.md)|None |Read from or write to an embedded Apache Hive endpoint. |
-|[In-memory dataset](inMemory.md)|None |A Dataset that holds all data in-memory. |
-|[Internal dataset](internal.md)|None |Dataset for storing entities between workflow steps. The underlying dataset type can be configured using the `dataset.internal.*` configuration parameters. |
-|[Internal dataset (single graph)](LocalInternalDataset.md)|None |Dataset for storing entities between workflow steps. This variant does use the same graph for all internal datasets in a workflow. The underlying dataset type can be configured using the `dataset.internal.*` configuration parameters. |
-|[JDBC endpoint](Jdbc.md)|None |Connect to an existing JDBC endpoint. |
-|[JSON](json.md)|None |Read from or write to a JSON or JSON Lines file. |
-|[Knowledge Graph](eccencaDataPlatform.md)|None |Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory. |
-|[Multi CSV ZIP](multiCsv.md)|None |Reads from or writes to multiple CSV files from/to a single ZIP file. |
-|[Neo4j](neo4j.md)|None |Neo4j graph |
-|[ORC](orc.md)|None |Read from or write to an Apache ORC file. |
-|[Parquet](parquet.md)|None |Read from or write to an Apache Parquet file. |
-|[RDF](file.md)|None |Dataset which retrieves and writes all entities from/to an RDF file. For reading, the dataset is loaded in-memory and thus the size is restricted by the available memory. Large datasets should be loaded into an external RDF store and retrieved using the SPARQL dataset instead. |
-|[Snowflake JDBC endpoint](SnowflakeJdbc.md)|None |Connect to Snowflake JDBC endpoint. |
-|[SparkSQL view](sparkView.md)|None |Use the SQL endpoint dataset instead. |
-|[SPARQL endpoint](sparqlEndpoint.md)|None |Connect to an existing SPARQL endpoint. |
-|[SQL endpoint](sqlEndpoint.md)|None |Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL. |
-|[Text](text.md)|None |Reads and writes plain text files. |
-|[XML](xml.md)|None |Read from or write to an XML file. |
+| Name | Description |
+|-------------:|:-------------------------|
+|[Alignment](alignment.md)| Writes the alignment format specified at http://alignapi.gforge.inria.fr/format.html.|
+|[Avro](avro.md)| Read from or write to an Apache Avro file. |
+|[Binary file](binaryFile.md)| Reads and writes binary files. A typical use-case for this dataset is to process PDF documents or images. |
+|[CSV](csv.md)| Read from or write to an CSV file. |
+|[Excel](excel.md)| Read from or write to an Excel workbook in Open XML format (XLSX). |
+|[Excel (Google Drive)](googlespreadsheet.md)| Read data from a remote Google Spreadsheet. |
+|[Excel (OneDrive, Office365)](office365preadsheet.md)| Read data from a remote onedrive or Office365 Spreadsheet. |
+|[Hive database](Hive.md)| Read from or write to an embedded Apache Hive endpoint. |
+|[In-memory dataset](inMemory.md)| A Dataset that holds all data in-memory. |
+|[Internal dataset](internal.md)| Dataset for storing entities between workflow steps. The underlying dataset type can be configured using the `dataset.internal.*` configuration parameters. |
+|[Internal dataset (single graph)](LocalInternalDataset.md)| Dataset for storing entities between workflow steps. This variant does use the same graph for all internal datasets in a workflow. The underlying dataset type can be configured using the `dataset.internal.*` configuration parameters. |
+|[JDBC endpoint](Jdbc.md)| Connect to an existing JDBC endpoint. |
+|[JSON](json.md)| Read from or write to a JSON or JSON Lines file. |
+|[Knowledge Graph](eccencaDataPlatform.md)| Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory. |
+|[Multi CSV ZIP](multiCsv.md)| Reads from or writes to multiple CSV files from/to a single ZIP file. |
+|[Neo4j](neo4j.md)| Neo4j graph |
+|[ORC](orc.md)| Read from or write to an Apache ORC file. |
+|[Parquet](parquet.md)| Read from or write to an Apache Parquet file. |
+|[RDF](file.md)| Dataset which retrieves and writes all entities from/to an RDF file. For reading, the dataset is loaded in-memory and thus the size is restricted by the available memory. Large datasets should be loaded into an external RDF store and retrieved using the SPARQL dataset instead. |
+|[Snowflake JDBC endpoint](SnowflakeJdbc.md)| Connect to Snowflake JDBC endpoint. |
+|[SparkSQL view](sparkView.md)| Use the SQL endpoint dataset instead. |
+|[SPARQL endpoint](sparqlEndpoint.md)| Connect to an existing SPARQL endpoint. |
+|[SQL endpoint](sqlEndpoint.md)| Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL. |
+|[Text](text.md)| Reads and writes plain text files. |
+|[XML](xml.md)| Read from or write to an XML file. |
-|[Compare physical quantities](PhysicalQuantitiesDistance.md)|None |Computes the distance between two physical quantities. The distance is normalized to the SI base unit of the dimension. For instance for lengths, the distance will be in metres. Comparing incompatible units will yield a validation error. |
-|[Constant similarity value](constantDistance.md)|None |Always returns a constant similarity value. |
-|[Geographical distance](wgs84.md)|None |Computes the geographical distance between two points. Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig) |
-|[Greater than](greaterThan.md)|None |Checks if the source value is greater than the target value. If both strings are numbers, numerical order is used for comparison. Otherwise, alphanumerical order is used. |
-|[Inequality](inequality.md)|None |Returns success if values are not equal, failure otherwise. |
-|[Inside numeric interval](insideNumericInterval.md)|None |Checks if a number is contained inside a numeric interval, such as '1900 - 2000'. |
-|[Is substring](isSubstring.md)|None |Checks if a source value is a substring of a target value. |
-|[Jaccard](jaccard.md)|None |Jaccard similarity coefficient. Divides the matching tokens by the number of distinct tokens from both inputs. |
-|[Jaro distance](jaro.md)|None |Matches strings based on the Jaro distance metric. |
-|[Jaro-Winkler distance](jaroWinkler.md)|None |Matches strings based on the Jaro-Winkler distance measure. |
-|[Korean translit distance](koreanTranslitDistance.md)|None |Transliterated Korean distance. |
-|[Levenshtein distance](levenshteinDistance.md)|None |Levenshtein distance. Returns a distance value between zero and the size of the string. |
-|[Lower than](lowerThan.md)|None |Checks if the source value is lower than the target value. |
-|[Normalized Levenshtein distance](levenshtein.md)|None |Normalized Levenshtein distance. Divides the edit distance by the length of the longer string. |
-|[Numeric equality](numericEquality.md)|None |Compares values numerically instead of their string representation as the 'String Equality' operator does. Allows to set the needed precision of the comparison. A value of 0.0 means that the values must represent exactly the same (floating point) value, values higher than that allow for a margin of tolerance. |
-|[Numeric similarity](num.md)|None |Computes the numeric distance between two numbers. |
-|[qGrams](qGrams.md)|None |String similarity based on q-grams (by default q=2). |
-|[Relaxed equality](relaxedEquality.md)|None |Return success if strings are equal, failure otherwise. Lower/upper case and differences like ö/o, n/ñ, c/ç etc. are treated as equal. |
-|[Soft Jaccard](softjaccard.md)|None |Soft Jaccard similarity coefficient. Same as Jaccard distance but values within an levenhstein distance of 'maxDistance' are considered equivalent. |
-|[Starts with](startsWith.md)|None |Returns success if the first string starts with the second string, failure otherwise. |
-|[String equality](equality.md)|None |Checks for equality of the string representation of the given values. Returns success if string values are equal, failure otherwise. For a numeric comparison of values use the 'Numeric Equality' comparator. |
-|[Substring comparison](substringDistance.md)|None |Return 0 to 1 for strong similarity to weak similarity. Based on the paper: Stoilos, Giorgos, Giorgos Stamou, and Stefanos Kollias. "A string metric for ontology alignment." The Semantic Web-ISWC 2005. Springer Berlin Heidelberg, 2005. 624-637. |
-|[Token-wise distance](tokenwiseDistance.md)|None |Token-wise string distance using the specified metric. |
+|[Compare physical quantities](PhysicalQuantitiesDistance.md)| Computes the distance between two physical quantities. The distance is normalized to the SI base unit of the dimension. For instance for lengths, the distance will be in metres. Comparing incompatible units will yield a validation error. |
+|[Constant similarity value](constantDistance.md)| Always returns a constant similarity value. |
+|[Cosine](cosine.md)| Cosine Distance Measure. |
+|[Date](date.md)| The distance in days between two dates ('YYYY-MM-DD' format). |
+|[DateTime](dateTime.md)| Distance between two date time values (xsd:dateTime format) in seconds. |
+|[Geographical distance](wgs84.md)| Computes the geographical distance between two points. Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig) |
+|[Greater than](greaterThan.md)| Checks if the source value is greater than the target value. If both strings are numbers, numerical order is used for comparison. Otherwise, alphanumerical order is used. |
+|[Inequality](inequality.md)| Returns success if values are not equal, failure otherwise. |
+|[Inside numeric interval](insideNumericInterval.md)| Checks if a number is contained inside a numeric interval, such as '1900 - 2000'. |
+|[Is substring](isSubstring.md)| Checks if a source value is a substring of a target value. |
+|[Jaccard](jaccard.md)| Jaccard similarity coefficient. Divides the matching tokens by the number of distinct tokens from both inputs. |
+|[Jaro distance](jaro.md)| Matches strings based on the Jaro distance metric. |
+|[Jaro-Winkler distance](jaroWinkler.md)| Matches strings based on the Jaro-Winkler distance measure. |
+|[Korean phoneme distance](koreanPhonemeDistance.md)| Korean phoneme distance. |
+|[Korean translit distance](koreanTranslitDistance.md)| Transliterated Korean distance. |
+|[Levenshtein distance](levenshteinDistance.md)| Levenshtein distance. Returns a distance value between zero and the size of the string. |
+|[Lower than](lowerThan.md)| Checks if the source value is lower than the target value. |
+|[Normalized Levenshtein distance](levenshtein.md)| Normalized Levenshtein distance. Divides the edit distance by the length of the longer string. |
+|[Numeric equality](numericEquality.md)| Compares values numerically instead of their string representation as the 'String Equality' operator does. Allows to set the needed precision of the comparison. A value of 0.0 means that the values must represent exactly the same (floating point) value, values higher than that allow for a margin of tolerance. |
+|[Numeric similarity](num.md)| Computes the numeric distance between two numbers. |
+|[qGrams](qGrams.md)| String similarity based on q-grams (by default q=2). |
+|[Relaxed equality](relaxedEquality.md)| Return success if strings are equal, failure otherwise. Lower/upper case and differences like ö/o, n/ñ, c/ç etc. are treated as equal. |
+|[Soft Jaccard](softjaccard.md)| Soft Jaccard similarity coefficient. Same as Jaccard distance but values within an levenhstein distance of 'maxDistance' are considered equivalent. |
+|[Starts with](startsWith.md)| Returns success if the first string starts with the second string, failure otherwise. |
+|[String equality](equality.md)| Checks for equality of the string representation of the given values. Returns success if string values are equal, failure otherwise. For a numeric comparison of values use the 'Numeric Equality' comparator. |
+|[Substring comparison](substringDistance.md)| Return 0 to 1 for strong similarity to weak similarity. Based on the paper: Stoilos, Giorgos, Giorgos Stamou, and Stefanos Kollias. "A string metric for ontology alignment." The Semantic Web-ISWC 2005. Springer Berlin Heidelberg, 2005. 624-637. |
+|[Token-wise distance](tokenwiseDistance.md)| Token-wise string distance using the specified metric. |
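
As an illustration of three measures described in this table (Levenshtein distance, its normalized variant, and the Jaccard coefficient), the following Python sketch follows the wording of the descriptions only. The function names and the handling of empty inputs are assumptions made for this example and are not taken from the documented plugins.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    # Assumption: two empty strings are treated as identical (distance 0.0).
    return levenshtein(a, b) / longest if longest else 0.0


def jaccard_similarity(a_tokens: Iterable[str], b_tokens: Iterable[str]) -> float:
    """Matching tokens divided by the number of distinct tokens from both inputs."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    # Assumption: two empty token sets count as fully similar (1.0).
    return len(a & b) / len(union) if union else 1.0
```

With these definitions, `levenshtein("kitten", "sitting")` evaluates to 3, so the normalized distance is 3/7, and `jaccard_similarity({"a", "b"}, {"b", "c"})` is 1/3 because one of three distinct tokens matches.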
-{% for plugin in plugins if not plugin.is_deprecated %} | [{{plugin.title}}]({{plugin.main_category + "/" if plugin.main_category else ""}}{{plugin.pluginId}}.md) | {{plugin.main_category}} | {{plugin.description}} |
+| Name | Description |
+|-------------:|:-------------------------|
+{% for plugin in plugins if not plugin.is_deprecated %} | [{{plugin.title}}]({{plugin.pluginId}}.md) | {{plugin.description}} |
-{% for plugin in plugins if not plugin.is_deprecated %} | [{{plugin.title}}]({{plugin.main_category + "/" if plugin.main_category else ""}}{{plugin.pluginId}}.md) | {{plugin.main_category}} | {{plugin.description}} |
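
The two-column template in this change emits one table row per non-deprecated plugin and links directly to `{{plugin.pluginId}}.md`. As a rough illustration of that behaviour, the sketch below emulates the loop in plain Python rather than running the template engine itself. The `Plugin` dataclass, the `render_index` function, and the sample data are hypothetical; only the field names `title`, `pluginId`, `description`, and `is_deprecated` come from the template.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Plugin:
    # Field names mirror the attributes referenced in the template.
    title: str
    pluginId: str
    description: str
    is_deprecated: bool = False


def render_index(plugins: Iterable[Plugin]) -> str:
    """Build the Name/Description table, skipping deprecated plugins."""
    lines = [
        "| Name | Description |",
        "|-------------:|:-------------------------|",
    ]
    for plugin in plugins:
        if plugin.is_deprecated:
            continue  # corresponds to 'if not plugin.is_deprecated'
        lines.append(
            f"| [{plugin.title}]({plugin.pluginId}.md) | {plugin.description} |"
        )
    return "\n".join(lines)


# Hypothetical sample data for illustration only.
sample = [
    Plugin("Scale", "scale", "Scales a similarity score by a factor."),
    Plugin("Old operator", "old", "No longer supported.", is_deprecated=True),
]
table = render_index(sample)
```

Here `table` contains the header, the separator, and a single row for `Scale`; the deprecated entry is left out, and the link no longer carries a category prefix.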