This repository was archived by the owner on Oct 8, 2020. It is now read-only.

Commit ddc2cf4

Merge branch 'release/0.3.0'
2 parents 671678d + fe643a2 commit ddc2cf4

55 files changed

Lines changed: 9246 additions & 4190 deletions


README.md

Lines changed: 13 additions & 10 deletions
@@ -1,18 +1,21 @@
 # SANSA-ML
+[![Maven Central](https://maven-badges.herokuapp.com/maven-central/net.sansa-stack/sansa-ml-parent_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/net.sansa-stack/sansa-ml-parent_2.11)
+[![Build Status](https://ci.aksw.org/jenkins/job/SANSA-ML/job/develop/badge/icon)](https://ci.aksw.org/jenkins/job/SANSA-ML//job/develop/)
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+[![Twitter](https://img.shields.io/twitter/follow/SANSA_Stack.svg?style=social)](https://twitter.com/SANSA_Stack)
 
-SANSA-ML is the Machine Learning (ML) library in the SANSA stack (see http://sansa-stack.net). Algorithms in this repository perform various machine learning tasks directly on [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework)/[OWL](https://en.wikipedia.org/wiki/Web_Ontology_Language) input data. While most machine learning algorithms are based on processing simple features, the machine learning algorithms in SANSA-ML exploit the graph structure and semantics of the background knowledge specified using the RDF and OWL standards. In many cases, this allows to obtain either more accurate or more human-understandable results. In contrast to most other algorithms supporting background knowledge, they scale horizontally using [Apache Spark](https://spark.apache.org).
+SANSA-ML is the Machine Learning (ML) library in the SANSA stack (see http://sansa-stack.net). Algorithms in this repository perform various machine learning tasks directly on [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework)/[OWL](https://en.wikipedia.org/wiki/Web_Ontology_Language) input data. While most machine learning algorithms are based on processing simple features, the machine learning algorithms in SANSA-ML exploit the graph structure and semantics of the background knowledge specified using the RDF and OWL standards. In many cases, this allows to obtain either more accurate or more human-understandable results. In contrast to most other algorithms supporting background knowledge, they scale horizontally using [Apache Spark](https://spark.apache.org) and [Apache Flink](https://flink.apache.org).
 
 The ML layer currently supports the following algorithms:
-* RDF graph clustering
+* RDF graph clustering (Power Iteration, Border Flow, Link based clustering, Modularity based clustering, Silvia Link Clustering)
 * Rule mining in RDF graphs based on [AMIE+](https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/amie/)
+* Semantic similarity measures (Jaccard similarity,Rodríguez and Egenhofer similarity, Tversky Ratio Model, Batet Similarity)
+* Knowledge graph embedding approaches:
+* TransE (beta status)
+* DistMult (beta status)
+* Terminological Decision Trees for the classification of concepts(beta status)
+* Anomaly detection (beta status)
 
-Usage example for clusting:
-```scala
-RDFByModularityClustering(sparkSession.sparkContext, numIterations, input, output)
-```
-
-Please see https://github.com/SANSA-Stack/SANSA-Examples/tree/master/sansa-examples-spark/src/main/scala/net/sansa_stack/examples/spark/ml for further examples.
+Please see https://github.com/SANSA-Stack/SANSA-Examples/tree/master/sansa-examples-spark/src/main/scala/net/sansa_stack/examples/spark/ml for examples on how to use the above machine learning approaches.
 
 Several further algorithms are in development. Please create a pull request and/or contact [Jens Lehmann](http://jens-lehmann.org) if you are interested in contributing algorithms to SANSA-ML.
-
-Support for [Apache Flink](https://flink.apache.org) is planned in future releases.
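
The Maven Central badge added to the README points at the `net.sansa-stack` group. Going by the coordinates in the module POMs further down in this commit, a downstream project could presumably declare the Spark module as sketched below. That 0.3.0 was actually published under exactly these coordinates is an assumption, not something this diff confirms.

```xml
<!-- Sketch of a consumer-side dependency; coordinates are read off
     sansa-ml-spark/pom.xml in this commit, publication is assumed. -->
<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-ml-spark_2.11</artifactId>
  <version>0.3.0</version>
</dependency>
```

The `_2.11` suffix is the Scala binary version, so the consuming project would need to build against Scala 2.11 as well.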

pom.xml

Lines changed: 8 additions & 7 deletions
@@ -4,7 +4,7 @@
 <modelVersion>4.0.0</modelVersion>
 <groupId>net.sansa-stack</groupId>
 <artifactId>sansa-ml-parent_2.11</artifactId>
-<version>0.2.0</version>
+<version>0.3.0</version>
 <packaging>pom</packaging>
 <name>ML API - Parent</name>
 <description>RDF/OWL Machine Learning Library for Big Data</description>
@@ -49,19 +49,19 @@
 <module>sansa-ml-common</module>
 <module>sansa-ml-flink</module>
 <module>sansa-ml-spark</module>
-<module>sansa-ml-tests</module>
 </modules>
 
 <properties>
 <maven.compiler.source>1.8</maven.compiler.source>
 <maven.compiler.target>1.8</maven.compiler.target>
 <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
-<scala.version>2.11.8</scala.version>
+<scala.version>2.11.11</scala.version>
 <scala.binary.version>2.11</scala.binary.version>
-<spark.version>2.1.1</spark.version>
-<flink.version>1.3.0</flink.version>
-<jena.version>3.1.1</jena.version>
-<sansa.version>0.2.0</sansa.version>
+<spark.version>2.2.1</spark.version>
+<spark.binary.version>2.2</spark.binary.version>
+<flink.version>1.4.0</flink.version>
+<jena.version>3.5.0</jena.version>
+<sansa.version>0.3.0-SNAPSHOT</sansa.version>
 </properties>
 
 <dependencyManagement>
@@ -79,6 +79,7 @@
 <artifactId>sansa-rdf-spark-core_${scala.binary.version}</artifactId>
 <version>${sansa.version}</version>
 </dependency>
+
 <!-- SANSA OWL -->
 <dependency>
 <groupId>net.sansa-stack</groupId>

sansa-ml-common/pom.xml

Lines changed: 1 addition & 7 deletions
@@ -4,11 +4,9 @@
 <parent>
 <artifactId>sansa-ml-parent_2.11</artifactId>
 <groupId>net.sansa-stack</groupId>
-<version>0.2.0</version>
+<version>0.3.0</version>
 </parent>
-<groupId>net.sansa-stack</groupId>
 <artifactId>sansa-ml-common_2.11</artifactId>
-<version>0.2.0</version>
 <name>ML API - Common</name>
 <description>Common objects for the SANSA Machine Learning Layer</description>
 <inceptionYear>2016</inceptionYear>
@@ -17,14 +15,12 @@
 <dependency>
 <groupId>org.scala-lang</groupId>
 <artifactId>scala-library</artifactId>
-<version>${scala.version}</version>
 </dependency>
 
 <!-- Test -->
 <dependency>
 <groupId>junit</groupId>
 <artifactId>junit</artifactId>
-<version>4.11</version>
 <scope>test</scope>
 </dependency>
 <dependency>
@@ -49,7 +45,6 @@
 <!-- see http://davidb.github.com/scala-maven-plugin -->
 <groupId>net.alchim31.maven</groupId>
 <artifactId>scala-maven-plugin</artifactId>
-<version>3.2.0</version>
 <executions>
 <execution>
 <goals>
@@ -68,7 +63,6 @@
 <plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-surefire-plugin</artifactId>
-<version>2.18.1</version>
 <configuration>
 <useFile>false</useFile>
 <disableXmlReport>true</disableXmlReport>
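
With the `<version>` elements removed from this module, Maven can only resolve `scala-library`, `junit` and the two plugins if the parent POM manages their versions. The relevant parent sections are not among the hunks shown for `pom.xml`, so the following is only a sketch of what such entries usually look like. The junit and plugin version numbers are placeholders taken from the lines removed above, not confirmed values from the parent.

```xml
<!-- Hypothetical excerpt of a parent pom.xml; illustrative, not from this commit -->
<project>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version> <!-- assumed value -->
      </dependency>
    </dependencies>
  </dependencyManagement>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.2.0</version> <!-- assumed value -->
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
```

The dropped module-level `<groupId>` and `<version>` need no such entry: a child module inherits both from its `<parent>` declaration when they are omitted.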

sansa-ml-flink/pom.xml

Lines changed: 1 addition & 6 deletions
@@ -5,11 +5,9 @@
 <parent>
 <artifactId>sansa-ml-parent_2.11</artifactId>
 <groupId>net.sansa-stack</groupId>
-<version>0.2.0</version>
+<version>0.3.0</version>
 </parent>
-<groupId>net.sansa-stack</groupId>
 <artifactId>sansa-ml-flink_2.11</artifactId>
-<version>0.2.0</version>
 <name>ML API - Apache Flink</name>
 <description>RDF/OWL Machine Learning Library for Apache Flink</description>
 
@@ -22,7 +20,6 @@
 <dependency>
 <groupId>org.scala-lang</groupId>
 <artifactId>scala-library</artifactId>
-<version>${scala.version}</version>
 </dependency>
 
 <!-- Apache Flink -->
@@ -59,7 +56,6 @@
 <dependency>
 <groupId>junit</groupId>
 <artifactId>junit</artifactId>
-<version>4.11</version>
 <scope>test</scope>
 </dependency>
 <dependency>
@@ -102,7 +98,6 @@
 <plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-surefire-plugin</artifactId>
-<version>2.19.1</version>
 <configuration>
 <useFile>false</useFile>
 <disableXmlReport>true</disableXmlReport>

sansa-ml-spark/pom.xml

Lines changed: 39 additions & 9 deletions
@@ -5,14 +5,16 @@
 <parent>
 <artifactId>sansa-ml-parent_2.11</artifactId>
 <groupId>net.sansa-stack</groupId>
-<version>0.2.0</version>
+<version>0.3.0</version>
 </parent>
-<groupId>net.sansa-stack</groupId>
 <artifactId>sansa-ml-spark_2.11</artifactId>
-<version>0.2.0</version>
 <name>ML API - Apache Spark</name>
 <description>RDF/OWL Machine Learning Library for Apache Spark</description>
 
+<properties>
+<hadoop.version>2.7.0</hadoop.version>
+</properties>
+
 <dependencies>
 
 <!-- ML Common -->
@@ -38,7 +40,6 @@
 <dependency>
 <groupId>org.scala-lang</groupId>
 <artifactId>scala-library</artifactId>
-<version>${scala.version}</version>
 </dependency>
 
 <!-- Apache Spark Core -->
@@ -47,6 +48,7 @@
 <artifactId>spark-core_${scala.binary.version}</artifactId>
 <version>${spark.version}</version>
 </dependency>
+
 <!-- Apache Spark SQL -->
 <dependency>
 <groupId>org.apache.spark</groupId>
@@ -62,10 +64,28 @@
 
 <!-- HermiT reasoner -->
 <dependency>
-<groupId>com.hermit-reasoner</groupId>
+<groupId>net.sourceforge.owlapi</groupId>
 <artifactId>org.semanticweb.hermit</artifactId>
-<version>1.3.8.1</version>
+<version>1.3.8.510</version>
+</dependency>
+
+<!-- Hadoop dependencies (mainly used for InputFormat definitions) -->
+<dependency>
+<groupId>org.apache.hadoop</groupId>
+<artifactId>hadoop-mapreduce-client-core</artifactId>
+<version>${hadoop.version}</version>
+</dependency>
+<dependency>
+<groupId>org.apache.hadoop</groupId>
+<artifactId>hadoop-common</artifactId>
+<version>${hadoop.version}</version>
 </dependency>
+<dependency>
+<groupId>org.apache.hadoop</groupId>
+<artifactId>hadoop-streaming</artifactId>
+<version>${hadoop.version}</version>
+</dependency>
+
 <!-- Apache JENA 3.x -->
 <dependency>
 <groupId>org.apache.jena</groupId>
@@ -78,12 +98,18 @@
 <artifactId>scopt_${scala.binary.version}</artifactId>
 <version>3.5.0</version>
 </dependency>
+
+<!-- BigDL Library -->
+<dependency>
+<groupId>com.intel.analytics.bigdl</groupId>
+<artifactId>bigdl-SPARK_${spark.binary.version}</artifactId>
+<version>0.3.0</version>
+</dependency>
 
 <!-- Test -->
 <dependency>
 <groupId>junit</groupId>
 <artifactId>junit</artifactId>
-<version>4.8.1</version>
 <scope>test</scope>
 </dependency>
 <dependency>
@@ -105,6 +131,11 @@
 <artifactId>scopt_${scala.binary.version}</artifactId>
 </dependency>
 
+<dependency>
+<groupId>org.springframework</groupId>
+<artifactId>spring</artifactId>
+<version>2.5.6.SEC03</version>
+</dependency>
 </dependencies>
 
 <build>
@@ -126,7 +157,6 @@
 <plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-surefire-plugin</artifactId>
-<version>2.19.1</version>
 <configuration>
 <useFile>false</useFile>
 <disableXmlReport>true</disableXmlReport>
@@ -165,7 +195,7 @@
 <plugin>
 <groupId>org.codehaus.mojo</groupId>
 <artifactId>build-helper-maven-plugin</artifactId>
-<version>1.7</version>
+<version>1.8</version>
 <executions>
 <!-- Add src/main/scala to eclipse build path -->
 <execution>
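
The new BigDL and Hadoop entries lean on property interpolation: `spark.binary.version` is introduced in the parent `pom.xml` by this commit, while `hadoop.version` is defined locally in this module. As a rough illustration, assuming no profile or command-line override changes those properties, the declarations above should resolve to something like the following in the effective POM:

```xml
<!-- Sketch of the interpolated result, assuming spark.binary.version=2.2
     and hadoop.version=2.7.0 as set in this commit -->
<dependency>
  <groupId>com.intel.analytics.bigdl</groupId>
  <artifactId>bigdl-SPARK_2.2</artifactId>
  <version>0.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.0</version>
</dependency>
```

Running `mvn help:effective-pom` in the module is the usual way to check the values Maven actually substitutes.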
