Commit 46986c5

Merge branch 'master' of https://github.com/tdebatty/java-LSH
Conflicts: src/main/java/info/debatty/java/lsh/LSHMinHash.java
2 parents 134577d + afe3db2 commit 46986c5

2 files changed

Lines changed: 5 additions & 1 deletion

README.md

Lines changed: 3 additions & 1 deletion
@@ -7,14 +7,16 @@ Locality Sensitive Hashing (LSH) is a family of hashing methods that tent to pro

LSH functions have two main use cases:
* Compute the signature of large input vectors. These signatures can be used to quickly estimate the similarity between vectors.
- * With a given number of buekcts, bin similar vectors together.
+ * With a given number of buckets, bin similar vectors together.

This library implements Locality Sensitive Hashing (LSH), as described in Leskovec, Rajaraman & Ullman (2014), "Mining of Massive Datasets", Cambridge University Press.

Are currently implemented:
* MinHash algorithm for Jaccard index;
* Super-Bit algorithm for cosine similarity.

+ The coeficients of hashing functions are randomly choosen when the LSH object is instantiated. You can thus only compare signatures or bucket binning generated by the same LSH object. To reuse your LSH object between executions, you have to serialize it and save it to a file (see below the [example of LSH object serialization](https://github.com/tdebatty/java-LSH#serialization)).
+
##Download

Using maven:
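
The note added to the README in this commit (random coefficients fixed at instantiation, so signatures are only comparable when they come from the same object) can be illustrated with a small self-contained sketch. `ToyHasher` below is a hypothetical stand-in written for this illustration. It is not the java-LSH API, and its hash family, field names and method names are assumptions. It only relies on standard `java.io` serialization, and it round-trips through a byte array; writing the same bytes to a file would presumably work the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for an LSH object; NOT a class from java-LSH.
public class ToyHasher implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int PRIME = 2147483647; // 2^31 - 1

    // Random coefficients, drawn once when the object is created.
    private final long[] a;
    private final long[] b;

    public ToyHasher(int size) {
        Random rnd = new Random();
        a = new long[size];
        b = new long[size];
        for (int i = 0; i < size; i++) {
            a[i] = 1 + rnd.nextInt(PRIME - 1);
            b[i] = rnd.nextInt(PRIME);
        }
    }

    // MinHash-style signature of a non-empty set of non-negative ints.
    public int[] signature(int[] items) {
        int[] sig = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            long min = Long.MAX_VALUE;
            for (int x : items) {
                min = Math.min(min, (a[i] * x + b[i]) % PRIME);
            }
            sig[i] = (int) min;
        }
        return sig;
    }

    // Serialize this object, coefficients included, to a byte array.
    public byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restore an object that was previously written by toBytes().
    public static ToyHasher fromBytes(byte[] data) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ToyHasher) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] items = {3, 17, 42, 1001};
        ToyHasher first = new ToyHasher(8);
        ToyHasher second = new ToyHasher(8);
        ToyHasher restored = fromBytes(first.toBytes());

        // Same coefficients after the round trip: prints true.
        System.out.println(
            Arrays.equals(first.signature(items), restored.signature(items)));
        // Independent coefficients: almost certainly prints false.
        System.out.println(
            Arrays.equals(first.signature(items), second.signature(items)));
    }
}
```

The restored object reproduces the original signatures because the coefficient arrays travel with it, whereas a freshly constructed object draws new coefficients and its signatures cannot be meaningfully compared with the old ones.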

src/main/java/info/debatty/java/lsh/LSHMinHash.java

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@

package info.debatty.java.lsh;

+ import java.util.Set;
+
/**
*
* @author Thibault Debatty
