Cosine similarity between documents (rows) in Spark
I have a Spark job that computes the similarity between text documents:
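import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.distributed.*;

// vectorsRDD holds one vector (row) per document
RowMatrix rowMatrix = new RowMatrix(vectorsRDD.rdd());
CoordinateMatrix rowsimilarity = rowMatrix.columnSimilarities(0.5);
JavaRDD<MatrixEntry> entries = rowsimilarity.entries().toJavaRDD();
List<MatrixEntry> list = entries.collect();
for (MatrixEntry s : list) System.out.println(s);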
Each MatrixEntry (i, j, value) holds the similarity between columns i and j (that is, between features of the documents). How can I get the similarity between the rows instead? Say I have five documents, Doc1, ..., Doc5, and I want the similarity between every pair of them. How can this be done? Any help is appreciated.
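One idea that may help: the similarity between the rows of a matrix is the same as the similarity between the columns of its transpose. The plain-Java sketch below only illustrates that identity on a small dense array. The class `RowSimilaritySketch` and its methods are hypothetical helpers and not part of Spark, and the three-document matrix is made-up example data. In Spark itself, the same idea would presumably mean transposing first (for instance through `CoordinateMatrix.transpose()` followed by `toRowMatrix()`) and then calling `columnSimilarities` on the result, though that has not been tried against the job above.

```java
import java.util.Arrays;

// Hypothetical helper, not a Spark API: shows that row similarities
// are the column similarities of the transposed matrix.
class RowSimilaritySketch {

    // Swap rows and columns: result[j][i] = m[i][j].
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    // Exact cosine similarity between every pair of columns of m.
    static double[][] columnSimilarities(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;

        // Euclidean norm of each column.
        double[] norms = new double[cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            for (int i = 0; i < rows; i++) {
                sum += m[i][j] * m[i][j];
            }
            norms[j] = Math.sqrt(sum);
        }

        double[][] sim = new double[cols][cols];
        for (int a = 0; a < cols; a++) {
            for (int b = 0; b < cols; b++) {
                double dot = 0.0;
                for (int i = 0; i < rows; i++) {
                    dot += m[i][a] * m[i][b];
                }
                // An all-zero column is treated as similarity 0.
                sim[a][b] = (norms[a] == 0.0 || norms[b] == 0.0)
                        ? 0.0
                        : dot / (norms[a] * norms[b]);
            }
        }
        return sim;
    }

    // Document (row) similarities: transpose, then compare columns.
    static double[][] rowSimilarities(double[][] docs) {
        return columnSimilarities(transpose(docs));
    }

    public static void main(String[] args) {
        // Made-up term counts: one row per document, one column per term.
        double[][] docs = {
            {1, 1, 0},  // Doc1
            {2, 2, 0},  // Doc2, same direction as Doc1
            {0, 0, 3}   // Doc3, shares no terms with the others
        };
        for (double[] row : rowSimilarities(docs)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Entry [i][j] of the result is the cosine similarity between document i and document j, so Doc1 and Doc2 score about 1 and Doc3 scores 0 against both. Unlike `columnSimilarities(0.5)` in the question, which samples with a threshold, this sketch computes every pair exactly and is only meant for small inputs.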