Cosine similarity between documents (rows) in Spark

I have a Spark job that computes the similarity between text documents:

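// vectorsRDD is a JavaRDD<Vector> with one vector per document (row)
RowMatrix rowMatrix = new RowMatrix(vectorsRDD.rdd());
// Cosine similarities between COLUMNS; the 0.5 threshold enables approximate (DIMSUM) sampling
CoordinateMatrix rowSimilarity = rowMatrix.columnSimilarities(0.5);
JavaRDD<MatrixEntry> entries = rowSimilarity.entries().toJavaRDD();

List<MatrixEntry> list = entries.collect();

for (MatrixEntry s : list) {
    System.out.println(s);
}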

Each MatrixEntry (i, j, value) holds the similarity between columns i and j (that is, between features of the documents). How can I get the similarity between the rows instead? Say I have five documents, Doc1, ..., Doc5, each stored as one row of the matrix. I would like the similarity between every pair of these documents. How can I compute that? Any help is appreciated.
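
To make the output I am after concrete, here is a minimal plain-Java sketch (no Spark involved) of pairwise cosine similarity between rows. The class name RowCosine, its methods, and the sample vectors are made up for illustration; they are not part of any Spark API, and this only shows the result I want, not how to get it at scale.

```java
import java.util.Locale;

// Hypothetical helper, not part of Spark: pairwise cosine similarity between rows.
class RowCosine {

    // Cosine similarity of two equal-length vectors; returns 0.0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // sim[i][j] is the cosine similarity between document (row) i and document (row) j.
    static double[][] similarities(double[][] docs) {
        int n = docs.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(docs[i], docs[j]);
                sim[i][j] = s;
                sim[j][i] = s; // the similarity matrix is symmetric
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Made-up term-count vectors: 3 documents (rows) x 4 features (columns).
        double[][] docs = {
            {1, 0, 2, 0},
            {2, 0, 4, 0},
            {0, 3, 0, 1}
        };
        double[][] sim = similarities(docs);
        // Print one line per document pair, analogous to MatrixEntry (i, j, value).
        for (int i = 0; i < docs.length; i++) {
            for (int j = i + 1; j < docs.length; j++) {
                System.out.printf(Locale.ROOT, "Doc%d, Doc%d: %.3f%n", i + 1, j + 1, sim[i][j]);
            }
        }
    }
}
```

So what I want from Spark is the same kind of (i, j, value) entries as above, but with i and j indexing documents rather than features.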
