Scala n-grams: converting the output to a Set
def ngrams(n: Int, words: Array[String]) = {
  // builds every 1-gram up to n-gram (start the range at 2 to exclude 1-grams)
  (1 to n).map { i => words.sliding(i).toStream }
    .foldLeft(Stream[Array[String]]()) {
      (a, b) => a #::: b
    }
}
scala> val op2 = ngrams(3, "how are you".split(" ")).foreach { x => println((x.mkString(" ")))}
Output:
how
are
you
how are
are you
how are you
op2: Unit = ()
How can I avoid the Unit value above? I actually want to convert the n-grams to a Set, but because op2 is Unit = (), that doesn't work. Can you please help me work out how to get a Set such as (how, you, how you, how you)? Thanks for the post "How to generate n-grams in scala?" ...
Unit = () is the type and value of op2. You could either:
- Remove the assignment to op2:
ngrams(3, "how are you".split(" ")).foreach { x => println((x.mkString(" ")))}
- Change .foreach to .map and evaluate op2 to see the result:
scala> val op2 = ngrams(3, "how are you".split(" ")).map { x => x.mkString(" ")}.toList
scala> op2
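If the goal is a Set of the joined strings rather than a List, the same idea works with .toSet at the end. The following is only a minimal sketch: ngramSet is a hypothetical helper name, not something from the question, and it uses flatMap over the window sizes instead of the Stream-based fold so that it stays self-contained.

```scala
// Hypothetical helper (name assumed for illustration): collects all
// 1-grams up to n-grams as space-joined strings in a Set.
def ngramSet(n: Int, words: Array[String]): Set[String] =
  (1 to n)
    .flatMap(i => words.sliding(i)) // every window of size i
    .map(_.mkString(" "))           // join each window into one string
    .toSet                          // String equality is by value, so duplicates collapse

val grams: Set[String] = ngramSet(3, "how are you".split(" "))
// grams contains: how, are, you, how are, are you, how are you
```

Because the elements are Strings, the Set removes repeated n-grams as expected.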
The short answer is that the return type of foreach is Unit. So when you assign the result of foreach to op2, the type of op2 is Unit and its value is ().
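As a small illustration of the difference (the value names here are made up for the example), foreach runs a side effect and hands back Unit, while map builds a new collection you can keep:

```scala
val words = List("how", "are", "you")

// foreach is only useful for its side effect; the result is the Unit value ()
val printed: Unit = words.foreach(w => println(w))

// map returns a new collection holding the transformed elements
val mapped: List[String] = words.map(_.toUpperCase)
```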
It sounds like you want to do the following:
- calculate the n-grams using the ngrams method,
- save the Set of n-grams to op2, and
- print all the n-grams.
Let's start with the type of the ngrams method:
(n: Int, words: Array[String]) => Stream[Array[String]]
It returns a Stream, which looks like it can easily be turned into a Set with toSet:
ngrams(3, "how are you".split(" ")).toSet
However, this is dangerous because in Scala, Array equality is by reference, so two arrays with the same contents count as different elements of a Set. It is much safer to turn your Stream[Array[String]] into a Stream[List[String]] so that all duplicates are removed (this assumes that order matters within each n-gram):
val op2 = ngrams(3, "how are you".split(" ")).map(_.toList).toSet
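To see why the conversion matters, here is a minimal sketch comparing two arrays with identical contents (the names a and b are just for illustration):

```scala
val a = Array("how", "are")
val b = Array("how", "are")

// Arrays compare by reference, so equal contents are not enough
val arraysEqual = a == b                          // false
val sameContents = a.sameElements(b)              // true

// A Set of arrays therefore keeps both copies ...
val arraySetSize = Set(a, b).size                 // 2
// ... while Lists compare by value, so the duplicate collapses
val listSetSize = Set(a.toList, b.toList).size    // 1
```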
It is now easy to print the Set[List[String]], just as you did with the Stream[Array[String]]:
op2.foreach { x => println((x.mkString(" ")))}
Since the result, (), has type Unit, there is no reason to assign it to a variable.