Scala - finding the first position where two Seqs differ
Scala comes with a nice corresponds method:
val a = scala.io.Source.fromFile("fileA").getLines().toSeq
val b = scala.io.Source.fromFile("fileB").getLines().toSeq
val areEqual = a.corresponds(b){_.equals(_)}
if(areEqual) ...
And I really like the brevity of it.
Is there already a similar method that will also tell me the first position in which the two sequences differ?
If not, is there a more idiomatic way to write something like this:
val result = ((seqA zip seqB).zipWithIndex).find{case ((a,b),i) => !a.equals(b)} match{
  case Some(((a,b),i)) => s"seqA and seqB differ in pos $i: $a <> $b"
  case _ => "no difference"
}
Because, as you can see, this is a bloody pain in the neck to read. And it gets even worse if I want to use triples instead of tuples of tuples:
val result = (((seqA zip seqB).zipWithIndex) map {case (t,i) => (t._1,t._2,i)}).find{case (a,b,i) => !a.equals(b)} match{
  case Some((a,b,i)) => s"seqA and seqB differ in pos $i: $a <> $b"
  case _ => "no difference"
}
I know about the diff method. Unfortunately, it does not take the order of the elements into account.
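To illustrate why diff does not help here, a minimal sketch (the object and value names are made up for this example): diff treats the sequences as multisets, so two sequences with the same elements in a different order leave nothing behind, while a positional comparison still finds a mismatch.

```scala
object DiffIgnoresOrder {
  val seqA = Seq(1, 2, 3)
  val seqB = Seq(3, 2, 1)

  // diff removes one occurrence from seqA for each element of seqB,
  // regardless of position, so nothing is left over here.
  val leftover = seqA.diff(seqB)

  // A positional comparison still sees that the first elements differ.
  val firstMismatch = (seqA zip seqB).indexWhere { case (x, y) => x != y }
}
```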
You can use indexWhere (see the ScalaDoc) like this:
(as zip bs).indexWhere{case (x,y) => x != y}
Example:
scala> val as = List(1,2,3,4)
scala> val bs = List(1,2,4,4)
scala> (as zip bs).indexWhere{case (x,y) => x != y}
res0: Int = 2
Note that all zip-based solutions may report no difference if one Seq is longer than the other (zip truncates the longer Seq). This may or may not be what you want.
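If a length mismatch should count as a difference, one possible workaround is zipAll, which pads the shorter side instead of truncating. The sketch below is only an illustration: the names ZipAllDiff and firstDiffIndex are made up, and elements are wrapped in Option so that None can stand for "no element at this position".

```scala
object ZipAllDiff {
  // Returns the first index at which the sequences differ, or -1 if they are
  // equal. A missing element on either side counts as a difference.
  def firstDiffIndex[A](as: Seq[A], bs: Seq[A]): Int = {
    val left: Seq[Option[A]] = as.map(Some(_))
    val right: Seq[Option[A]] = bs.map(Some(_))
    // zipAll pads the shorter sequence with None instead of dropping the tail.
    left.zipAll(right, None, None).indexWhere { case (x, y) => x != y }
  }
}
```

With List(1,2) and List(1,2,3), the plain zip version yields -1, whereas this variant reports index 2.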
Update: For Seqs of equal length, a different approach would be:
as.indices.find(i => as(i) != bs(i))
This is nice because it returns an Option[Int], so you get None rather than a magic -1 if there is no difference between the Seqs.
It behaves the same as the other solution if as is shorter than bs, but fails with an exception if as is longer and no difference is found within the length of bs (you could of course take the minimum length).
However, since it accesses both Seqs by index, it will only perform well for IndexedSeqs.
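Taking the minimum length, as suggested above, could look like the following sketch. The names IndexBasedDiff and firstDiff are hypothetical, and the parameters are restricted to IndexedSeq so that the indexed access stays cheap.

```scala
object IndexBasedDiff {
  // Compares only the common prefix, so it never indexes past the end of
  // either sequence. Like zip, it ignores any extra trailing elements.
  def firstDiff[A](as: IndexedSeq[A], bs: IndexedSeq[A]): Option[Int] =
    (0 until (as.length min bs.length)).find(i => as(i) != bs(i))
}
```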
Update 2: We can handle different Seq lengths by using lift, so that we get an Option when retrieving elements by index:
bs.indices.find(i => as.lift(i) != bs.lift(i))
So if as = [1,2] and bs = [1,2,3], the first index at which they differ is 2 (since this element is missing in as). However, in this case we need to call indices on the longer Seq, not the shorter one, or explicitly check which one is longer, e.g. using max:
(0 until (as.length max bs.length)).find(i => as.lift(i) != bs.lift(i))
This is slightly better:
(as zip bs).zipWithIndex.collectFirst { case ((a,b),i) if a!=b => i }
See:
def firstDiff[A,B](as: Seq[A], bs: Seq[B]) = (as zip bs).zipWithIndex.collectFirst { case ((a,b),i) if a!=b => i }
firstDiff(Seq(1,2,3,4), Seq(1,2,9,4))
// res1: Option[Int] = Some(2)
If you want a and b in the output:
(as zip bs).zipWithIndex.collectFirst { case ((a,b),i) if a!=b => (i,a,b) }
Also, if you want it to look like your corresponds example, you can write it as an extension method:
implicit class Enriched_counts_TraversableOnce[A](val as: TraversableOnce[A]) extends AnyVal {
  def firstDiff[B](bs: TraversableOnce[B]): Option[Int] = {
    (as.toIterator zip bs.toIterator)
      .zipWithIndex
      .collectFirst { case ((a,b),i) if a!=b => i }
  }
}
Seq(1,2,3,4).firstDiff(Seq(1,2,9,4))
// res2: Option[Int] = Some(2)
Or even:
implicit class Enriched_counts_TraversableOnce[A](val as: TraversableOnce[A]) extends AnyVal {
  def firstDiff2[B](bs: TraversableOnce[B])(p: (A,B) => Boolean): Option[Int] = {
    (as.toIterator zip bs.toIterator)
      .zipWithIndex
      .collectFirst { case ((a,b),i) if !p(a,b) => i }
  }
}
Seq(1,2,3,4).firstDiff2(Seq(1,2,9,4)){ _ == _ }
// res3: Option[Int] = Some(2)
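On Scala 2.13 and later, TraversableOnce and toIterator are deprecated in favour of IterableOnce and iterator. A rough equivalent of the predicate-based version for those releases might look like this; it assumes Scala 2.13+, and the names FirstDiffSyntax, FirstDiffOps and firstDiffBy are made up for this sketch.

```scala
object FirstDiffSyntax {
  // Assumes Scala 2.13+: IterableOnce does not exist in earlier versions.
  implicit class FirstDiffOps[A](private val as: IterableOnce[A]) extends AnyVal {
    // Index of the first position where `same` does not hold, if any.
    // Like zip, this stops at the end of the shorter input.
    def firstDiffBy[B](bs: IterableOnce[B])(same: (A, B) => Boolean): Option[Int] =
      as.iterator.zip(bs.iterator).zipWithIndex.collectFirst {
        case ((a, b), i) if !same(a, b) => i
      }
  }
}
```

After import FirstDiffSyntax._ it is called just like corresponds, for example Seq(1,2,3,4).firstDiffBy(Seq(1,2,9,4))(_ == _).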