Scala - finding the first position where two Seqs differ

Scala comes with a nice corresponds method:

val a = scala.io.Source.fromFile("fileA").getLines().toSeq
val b = scala.io.Source.fromFile("fileB").getLines().toSeq

val areEqual = a.corresponds(b){_.equals(_)}

if(areEqual) ...

      

And I really like the brevity of it.

Is there already a similar method that will also tell me the first position in which the two sequences differ?

That is, is there a more idiomatic way to write something like this:

val result = ((seqA zip seqB).zipWithIndex).find{case ((a,b),i) => !a.equals(b)} match{
    case Some(((a,b),i)) => s"seqA and seqB differ in pos $i: $a <> $b"
    case _ => "no difference"
}

      

Because, as you can see, this is a bloody pain in the neck to read. And it gets even worse if I want to use triples instead of tuples of tuples:

val result = (((seqA zip seqB).zipWithIndex) map {case (t,i) => (t._1,t._2,i)}).find{case (a,b,i) => !a.equals(b)} match{
    case Some((a,b,i)) => s"seqA and seqB differ in pos $i: $a <> $b"
    case _ => "no difference"
}

      

I know about the diff method. Unfortunately, it doesn't take the order of the elements into account.



2 answers


You can use indexWhere (see the ScalaDoc) like this:

(as zip bs).indexWhere{case (x,y) => x != y}

      

Example:

scala> val as = List(1,2,3,4)
scala> val bs = List(1,2,4,4)

scala> (as zip bs).indexWhere{case (x,y) => x != y}

res0: Int = 2

      

Note that all solutions based on zip may report no difference if one Seq is longer than the other (zip truncates the longer Seq). This may or may not be what you want.
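To make the truncation caveat concrete, here is a minimal sketch. The object name ZipTruncationDemo and the Option-wrapping with zipAll are illustrative assumptions, not part of the original answer; they show just one possible way to make a missing element count as a difference.

```scala
object ZipTruncationDemo {
  val as = List(1, 2)
  val bs = List(1, 2, 3)

  // zip stops at the end of the shorter list, so the extra element
  // in bs is never compared and indexWhere finds nothing (-1).
  val viaZip: Int = (as zip bs).indexWhere { case (x, y) => x != y }

  // Assumption for illustration: wrap elements in Option and pad the
  // shorter side with None, so a missing element shows up as a mismatch.
  val viaZipAll: Int =
    as.map(Option(_))
      .zipAll(bs.map(Option(_)), None, None)
      .indexWhere { case (x, y) => x != y }
}
```

With these inputs, viaZip is -1 while viaZipAll is 2, the index of the element that exists only in bs.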

Update: for Seqs of equal length, a different approach can be used:

as.indices.find(i => as(i) != bs(i))

      



This is nicer because it returns an Option[Int], so it yields None rather than the magic -1 if there is no difference between the Seqs.

It behaves the same as the other solution if as is shorter than bs, but fails with an IndexOutOfBoundsException if as is longer (you can of course take the minimum length).

However, since it accesses the Seqs by index, it will only perform well for IndexedSeqs.
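The "take the minimum length" variant could be sketched as follows. The helper name firstDiffIndexed is hypothetical, and the signature is restricted to IndexedSeq on the assumption that index access should stay cheap.

```scala
object MinLengthDemo {
  // Hypothetical helper: only compares the indices that both sequences
  // share, so it never reads past the end of the shorter one.
  def firstDiffIndexed[A](as: IndexedSeq[A], bs: IndexedSeq[A]): Option[Int] =
    (0 until (as.length min bs.length)).find(i => as(i) != bs(i))

  // as is longer than bs, but the mismatch lies inside the shared range.
  val longerFirst: Option[Int] = firstDiffIndexed(Vector(1, 2, 3, 4), Vector(1, 2, 9))

  // bs is a strict prefix of as: like zip, this reports no difference.
  val prefixOnly: Option[Int] = firstDiffIndexed(Vector(1, 2, 3), Vector(1, 2))
}
```

Here longerFirst is Some(2) and prefixOnly is None, so this variant avoids the exception but shares the truncation behaviour of the zip-based solutions.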

Update 2: we can handle Seqs of different lengths by using lift, so that we get an Option when retrieving elements by index:

bs.indices.find(i => as.lift(i) != bs.lift(i))

      

So if as = [1,2] and bs = [1,2,3], the first index at which they differ is 2 (since this element is not present in as). However, in this case we need to call indices on the longer Seq, not the shorter one, or determine the larger length explicitly using max, e.g.:

(0 until (as.length max bs.length)).find(i => as.lift(i) != bs.lift(i))
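As a small worked example of the lift approach, the sketch below uses the same as and bs as in the text; the object and value names are made up for illustration.

```scala
object LiftDemo {
  val as = List(1, 2)
  val bs = List(1, 2, 3)

  // lift turns index access into an Option instead of throwing.
  val missing: Option[Int] = as.lift(2) // index 2 does not exist in as
  val present: Option[Int] = bs.lift(2) // index 2 exists in bs

  // Walk up to the larger length; a missing element (None) differs
  // from a present one (Some), so the length mismatch is detected.
  val firstDiff: Option[Int] =
    (0 until (as.length max bs.length)).find(i => as.lift(i) != bs.lift(i))
}
```

Here missing is None, present is Some(3), and firstDiff is Some(2). As with plain index access, this is likely to be slow on a List, because each lift call has to walk the list from the start.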

      



This is slightly better:

(as zip bs).zipWithIndex.collectFirst { case ((a,b),i) if a!=b => i }

      

See:

def firstDiff[A,B](as: Seq[A], bs: Seq[B]) = (as zip bs).zipWithIndex.collectFirst { case ((a,b),i) if a!=b => i }

firstDiff(Seq(1,2,3,4), Seq(1,2,9,4))
// res1: Option[Int] = Some(2)

      

If you want a and b in the output:



(as zip bs).zipWithIndex.collectFirst { case ((a,b),i) if a!=b => (i,a,b) }
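To get back to the message from the question, the result could be built directly inside collectFirst. This is only a sketch; the describe helper and the DiffMessageDemo object are hypothetical names.

```scala
object DiffMessageDemo {
  // Hypothetical helper: produces the same strings as the match
  // expression in the question, without the nested pattern match.
  def describe[A](seqA: Seq[A], seqB: Seq[A]): String =
    (seqA zip seqB).zipWithIndex
      .collectFirst {
        case ((a, b), i) if a != b => s"seqA and seqB differ in pos $i: $a <> $b"
      }
      .getOrElse("no difference")
}
```

For example, DiffMessageDemo.describe(Seq(1,2,3,4), Seq(1,2,9,4)) gives "seqA and seqB differ in pos 2: 3 <> 9". Since it is based on zip, it still ignores any extra elements in the longer Seq.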

      

Also, if you want it to look like your corresponds example, you can write it as an extension method:

implicit class Enriched_counts_TraversableOnce[A](val as: TraversableOnce[A]) extends AnyVal {
  def firstDiff[B](bs: TraversableOnce[B]): Option[Int] = {
    (as.toIterator zip bs.toIterator)
      .zipWithIndex
      .collectFirst { case ((a,b),i) if a!=b => i }
  }
}

Seq(1,2,3,4).firstDiff(Seq(1,2,9,4))
// res2: Option[Int] = Some(2)

      

Or even:

implicit class Enriched_counts_TraversableOnce[A](val as: TraversableOnce[A]) extends AnyVal {
  def firstDiff2[B](bs: TraversableOnce[B])(p: (A,B) => Boolean): Option[Int] = {
    (as.toIterator zip bs.toIterator)
      .zipWithIndex
      .collectFirst { case ((a,b),i) if !p(a,b) => i }
  }
}

Seq(1,2,3,4).firstDiff2(Seq(1,2,9,4)){ _ == _ }
// res3: Option[Int] = Some(2)

      
