How can I find the maximum overlap between two strings in Scala?

Suppose I have two lines: s

and t

. I need to write a function f

to find the max. t

which is also a suffix s

. For example:

 s = "abcxyz", t = "xyz123", f(s, t) = "xyz"
 s = "abcxxx", t = "xx1234", f(s, t) = "xx"

      

How could you write it in Scala?

+3


source to share


3 answers


This first solution is easily the most concise and also more efficient than the recursive version as it uses lazy-evaluated iteration

s.tails.find(t.startsWith).get

      

It has now been discussed whether to tails

copy the entire line over and over. In this case, you can use toList

on s

and then the mkString

result.

s.toList.tails.find(t.startsWith(_: List[Char])).get.mkString

      



For some reason, type annotation is required to compile it. I'm not really trying to see which one is faster.

UPDATE - OPTIMIZATION

As som-snytt pointed out, t

it cannot start on any line that is longer than this, and so we could do the following optimization:

s.drop(s.length - t.length).tails.find(t.startsWith).get

      

+4


source


If we need to find the common overlapping part, then we can recursively take the tail of the first line (which should intersect with the beginning of the second line) until the remainder is the one that the second line starts with.This also applies to the case where lines don't have overlap, because then an empty string will be returned.

scala> def findOverlap(s:String, t:String):String = { 
    if (s == t.take(s.size)) s else findOverlap (s.tail, t) 
}
findOverlap: (s: String, t: String)String

scala> findOverlap("abcxyz", "xyz123")
res3: String = xyz

scala> findOverlap("one","two")
res1: String = ""

      



UPDATE: It has been pointed out that it tail

may not be implemented in the most efficient way (i.e. it creates a new row when it is called). If this becomes an issue, using substring(1)

instead tail

(or converting both strings to lists where the tail / head should be O (1)) may give better performance. And thus we can replace t.take(s.size)

with t.substring(0,s.size)

.

+3


source


Efficient, it isn't, but it's a neat (IMO) one-liner.

val s = "abcxyz"
val t ="xyz123"

(s.tails.toSet intersect t.inits.toSet).maxBy(_.size)
 //res8: String = xyz

      

(take all s suffixes that are also t prefixes and choose the longest one)

+3


source







All Articles