Performance of slow arrays and strings

Here are two quite similar ones Levenshtein Distance algorithms

.

Swift

implementation: https://gist.github.com/bgreenlee/52d93a1d8fa1b8c1f38b

And Objective-C

implementation: https://gist.github.com/boratlibre/1593632

Swift

one of them is significantly slower than ObjC

. I posted a couple of hours to make it faster, but ... Arrays Swift

and Strings

doesn't seem to run as fast as ObjC

.

In 2000 random Strings

, the Swift

implementation is about 100 (!!!) times slower than ObjC

.

To be honest, I have no idea what could be wrong, even this part of the quick

func levenshtein(aStr: String, bStr: String) -> Int {
// create character arrays
let a = Array(aStr)
let b = Array(bStr)
...

      

several times slower than the whole algorithm in Objective C

Does anyone know how to speed up calculations Swift

?

Thank you in advance!

Append

After all the suggested improvements, the quick code looks like this. And it is 4 times slower than ObjC in the release configuration.

import Foundation
class Array2D {
    var cols:Int, rows:Int
    var matrix:UnsafeMutablePointer<Int>


    init(cols:Int, rows:Int) {
        self.cols = cols
        self.rows = rows
        matrix = UnsafeMutablePointer<Int>(malloc(UInt(cols * rows) * UInt(sizeof(Int))))
        for i in 0...cols*rows {
            matrix[i] = 0
        }

    }

    subscript(col:Int, row:Int) -> Int {
        get {
            return matrix[cols * row + col] as Int
        }
        set {
            matrix[cols*row+col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}

extension String {
    func levenshteinDistanceFromStringSwift(comparingString: NSString) -> Int {
        let aStr = self
        let bStr = comparingString

//        let a = Array(aStr.unicodeScalars)
//        let b = Array(bStr.unicodeScalars)

        let a:NSString = aStr
        let b:NSString = bStr

        var dist = Array2D(cols: a.length + 1, rows: b.length + 1)



        for i in 1...a.length {
            dist[i, 0] = i
        }

        for j in 1...b.length {
            dist[0, j] = j
        }

        for i in 1...a.length {
            for j in 1...b.length {
                if a.characterAtIndex(i-1) == b.characterAtIndex(j-1) {
                    dist[i, j] = dist[i-1, j-1]  // noop
                } else {
                    dist[i, j] = min(
                        dist[i-1, j] + 1,  // deletion
                        dist[i, j-1] + 1,  // insertion
                        dist[i-1, j-1] + 1  // substitution
                    )
                }
            }
        }

        return dist[a.length, b.length]

    }
    func levenshteinDistanceFromStringObjC(comparingString: String) -> Int {
        let aStr = self
        let bStr = comparingString
        //It is really strange, but I should link Objective-C coz dramatic slow swift performance
        return aStr.compareWithWord(bStr, matchGain: 0, missingCost: 1)

    }

}

      

thaNos ?? NSString ?? and at the end to decrease the speed 4 times? Someone needs it faster?

+3


source to share


1 answer


There are several reasons why Swift code is slower than Objective-C code. I made a very simple test case comparing two fixed strings 100 times.

  • Objective-C code: 0.026 seconds
  • Fast code: 3.14 seconds

The first reason is that Swift Character

is an "extended grapheme cluster" that can contain multiple Unicode code points (eg "flags"). This makes decomposition of the string into characters slow. On the other hand, Objective-C NSString

stores strings as a sequence of UTF-16 code points.

If you replace

let a = Array(aStr)
let b = Array(bStr)

      

by

let a = Array(aStr.utf16)
let b = Array(bStr.utf16)

      

to make the Swift code work with UTF-16 sequences too, then the time goes down to 1.88 seconds.

The distribution of a two-dimensional array is also slow. It is faster to allocate one one-dimensional array. I found a simple class here Array2D

: http://blog.trolieb.com/trouble-multidimensional-arrays-swift/



class Array2D {
    var cols:Int, rows:Int
    var matrix: [Int]


    init(cols:Int, rows:Int) {
        self.cols = cols
        self.rows = rows
        matrix = Array(count:cols*rows, repeatedValue:0)
    }

    subscript(col:Int, row:Int) -> Int {
        get {
            return matrix[cols * row + col]
        }
        set {
            matrix[cols*row+col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}

      

Using this class in your code

func levenshtein(aStr: String, bStr: String) -> Int {
    let a = Array(aStr.utf16)
    let b = Array(bStr.utf16)

    var dist = Array2D(cols: a.count + 1, rows: b.count + 1)

    for i in 1...a.count {
        dist[i, 0] = i
    }

    for j in 1...b.count {
        dist[0, j] = j
    }

    for i in 1...a.count {
        for j in 1...b.count {
            if a[i-1] == b[j-1] {
                dist[i, j] = dist[i-1, j-1]  // noop
            } else {
                dist[i, j] = min(
                    dist[i-1, j] + 1,  // deletion
                    dist[i, j-1] + 1,  // insertion
                    dist[i-1, j-1] + 1  // substitution
                )
            }
        }
    }

    return dist[a.count, b.count]
}

      

the time in the test case is reduced to 0.84 seconds.

The last bottleneck I found in Swift code is function min()

. The Swift library has a built in function min()

that is faster. So simply removing the custom function from the Swift code reduces the test time to 0.04 seconds, which is almost as good as the Objective-C version.

Addendum: Using Unicode scanners looks a little faster:

let a = Array(aStr.unicodeScalars)
let b = Array(bStr.unicodeScalars)

      

and has the advantage that it works correctly with surrogate pairs like Emojis.

+8


source







All Articles