Performance of slow arrays and strings
Here are two quite similar ones Levenshtein Distance algorithms
.
Swift
implementation:
https://gist.github.com/bgreenlee/52d93a1d8fa1b8c1f38b
And Objective-C
implementation:
https://gist.github.com/boratlibre/1593632
Swift
one of them is significantly slower than ObjC
. I posted a couple of hours to make it faster, but ... Arrays Swift
and Strings
doesn't seem to run as fast as ObjC
.
In 2000 random Strings
, the Swift
implementation is about 100 (!!!) times slower than ObjC
.
To be honest, I have no idea what could be wrong, even this part of the quick
func levenshtein(aStr: String, bStr: String) -> Int {
// create character arrays
let a = Array(aStr)
let b = Array(bStr)
...
several times slower than the whole algorithm in Objective C
Does anyone know how to speed up calculations Swift
?
Thank you in advance!
Append
After all the suggested improvements, the quick code looks like this. And it is 4 times slower than ObjC in the release configuration.
import Foundation
class Array2D {
var cols:Int, rows:Int
var matrix:UnsafeMutablePointer<Int>
init(cols:Int, rows:Int) {
self.cols = cols
self.rows = rows
matrix = UnsafeMutablePointer<Int>(malloc(UInt(cols * rows) * UInt(sizeof(Int))))
for i in 0...cols*rows {
matrix[i] = 0
}
}
subscript(col:Int, row:Int) -> Int {
get {
return matrix[cols * row + col] as Int
}
set {
matrix[cols*row+col] = newValue
}
}
func colCount() -> Int {
return self.cols
}
func rowCount() -> Int {
return self.rows
}
}
extension String {
func levenshteinDistanceFromStringSwift(comparingString: NSString) -> Int {
let aStr = self
let bStr = comparingString
// let a = Array(aStr.unicodeScalars)
// let b = Array(bStr.unicodeScalars)
let a:NSString = aStr
let b:NSString = bStr
var dist = Array2D(cols: a.length + 1, rows: b.length + 1)
for i in 1...a.length {
dist[i, 0] = i
}
for j in 1...b.length {
dist[0, j] = j
}
for i in 1...a.length {
for j in 1...b.length {
if a.characterAtIndex(i-1) == b.characterAtIndex(j-1) {
dist[i, j] = dist[i-1, j-1] // noop
} else {
dist[i, j] = min(
dist[i-1, j] + 1, // deletion
dist[i, j-1] + 1, // insertion
dist[i-1, j-1] + 1 // substitution
)
}
}
}
return dist[a.length, b.length]
}
func levenshteinDistanceFromStringObjC(comparingString: String) -> Int {
let aStr = self
let bStr = comparingString
//It is really strange, but I should link Objective-C coz dramatic slow swift performance
return aStr.compareWithWord(bStr, matchGain: 0, missingCost: 1)
}
}
thaNos ?? NSString ?? and at the end to decrease the speed 4 times? Someone needs it faster?
source to share
There are several reasons why Swift code is slower than Objective-C code. I made a very simple test case comparing two fixed strings 100 times.
- Objective-C code: 0.026 seconds
- Fast code: 3.14 seconds
The first reason is that Swift Character
is an "extended grapheme cluster" that can contain multiple Unicode code points (eg "flags"). This makes decomposition of the string into characters slow. On the other hand, Objective-C
NSString
stores strings as a sequence of UTF-16 code points.
If you replace
let a = Array(aStr)
let b = Array(bStr)
by
let a = Array(aStr.utf16)
let b = Array(bStr.utf16)
to make the Swift code work with UTF-16 sequences too, then the time goes down to 1.88 seconds.
The distribution of a two-dimensional array is also slow. It is faster to allocate one one-dimensional array. I found a simple class here Array2D
:
http://blog.trolieb.com/trouble-multidimensional-arrays-swift/
class Array2D {
var cols:Int, rows:Int
var matrix: [Int]
init(cols:Int, rows:Int) {
self.cols = cols
self.rows = rows
matrix = Array(count:cols*rows, repeatedValue:0)
}
subscript(col:Int, row:Int) -> Int {
get {
return matrix[cols * row + col]
}
set {
matrix[cols*row+col] = newValue
}
}
func colCount() -> Int {
return self.cols
}
func rowCount() -> Int {
return self.rows
}
}
Using this class in your code
func levenshtein(aStr: String, bStr: String) -> Int {
let a = Array(aStr.utf16)
let b = Array(bStr.utf16)
var dist = Array2D(cols: a.count + 1, rows: b.count + 1)
for i in 1...a.count {
dist[i, 0] = i
}
for j in 1...b.count {
dist[0, j] = j
}
for i in 1...a.count {
for j in 1...b.count {
if a[i-1] == b[j-1] {
dist[i, j] = dist[i-1, j-1] // noop
} else {
dist[i, j] = min(
dist[i-1, j] + 1, // deletion
dist[i, j-1] + 1, // insertion
dist[i-1, j-1] + 1 // substitution
)
}
}
}
return dist[a.count, b.count]
}
the time in the test case is reduced to 0.84 seconds.
The last bottleneck I found in Swift code is function min()
. The Swift library has a built in function min()
that is faster. So simply removing the custom function from the Swift code reduces the test time to 0.04 seconds, which is almost as good as the Objective-C version.
Addendum: Using Unicode scanners looks a little faster:
let a = Array(aStr.unicodeScalars)
let b = Array(bStr.unicodeScalars)
and has the advantage that it works correctly with surrogate pairs like Emojis.
source to share