Getting all possible ranges by matching using regular expressions in Swift
I am playing around with the following code in Swift to generate the appropriate regex for the application:
let regExp = "-(\\([0-9.a-z()+-×÷√^₁₀²³/]+\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
let testString = "-(hsjshdf) -hsghsgsgs -(k) -(1/64) -dhsg62 -(p)"
let regularExpression = try! NSRegularExpression(pattern: regExp, options: [])
let matchesArray = regularExpression.matches(in: testString, options: [], range: NSRange(location: 0, length: testString.characters.count))
for match in matchesArray {
    for i in 0..<match.numberOfRanges {
        let range = match.rangeAt(i)
        let r = testString.index(testString.startIndex, offsetBy: range.location) ..< testString.index(testString.startIndex, offsetBy: range.location + range.length)
        print(testString.substring(with: r))
    }
}
The result is obtained as follows:
-(hsjshdf)
(hsjshdf)
-hsghsgsgs
hsghsgsgs
-(k)
(k)
-(1/64)
(1/64)
-dhsg62
dhsg62
-(p)
(p)
However, I want the regex to match and group the substring inside "()", so I can get the following output:
-(hsjshdf)
(hsjshdf)
hsjshdf
-hsghsgsgs
hsghsgsgs
-(k)
(k)
k
-(1/64)
(1/64)
1/64
-dhsg62
dhsg62
-(p)
(p)
p
I tried the following modification of the original regex. It worked for the substring "-(hsjshdf)", but crashed with a runtime error (fatal error: cannot increment beyond endIndex) when printing the matches for the substring "-hsghsgsgs":
let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
I am not familiar with NSRegularExpression. Am I using the wrong regex? Do I need to set a custom parameter?
Thank you for your help. Regards.
/ TB
The problem is actually in your loop, not in the regex.
Your regex let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)" has two pairs of capturing parentheses, and the second (inner) pair may not capture any part of the string. That is what happens for "-hsghsgsgs", where only the second alternative matches.
You should know that NSRegularExpression returns NSRange(location: NSNotFound, length: 0) for missing captures. In the current implementation, NSNotFound has the same value as Int.max, which is far larger than the length of any actual string, so offsetting an index by it runs past endIndex.
You just need to check whether the location of each range is NSNotFound before using it:
let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
let testString = "-(hsjshdf) -hsghsgsgs -(k) -(1/64) -dhsg62 -(p)"
let regularExpression = try! NSRegularExpression(pattern: regExp, options: [])
//###(1) Use `.utf16.count`, not `.characters.count`.
let matchesArray = regularExpression.matches(in: testString, options: [], range: NSRange(location: 0, length: testString.utf16.count))
for match in matchesArray {
    for i in 0..<match.numberOfRanges {
        let range = match.rangeAt(i)
        if range.location == NSNotFound {continue} //###(2) Skip missing captures.
        //###(3) Your way of creating `r` does not work for non-BMP characters.
        print((testString as NSString).substring(with: range))
    }
}
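To see what check (2) guards against, here is a minimal sketch that records, for each match, which capture groups actually took part. It assumes Swift 4 or later, where `rangeAt(_:)` is spelled `range(at:)`; the names `pattern`, `input` and `participated` are only illustrative and the input is shortened to two cases.

```swift
import Foundation

// Same pattern as above: group 1 is the outer alternation,
// group 2 is the inner pair of parentheses.
let pattern = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
let input = "-(k) -dhsg62"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
let matches = regex.matches(in: input, options: [],
                            range: NSRange(location: 0, length: input.utf16.count))

// For every match, note whether each capture group participated.
var participated: [[Bool]] = []
for match in matches {
    var flags: [Bool] = []
    for i in 0..<match.numberOfRanges {
        // A group that did not participate reports location == NSNotFound.
        flags.append(match.range(at: i).location != NSNotFound)
    }
    participated.append(flags)
}
print(participated)
```

For "-(k)" all three ranges are present, while for "-dhsg62" the last entry is false: that is the range that made the original loop crash.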
(My comments (1) and (3) are not critical for your input testString, but you should also know that NSRegularExpression works with NSString, which is UTF-16 based. The location and length of an NSRange are offsets and counts in UTF-16 code units, not in Characters.)
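As a rough illustration of the UTF-16 point, the sketch below puts a non-BMP character in front of the input and converts ranges with `NSRange(_:in:)` and `Range(_:in:)`. This assumes Swift 4 or later (these initializers did not exist when the code above was written), and the names `text`, `whole` and `captures` are made up for the example.

```swift
import Foundation

// "😀" lies outside the BMP: one Character, but two UTF-16 code units.
let text = "😀 -(k) -p"
print(text.count)        // number of Characters
print(text.utf16.count)  // number of UTF-16 code units, which is what NSRange counts

let regex = try! NSRegularExpression(
    pattern: "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)",
    options: [])

// NSRange(_:in:) builds the UTF-16 based range covering the whole string.
let whole = NSRange(text.startIndex..<text.endIndex, in: text)

var captures: [String] = []
for match in regex.matches(in: text, options: [], range: whole) {
    for i in 0..<match.numberOfRanges {
        // Range(_:in:) yields nil when the NSRange cannot be mapped into the
        // string, which includes a missing capture (location == NSNotFound),
        // so the optional binding also takes care of check (2).
        if let r = Range(match.range(at: i), in: text) {
            captures.append(String(text[r]))
        }
    }
}
print(captures)
```

Using `text.count` for the NSRange length here would cut the search range one code unit short, which is the kind of mismatch comment (1) warns about.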