Getting all capture-group ranges from regular expression matches in Swift

I am experimenting with the following Swift code to find the right regex for my application:

```swift
import Foundation

let regExp = "-(\\([0-9.a-z()+-×÷√^₁₀²³/]+\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
let testString = "-(hsjshdf)   -hsghsgsgs -(k) -(1/64) -dhsg62 -(p)"

let regularExpression = try! NSRegularExpression(pattern: regExp, options: [])
let matchesArray = regularExpression.matches(in: testString, options: [], range: NSRange(location: 0, length: testString.characters.count))

for match in matchesArray {
    for i in 0..<match.numberOfRanges {
        let range = match.rangeAt(i)
        let r = testString.index(testString.startIndex, offsetBy: range.location) ..< testString.index(testString.startIndex, offsetBy: range.location + range.length)
        print(testString.substring(with: r))
    }
}
```

This prints:

```
-(hsjshdf)
(hsjshdf)
-hsghsgsgs
hsghsgsgs
-(k)
(k)
-(1/64)
(1/64)
-dhsg62
dhsg62
-(p)
(p)
```

However, I want the regex to also capture the substring inside the parentheses, so that I get the following output:

```
-(hsjshdf)
(hsjshdf)
hsjshdf
-hsghsgsgs
hsghsgsgs
-(k)
(k)
k
-(1/64)
(1/64)
1/64
-dhsg62
dhsg62
-(p)
(p)
p
```

I tried the following modification of the original regex. It worked for the substring "-(hsjshdf)", but when printing the matches for "-hsghsgsgs" it crashed with a fatal runtime error about advancing past `endIndex`:

```swift
let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
```

I am not familiar with NSRegularExpression. Am I using the wrong regex? Do I need to set a custom parameter?

Thank you for your help. Regards.

/ TB



1 answer


The problem is actually in your loop, not in the regex.

Your regex `-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)` has two pairs of capturing parentheses, and the second (inner) group captures nothing when the unparenthesized alternative matches, as it does for `-hsghsgsgs`.

You should know that `NSRegularExpression` returns `NSRange(location: NSNotFound, length: 0)` for captures that did not participate in a match. In the current implementation, `NSNotFound` has the same value as `Int.max`, which is far larger than the length of any real string, so offsetting a string index by it runs past `endIndex`.
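To see this concretely, here is a minimal sketch that runs the modified pattern against just `-hsghsgsgs` and inspects the inner group. It assumes Swift 4 or later, where Swift 3's `rangeAt(_:)` is spelled `range(at:)`; the variable names are illustrative.

```swift
import Foundation

// The modified pattern from the question: group 1 spans either alternative,
// while group 2 exists only inside the parenthesized alternative.
let pattern = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
let regex = try! NSRegularExpression(pattern: pattern, options: [])

// This input can only match the second (unparenthesized) alternative.
let input = "-hsghsgsgs"
let fullRange = NSRange(location: 0, length: input.utf16.count)

guard let match = regex.firstMatch(in: input, options: [], range: fullRange) else {
    fatalError("expected a match")
}

// numberOfRanges counts the whole match plus every group in the pattern,
// whether or not the group took part in this particular match.
print("groups:", match.numberOfRanges)                                // groups: 3
let innerGroup = match.range(at: 2)
print("inner group is NSNotFound:", innerGroup.location == NSNotFound) // true
print("NSNotFound == Int.max:", NSNotFound == Int.max)                  // true
```

Feeding `innerGroup.location` into `index(_:offsetBy:)` is what makes the question's loop crash.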



You just need to check whether a range's `location` is `NSNotFound` before using it:

```swift
import Foundation

let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
let testString = "-(hsjshdf)   -hsghsgsgs -(k) -(1/64) -dhsg62 -(p)"

let regularExpression = try! NSRegularExpression(pattern: regExp, options: [])

//###(1) Use `.utf16.count`, not `.characters.count`.
let matchesArray = regularExpression.matches(in: testString, options: [], range: NSRange(location: 0, length: testString.utf16.count))

for match in matchesArray {
    for i in 0..<match.numberOfRanges {
        let range = match.rangeAt(i)
        if range.location == NSNotFound {continue} //###(2) Skip missing captures.
        //###(3) Your way of creating `r` does not work for non-BMP characters.
        print((testString as NSString).substring(with: range))
    }
}
```
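The code above uses Swift 3 names. A minimal sketch of the same fix for Swift 4 or later might look like this: it keeps the explicit `NSNotFound` check and converts each UTF-16 based `NSRange` with `Range(_:in:)` instead of bridging to `NSString`. The `pieces` array is only there to collect the results.

```swift
import Foundation

let pattern = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"
let testString = "-(hsjshdf)   -hsghsgsgs -(k) -(1/64) -dhsg62 -(p)"
let regex = try! NSRegularExpression(pattern: pattern, options: [])

// NSRegularExpression works in UTF-16 code units, so measure the string that way.
let fullRange = NSRange(location: 0, length: testString.utf16.count)

var pieces: [String] = []
for match in regex.matches(in: testString, options: [], range: fullRange) {
    for i in 0..<match.numberOfRanges {
        let nsRange = match.range(at: i)   // Swift 3: match.rangeAt(i)
        // Skip groups that did not take part in this match.
        if nsRange.location == NSNotFound { continue }
        // Convert the UTF-16 based NSRange into a Range<String.Index>.
        guard let r = Range(nsRange, in: testString) else { continue }
        pieces.append(String(testString[r]))
    }
}
print(pieces.joined(separator: "\n"))
```

This prints the sixteen lines the question asks for, from `-(hsjshdf)` down to `p`.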


(My comments (1) and (3) do not matter for your particular `testString`, but you should also know that `NSRegularExpression` works with `NSString`, which is UTF-16 based: `location` and `length` are UTF-16 offsets and counts, not counts of `Character`s.)
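A small sketch of the mismatch behind comments (1) and (3), assuming Swift 4 or later. The input `-(😀x)` is hypothetical: the emoji is one `Character` but two UTF-16 code units, so a UTF-16 `location` used as a `Character` offset lands on the wrong character.

```swift
import Foundation

// Hypothetical input: "😀" is one Character but two UTF-16 code units.
let text = "-(😀x)"
print(text.count, text.utf16.count)   // 5 6

let regex = try! NSRegularExpression(pattern: "x", options: [])
let whole = NSRange(location: 0, length: text.utf16.count)
guard let match = regex.firstMatch(in: text, options: [], range: whole) else {
    fatalError("expected a match")
}
let nsRange = match.range   // location 4, length 1, counted in UTF-16 units

// Wrong: treating the UTF-16 location as a Character offset (the question's approach).
let wrongStart = text.index(text.startIndex, offsetBy: nsRange.location)
print(text[wrongStart])     // )

// Right: let Foundation translate the NSRange into String indices.
if let r = Range(nsRange, in: text) {
    print(text[r])          // x
}
```

Near the end of a string, the same miscount can push `location + length` past `endIndex` and crash, just as the `NSNotFound` case does.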
