Making an exception to fuzzy matching with the new Python regex module

I am testing the new Python regex module, which supports fuzzy string matching, and so far I am impressed with its capabilities. However, I am having trouble making an exception to a fuzzy match. Below is an example. I want ST LOUIS and all variations of ST LOUIS within an edit distance of 1 to match ref. However, I want to make one exception to this rule: the edit cannot be an insertion, to the left of the leftmost character, of one of the letters N, S, E, or W. In the following example, I want inputs 1-3 to match ref and input 4 to fail. However, the following ref matches all four inputs. Does anyone familiar with the new regex module know of a possible workaround?

>>> import regex
>>> input1 = 'ST LOUIS'
>>> input2 = 'AST LOUIS'
>>> input3 = 'ST LOUS'
>>> input4 = 'NST LOUIS'
>>> ref = '([^NSEW]|(?<=^))(ST LOUIS){e<=1}'
>>> match = regex.fullmatch(ref, input1)
>>> match
<_regex.Match object at 0x1006c6030>
>>> match = regex.fullmatch(ref, input2)
>>> match
<_regex.Match object at 0x1006c6120>
>>> match = regex.fullmatch(ref, input3)
>>> match
<_regex.Match object at 0x1006c6030>
>>> match = regex.fullmatch(ref, input4)
>>> match
<_regex.Match object at 0x1006c6120>

      



1 answer


Try using a negative lookahead instead:

(?![NEW]|SS)(ST LOUIS){e<=1}

      



(ST LOUIS){e<=1} matches a string that meets the fuzzy conditions placed on it. You want it not to start with [NSEW]. A negative lookahead, (?![NSEW]), does that for you. But the string you want already starts with S; you only want to exclude strings that have an S added to the beginning. Such a string starts with SS, so SS is what goes into the negative lookahead instead of S.

Note that if you allow more than 1 error, this probably won't work as desired.
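To cross-check a pattern like this, the rule from the question can also be spelled out in plain Python, without the regex module. This is only a minimal sketch, assuming "edit distance of 1" means a single insertion, deletion, or substitution; the names TARGET, within_one_edit, is_forbidden_leading_insertion, and matches_rule are hypothetical helpers, not part of any library.

```python
# Hypothetical helpers for illustration; not part of the regex module.
FORBIDDEN_PREFIX_LETTERS = set("NSEW")  # letters that may not be inserted at the front
TARGET = "ST LOUIS"


def within_one_edit(candidate: str, target: str) -> bool:
    """True if candidate is at most one insertion, deletion, or substitution from target."""
    if candidate == target:
        return True
    len_diff = len(candidate) - len(target)
    if abs(len_diff) > 1:
        return False
    # Skip the common prefix; i ends at the first position where the strings differ.
    i = 0
    while i < min(len(candidate), len(target)) and candidate[i] == target[i]:
        i += 1
    if len_diff == 0:
        # One substitution: everything after the differing character must agree.
        return candidate[i + 1:] == target[i + 1:]
    if len_diff == 1:
        # One extra character in candidate: skip it and compare the rest.
        return candidate[i + 1:] == target[i:]
    # One character missing from candidate: skip it in target and compare the rest.
    return candidate[i:] == target[i + 1:]


def is_forbidden_leading_insertion(candidate: str, target: str) -> bool:
    """True if candidate is target with one of N, S, E, W inserted at the very front."""
    return (
        len(candidate) == len(target) + 1
        and candidate[1:] == target
        and candidate[0] in FORBIDDEN_PREFIX_LETTERS
    )


def matches_rule(candidate: str, target: str = TARGET) -> bool:
    """Accept anything within one edit of target, except a leading N/S/E/W insertion."""
    return within_one_edit(candidate, target) and not is_forbidden_leading_insertion(
        candidate, target
    )


if __name__ == "__main__":
    for text in ["ST LOUIS", "AST LOUIS", "ST LOUS", "NST LOUIS", "SST LOUIS"]:
        print(text, matches_rule(text))
```

One difference worth knowing about: under this strict reading, an input such as ET LOUIS (a substitution of the first letter, not an insertion) is accepted, whereas the lookahead pattern above rejects every input that starts with N, E, or W, whatever kind of edit produced it. Which behavior is wanted depends on how literally the exception in the question is meant.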
