How to make Regex ignore a pattern after a specific group

I posted a question two months ago and got the following regex pattern to capture ICD9 codes. It is expected to capture only ICD9 codes (for example 134.57, V23.54, or E33.62) and to ignore a patient weight such as 134.57 lbs or a laboratory result such as 127.20 mg/dL.

icdRegex = recomp('(V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)(?!\s*(?:kg|lb|mg)s?)')

      

Now there are exceptions. The second part of the regex, the negative lookahead, does not always ignore a number followed by kg, lb, mg, or any other stop word.

I can write some basic regex, but this is getting too complicated for my tiny brain and I need help.


1 answer


(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{1,3}\.\d{1,2})|(?:E\d{1,2}\.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))

      

Try this, and check out the demo:

https://regex101.com/r/pM9yO9/12

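I modified the lookahead to include \d so that partial matches are avoided. Without it, \d{1,2} can backtrack to a single digit and match 134.5 in "134.57 lbs", because the next character (7) is not a unit.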
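To make the backtracking issue concrete, here is a small sketch comparing the pattern from the question with the modified one on a single weight. The names `old`, `new`, and `text` are only illustrative and do not come from the original post.

```python
import re

# Pattern from the question: the lookahead only rejects a unit
# that directly follows the match.
old = re.compile(
    r"(V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)(?!\s*(?:kg|lb|mg)s?)"
)

# Modified pattern: the lookahead also rejects a following digit,
# so the engine cannot shorten the match to dodge the unit check.
new = re.compile(
    r"(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{3}\.\d{1,2})|(?:E\d{1,2}\.\d+))"
    r"(?!\d|(?:\s*(?:kg|lb|mg)s?))"
)

text = "weight 134.57 lbs"

old_hits = old.findall(text)  # \d{1,2} backtracks to one digit
new_hits = new.findall(text)  # both 134.57 and 134.5 are rejected

print(old_hits)
print(new_hits)
```

With the old pattern the weight leaks through as a truncated code; with the new one nothing is returned for this input.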

import re

# Sample text: three ICD9 codes, then a weight and a lab result to be skipped
x = """134.57 or V23.54 or E33.62 134.57 lb or a lab result like 127.20 mg/dL"""
print(re.findall(r"(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{3}\.\d{1,2})|(?:E\d{1,2}\.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))", x))

      



Output: ['134.57', 'V23.54', 'E33.62']

Final version tested against data.

icdRegex = recomp(r"(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{3}\.\d{1,2})|(?:E\d{1,2}\.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))")
codes = findall(icdRegex, hit)

where "hit" will be a clinical note.
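As a rough sketch of how this might look end to end, assuming `recomp` and `findall` are simply aliases for `re.compile` and `re.findall` (the post does not show the imports), the pattern can be applied to a note like this. The note text below is made up for illustration and is not real data.

```python
import re

# Same pattern as the final version, split across two string
# literals for readability.
icd_regex = re.compile(
    r"(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{3}\.\d{1,2})|(?:E\d{1,2}\.\d+))"
    r"(?!\d|(?:\s*(?:kg|lb|mg)s?))"
)

# Hypothetical clinical note, invented for this example only.
hit = "Dx 250.01 and V23.54; weight 134.57 lbs, glucose 127.20 mg/dL."

# Codes are kept; the weight and the lab value are skipped.
codes = icd_regex.findall(hit)
print(codes)
```

Any unit not listed in the lookahead (kg, lb, mg) would still slip through, so the alternation may need extending for other stop words.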
