Regex: Extract and Match Specific Words Between Two Characters

I need to extract from a string the words that match a keyword (way, road, str, or street), together with every word before and after the keyword, up to a comma, a digit, or a preceding "Off".

Line examples:
 1. Yeet Road, Off Mandy Plant Way, Mando GRA.
 2. 3A, Sleek Drive, Off Tremble Rake Street.
 3. 57 Radish Slist Road Ikoyi

The result should be as close to:

  • Yeet Road
  • Mandy Plant Way
  • Tremble Rake Street
  • Radish Slist Road Ikoyi

Based on some Stack Overflow answers, this is what I have so far:
(?<=\,)(.*Way|Road|Str|Street?)(?=\,)

Any help would be appreciated.

2 answers


You can try something like this (with the ignore_case flag):

\b(?:(?!off\b)[a-z]+[^\w,\n]+)*?\b(?:way|road|str(?:eet)?)\b(?:[^\w,\n]+[a-z]+)*
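For anyone who wants to try it, here is a minimal Python sketch that applies the pattern above to the sample lines from the question. It assumes Python's standard `re` module; the names `STREET_RE`, `lines`, and `extract_streets` are only illustrative and not part of the original answer.

```python
import re

# Pattern copied from the answer. IGNORECASE lets [a-z] and the keywords
# match upper-case letters as well.
STREET_RE = re.compile(
    r"\b(?:(?!off\b)[a-z]+[^\w,\n]+)*?\b(?:way|road|str(?:eet)?)\b(?:[^\w,\n]+[a-z]+)*",
    re.IGNORECASE,
)

# Sample lines from the question, with the list numbering removed.
lines = [
    "Yeet Road, Off Mandy Plant Way, Mando GRA.",
    "3A, Sleek Drive, Off Tremble Rake Street.",
    "57 Radish Slist Road Ikoyi",
]


def extract_streets(text):
    """Return every substring of `text` matched by STREET_RE."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return STREET_RE.findall(text)


for line in lines:
    print(extract_streets(line))
# ['Yeet Road', 'Mandy Plant Way']
# ['Tremble Rake Street']
# ['Radish Slist Road Ikoyi']
```

On these three lines the matches stop at commas, skip the word "Off", and leave out the leading house numbers, because digits are not matched by `[a-z]+`.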

      

However, patterns of this kind, which start by describing a substring of unknown content and unknown length before the literal parts of the pattern (the keywords), are inefficient. This does not matter for short strings, but you should avoid them on long ones.

To exclude other words, you can change (?!off\b) to (?!off\b|word1\b|word2\b|...)
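As a rough illustration of that change, the lookahead can be generated from a word list. This is only a sketch in Python: `build_street_pattern` is a made-up helper, and "near" is a hypothetical second excluded word chosen for the example.

```python
import re


def build_street_pattern(excluded_words):
    """Compile the pattern with one lookahead alternative per excluded word.

    Assumes `excluded_words` contains at least one word.
    """
    # re.escape guards against regex metacharacters in the word list;
    # each word is followed by \b, as in the answer's (?!off\b|word1\b|...).
    lookahead = "|".join(re.escape(word) + r"\b" for word in excluded_words)
    return re.compile(
        r"\b(?:(?!" + lookahead + r")[a-z]+[^\w,\n]+)*?"
        r"\b(?:way|road|str(?:eet)?)\b(?:[^\w,\n]+[a-z]+)*",
        re.IGNORECASE,
    )


only_off = build_street_pattern(["off"])
off_and_near = build_street_pattern(["off", "near"])  # "near" is hypothetical

sample = "Near Mandy Plant Way"
print(only_off.findall(sample))      # ['Near Mandy Plant Way']
print(off_and_near.findall(sample))  # ['Mandy Plant Way']
```

Note that the lookahead only guards the words before the keyword; words that follow the keyword are not checked against the list.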

Also, you need to clarify which characters are allowed between words and which are not.

You can use the following (the (*SKIP)(*F) verbs require a PCRE-compatible engine):

^\d+\s*(*SKIP)(*F)|\b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b

      

More details:

  • ^\d+\s*(*SKIP)(*F) - matches and discards one or more leading digits followed by zero or more whitespace characters at the beginning of the line
  • | - or matches ...
  • \b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b - zero or more non-comma characters, then any one of the alternatives in the non-capturing group as a whole word, then zero or more non-comma characters again. The whole subpattern is wrapped in word boundaries so that leading and trailing punctuation and spaces are not matched.
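Python's built-in `re` module does not support the (*SKIP)(*F) verbs, so the pattern cannot be used there as written. The sketch below swaps in a different, commonly used technique that approximates it: match the part to discard in one alternative, capture the wanted part in the other, and keep only the captures. The names `SKIP_OR_KEEP` and `extract_streets_skipping_numbers` are illustrative, and each input is assumed to be a single line so that `^` anchors at its start.

```python
import re

# First alternative: the leading house number to discard (not captured).
# Second alternative: the answer's street subpattern, wrapped in a capture group.
SKIP_OR_KEEP = re.compile(
    r"^\d+\s*|(\b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b)",
    re.IGNORECASE,
)


def extract_streets_skipping_numbers(line):
    """Return the captured street parts of one line, dropping leading digits."""
    # Matches of the first alternative leave group 1 unset (None); skip those.
    return [
        m.group(1)
        for m in SKIP_OR_KEEP.finditer(line)
        if m.group(1) is not None
    ]


for line in [
    "Yeet Road, Off Mandy Plant Way, Mando GRA.",
    "3A, Sleek Drive, Off Tremble Rake Street.",
    "57 Radish Slist Road Ikoyi",
]:
    print(extract_streets_skipping_numbers(line))
# ['Yeet Road', 'Off Mandy Plant Way']
# ['Off Tremble Rake Street']
# ['Radish Slist Road Ikoyi']
```

As the output shows, this subpattern keeps everything between commas, so "Off" stays in the result; it would have to be excluded separately if that matters.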