Regular expression in R: match one term but not another
I am trying to extract records from a data.frame using grepl. Here is some example data.
a <- c('This is a healthcare facility', 'this is a hospital', 'this is a hospital district', 'this is a district health service')
I want to retrieve all records that contain "hospital" but not "district". I get stuck when "district" and "hospital" occur in the same string. I tried the following:
str_match(string=a,pattern='hospital|^district' )
How do I exclude "district" but still include "hospital" in this example?
Thanks.
R supports Perl-compatible regular expressions, which allow negative lookahead assertions, so in principle you can write:
str_match(string=a, pattern='^(?!.*district).*hospital', perl=TRUE)
(which matches the start of the string, provided it is not followed by .*district, and then .*hospital). However, I'm really not sure that cramming this condition into one regex is the best way to do it; there might be a more R-ish approach.
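For what it's worth, here is a minimal sketch of both options in base R, assuming the strings live in a character vector like a from the question. It uses grepl, whose perl = TRUE argument turns on Perl-compatible matching. The names one_regex and two_tests are only illustrative.

```r
a <- c('This is a healthcare facility',
       'this is a hospital',
       'this is a hospital district',
       'this is a district health service')

# Option 1: a single regex. The negative lookahead (?!.*district) fails
# at the start of any string that contains "district" anywhere.
one_regex <- grepl('^(?!.*district).*hospital', a, perl = TRUE)

# Option 2: two plain substring tests combined with logical operators.
# fixed = TRUE treats each pattern as a literal string, not a regex.
two_tests <- grepl('hospital', a, fixed = TRUE) &
  !grepl('district', a, fixed = TRUE)

a[one_regex]
a[two_tests]
```

Both versions select only 'this is a hospital' from the sample vector. The second is probably the more R-ish one: each condition is readable on its own, and the same logical vector can be used to subset the rows of a data.frame.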