Regular expression in R: match one term but not the other

I am trying to extract records from a data.frame using grepl. Here are some examples:

a <- c('This is a healthcare facility', 'this is a hospital', 'this is a hospital district', 'this is a district health service')

I want to retrieve all records that contain "hospital" but not "district". I get stuck when "district" and "hospital" occur in the same string. I tried the following:

str_match(string=a,pattern='hospital|^district' )

How do I exclude "district" but still match "hospital" in this example?
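A quick base-R check may help show why the attempt above does not work. It uses grepl instead of str_match only so that no package is needed. Outside square brackets, ^ is an anchor meaning "start of string", not "not", so the alternation still accepts any string containing "hospital":

```r
a <- c('This is a healthcare facility', 'this is a hospital',
       'this is a hospital district', 'this is a district health service')

# ^district only matches strings that START with "district"; none do here
grepl("^district", a)
# [1] FALSE FALSE FALSE FALSE

# The alternation is satisfied by "hospital" alone, so the district row slips through
grepl("hospital|^district", a)
# [1] FALSE  TRUE  TRUE FALSE
```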




3 answers

R supports Perl-compatible regular expressions, which allow negative lookahead assertions, so in principle you can write:

a[grepl('^(?!.*district).*hospital', a, perl=TRUE)]
# [1] "this is a hospital"


(which matches the start of the string, provided it is not followed by .*district, and then .*hospital). However, I'm really not sure that packing both conditions into one regex is the best way to do it; there might be a more R-ish approach.
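If the lookahead approach is used repeatedly, it could be wrapped in a small function. This is a sketch only: the name has_but_not is hypothetical, and it assumes both terms are plain words with no regex metacharacters (otherwise they would need escaping before being pasted into the pattern):

```r
# Hypothetical helper: TRUE where `x` contains `include` but not `exclude`.
# Assumes `include` and `exclude` contain no regex metacharacters.
has_but_not <- function(x, include, exclude) {
  # Builds e.g. "^(?!.*district).*hospital"; lookahead requires perl = TRUE
  pattern <- paste0("^(?!.*", exclude, ").*", include)
  grepl(pattern, x, perl = TRUE)
}

a <- c('This is a healthcare facility', 'this is a hospital',
       'this is a hospital district', 'this is a district health service')

a[has_but_not(a, "hospital", "district")]
# [1] "this is a hospital"
```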



You need to use & for AND and ! for NOT, with two grepl calls:

grepl("hospital", a) & !grepl("district", a)
# [1] FALSE  TRUE FALSE FALSE
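Since the question is about pulling records out of a data.frame, the same logical vector can be used as a row index. The data frame df and its column name desc below are assumptions made for illustration:

```r
# Hypothetical data frame; the column name `desc` is an assumption
df <- data.frame(
  id = 1:4,
  desc = c('This is a healthcare facility', 'this is a hospital',
           'this is a hospital district', 'this is a district health service'),
  stringsAsFactors = FALSE
)

# TRUE for rows mentioning "hospital" and not "district"
keep <- grepl("hospital", df$desc) & !grepl("district", df$desc)

# Keep the matching rows and all columns
df[keep, ]
```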




You can use two calls to grepl and use the result to subset a:


a[grepl("hospital", a) & !grepl("district", a)]
# [1] "this is a hospital"
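As a sanity check, the single-regex answer and the two-call answer can be compared on the sample vector. The two-call version below adds fixed = TRUE, which treats each term as a literal string; that is a choice made here, not something the answers above require:

```r
a <- c('This is a healthcare facility', 'this is a hospital',
       'this is a hospital district', 'this is a district health service')

# Single regex with a negative lookahead (needs perl = TRUE)
one_regex <- grepl('^(?!.*district).*hospital', a, perl = TRUE)

# Two plain searches; fixed = TRUE skips regex interpretation of the terms
two_calls <- grepl("hospital", a, fixed = TRUE) & !grepl("district", a, fixed = TRUE)

identical(one_regex, two_calls)
# [1] TRUE
a[two_calls]
# [1] "this is a hospital"
```

The two-call form is usually easier to read and extend (for example, adding a third excluded term is just another & !grepl(...)), while the lookahead form is useful when only a single pattern can be passed, such as to a function that accepts one regex.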



