Regular expression in R to include one term and exclude another

I am trying to extract records from a data.frame using grepl. Here is some example data:

a <- c('This is a healthcare facility', 'this is a hospital', 'this is a hospital district', 'this is a district health service')

I want to retrieve all records that contain hospital but not district. I get stuck when district and hospital occur in the same string. I tried the following:

str_match(string=a,pattern='hospital|^district' )

How do I exclude district but still include hospital in this example?

Thanks.

+3




3 answers


R supports Perl-compatible regular expressions, which allow negative lookahead assertions, so in principle you can write:

grepl('^(?!.*district).*hospital', a, perl=TRUE)

      



(which matches "start of string, not followed by .*district, followed by .*hospital"). However, I'm really not sure that putting this condition in a single regex is the best way to do it; there may be a more R-ish way.

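As a minimal sketch of the lookahead approach in base R (assuming the vector a from the question; the name keep is only illustrative), grepl accepts perl = TRUE, so the pattern can be used to build a logical index:

```r
# Sample data from the question
a <- c('This is a healthcare facility', 'this is a hospital',
       'this is a hospital district', 'this is a district health service')

# ^(?!.*district) : at the start, assert that "district" appears nowhere in the string
# .*hospital      : then require "hospital" somewhere in the string
keep <- grepl('^(?!.*district).*hospital', a, perl = TRUE)

# Subset the vector with the logical index
a[keep]
```

Only the second element satisfies both conditions, so it is the only one kept.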
+4




You need to use & for AND and ! for NOT, with two grepl calls:



grepl("hospital", a) & !grepl("district", a)
# [1] FALSE  TRUE FALSE FALSE
a[.Last.value]
# [1] "this is a hospital"

      

+6




You can use two calls to grepl:

a[grepl("hospital", a) & !grepl("district", a)]
# [1] "this is a hospital"
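
Since the question mentions a data.frame, the same logical index can also select rows. A minimal sketch, assuming a hypothetical data frame df with a text column called name (neither name appears in the question):

```r
# Hypothetical data frame; df, id and name are assumptions for illustration
df <- data.frame(
  id = 1:4,
  name = c('This is a healthcare facility', 'this is a hospital',
           'this is a hospital district', 'this is a district health service'),
  stringsAsFactors = FALSE
)

# Keep rows whose name mentions "hospital" but not "district"
result <- df[grepl("hospital", df$name) & !grepl("district", df$name), ]
result
```

Note that grepl is case-sensitive by default; passing ignore.case = TRUE to both calls would also catch spellings such as "Hospital".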

      

+5

