Subset dataframe if $ character exists in row column

Question

I have a data frame with a column time and a column string. I want to subset this data frame so that I keep only the rows where the column string contains the character $ somewhere in it.

After subsetting, I want to clean the column string so that it contains only the characters after the $, up to the next space or symbol.

df <- data.frame("time"=c(1:10),
"string"=c("$ABCD test","test","test $EFG test",
"$500 test","$HI/ hello","test $JK/",
"testing/123","$MOO","$abc","123"))

I want the end result to be:

Time  string  
1     ABCD
3     EFG
4     500
5     HI
6     JK
8     MOO
9     abc

It contains only the rows with a $ in the string column, and keeps only the characters after the $ up to the next space or symbol.

I have had some success with sub just for pulling out the string, but have been unable to apply that to df and subset. Thanks for your help.

regex r dataframe subset

newtoR March 25 17 at 21:52

2 answers

Until someone comes up with a cute regex solution, here's my take:

# subset for $ signs and convert to character class
res <- df[ grepl("$", df$string, fixed = TRUE),]
res$string <- as.character(res$string)

# split on anything that is not a letter, digit, $ or ', keep the piece containing $, then remove the $
res$clean <- sapply(strsplit(res$string, split = "[^a-zA-Z0-9$']", perl = TRUE),
                    function(i){
                      x <- i[grepl("$", i, fixed = TRUE)]
                      # in case when there is more than one $
                      # x <- i[grepl("$", i, fixed = TRUE)][1]
                      gsub("$", "", x, fixed = TRUE)
                    })
res
#   time         string clean
# 1    1     $ABCD test  ABCD
# 3    3 test $EFG test   EFG
# 4    4      $500 test   500
# 5    5     $HI/ hello    HI
# 6    6      test $JK/    JK
# 8    8           $MOO   MOO
# 9    9           $abc   abc

zx8754 March 25 17 at 22:23

akrun · Accepted Answer · 2017-03-26T04:23:44+0000

We can do this with regexpr/regmatches, extracting only the substring that follows the $

i1 <- grep("$", df$string, fixed = TRUE)
transform(df[i1,], string = regmatches(string, regexpr("(?<=[$])\\w+", string, perl = TRUE)))
#    time string
#1    1   ABCD
#3    3    EFG
#4    4    500
#5    5     HI
#6    6     JK
#8    8    MOO
#9    9    abc
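
Since the question mentions sub, here is a minimal sketch of the same idea with a capture group instead of regmatches. It assumes the sample data from the question (rebuilt here with stringsAsFactors = FALSE so that string is a character column) and keeps only rows where the $ is directly followed by a word character; the name res is just for illustration.

```r
# Sample data from the question
df <- data.frame(time = 1:10,
                 string = c("$ABCD test", "test", "test $EFG test",
                            "$500 test", "$HI/ hello", "test $JK/",
                            "testing/123", "$MOO", "$abc", "123"),
                 stringsAsFactors = FALSE)

# Keep only rows where a $ is directly followed by a word character,
# so that sub() always has something to capture
res <- df[grepl("[$]\\w", df$string), ]

# Replace the whole string with the run of word characters after the first $
res$string <- sub(".*?[$](\\w+).*", "\\1", res$string, perl = TRUE)
res
```

One reason to filter on "[$]\\w" rather than on a bare $ is that a row such as "$ test" has no word characters after the $; regmatches would silently drop it and the column lengths would no longer line up.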

Or with tidyverse syntax

library(tidyverse)
df %>% 
   filter(str_detect(string, fixed("$")))  %>%
   mutate(string = str_extract(string, "(?<=[$])\\w+"))
