Subset a data frame to the rows where a column contains the $ character
I have a data frame df with a column time and a column string. I want to subset this data frame so that it only keeps the rows where the column string contains a $ character somewhere in it.
After subsetting, I want to clean the column string so that it only contains the characters after the $, up to the next space or symbol.
df <- data.frame("time"=c(1:10),
"string"=c("$ABCD test","test","test $EFG test",
"$500 test","$HI/ hello","test $JK/",
"testing/123","$MOO","$abc","123"))
I want the end result to be:
time string
1 ABCD
3 EFG
4 500
5 HI
6 JK
8 MOO
9 abc
In other words, the result keeps only the rows with a $ in the string column, and then keeps only the characters after the $, up to the next space or symbol.
I have had some success with sub for pulling out the part of the string I need, but I have been unable to apply that to df and combine it with the subset. Thanks for your help.
We can do this by first finding the rows that contain a $ with grep, and then using regexpr/regmatches to extract only the substring that follows the $:
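# indices of the rows whose string contains a literal "$"
i1 <- grep("$", df$string, fixed = TRUE)
# keep those rows and replace string with the word characters that follow "$"
transform(df[i1,], string = regmatches(string, regexpr("(?<=[$])\\w+", string, perl = TRUE)))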
# time string
#1 1 ABCD
#3 3 EFG
#4 4 500
#5 5 HI
#6 6 JK
#8 8 MOO
#9 9 abc
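The fixed = TRUE in the grep call matters: in a regular expression an unescaped $ is the end-of-string anchor, so it matches every string. A small sketch on a made-up vector x compares the three variants; escaping the pattern as "\\$" is an equivalent alternative to fixed = TRUE:

```r
x <- c("$ABCD test", "test", "123")

# Unescaped "$" is a regex end-of-string anchor, so every string matches
anchor_hits <- grepl("$", x)

# fixed = TRUE treats "$" as a literal dollar sign
literal_hits <- grepl("$", x, fixed = TRUE)

# Escaping the dollar sign in the regex gives the same result as fixed = TRUE
escaped_hits <- grepl("\\$", x)

anchor_hits   # TRUE TRUE TRUE
literal_hits  # TRUE FALSE FALSE
escaped_hits  # TRUE FALSE FALSE
```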
Or with tidyverse syntax:
library(tidyverse)
df %>%
filter(str_detect(string, fixed("$"))) %>%
mutate(string = str_extract(string, "(?<=[$])\\w+"))
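Since the question mentions sub, here is a minimal base-R sketch of combining sub with the subset. The pattern ^[^$]*\\$(\\w+).*$ is an illustrative choice (not necessarily what the asker tried): it captures the run of word characters after the first $ and replaces the whole string with that capture. stringsAsFactors = FALSE is set so string stays character on R versions before 4.0.0 as well:

```r
# Same example data as in the question, with string kept as character
df <- data.frame(time = 1:10,
                 string = c("$ABCD test", "test", "test $EFG test",
                            "$500 test", "$HI/ hello", "test $JK/",
                            "testing/123", "$MOO", "$abc", "123"),
                 stringsAsFactors = FALSE)

# Keep only rows whose string contains a literal "$"
res <- df[grepl("$", df$string, fixed = TRUE), ]

# Capture the word characters right after the first "$" and
# replace the whole string with that capture group
res$string <- sub("^[^$]*\\$(\\w+).*$", "\\1", res$string)
res
```

Because \\w only matches letters, digits and underscore, the capture stops at the first space or symbol such as /.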
Until someone comes up with a cute regex solution, here's my take:
# keep rows containing a literal $ and make sure string is character (it is a factor by default before R 4.0.0)
res <- df[ grepl("$", df$string, fixed = TRUE),]
res$string <- as.character(res$string)
# split on anything that is not alphanumeric or $, keep the piece containing $, then drop the $
res$clean <- sapply(strsplit(res$string, split = "[^a-zA-Z0-9$']", perl = TRUE),
function(i){
x <- i[grepl("$", i, fixed = TRUE)]
# if a string can contain more than one $, keep only the first match:
# x <- i[grepl("$", i, fixed = TRUE)][1]
gsub("$", "", x, fixed = TRUE)
})
res
# time string clean
# 1 1 $ABCD test ABCD
# 3 3 test $EFG test EFG
# 4 4 $500 test 500
# 5 5 $HI/ hello HI
# 6 6 test $JK/ JK
# 8 8 $MOO MOO
# 9 9 $abc abc
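The commented-out line above handles strings with more than one $ by keeping only the first match. A minimal alternative sketch, reusing the lookbehind pattern from the first answer, collects every $-prefixed word with gregexpr and then reduces each element to its first token (NA when there is none). The vector x is made up for illustration:

```r
x <- c("$ABCD test $XY", "test $EFG test", "no dollar here")

# gregexpr finds every "$"-prefixed word, not just the first one;
# strings without a match give character(0)
tokens <- regmatches(x, gregexpr("(?<=[$])\\w+", x, perl = TRUE))

# One value per string: the first token, or NA when there is none
first <- vapply(tokens,
                function(t) if (length(t) > 0) t[1] else NA_character_,
                character(1))
first
```

Returning NA instead of character(0) keeps the result the same length as the input, so it can be assigned back to a data frame column directly.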