R - remove anything after a comma from a column
I would like to strip this column so that it shows only the last name: if there is a comma, I would like to remove the comma and anything after it. The column mixes entries that are only a last name with entries in "Last, First" form. The data looks like this:
Last Name
Sample, A
Tester
Wilfred, Nancy
Day, Bobby Jean
Morris
+3
user3922483
5 answers
You can use gsub() with a capturing regular expression:
> x <- 'Day, Bobby Jean'
> gsub("(.*),.*", "\\1", x)
[1] "Day"
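Because `.*` is greedy, the capture group `(.*)` keeps everything up to the *last* comma. That makes no difference for the question's data, but a minimal sketch with one hypothetical multi-comma entry shows how it differs from a pattern that stops at the first comma:

```r
# "Smith, Jr., John" is a hypothetical entry, not part of the question's data
x <- c("Day, Bobby Jean", "Tester", "Smith, Jr., John")

# Greedy capture: keeps everything before the LAST comma
greedy <- gsub("(.*),.*", "\\1", x)

# [^,]* cannot cross a comma, so this keeps everything before the FIRST comma
first_only <- sub("^([^,]*),.*", "\\1", x)

print(greedy)      # "Day"    "Tester" "Smith, Jr."
print(first_only)  # "Day"    "Tester" "Smith"
```

Entries without a comma, such as "Tester", match neither pattern and are returned unchanged.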
+6
EDi
You can use gsub:
gsub(",.*", "", c("last only", "last, first"))
# [1] "last only" "last"
The pattern ",.*" means: replace the comma (,) and every character after it (.*) with nothing ("").
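Since gsub() and sub() are vectorised, the same call can overwrite the column in place. A minimal sketch, assuming the data sit in a data frame called `df` (a hypothetical name) whose column is literally named `Last Name`; sub() is enough here because `.*` consumes the rest of the string in a single match:

```r
# Hypothetical data frame reproducing the question's column
df <- data.frame(
  `Last Name` = c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris"),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# Replace the first comma and everything after it with ""
df[["Last Name"]] <- sub(",.*", "", df[["Last Name"]])
print(df)
```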
+3
martin
str1 <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")
library(stringr)
# perl() is deprecated in stringr; a plain pattern (ICU regex) supports the lookahead
str_extract(str1, "[A-Za-z]+(?=,|\\b)")
#[1] "Sample" "Tester" "Wilfred" "Day" "Morris"
Match a run of letters ([A-Za-z]+) and extract the first run that is followed by a comma or a word boundary.
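The same lookahead also works without stringr. A sketch of a base-R equivalent, using regexpr() with `perl = TRUE` to locate the match and regmatches() to pull out the text; note that regmatches() silently drops elements with no match, so the result could be shorter than the input if an entry contained no letters:

```r
str1 <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")

# Position and length of the first run of letters followed by a comma or word boundary
m <- regexpr("[A-Za-z]+(?=,|\\b)", str1, perl = TRUE)

# Extract the matched substrings
last_names <- regmatches(str1, m)
print(last_names)
```

Because the pattern only matches `[A-Za-z]`, a surname containing an apostrophe, hyphen, or accented letter (for example "O'Brien") would be truncated; the comma-based gsub() patterns do not have that limitation.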
0
akrun
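This will work when reading the data from a CSV file:
library(stringr)
a <- read.delim("C:\\Desktop\\a.csv", row.names = NULL, header = TRUE,
                stringsAsFactors = FALSE, sep = ",")
a <- as.matrix(a)
# Remove the comma and everything after it; use "" (not " ") so no trailing space remains
Data <- str_replace_all(string = a, pattern = ",.*$", replacement = "")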
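If only the name column needs changing, converting the whole data frame to a matrix is not necessary. A minimal sketch with base R's sub(), assuming a one-column file whose header is `Last Name` and whose comma-containing names are quoted (as a CSV writer would normally produce); the file contents are hypothetical and inlined via `read.csv(text = ...)` so the example runs on its own:

```r
# Hypothetical CSV contents; names containing a comma must be quoted
csv_text <- 'Last Name
"Sample, A"
Tester
"Wilfred, Nancy"
"Day, Bobby Jean"
Morris'

a <- read.csv(text = csv_text, stringsAsFactors = FALSE, check.names = FALSE)

# Strip the comma and everything after it in that one column only
a[["Last Name"]] <- sub(",.*$", "", a[["Last Name"]])
print(a)
```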
0
user3619015
Also try strsplit:
string <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")
sapply(strsplit(string, ","), "[", 1)
#[1] "Sample" "Tester" "Wilfred" "Day" "Morris"
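For a zero-length input, sapply() returns an empty list rather than a character vector; vapply() pins the output type. A sketch of the same split using vapply() and `fixed = TRUE`, which treats the comma as a literal character instead of a regular expression:

```r
string <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")

# Split each entry at the literal comma
pieces <- strsplit(string, ",", fixed = TRUE)

# Keep the first piece of each split; character(1) guarantees a character vector
first_piece <- vapply(pieces, `[`, character(1), 1)
print(first_piece)
```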
0
useR