R - remove anything after a comma from a column

I have a data column that contains a mix of bare last names and "Last, First" names. I would like to strip it so that it shows only the last name: if there is a comma, I would like to remove the comma and everything after it. The data looks like this:

Last Name  
Sample, A  
Tester  
Wilfred, Nancy  
Day, Bobby Jean  
Morris  

      



5 answers


You can use gsub() with a regular expression that uses a capture group:



> x <- 'Day, Bobby Jean'
> gsub("(.*),.*", "\\1", x)
[1] "Day"
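Because .* is greedy, the capture group (.*) grabs as much text as it can, so when a value has more than one comma this pattern keeps everything before the last comma rather than the first. A small sketch comparing it with sub(",.*", "", x) on a made-up value (the string "Smith, John, Jr." is hypothetical, not from the question's data):

```r
x <- "Smith, John, Jr."  # hypothetical value containing two commas

# Greedy (.*) backtracks only as far as the LAST comma
greedy <- gsub("(.*),.*", "\\1", x)

# Deleting from the FIRST comma onward keeps only the last name
first_part <- sub(",.*", "", x)

print(greedy)      # "Smith, John"
print(first_part)  # "Smith"
```

For the question's data (at most one comma per value) both patterns give the same result.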

      



You can use gsub:

gsub(",.*", "", c("last only", "last, first"))
# [1] "last only" "last"

      



The pattern ",.*" says: replace the comma (,) and every character after it (.*) with nothing ("").
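Applied to a whole column, the same pattern works directly on the data frame. A minimal sketch, assuming the data sits in a data frame with a column named LastName (both names are hypothetical); it uses sub(), since only the first comma matters:

```r
# Hypothetical data frame mirroring the question's column
people <- data.frame(
  LastName = c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris"),
  stringsAsFactors = FALSE
)

# sub() replaces only the first match; values without a comma
# have no match and are returned unchanged
people$LastName <- sub(",.*", "", people$LastName)

print(people$LastName)
```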



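library(stringr)
str1 <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")
# regex() replaces stringr's deprecated perl() wrapper; the default ICU engine supports lookaheads
str_extract(str1, regex("[A-Za-z]+(?=(,|\\b))"))
#[1] "Sample"  "Tester"  "Wilfred" "Day"     "Morris"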

      

Match a run of letters ([A-Za-z]+) and extract the first one that is followed by a comma or a word boundary.
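Since any run of letters is followed by a word boundary, this effectively returns the first word, so last names containing a hyphen or an apostrophe get cut short. A base-R sketch contrasting a letters-only pattern with one that keeps everything up to the first comma (the names below are invented for illustration):

```r
x <- c("Smith-Jones, Ann", "O'Brien", "Day, Bobby Jean")  # hypothetical names

# Letters-only pattern: stops at the hyphen or apostrophe
letters_only <- regmatches(x, regexpr("[A-Za-z]+", x))

# Everything from the start of the string up to (not including) the first comma
up_to_comma <- regmatches(x, regexpr("^[^,]*", x))

print(letters_only)  # "Smith" "O" "Day"
print(up_to_comma)   # "Smith-Jones" "O'Brien" "Day"
```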



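This will work, reading the file with read.csv() and stripping with str_replace_all() from stringr:

library(stringr)
a <- read.csv("C:\\Desktop\\a.csv", row.names = NULL, header = TRUE,
              stringsAsFactors = FALSE)
a <- as.matrix(a)
# Remove the comma and everything after it; "" (not " ") avoids leaving a trailing space
Data <- str_replace_all(string = a, pattern = ",.*$", replacement = "")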
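If the names themselves contain commas, a comma-separated file only keeps "Sample, A" in a single field when that field is quoted. A self-contained sketch that stands in an inline string for the file (the column name LastName and the quoting are assumptions about the file's layout) and strips just that column:

```r
# Inline stand-in for the CSV file; quoted fields keep their internal commas
csv_text <- 'LastName
"Sample, A"
Tester
"Wilfred, Nancy"
"Day, Bobby Jean"
Morris'

roster <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Strip only the name column instead of every cell of a matrix
roster$LastName <- sub(",.*$", "", roster$LastName)

print(roster$LastName)
```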

      



You can also try strsplit():

string <- c("Sample, A", "Tester", "Wilfred, Nancy", "Day, Bobby Jean", "Morris")

sapply(strsplit(string, ","), "[", 1)
#[1] "Sample"  "Tester"  "Wilfred" "Day"     "Morris"
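A slightly more defensive variant, sketched under the assumption that the column may contain empty strings: vapply() guarantees a character vector of the same length as the input, and fixed = TRUE splits on a literal comma rather than a regex. Note that an empty string comes back as NA here, whereas sub(",.*", "", x) would return "":

```r
x <- c("Sample, A", "", "Morris")  # hypothetical input including an empty string

# strsplit("") yields character(0), and indexing that with [1] gives NA
first_field <- vapply(strsplit(x, ",", fixed = TRUE), `[`, character(1), 1)

print(first_field)  # "Sample" NA "Morris"
```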

      
