Assigning strsplit results to multiple columns of a dataframe

I am trying to split a character vector into three different vectors inside a dataframe.

My data:

> df <- data.frame(filename = c("Author1 (2010) Title of paper", 
                                "Author2 et al (2009) Title of paper",
                                "Author3 & Author4 (2004) Title of paper"),
                   stringsAsFactors = FALSE)

      

I would like to split these three pieces of information (authors, year, title) into three separate columns, so that the result is:

> df
                          filename             author  year   title
 1           Author1 (2010) Title1            Author1  2010  Title1
 2     Author2 et al (2009) Title2      Author2 et al  2009  Title2
 3 Author3 & Author4 (2004) Title3  Author3 & Author4  2004  Title3

      

I used strsplit to split each filename into a vector of 3 elements:

 df$temp <- strsplit(df$filename, " \\(|\\) ")

      

But now I cannot find a way to put each item in a separate column. I can access an individual element:

> df$temp[[2]][1]
[1] "Author2 et al"

      

but I can't find how to put each element into its own column:

> df$author <- df$temp[[]][1]
Error

      



4 answers


With the tidyr package, use separate:

library(tidyr)
separate(df, "filename", c("Author","Year","Title"), sep=" \\(|\\) ", remove=FALSE)
#                                  filename            Author
# 1           Author1 (2010) Title of paper           Author1
# 2     Author2 et al (2009) Title of paper     Author2 et al
# 3 Author3 & Author4 (2004) Title of paper Author3 & Author4
#   Year          Title
# 1 2010 Title of paper
# 2 2009 Title of paper
# 3 2004 Title of paper

      



The separator pattern includes the spaces around the parentheses, so the resulting pieces have no leading or trailing spaces.



You can try tstrsplit from the devel version of data.table:

library(data.table) # v1.9.5+
setDT(df)[, c('author', 'year', 'title') := tstrsplit(filename, ' \\(|\\) ')]
df
#                                  filename             author year
#1:           Author1 (2010) Title of paper           Author1  2010
#2:     Author2 et al (2009) Title of paper     Author2 et al  2009
#3: Author3 & Author4 (2004) Title of paper Author3 & Author4  2004
#             title
#1:  Title of paper
#2:  Title of paper
#3:  Title of paper

      



Edit: Used the OP's split pattern so that the surrounding spaces are removed.



result <- cbind(df, do.call("rbind", strsplit(df$filename, " \\(|\\) ")))
colnames(result)[2:4] <- c("author", "year", "title")
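
As a complement to the rbind approach above, here is a minimal base R sketch that assigns each piece directly to a column of df. The attempt df$temp[[]][1] in the question fails because [[ needs a single index; applying `[` to every list element with sapply is one way around that. The intermediate name parts and the conversion of year to integer are choices made for this illustration, not part of the original answer.

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

# One character vector of three pieces per filename
parts <- strsplit(df$filename, " \\(|\\) ")

# Extract the i-th piece from every list element and store it as a column
df$author <- sapply(parts, `[`, 1)
df$year   <- as.integer(sapply(parts, `[`, 2))  # assumption: year is wanted as integer
df$title  <- sapply(parts, `[`, 3)
```

This assumes every filename splits into exactly three pieces; a filename with a different shape would produce NA or misplaced values.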

      



There is a base R t method (transpose) for data frames:

res <- t(data.frame(strsplit(df$filename, " \\(|\\) ")))
colnames(res) <- c("author", "year", "title")
rownames(res) <- seq_along(rownames(res))
res
#--------------
  author              year   title           
1 "Author1"           "2010" "Title of paper"
2 "Author2 et al"     "2009" "Title of paper"
3 "Author3 & Author4" "2004" "Title of paper"
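
Note that res above is a character matrix, not part of df. A possible follow-up, sketched here under the assumption that the columns should end up next to filename, is to convert the matrix to a data frame and bind it on. The name out and the integer conversion of year are illustrative choices.

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

# Transpose so that each filename becomes one row of a character matrix
res <- t(data.frame(strsplit(df$filename, " \\(|\\) ")))
colnames(res) <- c("author", "year", "title")
rownames(res) <- NULL

# Attach the pieces to the original data frame
out <- cbind(df, as.data.frame(res, stringsAsFactors = FALSE))
out$year <- as.integer(out$year)  # assumption: year is wanted as integer
```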

      
