Split column of rows into variable number of columns using data.table in R

Question

Split column of rows into variable number of columns using data.table in R

I want to analyze many years of income records in a Quicken house. I exported the file to qif and used the bank2csv program to render the csv. Within Quicken you can use category (eg car, tax), subcategories (eg car: service, car: fuel) and tags (eg self, spouse, son). bank2csv displays categories: subcategories / tag as a concatenated string. I want to put a category in a category column instead, a subcategory in a subcategory column, and put tags in a tag column. I saw a similar question , but alas, strsplit

then it workedunlist

and then indexed each element so that it can be written to the correct location by assignment. It won't work as sometimes there is no tag and sometimes there is no subcategory. It is easy to split the string into a list and store that list in a column, but how does one actually assign the first element of the list to one column and the second element (if it exists) to the second column. There is of course an elegant way.

simplified sample

library(data.table)
library(stringi)
dt <- data.table(category.tag=c("toys/David", "toys/David", "toys/James", "toys", "toys", "toys/James"), transaction=1:6)

How to create the third and fourth columns: category, tag. Some tag would beNA

I can do the following, but it doesn't take me very far. I need a way to specify the first or second element of the resulting list (as opposed to the whole list)

dt[, category:= strsplit(x = category.tag, split = "/") ]

+3

string r data.table

Farrel 24 nov. 14 at 23:02

source to share

4 answers

you can use cSplit

library(splitstackshape)
dt[, c("category", "tag") := cSplit(dt[,.(category.tag)], "category.tag", "/")]
dt
#    category.tag transaction category   tag
# 1:   toys/David           1     toys David
# 2:   toys/David           2     toys David
# 3:   toys/James           3     toys James
# 4:         toys           4     toys    NA
# 5:         toys           5     toys    NA
# 6:   toys/James           6     toys James

+4

Rich scriven 24 nov. At 23:10

source to share

1) Tryread.table

read <- function(x) read.table(text = x, sep  = "/", fill = TRUE, na.strings = "")
dt[, c("category", "tag") := read(category.tag)]

No additional packages are required.

2) An alternative is to use separate

in the tidyr package:

library(tidyr)
separate(dt, category.tag, c("category", "tag"), extra = "drop")

The above tidyr version is 0.1.0.9000 from github. To install it, make sure the package is installed devtools R and instructs: devtools::install_github("hadley/tidyr")

.

Update: Comments and minor improvements made by Richard Scrivens. Added tidyr solution.

+2

G. Grothendieck 24 nov. 14 at 23:14

source to share

Since you already have "stringi", you can also look at the argument stri_split_fixed

and simplify

:

setnames(cbind(dt, stri_split_fixed(dt$category.tag, "/", simplify = TRUE)), 
         c("V1", "V2"), c("category", "tag"))[]
#    category.tag transaction category   tag
# 1:   toys/David           1     toys David
# 2:   toys/David           2     toys David
# 3:   toys/James           3     toys James
# 4:         toys           4     toys    NA
# 5:         toys           5     toys    NA
# 6:   toys/James           6     toys James

Although I must admit that I am partial to cSplit

: -)

+1

A5C1D2H2I1M1N2O1R2T1 25 nov. '14 at 1:22

source to share

Arun · Accepted Answer · 2015-01-27T19:24:27+0000

Just pushed two functions transpose()

and tstrsplit()

in data.table v1.9.5.

With this we can do:

require(data.table)
dt[, c("category", "tag") := tstrsplit(category.tag, "/", fixed=TRUE)]
#    category.tag transaction category   tag
# 1:   toys/David           1     toys David
# 2:   toys/David           2     toys David
# 3:   toys/James           3     toys James
# 4:         toys           4     toys    NA
# 5:         toys           5     toys    NA
# 6:   toys/James           6     toys James

tstrsplit

is a wrapper for transpose(strsplit(as.character(x), ...))

. And you can also pass fill=.

to fill in missing values with any other value than NA

.

transpose()

can also be used for lists, data frames and data tables.

Split column of rows into variable number of columns using data.table in R

More articles: