How to split one column into multiple columns by other variables in R

Question

How to split one column into multiple columns by other variables in R

I have a dataset

data
    Choice    Length Gender
 1       I subadults      M
 2       F subadults      M
 3       F subadults      M
 4       F subadults      M
 5       I subadults      M
 6       F subadults      M
 7       I subadults      M
 8       F subadults      M
 9       I subadults      M
 10      I subadults      M
 11      I subadults      M
 12      O subadults      M
 13      O subadults      M
 14      I subadults      M
 15      F subadults      M
 16      F subadults      M
 17      I subadults      M
 18      O subadults      M
 19      F subadults      M
 20      O subadults      M
 21      F subadults      M
 22      F    adults      M
 23      I    adults      M
 24      F    adults      M
 25      I    adults      M
 26      F    adults      M
 27      F    adults      M
 28      F    adults      M
 29      F    adults      M
 30      F    adults      M
 31      O    adults      M
 32      O    adults      M
 33      F    adults      F
 34      F    adults      F
 35      F    adults      F
 36      F    adults      F
 37      O    adults      F
 38      F    adults      F
 39      F    adults      F
 40      I subadults      F
 41      I subadults      F
 42      I subadults      F
 43      O subadults      F
 44      I subadults      F
 45      I subadults      F
 46      I subadults      F
 47      F subadults      F
 48      I subadults      F
 49      O subadults      F
 50      I subadults      F
 51      I    adults      F
 52      F    adults      F
 53      F    adults      F
 54      F    adults      F
 55      F    adults      F

Now I want to split the Selection column by three, so the dataset will look like this:

  F  I  O  Length    Gender
  1  0 20  subadults   F
  0  10 0  adults      F
  12 0  11  subadults  M
  0  10 0  adults      M

where F, I, O is the sum of length and floor.

I couldn't find an R command for this. Can anyone help me? Thank you very much! Yang

+3

r reshape

Yanyan May 07 '15 at 4:31

source to share

3 answers

There are two approaches to consideration in the R base: ftable

and aggregate

.

Here ftable

:

> ftable(mydf, col.vars = "Choice")
                 Choice  F  I  O
Length    Gender                
adults    F             10  1  1
          M              7  2  2
subadults F              1  8  2
          M              9  8  4

Here aggregate

:

> aggregate(Choice ~ Length + Gender, mydf, table)
     Length Gender Choice.F Choice.I Choice.O
1    adults      F       10        1        1
2 subadults      F        1        8        2
3    adults      M        7        2        2
4 subadults      M        9        8        4

Using "data.table" you can also try this:

as.data.table(mydf)[, as.list(table(Choice)), by = list(Length, Gender)]
#       Length Gender  F I O
# 1: subadults      M  9 8 4
# 2:    adults      M  7 2 2
# 3:    adults      F 10 1 1
# 4: subadults      F  1 8 2

However, there dcast.data.table

would be a more common approach:

dcast.data.table(as.data.table(mydf), Length + Gender ~ Choice, value.var = "Choice")

With "dplyr" and "tidyr" you can try:

library(dplyr)
library(tidyr)

mydf %>%
  group_by(Length, Gender, Choice) %>%
  summarise(Count = n()) %>%
  spread(Choice, Count)
# Source: local data frame [4 x 5]
# 
#      Length Gender  F I O
# 1    adults      F 10 1 1
# 2    adults      M  7 2 2
# 3 subadults      F  1 8 2
# 4 subadults      M  9 8 4

+3

A5C1D2H2I1M1N2O1R2T1 May 07 '15 at 7:54

source to share

You may try:

 reshape(as.data.frame(table(df)),
         idvar=c("Length","Gender"),
         timevar="Choice",direction="wide")
 #      Length Gender Freq.F Freq.I Freq.O
 #1     adults      F     10      1      1
 #4  subadults      F      1      8      2
 #7     adults      M      7      2      2
 #10 subadults      M      9      8      4

The function table

gives you the number of occurrences of each Choice

, Gender

and Length

as a multidimensional array. Then you coerce data.frame

with 4 columns (three above plus a named column Freq

that indicates the number of cases of each case) and then you modify the result as you wish.

Edit

Now I understand that I am not receiving your values. Here I have counted the number of cases of each case. Are your values correct? If so, how do you extract these values?

+1

nicola May 07 '15 at 4:52

source to share

vagabond · Accepted Answer · 2015-05-07T04:55:57+0000

try:

require(reshape2)
data <- data.frame(choice = c('I', 'F', 'I', 'O', 'F', 'O'), 
                   length = c('subadults', 'subadults', 'subadults', 'adults', 'adults', 'adults'),
                   gender = c('M', 'M', 'F', 'F', 'M', 'F'))

melt_data = melt(data, value.name = "value", id.vars = c("length", "gender"))

dcast(melt_data, gender+length ~ value)

  gender    length F I O
1      F    adults 0 0 2
2      F subadults 0 1 0
3      M    adults 1 0 0
4      M subadults 1 1 0

How to split one column into multiple columns by other variables in R

More articles: