How to split one column into multiple columns by other variables in R
I have a dataset
data
Choice Length Gender
1 I subadults M
2 F subadults M
3 F subadults M
4 F subadults M
5 I subadults M
6 F subadults M
7 I subadults M
8 F subadults M
9 I subadults M
10 I subadults M
11 I subadults M
12 O subadults M
13 O subadults M
14 I subadults M
15 F subadults M
16 F subadults M
17 I subadults M
18 O subadults M
19 F subadults M
20 O subadults M
21 F subadults M
22 F adults M
23 I adults M
24 F adults M
25 I adults M
26 F adults M
27 F adults M
28 F adults M
29 F adults M
30 F adults M
31 O adults M
32 O adults M
33 F adults F
34 F adults F
35 F adults F
36 F adults F
37 O adults F
38 F adults F
39 F adults F
40 I subadults F
41 I subadults F
42 I subadults F
43 O subadults F
44 I subadults F
45 I subadults F
46 I subadults F
47 F subadults F
48 I subadults F
49 O subadults F
50 I subadults F
51 I adults F
52 F adults F
53 F adults F
54 F adults F
55 F adults F
Now I want to split the Selection column by three, so the dataset will look like this:
F I O Length Gender
1 0 20 subadults F
0 10 0 adults F
12 0 11 subadults M
0 10 0 adults M
where F, I, O is the sum of length and floor.
I couldn't find an R command for this. Can anyone help me? Thank you very much! Yang
source to share
try:
require(reshape2)
data <- data.frame(choice = c('I', 'F', 'I', 'O', 'F', 'O'),
length = c('subadults', 'subadults', 'subadults', 'adults', 'adults', 'adults'),
gender = c('M', 'M', 'F', 'F', 'M', 'F'))
melt_data = melt(data, value.name = "value", id.vars = c("length", "gender"))
dcast(melt_data, gender+length ~ value)
gender length F I O
1 F adults 0 0 2
2 F subadults 0 1 0
3 M adults 1 0 0
4 M subadults 1 1 0
source to share
There are two approaches to consideration in the R base: ftable
and aggregate
.
Here ftable
:
> ftable(mydf, col.vars = "Choice")
Choice F I O
Length Gender
adults F 10 1 1
M 7 2 2
subadults F 1 8 2
M 9 8 4
Here aggregate
:
> aggregate(Choice ~ Length + Gender, mydf, table)
Length Gender Choice.F Choice.I Choice.O
1 adults F 10 1 1
2 subadults F 1 8 2
3 adults M 7 2 2
4 subadults M 9 8 4
Using "data.table" you can also try this:
as.data.table(mydf)[, as.list(table(Choice)), by = list(Length, Gender)]
# Length Gender F I O
# 1: subadults M 9 8 4
# 2: adults M 7 2 2
# 3: adults F 10 1 1
# 4: subadults F 1 8 2
However, there dcast.data.table
would be a more common approach:
dcast.data.table(as.data.table(mydf), Length + Gender ~ Choice, value.var = "Choice")
With "dplyr" and "tidyr" you can try:
library(dplyr)
library(tidyr)
mydf %>%
group_by(Length, Gender, Choice) %>%
summarise(Count = n()) %>%
spread(Choice, Count)
# Source: local data frame [4 x 5]
#
# Length Gender F I O
# 1 adults F 10 1 1
# 2 adults M 7 2 2
# 3 subadults F 1 8 2
# 4 subadults M 9 8 4
source to share
You may try:
reshape(as.data.frame(table(df)),
idvar=c("Length","Gender"),
timevar="Choice",direction="wide")
# Length Gender Freq.F Freq.I Freq.O
#1 adults F 10 1 1
#4 subadults F 1 8 2
#7 adults M 7 2 2
#10 subadults M 9 8 4
The function table
gives you the number of occurrences of each Choice
, Gender
and Length
as a multidimensional array. Then you coerce data.frame
with 4 columns (three above plus a named column Freq
that indicates the number of cases of each case) and then you modify the result as you wish.
Edit
Now I understand that I am not receiving your values. Here I have counted the number of cases of each case. Are your values โโcorrect? If so, how do you extract these values?
source to share