How to make a column with multiple rows?
I am new to R and programming myself and now I have a problem with my data.frame that prevents me from continuing my work.
I have a dataset as follows
Table 1
Individual Score
Tim 45
Tim 77
Tim 32
Clare 92
Clare 70
Clare 88
Let me explain the above table 1, I have several people (TIm and Clare in the example above) and I have their score in the test they presented on three different occasions (2009, 2010, 2011). I am trying to turn this above into something like this:
Table2
Individual Score09 Score10 Score11
Tim 45 77 32
Clare 92 70 88
I used ddply to get table 1, since initially I had information about subsets of the test (the variable score is just the sum of the entire subset)
Please let me know if there is a way to actually get Table 2 instead of Table 1, since I have over 10,000 observations and Table 1 is set up to prevent me from moving forward with the intended proposal.
EDIT:
The original df from which table 1 was generated:
The data frame looks like this
Base Individual score_math score_bio score_chem
SB1120091 Tim 12 23 10
SB1120092 Tim 30 25 22
SB1120101 Tim 17 5 10
SB1120091 Clare 50 20 22
SB1120092 Clare 40 10 20
SB1120101 Clare 47 20 21
And the code was:
>Table1 <-ddply(x, .(Indivual), summarise, Score=(score_math*score_bio*score_chem))
EDIT2:
The original dataset doesn't have a Year variable, but a base variable that provides information about when the test was run.
Also, the Score variable is calculated as the product of all subsets.
source to share
Data:
df <- structure(list(Individual = structure(c(2L, 2L, 2L, 1L, 1L, 1L),
.Label = c("Clare", "Tim"), class = "factor"),
Score = c(45, 77, 32, 92, 70, 88),
count = c(1L, 2L, 3L, 1L, 2L, 3L)),
.Names = c("Individual", "Score", "count"),
row.names = c(NA, -6L), class = "data.frame")
df$count <- rep(c("09", "10", "11"), 2)
-
Using
reshape
from base stat:> reshape(df, idvar="Individual", timevar="count", direction="wide", sep="") # Individual Score09 Score10 Score11 # 1 Tim 45 77 32 # 4 Clare 92 70 88
source to share
Your call ddply
splits the function separately, which results in a separate data frame for each individual user and calculating the sum for each data frame separately. The dataset has multiple rows for each user, so this amount has one total for each row. It then concatenates the data together and by default gives one row in the results for each initial row. But you need one row for each person; if we just wrap the result, it turns it into a single row matrix, which leads to the desired behavior.
Using the data provided:
x <- read.table(text="Year Individual score_math score_bio score_chem
2009 Tim 12 23 10
2010 Tim 30 25 22
2011 Tim 17 5 10
2009 Clare 50 20 22
2010 Clare 40 10 20
2011 Clare 47 20 21", header=TRUE)
Here's a revised call ddply
:
> ddply(x, .(Individual), summarise, Score=t((score_math+score_bio+score_chem)))
Individual Score.1 Score.2 Score.3
1 Clare 92 70 88
2 Tim 45 77 32
ddply
really not quite the right tool; you just do a very simple calculation on each line and then rebuild. My preference would be to add a column for the total score and then use dcast
from the package reshape2
. One reason for this preference is that you will have a complete master data set with all the information you need later, and then use that to do all the calculations and transformations.
library(reshape2)
x$Total <- with(x, score_math + score_bio + score_chem)
dcast(x, Individual ~ Year, value.var="Total")
source to share