How to make a column with multiple rows?

I am new to R and programming myself and now I have a problem with my data.frame that prevents me from continuing my work.

I have a dataset as follows

Table 1

    Individual             Score
    Tim                      45
    Tim                      77
    Tim                      32
    Clare                    92
    Clare                    70
    Clare                    88

      

Let me explain the above table 1, I have several people (TIm and Clare in the example above) and I have their score in the test they presented on three different occasions (2009, 2010, 2011). I am trying to turn this above into something like this:

Table2

    Individual             Score09             Score10             Score11
    Tim                      45                   77                  32
    Clare                    92                   70                  88

      

I used ddply to get table 1, since initially I had information about subsets of the test (the variable score is just the sum of the entire subset)

Please let me know if there is a way to actually get Table 2 instead of Table 1, since I have over 10,000 observations and Table 1 is set up to prevent me from moving forward with the intended proposal.

EDIT:

The original df from which table 1 was generated:

The data frame looks like this

    Base          Individual     score_math    score_bio     score_chem
    SB1120091       Tim              12            23             10
    SB1120092       Tim              30            25             22
    SB1120101       Tim              17             5             10
    SB1120091       Clare            50            20             22
    SB1120092       Clare            40            10             20
    SB1120101       Clare            47            20             21

      

And the code was:

>Table1 <-ddply(x, .(Indivual), summarise, Score=(score_math*score_bio*score_chem))

EDIT2:

The original dataset doesn't have a Year variable, but a base variable that provides information about when the test was run.

Also, the Score variable is calculated as the product of all subsets.

+3


source to share


4 answers


Data:

df <- structure(list(Individual = structure(c(2L, 2L, 2L, 1L, 1L, 1L), 
                     .Label = c("Clare", "Tim"), class = "factor"), 
                     Score = c(45, 77, 32, 92, 70, 88), 
                     count = c(1L, 2L, 3L, 1L, 2L, 3L)), 
                     .Names = c("Individual", "Score", "count"), 
                     row.names = c(NA, -6L), class = "data.frame")
df$count <- rep(c("09", "10", "11"), 2)

      



  • Using reshape

    from base stat:

    > reshape(df, idvar="Individual", timevar="count", direction="wide", sep="")
    
    #   Individual Score09 Score10 Score11
    # 1        Tim      45      77      32
    # 4      Clare      92      70      88
    
          

+4


source


You can use the package reshape2

:



# presuming your data frame is 'xx'
library(reshape2)

# Create a 'Case' Column
xx$Case <- rep(paste0("Score", c("09", "10", "11")), 2)

dcast(xx, Individual ~ Case, value.var="Score")
 Individual Score09 Score10 Score11
      Clare      92      70      88
        Tim      45      77      32

      

+2


source


Now that you have provided your original table, use xtabs()

in your original dataset. Assuming your dataset is called "x":

xtabs(score_math + score_bio + score_chem ~ Individual + Year, x)
#           Year
# Individual 2009 2010 2011
#      Clare   92   70   88
#      Tim     45   77   32

      

+2


source


Your call ddply

splits the function separately, which results in a separate data frame for each individual user and calculating the sum for each data frame separately. The dataset has multiple rows for each user, so this amount has one total for each row. It then concatenates the data together and by default gives one row in the results for each initial row. But you need one row for each person; if we just wrap the result, it turns it into a single row matrix, which leads to the desired behavior.

Using the data provided:

x <- read.table(text="Year Individual score_math score_bio score_chem
2009 Tim 12 23 10
2010 Tim 30 25 22
2011 Tim 17 5 10
2009 Clare 50 20 22
2010 Clare 40 10 20
2011 Clare 47 20 21", header=TRUE)

      

Here's a revised call ddply

:

> ddply(x, .(Individual), summarise, Score=t((score_math+score_bio+score_chem)))
  Individual Score.1 Score.2 Score.3
1      Clare      92      70      88
2        Tim      45      77      32

      

ddply

really not quite the right tool; you just do a very simple calculation on each line and then rebuild. My preference would be to add a column for the total score and then use dcast

from the package reshape2

. One reason for this preference is that you will have a complete master data set with all the information you need later, and then use that to do all the calculations and transformations.

library(reshape2)
x$Total <- with(x, score_math + score_bio + score_chem)
dcast(x, Individual ~ Year, value.var="Total")

      

+1


source







All Articles