R: How can I create a new variable with order numbers (by date) for each factor level (for reshaping)?

I'm new to R and I have to deal with a large dataset. I've googled a lot, but I just can't seem to find a way to do what I need (although it sounds like a simple thing).

What I want to do is reshape my data into wide format. To do it the way I want, I need a new variable that numbers the dates in order within each factor level (restarting at 1 for each new level).

Now this is a small example of what I have:

ID<-c("A","A","A","B","B","C","D","D","D","D")

Date<-c("01-01-2014", "05-01-2014", "06-01-2014",
        "01-01-2014", "12-01-2014", "25-01-2014", 
        "06-01-2014", "12-01-2014", "25-01-2014", 
        "26-01-2014")

Value<-c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
        205.32, 99.99, 210.25, 105.125)

mydata<-data.frame(ID, Date, Value)
mydata

   ID       Date   Value
1   A 01-01-2014   2.500
2   A 05-01-2014   3.400
3   A 06-01-2014   2.500
4   B 01-01-2014 305.660
5   B 12-01-2014 300.000
6   C 25-01-2014  55.010
7   D 06-01-2014 205.320
8   D 12-01-2014  99.990
9   D 25-01-2014 210.250
10  D 26-01-2014 105.125

      

(The dataset is sorted first by the ID factor, then by date within each ID.)

And here's what I need: the new variable is called Order.

   ID       Date   Value Order
1   A 01-01-2014   2.500     1
2   A 05-01-2014   3.400     2
3   A 06-01-2014   2.500     3
4   B 01-01-2014 305.660     1
5   B 12-01-2014 300.000     2
6   C 25-01-2014  55.010     1
7   D 06-01-2014 205.320     1
8   D 12-01-2014  99.990     2
9   D 25-01-2014 210.250     3
10  D 26-01-2014 105.125     4

      

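The ultimate goal is to reshape the data to wide format based on the Order variable, as follows:

# mydata2 is mydata with the Order column added.
# Note: reshape() itself comes from base R (stats); the reshape package is not required for it.
library(reshape)
goal<-reshape(mydata2, 
              idvar="ID",
              timevar="Order",
              direction="wide")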
goal

   ID     Date.1  Value.1     Date.2  Value.2     Date.3  Value.3     Date.4  Value.4
1  A  01-01-2014    2.50  05-01-2014    3.40  06-01-2014    2.50        <NA>      NA
4  B  01-01-2014  305.66  12-01-2014  300.00        <NA>       NA       <NA>      NA
6  C  25-01-2014   55.01        <NA>      NA        <NA>       NA       <NA>      NA
7  D  06-01-2014  205.32  12-01-2014   99.99  25-01-2014   210.25   26-01-2014 105.125

      

Or is there another way to reshape the data like this without the "Order" variable?



1 answer


This is what the getanID function in my splitstackshape package is meant to do:

> library(splitstackshape)
> getanID(mydata, "ID")
    ID       Date   Value .id
 1:  A 01-01-2014   2.500   1
 2:  A 05-01-2014   3.400   2
 3:  A 06-01-2014   2.500   3
 4:  B 01-01-2014 305.660   1
 5:  B 12-01-2014 300.000   2
 6:  C 25-01-2014  55.010   1
 7:  D 06-01-2014 205.320   1
 8:  D 12-01-2014  99.990   2
 9:  D 25-01-2014 210.250   3
10:  D 26-01-2014 105.125   4
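
For readers who would rather not load an extra package, the same per-ID counter can probably be sketched in base R with ave(), followed by the reshape() call from the question. This is only a sketch under assumptions: the example data is rebuilt inline, the dd-mm-yyyy strings are parsed with as.Date() so that the sort is chronological rather than alphabetical, and the helper name parsed is introduced here for illustration.

```r
# Rebuild the example data from the question.
mydata <- data.frame(
  ID    = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  Date  = c("01-01-2014", "05-01-2014", "06-01-2014",
            "01-01-2014", "12-01-2014", "25-01-2014",
            "06-01-2014", "12-01-2014", "25-01-2014",
            "26-01-2014"),
  Value = c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
            205.32, 99.99, 210.25, 105.125),
  stringsAsFactors = FALSE
)

# Parse the dd-mm-yyyy strings so ordering is by actual date
# ("parsed" is a hypothetical helper name, not part of the question).
parsed <- as.Date(mydata$Date, format = "%d-%m-%Y")
mydata <- mydata[order(mydata$ID, parsed), ]

# ave() applies seq_along() separately within each ID,
# so the counter restarts at 1 for every new ID.
mydata$Order <- ave(seq_along(mydata$ID), mydata$ID, FUN = seq_along)

# Wide reshape on the new counter; reshape() is in base R's stats package.
goal <- reshape(mydata, idvar = "ID", timevar = "Order", direction = "wide")
goal
```

Sorting before numbering matters: ave() only counts rows in the order they appear, so the counter reflects date order only if the rows are already sorted by date within each ID.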

      



Alternatively, you can look into the development version of "data.table", whose dcast implementation is very flexible and lets you do this reshaping without having to generate a "time" variable first.
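
As a rough illustration of that idea, here is a minimal sketch that assumes a data.table release that provides rowid() (to my knowledge, 1.9.8 or later) and a dcast that accepts several value.var columns. The names DT and wide are introduced here for illustration only.

```r
library(data.table)

# Example data from the question, built directly as a data.table.
DT <- data.table(
  ID    = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  Date  = c("01-01-2014", "05-01-2014", "06-01-2014",
            "01-01-2014", "12-01-2014", "25-01-2014",
            "06-01-2014", "12-01-2014", "25-01-2014",
            "26-01-2014"),
  Value = c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
            205.32, 99.99, 210.25, 105.125)
)

# rowid(ID) numbers the rows within each ID on the fly,
# so no separate "time" column has to be stored beforehand.
# Passing two value.var columns spreads both Date and Value.
wide <- dcast(DT, ID ~ rowid(ID), value.var = c("Date", "Value"))
wide
```

Unlike the base reshape() output, the columns here are grouped by variable (all Date columns, then all Value columns) and joined with an underscore, for example Date_1 and Value_1. As with any row counter, the rows need to be in date order within each ID for the numbering to be meaningful.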
