R: How can I create a new variable with sequence numbers (by date) within each factor level (for reshaping)?
I'm new to R and I have to deal with a large dataset. I've googled a lot, but I just can't seem to find a way to do what I need (although it sounds like a simple thing).
What I want to do is reshape my data into wide format. To do it the way I want, I need a new variable that numbers the rows in date order within each ID (restarting at one for each new ID).
Now this is a small example of what I have:
ID<-c("A","A","A","B","B","C","D","D","D","D")
Date<-c("01-01-2014", "05-01-2014", "06-01-2014",
"01-01-2014", "12-01-2014", "25-01-2014",
"06-01-2014", "12-01-2014", "25-01-2014",
"26-01-2014")
Value<-c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
205.32, 99.99, 210.25, 105.125)
mydata<-data.frame(ID, Date, Value)
mydata
ID Date Value
1 A 01-01-2014 2.500
2 A 05-01-2014 3.400
3 A 06-01-2014 2.500
4 B 01-01-2014 305.660
5 B 12-01-2014 300.000
6 C 25-01-2014 55.010
7 D 06-01-2014 205.320
8 D 12-01-2014 99.990
9 D 25-01-2014 210.250
10 D 26-01-2014 105.125
(The dataset is sorted first by the ID factor, then by date within each ID.)
And here's what I need: the new variable is called Order.
ID Date Value Order
1 A 01-01-2014 2.500 1
2 A 05-01-2014 3.400 2
3 A 06-01-2014 2.500 3
4 B 01-01-2014 305.660 1
5 B 12-01-2014 300.000 2
6 C 25-01-2014 55.010 1
7 D 06-01-2014 205.320 1
8 D 12-01-2014 99.990 2
9 D 25-01-2014 210.250 3
10 D 26-01-2014 105.125 4
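The ultimate goal is to reshape the data to wide format using the Order variable, as follows:
# reshape() is base R (stats), so no extra package is needed;
# mydata2 is mydata with the Order column added
goal<-reshape(mydata2,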
idvar="ID",
timevar="Order",
direction="wide")
goal
ID Date.1 Value.1 Date.2 Value.2 Date.3 Value.3 Date.4 Value.4
1 A 01-01-2014 2.50 05-01-2014 3.40 06-01-2014 2.50 <NA> NA
4 B 01-01-2014 305.66 12-01-2014 300.00 <NA> NA <NA> NA
6 C 25-01-2014 55.01 <NA> NA <NA> NA <NA> NA
7 D 06-01-2014 205.32 12-01-2014 99.99 25-01-2014 210.25 26-01-2014 105.125
Or is there another way to reshape the data like this without the "Order" variable?
This is what the function getanID in my splitstackshape package is meant to do:
> library(splitstackshape)
> getanID(mydata, "ID")
ID Date Value .id
1: A 01-01-2014 2.500 1
2: A 05-01-2014 3.400 2
3: A 06-01-2014 2.500 3
4: B 01-01-2014 305.660 1
5: B 12-01-2014 300.000 2
6: C 25-01-2014 55.010 1
7: D 06-01-2014 205.320 1
8: D 12-01-2014 99.990 2
9: D 25-01-2014 210.250 3
10: D 26-01-2014 105.125 4
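If you would rather not load a package for this step, the same counter can be sketched in base R. This is an alternative technique rather than what getanID does internally: it uses ave() with seq_along to number rows within each ID, and it assumes the dates are day-month-year strings as in the question. The final reshape() call is the one from the question, applied to the result.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  ID = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  Date = c("01-01-2014", "05-01-2014", "06-01-2014",
           "01-01-2014", "12-01-2014", "25-01-2014",
           "06-01-2014", "12-01-2014", "25-01-2014",
           "26-01-2014"),
  Value = c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
            205.32, 99.99, 210.25, 105.125),
  stringsAsFactors = FALSE
)

# Sort by ID, then by the parsed date (assumption: day-month-year strings)
mydata <- mydata[order(mydata$ID,
                       as.Date(mydata$Date, format = "%d-%m-%Y")), ]

# Number the rows 1, 2, 3, ... within each ID
mydata$Order <- ave(seq_along(mydata$ID), mydata$ID, FUN = seq_along)

# Reshape to wide format with base R's reshape()
goal <- reshape(mydata, idvar = "ID", timevar = "Order", direction = "wide")
goal
```

Sorting on the parsed date matters only if the rows are not already in date order; sorting the raw strings would compare them alphabetically.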
Alternatively, you can look into the development version of "data.table", whose implementation of dcast is very flexible and lets you do this reshaping without having to generate a "time" variable first.
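As a rough sketch of that approach, assuming a data.table release in which dcast accepts several value.var columns and the helper rowid() is available: the within-ID counter is computed inside the formula, so no "Order" column has to be stored in the table. The object names DT and wide are only illustrative.

```r
library(data.table)

# Example data from the question, as a data.table
DT <- data.table(
  ID = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  Date = c("01-01-2014", "05-01-2014", "06-01-2014",
           "01-01-2014", "12-01-2014", "25-01-2014",
           "06-01-2014", "12-01-2014", "25-01-2014",
           "26-01-2014"),
  Value = c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
            205.32, 99.99, 210.25, 105.125)
)

# rowid(ID) numbers the rows within each ID in their current order,
# so the data must already be sorted by date within ID
wide <- dcast(DT, ID ~ rowid(ID), value.var = c("Date", "Value"))
wide
```

The result has one row per ID, with missing combinations filled with NA, much like the reshape() output in the question.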