Create new column based on repeated observation
My question is how to create a new column based on another one. Here's my part data
:
Category Brand Time1 value Time2 number
2 HTC 2015-01-01 1724 NA 1
6 APPLE 2015-10-10 3000 2015-10-30 1
2 APPLE 2016-01-15 430 NA 1
NA Samsung 2016-10-20 860 2016-12-20 1
I am showing 4 vol. above data
and I explain mine data
more:
Let's look at the structure first.
> str(data)
Classes โdata.tableโ and 'data.frame': 105907 obs. of 6 variables:
$ Category : num 2 2 2 2 2 2 2 2 2 2 ...
$ Brand : chr "HTC" "APPLE" "INFOCUS" "APPLE" ...
$ Time1 : POSIXct, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ value : num 1724 2946 330 2946 2946 ...
$ Time2 : POSIXct, format: NA NA NA "2015-01-03" ...
$ number : chr "1" "1" "1" "1" ...
- attr(*, ".internal.selfref")=<externalptr>
Secondly, I want to replicate every obs. based on Time1
.
This is my code:
data[,rep:=ifelse(year(Time1)==2016, 12-month(Time1)+1, 13)][rep(1:.N,rep)][]
Now data
it looks like this:
Category Brand Time1 value Time2 number rep
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
NA Samsung 2016-10-20 860 2016-12-20 1 3
NA Samsung 2016-10-20 860 2016-12-20 1 3
NA Samsung 2016-10-20 860 2016-12-20 1 3
Third, I want to create a new column Lapse
and I want:
Category Brand Time1 value Time2 number rep Lapse
2 HTC 2015-01-01 1724 NA 1 13 0
2 HTC 2015-01-01 1724 NA 1 13 1
2 HTC 2015-01-01 1724 NA 1 13 2
2 HTC 2015-01-01 1724 NA 1 13 3
2 HTC 2015-01-01 1724 NA 1 13 4
2 HTC 2015-01-01 1724 NA 1 13 5
2 HTC 2015-01-01 1724 NA 1 13 6
2 HTC 2015-01-01 1724 NA 1 13 7
2 HTC 2015-01-01 1724 NA 1 13 8
2 HTC 2015-01-01 1724 NA 1 13 9
2 HTC 2015-01-01 1724 NA 1 13 10
2 HTC 2015-01-01 1724 NA 1 13 11
2 HTC 2015-01-01 1724 NA 1 13 12
6 APPLE 2015-10-10 3000 2015-10-30 1 13 0
6 APPLE 2015-10-10 3000 2015-10-30 1 13 1
6 APPLE 2015-10-10 3000 2015-10-30 1 13 2
6 APPLE 2015-10-10 3000 2015-10-30 1 13 3
6 APPLE 2015-10-10 3000 2015-10-30 1 13 4
6 APPLE 2015-10-10 3000 2015-10-30 1 13 5
6 APPLE 2015-10-10 3000 2015-10-30 1 13 6
6 APPLE 2015-10-10 3000 2015-10-30 1 13 7
6 APPLE 2015-10-10 3000 2015-10-30 1 13 8
6 APPLE 2015-10-10 3000 2015-10-30 1 13 9
6 APPLE 2015-10-10 3000 2015-10-30 1 13 10
6 APPLE 2015-10-10 3000 2015-10-30 1 13 11
6 APPLE 2015-10-10 3000 2015-10-30 1 13 12
2 APPLE 2016-01-15 430 NA 1 12 0
2 APPLE 2016-01-15 430 NA 1 12 1
2 APPLE 2016-01-15 430 NA 1 12 2
2 APPLE 2016-01-15 430 NA 1 12 3
2 APPLE 2016-01-15 430 NA 1 12 4
2 APPLE 2016-01-15 430 NA 1 12 5
2 APPLE 2016-01-15 430 NA 1 12 6
2 APPLE 2016-01-15 430 NA 1 12 7
2 APPLE 2016-01-15 430 NA 1 12 8
2 APPLE 2016-01-15 430 NA 1 12 9
2 APPLE 2016-01-15 430 NA 1 12 10
2 APPLE 2016-01-15 430 NA 1 12 11
NA Samsung 2016-10-20 860 2016-12-20 1 3 0
NA Samsung 2016-10-20 860 2016-12-20 1 3 1
NA Samsung 2016-10-20 860 2016-12-20 1 3 2
Above is the result I want, I try the code like this:
data[,Lapse := seq_len(.N)-1, by = (Category,Brand,Time1,value,Time2,number)]
However, this is wrong.
If it's right,
uniqie(data$Lapse)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12
But I got it 0~999
. I think my code is wrong.
Any suggestion?
Or maybe there are other good ways to do this?
UPDATE
data <- " Category Brand Time1 value Time2 number
1: 2 HTC 2015-01-01 1724 NA 1
2: 2 APPLE 2015-01-01 2946 NA 1
3: 2 INFOCUS 2015-01-01 330 NA 1
4: 2 APPLE 2015-01-01 2946 2015-01-03 1
5: 2 APPLE 2015-01-01 2946 NA 1
6: 2 Samsung 2015-01-01 2189 NA 1
7: 2 HTC 2015-01-01 730 NA 1
8: 2 Samsung 2015-01-01 2189 NA 1
9: 2 Samsung 2015-01-01 2189 NA 1
10: 2 HTC 2015-01-01 1296 NA 1
11: 2 HTC 2015-01-01 730 NA 1
12: 2 APPLE 2015-01-01 2189 NA 1
13: 2 INFOCUS 2015-01-01 330 2015-01-02 1
14: 2 HTC 2015-01-01 2189 NA 1
15: 2 SONY 2015-01-01 1296 NA 1
16: 2 HTC 2015-01-01 730 NA 1
17: 2 APPLE 2015-01-01 2946 NA 1
18: 2 APPLE 2015-01-01 2946 NA 1
19: 2 HTC 2015-01-01 1724 NA 1
20: 2 Samsung 2015-01-02 1724 NA 1
21: 2 Samsung 2015-01-02 2189 NA 1
22: 2 HTC 2015-01-02 730 NA 1
23: 2 Samsung 2015-01-02 2189 NA 1
24: 2 HTC 2015-01-02 730 NA 1
25: 2 APPLE 2015-01-02 2946 NA 1
26: 2 HTC 2015-01-02 1724 NA 1
27: 2 HTC 2015-01-02 730 NA 1
28: 2 ASUS 2015-01-02 330 NA 1
29: 2 ASUS 2015-01-02 330 NA 1
30: 2 Samsung 2015-01-02 1724 NA 1
31: 2 APPLE 2015-01-02 2189 NA 1
32: 2 HTC 2015-01-02 730 NA 1
33: 2 Samsung 2015-01-02 730 NA 1
34: 2 HTC 2015-01-02 730 NA 1
35: 2 HTC 2015-01-02 730 NA 1
36: 2 HTC 2015-01-02 730 NA 1
37: 2 Samsung 2015-01-02 730 NA 1
38: 2 APPLE 2015-01-03 2189 NA 1
39: 2 APPLE 2015-01-03 2946 NA 1
40: 2 Benten 2015-01-03 330 NA 1
41: 2 APPLE 2015-01-03 2946 NA 1
42: 2 INFOCUS 2015-01-03 330 NA 1
43: 2 Samsung 2015-01-03 1296 NA 1
44: 2 HTC 2015-01-03 730 NA 1
45: 2 Samsung 2015-01-03 2189 NA 1
46: 2 SONY 2015-01-03 2189 NA 1
47: 2 TaiwanMobile 2015-01-03 730 NA 1
48: 2 HTC 2015-01-03 1296 NA 1
49: 2 HTC 2015-01-03 730 NA 1
50: 2 APPLE 2015-01-03 2189 NA 1
51: 2 APPLE 2015-01-03 2189 NA 1
52: 2 HTC 2015-01-03 730 NA 1
53: 2 Samsung 2015-01-03 330 NA 1
54: 2 TaiwanMobile 2015-01-03 730 NA 1
55: 2 HTC 2015-01-03 730 NA 1
56: 2 HTC 2015-01-03 730 NA 1
57: 2 TaiwanMobile 2015-01-03 330 NA 1
58: 2 Samsung 2015-01-03 1724 2015-01-04 1
59: 2 HTC 2015-01-03 730 NA 1
60: 2 INFOCUS 2015-01-03 330 NA 1
61: 2 SONY 2015-01-03 730 NA 1
62: 2 HTC 2015-01-04 730 NA 1
63: 2 HTC 2015-01-04 730 NA 1
64: 2 APPLE 2015-01-04 2189 2015-01-05 1
65: 2 TaiwanMobile 2015-01-04 730 2015-01-05 1"
data <- read.table(text=data, header = TRUE)
data <- as.data.table(data)
data <- data[,rep:=ifelse(year(Time1)==2016, 12-month(Time1)+1, 13)][rep(1:.N,rep)][]
data[, Lapse := seq_len(.N)-1 , .(Category, Brand, Time1, value, Time2, number)]
dput(droplevels(head(data,65)))
structure(list(Category = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Brand = c("HTC", "APPLE",
"INFOCUS", "APPLE", "APPLE", "Samsung", "HTC", "Samsung", "Samsung",
"HTC", "HTC", "APPLE", "INFOCUS", "HTC", "SONY", "HTC", "APPLE",
"APPLE", "HTC", "Samsung", "Samsung", "HTC", "Samsung", "HTC",
"APPLE", "HTC", "HTC", "ASUS", "ASUS", "Samsung", "APPLE", "HTC",
"Samsung", "HTC", "HTC", "HTC", "Samsung", "APPLE", "APPLE",
"Benten", "APPLE", "INFOCUS", "Samsung", "HTC", "Samsung", "SONY",
"TaiwanMobile", "HTC", "HTC", "APPLE", "APPLE", "HTC", "Samsung",
"TaiwanMobile", "HTC", "HTC", "TaiwanMobile", "Samsung", "HTC",
"INFOCUS", "SONY", "HTC", "HTC", "APPLE", "TaiwanMobile"), Time1 = structure(c(1420070400,
1420070400, 1420070400, 1420070400, 1420070400, 1420070400, 1420070400,
1420070400, 1420070400, 1420070400, 1420070400, 1420070400, 1420070400,
1420070400, 1420070400, 1420070400, 1420070400, 1420070400, 1420070400,
1420156800, 1420156800, 1420156800, 1420156800, 1420156800, 1420156800,
1420156800, 1420156800, 1420156800, 1420156800, 1420156800, 1420156800,
1420156800, 1420156800, 1420156800, 1420156800, 1420156800, 1420156800,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420329600, 1420329600, 1420329600, 1420329600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), value = c(1724, 2946, 330, 2946,
2946, 2189, 730, 2189, 2189, 1296, 730, 2189, 330, 2189, 1296,
730, 2946, 2946, 1724, 1724, 2189, 730, 2189, 730, 2946, 1724,
730, 330, 330, 1724, 2189, 730, 730, 730, 730, 730, 730, 2189,
2946, 330, 2946, 330, 1296, 730, 2189, 2189, 730, 1296, 730,
2189, 2189, 730, 330, 730, 730, 730, 330, 1724, 730, 330, 730,
730, 730, 2189, 730), Time2 = structure(c(NA, NA, NA, 1420243200,
NA, NA, NA, NA, NA, NA, NA, NA, 1420156800, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1420329600, NA, NA, NA, NA, NA, 1420416000,
1420416000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
number = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1")), .Names = c("Category",
"Brand", "Time1", "value", "Time2", "number"), row.names = c(NA,
-65L), .internal.selfref = <pointer: 0x003e24a0>, class = c("data.table",
"data.frame"))
And the problem is, the result is weird.
unique(data$Lapse)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[38] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
[75] 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
source to share
The problem is the original data is not updated as it has the same number of rows as before. If we check the output
data[,rep:=ifelse(year(Time1)==2016, 12-month(Time1)+1, 13)][rep(1:.N,rep)]
and then
data
becomes apparent.
So, we assign the output two steps back to the original object ("data") or to another object (if we don't want to modify the original object)
data <- data[,rep:=ifelse(year(Time1)==2016, 12-month(Time1)+1, 13)][rep(1:.N,rep)]
and create a column "Lapse" based on the sequence by group
data[, Lapse := seq_len(.N)-1 , .(Category, Brand, Time1, value, Time2, number)]
data
Category Brand Time1 value Time2 number rep Lapse
1: 2 HTC 2015-01-01 1724 <NA> 1 13 0
2: 2 HTC 2015-01-01 1724 <NA> 1 13 1
3: 2 HTC 2015-01-01 1724 <NA> 1 13 2
4: 2 HTC 2015-01-01 1724 <NA> 1 13 3
5: 2 HTC 2015-01-01 1724 <NA> 1 13 4
6: 2 HTC 2015-01-01 1724 <NA> 1 13 5
7: 2 HTC 2015-01-01 1724 <NA> 1 13 6
8: 2 HTC 2015-01-01 1724 <NA> 1 13 7
9: 2 HTC 2015-01-01 1724 <NA> 1 13 8
10: 2 HTC 2015-01-01 1724 <NA> 1 13 9
11: 2 HTC 2015-01-01 1724 <NA> 1 13 10
12: 2 HTC 2015-01-01 1724 <NA> 1 13 11
13: 2 HTC 2015-01-01 1724 <NA> 1 13 12
14: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 0
15: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 1
16: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 2
17: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 3
18: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 4
19: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 5
20: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 6
21: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 7
22: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 8
23: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 9
24: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 10
25: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 11
26: 6 APPLE 2015-10-10 3000 2015-10-30 1 13 12
27: 2 APPLE 2016-01-15 430 <NA> 1 12 0
28: 2 APPLE 2016-01-15 430 <NA> 1 12 1
29: 2 APPLE 2016-01-15 430 <NA> 1 12 2
30: 2 APPLE 2016-01-15 430 <NA> 1 12 3
31: 2 APPLE 2016-01-15 430 <NA> 1 12 4
32: 2 APPLE 2016-01-15 430 <NA> 1 12 5
33: 2 APPLE 2016-01-15 430 <NA> 1 12 6
34: 2 APPLE 2016-01-15 430 <NA> 1 12 7
35: 2 APPLE 2016-01-15 430 <NA> 1 12 8
36: 2 APPLE 2016-01-15 430 <NA> 1 12 9
37: 2 APPLE 2016-01-15 430 <NA> 1 12 10
38: 2 APPLE 2016-01-15 430 <NA> 1 12 11
39: NA Samsung 2016-10-20 860 2016-12-20 1 3 0
40: NA Samsung 2016-10-20 860 2016-12-20 1 3 1
41: NA Samsung 2016-10-20 860 2016-12-20 1 3 2
source to share