Cbind two data, but with some modifications, using data.table

I have two datasets:

data1<-structure(list(gear = c(3, 3, 3, 3, 5, 3, 3, 3, 4, 4, 3, 3, 3, 
3, 3, 5, 5, 5), carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 
4, 4, 4, 4, 6, 8)), .Names = c("gear", "carb"), class = "data.frame", row.names = c(NA, 
-18L))

data1
   gear carb
1     3    2
2     3    2
3     3    2
4     3    2
5     5    2
6     3    3
7     3    3
8     3    3
9     4    4
10    4    4
11    3    4
12    3    4
13    3    4
14    3    4
15    3    4
16    5    4
17    5    6
18    5    8

data2<-structure(list(carb = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 
4, 4, 6, 8), fac = c(1L, 1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 1L, 1L), hello = c(NA, 0.292389553859123, 
0.584779107718246, 0.804071273112588, 0.804071273112588, 0.402035636556294, 
NA, 0.460230801434478, 1.25285051501608, 1.15057700358619, 0.869324847154013, 
0.818188091439071, 0.894893225011484, 0.792619713581601, 0.51136755714942, 
NA, NA), hello2 = c(NA, 5L, 5L, 5L, 5L, 4L, NA, 1L, 1L, 2L, 2L, 
1L, 1L, 2L, 2L, NA, NA)), row.names = c(NA, -17L), class = "data.frame", .Names = c("carb", 
"fac", "hello", "hello2"))

   carb fac     hello hello2
1     1   1        NA     NA
2     2   1 0.2923896      5
3     2   2 0.5847791      5
4     2   3 0.8040713      5
5     2   4 0.8040713      5
6     2   5 0.4020356      4
7     3   1        NA     NA
8     4   1 0.4602308      1
9     4   2 1.2528505      1
10    4   3 1.1505770      2
11    4   4 0.8693248      2
12    4   5 0.8181881      1
13    4   6 0.8948932      1
14    4   7 0.7926197      2
15    4   8 0.5113676      2
16    6   1        NA     NA
17    8   1        NA     NA

      

Suppose data1 is master data. I want to bind data1 and data2 (NO MERGE). But, as you can see, they don't have the same number of lines. One way I am trying to achieve this is using a generic var carb

. If the category carb

is in data 2, but not in data1, then I don't want to bind that category to data2. For example, in the data above, carb with a value of 1 is in data2, but not in data1, so ignore this when cbinding. If the row count for the categories that exist in both datasets is not the same, I will use the row count for each category from data1. For example, for the value carb = 3, the number of rows in data 1 is 3, whereas in data2 it is 1. So I need to have 3 rows for carb 3 in data 2 beforecbind

... The extra two lines should just replicate the line that is in data1. My desired result (the order should remain the same as in data1):

     +----------------------------------------+
     | gear   carb   fac       hello   hello2 |
     |----------------------------------------|
  1. |    3      2     1   0.2923896        5 |
  2. |    3      2     2   0.5847791        5 |
  3. |    3      2     3   0.8040713        5 |
  4. |    3      2     4   0.8040713        5 |
  5. |    5      2     5   0.4020356        4 |
     |----------------------------------------|
  6. |    3      3     1          NA       NA |
  7. |    3      3     1          NA       NA |
  8. |    3      3     1          NA       NA |
  9. |    4      4     1   0.4602308        1 |
 10. |    4      4     2   1.2528505        1 |
     |----------------------------------------|
 11. |    3      4     3    1.150577        2 |
 12. |    3      4     4   0.8693248        2 |
 13. |    3      4     5   0.8181881        1 |
 14. |    3      4     6   0.8948932        1 |
 15. |    3      4     7   0.7926197        2 |
     |----------------------------------------|
 16. |    5      4     8   0.5113676        2 |
 17. |    5      6     1          NA       NA |
 18. |    5      8     1          NA       NA |
     +----------------------------------------+

      

I was wondering if there is any particular function in the package data.table

that makes this look.

+3


source to share


2 answers


I think you really want to merge:

setDT(data1)
setDT(data2)

data1[,fac:=1:.N,by=carb]

setkey(data1,carb,fac)
setkey(data2,carb,fac)
data2[data1]

      



which gives

    carb fac     hello hello2 gear
 1:    2   1 0.2923896      5    3
 2:    2   2 0.5847791      5    3
 3:    2   3 0.8040713      5    3
 4:    2   4 0.8040713      5    3
 5:    2   5 0.4020356      4    5
 6:    3   1        NA     NA    3
 7:    3   2        NA     NA    3
 8:    3   3        NA     NA    3
 9:    4   1 0.4602308      1    4
10:    4   2 1.2528505      1    4
11:    4   3 1.1505770      2    3
12:    4   4 0.8693248      2    3
13:    4   5 0.8181881      1    3
14:    4   6 0.8948932      1    3
15:    4   7 0.7926197      2    3
16:    4   8 0.5113676      2    5
17:    6   1        NA     NA    5
18:    8   1        NA     NA    5

      

+5


source


Using the same idea as @ Frank's answer (append fac

in data1

):

library(dplyr)

data1 %>%
  group_by(carb) %>%
  mutate(fac = row_number()) %>%
  left_join(., data2)

      



What gives:

#Source: local data frame [18 x 5]
#Groups: carb
#
#   gear carb fac     hello hello2
#1     3    2   1 0.2923896      5
#2     3    2   2 0.5847791      5
#3     3    2   3 0.8040713      5
#4     3    2   4 0.8040713      5
#5     5    2   5 0.4020356      4
#6     3    3   1        NA     NA
#7     3    3   2        NA     NA
#8     3    3   3        NA     NA
#9     4    4   1 0.4602308      1
#10    4    4   2 1.2528505      1
#11    3    4   3 1.1505770      2
#12    3    4   4 0.8693248      2
#13    3    4   5 0.8181881      1
#14    3    4   6 0.8948932      1
#15    3    4   7 0.7926197      2
#16    5    4   8 0.5113676      2
#17    5    6   1        NA     NA
#18    5    8   1        NA     NA

      

+2


source







All Articles