Cbind two data, but with some modifications, using data.table
I have two datasets:
data1<-structure(list(gear = c(3, 3, 3, 3, 5, 3, 3, 3, 4, 4, 3, 3, 3,
3, 3, 5, 5, 5), carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4,
4, 4, 4, 4, 6, 8)), .Names = c("gear", "carb"), class = "data.frame", row.names = c(NA,
-18L))
data1
gear carb
1 3 2
2 3 2
3 3 2
4 3 2
5 5 2
6 3 3
7 3 3
8 3 3
9 4 4
10 4 4
11 3 4
12 3 4
13 3 4
14 3 4
15 3 4
16 5 4
17 5 6
18 5 8
data2<-structure(list(carb = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4,
4, 4, 6, 8), fac = c(1L, 1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 1L, 1L), hello = c(NA, 0.292389553859123,
0.584779107718246, 0.804071273112588, 0.804071273112588, 0.402035636556294,
NA, 0.460230801434478, 1.25285051501608, 1.15057700358619, 0.869324847154013,
0.818188091439071, 0.894893225011484, 0.792619713581601, 0.51136755714942,
NA, NA), hello2 = c(NA, 5L, 5L, 5L, 5L, 4L, NA, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, NA, NA)), row.names = c(NA, -17L), class = "data.frame", .Names = c("carb",
"fac", "hello", "hello2"))
carb fac hello hello2
1 1 1 NA NA
2 2 1 0.2923896 5
3 2 2 0.5847791 5
4 2 3 0.8040713 5
5 2 4 0.8040713 5
6 2 5 0.4020356 4
7 3 1 NA NA
8 4 1 0.4602308 1
9 4 2 1.2528505 1
10 4 3 1.1505770 2
11 4 4 0.8693248 2
12 4 5 0.8181881 1
13 4 6 0.8948932 1
14 4 7 0.7926197 2
15 4 8 0.5113676 2
16 6 1 NA NA
17 8 1 NA NA
Suppose data1 is master data. I want to bind data1 and data2 (NO MERGE). But, as you can see, they don't have the same number of lines. One way I am trying to achieve this is using a generic var carb
. If the category carb
is in data 2, but not in data1, then I don't want to bind that category to data2. For example, in the data above, carb with a value of 1 is in data2, but not in data1, so ignore this when cbinding. If the row count for the categories that exist in both datasets is not the same, I will use the row count for each category from data1. For example, for the value carb = 3, the number of rows in data 1 is 3, whereas in data2 it is 1. So I need to have 3 rows for carb 3 in data 2 beforecbind
... The extra two lines should just replicate the line that is in data1. My desired result (the order should remain the same as in data1):
+----------------------------------------+
| gear carb fac hello hello2 |
|----------------------------------------|
1. | 3 2 1 0.2923896 5 |
2. | 3 2 2 0.5847791 5 |
3. | 3 2 3 0.8040713 5 |
4. | 3 2 4 0.8040713 5 |
5. | 5 2 5 0.4020356 4 |
|----------------------------------------|
6. | 3 3 1 NA NA |
7. | 3 3 1 NA NA |
8. | 3 3 1 NA NA |
9. | 4 4 1 0.4602308 1 |
10. | 4 4 2 1.2528505 1 |
|----------------------------------------|
11. | 3 4 3 1.150577 2 |
12. | 3 4 4 0.8693248 2 |
13. | 3 4 5 0.8181881 1 |
14. | 3 4 6 0.8948932 1 |
15. | 3 4 7 0.7926197 2 |
|----------------------------------------|
16. | 5 4 8 0.5113676 2 |
17. | 5 6 1 NA NA |
18. | 5 8 1 NA NA |
+----------------------------------------+
I was wondering if there is any particular function in the package data.table
that makes this look.
source to share
I think you really want to merge:
setDT(data1)
setDT(data2)
data1[,fac:=1:.N,by=carb]
setkey(data1,carb,fac)
setkey(data2,carb,fac)
data2[data1]
which gives
carb fac hello hello2 gear
1: 2 1 0.2923896 5 3
2: 2 2 0.5847791 5 3
3: 2 3 0.8040713 5 3
4: 2 4 0.8040713 5 3
5: 2 5 0.4020356 4 5
6: 3 1 NA NA 3
7: 3 2 NA NA 3
8: 3 3 NA NA 3
9: 4 1 0.4602308 1 4
10: 4 2 1.2528505 1 4
11: 4 3 1.1505770 2 3
12: 4 4 0.8693248 2 3
13: 4 5 0.8181881 1 3
14: 4 6 0.8948932 1 3
15: 4 7 0.7926197 2 3
16: 4 8 0.5113676 2 5
17: 6 1 NA NA 5
18: 8 1 NA NA 5
source to share
Using the same idea as @ Frank's answer (append fac
in data1
):
library(dplyr)
data1 %>%
group_by(carb) %>%
mutate(fac = row_number()) %>%
left_join(., data2)
What gives:
#Source: local data frame [18 x 5]
#Groups: carb
#
# gear carb fac hello hello2
#1 3 2 1 0.2923896 5
#2 3 2 2 0.5847791 5
#3 3 2 3 0.8040713 5
#4 3 2 4 0.8040713 5
#5 5 2 5 0.4020356 4
#6 3 3 1 NA NA
#7 3 3 2 NA NA
#8 3 3 3 NA NA
#9 4 4 1 0.4602308 1
#10 4 4 2 1.2528505 1
#11 3 4 3 1.1505770 2
#12 3 4 4 0.8693248 2
#13 3 4 5 0.8181881 1
#14 3 4 6 0.8948932 1
#15 3 4 7 0.7926197 2
#16 5 4 8 0.5113676 2
#17 5 6 1 NA NA
#18 5 8 1 NA NA
source to share