An efficient way to add a row with a difference from a specific group in a data table
I have the following data table:
require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))
ind cat counts
1: 1 A 10
2: 2 A 9
3: 3 A 8
4: 4 B 7
5: 5 B 6
6: 6 C 5
7: 7 C 4
8: 8 D 3
What I would like to achieve is to add a line for each cat, which in the calculation has the difference between the sum (number) of the cat and the sum (number) of the cat A. For these lines, ind should be 0. Basically, I would like to draw attention for the following information:
added_info <- cbind(ind =0, dt1[, .(counts = dt1[cat == "A", sum(counts)] - sum(counts)), by = cat])
> added_info
ind cat counts
1: 0 A 0
2: 0 B 14
3: 0 C 18
4: 0 D 24
And the end result will be:
dt1 <- rbind(dt1, added_info)[order(cat)]
> dt1
ind cat counts
1: 1 A 10
2: 2 A 9
3: 3 A 8
4: 0 A 0
5: 4 B 7
6: 5 B 6
7: 0 B 14
8: 6 C 5
9: 7 C 4
10: 0 C 18
11: 8 D 3
12: 0 D 24
My question is, is there a better (shorter) way to achieve this using datatable (perhaps using .I or .N ??)
+3
source to share
3 answers
You could do
require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))
dt1[,c:=sum(counts[cat=="A"])][,.(ind=c(ind,0), counts=c(counts,c[.N]-sum(counts))),cat][]
# cat ind counts
# 1: A 1 10
# 2: A 2 9
# 3: A 3 8
# 4: A 0 0
# 5: B 4 7
# 6: B 5 6
# 7: B 0 14
# 8: C 6 5
# 9: C 7 4
# 10: C 0 18
# 11: D 8 3
# 12: D 0 24
+4
source to share
This could be a solution within one call to data.table:
dt1[, rbind(.SD,
data.table(ind = 0,
counts = dt1[cat == 'A', sum(counts)] - sum(.SD$counts))),
by = cat]
Of:
cat ind counts
1: A 1 10
2: A 2 9
3: A 3 8
4: A 0 0
5: B 4 7
6: B 5 6
7: B 0 14
8: C 6 5
9: C 7 4
10: C 0 18
11: D 8 3
12: D 0 24
+1
source to share
You said you were effective, so ... That's two times; unique, most likely vectorized, and data.table by for the sum must compile in a c loop.
> dt1[, .SD
][, ca := sum(.SD[cat == 'A', counts])
][, cc := sum(counts), cat
][, cd := ca - cc
][, rbind(.SD, unique(.SD, by=c('cat'))[, `:=`(ind=0)])
][ind == 0, counts := cd
][, .(cat, ind, counts)
][order(cat, ind)
]
cat ind counts
1: A 0 0
2: A 1 10
3: A 2 9
4: A 3 8
5: B 0 14
6: B 4 7
7: B 5 6
8: C 0 18
9: C 6 5
10: C 7 4
11: D 0 24
12: D 8 3
>
0
source to share