An efficient way to add a row with a difference from a specific group in a data table

I have the following data table:

require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))

    ind cat counts
1:   1   A     10
2:   2   A      9
3:   3   A      8
4:   4   B      7
5:   5   B      6
6:   6   C      5
7:   7   C      4
8:   8   D      3

      

What I would like to achieve is to add a line for each cat, which in the calculation has the difference between the sum (number) of the cat and the sum (number) of the cat A. For these lines, ind should be 0. Basically, I would like to draw attention for the following information:

added_info <- cbind(ind =0, dt1[, .(counts = dt1[cat == "A", sum(counts)] - sum(counts)), by = cat])

> added_info
   ind cat counts
1:   0   A      0
2:   0   B     14
3:   0   C     18
4:   0   D     24

      

And the end result will be:

dt1 <- rbind(dt1, added_info)[order(cat)]

> dt1
    ind cat counts
 1:   1   A     10
 2:   2   A      9
 3:   3   A      8
 4:   0   A      0
 5:   4   B      7
 6:   5   B      6
 7:   0   B     14
 8:   6   C      5
 9:   7   C      4
10:   0   C     18
11:   8   D      3
12:   0   D     24

      

My question is, is there a better (shorter) way to achieve this using datatable (perhaps using .I or .N ??)

+3


source to share


3 answers


You could do



require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))
dt1[,c:=sum(counts[cat=="A"])][,.(ind=c(ind,0), counts=c(counts,c[.N]-sum(counts))),cat][]
#     cat ind counts
#  1:   A   1     10
#  2:   A   2      9
#  3:   A   3      8
#  4:   A   0      0
#  5:   B   4      7
#  6:   B   5      6
#  7:   B   0     14
#  8:   C   6      5
#  9:   C   7      4
# 10:   C   0     18
# 11:   D   8      3
# 12:   D   0     24

      

+4


source


This could be a solution within one call to data.table:

dt1[, rbind(.SD, 
            data.table(ind = 0, 
                       counts = dt1[cat == 'A', sum(counts)] - sum(.SD$counts))), 
    by = cat]

      



Of:

   cat ind counts
 1:   A   1     10
 2:   A   2      9
 3:   A   3      8
 4:   A   0      0
 5:   B   4      7
 6:   B   5      6
 7:   B   0     14
 8:   C   6      5
 9:   C   7      4
10:   C   0     18
11:   D   8      3
12:   D   0     24

      

+1


source


You said you were effective, so ... That's two times; unique, most likely vectorized, and data.table by for the sum must compile in a c loop.

> dt1[, .SD
      ][, ca := sum(.SD[cat == 'A', counts])
      ][, cc := sum(counts), cat
      ][, cd := ca - cc
      ][, rbind(.SD, unique(.SD, by=c('cat'))[, `:=`(ind=0)])
      ][ind == 0, counts := cd
      ][, .(cat, ind, counts)
      ][order(cat, ind)
      ]

    cat ind counts
 1:   A   0      0
 2:   A   1     10
 3:   A   2      9
 4:   A   3      8
 5:   B   0     14
 6:   B   4      7
 7:   B   5      6
 8:   C   0     18
 9:   C   6      5
10:   C   7      4
11:   D   0     24
12:   D   8      3
> 

      

0


source







All Articles