An efficient way to add a row with a difference from a specific group in a data table

Question

An efficient way to add a row with a difference from a specific group in a data table

I have the following data table:

require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))

    ind cat counts
1:   1   A     10
2:   2   A      9
3:   3   A      8
4:   4   B      7
5:   5   B      6
6:   6   C      5
7:   7   C      4
8:   8   D      3

What I would like to achieve is to add a line for each cat, which in the calculation has the difference between the sum (number) of the cat and the sum (number) of the cat A. For these lines, ind should be 0. Basically, I would like to draw attention for the following information:

added_info <- cbind(ind =0, dt1[, .(counts = dt1[cat == "A", sum(counts)] - sum(counts)), by = cat])

> added_info
   ind cat counts
1:   0   A      0
2:   0   B     14
3:   0   C     18
4:   0   D     24

And the end result will be:

dt1 <- rbind(dt1, added_info)[order(cat)]

> dt1
    ind cat counts
 1:   1   A     10
 2:   2   A      9
 3:   3   A      8
 4:   0   A      0
 5:   4   B      7
 6:   5   B      6
 7:   0   B     14
 8:   6   C      5
 9:   7   C      4
10:   0   C     18
11:   8   D      3
12:   0   D     24

My question is, is there a better (shorter) way to achieve this using datatable (perhaps using .I or .N ??)

+3

r data.table

User2321 05 May '17 at 9:09

source to share

3 answers

This could be a solution within one call to data.table:

dt1[, rbind(.SD, 
            data.table(ind = 0, 
                       counts = dt1[cat == 'A', sum(counts)] - sum(.SD$counts))), 
    by = cat]

Of:

   cat ind counts
 1:   A   1     10
 2:   A   2      9
 3:   A   3      8
 4:   A   0      0
 5:   B   4      7
 6:   B   5      6
 7:   B   0     14
 8:   C   6      5
 9:   C   7      4
10:   C   0     18
11:   D   8      3
12:   D   0     24

+1

LyzandeR 05 May '17 at 9:31

source to share

You said you were effective, so ... That's two times; unique, most likely vectorized, and data.table by for the sum must compile in a c loop.

> dt1[, .SD
      ][, ca := sum(.SD[cat == 'A', counts])
      ][, cc := sum(counts), cat
      ][, cd := ca - cc
      ][, rbind(.SD, unique(.SD, by=c('cat'))[, `:=`(ind=0)])
      ][ind == 0, counts := cd
      ][, .(cat, ind, counts)
      ][order(cat, ind)
      ]

    cat ind counts
 1:   A   0      0
 2:   A   1     10
 3:   A   2      9
 4:   A   3      8
 5:   B   0     14
 6:   B   4      7
 7:   B   5      6
 8:   C   0     18
 9:   C   6      5
10:   C   7      4
11:   D   0     24
12:   D   8      3
>

0

Clayton stanley 05 May '17 at 19:33

source to share

lukeA · Accepted Answer · 2017-05-05T11:43:28+0000

You could do

require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))
dt1[,c:=sum(counts[cat=="A"])][,.(ind=c(ind,0), counts=c(counts,c[.N]-sum(counts))),cat][]
#     cat ind counts
#  1:   A   1     10
#  2:   A   2      9
#  3:   A   3      8
#  4:   A   0      0
#  5:   B   4      7
#  6:   B   5      6
#  7:   B   0     14
#  8:   C   6      5
#  9:   C   7      4
# 10:   C   0     18
# 11:   D   8      3
# 12:   D   0     24

An efficient way to add a row with a difference from a specific group in a data table

More articles: