R: using if / else to add a column to a list with objects of varying length

I am trying to add a column of values to the elements of an R list, where each element has a different length. Here's an example data frame foo:

A   B   C   
1   1   150
1   2   25
1   4   30
2   1   200
2   3   15
3   4   30

      

I first split foo into a list, with one element per unique value of A. Now I would like to write a function that: a) sums the C values for each value of A, but b) excludes rows where B == 4 from that sum, c) adds the sum as a new column D, and d) divides C by D to get the fraction (column E). Ultimately this will be merged into a new df that looks like this:

A   B   C   D   E
1   1   150 175 0.857
1   2   25  175 0.143
1   4   30  175 0.171
2   1   200 215 0.930
2   3   15  215 0.070
3   4   30  0   0/NA

      

However, I am having problems because for some values of A the only rows are ones where B == 4 (here, A == 3), so when I try to divide C by D, I get error messages.

Is there a way to include an if / else statement in the function so that when every row for a given value of A has B == 4, the operation is skipped and a default value (such as 0 or NA) is placed in the added column?

Subsetting df to drop the cases where B == 4 makes later operations more difficult, but including the cases where B == 4 makes the sum / proportion calculation inaccurate.

Any help is appreciated! Here's the current code:

goo <- lapply(foo,function(df){
  df$D <- sum(df$C, na.rm = TRUE)
  df$E <- df$C / df$D
  ###  .....
  df
})

      



3 answers


This is how I would do it using dplyr:

library(dplyr)
newfoo <- foo %>%
  group_by(A) %>%
  mutate(D = sum(C[B != 4]),
         E = C/D)
#newfoo                   # the resulting data.frame
#Source: local data frame [6 x 5]
#Groups: A
#
#  A B   C   D          E
#1 1 1 150 175 0.85714286
#2 1 2  25 175 0.14285714
#3 1 4  30 175 0.17142857
#4 2 1 200 215 0.93023256
#5 2 3  15 215 0.06976744
#6 3 4  30   0        Inf

      



Or, if you want to avoid Inf, you can use ifelse like this:

newfoo <- foo %>%
  group_by(A) %>%
  mutate(D = sum(C[B != 4]),
         E = ifelse(D == 0, 0, C/D))
#Source: local data frame [6 x 5]
#Groups: A
#
#  A B   C   D          E
#1 1 1 150 175 0.85714286
#2 1 2  25 175 0.14285714
#3 1 4  30 175 0.17142857
#4 2 1 200 215 0.93023256
#5 2 3  15 215 0.06976744
#6 3 4  30   0 0.00000000
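
If NA is preferred to 0 for groups where every row has B == 4 (the expected output in the question allows either), the same pipeline could be adapted roughly as below. This is only a sketch: the data.frame() call rebuilds foo from the question's table so the example runs on its own, and the choice of NA_real_ as the placeholder is an assumption, not part of the original answer.

```r
library(dplyr)

# foo rebuilt from the table in the question (assumed to be a plain data frame)
foo <- data.frame(
  A = c(1, 1, 1, 2, 2, 3),
  B = c(1, 2, 4, 1, 3, 4),
  C = c(150, 25, 30, 200, 15, 30)
)

newfoo <- foo %>%
  group_by(A) %>%
  mutate(D = sum(C[B != 4]),                    # sum of C per group, ignoring rows with B == 4
         E = ifelse(D == 0, NA_real_, C / D)) %>%  # NA instead of Inf when the sum is 0
  ungroup()
```

Using NA rather than 0 keeps "no valid denominator" distinguishable from a genuine fraction of zero in later steps.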

      



And a (possible) data.table solution:

library(data.table)
setDT(foo)[, D := sum(C[B != 4]), by = A][, E := C/D]
# foo
#    A B   C   D          E
# 1: 1 1 150 175 0.85714286
# 2: 1 2  25 175 0.14285714
# 3: 1 4  30 175 0.17142857
# 4: 2 1 200 215 0.93023256
# 5: 2 3  15 215 0.06976744
# 6: 3 4  30   0        Inf

      



Not sure what you want to put into column E when A == 3, but you can use is.finite for it and avoid messing around with ifelse, like this (replacing non-finite values with 0):

setDT(foo)[, D := sum(C[B!=4]), by = A][, E := C/D][!is.finite(E), E := 0]

      



Here is a solution using base R.

First, make sure the data is modeled appropriately, converting A to a factor if it is not already one:

df$A <- factor(df$A)

      

We can now compute D with tapply, which iterates over the groups and returns the result as an array named by the levels of A. We apply it to the subset of df where B != 4.

df$D <- with(subset(df, B != 4), tapply(C, A, sum))[df$A]

      

Note that since A is a factor, we can use it to index into the array to perform the merge; a group with no rows where B != 4 comes out as NA. Now we can use ifelse to calculate E:

df$E <- with(df, ifelse(is.na(D), 0, C/D))
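
For the split / lapply approach described in the question, an explicit if / else inside the function is one way to handle the all-B == 4 groups. The following is a minimal sketch in base R only; the helper name add_fraction, the rebuilt foo, and the defaults (D = 0, E = NA) are assumptions chosen to match the question's expected output, not something taken from the answers above.

```r
# foo rebuilt from the table in the question
foo <- data.frame(
  A = c(1, 1, 1, 2, 2, 3),
  B = c(1, 2, 4, 1, 3, 4),
  C = c(150, 25, 30, 200, 15, 30)
)

# Hypothetical helper: adds D (sum of C over rows with B != 4) and E (C / D)
add_fraction <- function(df) {
  keep <- df$B != 4
  if (any(keep)) {
    df$D <- sum(df$C[keep], na.rm = TRUE)
    df$E <- df$C / df$D
  } else {
    # every row in this group has B == 4: skip the division, use defaults
    df$D <- 0
    df$E <- NA_real_
  }
  df
}

goo <- lapply(split(foo, foo$A), add_fraction)

# Merge the list back into a single data frame
result <- do.call(rbind, goo)
rownames(result) <- NULL
```

The if / else works here because the condition is evaluated once per list element; the vectorised ifelse used above is the better fit when the test has to be applied row by row.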

      
