R: How to use the Apply function using multiple row and column entries

Jan   Feb   Mar   Apr   May   Jun   Jul   Aug  Sep   Oct   Nov   Dec  

 13    0     14     0    16     0    22     0    20     0    18     0
 30    0     30     0     0     0     0     0     0     0    30     0
 0     0     29    33     0    48     0    49     0    50     0    33
 0     45    30     0     0    55     0    69    55     0     0    40
 0     54    0     45     0    48     0    73     0    46     0    36
 16    0     15     0    13     0    16     0    24     0    23     0
 0     32    26     0     0    57     0    65    49     0     0    32
 6     0     6      0     5     0     7     0     6     0     6     0
 0     665   310   271    0   646     0   706     0   585     0   516
 0     70    41     0     0   101     0   112   112     0     0    90

      

I currently have this sparse dataframe in which I want to replace all 0s with half the value to the right and replace the December zero with half the January value.

For example, the first line should look like this:

Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec

6.5    7    7     8     8     11     11    10    10     9     9    6.5

      

Thus, the total for the year should remain constant and should be distributed more evenly over the months.

I tried to use loops, however my actual data spans several years and hundreds of thousands of lines and loops are too slow. I know that applying is the way to go but can't figure out what my function should be or how to apply it to rows and columns.

thank

+3


source to share


1 answer


I did dfout

to leave the original df

intact. dfv2

gives us what the value should be placed in dfout

and dfv3

helps to find which columns should be split (the ones that were replaced). Condition df==0

is a key element in this decision. I hope someone post data.table or decision.

dfout <- df
dfv2 <- data.frame(df[,-1], Jan = df[,1])   
dfv3 <- data.frame(Dec = df[,12], df[,-12]) 
dfout[df==0] <- dfv2[df==0]
dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2

rowSums(df) - rowSums(dfout)

#  [1] 0 0 0 0 0 0 0 0 0 0


dfout

#      Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec 
# 1    6.5   7.0   7.0   8.0   8.0  11.0  11.0  10.0  10.0   9.0   9.0   6.5 
# 2   15.0  15.0  15.0   0.0   0.0   0.0   0.0   0.0   0.0  15.0  15.0  15.0 
# 3    0.0  14.5  14.5  33.0  24.0  24.0  24.5  24.5  25.0  25.0  16.5  16.5 
# 4   22.5  22.5  30.0   0.0  27.5  27.5  34.5  34.5  55.0   0.0  20.0  20.0 
# 5   27.0  27.0  22.5  22.5  24.0  24.0  36.5  36.5  23.0  23.0  18.0  18.0 
# 6    8.0   7.5   7.5   6.5   6.5   8.0   8.0  12.0  12.0  11.5  11.5   8.0 
# 7   16.0  16.0  26.0   0.0  28.5  28.5  32.5  32.5  49.0   0.0  16.0  16.0 
# 8    3.0   3.0   3.0   2.5   2.5   3.5   3.5   3.0   3.0   3.0   3.0   3.0 
# 9  332.5 332.5 310.0 271.0 323.0 323.0 353.0 353.0 292.5 292.5 258.0 258.0 
# 10  35.0  35.0  41.0   0.0  50.5  50.5  56.0  56.0 112.0   0.0  45.0  45.0

      

Data:

df <- structure(list(Jan = c(13L, 30L, 0L, 0L, 0L, 16L, 0L, 6L, 0L, 
0L), Feb = c(0L, 0L, 0L, 45L, 54L, 0L, 32L, 0L, 665L, 70L), Mar = c(14L, 
30L, 29L, 30L, 0L, 15L, 26L, 6L, 310L, 41L), Apr = c(0L, 0L, 
33L, 0L, 45L, 0L, 0L, 0L, 271L, 0L), May = c(16L, 0L, 0L, 0L, 
0L, 13L, 0L, 5L, 0L, 0L), Jun = c(0L, 0L, 48L, 55L, 48L, 0L, 
57L, 0L, 646L, 101L), Jul = c(22L, 0L, 0L, 0L, 0L, 16L, 0L, 7L, 
0L, 0L), Aug = c(0L, 0L, 49L, 69L, 73L, 0L, 65L, 0L, 706L, 112L
), Sep = c(20L, 0L, 0L, 55L, 0L, 24L, 49L, 6L, 0L, 112L), Oct = c(0L, 
0L, 50L, 0L, 46L, 0L, 0L, 0L, 585L, 0L), Nov = c(18L, 30L, 0L, 
0L, 0L, 23L, 0L, 6L, 0L, 0L), Dec = c(0L, 0L, 33L, 40L, 36L, 
0L, 32L, 0L, 516L, 90L)), .Names = c("Jan", "Feb", "Mar", "Apr", 
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = 
"data.frame", row.names = c(NA, -10L))

      



Update:

If you want all elements to be non-null, you can use while

:

original.df <- df

while(any(df==0)) {
                   dfout <- df
                   dfv2 <- data.frame(df[,-1], Jan = df[,1])   
                   dfv3 <- data.frame(Dec = df[,12], df[,-12]) 
                   dfout[df==0] <- dfv2[df==0]
                   dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2
                   df <- dfout
                  }

rowSums(original.df) - rowSums(dfout)

#  [1] 0 0 0 0 0 0 0 0 0 0

dfout

#      Jan   Feb   Mar    Apr    May    Jun    Jul   Aug   Sep   Oct Nov   Dec 
# 1    6.5   7.0   7.0   8.00   8.00  11.00  11.00  10.0  10.0   9.0   9   6.5 
# 2   15.0  15.0  15.0   0.23   0.23   0.47   0.94   1.9   3.8   7.5  15  15.0 
# 3    7.2   7.2  14.5  33.00  24.00  24.00  24.50  24.5  25.0  25.0  16  16.5 
# 4   22.5  22.5  30.0  13.75  13.75  27.50  34.50  34.5  55.0  10.0  10  20.0 
# 5   27.0  27.0  22.5  22.50  24.00  24.00  36.50  36.5  23.0  23.0  18  18.0 
# 6    8.0   7.5   7.5   6.50   6.50   8.00   8.00  12.0  12.0  11.5  12   8.0 
# 7   16.0  16.0  26.0  14.25  14.25  28.50  32.50  32.5  49.0   8.0   8  16.0 
# 8    3.0   3.0   3.0   2.50   2.50   3.50   3.50   3.0   3.0   3.0   3   3.0 
# 9  332.5 332.5 310.0 271.00 323.00 323.00 353.00 353.0 292.5 292.5 258 258.0 
# 10  35.0  35.0  41.0  25.25  25.25  50.50  56.00  56.0 112.0  22.5  22  45.0

      

+5


source







All Articles