R: How to use the Apply function using multiple row and column entries
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
13 0 14 0 16 0 22 0 20 0 18 0
30 0 30 0 0 0 0 0 0 0 30 0
0 0 29 33 0 48 0 49 0 50 0 33
0 45 30 0 0 55 0 69 55 0 0 40
0 54 0 45 0 48 0 73 0 46 0 36
16 0 15 0 13 0 16 0 24 0 23 0
0 32 26 0 0 57 0 65 49 0 0 32
6 0 6 0 5 0 7 0 6 0 6 0
0 665 310 271 0 646 0 706 0 585 0 516
0 70 41 0 0 101 0 112 112 0 0 90
I currently have this sparse dataframe in which I want to replace all 0s with half the value to the right and replace the December zero with half the January value.
For example, the first line should look like this:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
6.5 7 7 8 8 11 11 10 10 9 9 6.5
Thus, the total for the year should remain constant and should be distributed more evenly over the months.
I tried to use loops, however my actual data spans several years and hundreds of thousands of lines and loops are too slow. I know that applying is the way to go but can't figure out what my function should be or how to apply it to rows and columns.
thank
source to share
I did dfout
to leave the original df
intact. dfv2
gives us what the value should be placed in dfout
and dfv3
helps to find which columns should be split (the ones that were replaced). Condition df==0
is a key element in this decision. I hope someone post data.table ordplyr decision.
dfout <- df
dfv2 <- data.frame(df[,-1], Jan = df[,1])
dfv3 <- data.frame(Dec = df[,12], df[,-12])
dfout[df==0] <- dfv2[df==0]
dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2
rowSums(df) - rowSums(dfout)
# [1] 0 0 0 0 0 0 0 0 0 0
dfout
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1 6.5 7.0 7.0 8.0 8.0 11.0 11.0 10.0 10.0 9.0 9.0 6.5
# 2 15.0 15.0 15.0 0.0 0.0 0.0 0.0 0.0 0.0 15.0 15.0 15.0
# 3 0.0 14.5 14.5 33.0 24.0 24.0 24.5 24.5 25.0 25.0 16.5 16.5
# 4 22.5 22.5 30.0 0.0 27.5 27.5 34.5 34.5 55.0 0.0 20.0 20.0
# 5 27.0 27.0 22.5 22.5 24.0 24.0 36.5 36.5 23.0 23.0 18.0 18.0
# 6 8.0 7.5 7.5 6.5 6.5 8.0 8.0 12.0 12.0 11.5 11.5 8.0
# 7 16.0 16.0 26.0 0.0 28.5 28.5 32.5 32.5 49.0 0.0 16.0 16.0
# 8 3.0 3.0 3.0 2.5 2.5 3.5 3.5 3.0 3.0 3.0 3.0 3.0
# 9 332.5 332.5 310.0 271.0 323.0 323.0 353.0 353.0 292.5 292.5 258.0 258.0
# 10 35.0 35.0 41.0 0.0 50.5 50.5 56.0 56.0 112.0 0.0 45.0 45.0
Data:
df <- structure(list(Jan = c(13L, 30L, 0L, 0L, 0L, 16L, 0L, 6L, 0L,
0L), Feb = c(0L, 0L, 0L, 45L, 54L, 0L, 32L, 0L, 665L, 70L), Mar = c(14L,
30L, 29L, 30L, 0L, 15L, 26L, 6L, 310L, 41L), Apr = c(0L, 0L,
33L, 0L, 45L, 0L, 0L, 0L, 271L, 0L), May = c(16L, 0L, 0L, 0L,
0L, 13L, 0L, 5L, 0L, 0L), Jun = c(0L, 0L, 48L, 55L, 48L, 0L,
57L, 0L, 646L, 101L), Jul = c(22L, 0L, 0L, 0L, 0L, 16L, 0L, 7L,
0L, 0L), Aug = c(0L, 0L, 49L, 69L, 73L, 0L, 65L, 0L, 706L, 112L
), Sep = c(20L, 0L, 0L, 55L, 0L, 24L, 49L, 6L, 0L, 112L), Oct = c(0L,
0L, 50L, 0L, 46L, 0L, 0L, 0L, 585L, 0L), Nov = c(18L, 30L, 0L,
0L, 0L, 23L, 0L, 6L, 0L, 0L), Dec = c(0L, 0L, 33L, 40L, 36L,
0L, 32L, 0L, 516L, 90L)), .Names = c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class =
"data.frame", row.names = c(NA, -10L))
Update:
If you want all elements to be non-null, you can use while
:
original.df <- df
while(any(df==0)) {
dfout <- df
dfv2 <- data.frame(df[,-1], Jan = df[,1])
dfv3 <- data.frame(Dec = df[,12], df[,-12])
dfout[df==0] <- dfv2[df==0]
dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2
df <- dfout
}
rowSums(original.df) - rowSums(dfout)
# [1] 0 0 0 0 0 0 0 0 0 0
dfout
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1 6.5 7.0 7.0 8.00 8.00 11.00 11.00 10.0 10.0 9.0 9 6.5
# 2 15.0 15.0 15.0 0.23 0.23 0.47 0.94 1.9 3.8 7.5 15 15.0
# 3 7.2 7.2 14.5 33.00 24.00 24.00 24.50 24.5 25.0 25.0 16 16.5
# 4 22.5 22.5 30.0 13.75 13.75 27.50 34.50 34.5 55.0 10.0 10 20.0
# 5 27.0 27.0 22.5 22.50 24.00 24.00 36.50 36.5 23.0 23.0 18 18.0
# 6 8.0 7.5 7.5 6.50 6.50 8.00 8.00 12.0 12.0 11.5 12 8.0
# 7 16.0 16.0 26.0 14.25 14.25 28.50 32.50 32.5 49.0 8.0 8 16.0
# 8 3.0 3.0 3.0 2.50 2.50 3.50 3.50 3.0 3.0 3.0 3 3.0
# 9 332.5 332.5 310.0 271.00 323.00 323.00 353.00 353.0 292.5 292.5 258 258.0
# 10 35.0 35.0 41.0 25.25 25.25 50.50 56.00 56.0 112.0 22.5 22 45.0
source to share