R: Algorithm for faster detection of missing values
I have a problem setting missing values ββin a dataframe. The first three columns have the product ID, store ID, and the number of weeks. There are also 28 columns from 4 to 31 corresponding to the last 28 days of the sale (the last 7 days are the days of our week). I want to set the missing values ββby comparing two records with the same first and second column, but with a different number of weeks.
corrections <- function(x,y){
#the functions changes vector y if the difference between weeks is not greeter than 3
if (x[1]==y[1] && x[2]==y[2] && -(x[3]-y[3])<=3){
t=y[3]-x[3]
t=as.integer(t)
a=x[(4+ (t*7) ):31]
b=y[4:(31- (t*7)) ]
c= a-b
for (i in 1:(28-(t*7))){
if (is.na(c[i]))
{
if (!(is.na(a[i]) && is.na(b[i])))
{
if (is.na(b[i]))
b[i]=a[i]
else
a[i]=b[i]
}
}
}
y[4:(31- t*7)]=b
}
return(y)
}
for (i in 2:(dim(salesTraining)[1]) {
salesTraining[i,]=corrections(salesTraining[i-1,], salesTraining[i,])
}
The loop takes 1 minute for every 1000 records, so if my data has 212000 records it will take ~ 3.5 hours (if it's linear complexity). Is there any mistake or can I do it better - faster?
Example data frame:
productID storeID weekInData dailySales1 dailySales2 dailySales3 dailySales4 dailySales5
1 1 1 37 0 0 0 0 0
2 1 1 38 0 0 0 0 0
3 1 1 39 0 0 0 0 0
4 1 1 40 0 NA 0 NA 2
5 1 1 41 NA 0 NA 0 0
6 1 1 42 0 0 0 NA 0
7 1 1 43 0 0 NA 0 NA
8 1 1 44 0 2 1 NA 0
9 1 1 45 NA 0 0 NA 0
10 1 1 46 NA 0 0 NA NA
dailySales6 dailySales7 dailySales8 dailySales9 dailySales10 dailySales11 dailySales12 dailySales13
1 NA NA 0 NA 0 0 0 0
2 0 NA NA 0 0 0 0 0
3 0 NA 0 0 0 NA 2 NA
4 0 NA 0 NA 0 NA 0 0
5 0 0 NA 0 0 0 0 0
6 NA 0 NA 0 0 0 0 0
7 0 0 0 2 NA 0 0 0
8 0 NA 0 NA 0 NA 0 1
9 1 0 0 0 0 0 1 0
10 0 0 0 NA 0 NA 0 0
dailySales14 dailySales15 dailySales16 dailySales17 dailySales18 dailySales19 dailySales20
1 0 0 0 0 0 0 0
2 0 0 0 0 5 2 NA
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 2 1 0 0 NA
7 0 0 0 0 0 0 1
8 0 0 0 0 0 1 0
9 0 0 -1 0 0 0 0
10 0 0 0 0 0 0 0
dailySales21 dailySales22 dailySales23 dailySales24 dailySales25 dailySales26 dailySales27
1 NA 0 0 0 5 2 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 NA 1 0 0 0
6 0 0 0 0 0 0 1
7 0 0 0 0 0 1 0
8 0 0 NA 0 0 0 0
9 NA 0 0 0 NA 0 0
10 0 1 0 0 0 0 0
dailySales28 daysStoreClosed_series daysStoreClosed_target dayOfMonth dayOfYear weekOfYear month
1 0 5 2 23 356 51 12
2 0 6 2 30 363 52 12
3 0 6 1 6 5 1 1
4 0 6 1 13 12 2 1
5 0 6 1 19 18 3 1
6 0 5 1 26 25 4 1
7 0 4 1 2 32 5 2
8 0 4 1 9 39 6 2
9 0 4 1 16 46 7 2
10 0 4 1 23 53 8 2
quarter
1 4
2 4
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
+3
source to share
No one has answered this question yet
Check out similar questions: