Condition inside a double loop over rows and columns
I have an "index out of bounds" problem. For each observation, I want to find the first and last month of a run of three consecutive 1s (or TRUE). I want to create two new columns, "begin" and "end", holding the corresponding first and last month. In my example, for observation 1, begin is avril and end is juin; for observation 5, begin is fevrier and end is avril; for observation 9, begin is janvier and end is mars; and so on.
I tried to do this:
nom <- letters[1:5]
pseudo <- paste("name", 21:25, sep = "")
janvier <- c(0, 1, 1, 1, 0)
fevrier <- c(1, 1, 1, 1, 1)
mars <- c(0, 0, 0, 1, 1)
avril <- c(1, 1, 1, 0, 1)
mai <- c(1, 0, 1, 1, 1)
juin <- c(1, 1, 0, 1, 0)
df <- data.frame(nom = nom, pseudo = pseudo, janvier = janvier,
                 fevrier = fevrier, mars = mars, avril = avril,
                 mai = mai, juin = juin)
dfm <- as.matrix(df[, -c(1, 2)])
my_matrix <- matrix(nrow = 10, ncol = 6)
for (i in 1:dim(dfm)[1]) {
  for (j in 1:dim(dfm)[2]) {
    if (dfm[i, j] + dfm[i, j + 1] + dfm[i, j + 2] == 3) {
      my_matrix[i, j] <- "periode_ok"
      my_matrix[i, j + 1] <- "periode_ok"
      my_matrix[i, j + 2] <- "periode_ok"
    }
  }
}
The output should be as follows:
begin <- c("avril", "no info", "no info", "janvier", "fevrier",
           "avril", "no info", "no info", "janvier", "fevrier")
end <- c("juin", "no info", "no info", "mars", "avril",
         "juin", "no info", "no info", "mars", "avril")
output <- data.frame(nom = nom, pseudo = pseudo, janvier = janvier,
                     fevrier = fevrier, mars = mars, avril = avril,
                     mai = mai, juin = juin, begin = begin, end = end)
Any help would be appreciated.
First of all, constructs like 1:dim(dfm)[1] are dangerous: if dim(dfm)[1] is zero, you get the perfectly valid vector 1:0, that is c(1, 0), and the loop will try to access element zero of the vector or, in this case, of the matrix, which fails with an error. The recommended solution is to use seq_len(...). Second, instead of dim(dfm)[.] I used nrow and ncol. Now for your mistake. You are referring to columns j + 1 and j + 2, so as soon as j reaches ncol(dfm) - 1, the index j + 2 is out of bounds. The code below drops the last two values of j from the inner loop.
n <- ncol(dfm)
for (i in seq_len(nrow(dfm))) {
  # drop the last two column indices so that j + 2 never exceeds n
  for (j in seq_len(n)[-c(n - 1, n)]) {
    if (dfm[i, j] + dfm[i, j + 1] + dfm[i, j + 2] == 3) {
      my_matrix[i, j] <- "periode_ok"
      my_matrix[i, j + 1] <- "periode_ok"
      my_matrix[i, j + 2] <- "periode_ok"
    }
  }
}
my_matrix
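The point about 1:dim(dfm)[1] can be checked with a small sketch. The matrix empty is a hypothetical zero-row matrix introduced only for illustration; it is not part of the question's data.

```r
# Hypothetical zero-row matrix, used only to illustrate the edge case
empty <- matrix(numeric(0), nrow = 0, ncol = 3)

# 1:nrow(empty) counts down from 1 to 0, so a loop over it would run twice
colon_idx <- 1:nrow(empty)

# seq_len() yields an empty sequence, so a loop over it would not run at all
safe_idx <- seq_len(nrow(empty))

print(colon_idx)         # [1] 1 0
print(length(safe_idx))  # [1] 0
```

With seq_len(), the loop body is simply skipped for empty input, so no special case is needed.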
Of course, there is a vectorized solution for this, but if you want to fix the for loop, you need to limit j to the number of columns of dfm minus 2, because you are looking two columns ahead. Based on what you have provided, the code below should help you; however, it is not clear how you get 10 rows (the data duplicated twice) out of the 5 rows of df.
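# one row per observation, pre-filled with the default value
my_matrix <- matrix("no info", nrow = 5, ncol = 2)
colnames(my_matrix) <- c("begin", "end")
for (i in 1:dim(dfm)[1]) {
  # stop two columns before the end so that j + 2 stays in bounds
  for (j in 1:(dim(dfm)[2] - 2)) {
    if (dfm[i, j] + dfm[i, j + 1] + dfm[i, j + 2] == 3) {
      my_matrix[i, 1] <- colnames(dfm)[j]
      my_matrix[i, 2] <- colnames(dfm)[j + 2]
      break  # keep only the first run of three found in this row
    }
  }
}
output <- cbind(df, my_matrix)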
Then the result will be:
output
#   nom pseudo janvier fevrier mars avril mai juin   begin     end
# 1   a name21       0       1    0     1   1    1   avril    juin
# 2   b name22       1       1    0     1   0    1 no info no info
# 3   c name23       1       1    0     1   1    0 no info no info
# 4   d name24       1       1    1     0   1    1 janvier    mars
# 5   e name25       0       1    1     1   1    0 fevrier   avril
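The vectorized solution mentioned at the start of this answer is not spelled out. One possible sketch, assuming the same 0/1 month data as in the question (rebuilt here so the block runs on its own) and at least three month columns, adds three shifted column slices and picks the first window whose sum is 3. The names m, window_sum and first_start are illustrative and do not come from the original answer.

```r
# Same 0/1 data as in the question, rebuilt so the example is self-contained
m <- cbind(janvier = c(0, 1, 1, 1, 0),
           fevrier = c(1, 1, 1, 1, 1),
           mars    = c(0, 0, 0, 1, 1),
           avril   = c(1, 1, 1, 0, 1),
           mai     = c(1, 0, 1, 1, 1),
           juin    = c(1, 1, 0, 1, 0))
n <- ncol(m)  # assumed to be at least 3

# Sum of each window of three consecutive months, for all rows at once
window_sum <- m[, 1:(n - 2), drop = FALSE] +
              m[, 2:(n - 1), drop = FALSE] +
              m[, 3:n,       drop = FALSE]

# Column index of the first window equal to 3 in each row, NA when none exists
first_start <- unname(apply(window_sum == 3, 1, function(hit) which(hit)[1]))

begin <- ifelse(is.na(first_start), "no info", colnames(m)[first_start])
end   <- ifelse(is.na(first_start), "no info", colnames(m)[first_start + 2])

print(data.frame(begin = begin, end = end))
```

Like the loop with break, this keeps only the first run of three in each row; the row-wise apply() call could be replaced by another first-match lookup if the data were very large.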