Choosing the column names of the smallest values (excluding 0) in each row of a data frame in R

Let's say I have a dataframe:

df <- data.frame(var1 = c(1, 3, 5, 7), var2 = c(4, 6, 8, 10), var3 = c(11, 12, 13, 14))
df

  var1 var2 var3
    1    4   11
    3    6   12
    5    8   13
    7   10   14


Now I calculate the Euclidean distance between every pair of rows, using var1 and var2:

library(fields)
df_dist <- rdist(df[, 1:2])
df_dist
         1        2        3        4
1 0.000000 2.828427 5.656854 8.485281
2 2.828427 0.000000 2.828427 5.656854
3 5.656854 2.828427 0.000000 2.828427
4 8.485281 5.656854 2.828427 0.000000
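If installing fields is not an option, base R's dist() should give the same Euclidean distances; as.matrix() expands its lower-triangle object into a full square matrix like the one above (labelled 1 to 4). A minimal sketch using the same df:

```r
df <- data.frame(var1 = c(1, 3, 5, 7), var2 = c(4, 6, 8, 10), var3 = c(11, 12, 13, 14))

# Pairwise Euclidean distances between rows, using var1 and var2 only
df_dist <- as.matrix(dist(df[, 1:2]))
round(df_dist, 6)
```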


Now my goal is to select, for each row, the names of the two columns with the lowest values in that row (excluding the 0, i.e. the distance of a point from itself). So for row 1 the output should be columns 2 and 3, for row 2 it should be columns 1 and 3, and so on.
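The selection rule for a single row can be sketched as follows. This is an illustration, not the asker's code: instead of skipping position 1 of order(), it sets the diagonal to Inf so a row's self-distance can never be picked. That variant also stays correct if two points coincide and share a distance of 0, where skipping position 1 could drop the wrong column. The name nearest_row1 is just for illustration.

```r
df <- data.frame(var1 = c(1, 3, 5, 7), var2 = c(4, 6, 8, 10))
m <- as.matrix(dist(df))

diag(m) <- Inf                      # a row's distance to itself is never a candidate
nearest_row1 <- order(m[1, ])[1:2]  # column positions of the two smallest distances in row 1
nearest_row1
```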

I can do this with a for loop, but it takes a long time on a large dataset. Is there a faster way, e.g. with apply, lapply, etc.?

The loop looks like this:

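d <- as.data.frame(df_dist)
# Set the column and row names to the var3 values
colnames(d) <- df$var3
rownames(d) <- df$var3

# Initialize the result
e <- NULL

for (i in 1:nrow(d)) {
  # Sort row i in ascending order; position 1 is the zero self-distance, so keep 2:3
  # (as.numeric() turns the one-row data frame into a plain vector for order())
  tmp <- colnames(d)[order(as.numeric(d[i, ]))][2:3]
  e <- rbind(e, tmp)
}

f <- as.data.frame(e)
rownames(f) <- df$var3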
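Two things in this loop are likely costly on large data: d[i, ] builds a new one-row data frame with one column per point on every iteration, and rbind(e, tmp) copies the whole accumulated result each time it grows. A sketch that keeps a plain matrix and preallocates the result, assuming base dist() as a stand-in for rdist():

```r
df <- data.frame(var1 = c(1, 3, 5, 7), var2 = c(4, 6, 8, 10), var3 = c(11, 12, 13, 14))

# Keep the distances as a matrix; extracting a row from a matrix is cheap
d <- as.matrix(dist(df[, 1:2]))
dimnames(d) <- list(df$var3, df$var3)

# Preallocate the result instead of growing it with rbind()
e <- matrix(NA_character_, nrow = nrow(d), ncol = 2,
            dimnames = list(rownames(d), NULL))
for (i in seq_len(nrow(d))) {
  # position 1 of order() is the zero self-distance, so keep 2:3
  e[i, ] <- colnames(d)[order(d[i, ])[2:3]]
}
f <- as.data.frame(e, stringsAsFactors = FALSE)
f
```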

1 answer


This can be done with apply() over the rows, without an explicit loop:

df = read.table(text="1        2        3        4
1 0.000000 2.828427 5.656854 8.485281
2 2.828427 0.000000 2.828427 5.656854
3 5.656854 2.828427 0.000000 2.828427
4 8.485281 5.656854 2.828427 0.000000")

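# read.table() turns the numeric header into column names X1..X4;
# order(x)[1] is each row's zero self-distance, so [2:3] picks the next two
t(apply(df, 1, function(x) colnames(df)[order(x)[2:3]]))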


Output:



  [,1] [,2]
1 "X2" "X3"
2 "X1" "X3"
3 "X2" "X4"
4 "X3" "X2"


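So for row 4, column X3 holds the lowest non-zero value and X2 the second lowest. Row 3 returns X2 and X4 because those two distances are tied, and order() keeps tied values in their original column order.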
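For very large matrices, a sketch that avoids calling an R function once per row: set the diagonal to Inf, then use base R's max.col() on the negated matrix to find every row's smallest entry in one vectorized pass, blank those entries out, and repeat k times. The helper name nearest_k_cols is hypothetical. With ties.method = "first" it should break ties the same way order() does, so it returns the same column positions as the apply() output above. It still needs the full n x n distance matrix in memory, and k must be smaller than the number of rows.

```r
# Hypothetical helper: column positions of the k smallest off-diagonal values per row
nearest_k_cols <- function(m, k = 2) {
  m <- as.matrix(m)
  diag(m) <- Inf                      # exclude each row's self-distance
  rows <- seq_len(nrow(m))
  out <- matrix(NA_integer_, nrow = nrow(m), ncol = k,
                dimnames = list(rownames(m), NULL))
  for (j in seq_len(k)) {
    # column of the smallest remaining value in every row (first one on ties)
    idx <- max.col(-m, ties.method = "first")
    out[, j] <- idx
    m[cbind(rows, idx)] <- Inf        # remove it before searching for the next one
  }
  out
}

pts <- data.frame(var1 = c(1, 3, 5, 7), var2 = c(4, 6, 8, 10))
res <- nearest_k_cols(dist(pts))
res
```

To get names instead of positions, index the column names with the result, e.g. matrix(colnames(m)[res], nrow = nrow(res)).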

Hope this helps!
