Selecting the 5 smallest values for each row in a data frame in R
Let's say I have a dataframe:
df = data.frame(var1 = c(1,3,5,7), var2 = c(4,6,8,10), var3 = c(11,12,13,14))
df
var1 var2 var3
1 4 11
3 6 12
5 8 13
7 10 14
Now I calculate the distance between each row and every other row, using var1 and var2:
library(fields)
df_dist = rdist(df[,1:2])
df_dist
1 2 3 4
1 0.000000 2.828427 5.656854 8.485281
2 2.828427 0.000000 2.828427 5.656854
3 5.656854 2.828427 0.000000 2.828427
4 8.485281 5.656854 2.828427 0.000000
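For reference, each entry of this matrix is a plain Euclidean distance between two rows. A minimal sketch that checks one entry by hand; it uses base R's `dist()` as a stand-in for `fields::rdist` (an assumption, made only so the snippet runs without extra packages), and the names `d12` and `m` are illustrative:

```r
# Same data as above
df <- data.frame(var1 = c(1, 3, 5, 7), var2 = c(4, 6, 8, 10), var3 = c(11, 12, 13, 14))

# Euclidean distance between rows 1 and 2, computed by hand from var1 and var2
d12 <- sqrt((df$var1[1] - df$var1[2])^2 + (df$var2[1] - df$var2[2])^2)

# Full pairwise distance matrix with base R's dist() (stand-in for fields::rdist)
m <- as.matrix(dist(df[, 1:2]))

# The hand-computed value should match the [1, 2] entry of the matrix
print(d12)
print(m[1, 2])
```

Both printed values should equal sqrt(8), which is the 2.828427 shown in row 1, column 2 above.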
My goal is to select, for each row, the names of the two columns holding the lowest values in that row (excluding 0, i.e. the distance of a row from itself). So for row 1 the output should be columns 2 and 3, for row 2 it should be columns 1 and 3, and so on.
I can do this with a for loop, but that takes a long time on a large dataset. Is there a better way, using apply, lapply, etc., that would save some time?
The for-loop code looks like this:
d=as.data.frame(df_dist)
#Setting the column and row names as var3 values
colnames(d)<-df$var3
rownames(d)<-df$var3
#Initializing variable e
e<-NULL
for (i in 1:nrow(d))
{
  tmp=colnames(d)[order(d[i,], decreasing=FALSE)][2:3]
  e<-rbind(e,tmp)
}
f=as.data.frame(e)
rownames(f)<-df$var3
1 answer
This works:
df = read.table(text="1 2 3 4
1 0.000000 2.828427 5.656854 8.485281
2 2.828427 0.000000 2.828427 5.656854
3 5.656854 2.828427 0.000000 2.828427
4 8.485281 5.656854 2.828427 0.000000")
t(apply(df,1,function(x) colnames(df)[order(x)[2:3]] ))
OUTPUT:
[,1] [,2]
1 "X2" "X3"
2 "X1" "X3"
3 "X2" "X4"
4 "X3" "X2"
So for row 4, column X3 holds the smallest non-zero value and X2 holds the second smallest.
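Taking positions 2:3 of `order()` assumes the zero self-distance always sorts first, which may not hold if two rows are identical (a duplicate can sort ahead of the row itself). A hedged sketch of one way around that is to set the diagonal to `Inf` before ordering and make the number of neighbours a parameter. The helper name `k_nearest` and its arguments are hypothetical, and base R's `dist()` is used here in place of `fields::rdist` so the snippet is self-contained:

```r
# Hypothetical helper: for each row of a square distance matrix, return the
# labels of the k columns with the smallest distances, ignoring the diagonal.
k_nearest <- function(dist_mat, k = 2, labels = seq_len(nrow(dist_mat))) {
  m <- as.matrix(dist_mat)
  diag(m) <- Inf  # a row is never its own neighbour
  # One vector of k column indices per row, stacked row by row
  idx <- matrix(
    unlist(lapply(seq_len(nrow(m)), function(i) order(m[i, ])[seq_len(k)])),
    ncol = k, byrow = TRUE
  )
  # Translate indices to labels; idx is read column by column, so the
  # resulting matrix keeps the same row/column layout
  matrix(labels[idx], nrow = nrow(m),
         dimnames = list(as.character(labels), NULL))
}

# Data from the question, labelled by var3
df <- data.frame(var1 = c(1, 3, 5, 7), var2 = c(4, 6, 8, 10), var3 = c(11, 12, 13, 14))
m <- as.matrix(dist(df[, 1:2]))  # base R stand-in for fields::rdist
res <- k_nearest(m, k = 2, labels = df$var3)
print(res)
```

For the example data, row 11 should come out as 12 and 13, and row 14 as 13 and 12, matching the result above. Setting `k = 5` on a larger matrix would give the five nearest columns per row.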
Hope this helps!