Multivariate KS test in R
We can run the KS test to evaluate whether the distributions of two datasets differ, as described here.
Let's take the following data:
set.seed(123)
N <- 1000
var1 <- runif(N, min=0, max=0.5)
var2 <- runif(N, min=0.3, max=0.7)
var3 <- rbinom(n=N, size=1, prob = 0.45)
df <- data.frame(var1, var2, var3)
Then we can split the data based on the value of var3:
df.1 <- subset(df, var3 == 1)
df.2 <- subset(df, var3 == 0)
We can now run the Kolmogorov-Smirnov test to test for differences in the distributions of each individual variable.
ks.test(jitter(df.1$var1), jitter(df.2$var1))
ks.test(jitter(df.1$var2), jitter(df.2$var2))
Unsurprisingly, neither test detects a difference, so there is no evidence that the two groups were drawn from different distributions. This can be visualized with:
plot(ecdf(df.1$var1), col=2)
lines(ecdf(df.2$var1))
plot(ecdf(df.1$var2), col=3)
lines(ecdf(df.2$var2), col=4)
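The ECDF plots above tie directly to the test: the two-sample KS statistic is the largest vertical gap between the two ECDF curves. A minimal sketch for var1, assuming the same simulated data and skipping the jitter() step (the runif draws are continuous, so ties are not a concern here); the names x, y, grid, D_manual and D_ks are introduced only for illustration:

```r
# Recreate the two var1 groups (only var1 and var3 are needed here)
set.seed(123)
N <- 1000
var1 <- runif(N, min = 0, max = 0.5)
var3 <- rbinom(n = N, size = 1, prob = 0.45)
x <- var1[var3 == 1]
y <- var1[var3 == 0]

# Two-sample KS statistic: the largest vertical gap between the two ECDFs,
# evaluated at every observed value
Fx <- ecdf(x)
Fy <- ecdf(y)
grid <- sort(c(x, y))
D_manual <- max(abs(Fx(grid) - Fy(grid)))

# Compare with the statistic reported by ks.test()
D_ks <- unname(ks.test(x, y)$statistic)
all.equal(D_manual, D_ks)
```

This makes it clearer why the univariate test does not extend trivially to var1 and var2 together: in two dimensions there is no single natural ordering along which to build the ECDF.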
But now we want to consider whether there are differences between var3 == 0 and var3 == 1 when we consider both var1 and var2 together.
Is there an R package to run such a test when we have multiple predictors?
A similar question was asked here but received no answers.
There appears to be literature on the topic (Example 1, Example 2), but none of it relates to R.
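For context, one generic way to compare two groups on var1 and var2 jointly without a dedicated package is a permutation test on a multivariate distance statistic. The sketch below is not a multivariate KS test: it swaps in the energy distance between the two groups, computed with base R only. The helper energy_stat and the number of permutations B are assumptions made for illustration.

```r
# Simulated data as in the question
set.seed(123)
N <- 1000
df <- data.frame(
  var1 = runif(N, min = 0, max = 0.5),
  var2 = runif(N, min = 0.3, max = 0.7),
  var3 = rbinom(n = N, size = 1, prob = 0.45)
)

# Pairwise Euclidean distances over (var1, var2), computed once
D <- as.matrix(dist(df[, c("var1", "var2")]))

# Hypothetical helper: energy distance between the groups coded by g (0/1),
# i.e. 2 * mean between-group distance minus the two mean within-group distances
energy_stat <- function(D, g) {
  a <- g == 1
  b <- !a
  2 * mean(D[a, b]) - mean(D[a, a]) - mean(D[b, b])
}

obs <- energy_stat(D, df$var3)

# Permutation null: shuffle the group labels and recompute the statistic
B <- 199
perm <- replicate(B, energy_stat(D, sample(df$var3)))

# Permutation p-value (the +1 counts the observed labelling itself)
p_value <- (1 + sum(perm >= obs)) / (B + 1)
p_value
```

Because the distance matrix is computed once and only the labels are shuffled, each permutation is cheap. Whether this statistic is appropriate depends on the kind of difference one wants to detect, so it should be read as a starting point rather than a recommendation.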