Multivariate KS test in R

We can run the KS test to evaluate whether the distributions of two datasets differ, as described here.

Let's take the following data:

set.seed(123)
N <- 1000
var1 <- runif(N, min=0, max=0.5)
var2 <- runif(N, min=0.3, max=0.7)
var3 <- rbinom(n=N, size=1, prob = 0.45)

df <- data.frame(var1, var2, var3)

      

Then we can split the data based on the value of var3:

df.1 <- subset(df, var3 == 1)
df.2 <- subset(df, var3 == 0)

      

We can now run the Kolmogorov-Smirnov test to test for differences in the distributions of each individual variable.

ks.test(jitter(df.1$var1), jitter(df.2$var1))
ks.test(jitter(df.1$var2), jitter(df.2$var2))
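
For reference, the same per-variable comparison can be collected into one vector of p-values. This is only a sketch: the name `p_values` and the Holm adjustment via `p.adjust` are my additions, not part of the original analysis, and `jitter` is dropped because `runif` output is continuous and ties are not expected here.

```r
set.seed(123)
N <- 1000
var1 <- runif(N, min = 0, max = 0.5)
var2 <- runif(N, min = 0.3, max = 0.7)
var3 <- rbinom(n = N, size = 1, prob = 0.45)
df <- data.frame(var1, var2, var3)

df.1 <- subset(df, var3 == 1)
df.2 <- subset(df, var3 == 0)

# One two-sample KS test per variable; keep only the p-value of each test
p_values <- sapply(c("var1", "var2"), function(v) {
  ks.test(df.1[[v]], df.2[[v]])$p.value
})

# Assumption: since two tests are run, adjust for multiple comparisons (Holm)
p_adjusted <- p.adjust(p_values, method = "holm")

p_values
p_adjusted
```

Note that this still tests each variable separately; it says nothing about the joint distribution of var1 and var2.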

      

Unsurprisingly, neither test detects a difference, which is consistent with both subsets having been drawn from the same distribution. This can be visualized with:

plot(ecdf(df.1$var1), col=2)
lines(ecdf(df.2$var1))

plot(ecdf(df.1$var2), col=3)
lines(ecdf(df.2$var2), col=4)
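
The plots connect directly to the test: the two-sample KS statistic D is the largest vertical gap between the two empirical CDFs. A minimal sketch for var1, where `D_manual` and `D_ks` are names I made up for illustration and `jitter` is left out so that both numbers are computed on exactly the same data:

```r
set.seed(123)
N <- 1000
var1 <- runif(N, min = 0, max = 0.5)
var2 <- runif(N, min = 0.3, max = 0.7)   # generated only to keep the random stream identical
var3 <- rbinom(n = N, size = 1, prob = 0.45)

x <- var1[var3 == 1]
y <- var1[var3 == 0]

# Both ECDFs are step functions, so the largest gap occurs at an observed point
grid <- sort(c(x, y))
D_manual <- max(abs(ecdf(x)(grid) - ecdf(y)(grid)))

# Statistic reported by ks.test on the same data
D_ks <- unname(ks.test(x, y)$statistic)

all.equal(D_manual, D_ks)
```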

      

But now we want to consider whether there are differences between var3==0 and var3==1 when we consider both var1 and var2 together. Is there an R package to run such a test when we have multiple predictors?
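
To make the question concrete, here is a rough base-R sketch of the kind of joint comparison I mean. It is not a multivariate KS test: it swaps in a permutation test on the energy distance between the two groups, purely as an illustration. The helper `energy_stat` and the choice of 199 permutations are hypothetical, picked only for this example.

```r
set.seed(123)
N <- 1000
var1 <- runif(N, min = 0, max = 0.5)
var2 <- runif(N, min = 0.3, max = 0.7)
var3 <- rbinom(n = N, size = 1, prob = 0.45)

# Pairwise Euclidean distances between all (var1, var2) points
D <- as.matrix(dist(cbind(var1, var2)))

# Hypothetical helper: energy distance between the groups coded in g (0/1)
energy_stat <- function(D, g) {
  a <- g == 1
  b <- !a
  2 * mean(D[a, b]) - mean(D[a, a]) - mean(D[b, b])
}

obs <- energy_stat(D, var3)

# Permutation null: shuffle the group labels, keep each (var1, var2) pair intact
n_perm <- 199
perm <- replicate(n_perm, energy_stat(D, sample(var3)))

# Permutation p-value with the usual +1 correction
p_value <- (1 + sum(perm >= obs)) / (n_perm + 1)
p_value
```

Because the points are kept as pairs, this compares the joint distribution of var1 and var2 rather than each margin separately, which is what I would want from a multivariate KS test as well.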

A similar question was asked here but got no answers.

It looks like there is literature on this: Example 1, Example 2.

But none of it has anything to do with R.
