Computing Anderson-Darling Test Statistics for Continuous Distributions in R

First of all, I'm not sure whether this belongs on Cross Validated or Stack Overflow. Sorry if I posted this question on the wrong site.

I am comparing several datasets to an observational dataset using R. Each has about 10 million continuous float values (the length of the data vector is not exactly the same for each dataset).

I usually calculate the Kolmogorov-Smirnov statistic using the function ks.test() from the standard package stats, but now I am especially interested in the extreme values of the distributions. From what I understand, KS largely hides them. The same happens with Kullback-Leibler (feel free to correct me if I'm wrong).

On the other hand, the Anderson-Darling test is weighted to account for the extremes of the distributions. However, I have not been able to find a simple AD test implementation that works on just two vectors as inputs (as stats::ks.test() does when it is simply given two plain vectors), and I was not able to figure out how to adapt my data to the functions I tested.
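To make the desired calling convention concrete, here is a minimal sketch of the two-vector ks.test() usage described above. The simulated vectors x and y are placeholders of my own, not the real datasets.

```r
# Two plain numeric vectors of different lengths (simulated placeholders)
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1200, mean = 0.1)

# Two-sample Kolmogorov-Smirnov test from the stats package
ks <- ks.test(x, y)
d <- unname(ks$statistic)  # KS distance D between the two empirical CDFs
p <- ks$p.value
print(c(D = d, p.value = p))
```

The goal of the question is an Anderson-Darling function that can be called in the same way.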

I looked at the following functions:

  • cvm.test() from package dgof, with option type="A2": requires a distribution as second input, not a vector
  • ad.test() from package truncgof: requires a distribution as second input
  • ad.test() from package goftest: as above
  • ad.test() from package ADGofTest: as above
  • ad.test() from package kSamples: in this case it is not clear to me what the result represents and how I could normalize it, as it seems to be highly dependent on the number of samples
  • ad.test() from package nortest: only tests for normality
  • ADbootstrap.test() from package homtest: this seems to be completely different from the standard AD test

In short, none of the above can be used like the standard function ks.test() or like the Kullback-Leibler function KLdiv from the package flexmix (which accepts a matrix of density values).

How can I calculate the AD statistic between two distributions, each represented simply as a vector of continuous data, using R?


1 answer

I am not a statistics expert; I am studying the AD test on my own and had the same question. After reading some articles, I now know how to interpret the results of ad.test() from kSamples.


The original AD test is designed to test whether a sample of numbers comes from a specific distribution. Therefore, to compare two (or more) samples, we have to use a function that implements the k-sample version of the test instead of the original one.
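For illustration, here is a minimal base-R sketch of the two-sample case of the k-sample AD statistic, as I understand the continuous (no ties) definition. The function name ad_two_sample is hypothetical and this is not the kSamples implementation; it only computes the raw statistic, with no standardization or p-value.

```r
# Two-sample Anderson-Darling statistic (k-sample form with k = 2).
# Assumption: continuous data, i.e. no ties in the pooled sample.
ad_two_sample <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  N  <- as.numeric(n1 + n2)  # numeric, to avoid integer overflow for large N
  pooled <- c(x, y)
  if (anyDuplicated(pooled) > 0) {
    stop("ties found: this sketch only covers the no-ties case")
  }
  # TRUE where the j-th smallest pooled value came from x
  from_x <- rep(c(TRUE, FALSE), times = c(n1, n2))[order(pooled)]
  j  <- as.numeric(seq_len(N - 1))
  m1 <- cumsum(from_x)[seq_len(N - 1)]  # number of x values <= j-th pooled value
  m2 <- j - m1                          # same count for y
  w  <- j * (N - j)
  (sum((N * m1 - j * n1)^2 / w) / n1 +
   sum((N * m2 - j * n2)^2 / w) / n2) / N
}

a2 <- ad_two_sample(c(1, 2, 3, 4, 5), c(11, 22, 33, 44, 55))
print(a2)
```

For these two toy vectors the sketch returns about 3.9127. It relies only on sorting and cumulative sums, so it should scale to long vectors, but I have not tested it on 10 million values.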

If you pass two vectors to ad.test() from kSamples:


library(kSamples)
x <- ad.test(c(1,2,3,4,5), c(11,22,33,44,55))


the printed result ends with a matrix:


Anderson-Darling k-sample test.

Number of samples:  2
Sample sizes:  5, 5
Number of ties: 0

Mean of  Anderson-Darling  Criterion: 1
Standard deviation of  Anderson-Darling  Criterion: 0.63786

T.AD = ( Anderson-Darling  Criterion - mean)/sigma

Null Hypothesis: All samples come from a common population.

              AD  T.AD  asympt. P-value
version 1: 3.913 4.566          0.00517
version 2: 4.010 4.726          0.00452




               AD   T.AD  asympt. P-value
version 1: 3.9127 4.5664        0.0051703
version 2: 4.0100 4.7260        0.0045199


AD is the Anderson-Darling statistic calculated according to the corresponding equations (see the reference article). T.AD is calculated as (AD - (k-1)) / sigma, where k-1 is the mean of AD under the null hypothesis (the limiting distribution of the AD statistic under the null hypothesis is the (k-1)-fold convolution of the asymptotic distribution of the one-sample AD statistic) and sigma is the standard deviation of the AD statistic. The asympt. P-value is then the "p-value" we are looking for.

As for the rows, version 1 is the k-sample AD test for continuous populations, and version 2 is the variant for discrete populations. So my guess is that if your data are continuous you should take the p-value in the first row, and if they are discrete, the one in the second row.
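As a quick arithmetic check of the standardization, the values from the version 1 row of the output above can be plugged into the formula. The variable names here are my own.

```r
# Values copied from the printed output above (version 1 row)
ad    <- 3.9127   # Anderson-Darling criterion
sigma <- 0.63786  # standard deviation of the AD criterion
k     <- 2        # number of samples, so the null mean is k - 1 = 1

# Standardized statistic T.AD = (AD - (k - 1)) / sigma
t_ad <- (ad - (k - 1)) / sigma
print(t_ad)
```

This gives roughly 4.566, which agrees with the T.AD reported in the first row.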


