Python: inverse empirical cumulative distribution function (ECDF)?

We can create an ECDF with

import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF
ecdf = ECDF([3, 3, 1, 4])

      

and evaluate the ECDF at a point x with

ecdf(x)

      

However, what if I want to know x for the 97.5th percentile?

From http://www.statsmodels.org/stable/generated/statsmodels.distributions.empirical_distribution.ECDF.html?highlight=ecdf it doesn't seem to have been implemented.

Is there a way to do this? Or any other libraries?



2 answers


Since the empirical CDF simply puts a mass of 1/n at each data point, the 97.5th percentile is just the smallest data point that is greater than or equal to 97.5% of the data points. To find this value, sort the data in ascending order and take the value at position 0.975n in the sorted list.

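import math

sample = [1, 5, 2, 10, -19, 4, 7, 2, 0, -1]
n = len(sample)
sorted_sample = sorted(sample)
# The ceil(0.975 * n)-th smallest value sits at zero-based index ceil(0.975 * n) - 1.
print(sorted_sample[math.ceil(n * 0.975) - 1])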

      

Which produces:

10

      



Recall how the quantile function is defined for discrete distributions such as the empirical CDF (sorry, I can't insert the link, since this is my first post): it is the smallest x whose CDF value is at least p. It follows that we should take the 0.975n-th value (rounded up) of the sorted data.
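As a rough sketch of that definition, the rounding-up rule can be wrapped in a small function that works for any probability p in (0, 1]. The helper name inverse_ecdf is hypothetical and not part of statsmodels; the rounding guard against floating-point noise is also an assumption of this sketch.

```python
import math

def inverse_ecdf(sample, p):
    """Return the smallest data point whose ECDF value is at least p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(sample)
    n = len(ordered)
    # The ECDF reaches k / n at the k-th smallest point, so we need k = ceil(p * n).
    # round() guards against products such as 0.7 * 10 = 7.000000000000001.
    k = max(1, math.ceil(round(p * n, 9)))
    return ordered[k - 1]

sample = [1, 5, 2, 10, -19, 4, 7, 2, 0, -1]
print(inverse_ecdf(sample, 0.975))
print(inverse_ecdf(sample, 0.5))
```

If NumPy is already a dependency, releases 1.22 and later appear to offer the same definition through np.quantile(sample, 0.975, method='inverted_cdf'), which may be worth checking against this helper.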

Hope this helps!

Edited (1/16/18) for readability.



This is my suggestion: linear interpolation, because distribution functions can only be estimated effectively from fairly large samples. The interpolating line segments can be obtained because their endpoints occur at the distinct values in the sample.

import statsmodels.distributions.empirical_distribution as edf
from scipy.interpolate import interp1d
import numpy as np
import matplotlib.pyplot as plt

sample = [1,4,2,6,5,5,3,3,5,7]
sample_edf = edf.ECDF(sample)

slope_changes = sorted(set(sample))

sample_edf_values_at_slope_changes = [ sample_edf(item) for item in slope_changes]
inverted_edf = interp1d(sample_edf_values_at_slope_changes, slope_changes)

x = np.linspace(0.1, 1)
y = inverted_edf(x)
plt.plot(x, y, 'ro', x, y, 'b-')
plt.show()

print ('97.5 percentile:', inverted_edf(0.975))

      

It produces the following output:



97.5 percentile: 6.75

      

and this graph: [figure: inverted empirical CDF]
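A similar interpolated inverse can be sketched with NumPy alone, assuming np.interp is an acceptable stand-in for interp1d. The function name interpolated_inverse_ecdf is made up for illustration. One difference to be aware of: interp1d with its default settings raises an error for probabilities below the smallest ECDF value, whereas np.interp clamps them to the smallest data point.

```python
import numpy as np

def interpolated_inverse_ecdf(sample):
    """Build a piecewise-linear inverse of the ECDF of `sample`."""
    # Distinct data points and how often each occurs.
    values, counts = np.unique(sample, return_counts=True)
    # ECDF evaluated at each distinct data point (strictly increasing).
    ecdf_values = np.cumsum(counts) / len(sample)

    def inverse(p):
        # Interpolate linearly between (ECDF value, data point) pairs.
        return np.interp(p, ecdf_values, values)

    return inverse

sample = [1, 4, 2, 6, 5, 5, 3, 3, 5, 7]
inverse = interpolated_inverse_ecdf(sample)
print('97.5 percentile:', inverse(0.975))
```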
