Incorporating uncertainty into the pymc3 model

I have a dataset for which I have the mean, standard deviation, and number of observations for each point (i.e., I have knowledge regarding the accuracy of the measure). In the traditional pymc3 model, where I only look at means, I can do something like:

x = data['mean']

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    y = a + b*x

    eps= pm.HalfNormal('eps', sd=1)
    likelihood = pm.Normal('likelihood', mu=y, sd=eps, observed=x)

      

What is the best way to include the observation variance information in the model? Obviously, the result must weigh the low variance observations more heavily than the large variance (less definite) observations.

One approach suggested by the statistician was as follows:

x = data['mean']   # mean of observation
x_sd = data['sd']  # sd of observation
x_n = data['n']    # of measures for observation
x_sem = x_sd/np.sqrt(x_n)

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    y = a + b*x

    eps = pm.HalfNormal('eps', sd=1)
    obs = mc.Normal('obs', mu=x, sd=x_sem, shape=len(x))
    likelihood = pm.Normal('likelihood', mu=y, eps=eps, observed=obs)

      

However, when I run this I get:

TypeError: observed needs to be data but got: <class 'pymc3.model.FreeRV'>

      

I am running pymc3 master branch (3.0 has some performance issues resulting in very slow fetch times).

+3


source to share


2 answers


You are close, you just need to make small changes. The main reason is that PyMC3 data is always constant. Check out the following code:

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    mu = a + b*x

    mu_est = pm.Normal('mu_est', mu, x_sem, shape=len(x))

    likelihood = pm.Normal('likelihood', mu=mu_est, sd=x_sd, observed=x)

      



Note that I am capturing the data and entering the observed uncertainty at two points: for estimate mu_est

and for probability. Of course, you can choose not to use x_sem

or x_sd

and instead evaluate them as you did in your code with a variable eps

.

In a historical note, code with "random data" used to work with PyMC3 (at least for some models), but assuming it was not specifically designed to work this way, the developers decided to prevent the user from using random data, and that explains Your message.

+4


source


I have a question regarding which part of the data has uncertainties:

  • How could you handle this with the uncertainty of y instead of x
  • And how can you deal with uncertainty in both x and y?


Thanks you

0


source







All Articles