Incorporating uncertainty into the pymc3 model

Question

Incorporating uncertainty into the pymc3 model

I have a dataset for which I have the mean, standard deviation, and number of observations for each point (i.e., I have knowledge regarding the accuracy of the measure). In the traditional pymc3 model, where I only look at means, I can do something like:

x = data['mean']

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    y = a + b*x

    eps= pm.HalfNormal('eps', sd=1)
    likelihood = pm.Normal('likelihood', mu=y, sd=eps, observed=x)

What is the best way to include the observation variance information in the model? Obviously, the result must weigh the low variance observations more heavily than the large variance (less definite) observations.

One approach suggested by the statistician was as follows:

x = data['mean']   # mean of observation
x_sd = data['sd']  # sd of observation
x_n = data['n']    # of measures for observation
x_sem = x_sd/np.sqrt(x_n)

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    y = a + b*x

    eps = pm.HalfNormal('eps', sd=1)
    obs = mc.Normal('obs', mu=x, sd=x_sem, shape=len(x))
    likelihood = pm.Normal('likelihood', mu=y, eps=eps, observed=obs)

However, when I run this I get:

TypeError: observed needs to be data but got: <class 'pymc3.model.FreeRV'>

I am running pymc3 master branch (3.0 has some performance issues resulting in very slow fetch times).

+3

statistics pymc3

Brad May 23 '17 at 18:19

source to share

2 answers

I have a question regarding which part of the data has uncertainties:

How could you handle this with the uncertainty of y instead of x
And how can you deal with uncertainty in both x and y?

Thanks you

0

enderswype June 13. At 10:12

source to share

aloctavodia · Accepted Answer · 2017-05-24T06:31:03+0000

You are close, you just need to make small changes. The main reason is that PyMC3 data is always constant. Check out the following code:

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    mu = a + b*x

    mu_est = pm.Normal('mu_est', mu, x_sem, shape=len(x))

    likelihood = pm.Normal('likelihood', mu=mu_est, sd=x_sd, observed=x)

Note that I am capturing the data and entering the observed uncertainty at two points: for estimate mu_est

and for probability. Of course, you can choose not to use x_sem

or x_sd

and instead evaluate them as you did in your code with a variable eps

.

In a historical note, code with "random data" used to work with PyMC3 (at least for some models), but assuming it was not specifically designed to work this way, the developers decided to prevent the user from using random data, and that explains Your message.

Incorporating uncertainty into the pymc3 model

More articles: