Predicting from a Basic Bayesian Linear Regression with PyMC3

I would like to use my PyMC3 linear regression model to get an 80% HPD interval for the predicted variable y as new data becomes available. That is, I want to extrapolate the credible distribution of values for y at a new value x that is not in my original dataset.

Model:

import pymc3 as pm

with pm.Model() as model_tlr:
    # Priors for the intercept, slope, and scale
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    epsilon = pm.Uniform('epsilon', 0, 25)

    # Shifted exponential prior on the Student-t degrees of freedom
    nu = pm.Deterministic('nu', pm.Exponential('nu_', 1/29) + 1)
    mu = pm.Deterministic('mu', alpha + beta * x)

    # Robust (Student-t) likelihood
    yl = pm.StudentT('yl', mu=mu, sd=epsilon, nu=nu, observed=yl)

    trace_tlr = pm.sample(50000, njobs=3)

After discarding the burn-in (giving btrace_tlr), I sample from the posterior predictive and compute the HPD:

ppc_tlr = pm.sample_ppc(btrace_tlr, samples=10000, model=model_tlr)
ys = ppc_tlr['yl']
y_hpd = pm.stats.hpd(ys, alpha=0.2)  # alpha=0.2 gives an 80% HPD interval

This works great for plotting the HPD band around the central trend (using fill_between).
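For reference, the band itself can be computed without PyMC3's helper. Below is a minimal sketch of the HPD calculation on hypothetical posterior predictive draws; ys is assumed to have shape (n_samples, n_points), one column per observed x value, and all data here is made up for illustration:

```python
import numpy as np

def hpd_interval(samples, alpha=0.2):
    """Shortest interval containing (1 - alpha) of the samples."""
    sorted_s = np.sort(samples)
    n = len(sorted_s)
    n_in = int(np.floor((1 - alpha) * n))
    # Width of every candidate interval covering n_in points
    widths = sorted_s[n_in:] - sorted_s[:n - n_in]
    i = np.argmin(widths)
    return sorted_s[i], sorted_s[i + n_in]

# Hypothetical posterior predictive draws: 10000 samples at 50 x-positions
rng = np.random.RandomState(0)
x_grid = np.linspace(0, 10, 50)
ys = 2 * x_grid + rng.standard_t(df=5, size=(10000, 50))

# One (lower, upper) pair per x-position
band = np.array([hpd_interval(ys[:, j]) for j in range(ys.shape[1])])

# The band can then be drawn with:
# plt.fill_between(x_grid, band[:, 0], band[:, 1], alpha=0.3)
```

This is the same "shortest interval" definition pm.stats.hpd uses, just written out by hand.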

But now I would like to use the model to get the HPD for y when x = 126.2 (for example), where the original dataset contains no observation at x = 126.2.

The way I understand posterior predictive sampling, there are 10k samples for each of the x values in the dataset, and hence there is no corresponding sample in ys for x = 126.2, since it was never observed.

Basically, is there a way to use my model to get a distribution of credible (model-based) values of y at a predictor value x = 126.2 that only became available after the model was built? If so, how?

Thanks.

EDIT:
Found an SO post which mentions a feature under development (likely to be added to PyMC3) for predicting on new data. Does it exist?



1 answer


OK, so this is possible, more or less as described in the previous SO post. However, a sample_ppc function has since been added to PyMC3, which makes the run_ppc function from that answer redundant.

First, set up a Theano shared variable for x:

from theano import shared
x_shared = shared(x)

Then use x_shared in place of x when creating the model (i.e. mu = pm.Deterministic('mu', alpha + beta * x_shared)).

After the model has been fit, append the new data point and update the shared variable:

x_updated = np.append(x, 126.2)
x_shared.set_value(x_updated)

Then re-run posterior predictive sampling with the original trace and model objects:



new_ppc = pm.sample_ppc(btrace_tlr, samples=10000, model=model_tlr)

The posterior predictive samples for the new data point are in the last column:

sample = new_ppc['yl'][:, -1]

Then I can get the HPD with

pm.stats.hpd(sample)

array([124.56126638, 128.63795388])

(Note that hpd defaults to alpha=0.05, a 95% interval; pass alpha=0.2 for the 80% interval asked about.)

Scikit-learn had me expecting a simple predict interface...
