Predicting Basic Bayesian Linear Regression with PyMC3

Question

Predicting Basic Bayesian Linear Regression with PyMC3

I would like to use my PyMC3 LR model to get an 80% HPD range for the predicted variable value y

as new data becomes available. Thus, extrapolate the reliable distribution of values for y

for a new value x

not in my original dataset.

Model:

with pm.Model() as model_tlr:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    epsilon = pm.Uniform('epsilon', 0, 25)

    nu = pm.Deterministic('nu', pm.Exponential('nu_', 1/29) + 1)
    mu = pm.Deterministic('mu', alpha + beta * x)

    yl = pm.StudentT('yl', mu=mu, sd=epsilon, nu=nu, observed=y)

    trace_tlr = pm.sample(50000, njobs=3)

After the burn, I take a sample from the back and get HPD

ppc_tlr = pm.sample_ppc(btrace_tlr, samples=10000, model=model_tlr)
ys = ppc_tlr['yl']
y_hpd = pm.stats.hpd(ys, alpha=0.2)

This is great for HPD rendering around a central trend (using fill_between)

But I would like now to use a model to get HPD for y

when x=126.2

(for example) and the original dataset does not contain an observablex=126.2

The way I understand rear-end sampling is that x

there are 10k samples in the dataset for each of the available values , and hence there is no corresponding sample in ys

for x=126.2

, since it was not, t.

Basically, is there a way to use my model to get a distribution of valid values (model-based) from a predictor value x=126.2

that only became available after the model was built? If so, how?

thank

EDIT:
Found SO Post which mentions

A feature under development (likely to be added to pymc3) to predict new data.

Does it exist?

+3

linear-regression pymc3

Nel Apr 15. 17 at 11:43

source to share

1 answer

Nel · Accepted Answer · 2017-04-15T18:58:54+0000

OK, so this is possible, more or less as described in the previous SO post. However, a sample_ppc function has since been added to PyMC3 which makes the author's run_ppc redundant.

First, set up the Anano variable for x.

from theano import shared
x_shared = shared(x)

Then use x_shared when creating the model.

After creating the model, add a new database and update the shared variable

x_updated = np.append(x, 126.2)
x_shared.set_value(x_updated)

Re-run the PPC Sample Generator with original trace and model objects

new_ppc = pm.sample_ppc(btrace_tlr, samples=10000, model=model_tlr)

Sampling the back of a new database with

sample = new_ppc['yl'][:,-1]

Then I can get HPD with

pm.stats.hpd(sample)

array ([124.56126638, 128.63795388])

Sklearn messed up my idea that there should be a simple interface predict

...

Predicting Basic Bayesian Linear Regression with PyMC3

More articles: