Predicting Basic Bayesian Linear Regression with PyMC3
I would like to use my PyMC3 LR model to get an 80% HPD range for the predicted variable value y
as new data becomes available. Thus, extrapolate the reliable distribution of values ββfor y
for a new value x
not in my original dataset.
Model:
with pm.Model() as model_tlr:
alpha = pm.Normal('alpha', mu=0, sd=10)
beta = pm.Normal('beta', mu=0, sd=10)
epsilon = pm.Uniform('epsilon', 0, 25)
nu = pm.Deterministic('nu', pm.Exponential('nu_', 1/29) + 1)
mu = pm.Deterministic('mu', alpha + beta * x)
yl = pm.StudentT('yl', mu=mu, sd=epsilon, nu=nu, observed=y)
trace_tlr = pm.sample(50000, njobs=3)
After the burn, I take a sample from the back and get HPD
ppc_tlr = pm.sample_ppc(btrace_tlr, samples=10000, model=model_tlr)
ys = ppc_tlr['yl']
y_hpd = pm.stats.hpd(ys, alpha=0.2)
This is great for HPD rendering around a central trend (using fill_between)
But I would like now to use a model to get HPD for y
when x=126.2
(for example) and the original dataset does not contain an observablex=126.2
The way I understand rear-end sampling is that x
there are 10k samples in the dataset for each of the available values , and hence there is no corresponding sample in ys
for x=126.2
, since it was not, t.
Basically, is there a way to use my model to get a distribution of valid values ββ(model-based) from a predictor value x=126.2
that only became available after the model was built? If so, how?
thank
EDIT:
Found SO Post which mentions
A feature under development (likely to be added to pymc3) to predict new data.
Does it exist?
source to share
OK, so this is possible, more or less as described in the previous SO post. However, a sample_ppc function has since been added to PyMC3 which makes the author's run_ppc redundant.
First, set up the Anano variable for x.
from theano import shared
x_shared = shared(x)
Then use x_shared when creating the model.
After creating the model, add a new database and update the shared variable
x_updated = np.append(x, 126.2) x_shared.set_value(x_updated)
Re-run the PPC Sample Generator with original trace and model objects
new_ppc = pm.sample_ppc(btrace_tlr, samples=10000, model=model_tlr)
Sampling the back of a new database with
sample = new_ppc['yl'][:,-1]
Then I can get HPD with
pm.stats.hpd(sample)
array ([124.56126638, 128.63795388])
Sklearn messed up my idea that there should be a simple interface predict
...
source to share