How to handle pymc3 form Deterministic variables
I have been working on creating a hierarchical model of some psychophysical behavioral data in pymc3. I'm incredibly impressed with things in general, but while trying to get up to speed with Theano and pymc3, I have a model that mostly works, but has a couple of problems.
The code is structured to match the parameterized version of Weibull for seven datasets. Each study is modeled as a binary Bernoulli result, and the thresholds (thact output as y values that are used to match a Gaussian function in height, width, and height (a, c, and d in a typical Gaussian).
Using parameterized Weibull seems to work well and is now hierarchical for the Weibull tilt, while the thresholds fit separately for each piece of data. However - the output I get from k and y_est leads me to believe that they may not be the correct size, and unlike the probability distributions, it doesn't look like I can specify the shape (unless there is an anal way of doing this I haven't found - although from what I've read, the defining figure in theanano is tricky).
Ultimately, I would like to use y_est to estimate the height or width of a gaussian, however the output right now leads to an incredible mess, which I think arises with the size problems in y_est and k. Any help would be fantastic - the code below should simulate some data followed by the model. The model does a great job with every single rapids and gets slopes, but falls apart when it deals with the rest.
Thanks for watching - I'm still impressed with pymc3!
EDIT: Ok, so the form output using y_est.tag.test_value.shape looks like this:
y_est.tag.test_value.shape (101, 7) k.tag.test_value.shape (7,)
I think this is where I ran into difficulties, although it may be poorly constructed on my part. k is well-formed (one k value per unique_xval). y_est outputs the full dataset (101x7) instead of one score (one y_est per unique_xval) for each difficulty level. Is there a way to indicate that y_est receives specific subsets of df_y_vals to manage this?
#Import necessary modules and define our weibull function import numpy as np import pylab as pl from scipy.stats import bernoulli #x stimulus intensity #g chance (0.5 for 2AFC) # m slope # t threshold # a performance level defining threshold def weib(x,g,a,m,t): k=-np.log(((1-a)/(1-g))**(1/t)) return 1- (1-g)*np.exp(- (k*x/t)**m); #Output values from weibull function xit=101 xvals=np.linspace(0.05,1,xit) out_weib=weib(xvals, 0.5, 0.8, 3, 0.6) #Okay, fitting the perfect output of a Weibull should be easy, contaminate with some noise #Slope of 3, threshold of 0.6 #How about 5% noise! noise=0.05*np.random.randn(np.size(out_weib)) out=out_weib+noise #Let make this more like a typical experiment - #i.e. no percent correct, just one or zero #Randomly pick based on the probability at each point whether they got the trial right or wrong trial=np.zeros_like(out) for i in np.arange(out.size): p=out_weib[i] trial[i] = bernoulli.rvs(p) #Iterate for 6 sets of data, similar slope (from a normal dist), different thresh (output from gaussian) #Gauss parameters= true_gauss_height = 0.3 true_gauss_width = 0.01 true_gauss_elevation = 0.2 #What thresholds will we get then? 6 discrete points along that gaussian, from 0 to 180 degree mask x_points=[0, 30, 60, 90, 120, 150, 180] x_points=np.asarray(x_points) gauss_points=true_gauss_height*np.exp(- ((x_points**2)/2*true_gauss_width**2))+true_gauss_elevation import pymc as pm2 import pymc3 as pm import pandas as pd slopes=pm2.rnormal(3, 3, size=7) out_weib=np.zeros([xvals.size,x_points.size]) for i in np.arange(x_points.size): out_weib[:,i]=weib(xvals, 0.5, 0.8, slopes[i], gauss_points[i]) #Let make this more like a typical experiment - i.e. no percent correct, just one or zero #Randomly pick based on the probability at each point whether they got the trial right or wrong trials=np.zeros_like(out_weib) for i in np.arange(len(trials)): for ii in np.arange(gauss_points.size): p=out_weib[i,ii] trials[i,ii] = bernoulli.rvs(p) #Let make that data into a DataFrame for pymc3 y_vals=np.tile(xvals, [7, 1]) df_correct = pd.DataFrame(trials, columns=x_points) df_y_vals = pd.DataFrame(y_vals.T, columns=x_points) unique_xvals=x_points import theano as th with pm.Model() as hierarchical_model: # Hyperpriors for group node mu_slope = pm.Normal('mu_slope', mu=3, sd=1) sigma_slope = pm.Uniform('sigma_slope', lower=0.1, upper=2) #Priors for the overall gaussian function - 3 params, the height of the gaussian #Width, and elevation gauss_width = pm.HalfNormal('gauss_width', sd=1) gauss_elevation = pm.HalfNormal('gauss_elevation', sd=1) slope = pm.Normal('slope', mu=mu_slope, sd=sigma_slope, shape=unique_xvals.size) thresh=pm.Uniform('thresh', upper=1, lower=0.1, shape=unique_xvals.size) k = -th.tensor.log(((1-0.8)/(1-0.5))**(1/thresh)) y_est=1-(1-0.5)*th.tensor.exp(-(k*df_y_vals/thresh)**slope) #We want our model to predict either height or width...height would be easier. #Our Gaussian function has y values estimated by y_est as the 82% thresholds #and Xvals based on where each of those psychometrics were taken. #height_est=pm.Deterministic('height_est', (y_est/(th.tensor.exp((-unique_xvals**2)/2*gauss_width)))+gauss_elevation) height_est = pm.Deterministic('height_est', (y_est-gauss_elevation)*th.tensor.exp((unique_xvals**2)/2*gauss_width**2)) #Define likelihood as Bernoulli for each binary trial likelihood = pm.Bernoulli('likelihood',p=y_est, shape=unique_xvals.size, observed=df_correct) #Find start start=pm.find_MAP() step=pm.NUTS(state=start) #Do MCMC trace = pm.sample(5000, step, njobs=1, progressbar=True) # draw 5000 posterior samples using NUTS sampling
source to share
I'm not sure exactly what you want to do when you say "Is there a way to indicate that y_est receives specific subsets of df_y_vals to manage this". Can you describe, for each y_est value, which df_y_vals values you should use? What is the form of df_y_vals? What should be the form of y_est? (7,)?
I suspect you want to index into df_y_vals using numpy advanced indexing , which works the same in PyMC as it does in numpy. It's hard to say for sure without additional information.
source to share