Correlation between columns python blaze

I have a simple question about using the Python Blaze module for data analysis. I'm trying to run this code:

from blaze import SQL, Table
from sqlalchemy import create_engine
from scipy.stats import pearsonr
sql_path = r'/path/to/my/database.db'
# Connect Blaze to the SQLite table through an SQLAlchemy engine
e = create_engine('sqlite:///%s' % sql_path)
blz_sql = SQL(e, 'analysis_dataframe')
blz_frame = Table(blz_sql)
blz_cols = blz_frame.columns
# Correlate the first and the eleventh column
corr = pearsonr(blz_frame[blz_cols[0]], blz_frame[blz_cols[10]])
print(corr)

I get this error:

TypeError: len() of unsized object

After reading the Blaze docs, I found that the fix is to convert the Blaze column into a concrete data structure, like this:

import pandas as pd
from blaze import into
# Materialize one Blaze column as an in-memory pandas DataFrame
df = into(pd.DataFrame, blz_frame[blz_cols[0]])

But this conversion makes running pearsonr over the whole list of columns slow. How can I convert a Blaze column directly to an np.array so that I can run computations on it, such as pearsonr or statsmodels.api.Logit(blz_frame.y, blz_frame[[train_cols]])? In case it matters, I'm using Anaconda with Python 3.4, and my Blaze version is:

import blaze
print(blaze.__version__)
# prints 0.6.3

1 answer


Modules like scipy.stats often explicitly expect a NumPy array or a pandas DataFrame; their logic is baked into those data structures.

Blaze can help you do NumPy- or pandas-like things on other data stores (like your SQLite database), but it cannot reach into libraries like scipy.stats and change their code.
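
The TypeError in the question is consistent with this. A minimal sketch, assuming (not verified against the Blaze 0.6.3 or SciPy source of that era) that pearsonr first calls np.asarray on its inputs and then len(): an object that exposes no array data is wrapped in a 0-d object array, and a 0-d array has no length. LazyColumn is a hypothetical stand-in, not a Blaze class.

```python
import numpy as np


class LazyColumn:
    """Hypothetical stand-in for a lazy Blaze column expression.

    It defines no __len__, __iter__ or __array__, so NumPy finds no data in it.
    """

    def __init__(self, name):
        self.name = name


arr = np.asarray(LazyColumn("x"))  # wrapped as a 0-d object array
print(arr.shape, arr.dtype)  # () object

message = ""
try:
    len(arr)  # a 0-d array has no length
except TypeError as exc:
    message = str(exc)
print(message)  # len() of unsized object
```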

I see the following solutions:

  • Pull all the data from SQLite into an ndarray/DataFrame, as you are doing here (this is slow)
  • Change scipy.stats so that it does not require specific data structures (this would mean significant changes to a mature codebase)
  • Write some basic statistics against the more general interface that Blaze provides
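
The first option in its simplest form: fetch the two columns into memory and compute Pearson's r there. A minimal sketch using only the standard library (sqlite3, math); the helper names fetch_columns and pearson and the columns a and b are hypothetical. With NumPy/SciPy installed, the fetched lists could instead be passed straight to scipy.stats.pearsonr, which accepts array-like inputs.

```python
import math
import sqlite3


def fetch_columns(conn, table, *cols):
    """Load whole columns into Python lists (everything is pulled into memory)."""
    # Names are interpolated into the SQL text: use trusted identifiers only.
    query = "SELECT {} FROM {}".format(", ".join(cols), table)
    rows = conn.execute(query).fetchall()
    return [list(values) for values in zip(*rows)]


def pearson(xs, ys):
    """Two-pass Pearson's r: means first, then mean-centred sums.

    Assumes equal-length inputs with no NULL (None) values.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(ssx * ssy)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_dataframe (a REAL, b REAL)")
conn.executemany("INSERT INTO analysis_dataframe VALUES (?, ?)",
                 [(1, 3), (2, 1), (3, 2)])
xs, ys = fetch_columns(conn, "analysis_dataframe", "a", "b")
print(pearson(xs, ys))  # -0.5
```

This is simple and exact, but every row crosses from the database into Python, which is the cost the question is trying to avoid.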

For Pearson's correlation it would be quite easy to redefine the algorithm in a more general way (option 3). Perhaps a Blaze statistics package, or statistics code in general, would be relevant here.
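
One way to read option 3 for Pearson's correlation is to rewrite r in terms of sums that the database can compute itself: r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)), where Sx is the sum of x, Sxy the sum of x*y, and so on. The sketch below uses the standard-library sqlite3 module directly rather than Blaze, since it is not confirmed here which Blaze 0.6.3 expressions compile to these aggregates; the point is that only six numbers leave the database. pearson_via_sql and the columns a and b are hypothetical.

```python
import math
import sqlite3


def pearson_via_sql(conn, table, x, y):
    """Pearson's r from six aggregates that SQLite computes itself.

    Only n, sum(x), sum(y), sum(x*x), sum(y*y) and sum(x*y) reach Python.
    Identifiers are interpolated into the SQL text, so pass trusted names only.
    """
    query = (
        f"SELECT COUNT(*), SUM({x}), SUM({y}), "
        f"SUM({x} * {x}), SUM({y} * {y}), SUM({x} * {y}) "
        f"FROM {table} WHERE {x} IS NOT NULL AND {y} IS NOT NULL"
    )
    n, sx, sy, sxx, syy, sxy = conn.execute(query).fetchone()
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    # Raises ZeroDivisionError if either column is constant.
    return cov / math.sqrt(var_x * var_y)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_dataframe (a REAL, b REAL)")
conn.executemany("INSERT INTO analysis_dataframe VALUES (?, ?)",
                 [(1, 3), (2, 1), (3, 2), (None, 5)])  # NULL row is skipped
print(pearson_via_sql(conn, "analysis_dataframe", "a", "b"))  # -0.5
```

This one-pass formula can lose precision when values are large relative to their spread; a two-pass, mean-centred computation is numerically safer, at the cost of a second scan.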

In general, Blaze does not promise that existing scientific Python code will run on foreign data structures. That would be a lofty goal.
