pd.read_csv gives me str, but I need float

I have a CSV that looks like this:

Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,4.741000,4.818000,4.741000,4.788000,4.788000,0
2007-07-30,4.763000,4.810000,4.763000,4.804000,4.804000,0

      

After running

data = pd.read_csv(file, index_col='Date').drop(['Open','Close','Adj Close','Volume'], axis=1)

      

I end up with a DataFrame, which looks like this:

                High       Low
Date                          
2007-07-25  4.946000  4.896000
2007-07-26  4.867000  4.759000
2007-07-27  4.818000  4.741000
2007-07-30  4.810000  4.763000
2007-07-31  4.843000  4.769000

      

Now I want to compute High - Low. I tried:

np.diff(data.values, axis=1)

      

but I get the error: TypeError: unsupported operand type(s) for -: 'str' and 'str'

I'm not sure why the values in the DataFrame are str in the first place. I would be grateful for any solution.


2 answers


I think you need to_numeric with errors='coerce', because it seems like there is some bad data:



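# 'Date' must be listed in usecols because it is used as index_col
data = pd.read_csv(file, index_col='Date', usecols=['Date', 'High', 'Low'])

# convert every column to float; values that cannot be parsed become NaN
data = data.apply(pd.to_numeric, errors='coerce')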
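To see the effect end to end, here is a minimal, self-contained sketch. The inline CSV and the 'bad' token are made up for illustration; the real file may contain a different non-numeric value. It first locates the rows that cannot be parsed, then coerces them to NaN and computes High - Low.

```python
import io

import pandas as pd

# Hypothetical sample: the third row holds a non-numeric token ('bad'),
# which is enough to make pandas read the whole column as strings.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,bad,bad,bad,bad,bad,bad
2007-07-30,4.763000,4.810000,4.763000,4.804000,4.804000,0
"""

raw = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                  usecols=['Date', 'High', 'Low'])

# Rows where 'High' cannot be parsed as a number
unparseable = pd.to_numeric(raw['High'], errors='coerce').isna()
bad_rows = raw[unparseable]
print(bad_rows)

# Coerce every column to float; unparseable values become NaN
data = raw.apply(pd.to_numeric, errors='coerce')

# Element-wise difference; NaN propagates for the bad row
spread = data['High'] - data['Low']
print(spread)
```

Inspecting bad_rows before coercing shows what the offending values actually are, which helps decide whether to drop those rows (for example with dropna) or to fix the source file.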

      



Does the dtype parameter of read_csv not work for you?

From the documentation for dtype: Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use str or object to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.



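import numpy as np
import pandas as pd
# 'Date' must be listed in usecols because it is used as index_col
data = pd.read_csv(file,
    index_col='Date',
    usecols=['Date', 'High', 'Low'],
    dtype={'High': np.float64, 'Low': np.float64})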
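One caveat, sketched below under assumptions: if the file really does contain a non-numeric value, forcing dtype to float is expected to make read_csv raise a ValueError rather than convert silently. The inline CSV and the '?' placeholder are hypothetical. If the bad token is known, passing it through na_values lets the dtype conversion succeed, with NaN in its place.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample with a '?' placeholder in the High column
csv_text = (
    "Date,High,Low\n"
    "2007-07-25,4.946000,4.896000\n"
    "2007-07-26,?,4.759000\n"
)

float_cols = {'High': np.float64, 'Low': np.float64}

# Forcing float on a column that contains '?' fails while parsing
try:
    pd.read_csv(io.StringIO(csv_text), index_col='Date', dtype=float_cols)
    dtype_failed = False
except ValueError as exc:
    dtype_failed = True
    print("dtype conversion failed:", exc)

# Declaring '?' as a missing-value marker lets the conversion go through
data = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                   dtype=float_cols, na_values=['?'])

spread = data['High'] - data['Low']
print(spread)
```

So dtype is a good fit when the data is clean or the missing-value markers are known in advance; otherwise to_numeric with errors='coerce' is the more forgiving option.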

      
