Python histogram of split() data
I'm trying to plot a histogram of a text file that contains floats:
import matplotlib.pyplot as plt
c1_file = open('densEst1.txt','r')
c1_data = c1_file.read().split()
c1_sum = float(c1_data.__len__())
plt.hist(c1_data)
plt.show()
Calling c1_data.__len__() works fine, but hist() throws:
C:\Python27\python.exe "C:/x.py"
Traceback (most recent call last):
File "C:/x.py", line 7, in <module>
plt.hist(c1_data)
File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 2958, in hist
stacked=stacked, data=data, **kwargs)
File "C:\Python27\lib\site-packages\matplotlib\__init__.py", line 1812, in inner
return func(ax, *args, **kwargs)
File "C:\Python27\lib\site-packages\matplotlib\axes\_axes.py", line 5995, in hist
if len(xi) > 0:
TypeError: len() of unsized object
The main reason plt.hist fails is that the argument c1_data is a list of strings. When you open a file and read it, the result is a single string containing the file's contents:
To read a file's contents, call f.read(size), which reads some quantity of data and returns it as a string (in text mode) or bytes object (in binary mode).
Emphasis mine.
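To see this concretely, here is a minimal sketch (assuming Python 3) that uses io.StringIO as an in-memory stand-in for densEst1.txt; the sample values are made up for illustration:

```python
import io

# In-memory stand-in for densEst1.txt; the values are hypothetical sample data.
fake_file = io.StringIO("1.5 2.25\n3.0\n")

contents = fake_file.read()  # read() with no size argument returns the whole file
print(type(contents))        # a single str, not a list of numbers
print(repr(contents))
```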
When you then split this long string, you get a list of strings:
Return a list of the words in the string, using sep as the delimiter string.
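With no arguments, split() breaks the string on any run of whitespace (spaces and newlines alike), so each element is a single number that is still written as text. A short sketch with hypothetical file contents shows that the elements stay strings until you convert them explicitly:

```python
# Hypothetical file contents, as returned by read().
contents = "1.5 2.25\n3.0\n"

tokens = contents.split()            # whitespace-separated tokens, still str
values = [float(t) for t in tokens]  # explicit conversion to float

print(tokens)  # ['1.5', '2.25', '3.0']
print(values)  # [1.5, 2.25, 3.0]
```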
However, a list of strings is not valid input for plt.hist:
>>> import matplotlib.pyplot as plt
>>> plt.hist(['1', '2'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
1 import matplotlib.pyplot as plt
----> 2 plt.hist(['1', '2'])
C:\...\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, data, **kwargs)
3079 histtype=histtype, align=align, orientation=orientation,
3080 rwidth=rwidth, log=log, color=color, label=label,
-> 3081 stacked=stacked, data=data, **kwargs)
3082 finally:
3083 ax._hold = washold
C:\...\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1895 warnings.warn(msg % (label_namer, func.__name__),
1896 RuntimeWarning, stacklevel=2)
-> 1897 return func(ax, *args, **kwargs)
1898 pre_doc = inner.__doc__
1899 if pre_doc is None:
C:\...\lib\site-packages\matplotlib\axes\_axes.py in hist(***failed resolving arguments***)
6178 xmax = -np.inf
6179 for xi in x:
-> 6180 if len(xi) > 0:
6181 xmin = min(xmin, xi.min())
6182 xmax = max(xmax, xi.max())
TypeError: len() of unsized object
Solution:
You can simply convert it to a float array:
>>> import numpy as np
>>> plt.hist(np.array(c1_data, dtype=float))
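Applied to the question's script, the fix could be sketched as below. It writes a small temporary file as a hypothetical stand-in for densEst1.txt, and the helper name load_floats is made up for illustration. To keep it checkable without opening a plot window it calls np.histogram, to which plt.hist passes its binning arguments; with matplotlib installed, plt.hist(c1_data) would plot the same data.

```python
import os
import tempfile

import numpy as np


def load_floats(path):
    """Hypothetical helper: read whitespace-separated numbers as a float array."""
    with open(path, 'r') as f:
        tokens = f.read().split()         # list of str, e.g. ['1.0', '2.5', ...]
    return np.array(tokens, dtype=float)  # convert the strings to floats


# Hypothetical stand-in for densEst1.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("1.0 2.5\n2.5 4.0\n")
    path = tmp.name

c1_data = load_floats(path)
os.remove(path)

# Same binning that plt.hist(c1_data, bins=3) would draw.
counts, edges = np.histogram(c1_data, bins=3)
print(counts, edges)
```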
Here is a simple example using numpy.
pandas will work too: the separator and data type can be specified when reading (even for column data), and you can also read the data directly as a vector (depending on the data size).
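For reading the data directly as a vector, one option (my choice here, not necessarily what this answer had in mind) is np.loadtxt, which parses whitespace-separated numbers straight into a float array. Because the file may hold several numbers per line, ravel() flattens the result to 1-D. io.StringIO stands in for the real file, with made-up values:

```python
import io

import numpy as np

# Hypothetical file contents: two numbers per line.
fake_file = io.StringIO("1.0 2.5\n2.5 4.0\n")

data = np.loadtxt(fake_file)  # float64 array, one row per line -> shape (2, 2)
vector = data.ravel()         # flatten to a 1-D vector for plt.hist / np.histogram

print(data.shape, vector)
```

np.loadtxt expects the same number of values on every line; for ragged files, the read().split() approach is the safer route.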
#!/usr/bin/env python
# %matplotlib inline  # uncomment only when running inside a Jupyter/IPython notebook
import numpy as np
import matplotlib.pyplot as plt
# Reading with numpy is better here because the file contains floats:
# c1_data = np.fromfile('densEst1.txt', sep=' ')  # sep=' ' matches runs of whitespace, including newlines
from_file = np.array([1, 2, 2.5])  # sample data
c1_data = from_file.astype(float)  # make sure the data are floats
plt.hist(c1_data)  # plt.hist passes its arguments to np.histogram
plt.title("Histogram without 'auto' bins")
plt.show()
plt.hist(c1_data, bins='auto')  # bins='auto' lets NumPy choose the bin edges
plt.title("Histogram with 'auto' bins")
plt.show()
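To see which bins 'auto' picks before plotting, you can call np.histogram directly with the same bins argument, since that is where plt.hist sends it. The sample data mirrors the answer's [1, 2, 2.5]. This assumes a NumPy version that supports bins='auto' (1.11 or later):

```python
import numpy as np

c1_data = np.array([1, 2, 2.5], dtype=float)  # same sample data as the answer

counts_default, edges_default = np.histogram(c1_data)         # default: 10 equal-width bins
counts_auto, edges_auto = np.histogram(c1_data, bins='auto')  # NumPy estimates the bin count

print(len(counts_default), edges_default)
print(len(counts_auto), edges_auto)
```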