Histogram configuration

I have a dataset. I want to compute its histogram and plot it on a log scale. I use the following code:

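import numpy as np
import matplotlib.pyplot as p  # assuming p refers to the pyplot module

y, binEdges = np.histogram(hist_data, bins=200)
bincenters = 0.5*(binEdges[1:] + binEdges[:-1])  # midpoint of each bin
p.plot(bincenters, y, '-')
p.yscale('log', nonpositive='clip')  # spelled nonposy before Matplotlib 3.3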

      

Result: Figure of bins = 200

However, when I increase the number of bins (i.e. from bins = 200 to bins = 600), the result is: Figure of bins = 600

How can I plot only the line, and not the full vertical extent of each histogram bin?



2 answers


What you see is that some of the bins are empty, so the plotted line goes f(y) -> 0 -> f(y+delta) -> 0 -> f(y+2*delta). A common trick to get around this is not to use an abrupt cutoff as your bin shape (this shape is called the kernel). You can use, for example, "Kernel Density Estimation" to smooth the histogram. In this case you place a Gaussian centered at each data point, and their sum approximates the underlying probability distribution. You can use scipy to compute a KDE, or the nice package seaborn, which does this with automatic plotting. The image from the linked seaborn example gives a good illustration of this:

(Image: KDE plot from the seaborn example)
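As a rough sketch of the scipy route, the snippet below fits a Gaussian KDE and evaluates it on a regular grid. The synthetic hist_data, the grid size of 600, and the helper name plot_kde are assumptions made for illustration; the bandwidth is left at scipy's default.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for hist_data (an assumption; the real data is not shown).
rng = np.random.default_rng(0)
hist_data = rng.normal(size=2000)

# Fit a Gaussian KDE; the bandwidth uses scipy's default rule.
kde = gaussian_kde(hist_data)

# Evaluate the smooth density on a regular grid spanning the data.
grid = np.linspace(hist_data.min(), hist_data.max(), 600)
density = kde(grid)


def plot_kde(x, dens):
    """Plot the KDE curve on a log y-axis (hypothetical helper)."""
    import matplotlib.pyplot as p
    fig, ax = p.subplots()
    ax.plot(x, dens, '-')
    ax.set_yscale('log')
    return fig, ax


if __name__ == "__main__":
    import matplotlib.pyplot as p
    plot_kde(grid, density)
    p.show()
```

Note that the curve is a probability density rather than a count per bin. To compare it with histogram counts, multiply it by the number of samples times the bin width.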



To use matplotlib's hist and draw only the outline rather than filled bars, pass histtype="step".
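A minimal sketch of that call, assuming synthetic data in place of hist_data and the question's bins=600; log=True is used here to get the log-scaled count axis in the same call.

```python
import numpy as np
import matplotlib.pyplot as p

# Synthetic stand-in for hist_data (an assumption; the real data is not shown).
rng = np.random.default_rng(0)
hist_data = rng.normal(size=2000)

fig, ax = p.subplots()
# histtype="step" draws an unfilled outline; log=True puts counts on a log axis.
counts, edges, _ = ax.hist(hist_data, bins=600, histtype="step", log=True)

if __name__ == "__main__":
    p.show()
```

The returned counts and edges are the same values np.histogram would give, so they can be reused for further processing.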



If some of the bins are empty, you can filter them out using boolean indexing:



p.plot(bincenters[y>0],y[y>0],'-')
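For context, here is the same filter in a self-contained sketch. The synthetic hist_data and the names nonempty, x_plot, y_plot and plot_nonempty are assumptions added for illustration.

```python
import numpy as np

# Synthetic stand-in for hist_data (an assumption; the real data is not shown).
rng = np.random.default_rng(0)
hist_data = rng.normal(size=2000)

y, binEdges = np.histogram(hist_data, bins=600)
bincenters = 0.5 * (binEdges[1:] + binEdges[:-1])

# Boolean mask that is True only where a bin holds at least one sample.
nonempty = y > 0
x_plot, y_plot = bincenters[nonempty], y[nonempty]
print(f"{np.count_nonzero(~nonempty)} of {y.size} bins are empty")


def plot_nonempty(x, counts):
    """Plot only the non-empty bins on a log y-axis (hypothetical helper)."""
    import matplotlib.pyplot as p
    fig, ax = p.subplots()
    ax.plot(x, counts, '-')
    ax.set_yscale('log')
    return fig, ax


if __name__ == "__main__":
    import matplotlib.pyplot as p
    plot_nonempty(x_plot, y_plot)
    p.show()
```

Keep in mind that the line now connects neighbouring non-empty bins directly, so a run of empty bins shows up as a straight segment rather than a gap.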

      
