Histogram configuration
I have some datasets. I want to compute their histograms and plot them on a log scale. I use the following code:
y,binEdges=np.histogram(hist_data,bins=200)
bincenters = 0.5*(binEdges[1:]+binEdges[:-1])
p.plot(bincenters,y,'-')
p.yscale('log', nonposy='clip')
Result:
However, when I increase the number of bins (i.e. from bins = 200 to bins = 600), the result is:
How can I plot only the lines and not the whole spectrum of each histogram?
What you are seeing is that some of the bins are empty, so the plotted line goes from f(y) -> 0 -> f(y+delta) -> 0 -> f(y+2*delta). A common trick to get around this is not to use a hard cutoff for each bin (the shape of the bin is what we call the kernel). You can use, for example, Kernel Density Estimation (KDE) to smooth the histogram. In this case, you place a Gaussian centered on each data point; their sum approximates the underlying probability distribution. You can use scipy to run the KDE, or seaborn, a nice package that will do this with automatic plotting. The linked seaborn example gives a good illustration of this.
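As a rough sketch of the KDE idea with scipy, assuming a synthetic normal sample in place of the real hist_data (which is not shown in the question) and the default bandwidth chosen by gaussian_kde:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Assumption: synthetic data standing in for the question's hist_data.
rng = np.random.default_rng(0)
hist_data = rng.normal(loc=0.0, scale=1.0, size=1000)

# Place a Gaussian on every data point and sum them
# (bandwidth is picked by Scott's rule unless bw_method is given).
kde = gaussian_kde(hist_data)

# Evaluate the smooth density on a fine grid spanning the data.
xs = np.linspace(hist_data.min(), hist_data.max(), 600)
density = kde(xs)

# The density never hits exactly zero here, so a log axis shows no spikes.
fig, ax = plt.subplots()
ax.plot(xs, density, '-')
ax.set_yscale('log')
# plt.show()  # uncomment in an interactive session
```

Note that this plots a density rather than raw counts, so the y values are not directly comparable to the histogram's.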
To use matplotlib's hist without filled bars, drawing only the lines, pass histtype="step".
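A minimal sketch of the histtype="step" variant, again assuming synthetic data in place of hist_data; log=True is used here to get the log-scaled y axis directly from hist:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: synthetic data standing in for the question's hist_data.
rng = np.random.default_rng(0)
hist_data = rng.normal(size=1000)

fig, ax = plt.subplots()
# histtype="step" draws only the outline of the histogram, unfilled.
counts, edges, patches = ax.hist(hist_data, bins=600, histtype="step", log=True)
# plt.show()  # uncomment in an interactive session
```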
If some of the bins are empty, you can filter them out using boolean indexing:
p.plot(bincenters[y>0],y[y>0],'-')
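For reference, here is a self-contained sketch of the same filtering applied to the question's pipeline. The data is synthetic (an assumption, since hist_data is not given), and plt is used instead of the question's p alias:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: synthetic data standing in for the question's hist_data.
rng = np.random.default_rng(0)
hist_data = rng.normal(size=1000)

y, bin_edges = np.histogram(hist_data, bins=600)
bin_centers = 0.5 * (bin_edges[1:] + bin_edges[:-1])

# Boolean mask selecting only the bins that contain at least one sample.
nonempty = y > 0

fig, ax = plt.subplots()
ax.plot(bin_centers[nonempty], y[nonempty], '-')
ax.set_yscale('log')
# plt.show()  # uncomment in an interactive session
```

The line now connects neighbouring non-empty bins directly, so the gaps left by empty bins are bridged rather than shown.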