Logarithmic histogram with multiple datasets and equal bar widths

Question

I have something like

import matplotlib.pyplot as plt
import numpy as np

a=[0.05, 0.1, 0.2, 1, 2, 3]
plt.hist((a*2, a*3), bins=[0, 0.1, 1, 10])
# Matplotlib < 3.3 spells this keyword linthreshx
plt.gca().set_xscale("symlog", linthresh=0.1)
plt.show()

which gives me the following plot: log histogram

As you can see, the bar widths are not equal. In the linear part (0 to 0.1) everything is fine, but beyond that the bar widths are still computed on a linear scale while the axis is logarithmic. This gives uneven widths for the bars and for the gaps between them (the tick is not in the middle of the bars).

Is there a way to fix this?

numpy matplotlib histogram

JonathanK May 30 '15 at 21:47

3 answers

You can use histtype='stepfilled' if you are fine with a plot where the datasets are drawn on top of one another. Of course, you will need to choose colors with alpha values carefully so that all of your data remains visible:

import matplotlib.pyplot as plt

a = [0.05, 0.1, 0.2, 1, 2, 3] * 2
b = [0.05, 0.05, 0.05, 0.15, 0.15, 2]
colors = [(0.2, 0.2, 0.9, 0.5), (0.9, 0.2, 0.2, 0.5)]  # RGBA tuples
plt.hist((a, b), bins=[0, 0.1, 1, 10], histtype='stepfilled', color=colors)
# Matplotlib < 3.3 spells this keyword linthreshx
plt.gca().set_xscale("symlog", linthresh=0.1)
plt.show()

I have modified the data a bit to illustrate better. This gives me: Results

For some reason the color of the overlap seems to be wrong (matplotlib 1.3.1 with Python 3.4.0; is this a bug?), but this is one possible solution or alternative for your problem.

Praveen May 31 '15 at 7:27

OK, I found the real problem: when you create a histogram with these bin edges, hist draws bars of equal size with equal outer spacing on a non-log (linear) scale.

To demonstrate, here is a zoomed-in version of the plot in the question, but on a non-log scale: hist-non-log

Note that the first two bars are centered around (0 + 0.1) / 2 = 0.05 with a gap of 0.1 / 10 = 0.01 at either edge, and the next two bars are centered around (0.1 + 1.0) / 2 = 0.55 with a gap of (1.0 - 0.1) / 10 = 0.09 at either edge.

When this is converted to a log scale, the bar widths and the edge gaps all get distorted. This is compounded by the fact that the scale is linear from 0 to 0.1 and logarithmic only after that.
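To put numbers on that distortion, here is a minimal sketch for the bin from 0.1 to 1 with two bars. It assumes a gap of one tenth of the bin width at each edge, as in the first bin described above; the variable names are only illustrative.

```python
import numpy as np

lo, hi = 0.1, 1.0                        # one bin that lies in the logarithmic region
n_bars = 2                               # two datasets share the bin
gap = (hi - lo) / 10                     # assumed: 10% of the bin width at each edge
bar_width = (hi - lo - 2 * gap) / n_bars

# Bar boundaries on the linear scale: both bars have the same data width
edges = lo + gap + bar_width * np.arange(n_bars + 1)

# The same boundaries measured in decades, i.e. as seen on a log axis
log_widths = np.diff(np.log10(edges))
log_gaps = np.array([np.log10(edges[0] / lo), np.log10(hi / edges[-1])])

print("linear bar edges:      ", edges)
print("bar widths in decades: ", log_widths)
print("edge gaps in decades:  ", log_gaps)
```

The two bars are equally wide in data units, yet on a log axis the left bar comes out more than twice as wide as the right one, and the left gap is far larger than the right gap.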

I don't know a way around this other than doing everything manually. I used the geometric means of the bin edges to work out where the bar edges should be and how wide the bars should be. Note that this piece of code works only for two datasets. If you have more datasets, you need a function that fills each bin with a geometric series of bar edges.

import numpy as np
import matplotlib.pyplot as plt

def geometric_means(a):
    """Return pairwise geometric means of adjacent elements."""
    return np.sqrt(a[1:] * a[:-1])

a = [0.05, 0.1, 0.2, 1, 2, 3] * 2
b = [0.05, 0.1, 0.2, 1, 2, 3] * 3

# Find frequencies
bins = np.array([0, 0.1, 1, 10])
a_hist = np.histogram(a, bins=bins)[0]
b_hist = np.histogram(b, bins=bins)[0]

# Find log-scale mid-points for bar-edges
mid_vals = np.hstack((np.array([0.05,]), geometric_means(bins[1:])))

# Compute bar left edges and bar widths
a_x = bins[:-1]
a_widths = mid_vals - bins[:-1]

b_x = mid_vals
b_widths = bins[1:] - mid_vals

# align='edge' treats x as the left edge (the default became 'center' in Matplotlib 2.0)
plt.bar(a_x, a_hist, width=a_widths, color='b', align='edge')
plt.bar(b_x, b_hist, width=b_widths, color='g', align='edge')

# Matplotlib < 3.3 spells this keyword linthreshx
plt.gca().set_xscale("symlog", linthresh=0.1)
plt.show()

And the end result: final-result

Unfortunately, the neat gaps between the bars are lost. Again, this can be fixed by doing the appropriate geometric interpolation so that everything is evenly spaced on the logarithmic scale.
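One way that geometric interpolation might look for any number of datasets is sketched below. This is not part of the code above: the function name grouped_bar_geometry and the fill parameter are hypothetical, and it assumes every bin lies entirely in either the linear region (below linthresh) or the logarithmic region of the symlog axis.

```python
import numpy as np

def grouped_bar_geometry(bins, n_datasets, linthresh, fill=0.8):
    """Return (lefts, widths), each of shape (n_datasets, n_bins).

    Hypothetical helper: bars are spaced evenly in log10 space for bins at or
    above linthresh and evenly in data space for bins below it.
    Assumes no bin straddles linthresh.
    """
    bins = np.asarray(bins, dtype=float)
    lo, hi = bins[:-1], bins[1:]
    is_log = lo >= linthresh                    # bins fully inside the log region

    # Map bin edges to "axis" coordinates: log10 for log bins, identity otherwise.
    # np.where with a dummy 1.0 avoids taking log10 of 0 for the linear bins.
    t_lo = np.where(is_log, np.log10(np.where(is_log, lo, 1.0)), lo)
    t_hi = np.where(is_log, np.log10(np.where(is_log, hi, 1.0)), hi)

    span = t_hi - t_lo
    bar = fill * span / n_datasets              # equal bar width in axis coordinates
    start = t_lo + (1 - fill) / 2 * span        # equal gap on both sides of the group
    k = np.arange(n_datasets)[:, None]          # dataset index as a column
    t_left = start + k * bar
    t_right = t_left + bar

    # Map back to data coordinates
    lefts = np.where(is_log, 10.0 ** t_left, t_left)
    rights = np.where(is_log, 10.0 ** t_right, t_right)
    return lefts, rights - lefts

lefts, widths = grouped_bar_geometry([0, 0.1, 1, 10], n_datasets=3, linthresh=0.1)
print(lefts)
print(widths)
```

Row i of lefts and widths could then be passed to plt.bar(lefts[i], counts_i, width=widths[i], align='edge') for dataset i, where counts_i stands for that dataset's np.histogram counts.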

Praveen May 31 '15 at 8:10

JonathanK · Accepted Answer · 2015-05-31T11:33:34+0000

Inspired by fooobar.com/questions/2229890 / ... I came up with the following solution:

import matplotlib.pyplot as plt
import numpy as np

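d = [0.05, 0.1, 0.2, 1, 2, 3]


def LogHistPlot(data, bins):
    totalWidth = 0.8
    colors = ("b", "r", "g")
    # Each dataset gets an equal share of the group width
    width = totalWidth / len(data)
    for i, values in enumerate(data):
        heights = np.histogram(values, bins)[0]
        # Bars sit on an evenly spaced index axis, one unit per bin
        left = np.arange(len(heights)) + i * width
        # align='edge': x is the left edge (the default became 'center' in Matplotlib 2.0)
        plt.bar(left, heights, width, color=colors[i], label=str(i), align='edge')
    # Label the integer positions with the actual bin edges
    plt.xticks(range(len(bins)), bins)
    plt.legend(loc='best')

LogHistPlot((d*2, d*3, d*4), [0, 0.1, 1, 10])

plt.show()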

This produces the following plot: Correct logarithmic histogram with multiple datasets

The basic idea is to ditch the plt.hist function, calculate the histogram with numpy and plot it with plt.bar. You can then simply use a linear x-axis, which makes calculating the bar width trivial. Finally, the tick labels are replaced by the bin edges, which gives the appearance of a logarithmic scale. And you no longer have to deal with symlog's mix of linear and logarithmic regions.
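To make the index-axis trick concrete, the sketch below computes only the numbers that end up in plt.bar, without plotting. The names heights_by_dataset and lefts_by_dataset are illustrative and not part of the answer's code.

```python
import numpy as np

d = [0.05, 0.1, 0.2, 1, 2, 3]
datasets = (d * 2, d * 3, d * 4)
bins = [0, 0.1, 1, 10]

total_width = 0.8                      # share of each unit-wide slot used by the bars
width = total_width / len(datasets)    # width of one bar on the index axis

heights_by_dataset = []
lefts_by_dataset = []
for i, values in enumerate(datasets):
    heights, _ = np.histogram(values, bins)        # counts per bin, true bin edges
    lefts = np.arange(len(heights)) + i * width    # positions on the 0, 1, 2, ... axis
    heights_by_dataset.append(heights)
    lefts_by_dataset.append(lefts)
    print(i, heights, lefts)
```

Every bin occupies exactly one unit on the x-axis regardless of its real extent, so the bars come out equally wide. The real bin edges only reappear as the tick labels.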
