Seaborn boxplot x-axis as numbers, not labels

Suppose I have a pandas DataFrame that is created like this:

import numpy as np
import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = [{'x_value': x, 'y_value': np.random.random()}
        for x in [1.0, 3.0, 9.0]
        for _ in range(1000)]
df = pd.DataFrame(rows, columns=['x_value', 'y_value'])

The result will look something like this:

In: df.head()
Out:
   x_value   y_value
0      1.0  0.616052
1      3.0  1.406715
2      9.0  8.774720
3      1.0  0.810729
4      3.0  1.309627

Plotting this with seaborn's boxplot gives evenly spaced boxes, with the x values used only as category labels:

[In] sns.boxplot(x='x_value', y='y_value', data=df)

I would like the boxes to be spaced apart as if the x-axis values were treated as numbers and not just as labels.

Is this possible? Or, if box plots can't do this, am I looking at the wrong type of plot to convey the dispersion of my data?

1 answer


As @mwaskom pointed out in the comments below my original answer, the order argument can be used to create empty positions between the boxes.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

x = np.random.choice([1,3,9], size=1001)
y = np.random.rand(1001)*(4+np.log(x))
df = pd.DataFrame({"x":x, "y":y})

sns.boxplot(x='x', y='y', data=df, order=range(1,10))

plt.show()

      


Note that in this case the axis is still categorical: positions start at 0 and increase in steps of 1, and only the tick labels suggest otherwise. For the case in the question this is not a problem, but you need to be aware of it when, for example, plotting other quantitative data on the same axes. This approach also only works if the box positions are integers.
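To make the categorical positions concrete, here is a minimal sketch that inspects the tick positions seaborn actually uses. The Agg backend, the seeded generator, the sample size and the name slot_of_nine are my own assumptions for illustration; they are not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.choice([1, 3, 9], size=300)
y = rng.random(300) * (4 + np.log(x))
df = pd.DataFrame({"x": x, "y": y})

order = list(range(1, 10))
fig, ax = plt.subplots()
sns.boxplot(x="x", y="y", data=df, order=order, ax=ax)

# The ticks sit at the categorical slots, not at the x values themselves.
tick_positions = [int(t) for t in ax.get_xticks()]
print(tick_positions)

# To overlay quantitative data for x == 9, use the slot that carries the
# label 9 instead of the number 9 (slot_of_nine is a hypothetical helper name).
slot_of_nine = order.index(9)
ax.plot([slot_of_nine], [df.loc[df.x == 9, "y"].median()], "ko")
```

Anything drawn on the same axes has to be shifted into these slot coordinates, which is why the matplotlib approach may be easier when several quantitative plots share the axis.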



Another, more general solution is to use matplotlib.pyplot.boxplot instead. The approach then depends on whether you have the same number of values for each category or not. In the general case, where the numbers differ, you would draw one box per x value in a loop. The axis is then truly to scale, and non-integer positions are not an issue.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np


x = np.random.choice([1,3,9], size=1001)
y = np.random.rand(1001)*(4+np.log(x))
df = pd.DataFrame({"x":x, "y":y})

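u = df.x.unique()
# plt.cm.spectral was removed from recent matplotlib versions; nipy_spectral replaces it
color = plt.cm.nipy_spectral(np.linspace(.1, .8, len(u)))
# draw one box per group, positioned at the group's actual x value
for c, (name, group) in zip(color, df.groupby("x")):
    bp = plt.boxplot(group.y.values, positions=[name], widths=0.8, patch_artist=True)
    bp['boxes'][0].set_facecolor(c)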


plt.xticks(u,u)
plt.autoscale()
plt.show()
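As a variation on the loop above, matplotlib's boxplot also accepts a sequence of arrays together with a list of positions, so the groups can be drawn in one call even when their sizes differ. This is a sketch under my own assumptions (Agg backend, seeded data, float x values to show non-integer-typed positions) and not the answer's original method; per-box colours would still need a loop over bp["boxes"].

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.choice([1.0, 3.0, 9.0], size=300)
y = rng.random(300) * (4 + np.log(x))
df = pd.DataFrame({"x": x, "y": y})

# One array of y values per distinct x; the group sizes may differ.
grouped = df.groupby("x")["y"]
positions = [name for name, _ in grouped]
samples = [group.to_numpy() for _, group in grouped]

fig, ax = plt.subplots()
bp = ax.boxplot(samples, positions=positions, widths=0.8, patch_artist=True)

# Label the ticks with the real x values; the axis itself is numeric.
ax.set_xticks(positions)
ax.set_xticklabels([str(p) for p in positions])
```

Because the positions are ordinary data coordinates, other quantitative plots can share these axes without any offset.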

      
