Seaborn boxplot x-axis as numbers, not labels
Suppose I have a pandas DataFrame created like this:
import numpy as np
import pandas as pd
# Collect the rows first; DataFrame.append was removed in pandas 2.0
rows = [{'x_value': x, 'y_value': np.random.random()}
        for x in [1.0, 3.0, 9.0]
        for _ in range(1000)]
df = pd.DataFrame(rows, columns=['x_value', 'y_value'])
The result will look something like this:
In: df.head()
Out:
   x_value   y_value
0      1.0  0.616052
1      3.0  1.406715
2      9.0  8.774720
3      1.0  0.810729
4      3.0  1.309627
Using seaborn to create a boxplot gives this result:
[In] sns.boxplot(x='x_value', y='y_value', data=df)
[Out]
I would like the boxes to be spaced apart as if the x-axis values were treated as numbers, not just labels.
Is this possible? Or am I looking at the wrong type of plot to convey the spread of my data, if boxplots can't do this?
As @mwaskom pointed out in the comments below my original answer, the order argument
can be used to create empty positions between the boxes.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
x = np.random.choice([1,3,9], size=1001)
y = np.random.rand(1001)*(4+np.log(x))
df = pd.DataFrame({"x":x, "y":y})
sns.boxplot(x='x', y='y', data=df, order=range(1,10))
plt.show()
Note that in this case the axis is still categorical: the positions start at 0 and increase in steps of 1, and only the tick labels suggest otherwise. For the case in the question this is not a problem, but you need to be aware of it when, for example, plotting other quantitative data on the same axes. This approach also only works if the box positions are integers.
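To make the caveat about the categorical axis concrete, here is a minimal sketch. The variable names (order, tick_positions, slot_of_9) and the seeded random data are my own, chosen for illustration. It checks where the ticks really sit, then overlays a marker on the box labelled 9 by using its index in order instead of the value 9.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Illustrative data, seeded so the sketch is reproducible
rng = np.random.default_rng(0)
x = rng.choice([1, 3, 9], size=300)
y = rng.random(300) * (4 + np.log(x))
df = pd.DataFrame({"x": x, "y": y})

order = list(range(1, 10))
ax = sns.boxplot(x="x", y="y", data=df, order=order)

# The ticks sit at positions 0..8 even though they are labelled 1..9
tick_positions = [int(t) for t in ax.get_xticks()]

# To draw on top of the box labelled 9, use its index in `order`, not 9 itself
slot_of_9 = order.index(9)
ax.plot([slot_of_9], [df.loc[df.x == 9, "y"].mean()], "ko")
```

Plotting the marker at x=9 instead would place it one slot to the right of the last box, which is the kind of mismatch to watch for when mixing categorical and quantitative plots.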
Another, more general solution is to use matplotlib.pyplot.boxplot. The approach then depends on whether you have the same number of values for each "hue" category or not. In the general case, where the counts differ, you would plot one box per value in a loop. The axis is then really to scale, and non-integer positions are not a problem.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
x = np.random.choice([1,3,9], size=1001)
y = np.random.rand(1001)*(4+np.log(x))
df = pd.DataFrame({"x":x, "y":y})
u = df.x.unique()
# plt.cm.spectral was removed from matplotlib; nipy_spectral is the same colormap
color = plt.cm.nipy_spectral(np.linspace(.1, .8, len(u)))
for c, (name, group) in zip(color, df.groupby("x")):
    bp = plt.boxplot(group.y.values, positions=[name], widths=0.8, patch_artist=True)
    bp['boxes'][0].set_facecolor(c)
plt.xticks(u,u)
plt.autoscale()
plt.show()
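As a variation on the loop above, and not part of the original answer, boxplot also accepts a list of arrays of different lengths together with a positions list, so all boxes can be drawn in one call. The sketch below assumes that is acceptable for your data. It uses a non-integer position (2.5) to show that the axis is to scale. The names groups, positions and samples are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data with a non-integer x value, seeded for reproducibility
rng = np.random.default_rng(0)
x = rng.choice([1.0, 2.5, 9.0], size=300)
y = rng.random(300) * (4 + np.log(x))
df = pd.DataFrame({"x": x, "y": y})

# groupby sorts the keys, so positions and samples stay aligned
groups = df.groupby("x")["y"]
positions = [name for name, _ in groups]
samples = [group.to_numpy() for _, group in groups]

fig, ax = plt.subplots()
# One call draws every box at its true numeric x position
bp = ax.boxplot(samples, positions=positions, widths=0.8, patch_artist=True)
ax.set_xticks(positions)
ax.set_xticklabels([str(p) for p in positions])
```

The loop version is still handy when each box needs its own styling at creation time. With a single call, the boxes can be coloured afterwards by iterating over bp['boxes'].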