Scattering by category in pandas

This has been bothering me for the last 30 minutes. I would like to make a scatter plot in which the points are split by category. I looked at the documentation but couldn't find an answer. I looked here, but when I ran this in IPython Notebook, nothing was displayed.

Here is my dataframe:

time    cpu   wait    category 
8       1     0.5     a 
9       2     0.2     a
2       3     0.1     b
10      4     0.7     c
3       5     0.2     c
5       6     0.8     b

Ideally, I would like a scatter plot with cpu on the x-axis and wait on the y-axis, where each point is colored according to its category. For example, if a = red, b = blue and c = green, then the points (1, 0.5) and (2, 0.2) should be red, (3, 0.1) and (6, 0.8) should be blue, and so on.

How can I do this with pandas, or with matplotlib, whichever works better?



3 answers


This is essentially the same answer as @JoeCondron's, but as a two-liner:

cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
df.plot(x='cpu', y='wait', kind='scatter', 
        colors=[cmap.get(c, 'black') for c in df.category])


If a category has no entry in cmap, its points are drawn in black.
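The black fallback comes from dict.get, which returns its second argument when the key is missing. A minimal sketch of just the color-list step; the extra category 'd' is hypothetical and only there to trigger the fallback:

```python
import pandas as pd

cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
# 'd' is a hypothetical category that has no entry in cmap
category = pd.Series(['a', 'b', 'c', 'd'])

# dict.get(key, default) returns 'black' for any category missing from cmap
colors = [cmap.get(c, 'black') for c in category]
print(colors)  # ['red', 'blue', 'yellow', 'black']
```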



EDIT:

The above works in pandas 0.14.1. In 0.16.2, the "colors" keyword must be changed to "c":

df.plot(x='cpu', y='wait', kind='scatter',
        c=[cmap.get(c, 'black') for c in df.category])
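A self-contained sketch of the "c" variant for a recent pandas/matplotlib install, assuming the sample data from the question. It selects the non-interactive Agg backend so it also runs without a display; drop that line in a notebook. Older releases such as 0.16.2 may behave differently.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line to plot interactively
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'cpu': [1, 2, 3, 4, 5, 6],
    'wait': [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    'category': ['a', 'a', 'b', 'c', 'c', 'b'],
})
cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}

# c receives one color name per row, in the same order as df
ax = df.plot(x='cpu', y='wait', kind='scatter',
             c=[cmap.get(c, 'black') for c in df.category])
```

In an interactive session, call plt.show() afterwards (or just run the cell in a notebook) to display the figure.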



You could do

import matplotlib.pyplot as plt

color_map = {'a': 'r', 'b': 'b', 'c': 'y'}
ax = plt.subplot()
x, y = df.cpu, df.wait
colors = df.category.map(color_map)  # one color alias per row
ax.scatter(x, y, color=colors)
plt.show()




This will give you red for category a, blue for b and yellow for c. It works because you can pass a sequence of color aliases with the same length as the x and y arrays. You can check out the many colors available here: http://matplotlib.org/api/colors_api.html . I don't think the DataFrame plot method is very useful for scatter plots.
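One difference from the dict.get version: Series.map with a dict returns NaN for any category missing from the dict, and NaN is not a valid matplotlib color. A sketch of filling those gaps before plotting; the category 'd' and the default 'k' (black) are assumptions for illustration:

```python
import pandas as pd

color_map = {'a': 'r', 'b': 'b', 'c': 'y'}
# 'd' is a hypothetical category that color_map does not cover
category = pd.Series(['a', 'b', 'c', 'd'])

# Series.map(dict) leaves NaN where the key is not in the dict
raw = category.map(color_map)

# Replace the missing entries with a default color before calling ax.scatter
colors = raw.fillna('k')
print(colors.tolist())  # ['r', 'b', 'y', 'k']
```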



I would create a column holding a color for each row based on its category, then do the following, where ax is a matplotlib Axes and df is your DataFrame:

ax.scatter(df['cpu'], df['wait'], marker='.', c=df['colors'], s=100)
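A self-contained sketch of this approach, assuming the sample data from the question. The mapping category_colors is hypothetical and uses the colors suggested in the question; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line to plot interactively
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'cpu': [1, 2, 3, 4, 5, 6],
    'wait': [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    'category': ['a', 'a', 'b', 'c', 'c', 'b'],
})

# Hypothetical mapping from category to color, as suggested in the question
category_colors = {'a': 'red', 'b': 'blue', 'c': 'green'}
# Store one color per row in a new 'colors' column
df['colors'] = df['category'].map(category_colors)

fig, ax = plt.subplots()
ax.scatter(df['cpu'], df['wait'], marker='.', c=df['colors'], s=100)
ax.set_xlabel('cpu')
ax.set_ylabel('wait')
```

Keeping the colors in a column means they stay aligned with the rows if you later filter or sort df.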
