Frequency plot in Python / Pandas DataFrame

I am parsing a very large DataFrame with values like this and multiple columns:

Name Age Points ...
XYZ  42  32pts  ...
ABC  41  32pts  ...
DEF  32  35pts
GHI  52  35pts
JHK  72  35pts
MNU  43  42pts
LKT  32  32pts
LKI  42  42pts
JHI  42  35pts
JHP  42  42pts
XXX  42  42pts
XYY  42  35pts

I have imported numpy and matplotlib.

I need to plot a graph of the number of times each value occurs in the "Points" column. I don't need any bins; it's more of a plot to see how many times the same score appears in a large dataset.

So, essentially, a bar graph (or a histogram, if you like) should show that 32pts occurs 3 times, 35pts occurs 5 times, and 42pts occurs 4 times. If I can display the values in sorted order, even better. I tried df.hist() but it doesn't work for me. Any hints? Thanks.

2 answers


Just plot the result of the Series method value_counts directly:

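import matplotlib.pyplot as plt
import pandas as pd

data = load_my_data()  # placeholder: load your DataFrame here
fig, ax = plt.subplots()
# value_counts() returns one count per distinct value, most frequent first
data['Points'].value_counts().plot(ax=ax, kind='bar')
plt.show()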
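To get the bars in sorted order, as the question asks, one option is to sort the counts by their index before plotting. A minimal sketch, assuming `df` is rebuilt from the sample rows in the question rather than the real data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; only the Points column matters here
df = pd.DataFrame({
    "Name": ["XYZ", "ABC", "DEF", "GHI", "JHK", "MNU",
             "LKT", "LKI", "JHI", "JHP", "XXX", "XYY"],
    "Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
               "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"],
})

# Count each distinct value, then order the bars by the value itself
counts = df["Points"].value_counts().sort_index()
print(counts)

fig, ax = plt.subplots()
counts.plot(ax=ax, kind="bar")
ax.set_xlabel("Points")
ax.set_ylabel("Occurrences")
plt.show()
```

Note that `sort_index()` sorts these labels as strings, so a value like "100pts" would land before "32pts"; converting the column to integers first gives numeric order.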

If you want to strip the suffix "pts" from every value in the column and convert it to an integer, you can do something like this:



df['points_int'] = df['Points'].str.replace('pts', '', regex=False).astype(int)

This assumes every value ends in "pts". If the suffix varies from row to row, you need a regular expression instead; see the question "Split columns with pandas".
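As a sketch of the regular-expression route, `Series.str.extract` can pull out the leading digits regardless of what suffix follows. The mixed suffixes below ("pts", "pnts", " points") are made up for illustration:

```python
import pandas as pd

# Hypothetical column whose suffix varies from row to row
df = pd.DataFrame({"Points": ["32pts", "35pnts", "42 points", "32pts", "100pts"]})

# Capture the leading run of digits; expand=False returns a Series, not a DataFrame
df["points_int"] = df["Points"].str.extract(r"^(\d+)", expand=False).astype(int)

# Integer values sort numerically, so 100 comes after 42 rather than before 32
counts = df["points_int"].value_counts().sort_index()
print(counts)
```

If some rows contain no leading digits, `extract` yields NaN for them and `astype(int)` raises, so those rows would need to be dropped or filled first.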

And the official docs: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods

The Seaborn package has a countplot function that you can use to make a frequency plot.



import seaborn as sns
ax = sns.countplot(x="Points", data=df)
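`countplot` also accepts an `order` argument, which is one way to get the sorted display the question asks for. A minimal sketch, assuming `df` holds only the Points values from the sample rows in the question:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Points values from the question's sample rows
df = pd.DataFrame({"Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
                              "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"]})

# order fixes the category order on the x axis; here: the sorted distinct values
order = sorted(df["Points"].unique())
ax = sns.countplot(x="Points", data=df, order=order)
ax.set_ylabel("Occurrences")
plt.show()
```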
