Frequency plot in Python / Pandas DataFrame
I am parsing a very large dataframe with multiple columns and values like this:
Name Age Points ...
XYZ 42 32pts ...
ABC 41 32pts ...
DEF 32 35pts
GHI 52 35pts
JHK 72 35pts
MNU 43 42pts
LKT 32 32pts
LKI 42 42pts
JHI 42 35pts
JHP 42 42pts
XXX 42 42pts
XYY 42 35pts
I have imported numpy and matplotlib.
I need to plot a graph of the number of times a value occurs in the "Points" column. I don't need any bins for the plot. So it's more of a plot to see how many times the same score occurs over a large dataset.
So, essentially a bar graph (or a histogram, if I may put it that way) should show that 32pts occurs 3 times, 35pts occurs 5 times, and 42pts occurs 4 times. If I can display the values in sorted order, so much the better. I tried df.hist() but it doesn't work for me. Any hints? Thanks.
Just plot the result of the pandas method value_counts directly:
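import matplotlib.pyplot as plt
import pandas

data = load_my_data()  # placeholder: load your DataFrame here
fig, ax = plt.subplots()
# Count how often each distinct value occurs and draw one bar per value
data['Points'].value_counts().plot(ax=ax, kind='bar')
plt.show()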
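To get the bars in sorted order, as the question asks, one option is to sort the counts by their index before plotting. Below is a minimal sketch that assumes the sample rows from the question stand in for the real data; the axis labels are my own additions. Note that the labels are strings, so they are sorted lexicographically, which happens to match numeric order here only because every score has two digits.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the "Points" column in the question (an assumption:
# the real dataframe is much larger and has more columns)
df = pd.DataFrame({"Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
                              "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"]})

# value_counts() orders by frequency; sort_index() reorders by the label instead
counts = df["Points"].value_counts().sort_index()

fig, ax = plt.subplots()
counts.plot(ax=ax, kind="bar")
ax.set_xlabel("Points")
ax.set_ylabel("Occurrences")
# Call plt.show() or fig.savefig(...) here to display or save the figure
```

With this sample, counts holds 3 for 32pts, 5 for 35pts, and 4 for 42pts, which are the figures given in the question.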
If you want to remove the string "pts" from all elements in your column, you can do something like this:
df['points_int'] = df['Points'].str.replace('pts', '', regex=False).astype(int)
This assumes every value ends in "pts". If the suffix varies from row to row, you will need regular expressions, as in this question: Split columns with pandas
And the official docs: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
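If the suffix does vary, one possible approach is to capture only the leading digits with Series.str.extract rather than strip a fixed suffix. This is a rough sketch, not necessarily what the linked question does; the mixed suffixes in the sample data are made up for illustration, and it assumes every value starts with at least one digit (otherwise the conversion to int would fail on the missing values).

```python
import pandas as pd

# Hypothetical values with inconsistent suffixes, invented for this example
df = pd.DataFrame({"Points": ["32pts", "35 pnts", "42points", "35pts"]})

# Capture the leading run of digits and ignore whatever follows it;
# expand=False returns a Series because there is a single capture group
df["points_int"] = df["Points"].str.extract(r"^(\d+)", expand=False).astype(int)

# With integer scores, sort_index() orders the counts numerically
counts = df["points_int"].value_counts().sort_index()
```

The resulting counts can be plotted with .plot(kind='bar') in the same way as before, and the bars then appear in true numeric order.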