Python, Seaborn: calculating frequencies with zero values
I have a Pandas series with values for which I would like to plot counts. This creates roughly what I want:
import matplotlib.pyplot as plt
import seaborn as sns
dy = sns.countplot(x=rated.year, color="#53A2BE")
dy.set(xlabel='Release Year', ylabel="Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()
The problem is missing data: only 31 years have ratings, but the data spans 42 years. The years without ratings should show up as empty bins, but they are not displayed at all. Is there a way to set this up in Seaborn / Matplotlib? Should I be using a different type of graph, or is there another fix for this?
I wondered whether it could be set up as a time series, but I have the same issue with rating scales: on a scale of 1-10, the count for, say, 4 might be zero, so "4" never appears in the Pandas data and therefore does not appear on the graph either.
The result I want is a full scale on the x-axis, with counts (in steps of one) on the y-axis, showing zero/empty bins for missing scale values instead of skipping straight to the next bar for which data is available.
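For a fixed scale like the 1-10 ratings, one option within pandas is a categorical dtype whose categories list the whole scale: value_counts on such a column reports 0 for unused levels instead of dropping them. A minimal sketch; the ratings below are hypothetical, since the question only describes the scale:

```python
import pandas as pd

# Hypothetical ratings on a 1-10 scale; nobody gave a 1, 2 or 4.
ratings = pd.Series([7, 8, 8, 3, 10, 7, 5, 9, 8, 6])

# Declare the whole scale as categories, including levels that never occur.
scale = list(range(1, 11))
rating_cat = pd.Series(pd.Categorical(ratings, categories=scale))

# For a categorical column, value_counts includes every category (0 for
# unused ones); sort=False keeps them in scale order instead of by count.
counts = rating_cat.value_counts(sort=False)
print(counts)
```

Seaborn generally takes the level order from a pandas categorical's categories, so passing a column with this dtype to countplot should keep the empty levels too; it is worth confirming on the seaborn version in use.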
EDIT:
The data (rated.year) looks something like this:
import pandas as pd
rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
1993, 2011, 2013, 2011], columns = ["year"])
The real data has more values, but the format is the same. As you can see from ..
rated.year.value_counts()
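.. there are quite a few x values (years) for which the count should be zero on the chart, because value_counts() simply leaves them out. The current graph shows only the years that have data, with no gaps for the missing ones.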
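The gap can be checked directly: value_counts() reports only years that occur, so comparing its index with the full span of years shows which ones will be missing from the plot. A minimal sketch using the sample data above; the variable names are illustrative:

```python
import pandas as pd

rated = pd.DataFrame(data=[2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                           2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
                           1993, 2011, 2013, 2011], columns=["year"])

# value_counts only lists years that actually occur in the data.
counts = rated["year"].value_counts()

# Every year in the observed span, whether or not it has any ratings.
all_years = range(rated["year"].min(), rated["year"].max() + 1)
missing_years = sorted(set(all_years) - set(counts.index))

print(f"{len(counts)} of {len(all_years)} years have ratings; "
      f"{len(missing_years)} would be plotted as zero")
```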
I solved the problem using the solution suggested by @mwaskom in the comments to my question: pass an "order" argument to countplot that lists every allowable year, including the years whose count is zero. This is the code that creates the graph:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
1993, 2011, 2013, 2011], columns = ["year"])
# Every year between the earliest and latest release, including years with no ratings
all_years = list(range(rated.year.min(), rated.year.max() + 1))
dy = sns.countplot(x=rated.year, color="#53A2BE", order=all_years)
dy.set(xlabel='Release Year', ylabel = "Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()
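One caveat when the order comes from the data's own min() and max(): that works for years, but on a fixed scale such as 1-10 it drops the end points if nobody used them. A minimal sketch of a helper that accepts explicit bounds; full_order is a hypothetical name, not a seaborn function:

```python
def full_order(values, lo=None, hi=None):
    """Return every integer level from lo to hi inclusive.

    full_order is a hypothetical helper. lo and hi default to the observed
    minimum and maximum, which suits years but can drop the ends of a fixed
    scale (e.g. 1-10) when nobody used them.
    """
    values = list(values)
    lo = int(min(values)) if lo is None else lo
    hi = int(max(values)) if hi is None else hi
    return list(range(lo, hi + 1))


# Hypothetical ratings on a 1-10 scale: nobody gave a 1, 2 or 4.
ratings = [7, 8, 8, 3, 10, 7, 5, 9, 8, 6]
print(full_order(ratings))         # data-driven bounds: starts at 3
print(full_order(ratings, 1, 10))  # explicit bounds: the whole 1-10 scale
```

The result can then be passed as the order, e.g. sns.countplot(x=ratings, order=full_order(ratings, 1, 10), color="#53A2BE").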
Alternatively, consider a seaborn bar plot built from a reindexed series, wrapped in a data frame:
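import matplotlib.pyplot as plt
import seaborn as sns
# REINDEXED DATAFRAME: one row per year, count 0 for years without ratings.
# rename_axis/reset_index(name=...) give fixed column names ('year', 'count')
# regardless of how the pandas version names the value_counts result.
rated_ser = (rated['year'].value_counts()
             .reindex(range(rated.year.min(), rated.year.max() + 1), fill_value=0)
             .rename_axis('year')
             .reset_index(name='count'))
# SNS BAR PLOT
dy = sns.barplot(x='year', y='count', data=rated_ser, color="#53A2BE")
dy.tick_params(axis='x', labelrotation=90)  # ROTATE LABELS, 90 DEG.
dy.set(xlabel='Release Year', ylabel="Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()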
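If a numeric x-axis is acceptable instead of categorical bars, a histogram with one bin per integer also keeps empty years as zero-height bins. A minimal sketch, assuming NumPy is available; integer_histogram is a hypothetical helper name. The bin edges sit half-way between integers so each year falls in the middle of its own bin:

```python
import numpy as np


def integer_histogram(values, lo=None, hi=None):
    """Count each integer from lo to hi inclusive, keeping zero counts.

    integer_histogram is a hypothetical helper. Returns the integer bin
    centres, the counts, and the bin edges used.
    """
    values = np.asarray(values)
    lo = int(values.min()) if lo is None else lo
    hi = int(values.max()) if hi is None else hi
    # Edges at k - 0.5 and k + 0.5 give one bin centred on each integer k.
    edges = np.arange(lo, hi + 2) - 0.5
    counts, edges = np.histogram(values, bins=edges)
    centers = np.arange(lo, hi + 1)
    return centers, counts, edges


years = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
         2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
         1993, 2011, 2013, 2011]
centers, counts, edges = integer_histogram(years)
print(len(centers), counts.sum())  # 42 bins holding all 20 ratings
```

The returned edges can be passed to Matplotlib as plt.hist(rated.year, bins=edges) to draw the same histogram, or the counts can be drawn with plt.bar(centers, counts).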