Python, Seaborn: plotting frequencies with zero values

I have a Pandas series with values for which I would like to plot counts. This creates roughly what I want:

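import matplotlib.pyplot as plt
import seaborn as sns

# Pass the series as x explicitly; newer seaborn versions expect keyword arguments here
dy = sns.countplot(x=rated.year, color="#53A2BE")
dy.set(xlabel='Release Year', ylabel="Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()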

      

The problem is the missing data. There are 31 years with ratings, spread over a span of 42 years. That means some bins should be empty, but they are not displayed. Is there a way to configure this in Seaborn / Matplotlib? Should I use a different type of plot, or is there another fix for this?

I wondered whether it could be set up as a time series, but I have the same issue with rating scales. On a scale of 1-10, the count for, say, 4 might be zero, so "4" is not in the Pandas data and therefore does not appear on the graph either.

The result I want is the full scale on the x-axis, with counts (in steps of one) on the y-axis, showing zero / empty bins for scale values that have no data instead of skipping to the next value for which data is available.

EDIT:

The data (rated.year) looks something like this:

import pandas as pd

rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                             2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016, 
                             1993, 2011, 2013, 2011], columns = ["year"])

      

It has more values, but the format is the same. As you can see from ..

rated.year.value_counts()

      

.. there are quite a few x values for which the count should be zero on the chart. The current graph looks like this:

Seaborn plot



2 answers


I solved the problem using the solution suggested by @mwaskom in the comments on my question: pass an "order" argument to countplot containing all valid values for year, including those with a count of zero. This is the code that creates the graph:



import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                             2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
                             1993, 2011, 2013, 2011], columns = ["year"])

# Every year in the range, including years that have no ratings
all_years = list(range(rated.year.min(), rated.year.max() + 1))

dy = sns.countplot(x=rated.year, color="#53A2BE", order=all_years)
dy.set(xlabel='Release Year', ylabel="Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()
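
The same approach should carry over to the rating-scale case mentioned in the question. A minimal sketch, assuming a hypothetical "rating" column on a 1-10 scale (the sample values and the names ratings, scale and ax are made up for illustration). Because the scale is known in advance, the order is written out as a fixed range rather than derived from the data's minimum and maximum:

```python
import pandas as pd
import seaborn as sns

# Hypothetical ratings on a 1-10 scale; the values 4, 6 and 10 never occur.
ratings = pd.DataFrame({"rating": [7, 8, 8, 9, 5, 7, 3, 8, 1, 2, 7, 9]})

# The full scale is fixed, so ratings that never occur still get a slot on the x-axis.
scale = list(range(1, 11))

ax = sns.countplot(x="rating", data=ratings, color="#53A2BE", order=scale)
ax.set(xlabel="Rating", ylabel="Count")
```

Call plt.show() (from matplotlib.pyplot) afterwards to display the figure. Deriving the order from min() and max() instead would drop 10 here, since the highest rating present is 9.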

      



Consider a seaborn barplot built from a reindexed series cast to a data frame:

# REINDEXED DATAFRAME: one row per year in the range, zero where there are no ratings.
# The columns are named explicitly because the default names differ between pandas versions.
rated_ser = (rated['year'].value_counts()
             .reindex(range(rated.year.min(), rated.year.max() + 1), fill_value=0)
             .rename_axis('year')
             .reset_index(name='count'))

# SNS BAR PLOT
dy = sns.barplot(x='year', y='count', data=rated_ser, color="#53A2BE")
dy.set_xticklabels(dy.get_xticklabels(), rotation=90)   # ROTATE LABELS, 90 DEG.
dy.set(xlabel='Release Year', ylabel="Count")

dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')

      



Seaborn plot output
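
To see what the reindexing step does on its own, here is a small pandas-only sketch using the sample data from the question (the names years, full_range and counts are chosen for illustration). It only checks the counts and does no plotting:

```python
import pandas as pd

# Sample data from the question
years = pd.Series([2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                   2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
                   1993, 2011, 2013, 2011], name="year")

# value_counts() only lists years that occur; reindexing over the full
# range adds the missing years with a count of 0.
full_range = range(years.min(), years.max() + 1)
counts = years.value_counts().reindex(full_range, fill_value=0)

print(len(counts))                # 42 years between 1975 and 2016
print(int((counts == 0).sum()))   # 33 of them have no ratings
print(int(counts.loc[2016]))      # 6 ratings for 2016
```

With this small sample only 9 distinct years occur, so most of the 42 bins are empty; the full data set in the question has 31 years with ratings.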
