Rolling Unique Sum for 3 previous months in python

Below is the dataset I am looking at.

Input:-
Date          Name
01/01/2017    A
01/03/2017    B
02/05/2017    A
03/17/2017    C
04/08/2017    D
05/10/2017    B
06/12/2017    D

Output:-
Date      Unique Count
Jan 2017    2
Feb 2017    2
Mar 2017    3
Apr 2017    3
May 2017    3
Jun 2017    2

      

I want to get the unique "Name" count for the previous 3 months (including the current month), based on rental. For example, as of 06/12/2017 the previous 3 months are April, May, and June. April had a D, May had a B, and June had a D, so the unique count for June is 2. The same applies to all other months.

I am looking for a pandas function that could help me with this. Or any custom code that could implement this.

Any help is appreciated.


2 answers


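Try this, where d is your DataFrame:

import pandas as pd

months = pd.to_datetime(d["Date"]).dt.to_period("M")
idx = months.dt.year * 12 + months.dt.month  # integer month ordinal
out = pd.DataFrame(
    [(m, d.loc[(idx >= n - 2) & (idx <= n), "Name"].nunique())
     for m, n in zip(months.unique(), idx.unique())],
    columns=["Date", "Unique Count"])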
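A self-contained sketch of the same windowing idea, run against the sample data from the question. The helper name rolling_unique_by_month and its window parameter are illustrative assumptions, not part of the answer; the DataFrame name d follows the snippet above.

```python
import pandas as pd

# Sample data copied from the question; `d` follows the naming used above.
d = pd.DataFrame({
    "Date": ["01/01/2017", "01/03/2017", "02/05/2017", "03/17/2017",
             "04/08/2017", "05/10/2017", "06/12/2017"],
    "Name": ["A", "B", "A", "C", "D", "B", "D"],
})


def rolling_unique_by_month(frame, window=3):
    """Count distinct names in each month plus the (window - 1) months before it.

    Hypothetical helper: only months that occur in the data get a row.
    """
    periods = pd.to_datetime(frame["Date"], format="%m/%d/%Y").dt.to_period("M")
    # year * 12 + month turns each month into an integer, so "within the
    # last `window` months" becomes a plain integer comparison.
    ordinal = periods.dt.year * 12 + periods.dt.month
    rows = []
    for period, n in zip(periods.unique(), ordinal.unique()):
        in_window = (ordinal > n - window) & (ordinal <= n)
        rows.append((str(period), frame.loc[in_window, "Name"].nunique()))
    return pd.DataFrame(rows, columns=["Date", "Unique Count"])


out = rolling_unique_by_month(d)
print(out)
```

On the sample data the counts come out as 2, 2, 3, 3, 3, 2, matching the output requested in the question.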

      



Let's start by creating a DataFrame and setting dates as an index:

import pandas as pd

df = pd.DataFrame({'Date': ['01-01-2017', '01-03-2017', '02-05-2017', '03-17-2017', '04-08-2017', '05-10-2017', '06-12-2017'],
                   'Name': ['A', 'B', 'A', 'C', 'D', 'B', 'D']})

# The dates are month-day-year, so state the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%Y')

df = df.set_index('Date')

      

First, we group by month, so that later we can do rolling counts per month:

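# pd.TimeGrouper is no longer available in current pandas; pd.Grouper replaces it
# (newer pandas releases prefer the month-end alias 'ME' over 'M')
groups = df.groupby(pd.Grouper(freq='M'))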

      

Now we need a way to save all the names that we saw each month. We can put them in a list:

all_names_per_month = groups['Name'].apply(list)

      

It looks like this:



Date
2017-01-31    [A, B]
2017-02-28       [A]
2017-03-31       [C]
2017-04-30       [D]
2017-05-31       [B]
2017-06-30       [D]
Freq: M, Name: Name, dtype: object

      

Next, ideally, we would like to use all_names_per_month.rolling(3).apply(...), but unfortunately apply does not work with non-numeric values, so we can instead set up a custom rolling function to get the values we want:

import itertools

def get_values(window_len, names_per_month):
    # For each month, count the distinct names in the window ending at that month
    values = []
    for i in range(1, len(names_per_month) + 1):
        # The first months have fewer than window_len predecessors, so clamp at 0
        start = max(0, i - window_len)
        window = names_per_month.iloc[start:i]
        values.append(len(set(itertools.chain.from_iterable(window))))

    return values


values = get_values(3, all_names_per_month)

      

This gives us:

[2, 2, 3, 3, 3, 2]

      

Finally, we can put these values in a DataFrame at the appropriate index, which we then reformat to look like the output you indicated above:

result = pd.DataFrame(data=values, columns=['Unique Count'], index=all_names_per_month.index)

result.index = result.index.strftime('%B %Y')

result 

               Unique Count
January 2017              2
February 2017             2
March 2017                3
April 2017                3
May 2017                  3
June 2017                 2
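As an alternative to the hand-written loop, the rolling step can stay inside pandas by working on a numeric month-by-name table rather than on lists. This is a sketch under assumptions, not the method used in this answer: the variable names (counts, present, seen, unique_counts) are illustrative, and the reindex step is there so that a month with no rows would still occupy a slot in the 3-month window.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["01-01-2017", "01-03-2017", "02-05-2017", "03-17-2017",
         "04-08-2017", "05-10-2017", "06-12-2017"],
        format="%m-%d-%Y",
    ),
    "Name": ["A", "B", "A", "C", "D", "B", "D"],
})

# One row per month, one column per name; each cell counts occurrences.
month = df["Date"].dt.to_period("M")
counts = df.groupby([month, "Name"]).size().unstack(fill_value=0)

# Insert all-zero rows for any calendar month missing from the data.
full_range = pd.period_range(month.min(), month.max(), freq="M")
counts = counts.reindex(full_range, fill_value=0)

# 1 if the name appeared in that month, else 0. This table is numeric,
# so the built-in rolling window works on it.
present = counts.gt(0).astype(int)

# A name counts for a month if it appeared in that month or the two before.
seen = present.rolling(3, min_periods=1).max()
unique_counts = seen.sum(axis=1).astype(int)

print(unique_counts)
```

The table has one column per distinct name, so this trades memory for speed when there are many names. On the sample data it produces the same counts as the loop: 2, 2, 3, 3, 3, 2.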

      
