Rolling Unique Sum for 3 previous months in python

Question

Below is the dataset I am looking at.

Input:-
Date          Name
01/01/2017    A
01/03/2017    B
02/05/2017    A
03/17/2017    C
04/08/2017    D
05/10/2017    B
06/12/2017    D

Output:-
Date      Unique Count
Jan 2017    2
Feb 2017    2
Mar 2017    3
Apr 2017    3
May 2017    3
Jun 2017    2

I want to get the count of unique "Name" values over the previous 3 months, including the current month. For example, as of 06/12/2017 the 3-month window covers April, May, and June. April had a D, May had a B, and June had a D, so the unique count for June is 2. The same logic applies to all other months; early months simply use whatever data is available (January only sees January).

I am looking for a pandas function that could help me with this, or any custom code that could implement it.

Any help is appreciated.

+3

python python-3.x pandas

howard roark, June 10, 2017 at 22:47


2 answers

Let's start by creating a DataFrame and setting dates as an index:

import pandas as pd

df = pd.DataFrame({'Date': ['01-01-2017', '01-03-2017', '02-05-2017', '03-17-2017', '04-08-2017', '05-10-2017', '06-12-2017'],
                   'Name': ['A', 'B', 'A', 'C', 'D', 'B', 'D']})

df['Date'] = pd.to_datetime(df['Date'])

df = df.set_index('Date')

First, we group by month, so that later we can do rolling counts per month:

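# pd.TimeGrouper was removed from later pandas releases; pd.Grouper replaces it.
# Recent pandas versions spell the month-end alias 'ME' instead of 'M'.
groups = df.groupby(pd.Grouper(freq='M'))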

Now we need a way to keep all the names that we saw each month. We can collect them in a list per month.

all_names_per_month = groups['Name'].apply(list)

It looks like this:

Date
2017-01-31    [A, B]
2017-02-28       [A]
2017-03-31       [C]
2017-04-30       [D]
2017-05-31       [B]
2017-06-30       [D]
Freq: M, Name: Name, dtype: object

Next, ideally, we would like to use all_names_per_month.rolling(3).apply(...), but unfortunately rolling apply does not work with non-numeric values, so we can instead set up a custom rolling function to get the values we want:

import itertools

def get_values(window_len, names_per_month):
    # Count the distinct names in each trailing window of window_len months.
    values = []
    for i in range(1, len(names_per_month) + 1):
        # The first windows are shorter because there are no earlier months.
        start = max(0, i - window_len)
        window = names_per_month.iloc[start:i]
        values.append(len(set(itertools.chain.from_iterable(window))))

    return values


values = get_values(3, all_names_per_month)

This gives us:

[2, 2, 3, 3, 3, 2]
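For readers who want to run the steps so far in one go, here is a minimal self-contained sketch. It is not the exact code above: it groups on monthly periods via `dt.to_period("M")` rather than a time grouper, and the names `month_key` and `rolling_unique_counts` are made up for illustration. It also assumes every month has at least one row; a month with no rows would simply be missing from the grouped result.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    "Date": ["01-01-2017", "01-03-2017", "02-05-2017", "03-17-2017",
             "04-08-2017", "05-10-2017", "06-12-2017"],
    "Name": ["A", "B", "A", "C", "D", "B", "D"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# Group rows by calendar month (assumption: no month is completely empty).
month_key = df["Date"].dt.to_period("M")
names_per_month = df.groupby(month_key)["Name"].apply(list)


def rolling_unique_counts(names, window_len):
    """Count distinct names over each trailing window of `window_len` rows."""
    counts = []
    for end in range(1, len(names) + 1):
        start = max(0, end - window_len)  # shorter window at the start
        window = names.iloc[start:end]
        counts.append(len(set(itertools.chain.from_iterable(window))))
    return counts


counts = rolling_unique_counts(names_per_month, 3)
print(counts)
```

With the sample data from the question, this should reproduce the list shown above.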

Finally, we can put these values in a DataFrame with the matching index, and then reformat the index to resemble the output you asked for:

result = pd.DataFrame(data=values, columns=['Unique Count'], index=all_names_per_month.index)

result.index = result.index.strftime('%B %Y')

result 

               Unique Count
January 2017              2
February 2017             2
March 2017                3
April 2017                3
May 2017                  3
June 2017                 2
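The question's expected output uses abbreviated month names ("Jan 2017") rather than full ones. As a small sketch, assuming an English locale, the `%b` directive can be used in place of `%B`; the index values below are only an illustrative subset of the month-end dates.

```python
import pandas as pd

# Illustrative month-end index, a subset of the one built above.
idx = pd.to_datetime(["2017-01-31", "2017-02-28", "2017-06-30"])

# %b is the abbreviated month name; the result depends on the locale.
labels = idx.strftime("%b %Y")
print(list(labels))
```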

+1

LateCoder June 11. 17 at 2:17


Kodiologist · Accepted Answer · 2017-06-11T01:47:25+0000

Try:

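import pandas as pd

# d is the question's DataFrame, with "Date" and "Name" columns.
# Note: this relies on Period - Period yielding integers, as in older pandas;
# pandas 0.24 and later return offset objects from that subtraction instead.
months = pd.to_datetime(d.loc[:, "Date"]).dt.to_period("M")
out = pd.DataFrame([
    (month, len(d.loc[(-2 <= months - month) & (months - month <= 0), "Name"].unique()))
    for month in months.unique()])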
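A variant of the same idea that does not depend on how period subtraction behaves is sketched below. It swaps the period arithmetic for a plain integer month index (`year * 12 + month`); the names `month_no` and `rows`, and the column labels, are chosen here for illustration and are not part of the answer above.

```python
import pandas as pd

d = pd.DataFrame({
    "Date": ["01/01/2017", "01/03/2017", "02/05/2017", "03/17/2017",
             "04/08/2017", "05/10/2017", "06/12/2017"],
    "Name": ["A", "B", "A", "C", "D", "B", "D"],
})

dates = pd.to_datetime(d["Date"], format="%m/%d/%Y")
# Integer month index, so month differences are ordinary integers.
month_no = dates.dt.year * 12 + dates.dt.month

rows = []
for m in sorted(int(v) for v in month_no.unique()):
    # Rows falling in the current month or the two months before it.
    in_window = (month_no >= m - 2) & (month_no <= m)
    # Convert the integer index back to a monthly period for the label.
    label = pd.Period(year=(m - 1) // 12, month=(m - 1) % 12 + 1, freq="M")
    rows.append((label, d.loc[in_window, "Name"].nunique()))

out = pd.DataFrame(rows, columns=["Month", "Unique Count"])
print(out)
```

Like the original, this only produces rows for months that actually appear in the data.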
