# Define and count unique patterns in a pandas frame

You will find snippets with reproducible input and an example of the desired output at the end of the question.

**A task:**

I have a data frame with two columns containing runs of ones and zeros (see the reproducible input below). The number of columns may vary, as may the lengths of the runs, but the only values in the data frame will be 0 or 1.

I would like to identify these patterns, count each occurrence, and build a data frame containing the results. To keep things simple, I would like to focus on the **ones** and ignore the **zeros**. The desired result for this particular case is shown under **Desired output** below.

I would like the procedure to identify that, for example, the pattern [1,1,1] occurs twice in column_A and not at all in column_B. Note that I have used the pattern sums (run lengths) as the index of the result.
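For illustration, counting runs of ones this way can be sketched with `itertools.groupby`; this is my own sketch rather than the method asked about, and `run_lengths` is a hypothetical helper name:

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({'column_A': [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1],
                   'column_B': [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]})

def run_lengths(values):
    # Length of each consecutive run of ones; zeros are skipped entirely.
    return [sum(g) for key, g in groupby(values) if key == 1]

runs = {col: run_lengths(df[col]) for col in df}
print(runs)  # {'column_A': [3, 1, 3], 'column_B': [5, 2]}
```

Feeding those lists into `pd.Series(...).value_counts()` would then give the per-column counts the question asks for.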

**Reproducible input:**

```
import pandas as pd

df = pd.DataFrame({'column_A': [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1],
                   'column_B': [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]})
colnames = list(df)
df[colnames] = df[colnames].apply(pd.to_numeric)
# pd.datetime was deprecated and later removed; use pd.Timestamp instead
datelist = pd.date_range(pd.Timestamp.today().strftime('%Y-%m-%d'), periods=len(df)).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
print(df)
```

**Desired output:**

```
df2 = pd.DataFrame({'pattern': [5, 3, 2, 1],
                    'column_A': [0, 2, 0, 1],
                    'column_B': [1, 0, 1, 0]})
df2 = df2.set_index(['pattern'])
print(df2)
```

**My attempts:**

I was working on a solution with nested for loops that computes running sums which reset every time the observation is zero, combined with pieces like `df.apply(lambda x: x.value_counts())`. But it is clumsy, to say the least, and not yet 100% correct.
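The running-sum-that-resets-at-zero idea can be expressed without explicit loops; a minimal sketch (the grouping trick here is my own, not from the question):

```python
import pandas as pd

s = pd.Series([1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1])  # column_A from the question
# (s == 0).cumsum() increases at every zero, so each run of ones gets its own
# group id; cumsum within the group is a running count that resets at zeros.
running = s.groupby((s == 0).cumsum()).cumsum()
print(running.tolist())  # [1, 2, 3, 0, 0, 0, 1, 0, 0, 1, 2, 3]
```

The run lengths are then the values of `running` at positions where the next value is zero (or at the end of the series).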

Thanks for any other suggestions!

**Answer:**

Here's my attempt:

```
def fun(ser):
    ser = ser.dropna()
    ser = ser.diff().fillna(ser)
    return ser.value_counts()

df.cumsum().where((df == 1) & (df != df.shift(-1))).apply(fun)
```

Out:

```
     column_A  column_B
1.0       1.0       NaN
2.0       NaN       1.0
3.0       2.0       NaN
5.0       NaN       1.0
```

The first part, `df.cumsum().where((df == 1) & (df != df.shift(-1)))`, keeps the cumulative sum only at positions where a run of ones ends (the current value is 1 and the next value differs):

```
column_A column_B
dates
2017-08-04 NaN NaN
2017-08-05 NaN NaN
2017-08-06 3.0 NaN
2017-08-07 NaN NaN
2017-08-08 NaN 5.0
2017-08-09 NaN NaN
2017-08-10 4.0 NaN
2017-08-11 NaN NaN
2017-08-12 NaN NaN
2017-08-13 NaN 7.0
2017-08-14 NaN NaN
2017-08-15 7.0 NaN
```

So if we ignore the NaNs and take the differences between consecutive values, we recover the run lengths. That is what the function does: it drops the NaNs, then takes the differences so that each value is a run length rather than a cumulative sum (`fillna(ser)` keeps the first value as-is), and finally returns the value counts.
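To reach the exact `df2` shape from the question (integer counts, `pattern` index sorted descending), the result of `apply(fun)` only needs a little cleanup; a sketch, assuming the `fun` defined above:

```python
import pandas as pd

df = pd.DataFrame({'column_A': [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1],
                   'column_B': [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]})

def fun(ser):
    ser = ser.dropna()
    ser = ser.diff().fillna(ser)  # differences between run-end cumsums = run lengths
    return ser.value_counts()

counts = df.cumsum().where((df == 1) & (df != df.shift(-1))).apply(fun)
result = (counts.fillna(0)   # a length never seen in a column becomes 0, not NaN
                .astype(int)
                .sort_index(ascending=False))
result.index = result.index.astype(int)
result.index.name = 'pattern'
print(result)
```

This prints the same frame as the desired `df2`: index `[5, 3, 2, 1]`, with counts `[0, 2, 0, 1]` for column_A and `[1, 0, 1, 0]` for column_B.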
