Counting value changes in each column in a dataframe in pandas

Is there a neat way to count the number of value changes in each column of a pandas DataFrame?

I don't want to loop over each column like this:

import pandas as pd

frame = pd.DataFrame({
    'time': [1234567000, 1234567005, 1234567009],
    'X1': [96.32, 96.01, 96.05],
    'X2': [23.88, 23.96, 23.96]
}, columns=['time', 'X1', 'X2'])

print(frame)

# Count, for each column, how many rows differ from the previous row.
changes = []
for column_name in frame.columns.values:
    print('column_name: {0}'.format(column_name))
    changes.append(sum(frame[column_name] != frame[column_name].shift(1)))

print('changes: {0}'.format(changes))


returns:

         time     X1     X2
0  1234567000  96.32  23.88
1  1234567005  96.01  23.96
2  1234567009  96.05  23.96
column_name: time
column_name: X1
column_name: X2
changes: [3, 3, 2]

1 answer


If the values are numeric, you can take the differences between adjacent rows and check whether each difference is nonzero. Summing down each column then gives the number of value changes:

In [48]: (frame.diff(axis=0) != 0).sum(axis=0)
Out[48]: 
time    3
X1      3
X2      2
dtype: int64

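Note that frame.diff() produces NaN in the first row, and NaN != 0 evaluates to True, so the first row is always counted as a change (the loop in the question behaves the same way). As a minimal sketch, assuming you only want to count changes after the first row, you could drop it before summing:

In [49]: (frame.diff(axis=0) != 0).iloc[1:].sum(axis=0)
Out[49]: 
time    2
X1      2
X2      1
dtype: int64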

If the values are not necessarily numeric, a more general way is to compare frame with itself shifted down one row. This is similar to the code you posted, except that the operation is performed on the whole DataFrame at once instead of column by column:



In [50]: (frame != frame.shift(axis=0)).sum(axis=0)
Out[50]: 
time    3
X1      3
X2      2
dtype: int64


The diff-based version is faster; the shift-based comparison is more general, since it also works for non-numeric values.

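For example, here is a minimal sketch with a string column (frame2 and its values are made up for illustration; they are not from the question). diff would typically raise an error on the object-dtype column, but the shift comparison still works:

In [51]: frame2 = pd.DataFrame({'sym': ['A', 'A', 'B'], 'px': [96.32, 96.01, 96.01]}, columns=['sym', 'px'])

In [52]: (frame2 != frame2.shift(axis=0)).sum(axis=0)
Out[52]: 
sym    2
px     2
dtype: int64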