Counting value changes in each column in a dataframe in pandas
Is there any neat way to count the number of changes to a value in each column in a dataframe in pandas?
I don't want to loop over each column like:
import pandas as pd
frame = pd.DataFrame({
'time':[1234567000,1234567005,1234567009],
'X1':[96.32,96.01,96.05],
'X2':[23.88,23.96,23.96]
},columns=['time','X1','X2'])
print(frame)
changes = []
for column_name in frame.columns.values:
print('column_name: {0}'.format(column_name))
changes.append(sum(frame[column_name]!=frame[column_name].shift(1)))
print('changes: {0}'.format(changes))
returns:
time X1 X2
0 1234567000 96.32 23.88
1 1234567005 96.01 23.96
2 1234567009 96.05 23.96
column_name: time
column_name: X1
column_name: X2
changes: [3, 3, 2]
source to share
If the values are numeric, you can use the differences between adjacent rows and check if the difference is different. Then take the sum down each column to count the number of changes in the value:
In [48]: (frame.diff(axis=0) != 0).sum(axis=0)
Out[48]:
time 3
X1 3
X2 2
dtype: int64
If the values are not necessarily numeric, then a more general way is to compare frame
with yourself shift
down one row - this is similar to the code you posted, except that the operation is performed on the entire DataFrame instead of column by column:
In [50]: (frame != frame.shift(axis=0)).sum(axis=0)
Out[50]:
time 3
X1 3
X2 2
dtype: int64
The numeric version is faster, the shifted version is more reliable.
source to share