Counting changes in line items

I have a DataFrame and need to count how many times the value in one column changes within each group. If the DataFrame were grouped by the "id" column, one group would look like this:

    id    vehicle
   'abc'  'bmw'
   'abc'  'bmw'
   'abc'  'yamaha'
   'abc'  'suzuki'
   'abc'  'suzuki'
   'abc'  'kawasaki'

      

So, in this case, I would like to say that id 'abc' changed vehicle brand 3 times. Is there an efficient way to do this across multiple groups of the "id" column?



1 answer


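I can think of two ways:

1) groupby on 'id', select the "vehicle" column and call nunique. You need to subtract 1, because you are looking for the number of changes rather than the total number of unique values. Note that this only equals the number of changes if a brand never reappears after it has been replaced: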

In [292]:
df.groupby('id')['vehicle'].nunique() -1

Out[292]:
id
'abc'    3
Name: vehicle, dtype: int64
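
To illustrate why the unique count is only a shortcut, here is a minimal sketch on made-up data in which a brand comes back later. The rows and the helper `count_changes` are assumptions for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical group in which 'bmw' reappears after 'yamaha'.
df = pd.DataFrame({
    "id": ["abc", "abc", "abc"],
    "vehicle": ["bmw", "yamaha", "bmw"],
})

def count_changes(values):
    """Count positions where a value differs from the one before it."""
    values = list(values)
    return sum(a != b for a, b in zip(values, values[1:]))

# Two distinct brands, so nunique() - 1 gives 1.
by_nunique = df.groupby("id")["vehicle"].nunique() - 1
# Two actual transitions: bmw -> yamaha -> bmw.
by_transitions = df.groupby("id")["vehicle"].apply(count_changes)

print(by_nunique["abc"])      # 1
print(by_transitions["abc"])  # 2
```

The two numbers agree on the data in the question only because no brand repeats after being replaced.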

      

2) apply a lambda that uses shift to check whether the current vehicle differs from the previous one. This is more semantically correct, since it detects actual changes rather than just counting unique values. Calling sum on the booleans converts True and False to 1 and 0 respectively:


In [293]:
df.groupby('id')['vehicle'].apply(lambda x: (x != x.shift()).sum() - 1)

Out[293]:
id
'abc'    3
Name: vehicle, dtype: int64
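
With more than one id, the sum and the subtraction have to happen inside each group, otherwise a single total is produced for the whole frame. A minimal sketch, assuming a second, made-up id 'xyz' alongside the data from the question:

```python
import pandas as pd

# 'abc' rows are taken from the question; 'xyz' rows are hypothetical.
df = pd.DataFrame({
    "id": ["abc"] * 6 + ["xyz"] * 3,
    "vehicle": ["bmw", "bmw", "yamaha", "suzuki", "suzuki", "kawasaki",
                "honda", "honda", "ducati"],
})

# For each id: flag rows that differ from the previous row, sum the flags,
# and subtract 1 for the first row, which has no predecessor.
changes = df.groupby("id")["vehicle"].apply(
    lambda s: (s != s.shift()).sum() - 1
)

print(changes.to_dict())  # {'abc': 3, 'xyz': 1}
```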

      

The -1 is required because the first row of each group is compared with a row that does not exist, and a comparison with NaN is meaningless here (it always reports a change), see below:

In [301]:
df.groupby('id')['vehicle'].apply(lambda x: x != x.shift())

Out[301]:
0     True
1    False
2     True
3     True
4    False
5     True
Name: 'abc', dtype: bool
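
As a possible alternative that avoids both the lambda and the -1, the first row of each group can be masked out explicitly. This is only a sketch and assumes the "vehicle" column itself contains no missing values, so that NaN appears in the shifted column only where a group starts.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["abc"] * 6,
    "vehicle": ["bmw", "bmw", "yamaha", "suzuki", "suzuki", "kawasaki"],
})

# Previous vehicle within the same id; NaN on the first row of each group.
prev = df.groupby("id")["vehicle"].shift()

# A change is a row that has a predecessor and differs from it.
changed = prev.notna() & (df["vehicle"] != prev)

# Sum the boolean flags per id.
counts = changed.groupby(df["id"]).sum()

print(changed.tolist())  # [False, False, True, True, False, True]
print(counts["abc"])     # 3
```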

      
