Pandas Series - Recording Numeric Changes

I have a panel dataframe with many observations of people's location data over 10 years. It looks something like this:

     personid  location_1991  location_1992  location_1993  location_1994
0    111       1              1              2              2
1    233       3              3              4              999
2    332       1              3              3              3
3    454       2              2              2              2
4    567       2              1              1              1

      

I want to track each person's transitions by creating a variable for each transition type. Each column should be set to 1 whenever a person moves to that type of location. Ideally it would look like this:

     personid  transition_to_1  transition_to_2  transition_to_3  transition_to_4
0    111       0                1                0                0
1    233       0                0                0                1
2    332       0                0                1                0
3    454       0                0                0                0
4    567       1                0                0                0

      

So far, I've tried iterating over each row and then looping through each element in the row to check whether it's the same as the previous one. This seems computationally expensive. Is there a better way to keep track of the changing values in each row of my dataframe?

1 answer


I used a combination of first stacking these columns and then pivoting on the stacked values.

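import pandas as pd

# Copy the table from the question, then read it from the clipboard.
df = pd.read_clipboard()
# Stack the year columns into one long 'location' column, one row per person-year.
df2 = pd.DataFrame(df.set_index('personid').stack(), columns=['location'])
# The first reset turns personid (and the year level) back into ordinary columns.
df2.reset_index(inplace=True)
# The second reset adds an 'index' column with a unique row number to pivot on.
df2.reset_index(inplace=True)
df3 = df2.pivot(index='index', columns='location', values='personid')
df3 = df3.fillna(0).astype(int)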

      

So far it looks like this:



location    1    2    3    4  999
index                            
0         111    0    0    0    0
1         111    0    0    0    0
2           0  111    0    0    0
3           0  111    0    0    0
4           0    0  233    0    0
5           0    0  233    0    0
6           0    0    0  233    0
7           0    0    0    0  233
8         332    0    0    0    0
9           0    0  332    0    0
10          0    0  332    0    0
11          0    0  332    0    0
12          0  454    0    0    0
13          0  454    0    0    0
14          0  454    0    0    0
15          0  454    0    0    0
16          0  567    0    0    0
17        567    0    0    0    0
18        567    0    0    0    0
19        567    0    0    0    0

# Each row has a single nonzero entry, so the row-wise max recovers the personid.
df3['personid'] = df3.max(axis=1)
df3 = df3.set_index('personid', drop=True)
# Turn the remaining personid values into 0/1 indicators.
df3[df3 > 0] = 1

      

And here it is, with one row per person per year:

location    1    2    3    4  999
personid                         
111         1    0    0    0    0
111         1    0    0    0    0
111         0    1    0    0    0
111         0    1    0    0    0
233         0    0    1    0    0
233         0    0    1    0    0
233         0    0    0    1    0
233         0    0    0    0    1
332         1    0    0    0    0
332         0    0    1    0    0
332         0    0    1    0    0
332         0    0    1    0    0
454         0    1    0    0    0
454         0    1    0    0    0
454         0    1    0    0    0
454         0    1    0    0    0
567         0    1    0    0    0
567         1    0    0    0    0
567         1    0    0    0    0
567         1    0    0    0    0
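
The table above marks where each person was in each year, not the moves themselves. One possible way to get from there to the per-person transition columns the question asks for is sketched below. It is only a sketch under some assumptions: it swaps in melt, a grouped shift and pd.crosstab rather than the stack/pivot steps above, it rebuilds the sample frame by hand instead of reading the clipboard, and it treats 999 as an ordinary location, so a transition_to_999 column appears as well (drop it if 999 is really a missing-value code). The names long, moved and transitions are made up for the illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "personid": [111, 233, 332, 454, 567],
    "location_1991": [1, 3, 1, 2, 2],
    "location_1992": [1, 3, 3, 2, 1],
    "location_1993": [2, 4, 3, 2, 1],
    "location_1994": [2, 999, 3, 2, 1],
})

# Long format: one row per person per year, in chronological order per person.
long = df.melt(id_vars="personid", var_name="year", value_name="location")
long = long.sort_values(["personid", "year"])

# Location in the preceding year (NaN for each person's first year).
long["previous"] = long.groupby("personid")["location"].shift()

# A transition is a year whose location differs from the preceding year's.
moved = long[long["previous"].notna() & (long["location"] != long["previous"])]

# Count arrivals per destination, cap at 1 to get indicators, and
# restore people who never moved as all-zero rows.
transitions = (
    pd.crosstab(moved["personid"], moved["location"])
    .clip(upper=1)
    .reindex(df["personid"], fill_value=0)
    .add_prefix("transition_to_")
)
print(transitions)
```

Because the comparison is done on whole columns after a grouped shift, this avoids the row-by-row loop described in the question. If a count of moves per destination is more useful than a 0/1 flag, the clip step can be left out.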

      
