Add a flag column indicating whether a user ID has specific values in another column

The DataFrame looks like this:

In [1]: df
Out[1]:
   userid  type
0       1     1
1       1     2
2       2     1
3       3     1
4       3     2
5       3     3

      

Now I want to add a column indicating whether each userid has specific values in the type column (for example, both type 1 and type 2). This is the result I want:

In [2]: df
Out[2]:
   userid  type  has_type_12
0       1     1            1
1       1     2            1
2       2     1            0
3       3     1            1
4       3     2            1
5       3     3            1

      

Is there a quick way to do this?


I overlooked one case: userid 3 can have additional types, such as 3 or 4. In that case I still want has_type_12 = 1 for userid 3, because it has both type 1 and type 2. I have updated the input and desired output above.
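A minimal sketch for reproducing the question: it builds the sample frame shown above and computes the desired flag with a plain-Python loop, which can serve as a reference when checking the pandas answers. The names types_by_user and expected are illustrative choices, not part of the question.

```python
import pandas as pd

# Sample input from the question (including the extra type 3 for userid 3)
df = pd.DataFrame({'userid': [1, 1, 2, 3, 3, 3],
                   'type':   [1, 2, 1, 1, 2, 3]})

# Collect each user's types in a dict of sets
types_by_user = {}
for uid, t in zip(df['userid'], df['type']):
    types_by_user.setdefault(uid, set()).add(t)

# Flag a row with 1 when its user has both type 1 and type 2, whatever else it has
expected = [int({1, 2} <= types_by_user[uid]) for uid in df['userid']]
print(expected)  # [1, 1, 0, 1, 1, 1]
```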



4 answers


In [308]: df['has_type_12'] = \
              df.groupby('userid')['type'] \
                .transform(lambda x: x[x.isin([1, 2])].nunique() == 2) \
                .astype(int)

In [309]: df
Out[309]:
   userid  type  has_type_12
0       1     1            1
1       1     2            1
2       2     1            0
3       3     1            1
4       3     2            1
5       3     3            1
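The lambda above filters and counts inside every group. As a sketch of the same isin + nunique idea without a per-group Python function, the filtering can be done once on the whole frame and the per-user result mapped back; the names cats, n_found and users_with_all are illustrative, and the sample frame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({'userid': [1, 1, 2, 3, 3, 3],
                   'type':   [1, 2, 1, 1, 2, 3]})
cats = [1, 2]  # the types a user must have (all of them)

# Keep only rows whose type is one of the wanted categories, then count
# how many distinct wanted types each user has
n_found = df.loc[df['type'].isin(cats)].groupby('userid')['type'].nunique()

# Users that have every wanted type; users with none of them never appear
# in n_found, so isin() below gives them 0
users_with_all = n_found[n_found == len(set(cats))].index
df['has_type_12'] = df['userid'].isin(users_with_all).astype(int)
print(df)
```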

      





Use groupby + transform with sets:

cats = [1, 2]
# True when the user's set of types contains every value in cats
df['has_type_12'] = df.groupby('userid')['type'] \
                      .transform(lambda x: set(x) >= set(cats)) \
                      .astype(int)
print(df)
   userid  type  has_type_12
0       1     1            1
1       1     2            1
2       2     1            0
3       3     1            1
4       3     2            1
5       3     3            1

      



Another solution with two any() checks (suitable if there are only a few categories):

# One any() per required type: does the group contain a 1, and does it contain a 2?
df['has_type_12'] = df.groupby('userid')['type'] \
                      .transform(lambda x: (x == 1).any() & (x == 2).any()) \
                      .astype(int)
print(df)
   userid  type  has_type_12
0       1     1            1
1       1     2            1
2       2     1            0
3       3     1            1
4       3     2            1
5       3     3            1
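The double any approach hard-codes one comparison per type. As a sketch, assuming the required types are kept in a list such as cats (an illustrative name), it generalizes to any number of categories by combining the any() checks with Python's built-in all():

```python
import pandas as pd

df = pd.DataFrame({'userid': [1, 1, 2, 3, 3, 3],
                   'type':   [1, 2, 1, 1, 2, 3]})

cats = [1, 2]  # any number of required types

# One any() check per category, combined with all(); all() stops at the
# first missing category instead of checking the remaining ones
df['has_type_12'] = (df.groupby('userid')['type']
                       .transform(lambda x: all((x == c).any() for c in cats))
                       .astype(int))
print(df)
```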

      



With sets, the >= operator checks whether the right-hand side is a subset of the left-hand side. I use the Series method ge as a proxy for >=.
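A small illustration of the set semantics this answer relies on, in plain Python. The per-user sets are written out by hand to match the sample data (roughly what groupby('userid').type.apply(set) yields), and the names user_types, required and flags are illustrative:

```python
# Each user's types from the sample frame
user_types = {1: {1, 2}, 2: {1}, 3: {1, 2, 3}}
required = {1, 2}

# types >= required is True when every element of required is in types;
# types.issuperset(required) is the method form of the same test
flags = {uid: int(types >= required) for uid, types in user_types.items()}
print(flags)  # {1: 1, 2: 0, 3: 1}

# >= on sets is a subset test, not a total order: neither set contains the other
print({1, 3} >= {1, 2}, {1, 2} >= {1, 3})  # False False
```

Because the comparison is a subset test, extra types (such as 3 for userid 3) never prevent a match.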

Using groupby

m = df.groupby('userid').type.apply(set)
df.assign(
  has_type_12=df.userid.map(m).ge({1, 2}).astype(int)
)

   userid  type  has_type_12
0       1     1            1
1       1     2            1
2       2     1            0
3       3     1            1
4       3     2            1
5       3     3            1

      

Using defaultdict

from collections import defaultdict

d = defaultdict(set)
for k, v in zip(df.userid.values.tolist(), df.type.values.tolist()):
    d[k].add(v)
df.assign(has_type_12=df.userid.map(d).ge({1, 2}).astype(int))

   userid  type  has_type_12
0       1     1            1
1       1     2            1
2       2     1            0
3       3     1            1
4       3     2            1
5       3     3            1

      


Timing on larger data

import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(dict(
        userid=np.random.randint(1000, size=100000),
        type=np.random.randint(100, size=100000)
    ))

%%timeit
d = defaultdict(set)
[d[k].add(v) for k, v in zip(df.userid.values.tolist(), df.type.values.tolist())];
df.userid.map(d).ge({1, 2}).astype(int)
10 loops, best of 3: 55.6 ms per loop

%%timeit 
m = df.groupby('userid').type.apply(set)
df.userid.map(m).ge({1, 2}).astype(int)
10 loops, best of 3: 76.1 ms per loop

%timeit df.groupby('userid')['type'] \
                      .transform(lambda x: set(x) >= set((cats))) \
                      .astype(int)
1 loop, best of 3: 206 ms per loop
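The %%timeit cells above only run in IPython or Jupyter. As a sketch, the same comparison can be run as a plain script with the standard timeit module, together with a check that the three approaches flag the same rows. The function names are hypothetical labels for the three approaches, and the printed timings depend on the machine and pandas version, so none are shown here.

```python
from collections import defaultdict
import timeit

import numpy as np
import pandas as pd

# Same synthetic data as above: 100,000 rows, 1,000 users, 100 types
np.random.seed([3, 1415])
df = pd.DataFrame(dict(
    userid=np.random.randint(1000, size=100000),
    type=np.random.randint(100, size=100000),
))
cats = [1, 2]


def with_defaultdict():
    d = defaultdict(set)
    for k, v in zip(df.userid.tolist(), df.type.tolist()):
        d[k].add(v)
    return df.userid.map(d).ge(set(cats)).astype(int)


def with_groupby_apply():
    m = df.groupby('userid').type.apply(set)
    return df.userid.map(m).ge(set(cats)).astype(int)


def with_transform():
    return (df.groupby('userid')['type']
              .transform(lambda x: set(x) >= set(cats))
              .astype(int))


funcs = (with_defaultdict, with_groupby_apply, with_transform)

# All three should flag exactly the same rows
results = [f() for f in funcs]
assert all(r.equals(results[0]) for r in results[1:])

# Best of 3 single runs per approach, in milliseconds
for f in funcs:
    secs = min(timeit.repeat(f, number=1, repeat=3))
    print(f'{f.__name__}: {secs * 1000:.1f} ms')
```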

      



Use groupby to collect the unique types for each user, then check whether that set contains {1, 2}.

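flags = df.groupby('userid')['type'] \
          .apply(lambda x: set(x).issuperset({1, 2})) \
          .astype(int)
# flags is indexed by userid, so map it back onto the rows by userid
# (indexing .values with userid would treat the ids as array positions)
df['has_type_12'] = df['userid'].map(flags)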

      
