Checking which rows contain a value efficiently

I am trying to write a function that checks each row for the existence of a value across its columns. I have a script that does this by iterating over the columns, but I am worried it will be inefficient on large datasets.

Here is my current code:

import pandas as pd

a = [1, 2, 3, 4]
b = [2, 3, 3, 2]
c = [5, 6, 1, 3]
d = [1, 0, 0, 99]

df = pd.DataFrame({'a': a,
                  'b': b,
                  'c': c,
                  'd': d})

cols = ['a', 'b', 'c', 'd']
df['e'] = False
for col in cols:
    df['e'] = df['e'] | (df[col] == 1)
print(df)


result:

   a  b  c   d      e
0  1  2  5   1   True
1  2  3  6   0  False
2  3  3  1   0   True
3  4  2  3  99  False


As you can see, column e records whether the value 1 exists in that row. I was wondering if there is a better / more efficient way to achieve this result.

2 answers


You can compare the values in the DataFrame to 1 and then test whether any comparison in each row is True (with axis=1):



df['e'] = df.eq(1).any(axis=1)
df
#   a   b   c   d   e
#0  1   2   5   1   True
#1  2   3   6   0   False
#2  3   3   1   0   True
#3  4   2   3   99  False
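The same pattern can be restricted to a subset of columns before the reduction, which may be useful if only some columns should count. A small sketch using the question's data (the choice of `cols` here is just an illustration):

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [2, 3, 3, 2],
                   'c': [5, 6, 1, 3],
                   'd': [1, 0, 0, 99]})

# Compare only the chosen columns to 1, then reduce across each row.
cols = ['a', 'b', 'd']                      # e.g. ignore column 'c'
df['e'] = df[cols].eq(1).any(axis=1)
print(df['e'].tolist())                     # [True, False, False, False]
```

Row 2 is now False because its only 1 sits in the excluded column c.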




Python supports the 'in' and 'not in' membership operators.

Example:



>>> a = [1, 2, 5, 1]
>>> b = [2, 3, 6, 0]
>>> c = [5, 6, 1, 3]
>>> d = [1, 0, 0, 99]
>>> 1 in a
True
>>> 1 not in a
False
>>> 99 in d
True
>>> 99 not in d
False


Using this, you don't need to iterate over the values yourself. Note that the lists in this example are the rows of the question's DataFrame, not its columns.
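One caveat when lifting this into pandas: `in` on a Series tests the index labels, not the values, so a row-wise check needs the row converted to a plain array first. A sketch, assuming the DataFrame from the question:

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [2, 3, 3, 2],
                   'c': [5, 6, 1, 3],
                   'd': [1, 0, 0, 99]})

# 'in' on a Series checks the index labels, not the values:
print(1 in df.loc[0])                       # False -- the labels are 'a'..'d'

# Convert each row to a plain array to test its values instead.
df['e'] = df.apply(lambda row: 1 in row.to_numpy(), axis=1)
print(df['e'].tolist())                     # [True, False, True, False]
```

Note that `apply` runs a Python-level loop over rows, so the vectorized `df.eq(1).any(axis=1)` from the first answer will generally be faster on large frames.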
