DataFrame masking using multiple criteria

I know it is possible to mask certain rows in a DataFrame, for example

(1) mask = df['A']=='a'

      

where df is a DataFrame with a column named "A". Indexing with df[mask] gives my new "masked" DataFrame.

You can of course also use multiple criteria with

(2) mask = (df['A']=='a') | (df['A']=='b')

      

However, this gets tedious when many values need to be matched, such as

(3) mask = (df['A']=='a') | (df['A']=='b') | (df['A']=='c') | (df['A']=='d') | ...

      

Now let's say I have the filtering criteria in a list:

(4) filter = ['a', 'b', 'c', 'd', ...]
    # ... here means a lot of other criteria

      

Is there a way to get the same result as in (3) above using a one-liner?

Something like:

(5) mask = df.where(df['A']==filter)
    df_new = df[mask]

      

In this case (5) obviously returns an error.

1 answer


I would use Series.isin():

filter = ['a', 'b', 'c', 'd']
df_new = df[df["A"].isin(filter)]

      



df_new is a DataFrame containing only the rows whose value in df["A"] appears in filter.
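
To check that isin() really gives the same mask as the chained comparisons in (3), here is a small sketch. The sample data and the names wanted, mask_chained, mask_isin and df_rest are made up for illustration; only the column name "A" comes from the question. The list is called wanted rather than filter so that it does not shadow Python's built-in filter().

```python
import pandas as pd

# Hypothetical sample data; only the column name "A" comes from the question.
df = pd.DataFrame({
    "A": ["a", "b", "c", "x", "a", "y"],
    "B": [1, 2, 3, 4, 5, 6],
})

# Values to keep (named `wanted` to avoid shadowing the built-in filter()).
wanted = ["a", "b", "c"]

# Chained comparisons, in the style of (3).
mask_chained = (df["A"] == "a") | (df["A"] == "b") | (df["A"] == "c")

# The same boolean mask as a one-liner.
mask_isin = df["A"].isin(wanted)

# Rows whose "A" value is in the list.
df_new = df[mask_isin]

# Negate the mask with ~ to get the rows whose "A" value is NOT in the list.
df_rest = df[~mask_isin]
```

Both masks are boolean Series aligned with df's index, so either one can be passed to df[...]. The ~ operator is handy when you want to exclude a list of values instead of selecting them.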
