Pandas: changing values in multiple columns according to a boolean condition

It should be simple, but I can't wrap my head around all the different ways to select and mask things in Pandas.

So, for a large dataframe (read from a CSV file), I want to change the values of a list of columns according to some boolean condition (tested on those same selected columns).

I've already tried something like this, which doesn't work because of a size mismatch:

df.loc[df[my_cols]>0, my_cols] = 1

This also doesn't work (I think because I'm trying to change the values in the wrong columns):

df[df[my_cols]>0] = 1

And this one doesn't work because I'm only changing a copy of the dataframe:

df[my_cols][df[my_cols]>0] = 1
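A minimal sketch of why that last attempt has no effect, on a hypothetical three-row frame (columns `a` and `b` numeric, `label` a string column, loosely mirroring the object columns in the info output below). Selecting with a list of column names builds a new DataFrame, so the boolean assignment lands on that temporary object. Modifying an explicit copy of the sub-frame and assigning it back with a single `df[my_cols] = ...` is one way to make the change stick.

```python
import warnings

import pandas as pd

# Hypothetical toy data: two numeric columns plus a string column.
df = pd.DataFrame({"a": [-1.0, 0.5, 2.0], "b": [3.0, -2.0, 0.0], "label": ["x", "y", "z"]})
my_cols = ["a", "b"]
original = df.copy()

# Chained assignment: df[my_cols] builds a new DataFrame, so only that temporary changes.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence SettingWithCopyWarning / ChainedAssignmentError
    df[my_cols][df[my_cols] > 0] = 1
unchanged = df.equals(original)
print(unchanged)  # True: df itself was not modified

# Modify an explicit copy of the sub-frame, then write it back in one assignment.
sub = df[my_cols].copy()
sub[sub > 0] = 1
df[my_cols] = sub
print(df)
```

Only the positive cells in `a` and `b` become 1; zero, negative values and the `label` column are left alone.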

Here's the output of df.info():

Int64Index: 186171 entries, 0 to 186170
Columns: 737 entries, id to 733:zorg
dtypes: float64(734), int64(1), object(2)
memory usage: 1.0+ GB

Can any more advanced Pandas user help? Thanks.



3 answers


So this is how I finally got the desired output, but I believe there should be a more pandas-ish solution for this task:



# set every positive value in each of my_cols to 1, one column at a time
for col in my_cols:
    df.loc[df[col] > 0, col] = 1
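For comparison, a sketch of a loop-free version of the same update on hypothetical random data. `DataFrame.mask` replaces values where the condition is True, which amounts to `where` with the condition negated. The loop above is run on a copy alongside it so the two results can be checked against each other; the column names `c1` to `c4` and `name` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: float columns like the question's, plus an object column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(5, 4)), columns=["c1", "c2", "c3", "c4"])
df["name"] = list("vwxyz")
my_cols = ["c1", "c2", "c3"]

# Loop version from this answer, applied to a copy.
looped = df.copy()
for col in my_cols:
    looped.loc[looped[col] > 0, col] = 1

# Vectorized version: replace every positive value in my_cols with 1 in one step.
vectorized = df.copy()
vectorized[my_cols] = vectorized[my_cols].mask(vectorized[my_cols] > 0, 1)

print(looped.equals(vectorized))  # True
```

Column `c4` and the string column are outside `my_cols`, so both versions leave them untouched.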


I'm sure there is a more elegant way, but this should work:

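import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(5, size=(3, 4)), columns=['a', 'b', 'c', 'd'])
mycols = ['a', 'b']
cond = (df[mycols] > 1).all()  # one boolean per column: is every value > 1?
cols_tochange = cond[cond].index
df.loc[:, cols_tochange] = 1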


Note the use of all() to reduce the condition to one boolean per column, so a column is set to 1 only if every value in it satisfies the condition.
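To make the difference from the element-wise attempts in the question concrete, a sketch on a small frame with hand-picked (hypothetical) values: the `all()` reduction changes a column only when every value passes, whereas `mask` tests and changes individual cells.

```python
import pandas as pd

# Hand-picked toy values (hypothetical): only column "a" is > 1 in every row.
base = pd.DataFrame({"a": [2, 3, 4], "b": [0, 5, 2], "c": [9, 9, 9]})
mycols = ["a", "b"]

# Whole-column rule: one boolean per column from all().
cond = (base[mycols] > 1).all()
whole = base.copy()
whole.loc[:, cond[cond].index] = 1

# Element-wise rule: each cell is tested on its own.
cellwise = base.copy()
cellwise[mycols] = cellwise[mycols].mask(cellwise[mycols] > 1, 1)

print(whole)
print(cellwise)
```

With the whole-column rule, `b` keeps all its values because its first entry is 0; with the element-wise rule, the 5 and the 2 in `b` both become 1. Column `c` is outside `mycols` in both cases.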


Try pandas.DataFrame.where

It returns an object of the same shape as self, whose entries come from self where cond is True and from other otherwise.

In your case, this would become:

# keep values that are not > 0, replace the rest with 1
df[my_cols] = df[my_cols].where(~(df[my_cols] > 0), other=1)
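A small sketch of that one-liner on a hypothetical frame with an integer id column, two float columns (one containing NaN) and a string column. Because `NaN > 0` evaluates to False, the negated condition is True there and missing values are kept as they are; columns outside `my_cols` are not touched. `df[my_cols].mask(df[my_cols] > 0, 1)` expresses the same replacement without the negation.

```python
import numpy as np
import pandas as pd

# Hypothetical frame loosely shaped like the question's: id, float columns (one NaN), a string column.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "f1": [0.0, 2.5, np.nan],
    "f2": [-1.0, 0.3, 4.0],
    "tag": ["p", "q", "r"],
})
my_cols = ["f1", "f2"]

# where() keeps values where the condition is True and uses `other` elsewhere,
# so negating "> 0" keeps non-positive values (and NaN) and replaces positive ones with 1.
df[my_cols] = df[my_cols].where(~(df[my_cols] > 0), other=1)
print(df)
```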
