Pandas: changing values in multiple columns according to boolean condition
This should be simple, but I can't wrap my head around all the different ways to select and mask things in Pandas.
So, for a large dataframe (read from a csv file), I want to change the values in a list of columns according to some boolean condition (tested on those same columns).
I've already tried something like this which doesn't work due to size mismatch:
df.loc[df[my_cols]>0, my_cols] = 1
This also doesn't work (because I'm trying to change the values in the wrong columns I think):
df[df[my_cols]>0] = 1
And this doesn't work because I'm only changing a copy of the dataframe:
df[my_cols][df[my_cols]>0] = 1
Here's the output of df.info():
Int64Index: 186171 entries, 0 to 186170
Columns: 737 entries, id to 733:zorg
dtypes: float64(734), int64(1), object(2)
memory usage: 1.0+ GB
Can a more advanced Pandas user help? Thanks.
I'm sure there is a more elegant way, but this should work:
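import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(5, size=(3, 4)), columns=['a', 'b', 'c', 'd'])
mycols = ['a', 'b']
# True for each column in mycols whose values are all > 0
all_positive = (df[mycols] > 0).all()
cols_tochange = all_positive[all_positive].index
df.loc[:, cols_tochange] = 1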
Note the use of all() to test the condition over each whole column: a column is set to 1 only if every one of its values satisfies the condition.
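If the goal is instead to replace only the individual cells that satisfy the condition, as the question describes, one possible sketch is to use DataFrame.mask on the selected columns and assign the result back. The sample frame below, including the "label" column standing in for the non-numeric columns, is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "label" stands in for the object columns.
df = pd.DataFrame({
    "a": [0.0, 2.5, -1.0],
    "b": [3.0, 0.0, 4.0],
    "c": [5.0, 6.0, 7.0],
    "label": ["x", "y", "z"],
})
my_cols = ["a", "b"]

# Replace each value > 0 in the selected columns with 1;
# cells that fail the condition and all other columns are left as they are.
df[my_cols] = df[my_cols].mask(df[my_cols] > 0, 1)
```

Because the mask is computed and applied on the same column subset, the shapes always match, and assigning back through df[my_cols] writes to the original dataframe rather than to a copy.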