Using np.where but keeping existing values if condition is False

Question


I love np.where but still haven't managed to fully grasp it.

I have a dataframe; let's say it looks like this:

import pandas as pd
import numpy as np
from numpy import nan as NA
DF = pd.DataFrame({'a' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c' : [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd' : [ 5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

Now what I want to do is replace the 0 values with NaN values when all values in a row are zero. Critically, I want to keep whatever other values are in the row in cases where not all of the row's values are zero.

I want to do something like this:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, ???)

I put ??? to indicate that I don't know what value belongs there. If the condition is false, I just want to keep what is already there. Is this possible with np.where, or should I use a different technique?

+6

numpy pandas where

Woody pride 08 Sep At 4:19 am


2 answers

You can do something like this:

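    # 1 where the value is at or above the threshold, 0 elsewhere
    array_binary = np.where(array < threshold, 0, 1)
    # Multiply by the original array so the surviving values are kept
    array_sparse = np.multiply(array_binary, array)

This performs an element-wise multiplication of the binary mask and the original array using np.multiply, so elements at or above the threshold are retained and the rest become zero. array_sparse is a sparse version of array.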

0

Arvind subramaniam 02 Sep At 11:43


JaminSore · Accepted Answer · 2014-09-08T04:40:17+0000

pandas.Series has a method for this task (incidentally also called where). It seems a bit backward at first, but from the documentation:

Series.where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
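As a minimal sketch of that behaviour (the toy Series `s` below is made up for illustration and is not part of the question), entries are kept where the condition is True and replaced with `other` everywhere else:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to illustrate the semantics of Series.where
s = pd.Series([3, 0, 1, 0])

# Keep entries where the condition is True; replace the rest with NaN
kept = s.where(s != 0, np.nan)

print(kept)
```

Note that this is the reverse of np.where, where the first value argument is used when the condition is True.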

So your example would become

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    # Assign the result back; inplace=True on DF[col] may act on a copy
    DF[col] = DF[col].where(~condition, np.nan)

But if all you are trying to do is replace the rows of all zeros for a specific set of columns with NA, you can do this instead:

DF.loc[condition, cols] = NA
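One caveat with the .loc assignment: the columns in the question hold integers, which cannot store NaN, so pandas has to change their dtype, and recent pandas releases may warn about that implicit change. A minimal sketch, assuming it is acceptable to convert the columns to float up front:

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

cols = ['a', 'b', 'c', 'd']
# True for rows where every selected column is zero
condition = (DF[cols] == 0).all(axis=1)

# Cast first so the columns can hold NaN (an assumption: float is acceptable)
DF[cols] = DF[cols].astype(float)

# Overwrite only the all-zero rows; every other row is left untouched
DF.loc[condition, cols] = np.nan

print(DF)
```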

EDIT

To answer the original question: np.where follows the same broadcasting rules as other array operations, so you would replace ??? with DF[col], changing your example to:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, DF[col])
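Because of those broadcasting rules, the loop can probably be dropped altogether. The sketch below is one possible variant, not part of the original answer: it reshapes the row mask into a column vector (the name `row_mask` is made up here) so that a single np.where call covers all columns. It assumes a pandas version that provides .to_numpy().

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)

# Reshape the row mask to (n_rows, 1) so it broadcasts across the columns
row_mask = condition.to_numpy()[:, None]

# NaN where the whole row is zero, otherwise the existing value
DF[cols] = np.where(row_mask, np.nan, DF[cols].to_numpy())

print(DF)
```

The result is a float block, since NaN cannot be stored in an integer array.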
