Replacing values greater than a number in a pandas DataFrame

I have a large dataframe that looks like:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

      

I want to replace every element greater than 9 with 11.

So the desired output for the above example is:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

      

Edit:

My actual dataframe has about 20,000 rows, and each row holds a list of 2,000 elements.

Is there a way to apply a function such as numpy.minimum to each row? I assume it would be faster than the list comprehension method.



5 answers


You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
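Since the question asks about applying a numpy function to each row, here is a minimal sketch of that variant. It assumes each cell is a plain Python list of numbers; the helper name cap_row and the toy data are made up for illustration. Note that numpy.minimum(arr, 9) would cap values at 9 itself, so it only fits when the replacement equals the threshold; numpy.where handles a different replacement value such as 11.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the question. The second row is deliberately longer
# to show that the per-row approach does not need equal-length lists.
idx = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9, 10]]}, index=idx)


def cap_row(values, threshold=9, replacement=11):
    """Replace every element greater than threshold with replacement."""
    arr = np.asarray(values)
    return np.where(arr > threshold, replacement, arr).tolist()


# Hypothetical helper applied row by row; each cell stays a Python list.
df1["A"] = df1["A"].apply(cap_row)
print(df1["A"].tolist())
```

This still loops over the rows in Python, so for lists that all have the same length, stacking them into one 2D array is likely to be the faster route.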

      



A faster solution is to convert to a numpy array first and then use numpy.where:

a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

      



Very simple: df[df > 9] = 11
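A short sketch of this one-liner, assuming a frame whose cells are plain numbers (the column names a and b are invented for the example). With list-valued cells like those in the question, the comparison df > 9 would not work as-is, so this applies only after the data has been spread into numeric columns.

```python
import pandas as pd

# Assumed toy frame with scalar numeric cells (not lists).
df = pd.DataFrame({"a": [3, 43, 9], "b": [33, 34, 1]})

# In-place assignment through a boolean mask, done on a copy here
# so the original frame stays available for comparison.
capped = df.copy()
capped[capped > 9] = 11

# Non-mutating alternative: DataFrame.mask replaces values where the
# condition is True and returns a new frame.
masked = df.mask(df > 9, 11)

print(capped)
print(masked)
```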





You can use boolean indexing on the underlying numpy array, accessed through the .values attribute.

df['col'].values[df['col'].values > x] = y

This replaces any value greater than x with y.

So, for the example in the question:

df1['A'].values[df1['A'] > 9] = 11
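Writing through .values depends on that array being a writable view of the column's data, which may not hold in every pandas version or configuration. As a hedged alternative, the same replacement can be sketched with label-based assignment via .loc, assuming a numeric column; the column name col and the values of x and y are placeholders.

```python
import pandas as pd

# Assumed numeric column; x is the threshold and y the replacement value.
df = pd.DataFrame({"col": [3, 43, 9, 33]})
x, y = 9, 11

# Select the rows where the condition holds and assign y to that column.
df.loc[df["col"] > x, "col"] = y
print(df["col"].tolist())  # [3, 11, 9, 11]
```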



Hi, thanks for this solution, it helped me too, but I have a follow-up question. I have a CSV file with a lot of floating point values and want to do the following:

Values greater than 0.001 should become 1, and values less than -0.001 should become -1. All values between -0.001 and 0.001 must be set to 0 or removed.

I tried the following for the first two steps:

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv')

a = np.array(df['score'].values.tolist())
#print(a)

df['text']=np.where(a > 0.001, 1, a).tolist()
df['text']=np.where(a < -0.001, -1, a).tolist()
print(df)

      

With this approach I only get -1 in my list; the +1 values are ignored. Can anyone help me please?
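The likely cause is that the second np.where is computed from the original array a and then assigned to the same column, so it overwrites the result of the first call. One way to handle all three cases in a single pass is numpy.select. The sketch below uses a small made-up score column in place of data.csv and sets the in-between values to 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'score' column read from data.csv.
df = pd.DataFrame({"score": [0.5, -0.2, 0.0005, -0.0003, 0.001]})
a = df["score"].to_numpy()

# Conditions are checked in order; rows matching neither get the default.
df["text"] = np.select([a > 0.001, a < -0.001], [1, -1], default=0)
print(df["text"].tolist())  # [1, -1, 0, 0, 0]
```

Nesting the calls, as in np.where(a > 0.001, 1, np.where(a < -0.001, -1, 0)), should give the same result.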



I came up with a simple way to replace every element greater than h with 1 and everything else with 0:

df = (df > h) * 1

      

(This doesn't solve the OP's question, since all values with df <= h are replaced with 0.)
