Replacing values greater than number in pandas dataframe

Question

I have a large dataframe that looks like:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

I want to replace every element greater than 9 with 11.

So the desired output for the above example is:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

Edit:

My actual dataframe has about 20,000 rows, and each row holds a list of 2,000 elements.

Is there a way to use a function such as numpy.minimum on each row? I assume it would be faster than a list comprehension.

+13

python database pandas

Zanam 03 May '17 at 10:52

5 answers

Very simple: df[df > 9] = 11
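A minimal sketch of that one-liner, assuming every cell holds a single number. The frame and its column names are made up for illustration; the question's cells hold lists, for which an element-wise comparison against 9 is not defined, so this form would not apply to that data directly.

```python
import pandas as pd

# Hypothetical frame with one number per cell (not lists, unlike the question).
df = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})

# Boolean-mask assignment: every cell where the mask is True becomes 11.
df[df > 9] = 11

print(df)
```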

+9

Edouard cuny 02 oct. '18 at 9:10

You can use NumPy boolean indexing on the underlying array, accessed through the .values attribute.

df['col'].values[df['col'].values > x] = y

This replaces any value greater than x with y.

So, for the example in the question:

df1['A'].values[df1['A'] > 9] = 11
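Writing through .values depends on the returned array being a writable view of the column, which, as far as I know, pandas does not guarantee once copy-on-write is enabled. A sketch of the same replacement with .loc, which does not rely on that; the frame and the names x and y are illustrative, and the column is assumed to hold scalars:

```python
import pandas as pd

# Hypothetical scalar column; x is the threshold, y the replacement value.
df = pd.DataFrame({"col": [33, 3, 43, 9]})
x, y = 9, 11

# Select the rows where the condition holds and assign y to them in place.
df.loc[df["col"] > x, "col"] = y

print(df["col"].tolist())
```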

+2

D.Griffiths Jan 29 at 17:06

Hi, thanks for this solution, it helped me too, but I have a follow-up question. I have a CSV file with a lot of floating-point values and want to do the following:

Where the value is > 0.001, set it to 1, and where the value is < -0.001, set it to -1. All values between -0.001 and 0.001 must be set to 0 or removed.

I tried the following for the first two steps:

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv')

a = np.array(df['score'].values.tolist())
#print(a)

df['text']=np.where(a > 0.001, 1, a).tolist()
df['text']=np.where(a < -0.001, -1, a).tolist()
print(df)

With this approach I only get -1 in my list, +1 values are ignored. Can anyone help me please?
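A likely cause: the second np.where starts again from the original array a and overwrites df['text'], so the result of the first call is discarded. One possible sketch that applies both thresholds in a single step uses numpy.select; the sample scores below are made up and stand in for the 'score' column of data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical scores standing in for the 'score' column read from the CSV.
df = pd.DataFrame({"score": [0.5, -0.2, 0.0005, -0.0003, 0.002]})
a = df["score"].to_numpy()

# Conditions are checked in order; values matching neither get the default 0.
df["text"] = np.select([a > 0.001, a < -0.001], [1, -1], default=0)

print(df)
```

To drop the in-between rows instead of zeroing them, filtering with df[df["text"] != 0] afterwards should work.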

0

Fabs June 27 at 14:19

I came here looking for a way to replace every element greater than h with 1 and every other element with 0, which has a simple solution:

df = (df > h) * 1

(This doesn't solve the OP's question, since all values with df <= h are replaced with 0.)
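A small sketch of that expression on a made-up numeric frame, with h chosen arbitrarily; the comparison yields booleans and multiplying by 1 turns them into integers:

```python
import pandas as pd

# Hypothetical frame of scalars and an arbitrary threshold h.
df = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})
h = 9

# True/False from the comparison become 1/0 after multiplying by 1.
flags = (df > h) * 1

print(flags)
```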

0

CFW Sep 18 at 8:07 am

jezrael · Accepted Answer · 2017-05-03T10:55:33+0000

You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

A faster solution is to first convert to a NumPy array and then use numpy.where:

import numpy as np

# Stack the per-row lists into one 2-D array (requires equal-length lists)
a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
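On the numpy.minimum idea from the question: numpy.minimum(a, 9) caps values at 9 rather than replacing them with 11, so it gives a different result from numpy.where here. A self-contained sketch comparing the two on the question's sample data; the frame construction is reconstructed from the printed output and assumes all lists have the same length.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's printed output.
idx = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=idx)

# One 2-D array for the whole column; needs equal-length lists.
a = np.array(df1["A"].tolist())

capped = np.minimum(a, 9)           # clips: values above 9 become 9
replaced = np.where(a > 9, 11, a)   # replaces: values above 9 become 11

# Store the replaced rows back as lists, one list per row.
df1["A"] = replaced.tolist()
print(df1)
```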
