Replacing values greater than a number in a pandas dataframe
I have a large dataframe that looks like:
df1['A'].iloc[1:3]
2017-01-01 02:00:00 [33, 34, 39]
2017-01-01 03:00:00 [3, 43, 9]
I want to replace every element greater than 9 with 11.
So the desired output for the above example is:
df1['A'].iloc[1:3]
2017-01-01 02:00:00 [11, 11, 11]
2017-01-01 03:00:00 [3, 11, 9]
Edit:
My actual dataframe has about 20,000 rows, and each row holds a list of 2,000 elements.
Is there a way to apply a function such as numpy.minimum to each row? I assume it would be faster than a list comprehension.
You can use apply with a list comprehension:
df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
A
2017-01-01 02:00:00 [11, 11, 11]
2017-01-01 03:00:00 [3, 11, 9]
A faster solution is to first convert the column to a numpy array and then use numpy.where:
import numpy as np

# works when every list has the same length, which gives a 2-D array
a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]
df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
A
2017-01-01 02:00:00 [11, 11, 11]
2017-01-01 03:00:00 [3, 11, 9]
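On the numpy.minimum idea from the question, here is a minimal sketch, assuming every list has the same length so the column converts to a 2-D array. np.minimum(a, 11) only caps values at 11, so a value of exactly 10 would stay 10, whereas np.where replaces everything above the threshold. The third row containing a 10 is hypothetical and was added only to show the difference.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus a hypothetical third row containing 10
idx = pd.to_datetime([
    "2017-01-01 02:00:00",
    "2017-01-01 03:00:00",
    "2017-01-01 04:00:00",
])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9], [10, 5, 12]]}, index=idx)

# Assumes all lists have equal length, so this yields a 2-D integer array
a = np.array(df1["A"].tolist())

replaced = np.where(a > 9, 11, a)  # every value above 9 becomes 11
capped = np.minimum(a, 11)         # values are only capped at 11, so 10 stays 10

# Write the replaced values back as one list per row
df1["A"] = replaced.tolist()
print(df1)
```

If the data never contains a 10, both calls give the same result; otherwise np.where is the one that matches the requested output.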
Hi, thanks for this solution, it helped me too, but I have a follow-up question. I have a CSV file with a lot of floating-point values and want to do the following:
Values greater than 0.001 should become 1, and values less than -0.001 should become -1. All values between -0.001 and 0.001 must be set to 0 or removed.
I tried the following for the first two steps:
import pandas as pd
import numpy as np
df = pd.read_csv('data.csv')
a = np.array(df['score'].values.tolist())
#print(a)
df['text']=np.where(a > 0.001, 1, a).tolist()
df['text']=np.where(a < -0.001, -1, a).tolist()
print(df)
With this approach I only get -1 in my list; the +1 values are ignored. Can anyone help me, please?
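In the code above, both np.where calls read from the original array a, and the second assignment to df['text'] overwrites the first, so only the -1 replacement survives. One possible fix is to evaluate both conditions in a single call. The sketch below uses np.select for that; the sample scores are hypothetical stand-ins for the 'score' column of data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical scores standing in for the 'score' column of data.csv
df = pd.DataFrame({"score": [0.5, -0.2, 0.0005, -0.0004, 0.001]})
a = df["score"].to_numpy()

# Check both conditions in one pass; values in [-0.001, 0.001] get the default 0
df["text"] = np.select([a > 0.001, a < -0.001], [1, -1], default=0)
print(df)
```

A nested call, np.where(a > 0.001, 1, np.where(a < -0.001, -1, 0)), should give the same result. To remove the in-between rows instead of setting them to 0, filtering with df[df['text'] != 0] afterwards is one option.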