How to get a pandas column containing the minimum of another column and min_val
I am using pandas and I would like to create a column containing the element-wise minimum of another column and a scalar min_val. I created a simplified example below:
import pandas as pd
import numpy as np
min_val = 0.5
np.random.seed(100)
df = pd.DataFrame(np.random.rand(10, 4), columns=['col{0}'.format(i) for i in range(1, 5)])
df['col_4_min'] = df['col4'].apply(lambda x: min(x, min_val))
df
col1 col2 col3 col4 col_4_min
0 0.7425 0.6302 0.5818 0.0204 0.0204
1 0.2100 0.5447 0.7691 0.2507 0.2507
2 0.2859 0.8524 0.9750 0.8849 0.5000
3 0.3595 0.5989 0.3548 0.3402 0.3402
4 0.1781 0.2377 0.0449 0.5054 0.5000
5 0.3763 0.5928 0.6299 0.1426 0.1426
6 0.9338 0.9464 0.6023 0.3878 0.3878
7 0.3632 0.2043 0.2768 0.2465 0.2465
8 0.1736 0.9666 0.9570 0.5980 0.5000
9 0.7313 0.3404 0.0921 0.4635 0.4635
The problem with this method is that I will be running the computation on a data frame with a very large number of rows, so it needs to be fast (and therefore apply is not a good fit in my case).
Use np.minimum to compare a scalar (or an array) with your column element-wise:
In [94]:
min_val = 0.5
df['col_4_min'] = np.minimum(min_val, df['col4'].values)
df
Out[94]:
col1 col2 col3 col4 col_4_min
0 0.7425 0.6302 0.5818 0.0204 0.0204
1 0.2100 0.5447 0.7691 0.2507 0.2507
2 0.2859 0.8524 0.9750 0.8849 0.5000
3 0.3595 0.5989 0.3548 0.3402 0.3402
4 0.1781 0.2377 0.0449 0.5054 0.5000
5 0.3763 0.5928 0.6299 0.1426 0.1426
6 0.9338 0.9464 0.6023 0.3878 0.3878
7 0.3632 0.2043 0.2768 0.2465 0.2465
8 0.1736 0.9666 0.9570 0.5980 0.5000
9 0.7313 0.3404 0.0921 0.4635 0.4635
Thanks to @Divakar for pointing out that calling df['col4'].values will speed this up even more than using clip.
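To check this on your own data, here is a minimal sketch that compares the three approaches mentioned in this thread (apply, np.minimum on .values, and Series.clip with upper=min_val). The 200,000-row Series s and the labels are assumptions made up for illustration, not part of the original question. It first confirms that all three give the same result, then prints rough timings, which will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

min_val = 0.5
rng = np.random.RandomState(100)
# Hypothetical larger column, standing in for the real data frame.
s = pd.Series(rng.rand(200_000), name='col4')

# Baseline from the question: Python-level min called once per row.
via_apply = s.apply(lambda x: min(x, min_val))
# Approach from this answer: vectorized on the underlying NumPy array.
via_minimum = np.minimum(min_val, s.values)
# pandas alternative: cap every value at min_val.
via_clip = s.clip(upper=min_val)

# The three results should be identical on NaN-free data.
assert np.array_equal(via_apply.values, via_minimum)
assert np.array_equal(via_clip.values, via_minimum)

candidates = [
    ('apply', lambda: s.apply(lambda x: min(x, min_val))),
    ('np.minimum', lambda: np.minimum(min_val, s.values)),
    ('clip', lambda: s.clip(upper=min_val)),
]
for label, fn in candidates:
    # Total time for 3 runs; treat the numbers as indicative only.
    seconds = timeit.timeit(fn, number=3)
    print('{0:<12} {1:.4f} s'.format(label, seconds))
```

Note that np.minimum on .values returns a plain NumPy array, so it is assigned to the data frame by position, whereas clip returns a Series that keeps the original index.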