More efficient pandas command to drop NaN rows?

I have a DataFrame called TI. I want to drop the rows where #Book_Date is NaN, so I run:

TI = TI.dropna(subset=['#Book_Date'])

      

When I run this, memory gets eaten up for some reason. I have 100 GB of RAM, and about 50% of it is used to hold TI; when I run the dropna line, usage goes to 100% and the command never finishes. Does it make a brand new copy? TI is a 64mm-row data frame, so I need something more efficient.



1 answer


Chances are the better approach is to keep only the rows where the column is finite, since NaN is not a finite value. This assumes the column has a numeric dtype. You will need numpy for this.



import numpy as np

# Keep only the rows whose '#Book_Date' value is finite (drops NaN rows)
TI = TI[np.isfinite(TI['#Book_Date'])]
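
To make the mask approach concrete, here is a minimal sketch on a small made-up frame. The toy data, the "Amount" column and the variable names mask, filtered, expected and same are assumptions for illustration, not part of the original question. It also checks that, for a float column with no infinities, the result matches what dropna returns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for TI; the values are made up.
TI = pd.DataFrame({
    "#Book_Date": [20200101.0, np.nan, 20200103.0, np.nan],
    "Amount": [10, 20, 30, 40],
})

# Boolean mask: True where the column holds a finite number (not NaN or inf).
mask = np.isfinite(TI["#Book_Date"].to_numpy())

# Boolean indexing builds a new DataFrame containing only the kept rows.
filtered = TI[mask]

# For a float column without infinities this matches dropna on that column.
expected = TI.dropna(subset=["#Book_Date"])
same = filtered.equals(expected)
```

Note that boolean indexing also returns a new DataFrame, so the kept rows are copied here as well. Whether this is lighter on memory than dropna for a frame of this size is something to measure rather than assume.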

      
