More efficient pandas Python command to delete NaN lines?

Question

I have a DataFrame called TI. I want to drop the rows where #Book_Date is NaN, so I run:

TI = TI.dropna(subset=['#Book_Date'])

When I run this, memory gets eaten up: the machine has 100 GB of RAM and about 50% of it is used to store TI, but when I run the dropna line usage goes to 100% and the command never finishes. Does it make a brand-new copy? TI has 64 million rows, so I need something more efficient.

python pandas nan

robertevansanders 03 nov. 14 at 17:12

1 answer

PhysicalChemist · Answer 1 · 2015-03-06T18:09:49+0000

Chances are, the best way to do this is to keep only the rows where the column is finite. You will need numpy for this.

import numpy as np

# Keep only the rows whose '#Book_Date' is finite (drops NaN; assumes a numeric column)
TI = TI[np.isfinite(TI['#Book_Date'])]
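
For reference, here is a minimal, self-contained sketch of the same boolean-mask idea on a toy frame. The sample values and the Amount column are made up for illustration, and it uses Series.notna() instead of np.isfinite, on the assumption that #Book_Date might not be a plain float column. Note that boolean indexing still builds a new, smaller DataFrame, so this may not avoid the extra memory on its own.

```python
import numpy as np
import pandas as pd

# Toy stand-in for TI (hypothetical values); the real frame is far larger.
TI = pd.DataFrame({
    "#Book_Date": [20141103.0, np.nan, 20141104.0, np.nan],
    "Amount": [1.0, 2.0, 3.0, 4.0],
})

# Boolean mask: True where '#Book_Date' is present.
# notna() also works on datetime and object columns, where np.isfinite may raise.
mask = TI["#Book_Date"].notna()

# Keep only the masked rows; this allocates a new frame holding the kept rows.
TI = TI[mask]

print(len(TI))
```

Only one boolean Series the length of the frame is created for the mask, so the main extra cost is the filtered copy itself.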
