Pandas read_csv: ignore trailing lines with empty data

I would like to read the following data from a csv file:

id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;

However, pandas read_csv also reads the empty lines (;;;) as data rows. Is there a way to automatically ignore these trailing blank data lines?

These lines cause a problem because I am using read_csv with converters, and the converter functions dutifully throw an exception when they encounter invalid data, so I never even get a valid dataframe. I could change the functions to convert invalid data to NaN and then discard the NaN rows from the dataframe, but then I would silently discard the erroneous data along with those blank lines.

Some clarifications:

  • Blank data lines will always be at the end; this is a common problem with csv files generated from Excel.
  • The data is user generated, so manual cleaning is not an option.


2 answers


Not sure if you can do this directly with read_csv, but you can use dropna:



import pandas as pd

df = pd.read_csv("in.csv", delimiter=";")
# how="all" drops only rows in which every column is NaN, i.e. the ";;;" lines
df.dropna(how="all", inplace=True)
print(df)
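Because dropna runs after parsing, converters passed to read_csv would still see the empty fields of the ;;; lines. One way around that, sketched here under the assumption of the same hypothetical strict converter parse_date: read every field as a string, drop the all-empty rows, and only then apply the converters yourself with Series.map. This swaps read_csv's converters argument for a manual step; it is not something read_csv does on its own.

```python
import io
from datetime import datetime

import pandas as pd

data = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""


def parse_date(value):
    # Hypothetical strict converter: raises on anything that is not a dd/mm/yyyy string.
    return datetime.strptime(value, "%d/%m/%Y").date()


converters = {"start": parse_date, "end": parse_date}

# 1. Read every field as a plain string; only truly empty fields become NaN.
raw = pd.read_csv(io.StringIO(data), sep=";", dtype=object,
                  keep_default_na=False, na_values=[""])

# 2. Drop rows where all fields are empty, i.e. the ";;;" lines from Excel.
df = raw.dropna(how="all").reset_index(drop=True)

# 3. Apply the strict converters to what is left. A row with only some
#    empty fields still reaches parse_date and raises, so bad data is not hidden.
for column, func in converters.items():
    df[column] = df[column].map(func)

print(df)
```

The design choice here is that only rows that are completely empty are discarded quietly; a partially filled row still makes the converter raise, which keeps the "don't silently drop erroneous data" requirement.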



If you know you want to ignore the last two lines, you can pass the parameter skipfooter=2:



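In [197]:
import io
import pandas as pd

t="""id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""
# skipfooter is not supported by the default C engine, so request the python engine
df = pd.read_csv(io.StringIO(t), sep=';', skipfooter=2, engine='python')
df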

Out[197]:
     id type       start         end
0  Test  OIS  01/07/2016  01/07/2018

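Hard-coding skipfooter=2 only works when the number of blank lines is known in advance. A sketch of computing it instead, assuming the whole file fits in memory: count the lines at the end that contain nothing but separators and whitespace, then pass that count as skipfooter. count_trailing_blank_rows is a hypothetical helper, not part of pandas.

```python
import io

import pandas as pd


def count_trailing_blank_rows(text, sep=";"):
    # Hypothetical helper: count lines at the end of `text` that hold
    # only separators and/or whitespace (e.g. ";;;" rows exported by Excel).
    count = 0
    for line in reversed(text.splitlines()):
        if line.replace(sep, "").strip():
            break  # first line from the end that carries real data
        count += 1
    return count


t = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""

n = count_trailing_blank_rows(t)
# skipfooter requires the python engine
df = pd.read_csv(io.StringIO(t), sep=";", skipfooter=n, engine="python")
print(n)
print(df)
```

For a file on disk, read its contents into a string first (for example with open(path).read()) and pass that same text to io.StringIO, so the count and the parse see identical lines.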