Pandas read_csv: ignore trailing lines with empty data
I would like to read the following data from a csv file:
id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;
However, pandas read_csv will also try to read the empty data lines (;;;). Is there a way to automatically ignore these trailing blank data lines?
These lines cause a problem because I am using read_csv converters, and the converter functions dutifully throw an exception when they encounter invalid data, which means I never even get a valid dataframe. I could change the functions to convert invalid data to NaN and then discard the NaN rows from the dataframe, but then I would be silently discarding genuinely erroneous data along with the blank lines.
Some clarifications:
- The blank data lines always come at the end of the file; this is a common problem with CSV files generated from Excel.
- The data is user generated, so manual cleaning is not an option.
Not sure if you can do this directly with read_csv, but you can use dropna:
import pandas as pd
df = pd.read_csv("in.csv", delimiter=";")
df.dropna(how="all", inplace=True)
print(df)
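Note that dropna only runs after parsing, so on its own it does not stop converters from seeing the blank rows. One possible workaround, sketched below, is to read every column as text first, drop the all-empty rows, and only then apply the strict conversion. The parse_date function and the choice of the start and end columns are assumptions made for illustration; they are not taken from the question.

```python
import io
from datetime import datetime

import pandas as pd

raw = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""


def parse_date(value):
    # Hypothetical strict converter: strptime raises ValueError
    # for anything that is not a dd/mm/YYYY string.
    return datetime.strptime(value, "%d/%m/%Y")


# Read everything as text so that no conversion runs during parsing.
df = pd.read_csv(io.StringIO(raw), sep=";", dtype=str)

# Rows such as ";;;" are parsed as all-NaN and are dropped here.
df = df.dropna(how="all")

# Apply the strict conversion to the remaining rows only.
for col in ("start", "end"):
    df[col] = df[col].map(parse_date)

print(df)
```

This drops every all-empty row, not only the trailing ones, so it is only suitable if blank rows in the middle of the file can also be ignored.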
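If you know you want to ignore the last two lines, you can pass the parameter skipfooter=2 (skipfooter is only supported by the Python parsing engine, so pass engine='python' as well to avoid a ParserWarning):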
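In [197]:
import io
import pandas as pd

t = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""
# skipfooter is not supported by the C engine, so select the Python engine explicitly
df = pd.read_csv(io.StringIO(t), sep=';', skipfooter=2, engine='python')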
df
Out[197]:
     id type       start         end
0  Test  OIS  01/07/2016  01/07/2018
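When the number of trailing blank lines is not known in advance, one option is to strip them from the text before handing it to read_csv, so that the converters never see them. The following is a minimal sketch under that assumption; strip_trailing_blank_rows and strict_date are hypothetical helpers, not part of pandas, and the check does not account for quoted fields.

```python
import io
from datetime import datetime

import pandas as pd


def strip_trailing_blank_rows(text, sep=";"):
    # Hypothetical helper: drop lines at the END of the text that
    # contain nothing but separators and whitespace.
    lines = text.splitlines()
    while lines and lines[-1].replace(sep, "").strip() == "":
        lines.pop()
    return "\n".join(lines)


def strict_date(value):
    # Hypothetical strict converter: raises ValueError on invalid input.
    return datetime.strptime(value, "%d/%m/%Y")


raw = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""

cleaned = strip_trailing_blank_rows(raw)
df = pd.read_csv(
    io.StringIO(cleaned),
    sep=";",
    converters={"start": strict_date, "end": strict_date},
)
print(df)
```

Only the trailing rows are removed, so a blank or malformed row in the middle of the file still reaches the converters and raises, which keeps genuinely bad data from being discarded silently.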