Pandas read_csv: ignore trailing lines with empty data

Question


I would like to read the following data from a csv file:

id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;

However, pandas read_csv will try to read the empty lines (;;;) as well. Is there a way to automatically ignore these trailing blank data lines?

These lines cause a problem because I am using read_csv converters, and the converter functions dutifully throw an exception when they encounter invalid data, which means I never even get a valid dataframe. I could change the functions to convert invalid data to NaN and then discard the NaN rows from the dataframe, but then I would be silently discarding genuinely erroneous data along with those blank lines.

Some clarifications:

The blank data lines will always be at the end; this is a common problem with csv files generated from Excel.
The data is user generated, so manual cleaning is not an option.


python pandas

Anne 01 jul. 15 at 11:23


2 answers

Padraic Cunningham · Answer 1 · 2015-07-01T11:30:06+0000

Not sure if you can do this directly with read_csv, but you can use dropna:

import pandas as pd

df = pd.read_csv("in.csv", delimiter=";")
# Drop rows in which every column is NaN (the trailing ;;; lines)
df.dropna(how="all", inplace=True)
print(df)
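
Note that dropna only runs after parsing, so it does not help when a strict converter raises on the blank rows first. One possible workaround, shown here as a minimal sketch rather than a built-in pandas feature, is to strip the trailing separator-only lines from the text before handing it to read_csv. The helper strip_trailing_blank_rows and the converter strict_date are hypothetical names introduced for illustration, and the sketch assumes the file is small enough to hold in memory as a string.

```python
import io
from datetime import datetime

import pandas as pd


def strip_trailing_blank_rows(text, sep=";"):
    """Return text without trailing lines that hold only separators or whitespace."""
    lines = text.splitlines()
    while lines and not lines[-1].replace(sep, "").strip():
        lines.pop()
    return "\n".join(lines)


def strict_date(value):
    # Hypothetical strict converter: datetime.strptime raises ValueError
    # on anything that is not a dd/mm/YYYY date, including an empty string.
    return datetime.strptime(value, "%d/%m/%Y")


raw = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;
"""

# Only the trailing blank rows are removed; invalid data elsewhere still
# reaches the converters and still raises.
cleaned = strip_trailing_blank_rows(raw)
df = pd.read_csv(
    io.StringIO(cleaned),
    sep=";",
    converters={"start": strict_date, "end": strict_date},
)
print(df)
```

Because only the tail of the file is touched, a blank or malformed row in the middle of the data would still trigger the converter's exception, which matches the requirement of not discarding erroneous data silently.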

EdChum · Answer 2 · 2015-07-01T12:27:45+0000

If you know you want to ignore the last two lines, you can pass the parameter skipfooter=2:

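In [197]:
import io
import pandas as pd

t = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""
# skipfooter is only supported by the Python parser engine, so request it explicitly
df = pd.read_csv(io.StringIO(t), sep=';', skipfooter=2, engine='python')
df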

Out[197]:
     id type       start         end
0  Test  OIS  01/07/2016  01/07/2018
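
When the number of trailing blank rows is not known in advance, it could be counted first and then passed as skipfooter. The following is a rough sketch under that assumption; count_trailing_separator_rows is a hypothetical helper, not part of pandas, and it only counts lines made up entirely of separators (a completely empty line stops the count).

```python
import io

import pandas as pd


def count_trailing_separator_rows(text, sep=";"):
    """Count trailing lines that contain separators and nothing else."""
    count = 0
    for line in reversed(text.splitlines()):
        stripped = line.strip()
        if stripped and not stripped.replace(sep, ""):
            count += 1
        else:
            break
    return count


t = """id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;"""

footer = count_trailing_separator_rows(t)
# skipfooter requires the Python parser engine
df = pd.read_csv(io.StringIO(t), sep=";", skipfooter=footer, engine="python")
print(footer)
print(df)
```

For a file on disk, the same count could be taken from the file's text before calling read_csv on the path.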
