Remove lines containing a specific pattern [Python / Pandas]

I am new to Python and Pandas. I spent a lot of time searching but could not find an answer to my specific problem.

I have text files where the first few lines are just comments starting with '#', followed by a normal table of rows and columns. I have hundreds of such files that I need to read and manipulate. For example:

'#' blah1

'#' blah2

'#' blah3

Column1 Column2 Column3

a1 b1 c1

a2 b2 c2

and so on.

I want to delete all lines starting with '#'. Can anyone tell me how to do this in Pandas, preferably?

Alternatively, I tried using the following code to read in a text file:

my_input = pd.read_table(filename, comment='#', header=80)

      

The problem is that the header is on a different line in every text file. Is there a way to generalize this and tell pandas that the header is the line right after the last line that starts with '#'?



1 answer


Upgrading to pandas 0.14.1 or higher allows you to properly skip the commented lines.
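As a minimal sketch of what that looks like on a recent pandas version (the sample string is the one used further down in this answer, and the space separator is an assumption about the file layout), passing comment='#' should be enough, because fully commented lines are skipped before the header row is located:

```python
from io import StringIO

import pandas as pd

# Sample data: three comment lines, then a header row and two data rows.
s = "# blah1\n# blah2\n# blah3\nCol1 Col2 Col3\na1 b1 c1\na2 b2 c2\n"

# comment='#' drops the fully commented lines, so the default header=0
# refers to the first line that is left, i.e. the column names.
df = pd.read_csv(StringIO(s), comment="#", sep=" ")
print(df)
```

With real files, the StringIO object would be replaced by the file name, and the same call should work regardless of how many comment lines each file starts with.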

Older versions will leave the commented lines as rows of NaNs, which can be dropped with .dropna(), but you still end up with a broken header.

For older versions of pandas, you can use "skiprows", assuming you know how many lines are commented out.

In [3]:



from io import StringIO
import pandas as pd
s = "# blah1\n# blah2\n# blah3\nCol1 Col2 Col3\na1 b1 c1\na2 b2 c2\n"
pd.read_table(StringIO(s), skiprows=3, sep=' ')

      

Out [3]:

Col1    Col2    Col3
0   a1  b1  c1
1   a2  b2  c2
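If the number of commented lines is not known in advance, it can be counted before calling the reader. The following is only a sketch: count_leading_comments is a hypothetical helper, not part of pandas, and it assumes that all comment lines sit in one block at the top of the file.

```python
from io import StringIO

import pandas as pd


def count_leading_comments(lines, marker="#"):
    """Count the consecutive lines at the start that begin with `marker`.

    Hypothetical helper (not a pandas function); `lines` is any iterable
    of strings, such as an open file object.
    """
    count = 0
    for line in lines:
        if not line.lstrip().startswith(marker):
            break  # first non-comment line reached: this is the header
        count += 1
    return count


s = "# blah1\n# blah2\n# blah3\nCol1 Col2 Col3\na1 b1 c1\na2 b2 c2\n"

# First pass: count the comment lines. Second pass: skip exactly that many.
n_skip = count_leading_comments(StringIO(s))
df = pd.read_csv(StringIO(s), skiprows=n_skip, sep=" ")
print(df)
```

For a file on disk, the first pass would be something like `with open(filename) as f: n_skip = count_leading_comments(f)`, followed by the same read call with filename in place of the StringIO object. Each file is opened twice, but only the comment block is read during the first pass.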

      
