Remove lines containing a specific pattern [Python / Pandas]
I am new to Python and pandas. I spent a lot of time searching but could not find an answer to my specific problem.
I have text files in which the first few lines are comments starting with '#', followed by a normal table with rows and columns. I have hundreds of such files that I need to read and manipulate. For example:
'#' blah1
'#' blah2
'#' blah3
Column1 Column2 Column3
a1 b1 c1
a2 b2 c2
etc.
I want to delete all lines starting with '#'. Can anyone tell me how to do this, preferably in pandas?
Alternatively, I tried using the following code to read in a text file:
my_input=pd.read_table(filename, comment='#', header=80)
The problem is that the header is on a different line in every text file. Is there a general way to tell pandas that the header is the first line after the last line starting with '#'?
Upgrading to pandas 0.14.1 or higher allows you to properly skip the commented lines.
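A minimal sketch of what this looks like, assuming pandas 0.14.1 or later and with the sample data inlined as a string instead of read from a file. Fully commented lines are ignored when pandas locates the header, so no header line number is needed:

```python
from io import StringIO

import pandas as pd

# Sample data mirroring the question: comment lines, then a header, then rows.
s = "# blah1\n# blah2\n# blah3\nCol1 Col2 Col3\na1 b1 c1\na2 b2 c2\n"

# Lines beginning with '#' are skipped entirely, so the default header=0
# refers to the first non-comment line ("Col1 Col2 Col3").
df = pd.read_csv(StringIO(s), sep=" ", comment="#")
print(df)
```

For a real file, pass the filename instead of StringIO(s). Note that comment='#' also discards everything after a '#' in the middle of a line, which matters if your data values can contain '#'.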
Older versions parse the commented lines as rows of NaNs. Those rows can be dropped with .dropna(), but the header will still be wrong.
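If upgrading is not an option, one version-independent workaround (not from the original answer) is to remove the '#' lines in plain Python and hand the remaining text to pandas. This is a sketch that assumes each file fits in memory; read_without_comments is a hypothetical helper name, not a pandas function:

```python
from io import StringIO

import pandas as pd


def read_without_comments(path_or_buffer, prefix="#", **kwargs):
    """Hypothetical helper: drop lines starting with `prefix`, then parse the rest.

    A str argument is treated as a file path; anything else is assumed to be
    an open text buffer. Extra keyword arguments go to pd.read_csv.
    """
    if isinstance(path_or_buffer, str):
        with open(path_or_buffer) as f:
            lines = f.readlines()
    else:
        lines = path_or_buffer.readlines()
    # Keep only lines that do not begin with the comment prefix.
    kept = [line for line in lines if not line.startswith(prefix)]
    return pd.read_csv(StringIO("".join(kept)), **kwargs)


s = "# blah1\n# blah2\n# blah3\nCol1 Col2 Col3\na1 b1 c1\na2 b2 c2\n"
df = read_without_comments(StringIO(s), sep=" ")
```

Unlike comment='#', this removes only whole lines that start with '#', so a '#' inside a data value is left untouched.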
For older versions of pandas, you can use "skiprows" instead, assuming you know how many lines are commented out:
In [3]:
from io import StringIO
import pandas as pd
s = "# blah1\n# blah2\n# blah3\nCol1 Col2 Col3\na1 b1 c1\na2 b2 c2\n"
pd.read_table(StringIO(s), skiprows=3, sep=' ')
Out[3]:
Col1 Col2 Col3
0 a1 b1 c1
1 a2 b2 c2
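When the number of comment lines differs from file to file, the count can be computed first and then passed as skiprows. A minimal sketch, assuming all comment lines sit at the top of the file; count_leading_comments is a hypothetical helper:

```python
from io import StringIO

import pandas as pd


def count_leading_comments(lines, prefix="#"):
    """Hypothetical helper: count consecutive lines at the top starting with `prefix`."""
    n = 0
    for line in lines:
        if not line.startswith(prefix):
            break  # stop at the header line; nothing further is read
        n += 1
    return n


s = "# blah1\n# blah2\n# blah3\nCol1 Col2 Col3\na1 b1 c1\na2 b2 c2\n"
n = count_leading_comments(StringIO(s))
df = pd.read_table(StringIO(s), skiprows=n, sep=" ")
```

For a real file, call the helper on an open file object (with open(filename) as f: n = count_leading_comments(f)) and then pass skiprows=n to pd.read_table(filename, ...). Because the loop stops at the first non-comment line, only the comment block and the header line are read during counting.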