Reading a CSV with pandas when the header line starts with #
I have CSV files with a # at the start of the header line:
s = '#one two three\n1 2 3'
If I use pd.read_csv, the # ends up in the first column name:
import pandas as pd
from io import StringIO
pd.read_csv(StringIO(s), delim_whitespace=True)
#one two three
0 1 2 3
If I set the argument comment='#', pandas ignores the header line completely.
Is there an easy way to handle this case?
The second, related issue is how to handle quoting in this case. It works without the #:
s = '"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
one one two three
0 1 2 3
It does not work with the #:
s = '#"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
#"one one" two three
0 1 2 3 NaN
Thanks!
++++++++++ Update
Here is a test for the second example:
s = '#"one one" two three\n1 2 3'
# here I am cheating by slicing the leading # off the string
wanted_result = pd.read_csv(StringIO(s[1:]), delim_whitespace=True)
# is there a way to achieve the same result configuring somehow read_csv?
assert wanted_result.equals(pd.read_csv(StringIO(s), delim_whitespace=True))
You can strip the leading # from the file contents before parsing, like this:
s = u'#"one one" two three\n1 2 3'
import pandas as pd
from io import StringIO
# split only on the first "#" so that any later "#" in the data is kept
wholefile = StringIO(s).read().split("#", 1)[1]
pd.read_csv(StringIO(wholefile), delim_whitespace=True)
one one two three
0 1 2 3
The drawback is that you have to load the entire file into memory, but it works.
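If loading the whole file is a concern, one possible alternative is to read only the header line by hand and let pandas parse the rest of the stream. The sketch below is an assumption-laden illustration, not a built-in pandas feature: read_commented_header is a hypothetical helper name, shlex.split is used to honour the double quotes in the header (it also treats backslashes and single quotes specially, which may not suit every file), and sep=r"\s+" stands in for delim_whitespace=True, which newer pandas releases deprecate.

```python
import shlex
from io import StringIO

import pandas as pd


def read_commented_header(handle, **kwargs):
    """Hypothetical helper: parse whitespace-delimited data whose
    header line starts with '#'.

    Only the first line is consumed manually; pandas reads the
    remainder of the stream from the current position.
    """
    # Take the header line and drop the leading comment character(s).
    header = handle.readline().lstrip("#")
    # shlex.split keeps '"one one"' together as a single column name.
    names = shlex.split(header)
    # The header has already been consumed, so tell pandas there is none.
    return pd.read_csv(handle, sep=r"\s+", header=None, names=names, **kwargs)


s = '#"one one" two three\n1 2 3'
df = read_commented_header(StringIO(s))
print(df)
```

With a real file, the same helper could be called as read_commented_header(open(path)), where path is whatever file you are reading; extra keyword arguments are passed through to pd.read_csv.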