Reading CSV file with Pandas: complex delimiter

Question

I have a CSV file that I want to read with Python pandas. The header and lines look like this:

 A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N

The separator is clearly ^, but runs of extra spaces sometimes appear before it. How can I read this file correctly?

I am using the following command to read the csv file:

df = pd.read_csv('input.csv', sep='^')
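To see what the single-character separator leaves behind, here is a minimal sketch that parses only the header line from the question through io.StringIO (an in-memory stand-in for input.csv). All 14 columns are found, but the padding stays in the names, so a lookup such as df['A'] would raise a KeyError:

```python
import io
import pandas as pd

# The header line from the question, used as an in-memory file.
header = "A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N"

# sep="^" splits on every caret, but the spaces before a caret stay in the name.
df = pd.read_csv(io.StringIO(header), sep="^")
print(len(df.columns))              # 14
print(df.columns[0] == "A")         # False
print(df.columns[0].strip() == "A") # True
```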

python pandas csv

Mohammad saifullah May 14 '15 at 21:57

5 answers

Can't you use a regular expression as the delimiter?

df = pd.read_csv('input.csv', sep=r'[\^\s]+', engine='python')
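A minimal sketch of this regex approach on a short, made-up sample. It assumes the pattern is passed to read_csv as a plain string together with engine='python', since regex separators are handled by the Python parser:

```python
import io
import pandas as pd

# Made-up sample: a shortened, padded header and one data row.
sample = "A   ^B^C  ^D\nfoo ^bar^baz ^qux\n"

# Any run of carets and/or whitespace counts as a single separator.
df = pd.read_csv(io.StringIO(sample), sep=r"[\^\s]+", engine="python")
print(df.columns.tolist())  # ['A', 'B', 'C', 'D']
print(df.iloc[0].tolist())  # ['foo', 'bar', 'baz', 'qux']
```

Because \s is part of the class, this also splits on spaces inside a value, so it suits files whose fields never contain spaces.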

Malik brahimi May 14 '15 at 22:09

Your delimiter can be a regular expression, so try something like this:

df = pd.read_csv('input.csv', sep="[ ^]+", engine='python')

The regex treats any run of spaces and/or carets (^) as a single separator.
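A minimal sketch on a made-up, padded sample. Inside a character class, a caret that is not the first character is a literal ^, so "[ ^]+" matches any run of spaces and carets. engine="python" is passed here to avoid the fallback warning for regex separators. The last two lines show the catch: the same pattern also splits on spaces inside a value.

```python
import io
import re
import pandas as pd

# Made-up sample: padded fields, no spaces inside the values themselves.
sample = "A   ^B^C  ^D\n1   ^2^3  ^4\n"

df = pd.read_csv(io.StringIO(sample), sep="[ ^]+", engine="python")
print(df.columns.tolist())  # ['A', 'B', 'C', 'D']
print(df.iloc[0].tolist())  # [1, 2, 3, 4]

# Caveat: a value such as "John Smith" would be broken into two fields.
print(re.split("[ ^]+", "John Smith^Paris"))  # ['John', 'Smith', 'Paris']
```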

Zachary cross May 14 '15 at 22:08

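Read the file as you did, then strip the extra spaces from the column names and from every string column:

import pandas as pd
df = pd.read_csv('input.csv', sep='^')
df.columns = df.columns.str.strip()  # the header names carry the padding too
df = df.apply(lambda col: col.str.strip() if col.dtype == object or isinstance(col.dtype, pd.StringDtype) else col)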
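A self-contained sketch of the strip-after-reading approach on a made-up sample. Note that apply() hands each column to the function as a Series, so the test has to look at the column's dtype; an isinstance(x, str) check is never true there. The helpers is_text and strip_padding are hypothetical names, and the StringDtype check assumes newer pandas versions may give text columns a dedicated string dtype instead of object:

```python
import io
import pandas as pd

def is_text(col):
    # Object dtype (older default for text) or a pandas StringDtype column.
    return col.dtype == object or isinstance(col.dtype, pd.StringDtype)

def strip_padding(df):
    """Hypothetical helper: strip padding from column names and text columns."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    return out.apply(lambda col: col.str.strip() if is_text(col) else col)

# Made-up sample: padded header names and padded text values.
sample = "A   ^B^C  \nfoo ^bar^baz \n"
df = strip_padding(pd.read_csv(io.StringIO(sample), sep="^"))
print(df.columns.tolist())  # ['A', 'B', 'C']
print(df.iloc[0].tolist())  # ['foo', 'bar', 'baz']
```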

Alexander May 14 '15 at 22:09

If the only whitespace in your file is padding between columns (i.e. no value contains spaces or tabs), a simple solution is to remove all blanks from the file before reading it. Example command:

<input.csv tr -d '[[:blank:]]' > new_input.txt

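The same cleanup can be done in Python without writing an intermediate file. A minimal sketch, assuming the whole file fits in memory; read_without_blanks is a hypothetical helper name, and like the tr command it deletes spaces and tabs everywhere, including inside values:

```python
import io
import re
import pandas as pd

def read_without_blanks(text):
    """Hypothetical helper: drop spaces and tabs (like tr -d '[[:blank:]]'), then split on '^'."""
    cleaned = re.sub(r"[ \t]", "", text)
    return pd.read_csv(io.StringIO(cleaned), sep="^")

# Made-up sample with space and tab padding around the carets.
sample = "A   ^B^C\t\nfoo ^bar^\tbaz\n"
df = read_without_blanks(sample)
print(df.columns.tolist())  # ['A', 'B', 'C']
print(df.iloc[0].tolist())  # ['foo', 'bar', 'baz']

# For a real file, read the text first, e.g.:
# with open("input.csv") as f:
#     df = read_without_blanks(f.read())
```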
user2030378 May 14 '15 at 22:09

EdChum · Accepted Answer · 2015-05-14T22:09:28+0000

Use the regex \s*\^, which means zero or more whitespace characters followed by a ^. You have to specify the python engine here to avoid the warning that the C engine does not support regex separators:

import io
import pandas as pd

t = """A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N"""
df = pd.read_csv(io.StringIO(t), sep=r'\s*\^', engine='python')
print(df.columns)
# Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], dtype='object')
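One caveat with \s*\^ is that it only absorbs whitespace before each caret. A minimal sketch on a made-up sample that also has spaces after the carets; the variant \s*\^\s* (a small extension, not part of the answer above) trims both sides:

```python
import io
import pandas as pd

# Made-up sample with padding both before and after some carets.
sample = "A   ^ B^C\nfoo ^ bar^  baz\n"

# r"\s*\^" trims whitespace before each caret only.
before_only = pd.read_csv(io.StringIO(sample), sep=r"\s*\^", engine="python")
# r"\s*\^\s*" trims whitespace on both sides of each caret.
both_sides = pd.read_csv(io.StringIO(sample), sep=r"\s*\^\s*", engine="python")

print(before_only.columns.tolist())  # ['A', ' B', 'C']
print(both_sides.columns.tolist())   # ['A', 'B', 'C']
print(both_sides.iloc[0].tolist())   # ['foo', 'bar', 'baz']
```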
