Reading CSV file with Pandas: complex delimiter

I have a CSV file that I want to read with pandas in Python. The header and the data lines look like this:

 A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N

      

The separator is clearly ^, but some fields are padded with a varying number of spaces. How can I read this file cleanly?

I am using the following command to read the csv file:

df = pd.read_csv('input.csv', sep='^')

      



5 answers


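Use the regex \s*\^ as the separator. It matches zero or more whitespace characters followed by ^. Specify the python engine explicitly: the C engine does not support regex separators, so without it pandas falls back to the python engine and emits a warning: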



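In [152]:

import io
import pandas as pd

# Only the header line from the question, so the frame has columns but no rows
t = """A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N"""
df = pd.read_csv(io.StringIO(t), sep=r'\s*\^', engine='python')
df.columns
Out[152]:
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], dtype='object')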
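
The example above parses only a header line. As a rough sketch of the same idea with data, the snippet below adds one made-up row (the values are hypothetical, not from the question) and widens the pattern to \s*\^\s* so that spaces after a caret are consumed as well:

```python
import io
import pandas as pd

# Hypothetical sample: a shortened header in the question's style plus one invented row.
sample = (
    "A           ^B^C  ^D\n"
    "1   ^  2^3 ^4\n"
)

# \s*\^\s* swallows optional whitespace on both sides of each caret.
df = pd.read_csv(io.StringIO(sample), sep=r"\s*\^\s*", engine="python")

print(df.columns.tolist())
print(df.iloc[0].tolist())
```

Because the padding is removed during splitting, numeric columns should be inferred as numbers without a separate cleanup step.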

      



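Can't you use a regex as the delimiter? read_csv documents sep as a string, so pass the pattern itself rather than a compiled object:

df = pd.read_csv('input.csv', sep=r'[\^\s]+', engine='python')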

      



Your delimiter can be a regular expression, so try something like this:

df = pd.read_csv('input.csv', sep="[ ^]+")

      

The regex treats any run of spaces or carets (^) in a line as a single separator.
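
One caveat with a character-class pattern such as [ ^]+ is that it merges consecutive carets and also splits on spaces inside a value. The sketch below uses re.split on a hypothetical row purely as an illustration of how the two patterns tokenize a line; it is not a claim about pandas internals:

```python
import re

# Hypothetical row: the second field is empty and the third contains a space.
line = "a^^New York ^d"

greedy = re.split(r"[ ^]+", line)    # any run of spaces/carets is one separator
anchored = re.split(r"\s*\^", line)  # optional padding, then exactly one caret

print(greedy)    # ['a', 'New', 'York', 'd']
print(anchored)  # ['a', '', 'New York', 'd']
```

If the file can contain empty fields or values with spaces, the anchored form is likely the safer choice.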



Read the file as you already do, then strip the extra spaces from every string column:

# apply passes each column as a Series, so test its dtype rather than isinstance(x, str)
df = (pd.read_csv('input.csv', sep="^")
      .apply(lambda x: x.str.strip() if pd.api.types.is_string_dtype(x) else x))
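
With a plain sep="^", the padding also ends up in the column names, so those need stripping too. A minimal sketch, assuming a small made-up sample in place of input.csv:

```python
import io
import pandas as pd

# Hypothetical sample standing in for input.csv; all values are invented.
sample = "A   ^B ^C\nfoo  ^x ^bar \n"

df = pd.read_csv(io.StringIO(sample), sep="^")

# Strip the padding from the header names first.
df.columns = df.columns.str.strip()

# Then strip every column that holds strings; leave other dtypes untouched.
df = df.apply(
    lambda col: col.str.strip() if pd.api.types.is_string_dtype(col) else col
)

print(df.columns.tolist())
print(df.iloc[0].tolist())
```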

      



If the only whitespace in your file is the padding between columns (i.e. no field contains spaces of its own), a simple solution is to remove all blanks from the file before reading it. Example command for this:

<input.csv tr -d '[:blank:]' > new_input.txt
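
The same preprocessing can be sketched in Python without a shell step. The string below is a hypothetical stand-in for the contents of input.csv, and the approach carries the same assumption that no field contains meaningful spaces:

```python
import io
import re
import pandas as pd

# Hypothetical stand-in for the text of input.csv.
raw = "A           ^B^C  ^D\n1   ^2^3 ^4\n"

# Drop spaces and tabs but keep newlines, so row boundaries survive.
cleaned = re.sub(r"[ \t]+", "", raw)

df = pd.read_csv(io.StringIO(cleaned), sep="^")

print(df.columns.tolist())
print(df.iloc[0].tolist())
```

Since the separator is now a single character, the default C engine can be used.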

      
