How to split on commas but ignore commas inside quotes in Python

Python 2.7 code:

cStr = '"aaaa","bbbb","ccc,ddd"' 

newStr = cStr.split(',')

print newStr 

# result : ['"aaaa"','"bbbb"','"ccc','ddd"' ]

      

But I want this result:

result = ['"aaaa"','"bbbb"','"ccc,ddd"']

      


+5




6 answers


Solution using the re.split() function:

import re

cStr = '"aaaa","bbbb","ccc,ddd"'
newStr = re.split(r',(?=")', cStr)

print newStr

      

Output:



['"aaaa"', '"bbbb"', '"ccc,ddd"']

      


,(?=") uses a positive lookahead assertion: it ensures that the separator , is immediately followed by a double quote ".
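To make the effect of the lookahead visible, here is a small sketch that compares a plain split with the lookahead split. The helper name split_quoted is made up for illustration, and the approach assumes that every field is wrapped in double quotes and that no field contains the sequence ," itself. print(...) with a single argument behaves the same in Python 2.7 and Python 3.

```python
import re

def split_quoted(line):
    # Hypothetical helper: split only on commas that are immediately
    # followed by a double quote, i.e. commas that start a new quoted field.
    return re.split(r',(?=")', line)

cStr = '"aaaa","bbbb","ccc,ddd"'
plain = cStr.split(',')
lookahead = split_quoted(cStr)

print(plain)      # the comma inside "ccc,ddd" is treated as a separator
print(lookahead)  # the comma inside "ccc,ddd" is left alone
```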

+8




Try using the csv module.

import csv
cStr = '"aaaa","bbbb","ccc,ddd"'
newStr = [ '"{}"'.format(x) for x in list(csv.reader([cStr], delimiter=',', quotechar='"'))[0] ]

print newStr
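If the surrounding quotes are not actually needed, the re-wrapping step can probably be dropped, since csv.reader already strips them. A minimal sketch, assuming a single line of input; the variable name fields is chosen only for illustration:

```python
import csv

cStr = '"aaaa","bbbb","ccc,ddd"'

# csv.reader expects an iterable of lines, so the single string is wrapped
# in a list. The defaults (delimiter=',' and quotechar='"') match this input.
fields = next(csv.reader([cStr]))

print(fields)  # field contents without the surrounding double quotes
```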

      



See also the related question about parsing CSV in Python while ignoring commas inside double quotes.

+5




In this case, it is better to use a regular expression. re.findall('".*?"', cStr) returns exactly what you need.

The asterisk is a greedy quantifier: if you used '".*"', it would return the longest possible match, i.e. everything between the first and the very last double quote. The question mark makes it non-greedy, so '".*?"' returns the shortest possible match.
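The difference between the two patterns can be checked directly. This is only a sketch on the sample string from the question; the variable names greedy and non_greedy are illustrative.

```python
import re

cStr = '"aaaa","bbbb","ccc,ddd"'

# Greedy: .* runs to the last double quote in the string.
greedy = re.findall('".*"', cStr)

# Non-greedy: .*? stops at the next double quote it can reach.
non_greedy = re.findall('".*?"', cStr)

print(greedy)      # a single match spanning the whole string
print(non_greedy)  # one match per quoted field
```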

+1




pyparsing has a built-in expression, commaSeparatedList:

cStr = '"aaaa","bbbb","ccc,ddd"' 
import pyparsing as pp
print(pp.commaSeparatedList.parseString(cStr).asList())

      

prints:

['"aaaa"', '"bbbb"', '"ccc,ddd"']

      

You can also add a parsing action to strip those double quotes (since you probably just want the content, not the quotes):

csv_line = pp.commaSeparatedList.copy().addParseAction(pp.tokenMap(lambda s: s.strip('"')))
print(csv_line.parseString(cStr).asList())

      

gives:

['aaaa', 'bbbb', 'ccc,ddd']

      

+1




You need a parser. You can write your own, or you can press one of the library parsers into service. In this case, json can be (ab)used.

import json

cStr = '"aaaa","bbbb","ccc,ddd"' 
jstr = '[' + cStr + ']'
result = json.loads( jstr)             # ['aaaa', 'bbbb', 'ccc,ddd']
result = [ '"'+r+'"' for r in result ] # ['"aaaa"', '"bbbb"', '"ccc,ddd"']

      

0




You can first split the string on ", then filter out the '' and ',' pieces and finally re-add the quotes with string formatting. This might be the simplest way:

['"%s"' % s for s in cStr.split('"') if s and s != ',']
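To show where the '' and ',' pieces come from, here is the same idea spelled out step by step. It is only a sketch on the sample string; the names pieces and newStr are illustrative.

```python
cStr = '"aaaa","bbbb","ccc,ddd"'

# Splitting on the double quote leaves empty strings at both ends and a
# bare ',' between each pair of fields.
pieces = cStr.split('"')

# Drop the empty strings and the bare separators, then re-wrap in quotes.
newStr = ['"%s"' % s for s in pieces if s and s != ',']

print(pieces)
print(newStr)
```

Note that the filter also discards empty fields and any field whose content is exactly a comma, so this shortcut only fits input like the sample above.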

      

-1








