How to split on commas but ignore commas inside quotes in Python
Solution using the re.split() function:
import re
cStr = '"aaaa","bbbb","ccc,ddd"'
newStr = re.split(r',(?=")', cStr)
print(newStr)
Output:
['"aaaa"', '"bbbb"', '"ccc,ddd"']
,(?=")
- a positive lookahead assertion; it ensures that the separator , is immediately followed by a double quote (") without consuming that quote.
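To make the effect of the lookahead visible, here is a minimal sketch that compares a plain str.split(',') with the re.split() pattern from this answer. The variable names (c_str, naive, lookahead) are illustrative only. It assumes every field is wrapped in double quotes and that no quoted field contains a comma directly followed by a quote; the pattern does not handle those cases.

```python
import re

c_str = '"aaaa","bbbb","ccc,ddd"'

# A plain split cuts at every comma, including the one inside the last field.
naive = c_str.split(',')

# The lookahead accepts only commas that are immediately followed by a
# double quote, so the comma inside "ccc,ddd" is left alone.
lookahead = re.split(r',(?=")', c_str)

print(naive)      # ['"aaaa"', '"bbbb"', '"ccc', 'ddd"']
print(lookahead)  # ['"aaaa"', '"bbbb"', '"ccc,ddd"']
```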
In this case, it is better to use a regular expression.
re.findall('".*?"', cStr)
returns exactly what you need
The asterisk is a greedy quantifier: if you used '".*"', it would return the longest possible match, i.e. everything between the first and the very last double quote. The question mark makes it non-greedy, so '".*?"' returns the smallest possible match.
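The difference between the two patterns can be checked directly. This is a small sketch on the same input string; the names greedy and non_greedy are illustrative only.

```python
import re

c_str = '"aaaa","bbbb","ccc,ddd"'

# Greedy: .* runs to the last double quote, so the whole string is one match.
greedy = re.findall('".*"', c_str)

# Non-greedy: .*? stops at the first closing quote it can, giving one match per field.
non_greedy = re.findall('".*?"', c_str)

print(greedy)      # ['"aaaa","bbbb","ccc,ddd"']
print(non_greedy)  # ['"aaaa"', '"bbbb"', '"ccc,ddd"']
```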
pyparsing has a built-in expression, commaSeparatedList:
cStr = '"aaaa","bbbb","ccc,ddd"'
import pyparsing as pp
print(pp.commaSeparatedList.parseString(cStr).asList())
prints:
['"aaaa"', '"bbbb"', '"ccc,ddd"']
You can also add a parsing action to strip those double quotes (since you probably just want the content, not the quotes):
csv_line = pp.commaSeparatedList.copy().addParseAction(pp.tokenMap(lambda s: s.strip('"')))
print(csv_line.parseString(cStr).asList())
gives:
['aaaa', 'bbbb', 'ccc,ddd']
You need a parser. You can write your own, or you can press one of the library parsers into service. In this case, json can be (ab)used.
import json
cStr = '"aaaa","bbbb","ccc,ddd"'
jstr = '[' + cStr + ']'
result = json.loads(jstr)  # ['aaaa', 'bbbb', 'ccc,ddd']
result = ['"' + r + '"' for r in result]  # ['"aaaa"', '"bbbb"', '"ccc,ddd"']
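Another library parser that could be pressed into service is the standard csv module. It is not used in the answer above and is shown here only as an alternative sketch. The names fields and quoted are illustrative, and the example assumes the input is a single line in ordinary CSV quoting style.

```python
import csv

c_str = '"aaaa","bbbb","ccc,ddd"'

# csv.reader accepts any iterable of lines; wrap the single string in a list
# and take the first (and only) parsed row.
fields = next(csv.reader([c_str]))
print(fields)  # ['aaaa', 'bbbb', 'ccc,ddd']

# Re-add the surrounding quotes if the quoted form is what you want.
quoted = ['"' + f + '"' for f in fields]
print(quoted)  # ['"aaaa"', '"bbbb"', '"ccc,ddd"']
```

Unlike the json trick, this does not require wrapping the string in brackets, and csv.reader also accepts unquoted fields.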