Parse the string as a list of tuples
Input: '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'
Output: [("tagname1", "tagvalue1"), ("tagname2", "tagvalue2"), ("tagname3", "tagvalue3"), ("tag name4", "tag value4")]
I have a solution, but it only works if the input contains quotes for each item: "tagname1", "tagvalue1" ...
import ast
ast.literal_eval(input_string)
In my case, I get: ValueError: malformed string
Is there a way to make this work without the quotes (note the inconsistent spaces as well)?
Try a different approach with regular expressions:
>>> import re
>>> s = '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'
>>> e = r'\(\s?(.*?)\s?,\s?(.*?)\s?\)'
>>> re.findall(e, s)
[('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tagname3', 'tagvalue3'), ('tag name4', 'tag value4')]
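If the padding inside the parentheses can be more than one space, a slightly more tolerant variant of the same idea might look like the sketch below. The helper name parse_pairs and the switch from \s? to \s* are assumptions made for illustration, not part of the answer above.

```python
import re

# Hypothetical helper: same lazy-group idea as above, but \s* tolerates
# any amount of whitespace around the name and the value.
PAIR = re.compile(r'\(\s*(.*?)\s*,\s*(.*?)\s*\)')

def parse_pairs(text):
    """Return a list of (name, value) tuples found in text."""
    return PAIR.findall(text)

s = '(tagname1, tagvalue1),(tagname2,tagvalue2), (  tagname3 ,  tagvalue3  ), (tag name4,tag value4)'
print(parse_pairs(s))
```

Like the original pattern, this assumes that names and values contain no commas or parentheses.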
An alternative to what Burkhan suggested is to use capturing groups, which let you retrieve each parenthesized part of a match by its number.
import re
# Input string
string = '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'
# Regular expression pattern
pattern = re.compile(r"\(([a-z0-9 ]+), ?([a-z0-9 ]+)\)", re.I)
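list_of_tuples = []
for matched_object in pattern.finditer(string):
    # strip() removes padding that the character class also matches, e.g. in "( tagname3, tagvalue3 )"
    list_of_tuples.append((matched_object.group(1).strip(), matched_object.group(2).strip()))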
Note:
I used pattern.finditer() because it lets you iterate over every match of the pattern in the text. From the re.finditer documentation:
re.finditer(pattern, string, flags=0) Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
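Because finditer yields one match object at a time, the loop can also be written as a list comprehension. The following is a minimal sketch using the same pattern; the name pairs is illustrative, and strip() is applied because the character class [a-z0-9 ] also matches padding spaces.

```python
import re

string = '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'
pattern = re.compile(r"\(([a-z0-9 ]+), ?([a-z0-9 ]+)\)", re.I)

# m.groups() returns every captured group of one match as a tuple;
# each group is stripped to drop leading and trailing spaces.
pairs = [tuple(g.strip() for g in m.groups()) for m in pattern.finditer(string)]
print(pairs)
```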
Here's a re.findall variant that handles multiple spaces (non-word characters):
>>> import re
>>> s = '(tagname1, tagvalue1 ), ( tagname2 , tagvalue2 ), ( tagname3, tagvalue3 ), (tag name4, tag value4 )'
>>> re.findall(r'\(\W*([\w\s]*?)\W*,\W*([\w\s]*?)\W*\)', s)
[('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tagname3', 'tagvalue3'), ('tag name4', 'tag value4')]
Note the non-greedy quantifier on the class of word characters and whitespace, [\w\s]*?. It captures every word of each tag name or value but leaves any leading and trailing spaces to the surrounding \W*. This is why "tag value4" is captured correctly above.
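To see what the non-greedy quantifier buys, the sketch below runs the same pattern with and without the trailing ? on a padded input. The sample string and the names lazy and greedy are made up for illustration.

```python
import re

s = '( tag name4 ,  tag value4  )'

# Non-greedy groups stop as early as possible, so the trailing
# spaces are consumed by the following \W* instead of the group.
lazy = re.findall(r'\(\W*([\w\s]*?)\W*,\W*([\w\s]*?)\W*\)', s)

# Greedy groups grab the trailing spaces before \W* gets a chance.
greedy = re.findall(r'\(\W*([\w\s]*)\W*,\W*([\w\s]*)\W*\)', s)

print(lazy)    # [('tag name4', 'tag value4')]
print(greedy)  # [('tag name4 ', 'tag value4  ')]
```

In both cases the leading spaces are removed, because the greedy \W* in front of each group consumes them first.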
Another method without regex:
def string_to_tuples(s):
    def tuple_strip(item):  # Wrapper to pass to map
        return item.strip(" ()")
    # list() is needed on Python 3, where map() and zip() return iterators
    sl = list(map(tuple_strip, s.split(",")))
    return list(zip(sl[::2], sl[1::2]))
Which gives:
>>> string_to_tuples('(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)')
[('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tagname3', 'tagvalue3'), ('tag name4', 'tag value4')]
This works as long as tags neither start nor end with a space, ( or ), and do not contain any commas.
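The comma restriction matters because the pairing relies purely on position after the split. The sketch below uses a compact Python 3 version of the split-based function and made-up inputs to show how a comma inside a value silently shifts the pairs.

```python
def string_to_tuples(s):
    # Split on every comma, then strip padding and parentheses from each piece.
    parts = [piece.strip(" ()") for piece in s.split(",")]
    # Pair even-indexed pieces (names) with odd-indexed pieces (values).
    return list(zip(parts[::2], parts[1::2]))

ok = string_to_tuples('(a, b), (c, d)')
broken = string_to_tuples('(a, b,c), (d, e)')

print(ok)      # [('a', 'b'), ('c', 'd')]
print(broken)  # [('a', 'b'), ('c', 'd')] -- 'b,c' was split and 'e' was dropped
```

If values may contain commas, one of the regex-based answers with an adjusted pattern is likely a better starting point.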