Parse the string as a list of tuples

Input: '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'

Output: [("tagname1", "tagvalue1"), ("tagname2", "tagvalue2"), ("tagname3", "tagvalue3"), ("tag name4", "tag value4")]

I have a solution, but it only works if the input contains quotes for each item: "tagname1", "tagvalue1" ...

import ast
ast.literal_eval(input_string)

      

In my case, I get: ValueError: malformed string

Is there a way to make this work without the quotes, including for tags that contain spaces?
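For reference, here is a minimal sketch that reproduces the problem. The two sample strings are shortened, made-up versions of the input above, and the exact exception type and message may differ between Python versions, so the sketch catches both ValueError and SyntaxError.

```python
import ast

# Shortened sample inputs (assumed for illustration)
quoted = '("tagname1", "tagvalue1"), ("tag name4", "tag value4")'
unquoted = '(tagname1, tagvalue1), (tag name4, tag value4)'

# Quoted items are valid Python literals, so this yields a tuple of tuples
parsed = ast.literal_eval(quoted)

# Unquoted items are not literals, so literal_eval refuses to evaluate them
try:
    ast.literal_eval(unquoted)
    failure = None
except (ValueError, SyntaxError) as exc:
    failure = type(exc).__name__

print(parsed)
print(failure)
```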

+3




4 answers


Try a different approach with regular expressions:



>>> import re
>>> s = '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'
>>> e = r'\(\s?(.*?)\s?,\s?(.*?)\s?\)'
>>> re.findall(e, s)
[('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tagname3', 'tagvalue3'), ('tag name4', 'tag value4')]
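One caveat worth sketching: \s? matches at most one whitespace character, so inputs with runs of spaces can leave padding inside the captured groups. A hedged way around this, assuming it is acceptable to trim every captured value, is to strip the groups after matching. The names PAIR and parse_pairs below are hypothetical helpers, not part of the answer.

```python
import re

# Same pattern as in the answer, precompiled; PAIR is a hypothetical name
PAIR = re.compile(r'\(\s?(.*?)\s?,\s?(.*?)\s?\)')

def parse_pairs(text):
    """Return (name, value) tuples with surrounding whitespace removed."""
    return [(name.strip(), value.strip()) for name, value in PAIR.findall(text)]

print(parse_pairs('(a,  b  ), (tag name4,tag value4)'))
```

Spaces inside a tag, as in 'tag name4', are preserved because strip() only touches the ends of each captured string.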

      

+11




An alternative to what Burkhan suggested is to use capturing groups (backreferences) with an explicit character class. You can read more about backreferences here.

import re

# Input string
string = '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'

# Regular expression pattern 
pattern = re.compile(r"\(([a-z0-9 ]+), ?([a-z0-9 ]+)\)", re.I)

list_of_tuples = []
for match in pattern.finditer(string):
    # The character class includes spaces, so trim any padding from each group
    list_of_tuples.append((match.group(1).strip(), match.group(2).strip()))

      

You can see a regex demo here.



Note:

I used pattern.finditer() because it lets you iterate over every match of the pattern in the text. From the re.finditer documentation:

re.finditer(pattern, string, flags=0): Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
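As a rough illustration of what the iterator yields, the sketch below collects the groups and the position of each match. It reuses the pattern from this answer; the variable names text, raw_pairs, pairs and spans are assumptions made for the example. Because the character class contains a space, a group can carry padding, so the sketch also shows a stripped version.

```python
import re

pattern = re.compile(r"\(([a-z0-9 ]+), ?([a-z0-9 ]+)\)", re.I)
text = '(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)'

# Each item from finditer is a match object; groups() returns its captures
raw_pairs = [m.groups() for m in pattern.finditer(text)]

# Trim padding that the space in the character class lets through
pairs = [tuple(g.strip() for g in m.groups()) for m in pattern.finditer(text)]

# span() gives the (start, end) offsets of each match in the input
spans = [m.span() for m in pattern.finditer(text)]

print(raw_pairs[2])
print(pairs)
print(spans)
```

Unlike re.findall, which returns only the captured strings, match objects also expose positions, which can help when reporting malformed input.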

+1




Here's a re.findall variant that handles multiple spaces (non-word characters):

>>> import re
>>> s = '(tagname1, tagvalue1  ),  ( tagname2 ,   tagvalue2   ), (      tagname3, tagvalue3 ), (tag name4,   tag value4   )'
>>> re.findall(r'\(\W*([\w\s]*?)\W*,\W*([\w\s]*?)\W*\)', s)
[('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tagname3', 'tagvalue3'), ('tag name4', 'tag value4')]

      

Note the non-greedy quantifier *? after the character class of word and whitespace characters, [\w\s]*?. It lets each group capture every word of a tag name or value, while the surrounding \W* consumes the leading and trailing spaces. This is why "tag value4" is captured correctly above.
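To make the effect of the non-greedy quantifier concrete, here is a small comparison on a made-up single-pair input. The greedy variant is written only for contrast and is not part of the answer.

```python
import re

s = '( tag name4 ,   tag value4   )'

# Non-greedy groups stop as early as possible, leaving trailing spaces to \W*
lazy = re.findall(r'\(\W*([\w\s]*?)\W*,\W*([\w\s]*?)\W*\)', s)

# Greedy groups swallow the trailing spaces before \W* gets a chance
greedy = re.findall(r'\(\W*([\w\s]*)\W*,\W*([\w\s]*)\W*\)', s)

print(lazy)
print(greedy)
```

In both variants the leading spaces are consumed by the \W* that precedes each group, so only the trailing side differs.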

+1




Another method without regex:

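def string_to_tuples(s):
    # Split on every comma, then trim spaces and parentheses from each piece
    parts = [part.strip(" ()") for part in s.split(",")]
    # Pair even-indexed names with odd-indexed values; list() is needed
    # on Python 3, where zip returns an iterator
    return list(zip(parts[::2], parts[1::2]))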

      

This gives:

>>> string_to_tuples('(tagname1, tagvalue1),(tagname2,tagvalue2), ( tagname3, tagvalue3 ), (tag name4,tag value4)')
[('tagname1', 'tagvalue1'), ('tagname2', 'tagvalue2'), ('tagname3', 'tagvalue3'), ('tag name4', 'tag value4')]

      

This works as long as tags do not start or end with a space, "(" or ")", and do not contain any commas.
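The comma restriction can be made partly visible with a sanity check. The sketch below is a hypothetical variant (the name string_to_tuples_checked is an assumption) that raises an error when the number of comma-separated fields is odd, which is what a single stray comma inside a tag produces.

```python
def string_to_tuples_checked(s):
    """Split-based parser that rejects an odd number of fields."""
    parts = [part.strip(" ()") for part in s.split(",")]
    if len(parts) % 2:
        # An odd count means names and values can no longer be paired up
        raise ValueError("odd number of fields; a tag may contain a comma")
    return list(zip(parts[::2], parts[1::2]))

print(string_to_tuples_checked('(a, b),(c,d)'))

try:
    string_to_tuples_checked('(a, b,c),(d,e)')
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

This check is only a partial safeguard: an even number of stray commas still goes undetected and silently misaligns the pairs, so the regex answers are the safer choice for untrusted input.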

+1

