Python Parse XML to JSON

I am currently working with some interesting XML responses. The XML I am getting is nested, but it really reads like a CSV file. Example:

xml = <?xml version="1.0" encoding="ISO-8859-1"?>
<ThisDocument protocol="OCI" xmlns="C" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<sessionId xmlns="">29348u29!!4nthisSucks!==</sessionId>
  <command echo="" xsi:type="GroupGetListInServiceProviderResponse" xmlns="">
    <groupTable>
      <colHeading>Group Id</colHeading>
      <colHeading>Group Name</colHeading>
      <colHeading>User Limit</colHeading>
      <row>
        <col>LRB7905</col>
        <col>Test1</col>
        <col>25</col>
      </row>
      <row>
        <col>LRB9294</col>
        <col>Test2</col>
        <col>100</col>
      </row>
      <row>
        <col>LRB8270</col>
        <col>Test3</col>
        <col>10</col>
      </row>
      <row>
        <col>LRB8212</col>
        <col>Test4</col>
        <col>25</col>
      </row>
      <row>
        <col>LRB8175</col>
        <col>Test5</col>
        <col>25</col>
      </row>
    </groupTable>
  </command>
</ThisDocument>

      

In the responses I get from the server in question, each "colHeading" is a key and each "col" within a "row" is the corresponding value. It seems like a simple structure to map, but I can't think of a "Pythonic" way to accomplish this. Desired output:

{
  "groupTable": [
    {
        "Group ID": "LRB7905",
        "Group Name": "Test1",
        "User Limit": "25"
    },
    {
        "Group ID": "LRB9294",
        "Group Name": "Test2",
        "User Limit": "100"
    },
    {
        "Group ID": "LRB8270",
        "Group Name": "Test3",
        "User Limit": "10"
    },
    {
        "Group ID": "LRB8212",
        "Group Name": "Test4",
        "User Limit": "25"
    },
    {
        "Group ID": "LRB8175",
        "Group Name": "Test5",
        "User Limit": "25"
    }
  ]
}

      

The information I really need is in the "col" fields of the XML, and the number of colHeadings matches the number of values in each row. Until now I have been able to get the values into CSV files, but ultimately I need to create JSON objects (dicts) of key/value pairs. I have tried various libraries and modules, but the best approach I have come up with is to split the colHeadings and values into two lists and then zip them together.

The code so far:

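import itertools
import xml.etree.ElementTree as ET

xmlroot = ET.fromstring(xml)

headings = []
values = []

def breakoutLists(root):
    # Collect every heading and every cell value into two flat lists.
    for columnHeading in root.iter('colHeading'):
        headings.append(columnHeading.text)
    for column in root.iter('col'):
        values.append(column.text)
    return headings, values

breakoutLists(xmlroot)

# Pairs each value with a heading, so the values end up as the keys.
zipped = dict(zip(values, itertools.cycle(headings)))
print(zipped)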

      

This creates a dictionary, but it maps values: keys instead of keys: values.
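Note that swapping the two arguments to zip would not be enough on its own: the three headings repeat for every row, so a single dict would keep only the last row. As a minimal sketch of one way to stay with the two flat lists, the values can be cut into chunks of len(headings), with one dict built per chunk. The helper name rows_from_flat and the shortened sample lists are made up for illustration, and the sketch assumes headings is not empty.

```python
def rows_from_flat(headings, values):
    """Group a flat list of cell values into one dict per row."""
    width = len(headings)  # assumed to be non-zero
    return [
        dict(zip(headings, values[start:start + width]))
        for start in range(0, len(values), width)
    ]


# Shortened sample data: the first two rows of the XML above.
headings = ["Group Id", "Group Name", "User Limit"]
values = ["LRB7905", "Test1", "25", "LRB9294", "Test2", "100"]

rows = rows_from_flat(headings, values)
print(rows)
```

This only works if every row really has as many values as there are headings, which is why iterating row by row is usually the safer choice.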

I would appreciate any suggestions on the best approach to this task. Thanks in advance.

EDIT Thanks to Eric's help, I was able to complete my task!

import json

groupResp = {'groupResponse': []}

def breakoutLists(root):
    headings = [h.text for h in root.iter('colHeading')]

    # One dict per <row>, pairing each heading with the matching <col>.
    return (
        {
            h: col.text
            for h, col in zip(headings, row.iter('col'))
        }
        for row in root.iter('row')
    )

data = list(breakoutLists(xmlroot))

for item in data:
    groupResp['groupResponse'].append(item)

print(json.dumps(groupResp))

      

I can probably clean it up a bit by building the dictionary inside the function itself, but I'm happy!
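One possible shape for that cleanup, as a sketch only: let the function take the table element and return the finished dictionary, so the module-level dict and the append loop are no longer needed. The name table_to_dict and the two-row sample document are illustrative assumptions. The key is taken from the element's tag, which gives "groupTable" as in the desired output.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed-down sample; the real response has more rows and a namespaced root.
SAMPLE = """<command>
  <groupTable>
    <colHeading>Group Id</colHeading>
    <colHeading>Group Name</colHeading>
    <colHeading>User Limit</colHeading>
    <row><col>LRB7905</col><col>Test1</col><col>25</col></row>
    <row><col>LRB9294</col><col>Test2</col><col>100</col></row>
  </groupTable>
</command>"""


def table_to_dict(table):
    """Return {table tag: [one dict per <row>]} for a colHeading/row/col table."""
    headings = [h.text for h in table.iter('colHeading')]
    rows = [
        {h: col.text for h, col in zip(headings, row.iter('col'))}
        for row in table.iter('row')
    ]
    return {table.tag: rows}


root = ET.fromstring(SAMPLE)
group_resp = table_to_dict(root.find('groupTable'))
print(json.dumps(group_resp, indent=2))
```

With the full response, something like xmlroot.find('.//groupTable') should locate the table, since the xmlns="" on the command element leaves it and its children without a namespace.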


1 answer


Your code flattens the data, which is unhelpful: you need to iterate over the row elements instead.



def breakoutLists(root):
    headings = [h.text for h in root.iter('colHeading')]

    # Generator of dicts: one per <row>, keyed by the column headings.
    return (
        {
            h: col.text
            for h, col in zip(headings, row.iter('col'))
        }
        for row in root.iter('row')
    )

data = list(breakoutLists(xmlroot))
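
One caveat with zip: it stops at the shorter input, so a row with a missing or extra col would be truncated silently. If that matters, a length check can be added. The sketch below is one assumed way to do it, not part of the answer's code. The name iter_rows_checked and the deliberately malformed sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_rows_checked(root):
    """Yield one dict per <row>, raising if a row's width differs from the headings."""
    headings = [h.text for h in root.iter('colHeading')]
    for index, row in enumerate(root.iter('row')):
        cols = [col.text for col in row.iter('col')]
        if len(cols) != len(headings):
            raise ValueError(
                'row %d has %d columns, expected %d'
                % (index, len(cols), len(headings))
            )
        yield dict(zip(headings, cols))


# Hypothetical sample: the second row is missing its last column.
BAD = """<groupTable>
  <colHeading>Group Id</colHeading>
  <colHeading>Group Name</colHeading>
  <row><col>LRB7905</col><col>Test1</col></row>
  <row><col>LRB9294</col></row>
</groupTable>"""

try:
    rows = list(iter_rows_checked(ET.fromstring(BAD)))
except ValueError as exc:
    error = str(exc)
    print(error)
```

Whether to raise, skip the row, or pad it with None depends on how much the server's output can be trusted.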

      
