Python XML parsing with ElementTree returns None
I am trying to parse this XML string using ElementTree in Python.
The data is stored as a string:
xml = '''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
<RollNumber>1</RollNumber>
<Name>Abel</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>abel@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>2</RollNumber>
<Name>Joseph</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>joseph@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>3</RollNumber>
<Name>Mike</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>mike@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
</SearchResults>'''
The code I used to parse this string as XML is:
from xml.etree import ElementTree
xml = ElementTree.fromstring(xml)
results = xml.findall('Student')
for students in results:
    for student in students:
        print student.get('Name')
print results outputs the results as a list of elements:
[<Element 'Student' at 0x7feb615b4ad0>, <Element 'Student' at 0x7feb615b4c50>, <Element 'Student' at 0x7feb615b4e10>]
Inside the for loop, print students outputs the same elements:
<Element 'Student' at 0x7fd722d88ad0>
<Element 'Student' at 0x7fd722d88c50>
<Element 'Student' at 0x7fd722d88e10>
However, when I try to get the student name using print student.get('Name'), the program prints None.
What I am trying to do is output the values from the XML for each tag and build a dict.
Here you have a double loop:
for students in results:
    for student in students:
        print student.get('Name')
students is a single <Student> element. By iterating over it, you get the individual elements contained within that element. Those contained elements (<RollNumber>, <Name>, etc.) have no Name attribute.
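A small sketch of the difference between attributes and child elements. The one-student document and its id attribute are assumptions added purely for illustration (the question's XML has no attributes on <Student>), and the snippet is written for Python 3:

```python
from xml.etree import ElementTree

# Hypothetical sample: the id attribute is added only to show what .get() reads.
student = ElementTree.fromstring(
    '<Student id="s1"><RollNumber>1</RollNumber><Name>Abel</Name></Student>'
)

# Iterating over an element yields its direct child elements.
child_tags = [child.tag for child in student]

# .get() looks up an XML attribute, not a child element.
attribute_value = student.get('id')      # the attribute exists
missing_attribute = student.get('Name')  # no such attribute, so None

# .find() looks up a child element; .text holds its text content.
name_text = student.find('Name').text

print(child_tags, attribute_value, missing_attribute, name_text)
```

This is why the inner loop in the question prints None: each child element is asked for an attribute called Name that it does not have.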
The .get() method only accesses attributes, but you want the <Name> element. Use .find() or an XPath expression instead:
for student in results:
    name = student.find('Name')
    if name is not None:
        print name.text
or
for student_name in xml.findall('.//Student/Name'):
    print student_name.text
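Since the stated goal is to build a dict for each tag, here is a minimal sketch of one way to do that with the standard library alone. The names xml_text, root and students are illustrative, the document is a shortened version of the one in the question, and the snippet is written for Python 3:

```python
from xml.etree import ElementTree

# Shortened version of the question's document (two students, three fields).
xml_text = '''<SearchResults>
<Student><RollNumber>1</RollNumber><Name>Abel</Name><Grade>7</Grade></Student>
<Student><RollNumber>2</RollNumber><Name>Joseph</Name><Grade>7</Grade></Student>
</SearchResults>'''

root = ElementTree.fromstring(xml_text)

students = []
for student in root.findall('Student'):
    # Map each child tag to its text content, one dict per <Student>.
    students.append({child.tag: child.text for child in student})

print(students)
```

All values come back as strings, so a field such as RollNumber would need an explicit int() conversion if numbers are wanted.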
If you are new to XML processing:
- lxml is a fast and powerful library for working with XML in Python. The standard library does not fully support XPath.
- XPath is a query language for navigating XML documents. It has a steep learning curve, but it is easy to get help on StackOverflow. XPath is so useful that I have started converting JSON to XML when working with APIs, just so I can write XPath queries instead of long chains of nested dictionary lookups.
from lxml import etree
from pprint import pprint
doc = etree.XML('''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
<RollNumber>1</RollNumber>
<Name>Abel</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>abel@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>2</RollNumber>
<Name>Joseph</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>joseph@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>3</RollNumber>
<Name>Mike</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>mike@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
</SearchResults>''')
def first(seq, default=None):
    # Return the first item of seq, or default if seq is empty.
    for item in seq:
        return item
    return default

def simple_children_to_dict(element):
    # Map each direct child's tag to its text content.
    result = {}
    for child in element:
        result[child.tag] = child.text
    return result

def get_by_rollnumber(number, search_results):
    # $number is an XPath variable bound by the number= keyword argument.
    student_element = first(search_results.xpath('Student[./RollNumber=$number]', number=number))
    if student_element is None:
        raise Exception("Student Number {0} not found".format(number))
    return simple_children_to_dict(student_element)

def get_all_students(search_results):
    # Collect one dict per <Student> child.
    students = []
    for student_element in search_results.xpath('Student'):
        students.append(simple_children_to_dict(student_element))
    return students
Then:
>>> pprint(get_by_rollnumber(2,doc))
{'Email': 'joseph@hisschool.edu',
 'Grade': '7',
 'Name': 'Joseph',
 'PhoneNumber': 'Not Included',
 'RollNumber': '2'}
>>>
>>> pprint(get_all_students(doc))
[{'Email': 'abel@hisschool.edu',
  'Grade': '7',
  'Name': 'Abel',
  'PhoneNumber': 'Not Included',
  'RollNumber': '1'},
 {'Email': 'joseph@hisschool.edu',
  'Grade': '7',
  'Name': 'Joseph',
  'PhoneNumber': 'Not Included',
  'RollNumber': '2'},
 {'Email': 'mike@hisschool.edu',
  'Grade': '7',
  'Name': 'Mike',
  'PhoneNumber': 'Not Included',
  'RollNumber': '3'}]
Subtleties:
- XPath queries usually return a result set, because most queries can match more than one node. Hence the helper function first is used.
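For comparison, a rough standard-library sketch of the same "first match or None" idea. ElementTree supports only a limited XPath subset, but that subset includes the [tag='text'] predicate used here. The function name find_student and the shortened document are assumptions for illustration, and the snippet is written for Python 3:

```python
from xml.etree import ElementTree

# Shortened version of the question's document.
xml_text = '''<SearchResults>
<Student><RollNumber>1</RollNumber><Name>Abel</Name></Student>
<Student><RollNumber>2</RollNumber><Name>Joseph</Name></Student>
</SearchResults>'''

root = ElementTree.fromstring(xml_text)

def find_student(root, number):
    # findall() always returns a list, which may be empty.
    # The path is built by string formatting, so only pass trusted values;
    # ElementTree has no equivalent of lxml's $number variables.
    matches = root.findall("Student[RollNumber='{0}']".format(number))
    # next() with a default plays the role of the first() helper.
    return next(iter(matches), None)

found = find_student(root, 2)
missing = find_student(root, 99)
found_name = found.find('Name').text if found is not None else None

print(found_name, missing)
```

Returning None for "no match" leaves it to the caller to decide whether that is an error, whereas get_by_rollnumber above raises an exception instead.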