QuickStart to ElementTree: Manipulating XML in Python
Learn by example with a quick step-by-step tutorial covering the main ElementTree features: reading XML, finding elements, iterating, creating elements, and saving to a file.
The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data in Python.
https://docs.python.org/3/library/xml.etree.elementtree.html
1. Import module
import xml.etree.ElementTree as ET
2. Read XML from file
# Download an XML file
import urllib.request
import urllib.error
URL_1 = 'https://gist.githubusercontent.com/pjbelo/c4ddfad14234d9d6b7d746ff17df12ed/raw/6f454a31073767e61cab17b497e1d56704819e27/top10movies.xml'
try:
    with urllib.request.urlopen(URL_1) as f:
        content = f.read()
except urllib.error.URLError as e:
    print(e.reason)
else:
    # Save the downloaded bytes only if the request succeeded
    with open('top10movies.xml', 'wb') as out:
        out.write(content)
# Read and parse an XML file
filename = 'top10movies.xml'
tree = ET.parse(filename)
root = tree.getroot()
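If the download is unavailable, the parsing step can still be tried offline. The sketch below writes a small stand-in document to a temporary file and parses it with ET.parse. The root tag `movies`, the file name, and the sample entries are assumptions made for illustration; only the `movie`, `title`, and `year` tag names come from the examples in this tutorial.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for top10movies.xml; the root tag and entries are assumed.
SAMPLE_XML = """<movies>
  <movie><title>Sample Movie A</title><year>1994</year></movie>
  <movie><title>Sample Movie B</title><year>2008</year></movie>
</movies>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample_movies.xml')
    with open(sample_path, 'w', encoding='utf-8') as out:
        out.write(SAMPLE_XML)

    # ET.parse returns an ElementTree; getroot() gives its top-level element
    sample_tree = ET.parse(sample_path)
    sample_root = sample_tree.getroot()

print(sample_root.tag)   # tag of the top-level element
print(len(sample_root))  # number of direct children of the root
```

The parsed elements stay in memory after the temporary directory is removed, so `sample_root` can be used with the same calls shown in the rest of the tutorial.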
Let’s see what’s inside our file.
dump writes an element tree or element structure to sys.stdout. This function should be used for debugging only.
ET.dump(root)
3. Read XML from URL and string
ElementTree has no function or method for reading directly from a URL, so we use Python's urllib to fetch the document and decode the content into a string.
# Read from URL (URL to string) and decode to string
try:
    with urllib.request.urlopen(URL_1) as f:
        doc = f.read().decode('utf-8')
except urllib.error.URLError as e:
    print(e.reason)
Now we parse the string using fromstring.
# Read from string
root = ET.fromstring(doc)
# ElementTree wrapper class. This class represents an entire element hierarchy,
# and adds some extra support for serialization to and from standard XML.
tree = ET.ElementTree(root)
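To try fromstring without a network connection, a string literal works just as well. The sketch below uses made-up sample data (the `movies` root tag and the variable names are assumptions) and also passes undecoded bytes to fromstring, which the parser accepts as well.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; the root tag name is an assumption.
sample_doc = '<movies><movie><title>Sample Movie</title><year>1994</year></movie></movies>'

# fromstring returns the root Element directly, not an ElementTree
root_from_str = ET.fromstring(sample_doc)

# Bytes are accepted too, so an explicit decode step is optional
root_from_bytes = ET.fromstring(sample_doc.encode('utf-8'))

# Wrap the root in an ElementTree when tree-level methods such as write() are needed
sample_tree = ET.ElementTree(root_from_str)

print(root_from_str.tag, root_from_bytes.tag)
print(sample_tree.getroot() is root_from_str)
```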
Let’s check the content:
ET.dump(root)
4. Find the first element
find finds the first subelement matching match. match may be a tag name or a path. Returns an element instance or None.
# find first movie
movie = root.find('movie')
# print movie title
title = movie.find('title').text
print(title)
5. Find a set of elements
# find all movies
movies = root.findall('movie')
print('number of movies:', len(movies))
# get third movie title
title = movies[2].find('title')
print(title.text)
# using XPATH - find all movies from 1994
m = root.findall(".//movie[year='1994']")
for i in m:
    ET.dump(i)
# print all titles
for movie in movies:
    print(movie.find('title').text)
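Because find returns None when nothing matches, chaining .text onto a failed lookup raises AttributeError. The sketch below, using made-up sample data, shows a None check, the findtext method with a default value, and the same kind of year predicate used above. The `director` and `rating` tags are hypothetical and deliberately absent from the sample.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; only the movie/title/year tag names mirror the tutorial's file.
sample_root = ET.fromstring(
    '<movies>'
    '<movie><title>Sample Movie A</title><year>1994</year></movie>'
    '<movie><title>Sample Movie B</title><year>2008</year></movie>'
    '<movie><title>Sample Movie C</title><year>1994</year></movie>'
    '</movies>'
)

# find returns None for a missing tag, so check before reading .text
director = sample_root.find('movie/director')
director_name = director.text if director is not None else 'unknown'

# findtext returns the text of the first match, or the default if there is none
first_title = sample_root.findtext('movie/title', default='untitled')
missing_rating = sample_root.findtext('movie/rating', default='n/a')

# [year='1994'] keeps only movies whose <year> child has exactly that text
titles_1994 = [m.findtext('title') for m in sample_root.findall("./movie[year='1994']")]

print(director_name, first_title, missing_rating, titles_1994)
```

The explicit `is not None` comparison is preferable to a plain truth test, since an element's truth value is not a reliable signal that the lookup succeeded.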
6. Iterate
# Iterate through all elements, print tag and value (text)
for el in root.iter():
    print(el.tag, ':', el.text)
7. Create new element
new_movie = ET.Element('movie')
Insert a subelement
new_movie_year = ET.SubElement(new_movie, 'year')
Insert another subelement in a different way: create a new element (title) and then append it to the parent (movie).
new_movie_title = ET.Element('title')
new_movie.append(new_movie_title)
Set the values (text) for the created elements
new_movie_title.text = 'The Greatest New Movie'
new_movie_year.text = '2020'
And append the new movie to the root element
root.append(new_movie)
Let’s check the complete tree. Our new movie should appear at the end.
ET.dump(root)
8. Save to file
write writes the element tree to a file as XML.
Its first argument, file, is a file name or a file object opened for writing. The default output encoding is US-ASCII.
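To make the encoding remark concrete, the sketch below serializes a tiny made-up tree containing a non-ASCII character, once with the default encoding and once with UTF-8. It writes to in-memory buffers instead of files so it leaves nothing on disk; the element text and variable names are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up element whose text contains a non-ASCII character (e with acute accent)
sample_root = ET.Element('movie')
ET.SubElement(sample_root, 'title').text = 'Am\u00e9lie'
sample_tree = ET.ElementTree(sample_root)

# Default encoding (US-ASCII): non-ASCII characters are written as character references
ascii_buf = io.BytesIO()
sample_tree.write(ascii_buf)

# UTF-8 output keeps the character itself; xml_declaration=True adds the <?xml ...?> header
utf8_buf = io.BytesIO()
sample_tree.write(utf8_buf, encoding='utf-8', xml_declaration=True)

print(ascii_buf.getvalue())
print(utf8_buf.getvalue())
```

Both outputs parse back to the same text; the difference is only in how the bytes on disk represent it.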
tree.write('top11movies.xml', encoding='utf-8')
I hope this article is useful to you.
You can also check the Google Colab and the GitHub Gist.
Images:
XML logo: ™/®The World Wide Web Consortium (W3C), Public domain, via Wikimedia Commons
Python logo: www.python.org, GPL, via Wikimedia Commons
