QuickStart to ElementTree: Manipulating XML in Python
Learn by example: a quick step-by-step tutorial on the main ElementTree features. Read, find elements, create, iterate and save to XML.
The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data in Python.
https://docs.python.org/3/library/xml.etree.elementtree.html
1. Import module
import xml.etree.ElementTree as ET
2. Read XML from file
# Download an XML file
import urllib.request
import urllib.error

URL_1 = 'https://gist.githubusercontent.com/pjbelo/c4ddfad14234d9d6b7d746ff17df12ed/raw/6f454a31073767e61cab17b497e1d56704819e27/top10movies.xml'
filename = 'top10movies.xml'

try:
    with urllib.request.urlopen(URL_1) as f:
        content = f.read()
except urllib.error.URLError as e:
    print(e.reason)
else:
    # Save the file only if the download succeeded
    with open(filename, 'wb') as out:
        out.write(content)

# Read and parse an XML file
tree = ET.parse(filename)
root = tree.getroot()
Let’s see what’s inside our file.
dump writes an element tree or element structure to sys.stdout. This function should be used for debugging only.
ET.dump(root)
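If you want to try the parsing step without a network connection, here is a minimal sketch that writes a small stand-in document to a temporary file and parses it. The sample content (two movie elements with title and year children) and the names SAMPLE_XML, sample_path, sample_tree and sample_root are assumptions made for illustration; the real top10movies.xml may contain more fields.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample data, modeled on the elements used in this tutorial.
SAMPLE_XML = """<movies>
  <movie><title>Movie A</title><year>1994</year></movie>
  <movie><title>Movie B</title><year>2008</year></movie>
</movies>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample_movies.xml')
    with open(sample_path, 'w', encoding='utf-8') as f:
        f.write(SAMPLE_XML)

    sample_tree = ET.parse(sample_path)  # parse returns an ElementTree
    sample_root = sample_tree.getroot()  # getroot returns the top-level Element

print(sample_root.tag)   # tag of the root element
print(len(sample_root))  # number of direct children
```

The parsed tree lives in memory, so it stays usable after the temporary file is removed.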
3. Read XML from URL and string
There is no function or method that reads directly from a URL, so we use the Python standard library (urllib) to fetch the file from the URL and decode the content into a string.
# Read from URL (URL to string) and decode to string
try:
    with urllib.request.urlopen(URL_1) as f:
        doc = f.read().decode('utf-8')
except urllib.error.URLError as e:
    print(e.reason)
Now we read the string and parse it using fromstring.
# Read from string
root = ET.fromstring(doc)
# ElementTree wrapper class. This class represents an entire element hierarchy,
# and adds some extra support for serialization to and from standard XML.
tree = ET.ElementTree(root)
Let’s check the content:
ET.dump(root)
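As a side note, fromstring also accepts bytes, and parse accepts a file object as well as a file name, so the decode step is not the only route. The sketch below illustrates both with an in-memory buffer standing in for the downloaded data; raw_bytes and its content are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for the bytes returned by f.read().
raw_bytes = b"<movies><movie><title>Movie A</title></movie></movies>"

# fromstring accepts bytes as well as str and returns the root Element.
root_from_bytes = ET.fromstring(raw_bytes)

# parse accepts a file name or a file object and returns an ElementTree.
tree_from_file_obj = ET.parse(io.BytesIO(raw_bytes))

print(type(root_from_bytes).__name__)
print(type(tree_from_file_obj).__name__)
```

Note the difference in return types: fromstring gives you an Element, while parse gives you an ElementTree, which is why the tutorial wraps the root in ET.ElementTree before saving.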
4. Find the first element
find finds the first subelement matching match. match may be a tag name or a path. Returns an element instance or None.
# find first movie
movie = root.find('movie')
# print movie title
title = movie.find('title').text
print(title)
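Because find returns None when nothing matches, chaining .text onto the result raises an AttributeError for a missing element. Here is a small sketch of two ways to guard against that, using a made-up document in which the second movie has no title. The sample data and variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second movie has no <title>.
sample_root = ET.fromstring(
    "<movies>"
    "<movie><title>Movie A</title></movie>"
    "<movie><year>2008</year></movie>"
    "</movies>"
)

second_movie = sample_root.findall('movie')[1]

# Compare with None explicitly instead of relying on truthiness.
title_el = second_movie.find('title')
if title_el is None:
    print('no title element')

# findtext returns the text of the first match, or the default if none.
title_text = second_movie.findtext('title', default='(untitled)')
print(title_text)

# A path works as well as a plain tag name.
first_title = sample_root.find('movie/title').text
print(first_title)
```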
5. Find a set of elements
# find all movies
movies = root.findall('movie')
print('number of movies:', len(movies))

# get third movie title
title = movies[2].find('title')
print(title.text)

# using XPath - find all movies from 1994
m = root.findall(".//movie[year='1994']")
for i in m:
    ET.dump(i)

# print all titles
for movie in movies:
    print(movie.find('title').text)
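ElementTree supports only a limited subset of XPath, but a few more predicates than the one above are available. The following sketch tries child-text, attribute and position predicates on a made-up document; the id attribute and the sample data are assumptions and do not come from the tutorial's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the id attribute exists only for this illustration.
sample_root = ET.fromstring(
    "<movies>"
    "<movie id='1'><title>Movie A</title><year>1994</year></movie>"
    "<movie id='2'><title>Movie B</title><year>2008</year></movie>"
    "<movie><title>Movie C</title><year>1994</year></movie>"
    "</movies>"
)

# Child element with a given text value.
from_1994 = [m.findtext('title') for m in sample_root.findall(".//movie[year='1994']")]

# Elements that have a given attribute, or a given attribute value.
with_id = [m.findtext('title') for m in sample_root.findall("movie[@id]")]
id_2 = sample_root.find("movie[@id='2']").findtext('title')

# Position predicates: positions start at 1, and last() selects the final one.
first = sample_root.find("movie[1]").findtext('title')
last = sample_root.find("movie[last()]").findtext('title')

print(from_1994, with_id, id_2, first, last)
```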
6. Iterate
# Iterate through all elements, print tag and value (text)
for el in root.iter():
    print(el.tag, ':', el.text)
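iter also takes an optional tag name, which restricts the iteration to elements with that tag. And since container elements usually hold only whitespace (or nothing) as text, you may want to skip them when printing values. A minimal sketch on made-up sample data:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, modeled on the elements used in this tutorial.
sample_root = ET.fromstring(
    "<movies>\n"
    "  <movie><title>Movie A</title><year>1994</year></movie>\n"
    "  <movie><title>Movie B</title><year>2008</year></movie>\n"
    "</movies>"
)

# Only visit elements with the given tag, at any depth.
titles = [el.text for el in sample_root.iter('title')]
print(titles)

# Visit everything, but keep only elements that have no children.
leaf_values = [
    (el.tag, el.text)
    for el in sample_root.iter()
    if len(el) == 0
]
print(leaf_values)
```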
7. Create new element
new_movie = ET.Element('movie')
Insert a subelement
new_movie_year = ET.SubElement(new_movie, 'year')
Insert another subelement in a different way: create a new element (title) and then append it to the parent (movie).
new_movie_title = ET.Element('title')
new_movie.append(new_movie_title)
Set the values (text) for the created elements
new_movie_title.text = 'The Greatest New Movie'
new_movie_year.text = '2020'
And append the new movie to the root element
root.append(new_movie)
Let’s check the complete tree. Our new movie should appear at the end.
ET.dump(root)
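If you need to add several movies, the steps above can be wrapped in a small function. make_movie is a hypothetical helper written for this example, not part of ElementTree, and it assumes a movie only needs a title and a year.

```python
import xml.etree.ElementTree as ET

def make_movie(title, year):
    """Hypothetical helper: build a <movie> element with title and year children."""
    movie = ET.Element('movie')
    ET.SubElement(movie, 'title').text = title
    ET.SubElement(movie, 'year').text = str(year)  # element text must be a string
    return movie

sample_root = ET.Element('movies')
sample_root.append(make_movie('The Greatest New Movie', 2020))

# tostring with encoding='unicode' returns a str instead of bytes.
xml_text = ET.tostring(sample_root, encoding='unicode')
print(xml_text)
```

Unlike dump, tostring returns the serialized XML, so the result can be logged, compared or tested.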
8. Save to file
Write the element tree to a file, as XML.
The file argument is a file name or a file object opened for writing. The default output encoding is US-ASCII, so we pass encoding='utf-8' explicitly.
tree.write('top11movies.xml', encoding='utf-8')
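To check that a saved file reads back correctly, you can write it and parse it again. The sketch below does this in a temporary directory with a made-up tree containing a non-ASCII title, and also passes xml_declaration=True so that the output starts with an XML declaration. The sample data and file name are assumptions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical tree with a non-ASCII character in the title.
sample_root = ET.Element('movies')
sample_movie = ET.SubElement(sample_root, 'movie')
ET.SubElement(sample_movie, 'title').text = 'Amélie'
sample_tree = ET.ElementTree(sample_root)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'sample_out.xml')

    # xml_declaration=True asks for an XML declaration at the top of the file.
    sample_tree.write(out_path, encoding='utf-8', xml_declaration=True)

    with open(out_path, 'rb') as f:
        first_line = f.readline()

    # Parse the file again to confirm the content survived the round trip.
    reloaded_title = ET.parse(out_path).getroot().findtext('movie/title')

print(first_line)
print(reloaded_title)
```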
I hope this article is useful to you.
You can also check the Google Colab and the GitHub Gist.