QuickStart to ElementTree: Manipulating XML in Python

 



xml and python logos

Learn by example: a quick step-by-step tutorial on the main ElementTree features. Read XML, find elements, iterate, create new elements and save to XML.


The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data in Python.

https://docs.python.org/3/library/xml.etree.elementtree.html

1. Import module

import xml.etree.ElementTree as ET

2. Read XML from file

# Download an XML file
import urllib.request
import urllib.error

URL_1 = 'https://gist.githubusercontent.com/pjbelo/c4ddfad14234d9d6b7d746ff17df12ed/raw/6f454a31073767e61cab17b497e1d56704819e27/top10movies.xml'
filename = 'top10movies.xml'
try:
    with urllib.request.urlopen(URL_1) as f:
        content = f.read()
except urllib.error.URLError as e:
    print(e.reason)
else:
    # Save the downloaded bytes only if the request succeeded
    with open(filename, 'wb') as out:
        out.write(content)

# Read and parse the XML file
tree = ET.parse(filename)
root = tree.getroot()
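If you want to try ET.parse without a network connection, here is a minimal sketch that writes a tiny XML file to a temporary directory and parses it. The sample data and the names SAMPLE_XML, sample_path, sample_tree and sample_root are made up for illustration; the structure (movie elements with title and year children) is only assumed from how the file is used in this article, and the real top10movies.xml may differ.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample data, NOT the real content of top10movies.xml
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie><title>Movie A</title><year>1994</year></movie>
  <movie><title>Movie B</title><year>1972</year></movie>
</movies>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample_movies.xml')
    with open(sample_path, 'w', encoding='utf-8') as out:
        out.write(SAMPLE_XML)

    sample_tree = ET.parse(sample_path)   # returns an ElementTree
    sample_root = sample_tree.getroot()   # returns the root Element

print(sample_root.tag)    # movies
print(len(sample_root))   # 2 (number of direct children)
```

The parsed tree lives in memory, so it stays usable after the temporary file is gone.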

Let’s see what’s inside our file.

dump writes an element tree or element structure to sys.stdout. This function should be used for debugging only.

ET.dump(root)
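Since dump is meant for debugging only, you may prefer ET.tostring when you need the XML as a value you can store or process. A small sketch, using a made-up one-movie element (elem and text are illustrative names):

```python
import xml.etree.ElementTree as ET

# Hypothetical element, just for illustration
elem = ET.fromstring('<movie><title>Movie A</title></movie>')

# encoding='unicode' returns a str; without it, tostring returns bytes
text = ET.tostring(elem, encoding='unicode')
print(text)   # <movie><title>Movie A</title></movie>
```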

3. Read XML from URL and string

ElementTree has no function or method for reading directly from a URL, so we use Python's standard library (urllib) to fetch the content and decode it into a string.

# Read from URL and decode the bytes to a string
try:
    with urllib.request.urlopen(URL_1) as f:
        doc = f.read().decode('utf-8')
except urllib.error.URLError as e:
    print(e.reason)

Now we parse the string using fromstring.

# Read from string
root = ET.fromstring(doc)
# Wrap the root in an ElementTree. This class represents an entire element
# hierarchy and adds some extra support for serialization to and from standard XML.
tree = ET.ElementTree(root)

Let’s check the content:

ET.dump(root)
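One detail worth keeping in mind: fromstring returns the root Element, not an ElementTree, which is why the wrapper is needed before calling tree-level methods such as write. A quick sketch with a made-up document (doc_sample, root_sample and tree_sample are illustrative names):

```python
import xml.etree.ElementTree as ET

# Hypothetical document, just for illustration
doc_sample = '<movies><movie><title>Movie A</title></movie></movies>'

root_sample = ET.fromstring(doc_sample)      # an Element (the root)
tree_sample = ET.ElementTree(root_sample)    # wrap it to get an ElementTree

print(isinstance(root_sample, ET.Element))   # True
print(tree_sample.getroot() is root_sample)  # True

# fromstring also accepts bytes, so decoding first is optional
root_from_bytes = ET.fromstring(doc_sample.encode('utf-8'))
print(root_from_bytes.tag)                   # movies
```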

4. Find the first element

find finds the first subelement matching match. match may be a tag name or a path. Returns an element instance or None.

# find first movie
movie = root.find('movie')
# print movie title
title = movie.find('title').text
print(title)
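Because find returns None when nothing matches, chaining .text on the result can raise an AttributeError. Here is a small sketch of two ways to handle that, using made-up sample data (the tag director and the names sample, missing and second are only for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second movie has no year element
sample = ET.fromstring(
    '<movies>'
    '<movie><title>Movie A</title><year>1994</year></movie>'
    '<movie><title>Movie B</title></movie>'
    '</movies>'
)

# A path also works as the match argument
print(sample.find('movie/title').text)   # Movie A

# No match: find returns None, so check before using the result
missing = sample.find('director')
print(missing is None)                   # True

# findtext returns the text directly, or a default if nothing matches
second = sample.findall('movie')[1]
print(second.findtext('year', default='unknown'))   # unknown
```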

5. Find a set of elements

# find all movies
movies = root.findall('movie')
print('number of movies:', len(movies))
# get third movie title
title = movies[2].find('title')
print(title.text)
# using XPath - find all movies from 1994
m = root.findall(".//movie[year='1994']")
for i in m:
    ET.dump(i)
# print all titles
for movie in movies:
    print(movie.find('title').text)
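ElementTree supports only a limited subset of XPath, so it helps to see a few of the supported forms side by side. This is a sketch on made-up data (three hypothetical movies), not on the real file:

```python
import xml.etree.ElementTree as ET

# Hypothetical data, just for illustration
sample = ET.fromstring(
    '<movies>'
    '<movie><title>Movie A</title><year>1994</year></movie>'
    '<movie><title>Movie B</title><year>1972</year></movie>'
    '<movie><title>Movie C</title><year>1994</year></movie>'
    '</movies>'
)

# Predicate on a child's text: movies whose year is 1994
titles_1994 = [m.findtext('title') for m in sample.findall("./movie[year='1994']")]
print(titles_1994)    # ['Movie A', 'Movie C']

# Path through several levels: every title under every movie
all_titles = [t.text for t in sample.findall('./movie/title')]
print(all_titles)     # ['Movie A', 'Movie B', 'Movie C']

# Position predicate (1-based): the title of the second movie
second_title = sample.find('./movie[2]/title').text
print(second_title)   # Movie B
```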

6. Iterate

# Iterate through all elements, print tag and value (text)
for el in root.iter():
    print(el.tag, ':', el.text)
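iter walks the tree in document order, starting with the element it is called on, and it can take a tag name to visit only matching elements. A short sketch on made-up data:

```python
import xml.etree.ElementTree as ET

# Hypothetical data, just for illustration
sample = ET.fromstring(
    '<movies>'
    '<movie><title>Movie A</title><year>1994</year></movie>'
    '<movie><title>Movie B</title><year>1972</year></movie>'
    '</movies>'
)

# Without arguments: the root itself first, then all descendants
tags = [el.tag for el in sample.iter()]
print(tags)
# ['movies', 'movie', 'title', 'year', 'movie', 'title', 'year']

# With a tag name: only elements with that tag, at any depth
years = [el.text for el in sample.iter('year')]
print(years)   # ['1994', '1972']
```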

7. Create new element

new_movie = ET.Element('movie')

Insert a subelement

new_movie_year = ET.SubElement(new_movie, 'year')

Insert another subelement in a different way: create a new element (title) and then append it to the parent (movie).

new_movie_title = ET.Element('title')
new_movie.append(new_movie_title)

Set the values (text) for the created elements

new_movie_title.text = 'The Greatest New Movie'
new_movie_year.text = '2020'

And append the new movie to the root element

root.append(new_movie)

Let’s check the complete tree. Our new movie should appear at the end.

ET.dump(root)
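The same steps can be written more compactly, since SubElement returns the new element and its text can be set right away. The sketch below builds a small tree from scratch (the root tag movies is an assumption for illustration) and, on Python 3.9 or later, uses ET.indent to make the output easier to read:

```python
import xml.etree.ElementTree as ET

sample_root = ET.Element('movies')   # hypothetical root tag
new_movie = ET.SubElement(sample_root, 'movie')
ET.SubElement(new_movie, 'title').text = 'The Greatest New Movie'
ET.SubElement(new_movie, 'year').text = '2020'

# ET.indent was added in Python 3.9; skip it on older versions
if hasattr(ET, 'indent'):
    ET.indent(sample_root)

pretty = ET.tostring(sample_root, encoding='unicode')
print(pretty)
```

Note that indent modifies the tree in place by adding whitespace to the text and tail of the elements.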

8. Save to file

Write the element tree to a file, as XML.

file is a file name or a file object opened for writing. The default output encoding is US-ASCII, so we pass encoding='utf-8' explicitly.

tree.write('top11movies.xml', encoding='utf-8')
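To see what the encoding argument changes, here is a sketch that writes the same tree twice into in-memory buffers instead of files. The title 'Café' is made-up sample data, and xml_declaration=True is an extra option not used above; it asks write to add the XML declaration line.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical element with a non-ASCII character
movie = ET.Element('movie')
ET.SubElement(movie, 'title').text = 'Café'
sample_tree = ET.ElementTree(movie)

# Default encoding (US-ASCII): non-ASCII characters become character references
ascii_buf = io.BytesIO()
sample_tree.write(ascii_buf)
ascii_bytes = ascii_buf.getvalue()
print(ascii_bytes)   # b'<movie><title>Caf&#233;</title></movie>'

# UTF-8 with an explicit XML declaration
utf8_buf = io.BytesIO()
sample_tree.write(utf8_buf, encoding='utf-8', xml_declaration=True)
utf8_bytes = utf8_buf.getvalue()
print(utf8_bytes)

# Both forms parse back to the same text
round_trip = ET.fromstring(utf8_bytes)
print(round_trip.findtext('title'))   # Café
```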

I hope this article can be useful for you.

You can also check the Google Colab and the GitHub Gist.

Images:
XML logo: ™/®The World Wide Web Consortium (W3C), Public domain, via Wikimedia Commons
Python logo: www.python.org, GPL, via Wikimedia Commons

