Reading XML is the cornerstone of handling XML in any application. If your application is unable to read XML, then you won't be able to do much. There are several ways to read XML, and this chapter will give you an insight into what methods are available to you.
The XmlDocument class was the first way to handle reading and writing XML using the .NET Framework with C# and is included in the System.Xml namespace. With XmlDocument you can not only read XML but also manipulate and write XML, which will be covered in later chapters.
To start, we will need an XML document to demonstrate with. The following example is a small database of books and movies in our imaginary library:
To be able to do anything with this XML document, we first need to load the XML into an XmlDocument instance. There are two ways to do this: by file and by string.
Loading XML from a File
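As a minimal sketch of loading from a file, the XmlDocument.Load method can be given a path. The file name, its location in the temp folder, and its contents are assumptions made so that the example runs on its own.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.IO;
using System.Xml;

// Assumption: a hypothetical library.xml, written to the temp folder for this sketch.
string path = Path.Combine(Path.GetTempPath(), "library.xml");
File.WriteAllText(path,
    "<library><books><book><title>To Kill a Mockingbird</title></book></books></library>");

var document = new XmlDocument();
document.Load(path); // Load reads and parses the file at the given path

// DocumentElement is the root element of the loaded document.
string rootName = document.DocumentElement?.Name ?? "";
Console.WriteLine(rootName);
```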
Loading XML from a String
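When the XML is already in memory, XmlDocument.LoadXml parses it from a string. The library content below is a hypothetical stand-in for the sample document; the titles other than To Kill a Mockingbird, the authors, the years, and the yes/no values of the checkedout attribute are assumptions.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Xml;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book checkedout='no'><title>To Kill a Mockingbird</title><author>Harper Lee</author></book>
    <book checkedout='yes'><title>1984</title><author>George Orwell</author></book>
  </books>
  <movies>
    <movie checkedout='no'><title>Jurassic Park</title><year>1993</year></movie>
    <movie checkedout='yes'><title>Casablanca</title><year>1942</year></movie>
  </movies>
</library>";

var document = new XmlDocument();
document.LoadXml(xml); // LoadXml parses markup held in a string

// The root element has two child elements: books and movies.
int sections = document.DocumentElement?.ChildNodes.Count ?? 0;
Console.WriteLine(sections);
```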
Once we have a file loaded, we can begin reading from the contents, which can be done in multiple ways.
Searching with XPath
Think of using XPath as having random read access to the XML document. It can be used to retrieve a single node or a collection of nodes. To select multiple nodes, use the SelectNodes method. Going back to the library.xml example, we can use SelectNodes to return all the books. To get at the books, however, the proper XPath query is needed.
Starting at the top level, there is the library node, so the XPath must start with library. From there, the books child node contains all of the books that are in the library, so we append books to the XPath to give us library/books. That XPath alone returns the books node, including all of its children, which is one level above what we want, so we append book to finally give us the query library/books/book.
You’ll notice that the output above returns the first book that is under the books element. SelectSingleNode returns the first element, in document order, that matches the given XPath expression. There are ways to get a specific node, though, by using a query such as the following:
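The query built up above can be tried with a short sketch. SelectNodes returns every match, while SelectSingleNode returns only the first one. The sample data and variable names are assumptions.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Xml;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book><title>To Kill a Mockingbird</title><author>Harper Lee</author></book>
    <book><title>1984</title><author>George Orwell</author></book>
  </books>
</library>";

var document = new XmlDocument();
document.LoadXml(xml);

// SelectNodes returns all nodes that match the XPath query.
var titles = new List<string>();
foreach (XmlNode book in document.SelectNodes("library/books/book")!)
{
    titles.Add(book["title"]?.InnerText ?? "");
}
Console.WriteLine(string.Join(", ", titles));

// SelectSingleNode returns only the first match for the same query.
var firstBook = document.SelectSingleNode("library/books/book");
string firstTitle = firstBook?["title"]?.InnerText ?? "";
Console.WriteLine(firstTitle);
```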
The query will return the same book element that we saw above, but in this example it is using a filter to find the book that has the title To Kill a Mockingbird.
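A filter of this kind might look like the following sketch, where a predicate in square brackets compares the title child element with a literal. The exact query text and the sample data are assumptions.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Xml;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book><title>1984</title><author>George Orwell</author></book>
    <book><title>To Kill a Mockingbird</title><author>Harper Lee</author></book>
  </books>
</library>";

var document = new XmlDocument();
document.LoadXml(xml);

// The predicate [title='...'] keeps only the book whose title element matches,
// regardless of where that book sits among its siblings.
var book = document.SelectSingleNode("library/books/book[title='To Kill a Mockingbird']");
string author = book?["author"]?.InnerText ?? "";
Console.WriteLine(author);
```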
Search Using Attributes
Up until now, there has been a focus solely on searching based on elements. There is more that can be searched on than just elements. For instance, let’s say somebody asks us to find all books that are checked out. It is an easy feat if we just use SelectNodes with XPath.
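A minimal sketch of that search uses the @ notation to test an attribute inside a predicate. It assumes the checkedout attribute holds the values yes and no, which the sample data below makes up for illustration.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Xml;

// Assumed sample data; the yes/no attribute values are hypothetical.
string xml = @"<library>
  <books>
    <book checkedout='no'><title>To Kill a Mockingbird</title></book>
    <book checkedout='yes'><title>1984</title></book>
  </books>
</library>";

var document = new XmlDocument();
document.LoadXml(xml);

// @checkedout refers to the attribute of that name on each book element.
var checkedOut = new List<string>();
foreach (XmlNode book in document.SelectNodes("library/books/book[@checkedout='yes']")!)
{
    checkedOut.Add(book["title"]?.InnerText ?? "");
}
Console.WriteLine(string.Join(", ", checkedOut));
```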
Explicitly Finding Movies and Books That Are Not Checked Out
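One way such a query might be written is sketched here, with the two paths joined by a pipe. The sample data and the no value of the checkedout attribute are assumptions.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Xml;

// Assumed sample data; the yes/no attribute values are hypothetical.
string xml = @"<library>
  <books>
    <book checkedout='no'><title>To Kill a Mockingbird</title></book>
    <book checkedout='yes'><title>1984</title></book>
  </books>
  <movies>
    <movie checkedout='no'><title>Jurassic Park</title></movie>
    <movie checkedout='yes'><title>Casablanca</title></movie>
  </movies>
</library>";

var document = new XmlDocument();
document.LoadXml(xml);

// The pipe combines the results of both queries into one node list.
string query = "library/books/book[@checkedout='no'] | library/movies/movie[@checkedout='no']";
var available = new List<string>();
foreach (XmlNode item in document.SelectNodes(query)!)
{
    available.Add(item.Name + ": " + (item["title"]?.InnerText ?? ""));
}
foreach (string line in available) Console.WriteLine(line);
```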
The above code uses the attribute notation to find both movies and books that are not checked out. In order to search for both, it is necessary to add a pipe, which is the XPath union operator, between the two XPath queries. You can think of it like the double-pipe OR operator in C#. This notation is useful when there are many different children of the root element and you want to filter it down. If we had CDs in this library, we would be able to use the above code to only find books and movies and ignore the CDs.
The problem with the above code is that it is very verbose, especially if there are many children of the node that you are searching under. There is an easier way to find this information without having to type out every single possibility and that is to use the star operator and recursive search in XPath.
Using the Recursive Search and Star
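A shorter query along these lines combines the double slash with the star. This is a sketch under the same assumptions as before: hypothetical sample data and a checkedout attribute that holds yes or no.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Xml;

// Assumed sample data; the yes/no attribute values are hypothetical.
string xml = @"<library>
  <books>
    <book checkedout='no'><title>To Kill a Mockingbird</title></book>
    <book checkedout='yes'><title>1984</title></book>
  </books>
  <movies>
    <movie checkedout='no'><title>Jurassic Park</title></movie>
    <movie checkedout='yes'><title>Casablanca</title></movie>
  </movies>
</library>";

var document = new XmlDocument();
document.LoadXml(xml);

// "//" searches every level below library; "*" matches an element of any name.
var available = new List<string>();
foreach (XmlNode item in document.SelectNodes("library//*[@checkedout='no']")!)
{
    available.Add(item.Name);
}
Console.WriteLine(string.Join(", ", available));
```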
The above code contains two different shortcuts. First, there is the double slash. This is a way to get all children recursively. For this library XML file, this means it would look not only at the books and movies elements but also the children of those nodes. This gives us access to all elements under the library node. Be careful when using the double slash in your XPath queries as it will select all elements regardless of the type of element. If any other elements were added to the library element, those elements would be included in the results as well.
The second shortcut that is in the code above is the star. The star is a shortcut that ignores the element type. What that means is that it treats an element of book the same as the movie element. It is extremely useful when you have several different children or grandchildren under a single element and want to search all of them. We don’t have to use the | operator to combine multiple queries, which drastically cuts down on the amount of code that is needed.
The previous example used attributes to search, but it is also possible to inspect what attributes are on an element by using the Attributes property. Attributes will return an XmlAttributeCollection, which can be iterated on.
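Iterating over the collection could look like the sketch below. The shelf attribute and its value are made up for illustration, as is the rest of the sample data.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Xml;

// Assumed sample data; the shelf attribute is hypothetical.
string xml = @"<library>
  <books>
    <book checkedout='no' shelf='A3'><title>To Kill a Mockingbird</title></book>
  </books>
</library>";

var document = new XmlDocument();
document.LoadXml(xml);

var pairs = new List<string>();
if (document.SelectSingleNode("library/books/book") is XmlElement book)
{
    // Attributes is an XmlAttributeCollection; each item exposes Name and Value.
    foreach (XmlAttribute attribute in book.Attributes)
    {
        pairs.Add(attribute.Name + "=" + attribute.Value);
    }
}
foreach (string pair in pairs) Console.WriteLine(pair);
```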
Up until now, there has only been plain XML with no namespaces required. While this may happen when you have full control of the XML, chances are that you will encounter namespaces and will need to know how to handle them when using the XmlDocument class. Namespaces are useful for preventing name collisions, so XmlDocument must take them into consideration.
As we have added a new namespace to our library.xml example, we need to load the namespace into our XmlDocument:
Adding a Namespace to the XmlDocument before Loading XML
In order to add a namespace to an XmlDocument, it is necessary to use the name table, which is of type XmlNameTable, from the XmlDocument. The name table stores the element, attribute, and namespace names that the document uses. Once we have that, an XmlNamespaceManager needs to be created from it, as that is what will allow us to add or remove namespaces. To remove a namespace, call the RemoveNamespace method of XmlNamespaceManager, which takes the same arguments as AddNamespace.
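The steps above can be sketched as follows. The namespace URI, the lib prefix, and the use of a default namespace on the library element are all assumptions, since the actual namespaced document may be declared differently.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Xml;

// Assumed sample data with a hypothetical default namespace.
string xml = @"<library xmlns='http://example.com/library'>
  <books>
    <book><title>To Kill a Mockingbird</title></book>
    <book><title>1984</title></book>
  </books>
</library>";

var document = new XmlDocument();

// The manager is built on the document's name table and maps prefixes to URIs.
var namespaces = new XmlNamespaceManager(document.NameTable);
namespaces.AddNamespace("lib", "http://example.com/library");

document.LoadXml(xml);

// Every step of the query needs the prefix, and the manager is passed alongside it.
var titles = document.SelectNodes("/lib:library/lib:books/lib:book/lib:title", namespaces);
int titleCount = titles?.Count ?? 0;
Console.WriteLine(titleCount);

// RemoveNamespace takes the same prefix and URI that were given to AddNamespace.
namespaces.RemoveNamespace("lib", "http://example.com/library");
bool stillRegistered = namespaces.HasNamespace("lib");
Console.WriteLine(stillRegistered);
```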
The XPathDocument class is similar to the XmlDocument class, but the difference is that, unlike XmlDocument, XPathDocument is read only. It is excellent for reading XML when you have no intention of modifying the data. The XPathDocument class relies on two separate classes to do the actual querying: XPathNavigator and XPathNodeIterator.
The XPathNavigator class is what is used to actually query the XML. It allows the use of XPath queries or generic methods that allow you to get at elements and attributes without having to know any XPath.
In order to start using the XPathNavigator, there are two steps involved. First, an XPathDocument needs to be instantiated; then that instance is used to create the XPathNavigator. Once the XPathNavigator has been created, we can query the XML.
Create XPathDocument and XPathNavigator Instances
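A minimal sketch of the two steps, assuming the XML is held in a string and read through a StringReader (a file path could be passed to the XPathDocument constructor instead). The sample data is hypothetical.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.IO;
using System.Xml.XPath;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book><title>To Kill a Mockingbird</title></book>
    <book><title>1984</title></book>
  </books>
</library>";

// Step 1: create the read-only document.
var document = new XPathDocument(new StringReader(xml));

// Step 2: create a navigator from the document.
XPathNavigator navigator = document.CreateNavigator();

// The navigator can now run XPath queries; this one returns the first matching node.
var title = navigator.SelectSingleNode("library/books/book/title");
string firstTitle = title?.Value ?? "";
Console.WriteLine(firstTitle);
```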
The XPathNavigator is only the first step toward being able to read and query XML. To walk over the results of a query, we need an XPathNodeIterator, which the navigator’s selection methods return. The XPathNodeIterator will provide access to all the elements under the root element.
Iterating on the Children of the Root Element
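One possible way to do this is sketched below: the navigator is first moved onto the root element, and SelectChildren then returns an XPathNodeIterator over its child elements. The sample data is an assumption, and other approaches, such as calling Select with an XPath query, would work as well.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books><book><title>To Kill a Mockingbird</title></book></books>
  <movies><movie><title>Jurassic Park</title></movie></movies>
</library>";

var document = new XPathDocument(new StringReader(xml));
XPathNavigator navigator = document.CreateNavigator();

// A new navigator starts on the document node, so move down to the library element.
navigator.MoveToChild(XPathNodeType.Element);

// The iterator walks the element children of the node the navigator is on.
XPathNodeIterator children = navigator.SelectChildren(XPathNodeType.Element);
var names = new List<string>();
while (children.MoveNext())
{
    names.Add(children.Current?.Name ?? "");
}
Console.WriteLine(string.Join(", ", names));
```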
Reading with XmlReader
XmlReader is different from the other XML handling classes that we have used, as it is stream-based. What that means for us is that it only moves forward through the document and does not support querying. XmlReader is a good option if you are handling large XML files and don’t care about random access to the elements in the XML document. Because XmlReader streams the XML rather than loading it all into memory, you can read files that are too large for XmlDocument. An XmlReader requires more setup than the other classes we’ve looked at so far, but its ability to handle large data makes it more than worth it. We can start with a basic example:
Creating an XmlReader and Reading the Library File
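A basic reader might look like the following sketch. It reads from a string through a StringReader so that it runs on its own; XmlReader.Create also accepts a file path. The sample data is hypothetical.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book><title>To Kill a Mockingbird</title></book>
  </books>
</library>";

var values = new List<string>();
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    // Read advances to the next node and returns false at the end of the stream.
    while (reader.Read())
    {
        values.Add(reader.Value); // Value is empty for nodes such as element tags
        Console.WriteLine(reader.Value);
    }
}
```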
Notice that there is a lot of blank space as well as only the values. The reason for that is the way that the XmlReader handles the underlying XML stream. Remember it is a forward-only stream. Since we used the Value property, we only see the text of nodes that have a value. But why the blank space? Each blank line represents a node that has no value of its own, such as an element’s start or end tag, and the whitespace between elements is reported as nodes too. This is where XmlReader becomes more complicated than other methods of reading; it hands us every node in the stream without filtering by type. The reader does store the type information, but we must check it manually through the NodeType property, which returns a value of the XmlNodeType enumeration. We have only been focusing on elements and attributes, so let’s create a reader that will handle both.
Handling Both Elements and Values
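A reader that checks the node type could be sketched like this. It reports element names, their attributes, and text values, and skips everything else. The output labels, the sample data, and the checkedout value are assumptions.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book checkedout='no'><title>To Kill a Mockingbird</title></book>
  </books>
</library>";

var lines = new List<string>();
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                lines.Add("Element: " + reader.Name);
                // Attributes belong to the current element and are visited one by one.
                while (reader.MoveToNextAttribute())
                {
                    lines.Add("Attribute: " + reader.Name + "=" + reader.Value);
                }
                reader.MoveToElement(); // step back from the attributes to the element
                break;
            case XmlNodeType.Text:
                lines.Add("Text: " + reader.Value);
                break;
            // Whitespace, end tags, and other node types are ignored here.
        }
    }
}
foreach (string line in lines) Console.WriteLine(line);
```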
Remember when I said there was a reason you would want both? The reason is that you need both when you want to write out the XML that you are reading from the XmlReader stream.
Using LINQ to XML
XmlDocument has been part of the .NET Framework since version 1.0, and the System.Xml classes remained the standard way to handle XML until .NET 3.5 was released. In the .NET 3.5 release, we saw the introduction of LINQ (Language Integrated Query) and the advent of LINQ to XML. LINQ to XML is now the preferred method of handling XML, so let’s dive in using the library XML example from the last section.
XDocument is LINQ to XML’s equivalent of the XmlDocument. The nature of LINQ gives XDocument a whole different feel, but don’t worry because you can still fall back on XPath. For instance, the following code will instantiate an XDocument as well as get the values of every book.
Document vs. Document.Root: Getting to the Root Document
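A minimal sketch of that idea parses the XML from a string with XDocument.Parse (XDocument.Load would be used for a file) and reads the title of each book. The sample data and variable names are assumptions.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book><title>To Kill a Mockingbird</title><author>Harper Lee</author></book>
    <book><title>1984</title><author>George Orwell</author></book>
  </books>
</library>";

XDocument document = XDocument.Parse(xml);

// Descendants finds book elements at any depth below the document.
List<string> titles = document.Descendants("book")
    .Select(book => book.Element("title")?.Value ?? "")
    .ToList();

foreach (string title in titles) Console.WriteLine(title);
```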
There are two ways to get at the root element using XDocument: using the instantiated XDocument class directly or using the Root property on the instantiated class. There is a subtle difference and that is the Root property is an XElement instead of an XDocument. Because of that, you are able to use all of the methods that you would normally get with XElement but by using the root element directly.
We used the XDocument in the example above because the XDocument allows access to the descendants of the root element, which is what we needed in order to traverse the XML structure and find the book elements. If we had used the Root property, it would have allowed us to not only get descendants but to add elements as well as get to the attributes of the root element.
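The difference can be seen in a short sketch. Both the document and its Root expose Descendants, while attributes and Add are reached through the root XElement. The name attribute on the library element and the cds element are made up for illustration.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Linq;
using System.Xml.Linq;

// Assumed sample data; the name attribute is hypothetical.
string xml = @"<library name='Imaginary Library'>
  <books>
    <book><title>To Kill a Mockingbird</title></book>
    <book><title>1984</title></book>
  </books>
</library>";

XDocument document = XDocument.Parse(xml);
XElement root = document.Root!; // the library element

// Descendants gives the same books from either starting point.
int booksViaDocument = document.Descendants("book").Count();
int booksViaRoot = root.Descendants("book").Count();

// Attributes of the root element are reached through the XElement.
string libraryName = root.Attribute("name")?.Value ?? "";

// The root element can also take new children.
root.Add(new XElement("cds"));
int sections = root.Elements().Count();

Console.WriteLine($"{booksViaDocument} {booksViaRoot} {libraryName} {sections}");
```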
Searching for Attributes
We saw how to search using attributes when handling XML with XmlDocument and how it required XPath to get at the attributes relatively easily; that same task becomes much easier with LINQ to XML. Let’s say we want to find all the elements that have a checkedout attribute. We can use LINQ expressions to do exactly that.
There is a drawback to using LINQ, which is that it is much more verbose than using straight XPath. With XDocument we could still use XPath to get the same results. Where LINQ shines is when you have more complex queries that may be difficult to read in XPath or require more in-depth knowledge of XPath. For instance, we could look for all movie and book elements, as we did in the previous chapter with XPath, but this time by using LINQ.
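One way to express this is sketched below: Descendants with no argument returns every element, and a Where clause keeps those that carry the attribute. The sample data is hypothetical, and one book is deliberately left without the attribute to show the filter at work.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed sample data; the second book has no checkedout attribute on purpose.
string xml = @"<library>
  <books>
    <book checkedout='no'><title>To Kill a Mockingbird</title></book>
    <book><title>1984</title></book>
  </books>
  <movies>
    <movie checkedout='yes'><title>Casablanca</title></movie>
  </movies>
</library>";

XDocument document = XDocument.Parse(xml);

// Attribute returns null when the element does not have the named attribute.
List<XElement> withAttribute = document.Descendants()
    .Where(element => element.Attribute("checkedout") != null)
    .ToList();

foreach (XElement element in withAttribute)
{
    Console.WriteLine(element.Name + ": " + element.Attribute("checkedout")?.Value);
}
```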
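As a sketch, finding both kinds of element with LINQ comes down to a condition on the element name. The sample data is an assumption.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book><title>To Kill a Mockingbird</title></book>
    <book><title>1984</title></book>
  </books>
  <movies>
    <movie><title>Jurassic Park</title></movie>
  </movies>
</library>";

XDocument document = XDocument.Parse(xml);

// Name is an XName, which can be compared directly with a string.
List<string> titles = document.Descendants()
    .Where(element => element.Name == "book" || element.Name == "movie")
    .Select(element => element.Element("title")?.Value ?? "")
    .ToList();

foreach (string title in titles) Console.WriteLine(title);
```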
Or we could retrieve movies that were released between certain years. For instance, let’s look for movies that were released between 1990 and 2016. That would be an incredibly complex XPath query that would be horrible to maintain in the future. On the other hand, by using LINQ and XDocument it becomes a simple where clause to filter out the unwanted titles. We can do all that in the following code:
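A query of this shape might be sketched as follows. It starts from the year elements, filters on the parsed value, and climbs back to the movie through Parent. The sample movies and years are assumptions, and int.Parse is used only for demonstration.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Linq;
using System.Xml.Linq;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <movies>
    <movie><title>Jurassic Park</title><year>1993</year></movie>
    <movie><title>Casablanca</title><year>1942</year></movie>
  </movies>
</library>";

XDocument document = XDocument.Parse(xml);

var recentMovies =
    (from year in document.Descendants("year")
     let released = int.Parse(year.Value) // throws if the value is not a number
     where released >= 1990 && released <= 2016
     select year.Parent).ToList();

// Parent takes us from the year element back up to its movie element.
string firstTitle = recentMovies[0]?.Element("title")?.Value ?? "";
Console.WriteLine(firstTitle);
```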
Now there are many things in the above code that may need an explanation, either because we have not seen them before or because they may not be intuitive. First off, we are searching for the year element instead of directly for a movie element. This allows us to easily get at the value of that element, which is the year the movie was made, instead of having to filter down even more based on the movie element’s children. The only reason why that method is feasible is because of the Parent property. The Parent property will return the XElement of the parent of the current XElement. In this case, the element movie is the parent of the year element, so we can get back up to the movie element after we are done filtering. We are also parsing the year into an integer; however, this is not recommended in production code as it could throw exceptions. I am doing this here for demonstration purposes.
LINQ to XML allows us to use all of the extension methods that come with LINQ, which gives us access to the Select method. This method allows us to transform our results into a different class or an anonymous type. We have a way to get to the books in our XML library, but we haven’t done anything with them yet. That is about to change. We are going to capture the information about the books and put them in a C# class called Book that is defined below.
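A sketch of such a class and projection follows. The Title and Author property names are assumptions about how the Book class is shaped, and the sample data is hypothetical.

```csharp
// C# top-level statements (C# 9 or later); the Book type follows the statements.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <books>
    <book><title>To Kill a Mockingbird</title><author>Harper Lee</author></book>
    <book><title>1984</title><author>George Orwell</author></book>
  </books>
</library>";

XDocument document = XDocument.Parse(xml);

// Select maps each book element onto a Book instance.
List<Book> books = document.Descendants("book")
    .Select(book => new Book
    {
        Title = book.Element("title")?.Value ?? "",
        Author = book.Element("author")?.Value ?? ""
    })
    .ToList();

foreach (Book book in books) Console.WriteLine(book.Title + " by " + book.Author);

// Hypothetical shape of the Book class.
class Book
{
    public string Title { get; set; } = "";
    public string Author { get; set; } = "";
}
```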
The above code filters the XML document down to just book elements and then transforms the title and author into a new Book class that we had defined. Notice that the Element method is used instead of Descendants because we know that the title and author elements are children of the book element, rather than grandchildren, so we don’t need to go any deeper.
Using XPath with XDocument
As mentioned before, it is possible to use XPath with XDocument, though not recommended. For instance, we could use XPath to get a list of all movies:
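As a sketch, the XPath extension methods for LINQ to XML live in the System.Xml.XPath namespace and become available on XDocument once that namespace is imported. The sample data is an assumption.

```csharp
// C# top-level statements (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings XPathSelectElements and XPathSelectElement into scope

// Assumed sample data standing in for library.xml.
string xml = @"<library>
  <movies>
    <movie><title>Jurassic Park</title><year>1993</year></movie>
    <movie><title>Casablanca</title><year>1942</year></movie>
  </movies>
</library>";

XDocument document = XDocument.Parse(xml);

// XPathSelectElements returns every matching XElement.
List<XElement> movies = document.XPathSelectElements("library/movies/movie").ToList();
foreach (XElement movie in movies) Console.WriteLine(movie.Element("title")?.Value);

// XPathSelectElement returns only the first match.
var firstMovie = document.XPathSelectElement("library/movies/movie");
string firstTitle = firstMovie?.Element("title")?.Value ?? "";
Console.WriteLine(firstTitle);
```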
One thing to note is that this does not return XmlNode like the XmlDocument does, but instead returns XElement. There is also the XPathSelectElement method, which is equivalent to the XmlDocument SelectSingleNode. Just like SelectSingleNode, if you use an XPath query that matches multiple results, only the first element is returned.
In the end, LINQ to XML allows for easy querying of data from an XML document, and in a more consistent format.