Working with XML

Understand the Document Object Model and the DOM tree
Understand that HTML is just a special dialect of XML
Understand the relationship between HTML and XML

In the previous lesson we saw several requirements that call for separating the content proper from structural or layout information. In this lesson, I will show you how to work with the XML file from the previous lesson so that it becomes easy to generate structure and/or layout.

Let us first rephrase the core ideas of XML:

Mark up pieces of content with understandable "parentheses", e.g. <artist>Bobby McFerrin</artist>: "<artist></artist>" is markup, "Bobby McFerrin" is content (problem 2)
Allow new applications, e.g. multimedia display instead of text display, to use new markup (problem 3)
Allow for nesting of markup such that larger pieces of content may be treated in a meaningful way, e.g. "<record><artist>Bobby McFerrin</artist><track>...</track><track>Koblenz</track>...</record>" (problem 2)
Simplify tools by having some simple rules (problem 4):
1. Each opening tag like "<city>" requires a corresponding closing tag "</city>"
2. Markup must be properly nested: "<record><artist>Bobby McFerrin</artist><track>...</track><track>Koblenz</track></record>" is allowed; "<record><artist>Bobby McFerrin</artist><track>...</track><track>Koblenz</record></track>" is disallowed
Each XML document should start with an XML declaration specifying the XML version and the character encoding used, most often "<?xml version="1.0" encoding="UTF-8"?>", which allows for internationalization (problem 5)

An XML document that follows these rules is called "well-formed". You can check the well-formedness of a document with an XML validator, such as http://www.w3schools.com/xml/xml_validator.asp
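
Well-formedness can also be checked programmatically. The following is a minimal sketch, assuming Python and its standard library parser xml.etree.ElementTree (neither is prescribed by this lesson); the helper name is_well_formed is made up for illustration. It simply reports whether the parser accepts the text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, i.e. it is well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for unclosed tags, crossing tags, and similar violations.
        return False


# Properly nested: every opening tag is closed in reverse order.
good = "<record><track>Koblenz</track></record>"
# Improperly nested: the closing tags cross each other.
bad = "<record><track>Koblenz</record></track>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first call reports a well-formed document, the second does not, because "</record>" appears while "<track>" is still open.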

These rules have the following implications:

Properly nested markup gives us a data structure: a DOM tree
The DOM tree can be navigated recursively by different applications and repurposed for different types of devices and device sizes (problem 1)
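
To make the recursive navigation concrete, here is a small sketch, assuming Python and its standard xml.dom.minidom module. The XML fragment and the function name outline are invented for illustration; the function walks the DOM tree depth-first and produces one indented line per element or non-empty text node.

```python
from xml.dom import minidom

# Hypothetical fragment modelled on the record example; not the course file.
XML_TEXT = (
    "<record><artist>Guano Apes</artist>"
    "<track>Lords of the boards</track></record>"
)


def outline(node, depth=0):
    """Recursively collect one indented line per element or text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            # Recurse into the children, one level deeper.
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    return lines


document = minidom.parseString(XML_TEXT)
print("\n".join(outline(document.documentElement)))
```

A different application could run the same traversal but emit speech, Braille, or a compact mobile layout instead of an indented outline.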

Let's look back at our running example. Unfortunately, I have not found a DOM viewer that works without installation and shows the DOM tree correctly. Below is a pretty print of the XML code, which is not a proper DOM tree view. My previous uploads in this place used http://software.hixie.ch/utilities/js/live-dom-viewer/, but it delivers a buggy tree, e.g. track titles are not correctly sorted beneath tracks. Here is the pretty print:

And here is the buggy DOM tree:

[[1]]

Let us assume some simple rules that map our element names as follows:

from	to (long name)	to (short name)
<playlist>	<document>	<html>
</playlist>	</document>	</html>
<playlistTitle>	<heading>	<h1>
</playlistTitle>	</heading>	</h1>
<owner>	<emphasized>	<em>
</owner>	</emphasized>	</em>
<tracks>	<numberedList>	<ol>
</tracks>	</numberedList>	</ol>
<track>	<listItem>	<li>
</track>	</listItem>	</li>
<playDate>	<emphasized>	<em>
</playDate>	</emphasized>	</em>
<trackTitle>	<subheading>	<h3>
</trackTitle>	</subheading>	</h3>
<record>	-suppress-	-suppress-
</record>	-suppress-	-suppress-
<recordTitle>	<emphasized>	<em>
</recordTitle>	</emphasized>	</em>
<artist>	<strong>	<strong>
</artist>	</strong>	</strong>

Then executing a corresponding transformation gives us:

 <document>
	<heading>
		This is <emphasized>Rene Pickhardt</emphasized>'s playlist. 
	</heading>
	<numberedList>
		<listItem>
			<emphasized>2013-11-02 01:54 CET</emphasized>
			<subheading>Lords of the boards</subheading>
			on <emphasized>Proud like a God</emphasized> by <strong>Guano Apes</strong>
		</listItem>
		<listItem>
			<emphasized>2013-11-02 01:57 CET</emphasized>
			<subheading>Toxicity</subheading>
			on <emphasized>Toxicity</emphasized> by <strong>System of A Down</strong>
		</listItem>
		<listItem>
			<emphasized>2013-11-02 02:01 CET</emphasized>
			<subheading>B.Y.O.B</subheading>
			on <emphasized>Mezmerize</emphasized> by <strong>System of A Down</strong>
		</listItem>
	</numberedList>
 </document>

Or using our shorthand description:

 <html>
	<h1>
		This is <em>Rene Pickhardt</em>'s playlist.
	</h1>
	<ol>
		<li>
			<em>2013-11-02 01:54 CET</em>
			<h3>Lords of the boards</h3>
			on <em>Proud like a God</em> by <strong>Guano Apes</strong>
		</li>
		<li>
			<em>2013-11-02 01:57 CET</em>
			<h3>Toxicity</h3>
			on <em>Toxicity</em> by <strong>System of A Down</strong>
		</li>
		<li>
			<em>2013-11-02 02:01 CET</em>
			<h3>B.Y.O.B</h3>
			on <em>Mezmerize</em> by <strong>System of A Down</strong>
		</li>
	</ol>
 </html>

The shorthand notation is actually an example of HTML, the Hypertext Markup Language used on the Web. You may also recognize that this HTML fragment is well-formed XML, just like the description of records that we started with. This holds for HTML written in its XML syntax (XHTML); HTML 4 itself is defined in terms of SGML and tolerates constructs, such as omitted closing tags, that are not well-formed XML. Hence, XML is not one language for structuring data and content; it is a meta-language that allows you to come up with infinitely many different languages for structuring data and content. In fact, you are not limited to Latin characters: element names and content may use other scripts, say Kanji or Arabic.

You may also have observed that the kind of mapping we have shown, which maps elements one by one onto new elements, is awkward because it requires the XML elements in the source file to appear in a strictly fixed order. However, you can program a better transformation tool, use a generic tool like AWK, or use existing tools for manipulating XML, such as XPath, XQuery, and XSL (eXtensible Stylesheet Language), which consists of XSLT (XSL Transformations) and XSL-FO (XSL Formatting Objects). We do not go into the details of all these tools, as this would require a whole course of its own. What matters is that you know such languages and tools exist, so that you can look up their descriptions whenever you face a problem of this kind and build your solution on standardized mechanisms.
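
As an illustration of a hand-programmed transformation tool, the following sketch applies the short-name mapping from the table with Python's standard xml.etree.ElementTree. Several things here are assumptions: the source fragment is a guess at how the playlist file might be laid out (the original file is not reproduced in this script), and the names TAG_MAP and transform are invented for the example. It is not XSLT, only a recursive renaming of elements.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Short-name mapping from the table; None means "suppress the tag, keep content".
TAG_MAP = {
    "playlist": "html",
    "playlistTitle": "h1",
    "owner": "em",
    "tracks": "ol",
    "track": "li",
    "playDate": "em",
    "trackTitle": "h3",
    "record": None,
    "recordTitle": "em",
    "artist": "strong",  # as in the transformed output shown earlier
}

# Hypothetical source layout with a single track (an assumption).
SOURCE = (
    "<playlist>"
    "<playlistTitle>This is <owner>Rene Pickhardt</owner>'s playlist."
    "</playlistTitle>"
    "<tracks><track>"
    "<playDate>2013-11-02 01:54 CET</playDate>"
    "<trackTitle>Lords of the boards</trackTitle>"
    " on <record><recordTitle>Proud like a God</recordTitle>"
    " by <artist>Guano Apes</artist></record>"
    "</track></tracks>"
    "</playlist>"
)


def transform(element):
    """Recursively rename an element and its children according to TAG_MAP."""
    # Unknown tags are kept unchanged.
    new_tag = TAG_MAP.get(element.tag, element.tag)
    parts = [escape(element.text or "")]
    for child in element:
        parts.append(transform(child))
        # Text that follows the child inside the current element.
        parts.append(escape(child.tail or ""))
    inner = "".join(parts)
    if new_tag is None:
        return inner
    return f"<{new_tag}>{inner}</{new_tag}>"


html_text = transform(ET.fromstring(SOURCE))
print(html_text)
```

Because the function works on the tree rather than on the raw text, it does not depend on the order in which the elements appear in the source file.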

The core lesson to be learned about XML here is that XML-based markup gives you a very flexible handle for:

selecting content
repurposing content
reformatting content
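
Selecting content can be sketched with the limited XPath subset that Python's standard xml.etree.ElementTree supports. The playlist below is a hypothetical layout built from the element names in the mapping table, not the original course file, and Python itself is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical playlist layout (an assumption, see the note above).
PLAYLIST = """
<playlist>
  <tracks>
    <track>
      <trackTitle>Lords of the boards</trackTitle>
      <record><recordTitle>Proud like a God</recordTitle><artist>Guano Apes</artist></record>
    </track>
    <track>
      <trackTitle>Toxicity</trackTitle>
      <record><recordTitle>Toxicity</recordTitle><artist>System of A Down</artist></record>
    </track>
    <track>
      <trackTitle>B.Y.O.B</trackTitle>
      <record><recordTitle>Mezmerize</recordTitle><artist>System of A Down</artist></record>
    </track>
  </tracks>
</playlist>
"""

root = ET.fromstring(PLAYLIST)

# Select every track title, wherever it occurs in the tree.
track_titles = [e.text for e in root.findall(".//trackTitle")]

# Select the titles of all records whose artist is "System of A Down".
soad_records = [
    e.text
    for e in root.findall(".//record[artist='System of A Down']/recordTitle")
]

print(track_titles)
print(soad_records)
```

The selected values can then be repurposed or reformatted independently of how the source document was laid out.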

What have we not considered here?

Conformance of a particular XML document to a schema prescription, e.g. DTD https://en.wikipedia.org/wiki/Document_type_definition or XML Schema https://en.wikipedia.org/wiki/XML_schema
Many other (infinitely many) XML applications
Interactive XML formats
"Infinitely" long streams, i.e. streaming of video, audio, other data...

Next, we will show you some details about HTML and about formatting HTML pages with HTML and Cascading Style Sheets, before we move on to multimedia.

https://en.wikipedia.org/wiki/XML
Further references (not same as further reading!) http://www.w3.org/TR/xml/

	screen reader
	Braille display
	touch screen
	tablet
	mobile
	TV set
	Web crawler

	<tracktitle id="42">Aguas di Marco</tracktitle id="42">
	<tracktitle id="42"/>Aguas di Marco</tracktitle>
	<track><tracktitle>Aguas di Marco</track></track>
	<track><tracktitle>Aguas di Marco</track></tracktitle>
	<track/>
	<track><tracktitle>Aguas di Marco<trackartist>Aguas di Marco</track>

Web Science/Part1: Foundations of the web/Web content/Working with XML


	is only another representation of the XML text
	comes with an API to select parts of an XML document
	can be mapped bijectively to the XML text
