XML BASICS

Like HTML, XML is a markup language that follows a standard defined by the W3C. The difference is that the XML standard defines only syntax conventions and a few basic rules, not a fixed set of tags. You can invent your own tag vocabulary to fit your particular problem. As long as you follow the rules, your XML documents can be processed with standard toolkits that are available in many languages.
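Here is a minimal Python sketch of that idea, using the standard library's xml.etree.ElementTree module. The <recipe> and <step> tags are a hypothetical vocabulary invented for this example. The parser has never heard of them, yet it reads the document without complaint because the document follows XML's syntax rules.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: nothing in the XML standard defines <recipe> or <step>.
# These tag names are hypothetical, invented only for this illustration.
RECIPE = """<?xml version="1.0"?>
<recipe name="toast">
  <step>Slice the bread.</step>
  <step>Toast until golden.</step>
</recipe>"""


def list_steps(xml_text):
    """Parse the document and return the recipe name and the text of each <step>."""
    root = ET.fromstring(xml_text)
    return root.get("name"), [step.text for step in root.findall("step")]


name, steps = list_steps(RECIPE)
print(name, steps)  # toast ['Slice the bread.', 'Toast until golden.']
```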

The XML standard was derived by simplifying SGML, eliminating much of the complexity and inconsistency that make working with SGML a full-time job. The W3C sponsored the XML development effort because it recognized that the Web's full potential for communication could be reached only if people had a flexible way to describe just about any kind of data. The use of XML, however, has spread well beyond the Internet and can now be found in many kinds of computing applications.

XML has the same requirement as HTML that tag pairs be properly nested: an element opened inside another element must be closed before the outer one is. An XML document that has correctly nested tags and conforms to a few other requirements is said to be well formed. An empty element written as a single tag must end with the /> character sequence, so a program reading XML knows it does not have to look for a matching closing tag. In XHTML, for example, the <br> tag must be written <br/> to comply.
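A standard parser enforces these rules for you. The following sketch, assuming Python's built-in xml.etree.ElementTree parser, reports whether a string is well formed; the helper name is_well_formed is hypothetical. Note that an empty element may also be written as an explicit start and end pair such as <br></br>, which the parser accepts as well.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the standard parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>line one<br/>line two</p>"))     # True: empty tag closed with />
print(is_well_formed("<p>line one<br></br>line two</p>"))  # True: explicit start/end pair
print(is_well_formed("<p>line one<br>line two</p>"))      # False: <br> is never closed
print(is_well_formed("<p><b>bold <i>both</b></i></p>"))   # False: tags overlap instead of nesting
```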

All tag names in XML are case sensitive, so <br/>, <Br/>, and <BR/> are three different tags. XHTML requires the element names it shares with HTML to be written in lowercase, so only <br/> would produce a line break in an XHTML-compliant browser. Another naming rule is that names beginning with the letters xml, in any combination of upper and lower case (xml, XML, Xml, and so on), are reserved for current and future versions of the standard, so you should not use them for your own tags.
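The following Python sketch, again assuming xml.etree.ElementTree, shows case sensitivity in practice: the three spellings of br become three distinct element names, and a start tag cannot be closed by an end tag written in a different case.

```python
import xml.etree.ElementTree as ET

# Same spelling, different case: XML treats these as different element names.
doc = ET.fromstring("<text><br/><Br/><BR/></text>")
names = [child.tag for child in doc]
print(names)                   # ['br', 'Br', 'BR']
print(len(doc.findall("br")))  # 1: only the lowercase element matches

# A start tag and its end tag must match exactly, including case.
try:
    ET.fromstring("<br></BR>")
    matched = True
except ET.ParseError:
    matched = False
print(matched)                 # False
```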

Although it is not required, a complete XML document typically begins with an XML declaration, a special line that states which version of the standard the document conforms to. The following is an example:

<?xml version="1.0"?>
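A short Python sketch of how a parser treats the declaration, assuming xml.etree.ElementTree. The declaration is optional, but when it is present it must be the very first thing in the document, so even a blank line before it is an error. The <note> element is a made-up example.

```python
import xml.etree.ElementTree as ET

WITH_DECL = '<?xml version="1.0"?>\n<note>hello</note>'
WITHOUT_DECL = "<note>hello</note>"

# The declaration is optional: both versions parse to the same element.
a = ET.fromstring(WITH_DECL)
b = ET.fromstring(WITHOUT_DECL)
print(a.tag, a.text, b.tag, b.text)  # note hello note hello

# If present, the declaration must come first; a leading newline makes it misplaced.
try:
    ET.fromstring("\n" + WITH_DECL)
    accepted = True
except ET.ParseError:
    accepted = False
print(accepted)  # False
```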

Complete details on XML formatting are beyond the scope of this course. You can find the formal documents at the W3C Web site, and any search engine can help you locate XML tutorials.

In addition to being well formed, XML documents can be valid. The content and arrangement of elements in a valid XML document conform to a specification in a Document Type Definition (DTD) or an XML Schema. A document can be well formed without being valid, but it can never be valid without being well formed.
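Python's standard xml.etree.ElementTree parser checks well-formedness but does not validate against a DTD or XML Schema. As a stand-in, this sketch hard-codes one structural rule for a hypothetical <memo> vocabulary to show the difference between the two checks. A real project would express such a rule in a DTD or schema and use a validating parser (for example, the third-party lxml library).

```python
import xml.etree.ElementTree as ET

# Hypothetical rule standing in for a DTD: a <memo> must contain
# exactly one <to> element followed by exactly one <body> element.
REQUIRED_CHILDREN = ["to", "body"]


def check_memo(xml_text):
    """Return (well_formed, valid) for a document under the made-up memo rule."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False, False  # not well formed, so it cannot be valid either
    children = [child.tag for child in root]
    valid = root.tag == "memo" and children == REQUIRED_CHILDREN
    return True, valid


print(check_memo("<memo><to>Ann</to><body>Hi</body></memo>"))  # (True, True)
print(check_memo("<memo><body>Hi</body><to>Ann</to></memo>"))  # (True, False)
print(check_memo("<memo><to>Ann</to><body>Hi</memo>"))         # (False, False)
```

The second document is well formed but breaks the ordering rule, which is exactly the "well formed and not valid" case.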

An Example XML Document

Suppose you want to store all of the text in one of these lessons as an XML document. The following skeleton shows a document structure you could use, including just the first page and leaving out the bulk of the text.

<?xml version="1.0"?>
<lesson number="5" author="Brogden">
  <title>Markup Languages</title>
  <page pagetitle="What is a Markup Language?">
    <paragraph>A markup language is used to .etc.
    </paragraph>
    <!-- more paragraphs go here -->
  </page>
  <!-- more pages go here -->
</lesson>
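Because the tags are your own, any XML toolkit can pull the structure back out. This Python sketch, assuming xml.etree.ElementTree, parses the skeleton (with the paragraph text shortened) and builds a simple outline. The outline function is hypothetical, and the comments in the document are skipped under the parser's default settings.

```python
import xml.etree.ElementTree as ET

# The lesson skeleton, with the paragraph text shortened.
LESSON = """<?xml version="1.0"?>
<lesson number="5" author="Brogden">
  <title>Markup Languages</title>
  <page pagetitle="What is a Markup Language?">
    <paragraph>A markup language is used to ...</paragraph>
    <!-- more paragraphs go here -->
  </page>
  <!-- more pages go here -->
</lesson>"""


def outline(xml_text):
    """Return the lesson number, its title, and a list of (page title, paragraph count)."""
    root = ET.fromstring(xml_text)
    pages = [
        (page.get("pagetitle"), len(page.findall("paragraph")))
        for page in root.findall("page")
    ]
    return int(root.get("number")), root.findtext("title"), pages


number, title, pages = outline(LESSON)
print(number, title)  # 5 Markup Languages
print(pages)          # [('What is a Markup Language?', 1)]
```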

That looks very similar to the HTML you might use to display the lesson, but instead of using the standard HTML tags, you make up your own. Why would you want to do that? The following section looks at some possibilities.