XML AND DATA MANIPULATION

The following are some tasks that can be done by processing the XML document representing this lesson:

  • Build a table of contents for the lesson by extracting the pagetitle attributes from the <page> elements.
  • Reformat the document as a single HTML page.
  • Reformat the document as a series of HTML pages, one per <page> element.
  • Reformat the document in the widely used PDF (Portable Document Format).
  • Reformat the document as a series of small pages in WML (Wireless Markup Language), used by WAP-enabled cell phones.
  • Store each <page> element in an XML-enabled database.

Libraries of code for processing XML documents exist in many computer languages. There are two basic approaches: parsing the document into separate tags and parts of elements, and building objects that contain entire elements.

In the separate-parts approach, a parsing program takes the document apart at the points where tags appear and hands the programmer the separate parts in the order in which they occur in the original document. Your code has to decide what to do with each part of the document on the fly. This approach is particularly good if your program has to extract a small chunk out of the whole document; for example, extracting the pagetitle attributes to build a table of contents.
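To make the separate-parts approach concrete, here is a minimal sketch in Python using the standard library's xml.sax module. The sample document, the <lesson> root element, and the names TitleCollector and table_of_contents are assumptions made up for illustration; only the <page> element and its pagetitle attribute come from this lesson.

```python
import xml.sax

# Hypothetical sample; the real lesson file may be laid out differently.
LESSON_XML = """<lesson>
  <page pagetitle="Markup Languages">First page text.</page>
  <page pagetitle="XML and Data Manipulation">Second page text.</page>
  <page pagetitle="Moving On">Last page text.</page>
</lesson>"""


class TitleCollector(xml.sax.ContentHandler):
    """Collects pagetitle attributes as <page> start tags stream past."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        # The parser calls this once per opening tag, in document order.
        if name == "page" and "pagetitle" in attrs:
            self.titles.append(attrs["pagetitle"])


def table_of_contents(xml_text):
    """Return the page titles in the order they appear in the document."""
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


if __name__ == "__main__":
    for number, title in enumerate(table_of_contents(LESSON_XML), start=1):
        print(f"{number}. {title}")
```

Notice that the handler never holds the whole document. It sees one tag at a time and keeps only the titles, which is why this style suits pulling a small piece out of a large file.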

In the building-objects approach, your program is handed an object that contains the whole document, organized so you can rapidly locate any part of it. This is essentially the document object model approach used in JavaScript manipulation of a Web page. It's handy when your program has to access various parts of the document repeatedly; for example, when creating and editing a document.
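The building-objects approach might look like the following sketch, which uses Python's standard xml.dom.minidom module. As before, the sample document, the <lesson> root element, and the function name retitle_and_append are assumptions for illustration, not part of this lesson's actual file.

```python
from xml.dom.minidom import parseString

# Hypothetical sample; the real lesson file may be laid out differently.
LESSON_XML = (
    "<lesson>"
    '<page pagetitle="Markup Languages">First page text.</page>'
    '<page pagetitle="Moving On">Last page text.</page>'
    "</lesson>"
)


def retitle_and_append(xml_text):
    """Edit one page title, add a new page, and return (titles, xml)."""
    doc = parseString(xml_text)  # the whole document is now one object

    # Jump straight to any part of the document and change it in place.
    pages = doc.getElementsByTagName("page")
    pages[0].setAttribute("pagetitle", "HTML and XML")

    # Create a new element and attach it to the root element.
    new_page = doc.createElement("page")
    new_page.setAttribute("pagetitle", "Summary")
    new_page.appendChild(doc.createTextNode("Summary text."))
    doc.documentElement.appendChild(new_page)

    # Revisit the document; the edits are visible without re-parsing.
    titles = [p.getAttribute("pagetitle")
              for p in doc.getElementsByTagName("page")]
    return titles, doc.documentElement.toxml()


if __name__ == "__main__":
    titles, xml_out = retitle_and_append(LESSON_XML)
    print(titles)
    print(xml_out)
```

The trade-off is memory: the entire document has to be loaded before you can touch any of it, which is fine for a lesson-sized file but can be costly for very large ones.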

XML and the Future of Programming

It's safe to say that in the future, almost all programmers will need to know the basics of processing XML, and almost all creators of content for the Web will need to know the basics of creating XML documents.

Web Services

If you pay any attention to the business news, you know that something called Web Services is a hot topic. The idea with a Web Service is that a company provides a special sort of Web server that can respond automatically to requests for particular kinds of data. Both the request and the response are formatted as XML documents that can be read and interpreted by programs.

For example, suppose vendors of auto parts provide their entire catalogs as Web Services. You could have a program that automatically searches all vendors for the best price and availability for a particular part. Because every request and response is an XML document, both humans and programs can read the messages being exchanged.
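The auto parts scenario can be sketched without a network connection by treating a few XML strings as if they were the vendors' responses. Everything specific here is hypothetical: the vendor names, the part number, and the <quoteRequest>, <quote>, <price>, and <inStock> elements are invented for illustration and do not follow any real Web Service format.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical responses, as if each vendor's Web Service had returned one.
RESPONSES = {
    "Vendor A": '<quote part="BP-1042"><price currency="USD">18.50</price>'
                "<inStock>true</inStock></quote>",
    "Vendor B": '<quote part="BP-1042"><price currency="USD">16.75</price>'
                "<inStock>false</inStock></quote>",
    "Vendor C": '<quote part="BP-1042"><price currency="USD">17.20</price>'
                "<inStock>true</inStock></quote>",
}


def build_request(part_number):
    """Build the (made-up) XML request a program would send to a vendor."""
    request = ET.Element("quoteRequest")
    ET.SubElement(request, "part").text = part_number
    return ET.tostring(request, encoding="unicode")


def best_offer(responses):
    """Return (price, vendor) for the cheapest in-stock quote, or None."""
    offers = []
    for vendor, xml_text in responses.items():
        quote = ET.fromstring(xml_text)
        if quote.findtext("inStock") == "true":
            offers.append((Decimal(quote.findtext("price")), vendor))
    return min(offers) if offers else None


if __name__ == "__main__":
    print(build_request("BP-1042"))
    print(best_offer(RESPONSES))
```

In a real system each response would arrive over the Web rather than from a dictionary, and every vendor would have to agree on the element names in advance.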

XML versus EDI

Corporations have used EDI (Electronic Data Interchange) formats for years to send documents such as bids, orders, and invoices back and forth. EDI formats were created when transmission speeds were slow and long-distance connectivity was expensive, so data is transmitted in a compact form using esoteric codes that humans can't easily read. Now that connectivity is cheap and transmission is rapid, it makes more sense to use human-readable XML formats.
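To see the difference in readability, consider this small sketch, which converts a compact, code-based record into XML. The record layout is invented for illustration and is not a real EDI standard; the ITM code, the field names, and the to_xml function are all assumptions.

```python
import xml.etree.ElementTree as ET

# Invented compact record: code*part*quantity*unit price.
# This only imitates the style of EDI; it is not a real EDI format.
COMPACT_LINE = "ITM*BP-1042*12*16.75"

# The meaning of each position lives outside the data, in a shared spec.
FIELD_NAMES = ["partNumber", "quantity", "unitPrice"]


def to_xml(line):
    """Turn one compact record into a self-describing XML element."""
    code, *values = line.split("*")
    if code != "ITM" or len(values) != len(FIELD_NAMES):
        raise ValueError(f"unrecognized record: {line!r}")
    item = ET.Element("orderLine")
    for name, value in zip(FIELD_NAMES, values):
        ET.SubElement(item, name).text = value
    return ET.tostring(item, encoding="unicode")


if __name__ == "__main__":
    print(COMPACT_LINE)
    print(to_xml(COMPACT_LINE))
```

The XML version is several times longer, which is exactly the cost the compact formats were designed to avoid, but each value now carries its own label.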

XML-Enabled Office Applications

Microsoft has announced that the applications in the 2003 generation of Microsoft Office will be capable of reading, writing, and processing XML documents in addition to the previous proprietary formats. Microsoft is making this move because more and more businesses are using XML as a standard for document storage and interchange. Having corporate data in XML format makes it easier to communicate with your customers through Web Services or to generate the content of your corporate Web site.

The Open Office project, which offers free programs for typical office productivity applications, provides the option of using XML for documents. For users of Linux and those who don't want to pay Microsoft's licensing fees, Open Office is a viable alternative.

Moving On

This lesson introduced you to features of the markup languages HTML and XML, and showed how markup languages are intimately involved with today's programming tools. You also got an indication of how marked-up documents can be treated as objects. To understand modern programming, you have to know something about markup languages. The next lesson delves into storing data in memory, as well as file manipulation and memory management.

Before you move on, be sure to complete the assignment and quiz for this lesson. Don't forget to drop by the Message Board to see what your fellow students have to say.