XML: What Do Help Authors Need to Know? Part 1

By Scott Boggan

This article contains links to sample pages that require an XML viewer, such as Microsoft Internet Explorer 5. Screen captures of the sample pages are also shown for the benefit of readers who do not have an XML viewer.

It has received more hype than the return of Star Wars and Austin Powers combined. If you believe what you hear, XML (Extensible Markup Language) will suddenly make the Web incredibly fast and oh-so-easy to use. XML has been called the ASCII of the future, an Esperanto for the computing world. Like most hype-ridden things, XML is very difficult to comprehend; ask 10 Web jockeys to explain it and you'll get 10 different stories. But if you sort through the hype, you'll see that XML has great promise for hypertext authors, especially for use in structured documents such as online Help.

This two-part article gives you an overview of XML and its helper technologies, Extensible Stylesheet Language (XSL) and XLink, and predicts what this exciting new area holds for Help authors. For the samples, I've focused on the XML implementation in Microsoft Internet Explorer 5, which is available at www.microsoft.com/windows/ie/default.htm.

What is XML?

XML is a set of rules that let you create custom tags describing the meaning and structure of a document. Such tags are often called "metadata," or data about data.

To help put this in perspective, it's useful to think about how we use markup languages. As Help authors, most of us are very familiar with two prominent examples, HTML and RTF, both of which use markup tags to format a document. A review of HTML's 80-odd tags reveals that, with but a few exceptions, they describe how a document looks rather than what it means.

Let's look at a simple HTML document that lists a book. HTML tags such as <h1>, <b>, and <br> control how the text is displayed but say nothing about what it means.
<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>A Book</title>
</head>
<body>
<h1>Book</h1>
<p><b>Title:</b>
The Autobiography of Benjamin Franklin <br>
<b>Author:</b> Benjamin Franklin<br>
<b>Price:</b> $8.99<br>
</body>
</html>
HTML is known as a specific markup language because it was developed for use with a specific processor: a Web browser. Likewise, RTF was specifically designed to format text in a word processor. Markup languages that format a document for a specific reader are fine in some cases, but they are not very flexible. A big problem is that re-using the document on another system requires you to convert it and sometimes perform manual cleanup.

Another problem with specific markup languages is that their tag set is not extensible. In the case of HTML, this limitation forces authors to find a workaround or wait for a standards body like the World Wide Web Consortium (W3C) to invent a new tag. Consider an example from the world of Microsoft HTML Help: wouldn't it be easier for authors if there were a simple HTML tag for adding an A-keyword? Instead, Microsoft implemented A-keywords using awkward ActiveX markup.

Finally, specific markup languages make it difficult for software developers to write programs that process data. We humans quickly recognize our HTML document as a book, but a computer program won't have any idea what it describes. Once our data is in XML, Help applications can do a much better job of delivering our content. This will not only provide the user with more targeted information but also simplify the authoring process.

If specific markup languages describe the presentation of a document, generalized markup languages take a different approach and use tags to describe a document's structure or meaning. The most popular example is the Standard Generalized Markup Language, or SGML. SGML is powerful but complicated, so in 1996 Jon Bosak of Sun Microsystems formed a W3C working group to marry the flexibility of SGML with the simplicity of HTML: in effect, to create an SGML "lite." The result was XML.

Now let's look at our book listing example in a generalized markup language. Without getting into the syntax details just yet, here's what it might look like in XML.
<?xml version="1.0"?>
<BOOK>
<TITLE>The Autobiography
of Benjamin Franklin</TITLE>
<AUTHOR>
<FIRST-NAME>Benjamin</FIRST-NAME>
<LAST-NAME>Franklin</LAST-NAME>
</AUTHOR>
<PRICE>8.99</PRICE>
</BOOK>
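To see why descriptive tags matter to software, here is a minimal sketch of a program reading the book listing above. It assumes Python and its standard-library xml.etree.ElementTree parser, neither of which appears in the original samples; the function name read_book is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The book listing from the article, kept as a string for the demo.
BOOK_XML = """<?xml version="1.0"?>
<BOOK>
  <TITLE>The Autobiography
  of Benjamin Franklin</TITLE>
  <AUTHOR>
    <FIRST-NAME>Benjamin</FIRST-NAME>
    <LAST-NAME>Franklin</LAST-NAME>
  </AUTHOR>
  <PRICE>8.99</PRICE>
</BOOK>"""


def read_book(xml_text):
    """Return the book's fields as a plain dictionary (hypothetical helper)."""
    book = ET.fromstring(xml_text)
    first = book.findtext("AUTHOR/FIRST-NAME")
    last = book.findtext("AUTHOR/LAST-NAME")
    return {
        # split/join collapses the line break inside the TITLE element.
        "title": " ".join(book.findtext("TITLE").split()),
        "author": first + " " + last,
        "price": float(book.findtext("PRICE")),
    }


book = read_book(BOOK_XML)
print(book["title"], "by", book["author"], "costs", book["price"])
```

The program asks for data by name (TITLE, PRICE) and never has to guess which bold label precedes which value, as it would with the HTML version.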
Because the tags describe the meaning (or "semantics") of the data, this document can be read by virtually any application. This ability to share data in a standard format will enable the next generation of Web-based applications. And unlike most file formats, even humans can read it! No wonder that in just a short time XML has captured the attention of many; it is now a W3C "recommendation" (see www.w3.org/xml).

Domain-Specific Markup Languages

Because it allows authors to create their own tags, XML has spawned a variety of "domain-specific" markup languages: tag sets that are unique to a particular profession. Before considering how XML might be used in Help, let's look at how other industries are using it.

Potential Uses for XML in Help

As you can see, plenty of other industries have latched on to XML as an answer to their publishing needs. How might we use XML in Help? Here are five possibilities I've dreamed up; certainly not all of these ideas will bear fruit, but perhaps a few will get your wheels turning.

What Does XML Look Like?

Let's look at an expanded version of our XML book sample. Once again, notice that each tag describes the data it contains.
<?xml version="1.0"?>
<?xml-stylesheet href="books2.xsl"
type="text/xsl" ?>
<!-- This file represents a fragment
of a book store inventory database -->
<BOOKSTORE>
<BOOK GENRE="autobiography">
<TITLE>The Autobiography
of Benjamin Franklin</TITLE>
<AUTHOR>
<FIRST-NAME>Benjamin</FIRST-NAME>
<LAST-NAME>Franklin</LAST-NAME>
</AUTHOR>
<PRICE>8.99</PRICE>
</BOOK>
<BOOK GENRE="novel">
<TITLE>The Confidence Man</TITLE>
<AUTHOR>
<FIRST-NAME>Herman</FIRST-NAME>
<LAST-NAME>Melville</LAST-NAME>
</AUTHOR>
<PRICE>11.99</PRICE>
</BOOK>
<BOOK GENRE="philosophy">
<TITLE>The Gorgias</TITLE>
<AUTHOR>
<NAME>Plato</NAME>
</AUTHOR>
<PRICE>9.99</PRICE>
</BOOK>
</BOOKSTORE>
A programmer might look at our XML document and think of it as a tree. The root element is the bookstore element, which contains one book element per book. Each book element carries a genre attribute and contains elements for title, author, and price. The tree structure begins at the root and gradually branches out to the other elements.
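The tree view can be made concrete with a short sketch. It assumes Python's standard-library xml.etree.ElementTree parser and a hypothetical helper named outline; only one book from the sample is used to keep the output short. Note that GENRE appears as an attribute on BOOK, not as a separate branch.

```python
import xml.etree.ElementTree as ET

# One book from the article's bookstore sample.
BOOKSTORE_XML = """<BOOKSTORE>
  <BOOK GENRE="philosophy">
    <TITLE>The Gorgias</TITLE>
    <AUTHOR>
      <NAME>Plato</NAME>
    </AUTHOR>
    <PRICE>9.99</PRICE>
  </BOOK>
</BOOKSTORE>"""


def outline(element, depth=0):
    """Return one indented line per element, starting at the root."""
    label = element.tag
    # Attributes such as GENRE belong to the element, not to a child branch.
    for name, value in element.attrib.items():
        label += f' [{name}="{value}"]'
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(BOOKSTORE_XML))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline is one step further from the root, which is the shape a developer works with when processing the document.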
This tree structure isn't important to us as Help authors or to our readers, but it makes it easy for developers to write software programs that process XML documents. Also, thinking of your XML documents in terms of a tree structure should highlight XML's appeal for producing structured documents such as online reference manuals.

An Overview of XML Syntax

Let's briefly look at some of the rules for creating XML documents. First of all, you'll notice that our sample begins with a declaration: <?xml version="1.0"?>

If you're familiar with HTML, XML's syntax has a few twists that will take some getting used to.
Another difference between HTML and XML is that while Web browsers are very forgiving about bad HTML code, XML processors are not. A single invalid tag will result in an error message and prevent your document from appearing at all, as in the following example:
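The same strictness shows up in any XML processor, not just a browser. The sketch below is an illustration under assumptions: it uses Python's standard-library xml.etree.ElementTree parser rather than Internet Explorer 5, and the sample fragments and the helper is_well_formed are made up for the demonstration. It tries one good fragment and three that break common rules: an unclosed tag, an unquoted attribute, and inconsistent capitalization.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: one well-formed, three with typical HTML habits.
SAMPLES = {
    "well-formed": '<BOOK GENRE="novel"><TITLE>The Confidence Man</TITLE></BOOK>',
    "unclosed tag": '<BOOK GENRE="novel"><TITLE>The Confidence Man</BOOK>',
    "unquoted attribute": '<BOOK GENRE=novel><TITLE>The Confidence Man</TITLE></BOOK>',
    "mismatched case": '<BOOK GENRE="novel"><TITLE>The Confidence Man</title></BOOK>',
}


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


results = {name: is_well_formed(text) for name, text in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name}: {'parsed' if ok else 'rejected'}")
```

A Web browser would quietly display all four fragments as HTML; the XML parser accepts only the first.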
If you code your HTML pages manually, now's a good time to start paying attention: close your tags whenever possible, put quotation marks around your attribute values, and use consistent capitalization.

Adding an XSL Stylesheet

As it sits, our XML document is very boring: opening it in Internet Explorer 5 displays its structure, a view that you'd never want to inflict on your readers. (Netscape users are currently unable to view this document, but the upcoming "Gecko" release is scheduled to support XML. For more information, see www.mozilla.org.) For those readers without an XML processor, the XML document appears in Internet Explorer 5 as:

[Screen capture: the unstyled XML document in Internet Explorer 5]
Linking to an XSL stylesheet lets us format our document into something more usable. We won't get into the particulars of XSL just yet, but we'll point out a few things about our stylesheet. First, you'll discover that the stylesheet begins with an XML declaration, since XSL stylesheets are themselves XML documents.
<?xml version='1.0'?>
<body bgcolor="ivory"
xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<style> {font-size: medium; font-family:
Verdana;} </style>
<table border="2"
cellpadding="5">
<tr>
<th>Title</th>
<th>Author</th>
<th>Price</th>
</tr>
<xsl:for-each select="BOOKSTORE/BOOK">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="AUTHOR"/></td>
<td><xsl:value-of select="PRICE"/></td>
</tr>
</xsl:for-each>
</table>
</body>
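For readers who want to see the looping idea without an XSL processor, here is a rough imitation in Python. This is not XSL: Python's standard library has no XSL engine, so the sketch only mimics what the for-each and value-of instructions do, using xml.etree.ElementTree. The helper names text_of and table_rows are hypothetical, and a single book stands in for the full bookstore.

```python
import xml.etree.ElementTree as ET
from html import escape

# One book from the article's bookstore sample.
BOOKSTORE_XML = """<BOOKSTORE>
  <BOOK GENRE="novel">
    <TITLE>The Confidence Man</TITLE>
    <AUTHOR>
      <FIRST-NAME>Herman</FIRST-NAME>
      <LAST-NAME>Melville</LAST-NAME>
    </AUTHOR>
    <PRICE>11.99</PRICE>
  </BOOK>
</BOOKSTORE>"""


def text_of(element):
    """Join all text inside an element, roughly like xsl:value-of."""
    return " ".join("".join(element.itertext()).split())


def table_rows(xml_text):
    """Build one HTML table row per BOOK, in Title, Author, Price order."""
    rows = []
    root = ET.fromstring(xml_text)
    # Comparable to <xsl:for-each select="BOOKSTORE/BOOK">.
    for book in root.findall("BOOK"):
        cells = [text_of(book.find(tag)) for tag in ("TITLE", "AUTHOR", "PRICE")]
        rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    return rows


rows = table_rows(BOOKSTORE_XML)
print("\n".join(rows))
```

The stylesheet expresses the same loop declaratively: the table markup is written once, and the processor repeats the row for every book it finds.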
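Notice also that our stylesheet contains HTML tags such as <table>, <tr>, <th>, and <td>, which lay out the book data as a table.

Here's what our stylesheet-enabled XML document looks like. For those readers without an XML processor, the XML document appears in Internet Explorer 5 as:

[Screen capture: the styled XML document in Internet Explorer 5]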

Coming in Part 2

In Part 2 of our article, we'll explore XML syntax in greater detail. We'll also examine tools for creating XML documents and look at the power of XSL and XLink in transforming XML documents.

Scott Boggan is co-author of the award-winning Developing Online Help for Windows and a forthcoming book on HTML Help. He is a popular speaker at numerous conferences throughout the world and also teaches through the University of Washington. Scott is principal of HelpCraft (www.helpcraft.com).