Chapter 3. Getting the tools working

Table of Contents

Installing the DocBook DTD
Finding the DTD during processing
Character entities
Validation
Installing the DocBook stylesheets
Installing an XSLT processor
Installing xsltproc
Using xsltproc
Installing Saxon
Using Saxon
Installing Xalan
Using Xalan
Installing an XSL-FO processor
Installing FOP
Using FOP
Using other XSL-FO processors
Makefiles
XSL servers

The first step to using the DocBook XSL stylesheets is to get the processing tools installed, configured, and tested to make sure they are working. There are several components that need to be installed: the DocBook DTD, the DocBook XSL stylesheets, an XSLT processor, and an XSL-FO processor.

This chapter provides the details for obtaining, installing, and executing the individual tools to process DocBook files.

You can avoid most of those details by installing one of the already-assembled packages of DocBook tools that are available for download. A good number of them are inventoried at the DocBook Wiki site at http://www.docbook.org/wiki/moin.cgi/DocBookPackages. There are RPM and Debian packages for Linux systems, Fink packages for Mac systems, and Cygwin and other packages for Windows systems. The packages include most or all of the components listed above, and usually a convenience script to help you get started. If one of those packages meets your needs, then go for it.

The disadvantage of such packages is that they may not keep up with the latest releases of all of the components. Each of the components follows its own development schedule, and it is hard for all of the package developers to quickly integrate each new release into a new package. Installing your own components lets you update the components individually whenever they become available. Even if you install one of the packages, the information in this chapter can help you update individual components when you need to.

Note on Windows pathnames

For most XSL tools, pathnames on a Microsoft Windows system should be specified using the standard URI syntax. For example, a Windows pathname such as c:\xml\docbook-xsl should be entered as file:///C:/xml/docbook-xsl. If you have spaces in part of the path, they must be escaped as %20. So a pathname such as c:\xml\docbook xsl would be entered as file:///C:/xml/docbook%20xsl. Generally spaces in pathnames should be avoided where possible.
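As a rough illustration of the conversion described above, the following Python sketch uses the standard pathlib module to turn a Windows pathname into URI syntax. The helper name windows_path_to_uri is hypothetical and is not part of any DocBook tool. It is only one way to avoid escaping the path by hand.

```python
from pathlib import PureWindowsPath


def windows_path_to_uri(pathname: str) -> str:
    """Convert an absolute Windows pathname to file:/// URI syntax.

    Hypothetical helper for illustration only.
    """
    # as_uri() replaces backslashes with forward slashes and
    # percent-escapes characters such as spaces.
    return PureWindowsPath(pathname).as_uri()


print(windows_path_to_uri(r"C:\xml\docbook-xsl"))  # file:///C:/xml/docbook-xsl
print(windows_path_to_uri(r"C:\xml\docbook xsl"))  # file:///C:/xml/docbook%20xsl
```

PureWindowsPath works on any operating system, so the conversion can be run on a machine other than the Windows system where the files live.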

Installing the DocBook DTD

You can download the DocBook XML DTD from the OASIS website where it is maintained. Go to http://www.oasis-open.org/docbook/xml/ and select the current version of the DocBook XML DTD. As of this writing the current version is 4.4. From there you should be able to download the zip archive of the XML DTD. You don't want the SGML DTD, nor do you want the experimental version 5 DTD.

If you prefer to use the package installation software on your operating system, the DocBook DTD is also available in some package formats. Check the DocBook Wiki packages page to see if there is a DTD package for your system. If you install from a package, you might want to note where the files install so you can refer to that path later.

The DocBook XML DTD consists of a main file docbookx.dtd and several module files. You only need to reference the main file, and it will pull in the other module files to make up the complete DTD.

Finding the DTD during processing

In general, your XML documents must identify the DTD they are written against by means of a DOCTYPE declaration at the top of the file. The information in the DOCTYPE provides the processor with clues for finding the DTD files. It might contain a PUBLIC identifier, and either a local file reference, or a URL reference (that may still be resolved to a local file).
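To see which identifiers a document actually declares, a short script can read them from the DOCTYPE. The sketch below uses Python's standard xml.dom.minidom parser, which records the identifiers without fetching the DTD. The function name doctype_identifiers and the sample document are assumptions made for illustration.

```python
from xml.dom import minidom

# Sample document declaring the DocBook XML V4.4 DTD (illustrative only).
SAMPLE = (
    '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"\n'
    '  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">\n'
    "<book><title>Sample</title></book>"
)


def doctype_identifiers(xml_text):
    """Return (public_id, system_id) from the DOCTYPE, or (None, None)."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        # The document has no DOCTYPE declaration at all.
        return None, None
    # publicId is None when the DOCTYPE uses SYSTEM instead of PUBLIC.
    return doctype.publicId, doctype.systemId


public_id, system_id = doctype_identifiers(SAMPLE)
print(public_id)
print(system_id)
```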

Local DTD

DocBook documents written to version 4.4 of the DocBook XML DTD might look like this:

Linux example:
<!DOCTYPE book SYSTEM "/usr/share/docbook-4.4/docbookx.dtd">

Windows example:
<!DOCTYPE book SYSTEM "file:///C:/xml/docbook44/docbookx.dtd">

This is a simple direct reference to a specific file location on your machine. It will work if the main DTD file is at that location. The problem with a specific reference like this is that it is not flexible. If you move your XML file to another machine where the DTD is installed somewhere else, or if you move the DTD on your machine, then the connection is lost. Fixing it is not a big problem if you have just a few files, but if you have hundreds of XML files, it is a tedious chore. It's also unnecessary if you use catalog files.

Network DTD

It is possible to fetch the DocBook DTD over the web. The XML standard supports using URLs for DTD references. The advantage is that the DTD is always available, as long as a web connection is available. That makes the document very portable.

The DOCTYPE declaration for network DTD access looks like this:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
                    "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

Most XSL processors know how to fetch the DTD over the web, including all the DTD module files that it references. This isn't recommended if your network connection is slow or flaky. Even with a fast connection, it is slower than local filesystem access. The next section shows you how to use a catalog to combine local and network access.

XML catalog to locate the DTD

With an XML catalog, you can have the best of both local and network access. The catalog lets you map the standard network URL to a local file. If the catalog processor finds the local file during processing, it will use it. Otherwise, it falls back to using the network URL. With this arrangement, you get the speed of local access with the reliability and portability of network access.

An XML catalog entry looks like this:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group id="DocbookDTD" prefer="public">
      <system  
         systemId="http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
         uri="file:///usr/share/xml/docbook44/docbookx.dtd"/>
  </group>
</catalog>

When processed with a catalog-aware XSL processor, a DOCTYPE reference to the URL http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd will be replaced with the uri attribute value file:///usr/share/xml/docbook44/docbookx.dtd if that file exists on the local system. If not, then the URL is used through network access.
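The lookup described above can be sketched in a few lines of Python. This is not how any particular catalog resolver is implemented. It handles only system entries, and the function names load_system_map and resolve are hypothetical. The exists parameter is an assumption added so that the fallback can be tried without the DTD being installed.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import url2pathname

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group id="DocbookDTD" prefer="public">
    <system
       systemId="http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
       uri="file:///usr/share/xml/docbook44/docbookx.dtd"/>
  </group>
</catalog>
"""


def load_system_map(catalog_xml):
    """Collect systemId -> uri pairs from the catalog's system entries."""
    root = ET.fromstring(catalog_xml)
    return {
        entry.get("systemId"): entry.get("uri")
        for entry in root.iter("{%s}system" % CATALOG_NS)
    }


def resolve(system_id, mapping, exists=os.path.exists):
    """Return the mapped local URI if its file exists, else the original URL."""
    local_uri = mapping.get(system_id)
    if local_uri is None:
        return system_id  # no catalog entry: use the identifier as written
    local_path = url2pathname(urlparse(local_uri).path)
    return local_uri if exists(local_path) else system_id


mapping = load_system_map(CATALOG)
print(resolve("http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd", mapping))
```

The value printed depends on whether the DTD is installed at the mapped location on the machine running the script.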

See Chapter 4, XML catalogs for complete information on XML catalogs.

SGML catalog to locate the DTD

You can achieve a similar mapping with an SGML catalog, which is an older technology using a simpler syntax. Here is an example of an SGML catalog that maps both the PUBLIC and SYSTEM identifiers to a local file:

PUBLIC  "-//OASIS//DTD DocBook XML V4.4//EN"  \
                    "/usr/share/xml/docbook44/docbookx.dtd"
SYSTEM  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"  \
                    "/usr/share/xml/docbook44/docbookx.dtd"

Character entities

The DocBook DTD defines the character entities that make it easy to add special characters to your XML files. To enter a copyright symbol, for example, it is easier to remember a name like &copy; than the equivalent &#x00A9; numeric character reference.

Depending on the source for your DTD download, the files that define the character entity names may not be included. They are on the OASIS website under http://www.oasis-open.org/docbook/xmlcharent/index.shtml. Create a directory named ent in the directory where the DocBook DTD files are, and extract the entity files into ent. The DTD will find them in that location because it has references that look like "ent/iso-amsa.ent".

Validation

Validation is the act of checking your document against the element names and rules in the DTD. Most XSL processors won't automatically take the time to validate your document while converting it to HTML or XSL-FO. The processor will read a DOCTYPE declaration in your file and try to find the DTD, and will likely report an error if it cannot. If it does find the DTD, it will read and use any entity declarations in it. If you use any DocBook character entities, the processor must be able to find the DTD to resolve those entity references.

Since the XSL processor doesn't automatically validate your document, it is possible to process invalid but well-formed DocBook documents. But you do so at your own risk, because the DocBook stylesheets expect to be processing a valid document. Your output may not be what you expect if you don't follow the rules. You will have fewer mysterious problems if you validate your documents before processing them.
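The difference between well-formed and valid can be seen with any non-validating parser. The sketch below uses Python's standard xml.etree.ElementTree, which checks well-formedness only. The bogus element is made up for the example and is assumed not to exist in the DocBook DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML; no DTD rules are checked."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Well-formed, but not valid DocBook: the made-up element is not in the DTD.
invalid_but_well_formed = "<book><title>T</title><bogus>x</bogus></book>"

# Not well-formed: the title element is never closed.
not_well_formed = "<book><title>T</book>"

print(is_well_formed(invalid_but_well_formed))  # True
print(is_well_formed(not_well_formed))          # False
```

A non-validating parser accepts the first document without complaint, which is why a separate validation step is worth running before you apply the stylesheets.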

Some of the XSLT processors described in later sections include validation utilities. For example, xsltproc includes a program called xmllint that can validate an XML document using a command like the following:

xmllint  --valid  --noout  document.xml
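If you have many files to check, the same command can be run from a small script. The following Python sketch assumes that xmllint is on your PATH and that it exits with a nonzero status when a document fails validation. The function names are hypothetical.

```python
import subprocess
import sys


def xmllint_command(document):
    """Build the validation command shown above for one document."""
    return ["xmllint", "--valid", "--noout", document]


def validate(document):
    """Run xmllint on one document; return True if it reports no errors."""
    result = subprocess.run(
        xmllint_command(document), capture_output=True, text=True
    )
    if result.returncode != 0:
        # Pass xmllint's own error messages through to the user.
        sys.stderr.write(result.stderr)
    return result.returncode == 0


def validate_all(documents):
    """Return the list of documents that failed validation."""
    return [doc for doc in documents if not validate(doc)]
```

For example, validate_all(["book.xml", "article.xml"]) would return the names of any files that xmllint rejects, where both filenames are placeholders for your own documents.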

If you are looking for a Java-based validation tool, there is an XML validation tool hidden in the distribution of Xalan 2.4 or newer. The xalansamples.jar file that is located with the other Xalan jar files has a Validate utility. If you include the xalansamples.jar file in your Java CLASSPATH along with the other Xalan jar files, then you should be able to use this command:

java  Validate  document.xml