Processing your modular documents

You can run XSLT processing on individual modules or on whole documents assembled from modules. If you process whole documents, you will need a processor that can resolve any XIncludes. The following is an example with xsltproc and its --xinclude option:

xsltproc  \
      --xinclude \
      --stringparam base.dir htmlout/  \
      docbook-xsl/html/chunk.xsl  bookfile.xml

DocBook modules with a DOCTYPE declaration are valid mini documents, so they can be processed individually. This is useful for quick unit testing, but it won't produce well-integrated output. For final output, you will generally want to process larger master documents that assemble the modules, for the following reason:

When a modular file is processed on its own, certain context information is missing. For example, the third chapter in a book does not know it is the third chapter when processed by itself, so its chapter number appears as "1". Likewise, all printed chapters will begin on page 1. In order to process your content in modules and have each bit of output fit into the whole, you would have to create a customization that feeds the processor context information such as chapter number and starting page number.

If you decide to process individual modules for testing, you might want to output the results to a directory separate from where you output the whole document. That way you don't mix up partial builds with complete builds.
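One way to keep the two kinds of output apart is to choose the output directory from the kind of build. The following is a minimal sketch, not a required setup. The function names out_dir_for and build_cmd, the directory name htmlout-test/, and the module file chapter3.xml are assumptions made up for illustration. The xsltproc options are the ones shown in the earlier example.

```shell
#!/bin/sh
# Hypothetical helper: pick an output directory by build type.
# "full" builds go to htmlout/, anything else goes to htmlout-test/.
out_dir_for() {
    case "$1" in
        full) echo "htmlout/" ;;
        *)    echo "htmlout-test/" ;;
    esac
}

# Print the xsltproc command for a given build type and input file.
# The command is printed rather than run so you can inspect it first.
build_cmd() {
    echo "xsltproc --xinclude --stringparam base.dir $(out_dir_for "$1") docbook-xsl/html/chunk.xsl $2"
}

build_cmd full bookfile.xml
build_cmd module chapter3.xml
```

The two calls at the end print one command for the complete book and one for a single module, each with its own base.dir value, so a test build of one chapter never overwrites files from the complete build.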

Java processors and XIncludes

Some XSL processors such as Saxon and Xalan don't fully handle XIncludes yet. Although xsltproc handles XIncludes, you may need to use Saxon or Xalan to take advantage of their extension functions. Until the Java XSLT processors provide full XInclude support, you have three choices for handling XIncludes:

  • Use Xerces with Saxon or Xalan.

  • Use xmllint as a preprocessor to resolve XIncludes.

  • Use XIncluder as a preprocessor to resolve XIncludes.

Using Xerces to resolve XIncludes

You can use Xerces-J version 2.5.0 or later as the XML parser for your XSLT processing. Currently Xerces includes only whole files and ignores any XPointer syntax, but if you intend to include whole files, it can meet your needs. It integrates completely with Saxon or Xalan. Download the latest Xerces-J from http://xml.apache.org/xerces2-j/index.html, add the xercesImpl.jar file to your Java CLASSPATH, and add a few options to your java command. The following is an example using Saxon:

Example 22.4. XIncludes with Saxon and Xerces

java -cp "/xml/saxon653/saxon.jar:/xml/xerces-2_6_2/xercesImpl.jar" \
    -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
    -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
    -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
    com.icl.saxon.StyleSheet \
    -o bookfile.html \
    bookfile.xml \
    ../docbook-xsl-1.68.1/html/docbook.xsl

The first two -D options set up Xerces as the XML parser in Saxon, and the third one turns on the XInclude feature.

If you are also using an XML catalog, you will need to add the catalog resolver options to the command line. They appear after com.icl.saxon.StyleSheet because they are options understood by that class, not by the Java interpreter. You must also add to your Java CLASSPATH the resolver.jar file and the directory containing the CatalogManager.properties file, as described in Chapter 4, XML catalogs. The following example shows the full command line:

Example 22.5. XIncludes and XML catalogs with Saxon and Xerces

java -cp "/xml/saxon653/saxon.jar:/xml/xerces-2_6_2/xercesImpl.jar:resolver.jar:." \
    -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
    -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
    -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
    com.icl.saxon.StyleSheet \
    -x org.apache.xml.resolver.tools.ResolvingXMLReader \
    -y org.apache.xml.resolver.tools.ResolvingXMLReader \
    -r org.apache.xml.resolver.tools.CatalogResolver \
    -o bookfile.html \
    bookfile.xml \
    ../docbook-xsl-1.68.1/html/docbook.xsl

To add XInclude processing to Xalan, you need only the third -D option, because Xalan is already set up to use the Xerces XML parser. An XInclude-capable version of Xerces has been included since Xalan version 2.6.0. If you are using an older version, you will need at least Xalan-J version 2.5.1 and Xerces 2.5.0. The following is an example command:

Example 22.6. XIncludes with Xalan and Xerces

java \
   -Djava.endorsed.dirs="/xml/xerces-2_6_2:/xml/xalan-2_6_0/bin"  \
   -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
   org.apache.xalan.xslt.Process  \
   -out bookfile.html \
   -in bookfile.xml \
   -xsl ../docbook-xsl-1.68.1/html/docbook.xsl

This example uses the java.endorsed.dirs option to make sure Java uses the newer version of Xalan. See the section “Bypassing the old Xalan installed with Java” for more information. That option identifies the directories that contain the necessary jar files. Put the path to the Xerces directory first so that version of xercesImpl.jar will be used instead of the possibly older one that is distributed with Xalan.

Using Xerces-J to validate XIncludes

Starting with version 2.5.0, the Xerces-J XML parser can validate files that have XIncludes. Validation uses the sax.Counter utility program, which is included in the xercesSamples.jar file that comes with the Xerces-J distribution. The following is an example of how it is used:

Example 22.7. Validating XIncludes with Xerces

java \
        -cp "xerces-2_6_2/xercesSamples.jar:xerces-2_6_2/xercesImpl.jar" \
        -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
        sax.Counter -v myfile.xml

Here myfile.xml contains one or more XIncludes. The file must also validate before the XIncludes are resolved, which means the xi:include element must be in the DTD. See the section “DTD customizations for XIncludes” for more information.

Using xmllint to resolve XIncludes

You can use xmllint's --xinclude option to generate a version of the document with all the XIncludes resolved, and then process the output with Saxon and the DocBook XSL stylesheets. The xmllint tool is included with libxml2 and is available for most platforms. The following example shows how it can be used.

xmllint  --xinclude  bookfile.xml  >  resolved.xml

java  com.icl.saxon.StyleSheet  resolved.xml \
      docbook-xsl/html/chunk.xsl  base.dir="htmlout/"

The result file resolved.xml is a copy of the input file bookfile.xml but with the XIncludes resolved. You can validate resolved.xml as a second step. The XInclude fallback feature is implemented in xmllint, as is the XPointer syntax that is supported in xsltproc.
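
The validation step mentioned above can be chained onto the resolving step. The following is a minimal sketch that assumes xmllint is on your PATH and that the document has a DOCTYPE declaration to validate against. The function name resolve_and_validate is hypothetical. It uses xmllint's --valid option to validate against the DTD and --noout to suppress the serialized copy so that only error messages appear.

```shell
#!/bin/sh
# Hypothetical helper: resolve XIncludes, then validate the resolved copy.
#   $1 = input file containing XIncludes
#   $2 = output file for the resolved document
# Returns nonzero if either the resolving step or the validation step fails.
resolve_and_validate() {
    xmllint --xinclude "$1" > "$2" &&
    xmllint --valid --noout "$2"
}
```

For example, running resolve_and_validate bookfile.xml resolved.xml would leave the resolved copy in resolved.xml and report any validation errors. Be aware that xmllint may add xml:base attributes to included elements, so the DTD needs to permit that attribute for the resolved file to validate.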

Using XIncluder to resolve XIncludes

If you want a Java tool to preprocess XIncludes, you can try XIncluder, written by Elliotte Rusty Harold. It is available at ftp://ftp.ibiblio.org/pub/languages/java/javafaq/xincluder.tar.gz. The package supports XPointer syntax for selecting content, and includes several tools for integrating the engine into applications. But if you just want to resolve a document so you can validate or process it, you can use a command like the following:

java \
    -cp "../XInclude/xincluder.jar:../xerces-2_0_2/xercesImpl.jar" \
    com.elharo.xml.xinclude.SAXXIncluder  bookfile.xml  >  resolved.xml

java  com.icl.saxon.StyleSheet  resolved.xml \
      docbook-xsl/html/chunk.xsl  base.dir="htmlout/"

You need to specify the CLASSPATH to the xincluder.jar file from the distribution, as well as an XML parser such as Xerces. This example uses the -cp option to specify the CLASSPATH. On Windows systems, replace the colon in the CLASSPATH with a semicolon.
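
If the same script has to run on both kinds of systems, the separator can be chosen at run time. The following is a minimal sketch, assuming a POSIX shell where uname is available, such as Cygwin or MSYS on Windows. The function name classpath_sep and the variable names are hypothetical, and the jar paths are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper: choose the CLASSPATH separator from the output of
# `uname -s`. Cygwin, MSYS, and MinGW shells are assumed to run a Windows
# JVM, which expects ';'. Everything else is assumed to expect ':'.
classpath_sep() {
    case "$1" in
        CYGWIN*|MINGW*|MSYS*) echo ";" ;;
        *)                    echo ":" ;;
    esac
}

SEP=$(classpath_sep "$(uname -s)")
CP="../XInclude/xincluder.jar${SEP}../xerces-2_0_2/xercesImpl.jar"
echo "$CP"
```

The resulting value of CP can then be passed to java with the -cp option. The quotes around the assignment keep the shell from treating a semicolon as a command separator.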

The result file resolved.xml is a copy of the input file bookfile.xml but with the XIncludes resolved. You can validate resolved.xml as a second step. In the current version of XIncluder, the fallback feature is not implemented and the command will fail if an XInclude cannot be resolved. Also, it can only include whole files, not a part of a file.
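
Because the shell creates resolved.xml before the resolver runs, a failed run can leave an empty or partial file behind that a later step might process by mistake. The following is a minimal sketch of one way to guard against that. It assumes the resolver command exits with a nonzero status when it fails, which you should confirm for your version of XIncluder. The function name resolve_to is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a resolver command with its output sent to a
# temporary file, and move that file into place only if the command succeeds.
#   $1       = final output file
#   $2 ...   = resolver command and its arguments
resolve_to() {
    out=$1
    shift
    tmp="$out.tmp"
    if "$@" > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        echo "XInclude resolution failed; $out was not updated" >&2
        return 1
    fi
}
```

To use it, put resolve_to resolved.xml in front of the java command that runs com.elharo.xml.xinclude.SAXXIncluder, drop the output redirection from that command, and run the Saxon step only if resolve_to succeeds, for example by joining the two commands with &&.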