Chapter 23. Olinking between documents

When writing technical documentation, it is often necessary to cross reference to other information. When that other information is in the current document, then DocBook provides support with the xref and link elements. But if the information is in another document, you cannot use those elements because their linkend attribute value must point to an id attribute value that is in the current document.

The olink element is the equivalent for linking outside the current document. It has an attribute for specifying a document identifier (targetdoc) as well as the id of the target element (targetptr). The combination of those two attributes provides a unique identifier for locating a cross reference target. These attributes on olink are available starting with the DocBook XML DTD version 4.2.

Note

The olink element has another set of attributes that support an older style of cross referencing using system entities. Those other olink attributes are targetdocent, linkmode, and localinfo. Those attributes are not used in the olink mechanism described here.

How are external cross references resolved? Resolving internal cross references is easy by comparison: when a document is parsed, it is loaded into memory and all of its linkends can be connected to ids within memory. But external documents are not loaded into memory, so there must be another mechanism for resolving olinks. The simplest mechanism would be to open each external document, find the target id, and resolve the cross reference. But such a mechanism would not scale well, because it would require parsing a potentially large document to find one target, and then repeating that for every olink you have. A more efficient mechanism would parse each document once and save the cross reference target information in a separate target database that can be loaded into memory for quick lookup.

The DocBook XSL stylesheets use such an external cross reference database to resolve olinks. You first process all of your documents in a mode that collects the target information, and then you can process them in the normal mode to produce HTML or print output. The different processing mode is controlled using XSL stylesheet parameters.

How to link between documents

To use olinks to form cross references between documents, you have to spend a little time setting up your files so they can find each other's information. This section describes how to do that. The first four of these six steps are performed only once, after which only the last two steps are required to process your documents as needed. This procedure covers olinking for HTML output. A later section describes the differences for linking in PDF output.

Using olink

Identify the documents
Decide which documents are to be included in the domain for cross referencing, and assign a document id to each. A document id is a name string that is unique for each document in your collection. Your naming scheme can be as simple or elaborate as your needs require.
For example, you might be writing mail agent documentation that includes a user's guide, an administrator's guide, and a reference document. These could be assigned simple document ids such as ug, ag, and ref, respectively. But if you expect to also cross reference to other user guides, you might need to be more specific, such as MailUserGuide, MailAdminGuide, and MailReference.
You can add new documents to a collection at any time. You can also have more than one collection, each of which defines a domain of documents among which you can cross reference. A given document can be in more than one collection.
Add olinks to your documents
Insert an olink element where you want to form a cross reference to another document. You supply two attributes in each olink: targetptr is the id value of the element you are pointing to, and targetdoc is the document id of the document that contains the element.
For example, the Mail Administrator's Guide might have a chapter on user accounts like this:
```
<chapter id="user_accounts">
<title>Administering User Accounts</title>
<para>blah blah</para>
...
```
You can form a cross reference to that chapter in the Admin Guide by adding an olink in the User's Guide like this:
```
You may need to update your
<olink targetdoc="MailAdminGuide" targetptr="user_accounts">user accounts
</olink>
when you get a new machine.
```
When the User's Guide is processed into HTML, the text user accounts will become a hot spot that links to the Admin Guide.
If instead you create an empty olink element with the same attributes, then the hot text will be generated by the stylesheet from the title in the other document. In this example, the hot text would be Administering User Accounts. This has the advantage of being automatically updated when the title in the Admin Guide is updated.
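As a minimal sketch of that empty form, the following reuses the document id and target id from the example above. The surrounding sentence is reworded here only so that the generated title reads naturally in context.
```xml
For details, see
<olink targetdoc="MailAdminGuide" targetptr="user_accounts"/>
in the Admin Guide.
```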
Decide on your HTML output hierarchy
To form cross references between documents in HTML, their relative locations must be known. Generally, the HTML files for multiple documents are output to different directories, particularly if chunking is used. So before going any further, you must decide on the names and arrangement of the HTML output directories for all the documents in your collection.
Here are the output directories for our example docs:
```
documentation
    |
    |-- guides
    |      |-- mailuser      contains MailUserGuide files
    |      |-- mailadmin     contains MailAdminGuide files
    | 
    |-- reference
           |-- mailref       contains MailReference files
```
It is only the relative location that counts; the top level name is not used. The stylesheet will compute the relative path for cross reference URLs using the relative locations.

Create the target database document

Each collection of documents has a master target database document that is used to resolve all olinks in that collection. The target database document is an XML file that is created once, by hand. It provides a framework that pulls in the target data for each of the documents in the collection. Since all the document data is pulled in dynamically, the database document itself is static, except for changes to the collection.

The following is an example target database document named olinkdb.xml. It structures the documents in the collection into a sitemap element that provides the relative locations of the outputs for HTML. Then it pulls in the individual target data using system entity references to the files generated in step 5 below.

Example 23.1. Target database document

<?xml version="1.0" encoding="utf-8"?> 
<!DOCTYPE targetset 
       SYSTEM "file:///tools/docbook-xsl/common/targetdatabase.dtd" [
<!ENTITY ugtargets SYSTEM "file:///doc/userguide/target.db"> 
<!ENTITY agtargets SYSTEM "file:///doc/adminguide/target.db">
<!ENTITY reftargets SYSTEM "file:///doc/man/target.db">
]>
<targetset> 
  <targetsetinfo> 
    Description of this target database document,
    which is for the examples in olink doc.
  </targetsetinfo>

  <!-- Site map for generating relative paths between documents -->
  <sitemap> 
    <dir name="documentation"> 
      <dir name="guides"> 
        <dir name="mailuser"> 
          <document targetdoc="MailUserGuide"    
                    baseuri="userguide.html"> 
            &ugtargets; 
          </document>
        </dir>
        <dir name="mailadmin">
          <document targetdoc="MailAdminGuide">
            &agtargets;
          </document>
        </dir>
      </dir>
      <dir name="reference">
        <dir name="mailref">
          <document targetdoc="MailReference">
            &reftargets;
          </document>
        </dir>
      </dir>
    </dir>
  </sitemap>
</targetset>

	Set the encoding of the database document to `utf-8`, regardless of what encoding your documents are written in. The individual data files are written out in `utf-8`, so a database can contain mixed languages without mixing encodings.
	Declare a system entity for each document target data file. This assigns a path to the `target.db` file for each document in the collection.
	Root element for the database is `targetset`.
	The `targetsetinfo` element is optional, and contains a description of the collection.
	The `sitemap` element contains the framework for the hierarchy of HTML output directories.
	Directory that contains all the HTML output directories.
	Directory that contains only other directories, not documents.
	Directory that contains the output of one or more documents.
	The `document` element has the document identifier in its `targetdoc` attribute.
	For documents processed without chunking, the output filename must be provided in the `baseuri` attribute since that name is not generated by the document itself. Then cross references can be resolved using the form `filename.html#targetptr`. An alternative process is to leave off the `baseuri` attribute and instead set the `olink.base.uri` parameter to the HTML filename when you generate its `target.db` file. That lets you set the filename at runtime.
	The system entity reference pulls in the target data for this document.

When this document is processed, the content of the target.db file is pulled into its proper location in the hierarchy using its system entity reference, thus forming the complete cross reference database. That makes all the information available to the XSL stylesheets to look up olink references and resolve them using the information in the database.

The use of system entities permits the individual target.db data files for each document to be updated as needed, and the database automatically gets the update the next time it is processed.

System entities also permit the use of XML catalogs to resolve the location of the various data files.
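
As a sketch of the catalog approach, an OASIS XML catalog entry can remap a system identifier declared in the database document to wherever the data file actually resides. The `/build/...` location below is a hypothetical example, and this assumes your XSL processor has been configured to use the catalog file.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the system identifier declared in olinkdb.xml to the real file. -->
  <!-- The /build/... path is a hypothetical location for illustration. -->
  <system systemId="file:///doc/userguide/target.db"
          uri="file:///build/userguide/target.db"/>
</catalog>
```

With an entry like this, the entity declarations in the database document can stay unchanged when the data files move.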

Generate target data files
For each document in your collection, you generate a data file that contains all the potential cross reference targets. You do that by processing the document using your regular DocBook XSL stylesheet but with an additional collect.xref.targets parameter. The following is an example command.
```
xsltproc  \
    --stringparam  collect.xref.targets  "only"  \
    docbook.xsl  \
    userguide.xml
```
This command should generate in the current directory a target data file, named target.db by default. You can change the filename by setting the parameter targets.filename. The generated file is an XML file that contains only the information needed to form cross references to each element in the document.
The DocBook XSL stylesheets contain the code needed to generate the target data file. The parameter collect.xref.targets controls when that code is applied, and has three possible values.
no
Don't generate the target data file (this is the default). Use this setting when you want to process just your document for output without first regenerating the target data file. This is the default because any documents without olinks don't need to do this extra processing step.
yes
Generate the target data file, and then process the document for output. Use this setting when you change your document and want to regenerate both the target data file and the output.
only
Generate the target data file, but don't process the document for output. Use this setting when you want to update the target data file for use by other documents, or when you set things up for the first time.
In the command example above, docbook.xsl should be the pathname to the DocBook stylesheet file you normally use to process your document for HTML output. For example, that might be:
```
../html/docbook.xsl
```
If you use the DocBook chunking feature, then it would be the path to chunk.xsl instead. If you use a DocBook XSL customization file, then it should be the pathname to that file. It will work if your customization file imports either docbook.xsl or chunk.xsl, and it will pick up whatever customizations you have for cross reference text. If you use different stylesheet variations for different documents, be sure to use the right one for each document. For example, you might use chunking on some long documents, but not on short documents. Use Makefiles or batch files to keep it all consistent.
If you are processing your document for print, then generate the target.db file using the HTML stylesheet, and then process your document with the FO stylesheet.
Process each document for output
Now all that remains is to process each document to generate its output. That's done using the normal XSL DocBook stylesheet with an additional parameter, the database filename. The DocBook XSL stylesheets know how to resolve olinks using the target database.
The following are command examples for three XSL processors:
```
# xsltproc:
xsltproc  --output /http/guides/mailuser/userguide.html \
   --stringparam target.database.document "olinkdb.xml" \
   --stringparam current.docid "MailUserGuide" \
   docbook.xsl  userguide.xml

# Saxon:
java com.icl.saxon.StyleSheet -o /http/guides/mailuser/userguide.html \
         userguide.xml  docbook.xsl \
         target.database.document="/projects/mail/olinkdb.xml" \
         current.docid="MailUserGuide"

# Xalan:
java org.apache.xalan.xslt.Process \
         -OUT /http/guides/mailuser/userguide.html  \
         -IN userguide.xml \
         -XSL  docbook.xsl \
         -PARAM target.database.document "/projects/mail/olinkdb.xml" \
         -PARAM current.docid "MailUserGuide"
```
The only difference from the normal document processing is the addition of the two parameters. The target.database.document parameter provides the location of the master target database file. As your document is processed, when the stylesheet encounters an olink that has targetdoc and targetptr attributes, it looks up the values in the target database and resolves the reference. If it cannot open the database or find a particular olink reference, then it reports an error.
The other parameter current.docid informs the processor of the current document's targetdoc identifier. That lets the stylesheet compute relative pathname references based on the sitemap in the master database document. The current document's identifier is not recorded in the document itself, so the processor must be told of it by using this parameter.
Note
If you specify a relative path in the target.database.document parameter, it is taken as relative to the document you are processing. You can also use a full path or an XML catalog to locate the file.
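If you would rather not repeat the two parameters on every command line, one possible alternative is to set them in a stylesheet customization file. The following is a sketch only: it assumes one customization file per document (because current.docid differs for each), and the import path must be adjusted to match your installation.
```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Import the stock HTML stylesheet; adjust the path as needed. -->
  <xsl:import href="../html/docbook.xsl"/>
  <!-- Location of the master target database document. -->
  <xsl:param name="target.database.document">/projects/mail/olinkdb.xml</xsl:param>
  <!-- Document id of the one document processed with this file. -->
  <xsl:param name="current.docid">MailUserGuide</xsl:param>
</xsl:stylesheet>
```
Values passed on the command line still take precedence over these defaults.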

Example of target data

The following is an example of target data collected for a short document. The document it was extracted from consists of a chapter that contains just a table and one sect1.

Example 23.2. Olink target data

<?xml version="1.0" ?> 
<div element="chapter" href="#publish" number="1" targetptr="publish"> 
  <ttl>Publishing DocBook Documents</ttl> 
  <xreftext>Chapter 1</xreftext> 
  <obj element="table" href="#xsl-processors" number="1.1" 
       targetptr="xsl-processors"> 
    <ttl>XSL Processors</ttl>
    <xreftext>Table 1.1</xreftext>
  </obj>
  <div element="sect1" href="#xsl-arch" number="" targetptr="xsl-arch">
    <ttl>DocBook XSL Architecture</ttl>
    <xreftext>the section called “DocBook XSL Architecture”</xreftext>
  </div>
</div>

	It is a well-formed XML fragment that follows the `targetdatabase.dtd` DTD. However, because the file may be used as a system entity, it should not have a DOCTYPE declaration.
	DocBook structure elements are recorded in `div` tags. Some information is stored in attributes and other information in child elements. Attributes record the element's name, generated number, id value (as `targetptr`), and potential href fragment.
	The `ttl` tag records the object's title.
	The `xreftext` tag records the generated text that would be output if an `xref` pointed to that element. This field uses the gentext strings of the stylesheet that generates the target data file. The exception is when a target element includes an `xreflabel` attribute, which overrides the gentext string as it would for an `xref`.
	Non-structural elements like tables and figures are recorded in `obj` (object) tags. The `div` elements can nest, but the `obj` elements do not.
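
To illustrate the `xreflabel` exception described above, a target element might carry its own label as in the following sketch. The label text is a made-up example; with it in place, the `xreftext` recorded for this chapter would be the `xreflabel` string rather than the generated chapter text.

```xml
<!-- The xreflabel value is a hypothetical illustration. -->
<chapter id="user_accounts" xreflabel="the user accounts chapter">
  <title>Administering User Accounts</title>
  <para>blah blah</para>
</chapter>
```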

Similar data files are generated for other documents in your collection. These separate data files are assembled into one large target database by pulling them in as system entities to a master database document. See Example 23.1, “Target database document” to see how these data files are inserted into document elements within the master file. Keeping them as separate system entities means they can be individually updated as needed. Yet they are all accessible from a single master document.

For the database to work, all of the system entities referenced in it must be available when processing takes place. A missing data file will be reported as an error, and any olinks to that document will not resolve. If a set of linked documents has a definite publishing date, you can freeze a copy of the database as a snapshot of the released documents for future documents to reference. If you replace the system entity references with the actual data for each document, you can save it as one big file.
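
As a rough sketch of such a frozen snapshot, the entity reference inside a document element is replaced by the target data itself, so the entity declarations are no longer needed. The data shown here is borrowed from Example 23.2 purely as a placeholder and is abbreviated; a real snapshot would contain the complete data for every document in the collection.

```xml
<?xml version="1.0" encoding="utf-8"?>
<targetset>
  <sitemap>
    <dir name="documentation">
      <dir name="guides">
        <dir name="mailuser">
          <document targetdoc="MailUserGuide" baseuri="userguide.html">
            <!-- Target data pasted inline in place of &ugtargets; -->
            <!-- (placeholder content taken from Example 23.2). -->
            <div element="chapter" href="#publish" number="1"
                 targetptr="publish">
              <ttl>Publishing DocBook Documents</ttl>
              <xreftext>Chapter 1</xreftext>
            </div>
          </document>
        </dir>
      </dir>
    </dir>
  </sitemap>
</targetset>
```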

