Language support

The DocBook XSL stylesheets support documents written in many languages. This support is made easier by the fact that XML itself supports Unicode, which includes characters for most of the world's languages. To write a DocBook document in a given language, you identify a character encoding that can express the language, and then declare that encoding in the XML declaration at the top of each XML file, such as <?xml version="1.0" encoding="iso-8859-1"?>. You write the text of your document using that character encoding, and you use the standard DocBook tags (which have English names) to mark up the XML elements. Then you just have to make sure the XSLT processor you use supports your encoding.
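
As a minimal sketch, a Spanish document saved in ISO-8859-1 might begin as follows. The article content is invented for illustration, and the encoding named in the declaration has to match the encoding the file is actually saved in.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Illustrative content only; save this file as ISO-8859-1 -->
<article>
  <title>Guía rápida</title>
  <para>Este artículo está escrito en español.</para>
</article>
```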

Using the lang attribute

The preferred method of indicating language is to add a lang attribute with a language code value, usually on the document root element. This method records the language within the document itself, so it is clear to anyone examining the document. The attribute also triggers automatic processing in that language by the stylesheets, so you don't have to indicate the language on the processing command line.

Since lang is one of the common DocBook attributes, it is permissible for all DocBook elements. The attribute applies to the element it is in, and all of that element's descendants. If one of the descendants has a different lang attribute, then it overrides the ancestor's value for the scope of that descendant. For example, if a document's root element is book, you can put a lang attribute in the book start tag so it applies to the whole document. If one of your chapters is written in a different language, then it can have a lang attribute whose value applies only to that chapter. The following example illustrates this usage.

<book lang="de">
  ...
  <chapter>
    <title>Profil verwalten</title>
    ...
  </chapter>
  <chapter lang="en">
    <title>Special Features</title>
    ...
  </chapter>
  <chapter>
    <title>Junk-E-Mails vermeiden</title>
    ...
  </chapter>
</book>

In this example, the document root element sets lang to de (German) for the whole document, so the chapters Profil verwalten and Junk-E-Mails vermeiden are processed as German. The Special Features chapter has its own lang set to en (English), so that chapter is processed as English. Its label will be Chapter on the chapter title page, in the book's table of contents, and in any cross references to that chapter.
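
Because lang is allowed on any element, the same override works below the chapter level. The following fragment is a hypothetical sketch, not taken from a real document: an English article contains one admonition marked as German, so any text the stylesheets generate for that element should be looked up in German while the rest of the article stays English.

```xml
<article lang="en">
  <title>Managing Your Profile</title>
  <para>This paragraph inherits lang="en" from the article.</para>
  <!-- Hypothetical example: only this element and its descendants are German -->
  <important lang="de">
    <para>Bitte speichern Sie Ihre Änderungen.</para>
  </important>
  <para>This paragraph is processed as English again.</para>
</article>
```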

Using language parameters

You can also indicate the language of a document at processing time by using a stylesheet parameter set to a language code. This is useful if you are processing a document that doesn't have a lang attribute and you cannot edit it to add one, or if you want to override the attribute it does have. There are two stylesheet parameters that can be used to set the processing language:

  • The parameter l10n.gentext.language will override any lang attribute set in the document. This parameter is needed only if the document is written in a single language other than English, and one of the following conditions applies:

    • It does not have a lang attribute.

    • The lang attribute it does have is wrong.

    • The lang attribute it does have is not one of those supported by the stylesheets.

  • The parameter l10n.gentext.default.language can be used in the same circumstances as the previous parameter, but it won't override any lang attributes in the document. It applies only to those elements for which no lang attribute applies. Thus if there is a lang attribute on the document's root element, the parameter has no effect.
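
One way to set these parameters is in a stylesheet customization layer, sketched below. The import URL and the choice of fr are assumptions for illustration; point the import at your own copy of the stylesheets. The same parameters can instead be passed on the command line, using whatever syntax your XSLT processor provides for string parameters.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Assumed location of the stock HTML stylesheet; adjust to your installation -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Used only for elements to which no lang attribute applies -->
  <xsl:param name="l10n.gentext.default.language">fr</xsl:param>

  <!-- Uncomment to force French even where lang attributes are present -->
  <!-- <xsl:param name="l10n.gentext.language">fr</xsl:param> -->

</xsl:stylesheet>
```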

If you are wondering about the names of these parameters, you probably don't recognize the odd abbreviation l10n, which is a lower case L followed by the number 10 and the letter n. This is an abbreviation of “localization” (the first and last letters, with 10 letters in between). It means the gentext strings are adapted to a particular locale in the world. This abbreviation is similar to i18n, which is an abbreviation for “internationalization”.

Language codes

As of this writing, DocBook XSL supports 45 languages. That means it has translations for the generated text strings in 45 languages. The translations are stored in XML files named for the language code, such as en.xml, fr.xml, etc. These are stored in the common subdirectory of the stylesheet distribution. So if you want to check if a given language is supported, look in that directory for an XML file of that name. The top of each file looks like this:

<?xml version="1.0" encoding="US-ASCII"?>
<l:l10n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0" 
        language="it"
        english-language-name="Italian">

The language attribute identifies the language code. It is this attribute value that the stylesheet uses to match to a lang attribute in a document. The filename just happens to have the same name. The english-language-name attribute gives the language name in English for each language.

Most of the language codes are two letters long, taken from the ISO 639 standard. A few have variations to reflect how a given language is used in a particular country. For example, the pt_br code is for Portuguese as spoken in Brazil. The country codes that are used in the second part of the name are listed in the ISO 3166-1 alpha-2 standard.

When you specify a language code for your document in an attribute or parameter, you can use upper- or lower-case letters. If it has a country extension, you can use either dash or underscore as the separator. In all these cases the stylesheets will map the code to the supported value.

If you specify a country extension, and there is no translation for that extension, the stylesheet will fall back to using just the two-letter language code. If a two-letter code is not supported, then the stylesheets fall back to English.
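
The mapping and fallback rules above can be sketched with a few lang values. This is an illustrative fragment only: fr_xx and zz are made-up codes chosen to show the fallback behavior, not real entries in the stylesheet distribution.

```xml
<book lang="en">
  <!-- These three spellings should all map to the supported pt_br code -->
  <chapter lang="pt_br"><title>...</title></chapter>
  <chapter lang="PT-BR"><title>...</title></chapter>
  <chapter lang="pt-br"><title>...</title></chapter>

  <!-- Made-up country extension with no translation: falls back to fr -->
  <chapter lang="fr_xx"><title>...</title></chapter>

  <!-- Made-up two-letter code with no translation: falls back to English -->
  <chapter lang="zz"><title>...</title></chapter>
</book>
```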

Extending the set of languages

In theory, DocBook XSL can support any language that can be expressed in Unicode. In practice, only 45 languages have translated text strings that the stylesheets can access. If you need a language that is not currently available, then you can make the translations and add them to your stylesheets. You should copy the English file common/en.xml to a new language code XML file, and then translate the text attributes in the file. The translations should use Unicode numerical character references for any non-ASCII characters.
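
As a rough sketch, a new language file started from en.xml might look like the following. The fy code matches the example used later in this section. The two entries shown are assumed to correspond to entries in en.xml, and the translated strings are illustrative rather than verified translations, so copy the real keys from your own en.xml. Note the numeric character references for the non-ASCII characters.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<!-- Sketch of a new fy.xml; keys assumed from en.xml, translations illustrative -->
<l:l10n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0"
        language="fy"
        english-language-name="Frisian">

  <!-- English original: text="Table of Contents" -->
  <l:gentext key="TableofContents" text="Ynh&#226;ld"/>

  <l:context name="title">
    <!-- English original: text="Chapter&#160;%n.&#160;%t" -->
    <l:template name="chapter" text="Haadstik&#160;%n.&#160;%t"/>
  </l:context>

  <!-- Remaining entries copied from en.xml and translated the same way -->
</l:l10n>
```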

The easiest way to add a new language to the stylesheets is to submit your translation to the DocBook XSL project for integration into the next release. Send email to the project admins at the DocBook SourceForge site. Then your new translation will be included in future stylesheet distributions. It also makes it available to other users, who can make contributions to it as well.

If you want to include your translation only in your own stylesheet, you need to do the following:

  1. Copy the stylesheet file common/l10n.xml to a new filename, such as common/my-l10n.xml. It is best to keep it in the same directory because it references all the other language files in that directory.

  2. Edit your new file to add a SYSTEM entity declaration to the DOCTYPE and an entity reference to the body of your copied file. Just copy similar lines from the file itself. The entity declaration should point to your new language file location, relative to the common directory.

    <!ENTITY fy SYSTEM "../mystuff/fy.xml">
    ...
    &fy;
    
  3. Create a stylesheet customization layer if you don't already have one.

  4. Add the following line to your customization file:

    <xsl:param name="l10n.xml"
         select="document('../mystuff/my-l10n.xml')"/>
    

    The path to your enhanced my-l10n.xml file should be relative to your stylesheet customization file.

The document() function loads your customized file into the stylesheet parameter l10n.xml. That parameter is searched when looking for a translation.
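
Put together, steps 3 and 4 might produce a customization layer like the sketch below. The import URL is an assumption for illustration, and the path in the document() call is the one used in step 4; adjust both to match where your files actually live.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Assumed location of the stock HTML stylesheet; adjust to your installation -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Replace the default l10n.xml data with the copy that includes the new language.
       The path is resolved relative to this customization file. -->
  <xsl:param name="l10n.xml"
             select="document('../mystuff/my-l10n.xml')"/>

</xsl:stylesheet>
```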

This arrangement is a bit awkward, and will need to be repeated with each new stylesheet release. It's best to complete the translation and submit it to the DocBook project.