The DocBook XSL stylesheets support documents written in many languages. This support is made easier by the fact that XML itself supports Unicode, which includes characters for most of the world's languages. To write a DocBook document in a given language, you just have to identify a character encoding that expresses the language, and then indicate that character encoding in the XML declaration that must appear at the top of each XML file, such as
<?xml version="1.0" encoding="iso-8859-1"?>. You write the text of your document using that character encoding, and you use the standard DocBook tags (which have English names) to mark the XML elements. Then you just have to make sure the XSLT processor you use supports your encoding.
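As an illustration, a short chapter file saved as UTF-8 might begin like the following sketch. The DocBook XML V4.4 DOCTYPE is an assumption (use the version your documents actually target), and the German content is invented sample text. Nothing in this file tells the stylesheets that the content is German. That information has to come from a lang attribute or a stylesheet parameter.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The encoding named above must match how the file is actually saved. -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<chapter>
  <!-- Element names stay in English; only the text content is German. -->
  <title>Junk-E-Mails vermeiden</title>
  <para>Übersicht der Filteroptionen.</para>
</chapter>
```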
The language support in the DocBook XSL stylesheets is primarily for generated text that the stylesheets produce. For example, an English document should label a chapter with “Chapter 3”, while a German document's chapter should be labeled “Kapitel 3”.
The XML document encoding does not tell the stylesheets what language the document is written in. You have to supply that information with either a lang attribute in the document or a stylesheet parameter at processing time.
Indexing in DocBook XSL does not sort properly for non-English languages. But there is a customization available that does sort properly. See the section “Internationalized indexes”.
The preferred method of indicating language is to add a lang attribute with a language code value, usually on the document's root element. This method records the language within the document itself, so it is clear to anyone examining the document. Also, the attribute triggers automatic processing in that language by the stylesheets, so you don't have to indicate the language on the processing command line.
Because lang is one of the common DocBook attributes, it is permitted on all DocBook elements. The attribute applies to the element it is in and to all of that element's descendants. If one of the descendants has a different lang attribute, then that value overrides the ancestor's value within the scope of that descendant. For example, if a document's root element is book, you can put a lang attribute in the book start tag so it applies to the whole document. If one of your chapters is written in a different language, then it can have a lang attribute whose value applies only to that chapter. The following example illustrates this usage.
<book lang="de">
  ...
  <chapter>
    <title>Profil verwalten</title>
    ...
  </chapter>
  <chapter lang="en">
    <title>Special Features</title>
    ...
  </chapter>
  <chapter>
    <title>Junk-E-Mails vermeiden</title>
    ...
  </chapter>
</book>
In this example, the document root element sets lang to de (German) for the whole document, so the chapters “Profil verwalten” and “Junk-E-Mails vermeiden” are processed as German. But the “Special Features” chapter has its own lang set to en (English), so the second chapter is processed as English. Its label will be “Chapter” on the chapter title page, in the book's table of contents, and in any cross references to that chapter.
You can also indicate the language of a document at processing time by using a stylesheet parameter set to a language code. This is useful if you are processing a document that doesn't have a lang attribute and you cannot edit it to add one, or if you want to override the attribute it does have. There are two stylesheet parameters that can be used to set the processing language:
l10n.gentext.language
This parameter overrides any lang attribute set in the document. It is needed only if the document is written in a single language other than English, and one of the following conditions applies:
The document does not have a lang attribute.
The lang attribute it does have is wrong.
The lang attribute it does have is not one of those supported by the stylesheets.
l10n.gentext.default.language
This parameter can be used in the same circumstances as the l10n.gentext.language parameter, but it does not override any lang attributes in the document. It applies only to those elements for which no lang attribute applies. Thus, if there is a lang attribute on the document's root element, this parameter has no effect.
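A minimal customization layer that sets one of these parameters might look like the following sketch. The import path is an assumption about where the stylesheets are installed, and de is only an example value. Note the inner single quotes in the select attribute, which make the value an XPath string literal rather than an element name.

```xml
<?xml version="1.0"?>
<!-- Hypothetical customization layer. The import path is an assumption;
     point it at the html/docbook.xsl file in your own installation. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:import href="docbook-xsl/html/docbook.xsl"/>

  <!-- Use German generated text wherever no lang attribute applies.
       A lang attribute anywhere in the document still takes precedence. -->
  <xsl:param name="l10n.gentext.default.language" select="'de'"/>

  <!-- To force German everywhere, ignoring all lang attributes,
       set this parameter instead:
  <xsl:param name="l10n.gentext.language" select="'de'"/>
  -->

</xsl:stylesheet>
```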
If you are wondering about the names of these parameters, you probably don't recognize the odd abbreviation l10n, which is a lowercase L followed by the number 10 and the letter n. It is an abbreviation of “localization” (the first and last letters, with the 10 letters in between replaced by their count). It means the gentext strings are adapted to a particular locale in the world. The abbreviation is similar to i18n, which stands for “internationalization”.
As of this writing, DocBook XSL supports 45 languages, meaning it has translations of the generated text strings in 45 languages. The translations are stored in XML files named for the language code, such as fr.xml for French. These files are in the common subdirectory of the stylesheet distribution, so to check whether a given language is supported, look in that directory for an XML file of that name. The top of each file looks like this:
<?xml version="1.0" encoding="US-ASCII"?>
<l:l10n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0"
        language="it"
        english-language-name="Italian">
The language attribute identifies the language code. It is this attribute value that the stylesheets match against a lang attribute in a document; the filename just happens to be the same. The english-language-name attribute gives the name of the language in English.
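Below the root element, each file lists the generated text strings. The following abridged excerpt is written in the style of common/en.xml to show the general shape. It is illustrative rather than copied from the file: the exact keys, contexts, and spacing vary between releases, so treat the file in your own distribution as authoritative.

```xml
<l:l10n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0"
        language="en" english-language-name="English">

  <!-- Simple generated words, looked up by their key. -->
  <l:gentext key="Chapter" text="Chapter"/>
  <l:gentext key="Appendix" text="Appendix"/>

  <!-- Templates used in a particular context; %n stands for the
       element's number and %t for its title. -->
  <l:context name="title-numbered">
    <l:template name="chapter" text="Chapter %n. %t"/>
  </l:context>

</l:l10n>
```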
Most of the language codes are two letters, taken from the ISO 639 standard. A few have variations that reflect how a language is used in a particular country. For example, the pt_br language is for Portuguese as spoken in Brazil. The country codes used in the second part of the name are listed in the ISO 3166 alpha-2 standard.
When you specify a language code for your document in an attribute or parameter, you can use uppercase or lowercase letters. If the code has a country extension, you can use either a hyphen or an underscore as the separator. In all these cases, the stylesheets map the code to the supported value.
If you specify a country extension, and there is no translation for that extension, the stylesheet will fall back to using just the two-letter language code. If a two-letter code is not supported, then the stylesheets fall back to English.
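To illustrate the mapping and fallback rules, the following sketch marks up a book in Brazilian Portuguese that contains one chapter tagged as Canadian French. It assumes, for illustration only, that your stylesheet release has a pt_br translation but no fr_ca file; check the common directory to see what your release actually contains.

```xml
<!-- pt-BR is mapped to the supported pt_br translation; PT_BR or
     pt_br would select the same file. -->
<book lang="pt-BR">
  <title>Exemplo</title>
  <!-- Assuming no fr_ca file exists, generated text in this chapter
       falls back to fr (French). An unsupported two-letter code
       would fall back to English instead. -->
  <chapter lang="fr-CA">
    <title>Introduction</title>
    <para>Texte d'exemple.</para>
  </chapter>
</book>
```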
In theory, DocBook XSL can support any language that can be expressed in Unicode. In practice, only 45 languages have translated text strings that the stylesheets can access. If you need a language that is not currently available, you can make the translations and add them to your stylesheets. Copy the English file common/en.xml to a new XML file named for the language code, and then translate the text attributes in the file. The translations should use Unicode numerical character references for any non-ASCII characters.
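The start of such a file might look like the following sketch. The fy code, file location, and English language name are hypothetical, and the text value is a placeholder rather than a real translation. Because the file is declared as US-ASCII, the comment describes the accented character instead of containing it, using the Spanish word Capítulo purely as an example of the character-reference notation.

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<!-- Hypothetical common/fy.xml, copied from common/en.xml. On the root
     element, change the language code and the English language name. -->
<l:l10n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0"
        language="fy" english-language-name="Frisian">

  <!-- In en.xml this entry reads text="Chapter". Keep the key in
       English and replace only the text value with the translation
       (the value below is a placeholder, not a real translation). -->
  <l:gentext key="Chapter" text="TRANSLATED-CHAPTER-WORD"/>

  <!-- Write any non-ASCII character as a numeric character reference.
       An i with an acute accent is U+00ED, decimal 237, so the Spanish
       word Capitulo with an accented first i is entered as Cap&#237;tulo. -->

  <!-- The remaining entries copied from en.xml follow here. -->
</l:l10n>
```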
The easiest way to add a new language to the stylesheets is to submit your translation to the DocBook XSL project for integration into the next release. Send email to the project admins at the DocBook SourceForge site. Your new translation will then be included in future stylesheet distributions, and it becomes available to other users, who can contribute improvements to it as well.
If you want to include your translation only in your own stylesheet, you need to do the following:
Copy the stylesheet file common/l10n.xml to a new filename, such as common/my-l10n.xml. It is best to keep it in the same directory, because it references all the other language files in that directory.
Edit your new file to add a SYSTEM entity declaration to the DOCTYPE and an entity reference to the body of the file. Just copy similar lines from the file itself. The entity declaration should point to your new language file's location, relative to the copied file in the common directory:
<!ENTITY fy SYSTEM "../mystuff/fy.xml">
...
&fy;
Create a stylesheet customization layer if you don't already have one.
Add the following line to your customization file:
<xsl:param name="l10n.xml" select="document('../mystuff/my-l10n.xml')"/>
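The path to your enhanced my-l10n.xml file should be relative to your stylesheet customization file; adjust it to point to wherever you saved the file in the first step. The document() function loads your customized file into the stylesheet parameter l10n.xml. That parameter is searched when the stylesheets look for the translation that matches the processing language.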
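Put together, a complete customization layer for these steps might look like the following sketch. It assumes a particular layout: the mystuff directory that holds fy.xml sits inside the stylesheet distribution next to common (which is what the entity path ../mystuff/fy.xml, resolved from common, implies), and the customization file is saved in that same mystuff directory. With that layout, both paths below are relative to the customization file; change them if your files live elsewhere.

```xml
<?xml version="1.0"?>
<!-- Hypothetical customization layer, saved in the mystuff directory
     inside the stylesheet distribution (an assumed layout). -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Start from the stock HTML stylesheet; xsl:import must come first. -->
  <xsl:import href="../html/docbook.xsl"/>

  <!-- Replace the default language table with the extended copy made
       in the first step (common/my-l10n.xml). The relative path is
       resolved against this customization file. -->
  <xsl:param name="l10n.xml" select="document('../common/my-l10n.xml')"/>

</xsl:stylesheet>
```

Keeping my-l10n.xml in common means its existing entity references to the other language files continue to resolve without changes.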
This arrangement is a bit awkward, and will need to be repeated with each new stylesheet release. It's best to complete the translation and submit it to the DocBook project.
DocBook XSL: The Complete Guide - 3rd Edition
Copyright © 2002-2005 Sagehill Enterprises