This section digs a little deeper into the issues behind UI aesthetics and principles, to provide some background on the underlying encoding of documents in the XPFE framework. Most of it is a discussion of Unicode: what Unicode is, how Mozilla uses it, and some practical conversion utilities to ensure that your files are in the correct encoding.
Unicode is a broad topic and we cannot hope to give you anywhere near a full understanding of what it is. However, a brief introduction will highlight its importance in the software world and show how it is used as one of the internationalization cornerstones in the Mozilla project.
For more in-depth information, refer to the book The Unicode Standard, Version 3.0 by the Unicode Consortium, published by Addison Wesley Longman. Another useful reference is Unicode: A Primer by Tony Graham, published by M&T Books.
Unicode is an encoding system that represents every character with a unique number. The standard came about when multiple encoding systems were merged, once it became clear that keeping separate systems was hindering global communication and preventing applications from exchanging information with one another successfully. Now all major systems and applications are standardizing on Unicode. Most major operating systems, such as Windows, AIX, Solaris, and Mac OS, have already adopted it. The latest browsers, including Mozilla, support it. This quote from the Unicode Consortium (http://www.unicode.org/unicode/standard/WhatIsUnicode.html) sums it up best:
Incorporating Unicode into client-server or multi-tiered applications and web sites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single web site to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.
There are seven character-encoding schemes in Unicode: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE. UTF is an abbreviation for Unicode Transformation Format. The schemes are built on code units of 8 bits (UTF-8), 16 bits (UTF-16), or 32 bits (UTF-32). A single character occupies one to four bytes in UTF-8, two or four bytes in UTF-16, and always four bytes in UTF-32.
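As a rough illustration of how these schemes differ in size, the following Python sketch encodes a few sample characters with each of the seven schemes and counts the resulting bytes. The function name `encoded_sizes` and the sample characters are chosen here for illustration only. Note that Python's generic `utf-16` and `utf-32` codecs prepend a byte order mark, so they report two or four bytes more than the BE and LE variants.

```python
# The seven Unicode encoding schemes, using Python's codec names.
SCHEMES = [
    "utf-8",
    "utf-16", "utf-16-be", "utf-16-le",
    "utf-32", "utf-32-be", "utf-32-le",
]


def encoded_sizes(char):
    """Return a dict mapping each scheme to the byte length of `char`.

    Hypothetical helper for illustration. The plain "utf-16" and
    "utf-32" codecs include a byte order mark in their output.
    """
    return {scheme: len(char.encode(scheme)) for scheme in SCHEMES}


if __name__ == "__main__":
    # ASCII letter, Central European letter, euro sign, and a
    # character outside the Basic Multilingual Plane.
    for sample in ["A", "\u0142", "\u20ac", "\U0001d11e"]:
        print(repr(sample), encoded_sizes(sample))
```

Running the script prints one line per sample character, which makes it easy to see that UTF-8 is compact for ASCII text while UTF-32 uses a fixed four bytes per character.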
One of Unicode's core principles is that it be able to handle any character set and that clients supporting it provide the tools necessary to convert. This conversion can be from Unicode to native character sets and vice versa. The number of native character sets is extensive and ranges from Central European (ISO-8859-2) to Thai (TIS-620).
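A conversion of this kind can be sketched in a few lines of Python, which ships with codecs for both ISO-8859-2 and TIS-620. This is only a minimal sketch, not one of Mozilla's own utilities; the helper names `to_utf8` and `from_utf8` are hypothetical.

```python
def to_utf8(data, source_encoding):
    """Convert bytes in a native character set to UTF-8 bytes."""
    # Decode to Unicode text first, then re-encode as UTF-8.
    return data.decode(source_encoding).encode("utf-8")


def from_utf8(data, target_encoding):
    """Convert UTF-8 bytes to a native character set.

    Raises UnicodeEncodeError if the text contains a character that
    the target character set cannot represent.
    """
    return data.decode("utf-8").encode(target_encoding)


if __name__ == "__main__":
    # The Polish city name Lodz, written with its diacritics,
    # as ISO-8859-2 bytes.
    native = b"\xb3\xf3d\xbc"
    utf8 = to_utf8(native, "iso-8859-2")
    print(utf8.decode("utf-8"))
    # Converting back yields the original bytes.
    print(from_utf8(utf8, "iso-8859-2") == native)
```

The round trip only succeeds when every character exists in the native set, which is why converting in the Unicode-to-native direction needs error handling in real tools.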
The default encoding of XUL, XML, and RDF documents in Mozilla is UTF-8. If no encoding is specified in the XML declaration, this is the encoding that is used. Most files in the Mozilla tree specify no encoding and therefore rely on this default. To use a different encoding, you need to change the XML declaration at the top of your file. To change your encoding to Central European, include:
<?xml version="1.0" encoding="ISO-8859-2" ?>
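To check which encoding a file actually declares, a small script can read the declaration and fall back to UTF-8 when none is given, mirroring the default described above. The following Python sketch is an illustration only: the names `declared_encoding` and `read_document` are hypothetical, and the code assumes an ASCII-compatible encoding (it does not handle byte order marks or UTF-16 files).

```python
import re

# Matches an XML declaration at the very start of the data and
# captures the value of its encoding pseudo-attribute, if present.
_DECLARATION = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data, default="UTF-8"):
    """Return the encoding named in the XML declaration, else `default`."""
    match = _DECLARATION.match(data)
    return match.group(1).decode("ascii") if match else default


def read_document(data):
    """Decode the raw bytes of a document using its declared encoding."""
    return data.decode(declared_encoding(data))


if __name__ == "__main__":
    central_european = (
        b'<?xml version="1.0" encoding="ISO-8859-2" ?>'
        b'<label value="\xb3"/>'
    )
    no_encoding = b'<?xml version="1.0"?><label value="ok"/>'
    print(declared_encoding(central_european))
    print(declared_encoding(no_encoding))
    print(read_document(central_european))
```

A file whose bytes do not match its declared encoding will either fail to decode or display the wrong characters, so a check like this is worth running before files are handed to translators.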
The size and proportion of your windows can come into play when you know your application will be localized into more than one language. In some languages, it takes more words or characters, hence more physical space, to bring meaning to some text. This is especially the case in widgets that contain more text, such as when you want to provide usage guidelines in a panel.
One solution that Mozilla uses in at least one place is to make the actual size of the window or widget into a localizable entity.
<window style="&window.size;" ...>
<!ENTITY window.size "width: 40em; height: 40em;">
The translator or developer can anticipate the size based on the number of words, or preview their changes in the displayed UI. If the text overflows, they can increase the size; if there is empty space, they can reduce it.
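The effect of a localizable size entity can be previewed outside Mozilla with a short Python sketch that expands the entity for each locale. This is a sketch under stated assumptions: the entity is declared inline in the document type declaration so that the example is self-contained, the German value is invented for illustration, and the names `LOCALE_ENTITIES` and `window_style` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A stripped-down window whose style comes entirely from an entity.
# The entity is declared in the internal DTD subset so the parser
# can expand it without loading any external file.
TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE window [
{entities}
]>
<window style="&window.size;"/>
"""

# Per-locale entity declarations. The de-DE value is a made-up
# example of a translation that needs a wider window.
LOCALE_ENTITIES = {
    "en-US": '<!ENTITY window.size "width: 40em; height: 40em;">',
    "de-DE": '<!ENTITY window.size "width: 48em; height: 40em;">',
}


def window_style(locale):
    """Return the expanded style attribute for the given locale."""
    document = TEMPLATE.format(entities=LOCALE_ENTITIES[locale])
    return ET.fromstring(document).get("style")


if __name__ == "__main__":
    for locale in LOCALE_ENTITIES:
        print(locale, "->", window_style(locale))
```

Because the markup only ever refers to `&window.size;`, a translator can widen or shrink the window for one language by editing a single entity value, without touching the XUL file itself.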
As you begin to localize your application, especially if it is a web-related application, you will encounter words and phrases that have universal meaning and may not require translation. If you translate the whole Mozilla application, for example, you'll find that some words or phrases remain untouched. These items include terms that are used for branding, or universal web browsing terms, such as Bookmarks, Tasks, and Tools. In some instances, the choice to translate some of these terms is purely subjective.