To this point, we have been building stand-alone applications to transform external files, in XML format or just plain text, to OpenDocument format. OpenOffice.org allows you to integrate an XSLT transformation into the application as a filter.
XSLT-based filters work by associating three things: an XML file type (whose files we will call “foreign” files), XSLT transformation files for import and/or export, and an OpenOffice.org template file. XML elements in the foreign file are associated with styles in the template file. The import transformation takes the foreign file’s content and inserts it into the template, assigning styles as appropriate. The export transformation reads the OpenOffice.org document and, using the style information, creates a foreign file.
The remainder of this chapter will be a case study that shows how to construct and install XSLT-based filters.
The XML that we will import is a database of amateur wrestling clubs in California. (Yes, this is an actual database; the phone numbers and emails have been changed.) The state is divided into several areas or associations; for example, SCVWA is the Santa Clara Valley Wrestling Association. Each association consists of a series of clubs. Example 9.1, “Sample Club Database” shows an abbreviated file. A club can have multiple email addresses, and the <info> element is optional. The only element that isn’t self-explanatory is the <age-groups> element. Its type attribute tells which age groups the club serves, one letter per group: Kids, Cadets, Juniors, Open (competitors out of high school), and Women. The <info> element may contain a hypertext link to a club’s website, represented by the HTML <a> element, which has been borrowed into this custom language without a namespace.
Example 9.1. Sample Club Database
<club-database>
  <association id="BAWA">
    <club id="Q17" charter="2004">
      <name>SF Elite Wrestling</name>
      <contact>Vic Anastasio</contact>
      <location>San Francisco</location>
      <phone>415-555-3884 x223 (w)</phone>
      <email>[email protected]</email>
      <age-groups type="KCJ"/>
      <info>
        Kids division from 6th grade and up. Practices are Tuesdays
        and Thursdays. See our website at
        <a href="http://example.com/elite">http://example.com/elite</a>
      </info>
    </club>
  </association>
  <association id="SCVWA">
    <club id="H12b" charter="2003">
      <name>Cougar Wrestling Club</name>
      <contact>Ricardo Garcia</contact>
      <location>San Jose, Saratoga, Palo Alto</location>
      <phone>408-555-5514</phone>
      <email>[email protected]</email>
      <email>[email protected]</email>
      <age-groups type="KCJOW"/>
      <info>
        Practice season begins in February and ends in June.
      </info>
    </club>
  </association>
</club-database>
Figure 9.1, “Imported Club Database” shows the OpenOffice.org Writer file that we want as a result.
We will now create the template file in OpenOffice.org. This is just a skeleton document with styles that will be associated with XML elements. Figure 9.2, “Styles in Writer Template” shows the names of the paragraph and character styles in the template. [This is file clublist_template.ott in directory ch09 in the downloadable example files.]
That having been done, we create the stylesheet, shown in Example 9.2, “Stylesheet for Transforming Club List to Writer Document”. The stylesheet doesn’t have to include any <style:style> elements; those have been taken care of in the template. [This is file club_to_writer.xsl in directory ch09 in the downloadable example files.]
Example 9.2. Stylesheet for Transforming Club List to Writer Document
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
    xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
    xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
    xmlns:math="http://www.w3.org/1998/Math/MathML"
    xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
    xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
    office:version="1.0">

<!-- Wrap everything in the unified-document skeleton. -->
<xsl:template match="/">
  <office:document>
    <office:body>
      <office:text>
        <xsl:apply-templates select="club-database/association"/>
      </office:text>
    </office:body>
  </office:document>
</xsl:template>

<!-- Each association becomes a level-one heading. -->
<xsl:template match="association">
  <text:h text:outline-level="1" text:style-name="Association">
    <xsl:value-of select="@id"/>
  </text:h>
  <xsl:apply-templates select="club"/>
</xsl:template>

<!-- Each club becomes a level-two heading followed by detail paragraphs. -->
<xsl:template match="club">
  <text:h text:outline-level="2" text:style-name="Club Name">
    <xsl:value-of select="name" />
    <xsl:text> </xsl:text>
    <text:span text:style-name="Club Code"><xsl:value-of
      select="@id" /></text:span>
  </text:h>

  <text:p text:style-name="Default">
    <xsl:text>Chartered: </xsl:text>
    <text:span text:style-name="Charter">
      <xsl:value-of select="@charter"/>
    </text:span>
  </text:p>

  <text:p text:style-name="Default">
    <xsl:text>Contact: </xsl:text>
    <text:span text:style-name="Contact">
      <xsl:value-of select="contact"/>
    </text:span>
  </text:p>

  <text:p text:style-name="Default">
    <xsl:text>Location: </xsl:text>
    <text:span text:style-name="Location">
      <xsl:value-of select="location"/>
    </text:span>
  </text:p>

  <text:p text:style-name="Default">
    <xsl:text>Phone: </xsl:text>
    <text:span text:style-name="Phone">
      <xsl:value-of select="phone"/>
    </text:span>
  </text:p>

  <!-- One email goes inline; several emails go into a list. -->
  <xsl:choose>
    <xsl:when test="count(email) = 1">
      <text:p text:style-name="Default">
        <xsl:text>Email: </xsl:text>
        <text:span text:style-name="Email">
          <xsl:value-of select="email"/>
        </text:span>
      </text:p>
    </xsl:when>
    <xsl:when test="count(email) &gt; 1">
      <text:p text:style-name="Default">
        <text:span>Email:</text:span>
      </text:p>
      <text:list text:style-name="UnorderedList">
        <xsl:for-each select="email">
          <text:list-item>
            <text:p text:style-name="Default">
              <text:span text:style-name="Email">
                <xsl:value-of select="."/>
              </text:span>
            </text:p>
          </text:list-item>
        </xsl:for-each>
      </text:list>
    </xsl:when>
  </xsl:choose>

  <xsl:apply-templates select="age-groups"/>
  <xsl:apply-templates select="info"/>
</xsl:template>

<!-- Expand the one-letter codes in @type into words. -->
<xsl:template match="age-groups">
  <text:p text:style-name="Default">
    <xsl:text>Age Groups: </xsl:text>
    <text:span text:style-name="Age Groups">
      <xsl:if test="contains(@type,'K')">
        <xsl:text>Kids </xsl:text>
      </xsl:if>
      <xsl:if test="contains(@type,'C')">
        <xsl:text>Cadets </xsl:text>
      </xsl:if>
      <xsl:if test="contains(@type,'J')">
        <xsl:text>Juniors </xsl:text>
      </xsl:if>
      <xsl:if test="contains(@type,'O')">
        <xsl:text>Open </xsl:text>
      </xsl:if>
      <xsl:if test="contains(@type,'W')">
        <xsl:text>Women </xsl:text>
      </xsl:if>
    </text:span>
  </text:p>
</xsl:template>

<xsl:template match="info">
  <text:p text:style-name="Club Info">
    <xsl:if test="normalize-space(.) != ''">
      <xsl:apply-templates/>
    </xsl:if>
  </text:p>
</xsl:template>

<!-- The borrowed HTML link becomes an OpenDocument hyperlink. -->
<xsl:template match="a">
  <text:a xlink:type="simple" xlink:href="{@href}"><xsl:value-of
    select="."/></text:a>
</xsl:template>

</xsl:stylesheet>
Creating the export filter is a much more difficult task. When we imported a file, a hierarchical structure like this …
<association>
  <club>
    <name />
    <contact />
  </club>
  <club>
    <name />
    <contact />
  </club>
</association>
<association>
  <!-- etc -->
</association>
… was “flattened” into a structure like this:
<text:h text:style-name="Association"/>
<text:h text:style-name="Club Name"/>
<text:p>Contact: <text:span text:style-name="Contact"/></text:p>
<text:h text:style-name="Club Name"/>
<text:p>Contact: <text:span text:style-name="Contact"/></text:p>
<text:h text:style-name="Association"/>
<!-- etc -->
The export filter will have to take this flattened structure and re-create the nesting. The algorithm for this is not particularly difficult:
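For each <text:h> element with a text:style-name of Association:
  1. Create an <association> element whose id attribute is the heading’s text.
  2. Construct a <club> element for each following <text:h> element with a style of Club Name, stopping at the next heading that is not a club name.
To construct a <club> element:
  1. Set its id attribute from the Club Code span in the heading, and create a <name> element from the heading’s text.
  2. For each following sibling that is a <text:p>, create the child element that corresponds to the style of the paragraph’s <text:span>, or a series of <email> elements if the paragraph introduces a list.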
This is not exactly rocket surgery, but the job is complicated by the fact that XSLT almost exclusively uses recursion, not iteration.[16] This makes the transformation ugly, so we will present it in parts. [This is file writer_to_club.xsl in directory ch09 in the downloadable example files.]
The first part shows the opening <xsl:stylesheet> element, which declares the namespaces that could be used in the OpenOffice.org document. The transformation won’t work without these declarations, but we do not want to see the namespaces in the resulting output file. Thus, we use the exclude-result-prefixes attribute to eliminate namespace declarations from our output.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
    xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
    xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
    xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
    exclude-result-prefixes="text xsl fo office style table draw xlink
        form script number svg">

<xsl:output method="xml" indent="yes"/>
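The effect of exclude-result-prefixes is easy to see in isolation. The following minimal sketch is not part of writer_to_club.xsl; it declares only the text namespace and emits a single literal <club> element, whose name is borrowed from the club database purely for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    exclude-result-prefixes="text">

  <xsl:output method="xml" indent="yes"/>

  <!-- Whatever the input document is, emit one literal result element. -->
  <xsl:template match="/">
    <club/>
  </xsl:template>

</xsl:stylesheet>
```

With the attribute in place, an XSLT 1.0 processor should write a bare <club/> element. If you delete the attribute, the same element comes out carrying an xmlns:text declaration, because literal result elements copy the namespace nodes that are in scope in the stylesheet.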
Almost the only place we can use XSLT’s natural processing style is to grab all the <text:h> elements for the associations. Processing an association creates the <association> element with its ID, and then starts the process of making entries for the constituent clubs. Implicit in this code is the presumption that there is at least one club in an association.
When you are exporting a document, its XML representation is a “unified document,” with the contents of all the files (meta.xml, styles.xml, content.xml, etc.) enclosed in a single <office:document> element rather than the <office:document-content> that we have been using in previous chapters. If you want to see what such a file looks like, install file unified_document.xsl in directory ch09 from the downloadable example files.
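<!-- Wrap the associations in a single root element so that the
     exported file is a well-formed document. -->
<xsl:template match="/">
  <club-database>
    <xsl:apply-templates
      select="office:document/office:body/office:text/text:h[@text:style-name='Association']"/>
  </club-database>
</xsl:template>

<!-- The heading's text is the association ID; the first heading
     after it should be the association's first club. -->
<xsl:template match="text:h[@text:style-name='Association']">
  <association id="{.}">
    <xsl:call-template name="make-club">
      <xsl:with-param name="clubNode"
        select="following-sibling::text:h[1]"/>
    </xsl:call-template>
  </association>
</xsl:template>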
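To inspect the unified document yourself, an export stylesheet that simply copies its input is enough. The sketch below is an assumption about what such a stylesheet could look like; it is not a reproduction of unified_document.xsl from the example files.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Copy the whole input tree, root element and all, to the output. -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Used as the export transformation of a throwaway filter, a stylesheet like this would write the entire <office:document> tree to the exported file, where you can check the element nesting and the encoded style names (such as Club_20_Name) that the templates in this chapter test for.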
We can now make the club(s) in the association.
<xsl:template name="make-club">
  <xsl:param name="clubNode"/>
  <!-- Stop as soon as the heading is not a club name, which means
       we have reached the next association. -->
  <xsl:if test="$clubNode/@text:style-name = 'Club_20_Name'">
    <club>
      <xsl:attribute name="id">
        <xsl:value-of
          select="$clubNode/text:span[@text:style-name='Club_20_Code']"/>
      </xsl:attribute>
      <!-- Take only the heading's own leading text; the string value
           of the whole heading would also include the club code. -->
      <name><xsl:value-of
        select="normalize-space($clubNode/text()[1])"/></name>
      <xsl:call-template name="make-content">
        <xsl:with-param name="contentNode"
          select="$clubNode/following-sibling::*[1]"/>
      </xsl:call-template>
    </club>
    <!-- Recurse on the next heading, if there is one. -->
    <xsl:if test="$clubNode/following-sibling::text:h[1]">
      <xsl:call-template name="make-club">
        <xsl:with-param name="clubNode"
          select="$clubNode/following-sibling::text:h[1]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:if>
</xsl:template>
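The recursive walk above is one way to rebuild the nesting. A different XSLT 1.0 idiom, shown here only as a contrasting sketch and not as the approach taken in writer_to_club.xsl, selects every club heading whose nearest preceding Association heading is the current one, using generate-id() to compare nodes. The sketch assumes the same unified-document structure and style names as the chapter’s stylesheet, and it emits only the id attributes to stay short.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    exclude-result-prefixes="office text">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <club-database>
      <xsl:apply-templates
        select="office:document/office:body/office:text/text:h[@text:style-name='Association']"/>
    </club-database>
  </xsl:template>

  <xsl:template match="text:h[@text:style-name='Association']">
    <!-- Remember which association heading we are standing on. -->
    <xsl:variable name="thisId" select="generate-id(.)"/>
    <association id="{normalize-space(.)}">
      <!-- A club belongs here if the closest Association heading
           before it is this one. preceding-sibling is a reverse axis,
           so [1] picks the nearest such heading. -->
      <xsl:for-each select="following-sibling::text:h
          [@text:style-name='Club_20_Name']
          [generate-id(preceding-sibling::text:h
              [@text:style-name='Association'][1]) = $thisId]">
        <club id="{normalize-space(
            text:span[@text:style-name='Club_20_Code'])}"/>
      </xsl:for-each>
    </association>
  </xsl:template>

</xsl:stylesheet>
```

This trades the explicit recursion for a predicate that is evaluated once per candidate heading, which is easier to read but can be slower on long documents because each test looks back along the sibling axis.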
Assembling the content for a club works very much along the same lines.
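<xsl:template name="make-content">
  <xsl:param name="contentNode"/>
  <!-- Keep walking while the sibling is a paragraph or a list. A list
       produces no output of its own here; it is stepped over so that
       the paragraphs that follow it are still processed. -->
  <xsl:if test="name($contentNode) = 'text:p' or
                name($contentNode) = 'text:list'">
    <xsl:choose>
      <!-- A paragraph followed by a list introduces several emails.
           Test this first, because that paragraph may also hold a
           span. -->
      <xsl:when test="name($contentNode) = 'text:p' and
          name($contentNode/following-sibling::*[1]) = 'text:list'">
        <xsl:call-template name="email-list">
          <xsl:with-param name="emailList"
            select="$contentNode/following-sibling::text:list[1]"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="$contentNode/text:span">
        <xsl:call-template name="add-item">
          <xsl:with-param name="spanNode"
            select="$contentNode/text:span"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="$contentNode/@text:style-name = 'Club_20_Info'">
        <info>
          <xsl:apply-templates select="$contentNode"/>
        </info>
      </xsl:when>
    </xsl:choose>
    <!-- Recurse on the next sibling, whatever it is. -->
    <xsl:call-template name="make-content">
      <xsl:with-param name="contentNode"
        select="$contentNode/following-sibling::*[1]"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>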
Here’s the template that adds individual elements as children of a club. The styleAttr variable is for convenience, to make the source easier to read. All the elements except <age-groups> are handled by adding the span’s contents. Age groups are special, and, rather than trying to split up a list of keywords and recursively handle them, we cheat. The call to the translate function eliminates all lowercase letters and blanks, leaving the uppercase abbreviations for the age groups. For example, Kids Cadets Open is instantly reduced to KCO.
<xsl:template name="add-item">
  <xsl:param name="spanNode"/>
  <xsl:variable name="styleAttr" select="$spanNode/@text:style-name"/>
  <xsl:choose>
    <xsl:when test="$styleAttr = 'Charter'">
      <charter><xsl:value-of select="$spanNode"/></charter>
    </xsl:when>
    <xsl:when test="$styleAttr = 'Contact'">
      <contact><xsl:value-of select="$spanNode"/></contact>
    </xsl:when>
    <xsl:when test="$styleAttr = 'Phone'">
      <phone><xsl:value-of select="$spanNode"/></phone>
    </xsl:when>
    <xsl:when test="$styleAttr = 'Location'">
      <location><xsl:value-of select="$spanNode"/></location>
    </xsl:when>
    <xsl:when test="$styleAttr = 'Email'">
      <email><xsl:value-of select="$spanNode"/></email>
    </xsl:when>
    <xsl:when test="$styleAttr = 'Age_20_Groups'">
      <age-groups>
        <xsl:attribute name="type">
          <xsl:value-of select="translate($spanNode,
            ' abcdefghijklmnopqrstuvwxyz', '')"/>
        </xsl:attribute>
      </age-groups>
    </xsl:when>
  </xsl:choose>
</xsl:template>
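The translate trick can be tried on its own. This minimal sketch is separate from writer_to_club.xsl and applies the same call to a literal string instead of a span; any XML file will do as input, since the template ignores it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Characters listed in the second argument that have no
       counterpart in the (empty) third argument are deleted. -->
  <xsl:template match="/">
    <xsl:value-of select="translate('Kids Cadets Open',
        ' abcdefghijklmnopqrstuvwxyz', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

The output is KCO. The shortcut relies on each age-group name containing exactly one uppercase letter and nothing other than unaccented lowercase letters and blanks; any other character typed into the span would pass through into the type attribute.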
Rounding out the XSLT stylesheet are the templates that handle a list of email addresses within a <text:list> and the <text:a> element inside the club information.
<xsl:template name="email-list">
  <xsl:param name="emailList"/>
  <xsl:for-each
    select="$emailList/descendant::text:span[@text:style-name='Email']">
    <email><xsl:value-of select="."/></email>
  </xsl:for-each>
</xsl:template>

<xsl:template match="text:a">
  <a href="{@xlink:href}"><xsl:apply-templates/></a>
</xsl:template>

</xsl:stylesheet>
To install an XSLT-based filter, you choose the “XML Filter Settings” option from the “Tools” menu. Figure 9.3, “General Filter Information” shows the entries for the club file filter.
The rest of the information about the filter is placed in the dialog for the Transformation tab, as shown in Figure 9.4, “Filter Transformation Information”.
When you first create a filter, you may specify any path name to the template. OpenOffice.org will move the template to the path/to/userdir/user/template/template name folder for you. (The path is the one specified for templates in the OpenOffice.org path options dialog box.) If you update your template, you have to re-enter its original path, and OpenOffice.org will update the copy in its template directory.
That’s all there is to it; your new filter is ready for use. If you wish, you may also package the XSLT transformations and the template into a .jar file so that other users may install all the files in one swell foop by clicking the “Open package...” button in the main XML Filter Settings dialog. This will put the template in the path/to/userdir/user/template/template name directory and the XSLT file(s) into the path/to/userdir/user/xslt/template name directory.
Copyright (c) 2005 O’Reilly & Associates, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".