Sometimes it is useful to be able to generate output that is simply plain text. For example, you may need plain text for a README file. It would be fairly simple to delete all the markup tags from a DocBook document, but the result would not be very satisfactory. Paragraphs might be readable, but tables would not be. Also, there would be no generated text such as number labels and xref text.
What you want is text that is processed by a DocBook stylesheet and formatted sufficiently to be meaningful. Unfortunately, there is no DocBook XSL stylesheet dedicated to generating formatted plain text. Most people who need formatted text use a two-step process:
1. Process the DocBook into HTML output.
2. Use a text-based web browser to convert the HTML to formatted plain text.
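The first step can be sketched as a small shell function. This is only an illustration, assuming the xsltproc processor is installed; the function name docbook_to_html and the default stylesheet path in DOCBOOK_XSL are hypothetical and will differ between installations.

```shell
# Sketch of step 1: DocBook XML -> one HTML file, using xsltproc.
# DOCBOOK_XSL is an ASSUMED stylesheet location; set it to wherever the
# DocBook XSL html/docbook.xsl file lives on your system.
DOCBOOK_XSL=${DOCBOOK_XSL:-/usr/share/xml/docbook/stylesheet/docbook-xsl/html/docbook.xsl}

# docbook_to_html is a hypothetical helper name, not part of DocBook XSL.
docbook_to_html() {
    xml_file=$1
    html_file=$2
    # --output names the result file; the stylesheet comes before the document.
    xsltproc --output "$html_file" "$DOCBOOK_XSL" "$xml_file"
}
```

For example, `docbook_to_html myfile.xml myfile.html` would produce the HTML file that the second step converts to text.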
There are at least three text-based web browsers that you can choose from to format HTML as plain text:
Lynx
The original text-based web browser, still used by many people. The latest version handles simple tables. Lynx is available from http://lynx.browser.org/ for most platforms. You can use its -dump option to save the formatted text to a file:
lynx -dump myfile.html > myfile.txt
ELinks
An enhanced version of the Links (no relation to Lynx) character browser. It handles tables better than Lynx. ELinks is available from http://www.elinks.or.cz/. It also has a -dump option:
elinks -dump myfile.html > myfile.txt
W3m
A text-based browser developed in Japan that can handle tables. It is available from http://w3m.sourceforge.net/. W3m also has a -dump option:
w3m -dump myfile.html > myfile.txt
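Because all three browsers accept the same -dump option, a script can fall back to whichever one is installed. The following is a minimal sketch; the function name html_to_text is hypothetical, and the order in which the browsers are tried is an arbitrary choice.

```shell
# Sketch: dump an HTML file as formatted plain text using the first
# text-based browser found. html_to_text is a hypothetical helper name.
html_to_text() {
    html_file=$1
    # The search order here is an assumption; reorder to suit your preference.
    for browser in w3m elinks lynx; do
        if command -v "$browser" >/dev/null 2>&1; then
            # Each of these browsers writes the rendered page to standard output.
            "$browser" -dump "$html_file"
            return
        fi
    done
    echo "html_to_text: no text-based browser found (tried w3m, elinks, lynx)" >&2
    return 1
}
```

It would be used in the same way as the individual commands, for example `html_to_text myfile.html > myfile.txt`.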
Conversion of HTML generated by DocBook works quite well with these browsers, because the HTML is clean. DocBook's HTML output doesn't use frames, layout tables, or JavaScript, all of which are hard for text-only browsers to handle. Any CSS styling you apply will be lost, of course.
DocBook XSL: The Complete Guide - 3rd Edition | Copyright © 2002-2005 Sagehill Enterprises