Introduction
The development of lexical analysis and parsing tools has been an
important area of research in computer science. This work has produced
the lexer and parser generators lex and yacc, whose worthy
scions ocamllex and ocamlyacc are presented in this
chapter. These two tools are the de facto standard for implementing
lexers and parsers in OCaml, but other tools, such as streams or the
regular expression library str, may be adequate
for applications that do not require such powerful analysis.
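As a taste of that lightweight end of the spectrum, here is a small
sketch (not taken from the tools above) that uses the Str library to
cut a line into words; the function name tokens is our own, and the
program must be linked against str (for example with
ocamlfind ocamlc -package str -linkpkg).

   (* Split a line into words separated by spaces or tabs. *)
   let tokens (line : string) : string list =
     Str.split (Str.regexp "[ \t]+") line

   let () =
     (* Prints one token per line: let / x / = / 3 / + / 4 *)
     List.iter print_endline (tokens "let x = 3 + 4")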
The need for such tools is especially acute when implementing
state-of-the-art programming languages, but other applications
can profit from them as well: for example, database systems that
accept queries, or spreadsheets that define the contents
of cells by evaluating formulas. More modestly,
plain text files are commonly used to store data, as in system
configuration files or spreadsheet data. Even in such limited cases,
processing the data involves some amount of lexical analysis and
parsing.
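For instance, here is a hedged sketch of reading one line of a
hypothetical key=value configuration format; parse_line and its
behavior are illustrative assumptions, not part of any tool presented
later in this chapter.

   (* Split a "key = value" line at the first '='; a tiny piece of
      lexical analysis. Returns None for lines without a binding. *)
   let parse_line (line : string) : (string * string) option =
     match String.index_opt line '=' with
     | None -> None
     | Some i ->
         let key = String.sub line 0 i in
         let value = String.sub line (i + 1) (String.length line - i - 1) in
         Some (String.trim key, String.trim value)

   let () =
     match parse_line "user = alice" with
     | Some (key, value) -> Printf.printf "%s -> %s\n" key value
     | None -> print_endline "not a binding"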
In all of these examples, the problem that
lexical analysis and parsing must solve is that of transforming a linear
character stream into a data item with a richer structure: a sequence of
words, a record, the abstract syntax tree of a program, and so on.
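To make this "richer structure" concrete, an abstract syntax tree for
small arithmetic formulas could be declared as follows; the type and
constructor names are illustrative only.

   (* One possible tree shape for arithmetic expressions. *)
   type expr =
     | Num of int
     | Add of expr * expr
     | Mul of expr * expr

   (* The character stream "1 + 2 * 3" would be turned into: *)
   let example = Add (Num 1, Mul (Num 2, Num 3))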
All languages have a vocabulary (the lexicon) and a grammar
describing how vocabulary items may be combined to form larger units
(the syntax). For a computer program to process a
language correctly, the language must follow precise lexical and
syntactic rules: a computer lacks the detailed semantic understanding
required to resolve the ambiguities of natural language. To work around
this limitation, computer languages typically obey clearly stated rules
that admit no exceptions. The lexical and syntactic structure of such
languages has been given formal definitions, which we briefly introduce
in this chapter before describing their use.