Previous Contents Next

BASIC interpreter

The application described in this section is a program interpreter for Basic. Thus, it is a program that can run other programs written in Basic. Of course, we will only deal with a restricted language, which contains the following commands:

Every line of a Basic program is labelled with a line number, and contains only one command. For instance, a program that computes and then prints the factorial of an integer given by the user is written:
 
 5  REM inputting the argument
10  PRINT " factorial of:"
20  INPUT A
30  LET B = 1 
35  REM beginning of the loop
40  IF A <= 1 THEN 80 
50  LET B = B * A
60  LET A = A - 1
70  GOTO 40 
75  REM prints the result
80  PRINT B
We also wish to write a small text editor, working as a toplevel interactive loop. It should be able to add new lines, display a program, execute it, and display the result. Execution of the program is started with the RUN command. Here is an example of the evaluation of this program:
> RUN
 factorial of: ? 5
120
The interpreter is implemented in several distinct parts:
Description of the abstract syntax
: describes the definition of data types to represent Basic programs, as well as their components (lines, commands, expressions, etc.).

Program pretty printing
: consists of transforming the internal representation of Basic programs to strings, in order to display them.

Lexing and parsing
: accomplish the inverse transformation, that is, transform a string into the internal representation of a Basic program (the abstract syntax).

Evaluation
: is the heart of the interpreter. It controls and runs the program. As we will see, functional languages, such as Objective CAML, are particularly well adapted for this kind of problem.

Toplevel interactive loop
: glues together all the previous parts.

Abstract syntax

Figure 6.2 introduces the concrete syntax, as a BNF grammar, of the Basic we will implement. This kind of description for language syntaxes is described in chapter 11, page ??.

Unary_Op ::= -    |    !
Binary_Op ::= +    |    -    |    *    |    /    |    %
  | =    |    <    |    >    |    <=    |    >=    |    <>
  | &    |    ' | '
Expression ::= integer
  | variable
  | "string"
  | Unary_Op   Expression
  | Expression   Binary_Op   Expression
  | ( Expression )
Command ::= REM string
  | GOTO integer
  | LET variable = Expression
  | PRINT Expression
  | INPUT variable
  | IF Expression THEN integer
 
Line ::= integer Command
 
Program ::= Line
  | Line Program
 
Phrase ::= Line | RUN | LIST | END

Figure 6.2: BASIC Grammar.


We can see that the way expressions are defined does not ensure that a well formed expression can be evaluated. For instance, 1+"hello" is an expression, and yet it is not possible to evaluate it. This deliberate choice lets us simplify both the abstract syntax and the parsing of the Basic language. The price to pay for this choice is that a syntactically correct Basic program may generate a runtime error because of a type mismatch.

Defining Objective CAML data types for this abstract syntax is easy, we simply translate the concrete syntax into a sum type:

# type unr_op = UMINUS | NOT ;;
# type bin_op = PLUS | MINUS | MULT | DIV | MOD
| EQUAL | LESS | LESSEQ | GREAT | GREATEQ | DIFF
| AND | OR ;;
# type expression =
ExpInt of int
| ExpVar of string
| ExpStr of string
| ExpUnr of unr_op * expression
| ExpBin of expression * bin_op * expression ;;
# type command =
Rem of string
| Goto of int
| Print of expression
| Input of string
| If of expression * int
| Let of string * expression ;;
# type line = { num : int ; cmd : command } ;;
# type program = line list ;;


We also define the abstract syntax for the commands for the small program editor:

# type phrase = Line of line | List | Run | PEnd ;;


It is convenient to allow the programmer to skip some parentheses in arithmetic expressions. For instance, the expression 1+3*4 is usually interpreted as 1+(3*4). To this end, we associate an integer with each operator of the language:

# let priority_uop = function NOT -> 1 | UMINUS -> 7
let priority_binop = function
MULT | DIV -> 6
| PLUS | MINUS -> 5
| MOD -> 4
| EQUAL | LESS | LESSEQ | GREAT | GREATEQ | DIFF -> 3
| AND | OR -> 2 ;;
val priority_uop : unr_op -> int = <fun>
val priority_binop : bin_op -> int = <fun>
These integers indicate the priority of the operators. They will be used to print and parse programs.

Program pretty printing

To print a program, one needs to be able to convert abstract syntax program lines into strings.

Converting operators is easy:

# let pp_binop = function
PLUS -> "+" | MULT -> "*" | MOD -> "%" | MINUS -> "-"
| DIV -> "/" | EQUAL -> " = " | LESS -> " < "
| LESSEQ -> " <= " | GREAT -> " > "
| GREATEQ -> " >= " | DIFF -> " <> " | AND -> " & " | OR -> " | "
let pp_unrop = function UMINUS -> "-" | NOT -> "!" ;;
val pp_binop : bin_op -> string = <fun>
val pp_unrop : unr_op -> string = <fun>
Expression printing needs to take into account operator priority to print as few parentheses as possible. For instance, parentheses are put around a subexpression at the right of an operator only if the subexpression's main operator has a lower priority that the main operator of the whole expression. Also, arithmetic operators are left-associative, thus the expression 1-2-3 is interpreted as (1-2)-3.

To deal with this, we use two auxiliary functions ppl and ppr to print left and right subtrees, respectively. These functions take two arguments: the tree to print and the priority of the enclosing operator, which is used to decide if parentheses are necessary. Left and right subtrees are distinguished to deal with associativity. If the current operator priority is the same than the enclosing operator priority, left trees do not need parentheses whereas right ones may require them, as in 1-(2-3) or 1-(2+3).

The initial tree is taken as a left subtree with minimal priority (0). The expression pretty printing function pp_expression is:

# let parenthesis x = "(" ^ x ^ ")";;
val parenthesis : string -> string = <fun>
# let pp_expression =
let rec ppl pr = function
ExpInt n -> (string_of_int n)
| ExpVar v -> v
| ExpStr s -> "\"" ^ s ^ "\""
| ExpUnr (op,e) ->
let res = (pp_unrop op)^(ppl (priority_uop op) e)
in if pr=0 then res else parenthesis res
| ExpBin (e1,op,e2) ->
let pr2 = priority_binop op
in let res = (ppl pr2 e1)^(pp_binop op)^(ppr pr2 e2)
(* parenthesis if priority is not greater *)
in if pr2 >= pr then res else parenthesis res
and ppr pr exp = match exp with
(* right subtrees only differ for binary operators *)
ExpBin (e1,op,e2) ->
let pr2 = priority_binop op
in let res = (ppl pr2 e1)^(pp_binop op)^(ppr pr2 e2)
in if pr2 > pr then res else parenthesis res
| _ -> ppl pr exp
in ppl 0 ;;
val pp_expression : expression -> string = <fun>


Command pretty printing uses the expression pretty printing function. Printing a line consists of printing the line number before the command.

# let pp_command = function
Rem s -> "REM " ^ s
| Goto n -> "GOTO " ^ (string_of_int n)
| Print e -> "PRINT " ^ (pp_expression e)
| Input v -> "INPUT " ^ v
| If (e,n) -> "IF "^(pp_expression e)^" THEN "^(string_of_int n)
| Let (v,e) -> "LET " ^ v ^ " = " ^ (pp_expression e) ;;
val pp_command : command -> string = <fun>
# let pp_line l = (string_of_int l.num) ^ " " ^ (pp_command l.cmd) ;;
val pp_line : line -> string = <fun>


Lexing

Lexing and parsing do the inverse transformation of printing, going from a string to a syntax tree. Lexing splits the text of a command line into independent lexical units called lexemes, with Objective CAML type:

# type lexeme = Lint of int
| Lident of string
| Lsymbol of string
| Lstring of string
| Lend ;;
A particular lexeme denotes the end of an expression: Lend. It is not present in the text of the expression, but is created by the lexing function (see the lexer function, page ??).

The string being lexed is kept in a record that contains a mutable field indicating the position after which lexing has not been done yet. Since the size of the string is used several times and does not change, it is also stored in the record:

# type string_lexer = {string:string; mutable current:int; size:int } ;;
This representation lets us define the lexing of a string as the application of a function to a value of type string_lexer returning a value of type lexeme. Modifying the current position in the string is done as a side effect.


# let init_lex s = { string=s; current=0 ; size=String.length s } ;;
val init_lex : string -> string_lexer = <fun>
# let forward cl = cl.current <- cl.current+1 ;;
val forward : string_lexer -> unit = <fun>
# let forward_n cl n = cl.current <- cl.current+n ;;
val forward_n : string_lexer -> int -> unit = <fun>
# let extract pred cl =
let st = cl.string and pos = cl.current in
let rec ext n = if n<cl.size && (pred st.[n]) then ext (n+1) else n in
let res = ext pos
in cl.current <- res ; String.sub cl.string pos (res-pos) ;;
val extract : (char -> bool) -> string_lexer -> string = <fun>


The following functions extract a lexeme from the string and modify the current position. The two functions extract_int and extract_ident extract an integer and an identifier, respectively.

# let extract_int =
let is_int = function '0'..'9' -> true | _ -> false
in function cl -> int_of_string (extract is_int cl)
let extract_ident =
let is_alpha_num = function
'a'..'z' | 'A'..'Z' | '0' .. '9' | '_' -> true
| _ -> false
in extract is_alpha_num ;;
val extract_int : string_lexer -> int = <fun>
val extract_ident : string_lexer -> string = <fun>


The lexer function uses the two previous functions to extract a lexeme.

# exception LexerError ;;
exception LexerError
# let rec lexer cl =
let lexer_char c = match c with
' '
| '\t' -> forward cl ; lexer cl
| 'a'..'z'
| 'A'..'Z' -> Lident (extract_ident cl)
| '0'..'9' -> Lint (extract_int cl)
| '"' -> forward cl ;
let res = Lstring (extract ((<>) '"') cl)
in forward cl ; res
| '+' | '-' | '*' | '/' | '%' | '&' | '|' | '!' | '=' | '(' | ')' ->
forward cl; Lsymbol (String.make 1 c)
| '<'
| '>' -> forward cl;
if cl.current >= cl.size then Lsymbol (String.make 1 c)
else let cs = cl.string.[cl.current]
in ( match (c,cs) with
('<','=') -> forward cl; Lsymbol "<="
| ('>','=') -> forward cl; Lsymbol ">="
| ('<','>') -> forward cl; Lsymbol "<>"
| _ -> Lsymbol (String.make 1 c) )
| _ -> raise LexerError
in
if cl.current >= cl.size then Lend
else lexer_char cl.string.[cl.current] ;;
val lexer : string_lexer -> lexeme = <fun>


The lexer function is very simple: it matches the current character of a string and, based on its value, extracts the corresponding lexeme and modifies the current position to the start of the next lexeme. The code is simple because, for all characters except two, the current character defines which lexeme to extract. In the more complicated cases of '<', we need to look at the next character, which might be a '=' or a '>', producing two different lexemes. The same problem arises with '>'.

Parsing

The only difficulty in parsing our language comes from expressions. Indeed, knowing the beginning of an expression is not enough to know its structure. For instance, having parsed the beginning of an expression as being 1+2+3, the resulting syntax tree for this part depends on the rest of the expression: its structure is different when it is followed by +4 or *4 (see figure 6.3).


Figure 6.3: Basic: abstract syntax tree examples.


However, since the tree structure for 1+2 is the same in both cases, it can be built. As the position of +3 in the structure is not fully known, it is temporarily stored.

To build the abstract syntax tree, we use a pushdown automaton similar to the one built by yacc (see page ??). Lexemes are read one by one and put on a stack until there is enough information to build the expression. They are then removed from the stack and replaced by the expression. This latter operation is called reduction.

The stack elements have type:

# type exp_elem =
Texp of expression (* expression *)
| Tbin of bin_op (* binary operator *)
| Tunr of unr_op (* unary operator *)
| Tlp (* left parenthesis *) ;;
Right parentheses are not stored on the stack as only left parentheses matter for reduction.

Figure 6.4 illustrates the way the stack is used to parse the expression (1+2*3)+4. The character above the arrow is the current character of the string.


Figure 6.4: Basic: abstract syntax tree construction example.


We define an exception for syntax errors.

# exception ParseError ;;
The first step consists of transforming symbols into operators:

# let unr_symb = function
"!" -> NOT | "-" -> UMINUS | _ -> raise ParseError
let bin_symb = function
"+" -> PLUS | "-" -> MINUS | "*" -> MULT | "/" -> DIV | "%" -> MOD
| "=" -> EQUAL | "<" -> LESS | "<=" -> LESSEQ | ">" -> GREAT
| ">=" -> GREATEQ | "<>" -> DIFF | "&" -> AND | "|" -> OR
| _ -> raise ParseError
let tsymb s = try Tbin (bin_symb s) with ParseError -> Tunr (unr_symb s) ;;
val unr_symb : string -> unr_op = <fun>
val bin_symb : string -> bin_op = <fun>
val tsymb : string -> exp_elem = <fun>


The reduce function implements stack reduction. There are two cases to consider, whether the stack starts with: Moreover, reduce takes an argument indicating the minimal priority that an operator should have to trigger reduction. To avoid this reduction condition, it suffices to give the minimal value, zero, as the priority.

# let reduce pr = function
(Texp e)::(Tunr op)::st when (priority_uop op) >= pr
-> (Texp (ExpUnr (op,e)))::st
| (Texp e1)::(Tbin op)::(Texp e2)::st when (priority_binop op) >= pr
-> (Texp (ExpBin (e2,op,e1)))::st
| _ -> raise ParseError ;;
val reduce : int -> exp_elem list -> exp_elem list = <fun>


Notice that expression elements are stacked as they are read. Thus it is necessary to swap them when they are arguments of a binary operator.

The main function of our parser is stack_or_reduce that, according to the lexeme given in argument, puts it on the stack or triggers a reduction.

# let rec stack_or_reduce lex stack = match lex , stack with
Lint n , _ -> (Texp (ExpInt n))::stack
| Lident v , _ -> (Texp (ExpVar v))::stack
| Lstring s , _ -> (Texp (ExpStr s))::stack
| Lsymbol "(" , _ -> Tlp::stack
| Lsymbol ")" , (Texp e)::Tlp::st -> (Texp e)::st
| Lsymbol ")" , _ -> stack_or_reduce lex (reduce 0 stack)
| Lsymbol s , _
-> let symbol =
if s<>"-" then tsymb s
(* remove the ambiguity of the ``-'' symbol *)
(* according to the last exp element put on the stack *)
else match stack
with (Texp _)::_ -> Tbin MINUS
| _ -> Tunr UMINUS
in ( match symbol with
Tunr op -> (Tunr op)::stack
| Tbin op ->
( try stack_or_reduce lex (reduce (priority_binop op)
stack )
with ParseError -> (Tbin op)::stack )
| _ -> raise ParseError )
| _ , _ -> raise ParseError ;;
val stack_or_reduce : lexeme -> exp_elem list -> exp_elem list = <fun>


Once all lexemes are defined and stacked, the function reduce_all builds the abstract syntax tree with the elements remaining in the stack. If the expression being parsed is well formed, only one element should remain in the stack, containing the tree for this expression.

# let rec reduce_all = function
| [] -> raise ParseError
| [Texp x] -> x
| st -> reduce_all (reduce 0 st) ;;
val reduce_all : exp_elem list -> expression = <fun>


The parse_exp function is the main expression parsing function. It reads a string, extracts its lexemes and passes them to the stack_or_reduce function. Parsing stops when the current lexeme satisfies a predicate that is given as an argument.

# let parse_exp stop cl =
let p = ref 0 in
let rec parse_one stack =
let l = ( p:=cl.current ; lexer cl)
in if not (stop l) then parse_one (stack_or_reduce l stack)
else ( cl.current <- !p ; reduce_all stack )
in parse_one [] ;;
val parse_exp : (lexeme -> bool) -> string_lexer -> expression = <fun>
Notice that the lexeme that made the parsing stop is not used to build the expression. It is thus necessary to modify the current position to its beginning (variable p) to parse it later.

We can now parse a command line:

# let parse_cmd cl = match lexer cl with
Lident s -> ( match s with
"REM" -> Rem (extract (fun _ -> true) cl)
| "GOTO" -> Goto (match lexer cl with
Lint p -> p
| _ -> raise ParseError)
| "INPUT" -> Input (match lexer cl with
Lident v -> v
| _ -> raise ParseError)
| "PRINT" -> Print (parse_exp ((=) Lend) cl)
| "LET" ->
let l2 = lexer cl and l3 = lexer cl
in ( match l2 ,l3 with
(Lident v,Lsymbol "=") -> Let (v,parse_exp ((=) Lend) cl)
| _ -> raise ParseError )
| "IF" ->
let test = parse_exp ((=) (Lident "THEN")) cl
in ( match ignore (lexer cl) ; lexer cl with
Lint n -> If (test,n)
| _ -> raise ParseError )
| _ -> raise ParseError )
| _ -> raise ParseError ;;
val parse_cmd : string_lexer -> command = <fun>


Finally, we implement the function to parse commands typed by the user:

# let parse str =
let cl = init_lex str
in match lexer cl with
Lint n -> Line { num=n ; cmd=parse_cmd cl }
| Lident "LIST" -> List
| Lident "RUN" -> Run
| Lident "END" -> PEnd
| _ -> raise ParseError ;;
val parse : string -> phrase = <fun>


Evaluation

A Basic program is a list of lines. Execution starts at the first line. Interpreting a program line consists of executing the task corresponding to its command. There are three different kinds of commands: input-output (PRINT and INPUT), variable declaration or modification (LET), and flow control (GOTO and IF...THEN). Input-output commands interact with the user and use the corresponding Objective CAML functions.

Variable declaration and modification commands need to know how to compute the value of an arithmetic expression and the memory location to store the result. Expression evaluation returns an integer, a boolean, or a string. Their type is value.

# type value = Vint of int | Vstr of string | Vbool of bool ;;


Variable declaration should allocate some memory to store the associated value. Similarly, variable modification requires the modification of the associated value. Thus, evaluation of a Basic program uses an environment that stores the association between a variable name and its value. It is represented by an association list of tuples (name,value):
 
# type environment = (string * value) list ;;
The variable name is used to access its value. Variable modification modifies the association.

Flow control commands, conditional or unconditional, specify the number of the next line to execute. By default, it is the next line. To do this, it is necessary to remember the number of the current line.

The list of commands representing the program being edited under the toplevel is not an efficient data structure for running the program. Indeed, it is then necessary to look at the whole list of lines to find the line indicated by a flow control command (If and goto). Replacing the list of lines with an array of commands allows direct access to the command following a flow control command, using the array index instead of the line number in the flow control command. This solution requires some preprocessing called assembly before executing a RUN command. For reasons that will be detailed shortly, a program after assembly is not represented as an array of commands but as an array of lines:

# type code = line array ;;


As in the calculator example of previous chapters, the interpreter uses a state that is modified for each command evaluation. At each step, we need to remember the whole program, the next line to interpret and the values of the variables. The program being interpreted is not exactly the one that was entered in the toplevel: instead of a list of commands, it is an array of commands. Thus the state of a program during execution is:

# type state_exec = { line:int ; xprog:code ; xenv:environment } ;;


Two different reasons may lead to an error during the evaluation of a line: an error while computing an expression, or branching to an absent line. They must be dealt with so that the interpreter exits nicely, printing an error message. We define an exception as well as a function to raise it, indicating the line where the error occurred.

# exception RunError of int
let runerr n = raise (RunError n) ;;
exception RunError of int
val runerr : int -> 'a = <fun>


Assembly
Assembling a program that is a list of numbered lines (type program) consists of transforming this list into an array and modifying the flow control commands. This last modification only needs an association table between line numbers and array indexes. This is easily provided by storing lines (with their line numbers), instead of commands, in the array: to find the association between a line number and the index in the array, we look the line number up in the array and return the corresponding index. If no line is found with this number, the index returned is -1.

# exception Result_lookup_index of int ;;
exception Result_lookup_index of int
# let lookup_index tprog num_line =
try
for i=0 to (Array.length tprog)-1 do
let num_i = tprog.(i).num
in if num_i=num_line then raise (Result_lookup_index i)
else if num_i>num_line then raise (Result_lookup_index (-1))
done ;
(-1 )
with Result_lookup_index i -> i ;;
val lookup_index : line array -> int -> int = <fun>

# let assemble prog =
let tprog = Array.of_list prog in
for i=0 to (Array.length tprog)-1 do
match tprog.(i).cmd with
Goto n -> let index = lookup_index tprog n
in tprog.(i) <- { tprog.(i) with cmd = Goto index }
| If(c,n) -> let index = lookup_index tprog n
in tprog.(i) <- { tprog.(i) with cmd = If (c,index) }
| _ -> ()
done ;
tprog ;;
val assemble : line list -> line array = <fun>


Expression evaluation
The evaluation function does a depth-first traversal on the abstract syntax tree, and executes the operations indicated at each node.

The RunError exception is raised in case of type inconsistency, division by zero, or an undeclared variable.

# let rec eval_exp n envt expr = match expr with
ExpInt p -> Vint p
| ExpVar v -> ( try List.assoc v envt with Not_found -> runerr n )
| ExpUnr (UMINUS,e) ->
( match eval_exp n envt e with
Vint p -> Vint (-p)
| _ -> runerr n )
| ExpUnr (NOT,e) ->
( match eval_exp n envt e with
Vbool p -> Vbool (not p)
| _ -> runerr n )
| ExpStr s -> Vstr s
| ExpBin (e1,op,e2)
-> match eval_exp n envt e1 , op , eval_exp n envt e2 with
Vint v1 , PLUS , Vint v2 -> Vint (v1 + v2)
| Vint v1 , MINUS , Vint v2 -> Vint (v1 - v2)
| Vint v1 , MULT , Vint v2 -> Vint (v1 * v2)
| Vint v1 , DIV , Vint v2 when v2<>0 -> Vint (v1 / v2)
| Vint v1 , MOD , Vint v2 when v2<>0 -> Vint (v1 mod v2)

| Vint v1 , EQUAL , Vint v2 -> Vbool (v1 = v2)
| Vint v1 , DIFF , Vint v2 -> Vbool (v1 <> v2)
| Vint v1 , LESS , Vint v2 -> Vbool (v1 < v2)
| Vint v1 , GREAT , Vint v2 -> Vbool (v1 > v2)
| Vint v1 , LESSEQ , Vint v2 -> Vbool (v1 <= v2)
| Vint v1 , GREATEQ , Vint v2 -> Vbool (v1 >= v2)

| Vbool v1 , AND , Vbool v2 -> Vbool (v1 && v2)
| Vbool v1 , OR , Vbool v2 -> Vbool (v1 || v2)

| Vstr v1 , PLUS , Vstr v2 -> Vstr (v1 ^ v2)
| _ , _ , _ -> runerr n ;;
val eval_exp : int -> (string * value) list -> expression -> value = <fun>


Command evaluation
To evaluate a command, we need a few additional functions.

We add an association to an environment by removing a previous association for the same variable name if there is one:

# let rec add v e env = match env with
[] -> [v,e]
| (w,f)::l -> if w=v then (v,e)::l else (w,f)::(add v e l) ;;
val add : 'a -> 'b -> ('a * 'b) list -> ('a * 'b) list = <fun>


A function that prints the value of an integer or string is useful for evaluation of the PRINT command.

# let print_value v = match v with
Vint n -> print_int n
| Vbool true -> print_string "true"
| Vbool false -> print_string "false"
| Vstr s -> print_string s ;;
val print_value : value -> unit = <fun>


The execution of a command corresponds to a transition from one state to another. More precisely, the environment is modified if the command is an assignment. Furthermore, the next line to execute is always modified. As a convention, if the next line to execute does not exist, we set its value to -1

# let next_line state =
let n = state.line+1 in
if n < Array.length state.xprog then n else -1 ;;
val next_line : state_exec -> int = <fun>
# let eval_cmd state =
match state.xprog.(state.line).cmd with
Rem _ -> { state with line = next_line state }
| Print e -> print_value (eval_exp state.line state.xenv e) ;
print_newline () ;
{ state with line = next_line state }
| Let(v,e) -> let ev = eval_exp state.line state.xenv e
in { state with line = next_line state ;
xenv = add v ev state.xenv }
| Goto n -> { state with line = n }
| Input v -> let x = try read_int ()
with Failure "int_of_string" -> 0
in { state with line = next_line state;
xenv = add v (Vint x) state.xenv }
| If (t,n) -> match eval_exp state.line state.xenv t with
Vbool true -> { state with line = n }
| Vbool false -> { state with line = next_line state }
| _ -> runerr state.line ;;
val eval_cmd : state_exec -> state_exec = <fun>


On each call of the transition function eval_cmd, we look up the current line, run it, then set the number of the next line to run as the current line. If the last line of the program is reached, the current line is given the value -1. This will tell us when to stop.

Program evaluation
We recursively apply the transition function until we reach a state where the current line number is -1.

# let rec run state =
if state.line = -1 then state else run (eval_cmd state) ;;
val run : state_exec -> state_exec = <fun>


Finishing touches

The only thing left to do is to write a small editor and to plug together all the functions we wrote in the previous sections.

The insert function adds a new line in the program at the requested place.

# let rec insert line p = match p with
[] -> [line]
| l::prog ->
if l.num < line.num then l::(insert line prog)
else if l.num=line.num then line::prog
else line::l::prog ;;
val insert : line -> line list -> line list = <fun>


The print_prog function prints the source code of a program.

# let print_prog prog =
let print_line x = print_string (pp_line x) ; print_newline () in
print_newline () ;
List.iter print_line prog ;
print_newline () ;;
val print_prog : line list -> unit = <fun>


The one_command function processes the insertion of a line or the execution of a command. It modifies the state of the toplevel loop, which consists of a program and an environment. This state, represented by the loop_state type, is different from the evaluation state.

# type loop_state = { prog:program; env:environment } ;;
# exception End ;;

# let one_command state =
print_string "> " ; flush stdout ;
try
match parse (input_line stdin) with
Line l -> { state with prog = insert l state.prog }
| List -> (print_prog state.prog ; state )
| Run
-> let tprog = assemble state.prog in
let xstate = run { line = 0; xprog = tprog; xenv = state.env } in
{state with env = xstate.xenv }
| PEnd -> raise End
with
LexerError -> print_string "Illegal character\n"; state
| ParseError -> print_string "syntax error\n"; state
| RunError n ->
print_string "runtime error at line ";
print_int n ;
print_string "\n";
state ;;
val one_command : loop_state -> loop_state = <fun>


The main function is the go function, which starts the toplevel loop of our Basic.

# let go () =
try
print_string "Mini-BASIC version 0.1\n\n";
let rec loop state = loop (one_command state) in
loop { prog = []; env = [] }
with End -> print_string "See you later...\n";;
val go : unit -> unit = <fun>
The loop is implemented by the local function loop. It stops when the End exception is raised by the one_command function.

Example: C+/C-

We return to the example of the C+/C- game described in chapter 3, page ??. Here is the Basic program corresponding to that Objective CAML program:

10 PRINT "Give the hidden number: "
20 INPUT N
30 PRINT "Give a number: "
40 INPUT R
50 IF R = N THEN 110
60 IF R < N THEN 90
70 PRINT "C-"
80 GOTO 30
90 PRINT "C+"
100 GOTO 30
110 PRINT "CONGRATULATIONS"
And here is a sample run of this program.

> RUN
Give the hidden number:
64
Give a number:
88
C-
Give a number:
44
C+
Give a number:
64
CONGRATULATIONS

Further work

The Basic we implemented is minimalist. If you want to go further, the following exercises hint at some possible extensions.

  1. Floating-point numbers: as is, our language only deals with integers, strings and booleans. Add floats, as well as the corresponding arithmetic operations in the language grammar. We need to modify not only parsing, but also evaluation, taking into account the implicit conversions between integers and floats.

  2. Arrays: Add to the syntax the command DIM var[x] that declares an array var of size x, and the expression var[i] that references the ith element of the array var.

  3. Toplevel directives: Add the toplevel directives SAVE "file_name" and LOAD "file_name" that save a Basic program to the hard disk, and load a Basic program from the hard disk, respectively.

  4. Sub-program: Add sub-programs. The GOSUB line number command calls a sub-program by branching to the given line number while storing the line from where the call is made. The RETURN command resumes execution at the line following the last GOSUB call executed, if there is one, or exits the program otherwise. Adding sub-programs requires evaluation to manage not only the environement but also a stack containing the return addresses of the current GOSUB calls. The GOSUB command adds the possibility of defining recursive sub-programs.

Previous Contents Next