Standard Library

Standard Library

The standard library contains a group of stable modules. These are operating system independent. There are currently 29 modules in the standard library containing 400 functions, 30 types of which half are abstract, 8 exceptions, 10 sub-modules, and 3 parameterized modules. Clearly we will not describe all of the declarations in all of these modules. Indeed, the reference manual [LRVD99] already does that quite well. Only those modules presenting a new concept or a real difficulty in use will be detailed.

The standard library can be divided into four distinct parts:

linear data structures (15 modules), some of which have already appeared in the first part;
input-output (4 modules), for the formatting of output, the persistence and creation of cryptographic keys;
parsing and lexical analysis (4 modules). They are described in chapter 11 (page ??);
system interface that permit communication and examination of parameters passed to a command, directory navigation and file access.

To these four groups we add a fifth containing some utilities for handling or creating structures such as functions for text processing or generating pseudo-random numbers, etc.

Utilities

The modules that we have named "utilities" concern:

characters: the Char module primarily contains conversion functions;
object cloning: OO will be presented in chapter 15 (page ??), on object oriented programming
lazy evaluation: Lazy is first presented on page ??;
random number generator: Random will be described below.

Generation of Random Numbers

The Random module is a pseudo-random number generator. It establishes a random number generation function starting with a number or a list of numbers called a seed. In order to ensure that the function does not always return the same list of numbers, the programmer must give it a different seed each time the generator is initialized.

From this seed the function generates a succession of seemingly random numbers. Nevertheless, an initialization with the same seed will create the same list. To correctly initialize the generator, you need to find some outside resource, like the date represented in milliseconds, or the length of time since the start of the program.

The functions of the module:

initialization: init of type int -> unit and full_init of type int array -> unit initialize the generator. The second function takes an array of seeds.
generate random numbers: bits of type unit -> int returns a positive integer, int of type int -> int returns a positive integer ranging from 0 to a limit given as a parameter, and float returns a float between 0. and a limit given as a parameter.

Linear Data Structures

The modules for linear data structures are:

simple modules: Array, String, List, Sort, Stack, Queue, Buffer, Hashtbl (that is also parameterized) and Weak;
parameterized modules: Hashtbl (of HashedType parameters), Map and Set (of OrderedType parameters).

The parameterized modules are built from the other modules, thus making them more generic. The construction of parameterized modules will be presented in chapter 14, page ??.

Simple Linear Data Structures

The name of the module describes the type of data structures manipulated by the module. If the type is abstract, that is to say, if the representation is hidden, the current convention is to name it t inside the module. These modules establish the following structures:

module Array: vectors;
module List: lists;
module String: character strings;
module Hashtbl: hash tables (abstract type);
module Buffer: extensible character strings (abstract type);
module Stack: stacks (abstract type);
module Queue: queues or FIFO (abstract type);
module Weak: vector of weak pointers (abstract type).

Let us mention one last module that implements linear data structures:

module Sort: sorting on lists and vectors, merging of lists.

Family of common functions

Each of these modules (with the exception of Sort), has functions for defining structures, creating/accessing elements (such as handler functions), and converting to other types. Only the List module is not physically modifiable. We will not give a complete description of all these functions. Instead, we will focus on families of functions that one finds in these modules. Then we will detail the List and Array modules that are the most commonly used structures in functional and imperative programming.

One finds more or less the following functionality in all these modules:

a length function that takes the value of a type and calculates an integer corresponding to its length;
a clear function that empties the linear structure, if it is modifiable;
a function to add an element, add in general, but sometimes named differently according to common practice, (for example, push for stacks);
a function to access the n-th element, often called get;
a function to remove an element (often the first) remove or take.

In the same way, in several modules the names of functions for traversal and processing are the same:

map: applies a function on all the elements of the structure and returns a new structure containing the results of these calls;
iter: like map, but drops successive results, and returns ().

For the structures with indexed elements we have:

fill: replaces (modifies in place) a part of the structure with a value;
blit: copies a part of one structure into another structure of the same type;
sub: copies a part of one structure into a newly created structure.

Modules `List` and `Array`

We describe the functions of the two libraries while placing an emphasis on the similarities and the particularities of each one. For the functions common to both modules, t designates either the 'a list or 'a array type. When a function belongs to one module, we will use the dot notation.

Common or analogous functionality

The first of them is the calculation of length.

List.length : 'a t -> int

Two functions permitting the concatenation of two structures or all the structures of a list.

`List.append`	:	'a t -> 'a t -> 'a t
`List.concat`	:	'a t list -> 'a t

Both modules have a function to access an element designated by its position in the structure.

`List.nth`	:	'a list -> int -> 'a
`Array.get`	:	'a array -> int -> 'a

The function to access an element at index i of a vector t, which is frequently used, has a syntactic shorthand: t.(i).

Two functions allow you to apply an operation to all the elements of a structure.

`iter`	:	('a -> unit) -> 'a t -> unit
`map`	:	('a -> 'b) -> 'a t -> 'b t

You can use iter to print the contents of a list or a vector.


# let print_content iter print_item xs = 
   iter (fun x -> print_string"("; print_item x; print_string")") xs;
   print_newline() ;;
val print_content : (('a -> unit) -> 'b -> 'c) -> ('a -> 'd) -> 'b -> unit =
  <fun>
# print_content List.iter print_int [1;2;3;4;5] ;;
(1)(2)(3)(4)(5)
- : unit = ()
# print_content Array.iter print_int [|1;2;3;4;5|] ;;
(1)(2)(3)(4)(5)
- : unit = ()

The map function builds a new structure containing the result of the application. For example, with vectors whose contents are modifiable:


# let a = [|1;2;3;4|] ;;
val a : int array = [|1; 2; 3; 4|]
# let b = Array.map succ a ;;
val b : int array = [|2; 3; 4; 5|]
# a, b;;
- : int array * int array = [|1; 2; 3; 4|], [|2; 3; 4; 5|]

Two iterators can be used to compose successive applications of a function on all elements of a structure.

`fold_left`	:	('a -> 'b -> 'a) -> 'a -> 'b t -> 'a
`fold_right`	:	('a -> 'b -> 'b) -> 'a t -> 'b -> 'b

You have to give these iterators a base case that supplies a default value when the structure is empty.

fold_left f r `[`v1; v2; `...`; vn`]`	=	f `...` `(` f `(`f r v1`)` v2 `)` `...` vn
fold_right f `[`v1; v2; `...`; vn`]` r	=	f v1 `(` f v2 `...` `(`f vn r`)` `...` `)`

These functions allow you to easily transform binary operations into n-ary operations. When the operation is commutative and associative, left and right iteration are indistinguishable:


# List.fold_left (+) 0 [1;2;3;4] ;;
- : int = 10
# List.fold_right (+) [1;2;3;4] 0 ;;
- : int = 10
# List.fold_left List.append [0] [[1];[2];[3];[4]] ;;
- : int list = [0; 1; 2; 3; 4]
# List.fold_right List.append [[1];[2];[3];[4]] [0] ;;
- : int list = [1; 2; 3; 4; 0]

Notice that, for binary concatenation, an empty list is a neutral element to the left and to the right. We find thus, in this specific case, the equivalence of the two expressions:


# List.fold_left List.append [] [[1];[2];[3];[4]] ;;
- : int list = [1; 2; 3; 4]
# List.fold_right List.append [[1];[2];[3];[4]] [] ;;
- : int list = [1; 2; 3; 4]

We have, in fact, found the List.concat function.

Operations specific to lists.

It is useful to have the following list functions that are provided by the List module:

`List.hd`	:	'a list -> 'a
		first element of the list
`List.tl`	:	'a list -> 'a
		the list, without its first element
`List.rev`	:	'a list -> 'a list
		reversal of a list
`List.mem`	:	'a -> 'a list -> bool
		membership test
`List.flatten`	:	'a list list -> 'a list
		flattens a list of lists
`List.rev_append`	:	'a list -> 'a list -> 'a list
		is the same as append (rev l1) l2

The first two functions are partial. They are not defined on the empty list and raise a Failure exception. There is a variant of mem: memq that uses physical equality.


# let c = (1,2) ;;
val c : int * int = 1, 2
# let l = [c] ;;
val l : (int * int) list = [1, 2]
# List.memq (1,2) l ;;
- : bool = false
# List.memq c l ;;
- : bool = true

The List module provides two iterators that generalize boolean conjunction and disjunction (and / or): List.for_all and List.exists that are defined by iteration:


# let for_all f xs = List.fold_right (fun x -> fun b -> (f x) & b) xs true ;;
val for_all : ('a -> bool) -> 'a list -> bool = <fun>
# let exists f xs = List.fold_right (fun x -> fun b -> (f x) or b) xs false ;;
val exists : ('a -> bool) -> 'a list -> bool = <fun>

There are variants of the iterators in the List module that take two lists as arguments and traverse them in parallel (iter2, map2, etc.). If they are not the same size, the Invalid_argument exception is raised.

The elements of a list can be searched using the criteria provided by the following boolean functions:

`List.find`	:	('a -> bool) -> 'a list -> 'a
`List.find_all`	:	('a -> bool) -> 'a list -> 'a list

The find_all function has an alias: filter.

A variant of the general search function is the partitioning of a list:

List.partition : ('a -> bool) -> 'a list -> 'a list * 'a list

The List module has two often necessary utility functions permitting the division and creation of lists of pairs:

`List.split`	:	('a 'b) list -> 'a list * 'b list*
`List.combine`	:	'a list -> 'b list -> ('a 'b) list*

Finally, a structure combining lists and pairs is often used: association lists. They are useful to store values associated to keys. These are lists of pairs such that the first entry is a key and the second is the information associated to the key. One has these data structures to deal with pairs:

`List.assoc`	:	'a -> ('a 'b) list -> 'b*
		extract the information associated to a key
`List.mem_assoc`	:	'a -> ('a 'b) list -> bool*
		test the existence of a key
`List.remove_assoc`	:	'a -> ('a 'b) list -> ('a * 'b) list*
		deletion of an element corresponding to a key

Each of these functions has a variant using physical equality instead of structural equality: List.assq, List.mem_assq and List.remove_assq.

Handlers specific to Vectors.

The vectors that imperative programmers often use are physically modifiable structures. The Array module furnishes a function to change the value of an element:

Array.set : 'a array -> int -> 'a -> unit

Like get, the set function has a syntactic shortcut: t.(i) <- a.

There are three vector allocation functions:

`Array.create`	:	int -> 'a -> 'a array
		creates a vector of a given size whose elements are all initialized with the same value
`Array.make`	:	int -> 'a -> 'a array
		alias for `create`
`Array.init`	:	int -> (int -> 'a) -> 'a array
		creates a vector of a given size whose elements are each initialized with the result of the application of a function to the element's index

Since they are frequently used, the Array module has two functions for the creation of matrices (vectors of vectors):

`Array.create_matrix`	:	int -> int -> 'a -> 'a array array
`Array.make_matrix`	:	int -> int -> 'a -> 'a array array

The set function is generalized as a function modifying the values on an interval described by a starting index and a length:

Array.fill : 'a array -> int -> int -> 'a -> unit

One can copy a whole vector or extract a sub-vector (described by a starting index and a length) to obtain a new structure:

`Array.copy`	:	'a array -> 'a array
`Array.sub`	:	'a array -> int -> int -> 'a array

The copy or extraction can also be done towards another vector:

Array.blit : 'a array -> int -> 'a array -> int -> int -> unit

The first argument is the index into the first vector, the second is the index into the second vector and the third is the number of values copied. The three functions blit, sub and fill raise the Invalid_argument exception.

The privileged use of indices in the vector manipulation functions leads to the definition of two specific iterators:

`Array.iteri`	:	(int -> 'a -> unit) -> 'a array -> unit
`Array.mapi`	:	(int -> 'a -> 'b) -> 'a array -> 'b array

They apply a function whose first argument is the index of the affected element.


#   let f i a = (string_of_int i) ^ ":" ^ (string_of_int a) in
   Array.mapi f  [| 4; 3; 2; 1; 0 |] ;;
- : string array = [|"0:4"; "1:3"; "2:2"; "3:1"; "4:0"|]

Although the Array module does not have a function to modify the contents of all the elements in a vector, this effect can be easily obtained using iteri:


# let iter_and_set f t =
  Array.iteri (fun i -> fun x -> t.(i) <- f x) t ;;
val iter_and_set : ('a -> 'a) -> 'a array -> unit = <fun>
# let v = [|0;1;2;3;4|] ;;
val v : int array = [|0; 1; 2; 3; 4|]
# iter_and_set succ v ;;
- : unit = ()
# v ;;
- : int array = [|1; 2; 3; 4; 5|]

Finally, the Array module provides two list conversion functions:

`Array.of_list`	:	'a list -> 'a array
`Array.to_list`	:	'a array -> 'a list

Input-output

The standard library has four input-output modules:

module Printf: for the formatting of output;
Format: pretty-printing facility to format text within ``pretty-printing boxes''. The pretty-printer breaks lines at specified break hints, and indents lines according to the box structure.
module Marshal: implements a mechanism for persistent values;
module Digest: for creating unique keys.

The description of the Marshal module will be given later in the chapter when we begin to discuss persistent data structures (see page ??).

Module `Printf`

The Printf module formats text using the rules of the printf function in the C language library. The display format is represented as a character string that will be decoded according to the conventions of printf in C, that is to say, by specializing the % character. This character followed by a letter indicates the type of the argument at this position. The following format "(x=%d, y=%d)" indicates that it should put two integers in place of the %d in the output string.

Specification of formats.

A format defines the parameters for a printed string. Those, of basic types: int, float, char and string, will be converted to strings and will replace their occurrence in the printed string. The values 77 and 43 provided to the format "(x=%d, y=%d)" will generate the complete printed string "(x=77, y=43)". The principal letters indicating the type of conversion to carry out are given in figure 8.1.

Type Letter Result

integer d or i signed decimal

u unsigned decimal

x unsigned hexadecimal, lower case form

X same, with upper case letters

character c character

string s string

float f decimal

e or E scientific notation

g or G same

boolean b true or false

special a or t functional parameter

of type (out_channel -> 'a -> unit) -> 'a -> unit

or out_channel -> unit

Figure 8.1: Conversion conventions.

The format also allows one to specify the justification of the conversion, which allows for the alignment of the printed values. One can indicate the size in conversion characters. For this one places between the % character and the type of conversion an integer number as in %10d that indicates a conversion to be padded on the right to ten characters. If the size of the result of the conversion exceeds this limit, the limit will be discarded. A negative number indicates left justification. For conversions of floating point numbers, it is helpful to be able to specify the printed precision. One places a decimal point followed by a number to indicate the number of characters after the decimal point as in %.5f that indicates five characters to the right of the decimal point.

There are two specific format letters: a and t that indicate a functional argument. Typically, a print function defined by the user. This is specific to Objective CAML.

Functions in the module

The types of the five functions in this module are given in figure 8.2.

fprintf : out_channel -> ('a, out_channel, unit) format -> 'a

printf : ('a, out_channel, unit) format -> 'a

eprintf : ('a, out_channel, unit) format -> 'a

sprintf : ('a, unit, string) format -> 'a

bprintf : Buffer.t -> ('a, Buffer.t, string) format -> 'a

Figure 8.2: Printf formatting functions.

The fprintf function takes a channel, a format and arguments of types described in the format. The printf and eprintf functions are specializations on standard output and standard error. Finally, sprintf and bprintf do not print the result of the conversion, but instead return the corresponding string.

Here are some simple examples of the utilization of formats.


# Printf.printf "(x=%d, y=%d)" 34 78 ;;
(x=34, y=78)- : unit = ()
# Printf.printf "name = %s, age = %d" "Patricia" 18 ;;
name = Patricia, age = 18- : unit = ()
# let s = Printf.sprintf "%10.5f\n%10.5f\n" (-.12.24) (2.30000008) ;;
val s : string = " -12.24000\n   2.30000\n"
# print_string s ;;
-12.24000
   2.30000
- : unit = ()

The following example builds a print function from a matrix of floats using a given format.


# let print_mat m = 
   Printf.printf "\n" ;
   for i=0 to (Array.length m)-1 do
     for j=0 to (Array.length m.(0))-1 do   
       Printf.printf "%10.3f" m.(i).(j)
     done ;
     Printf.printf "\n"
   done ;;    
val print_mat : float array array -> unit = <fun>
# print_mat (Array.create 4  [| 1.2; -.44.22; 35.2 |]) ;;

     1.200   -44.220    35.200
     1.200   -44.220    35.200
     1.200   -44.220    35.200
     1.200   -44.220    35.200
- : unit = ()

Note on the format type.

The description of a format adopts the syntax of character strings, but it is not a value of type string. The decoding of a format, according to the preceding conventions, builds a value of type format where the 'a parameter is instantiated either with unit if the format does not mention a parameter, or by a functional type corresponding to a function able to receive as many arguments as are mentioned and returning a value of type unit.

One can illustrate this process by partially applying the printf function to a format:


# let p3 = 
   Printf.printf "begin\n%d is val1\n%s is val2\n%f is val3\n" ;;
begin
val p3 : int -> string -> float -> unit = <fun>

One obtains thus a function that takes three arguments. Note that the word begin had already been printed. Another format would have given another type of function:


# let p2 = 
   Printf.printf "begin\n%f is val1\n%s is val2\n";;
begin
val p2 : float -> string -> unit = <fun>

In providing arguments one by one to p3, one progressively obtains the output.


# let p31 = p3 45 ;;
45 is val1
val p31 : string -> float -> unit = <fun>
# let p32 = p31 "hello" ;;
hello is val2
val p32 : float -> unit = <fun>
# let p33 = p32 3.14 ;;
3.140000 is val3
val p33 : unit = ()
# p33 ;;
- : unit = ()

From the last obtained value, nothing is printed: it is the value () of type unit.

One cannot build a format using values of type string:


# let f d =
  Printf.printf (d^d);;
Characters 27-30:
This expression has type string but is here used with type
  ('a, out_channel, unit) format

The compiler cannot know the value of the string passed as an argument. It thus cannot know the type that instantiates the 'a parameter of type format.

On the other hand, strings are physically modifiable values, it would thus be possible to replace, for example, the %d part with another letter, thus dynamically changing the print format. This conflicts with the static generation of the conversion function.

`Digest` Module

A hash function converts a character string of unspecified size into a character string of fixed length, most often smaller. Hashing functions return a fingerprint (digest) of their entry.

Such functions are used for the construction of hash tables, as in the Hashtbl module, permitting one to rapidly test if an element is a member of such a table by directly accessing the fingerprint. For example the function f_mod_n, that generates the modulo n sum of the ASCII codes of the characters in a string, is a hashing function. If one creates an n by n table to arrange the strings, from the fingerprint one obtains direct access. Nevertheless two strings can return the same fingerprint. In the case of collisions, one adds to the hash table an extension to store these elements. If there are too many collisions, then access to the hash table is not very effective. If the fingerprint has a length of n bits, then the probability of collision between two different strings is 1/2ⁿ.

A non-reversible hash function has a very weak probability of collision. It is thus difficult, given a fingerprint, to construct a string with this fingerprint. The preceding function f_mod_n is not, based on the evidence, such a function. One way hash functions permit the authentification of a string, that it is for some text sent over the Internet, a file, etc.

The Digest module uses the MD5 algorithm, short for Message Digest 5. It returns a 128 bit fingerprint. Although the algorithm is public, it is impossible (today) to carry out a reconstruction from a fingerprint. This module defines the Digest.t type as an abbreviation of the string type. The figure 8.3 details the main functions of this module.

string : string -> t

returns the fingerprint of a string

file : string -> t

returns the fingerprint of a file

Figure 8.3: Functions of the Digest module.

We use the string function in the following example on a small string and on a large one built from the first. The fingerprint is always of fixed length.


# let s = "The small cat is dead...";;
val s : string = "The small cat is dead..."
# Digest.string s;;
- : Digest.t = "xr6\127\171(\134=\238`\252F\028\t\210$"

# let r = ref s in
   for i=1 to 100 do r:= s^ !r done;
   Digest.string !r;;
- : Digest.t = "\232\197|C]\137\180{>\224QX\155\131D\225"

The creation of a fingerprint for a program allows one to guarantee the contents and thus avoids the use of a bad version. For example, when code is dynamically loaded (see page ??), a fingerprint is used to select the binary file to load.


# Digest.file "basic.ml" ;;
- : Digest.t = "\179\026\191\137\157Ly|^w7\183\164:\167q"

Persistence

Persistence is the conservation of a value outside the running execution of a program. This is the case when one writes a value in a file. This value is thus accessible to any program that has access to the file. Writing and reading persistent values requires the definition of a format for representing the coding of data. In effect, one must know how to go from a complex structure stored in memory, such as a binary tree, to a linear structure, a list of bytes, stored in a file. This is why the coding of persistent values is called linearization ¹.

Realization and Difficulties of Linearization

The implementation of a mechanism for the linearization of data structures requires choices and presents difficulties that we describe below.

read-write of data structures. Since memory can always be viewed as a vector of words, one value can always correspond to the memory that it occupies, leaving us to preserve the useful part by then compacting the value.
share or copy. Must the linearization of a data structure conserve sharing? Typically a binary tree having two identical children (in the sense of physical equality) can indicate, for the second child, that it has already saved the first. This characteristic influences the size of the saved value and the time taken to do it. On the other hand, in the presence of physically modifiable values, this could change the behavior of this value after a recovery depending on whether or not sharing was conserved.
circular structures. In the case of a circular value, linearization without sharing is likely to loop. It will be necessary to conserve sharing.
functional values. Functional values, or closures, are composed of an environment part and a code part. The code part corresponds to the entry point (address) of the code to execute. What must thus be done with code? It is possible to uniquely store this address, but thus only the same program will find the correct meaning of this address. It is also possible to save the list of machine instructions of this function, but that would require having a mechanism to dynamically load code.
guaranteeing the type when reloading. This is the main difficulty of this mechanism. Static typing guarantees that typed values will not generate type errors at execution time. But this is not true except for values belonging to the program during the course of execution. What type can one give to a value outside the program, that was not seen by the type verifier? Just to verify that the re-read value has the monomorphic type generated by the compiler, the type would have to be transmitted at the moment the value was saved, then the type would have to be checked when the value was loaded. Additionally, a mechanism to manage the versions of types would be needed to be safe in case a type is redeclared in a program.

`Marshal` Module

The linearization mechanism in the Marshal module allows you to choose to keep or discard the sharing of values. It also allows for the use of closures, but in this case, only the pointer to the code is saved.

This module is mainly comprised of functions for linearization via a channel or a string, and functions for recovery via a channel or a string. The linearization functions are parameterizable. The following type declares two possible options:

type external_flag = 
  No_sharing
| Closures;;

The No_sharing constant constructor indicates that the sharing of values is not to be preserved, though the default is to keep sharing. The Closures constructor allows the use of closures while conserving its pointer to the code. Its absence will raise an exception if one tries to store a functional value.

Warning

The Closures constructor is inoperative in interactive mode. It can only be used in command line mode.

The reading and writing functions in this module are gathered in figure 8.4.

to_channel : out_channel -> 'a -> extern_flag list -> unit

to_string : 'a -> extern_flag list -> string

to_buffer : string -> int -> int -> 'a -> extern_flag list -> unit

from_channel : in_channel -> 'a

from_string : string -> int -> 'a

Figure 8.4: Functions of the Marshal module.

The to_channel function takes an output channel, a value, and a list of options and writes the value to the channel. The to_string function produces a string corresponding to the linearized value, whereas to_buffer accomplishes the same task by modifying part of a string passed as an argument. The from_channel function reads a linearized value from a channel and returns it. The from_string variant takes as input a string and the position of the first character to read in the string. Several linearized values can be stored in the same file or in the same string. For a file, they can be read sequentially. For a string, one must specify the right offset from the beginning of the string to decode the desired value.


# let s = Marshal.to_string [1;2;3;4] [] in String.sub s 0 10;;
- : string = "\132\149\166\190\000\000\000\t\000\000"

Warning

Using this module one loses the safety of static typing (see infra, page ??).

Loading a persistent object creates a value of indeterminate type:


# let x = Marshal.from_string (Marshal.to_string [1; 2; 3; 4] []) 0;;
val x : '_a = <poly>

This indetermination is denoted in Objective CAML by the weakly typed variable '_a. You should specify the expected type:


# let l = 
   let s = (Marshal.to_string [1; 2; 3; 4] []) in 
     (Marshal.from_string s 0 : int list) ;;
val l : int list = [1; 2; 3; 4]

We return to this topic on page ??.

Note

The output_value function of the preloaded library corresponds to calling to_channel with an empty list of options. The input_value function in the Pervasives module directly calls the from_channel function. These functions were kept for compatibility with old programs.

Example: Backup Screens

We want to save the bitmap, represented as a matrix of colors, of the whole screen. The save_screen function recovers the bitmap, converts it to a table of colors and saves it in a file whose name is passed as a parameter.


# let save_screen name = 
  let i = Graphics.get_image 0 0 (Graphics.size_x ()) 
                                 (Graphics.size_y ())  in 
   let j = Graphics.dump_image i in 
   let oc = open_out name in 
     output_value oc j;
     close_out oc;;
val save_screen : string -> unit = <fun>

The load_screen function does the reverse operation. It opens the file whose name is passed as a parameter, restores the value stored inside, converts this color matrix into a bitmap, then displays the bitmap.


# let load_screen name = 
   let ic = open_in name in 
   let image = ((input_value ic) : Graphics.color array array) in
     close_in ic;
     Graphics.close_graph();
     Graphics.open_graph (" "^(string_of_int(Array.length image.(0)))
                        ^"x"^(string_of_int(Array.length image)));
    let image2 = Graphics.make_image image in 
      Graphics.draw_image image2 0 0; image2 ;;
val load_screen : string -> Graphics.image = <fun>

Warning

Abstract typed values cannot be made persistent.

It is for this reason that the preceding example does not use the abstract Graphics.image type, but instead uses the concrete color array array type. The abstraction of types is presented in chapter 14.

Sharing

The loss of sharing in a data structure can make the structure completely lose its intended behavior. Let us revisit the example of the symbol generator from page ??. For whatever reason, we want to save the functional values new_s and reset_s, and thereafter use the current value of their common counter. We thus write the following program:

 
# let reset_s,new_s = 
   let c = ref 0 in 
   ( function () -> c := 0 ) ,
   ( function s ->  c:=!c+1; s^(string_of_int !c) ) ;;

# let save = 
   Marshal.to_string (new_s,reset_s) [Marshal.Closures;Marshal.No_sharing] ;;

# let (new_s1,reset_s1) = 
  (Marshal.from_string save 0 : ((string -> string ) * (unit -> unit))) ;;

# (* 1 *)
 Printf.printf "new_s : \%s\n" (new_s "X"); 
 Printf.printf "new_s : \%s\n" (new_s "X");
 (* 2 *)
 Printf.printf "new_s1 : \%s\n" (new_s1 "X"); 
 (* 3 *)
 reset_s1(); 
 Printf.printf "new_s1 (after reset_s1) : \%s\n" (new_s1 "X") ;;
Characters 148-154:
Unbound value new_s1

The first two outputs in (* 1 *) comply with our intent. The output obtained in (* 2 *) after re-reading the closures also appears correct (after X2 comes X3). But, in fact, the sharing of the c counter between the re-read functions new_s1 and reset_s1 is lost, as the output of X4 attests that one of them set the counter to zero. Each closure has a copy of the counter and the call to reset_s1 does not reset the new_s1 counter to zero. Thus we should not have used the No_sharing option during the linearization.

It is generally necessary to conserve sharing. Nevertheless in certain cases where execution speed is important, the absence of sharing speeds up the process of saving. The following example demonstrates a function that copies a matrix. In this case it might be preferable to break the sharing:


# let copy_mat_f (m : float array array) = 
  let s = Marshal.to_string m [Marshal.No_sharing] in 
    (Marshal.from_string s 0 : float array array);;
val copy_mat_f : float array array -> float array array = <fun>

One can also use it to create a matrix without sharing:


# let create_mat_f n m v = 
   let m = Array.create n (Array.create m v) in 
     copy_mat_f m;;
val create_mat_f : int -> int -> float -> float array array = <fun>
# let a = create_mat_f 3 4 3.14;;
val a : float array array =
  [|[|3.14; 3.14; 3.14; 3.14|]; [|3.14; 3.14; 3.14; 3.14|];
    [|3.14; 3.14; 3.14; 3.14|]|]
# a.(1).(2) <- 6.28;;
- : unit = ()
# a;;
- : float array array =
[|[|3.14; 3.14; 3.14; 3.14|]; [|3.14; 3.14; 6.28; 3.14|];
  [|3.14; 3.14; 3.14; 3.14|]|]

Which is a more common behavior than that of Array.create, and resembles that of Array.create_matrix.

Size of Values

It may be useful to know the size of a persistent value. If sharing is conserved, this size also reflects the amount of memory occupied by a value. Although the encoding sometimes optimizes the size of atomic values², knowing the size of their respective encodings permits us to compare different implementations of a data structure. In addition, for programs that will never stop themselves, like embedded systems or even network servers; watching the size of data structures can help detect memory leaks. The Marshal module has two functions to calculate the size of a constant. They are described in figure 8.5.

header_size : int

data_size : string -> int -> int

total_size : string -> int -> int

Figure 8.5: Size functions of Marshal.

The total size of a persistent value is the same as the size of its data structures plus the size of its header.

Below is a small example of the use of MD5 encoding to compare two representations of binary trees:


# let size x = Marshal.data_size (Marshal.to_string x []) 0;;
val size : 'a -> int = <fun>
# type 'a bintree1 = Empty1 | Node1 of 'a * 'a bintree1 * 'a bintree1 ;;
type 'a bintree1 = | Empty1 | Node1 of 'a * 'a bintree1 * 'a bintree1
# let s1 = 
  Node1(2, Node1(1, Node1(0, Empty1, Empty1), Empty1), 
           Node1(3, Empty1, Empty1)) ;;
val s1 : int bintree1 =
  Node1
   (2, Node1 (1, Node1 (0, Empty1, Empty1), Empty1),
    Node1 (3, Empty1, Empty1))
# type 'a bintree2 = 
  Empty2 | Leaf2 of 'a | Node2 of 'a * 'a bintree2 * 'a bintree2 ;;
type 'a bintree2 =
  | Empty2
  | Leaf2 of 'a
  | Node2 of 'a * 'a bintree2 * 'a bintree2
# let s2 =
  Node2(2, Node2(1, Leaf2 0, Empty2), Leaf2 3) ;;
val s2 : int bintree2 = Node2 (2, Node2 (1, Leaf2 0, Empty2), Leaf2 3)
# let s1, s2 = size s1, size s2 ;;
val s1 : int = 13
val s2 : int = 9

The values given by the size function reflect well the intuition that one might have of the size of s1 and s2.

Typing Problem

The real problem with persistent values is that it is possible to break the type system of Objective CAML. The creation functions return a monomorphic type (unit or string). On the other hand unmarshalling functions return a polymorphic type 'a. From the point of view of types, you can do anything with a persistent value. Here is the usage that can be done with it (see chapter 2, page ??): create a function magic_copy of type 'a -> 'b.


# let magic_copy a = 
  let s = Marshal.to_string a [Marshal.Closures] in 
    Marshal.from_string s 0;;
val magic_copy : 'a -> 'b = <fun>

The use of such a function causes a brutal halt in the execution of the program.

#  (magic_copy 3 : float) +. 3.1;;
Segmentation fault

In interactive mode (under Linux), we even leave the toplevel (interactive) loop with a system error signal corresponding to a memory violation.

Interface with the System

The standard library has six system interface modules:

module Sys: for communication between the operating system and the program;
module Arg: to analyze parameters passed to the program from the command line;
module Filename: operations on file names
module Printexc: for the interception and printing of exceptions;
module Gc: to control the mechanism that automatically deallocates memory, described in chapter 9;
module Callback: to call Objective CAML functions from C, described in chapter 12.

The first four modules are described below.

Module `Sys`

This module provides quite useful functions for communication with the operating system, such as handling the signals received by a program. The values in figure 8.6 contain information about the system.

OS_type : string

type of system

interactive : bool ref

true if executing at the toplevel

word_size : string

size of a word (32 or 64 bits)

max_string_length : int

maximum size of a string

max_array_length : int

maximum size of a vector

time : unit -> float

gives the time in seconds since the start of the program

Figure 8.6: Information about the system.

Communication between the program and the system can go through the command line, the value of an environmental variable, or through running another program. These functions are described in figure 8.7.

argv : string array

contains the vector of parameters

getenv : string -> string

retrieves the value of a variable

command : string -> int

executes the command passed as an argument

Figure 8.7: Communication with the system.

The functions of the figure 8.8 allow us to navigate in the file hierarchy.

file_exists : string -> bool

returns true if the file exists

remove : string -> unit

destroys a file

rename : string -> string -> unit

renames a file

chdir : string -> unit

change the current directory

getcwd : unit -> string

returns the name of the current directory

Figure 8.8: File manipulation.

Finally, the management of signals will be described in the chapter on system programming (18).

Here is a small program that revisits the example of saving a graphics window as an array of colors. The main function verifies that it is not started from the interactive loop, then reads from the command line the names of files to display, then tests if they exist, then displays them (with the load_screen function). We wait for a key to be pressed between displaying two images.


# let main () =
   if not (!Sys.interactive) then 
     for i = 0 to Array.length(Sys.argv) -1 do 
       let name = Sys.argv.(i) in 
         if Sys.file_exists name then 
         begin
           ignore(load_screen name);
           ignore(Graphics.read_key)
         end
     done;;
val main : unit -> unit = <fun>

Module `Arg`

The Arg module defines a small syntax for command line arguments. With this module, you can parse arguments and associate actions with them. The various elements of the command line are separated by one or more spaces. They are the values stored in the Sys.argv array. In the syntax provided by Arg, certain elements are distinguished by starting with the minus character (-). These are called command line keywords or switches. One can associate a specific action with a keyword or take as an argument a value of type string, int or float. The value of these arguments is initialized with the value found on the command line just after the keyword. In this case one can call a function that converts character strings into the expected type. The other elements on the command line are called anonymous arguments. One associates an action with them that takes their value as an argument. An undefined option causes the display of some short documentation on the command line. The documentation's contents are defined by the user.

The actions associated with keywords are encapsulated in the type:


type spec =
  | Unit of (unit -> unit)     (* Call the function with unit argument*)
  | Set of bool ref            (* Set the reference to true*)
  | Clear of bool ref          (* Set the reference to false*)
  | String of (string -> unit) (* Call the function with a string 
                                argument *)
  | Int of (int -> unit)       (* Call the function with an int 
                                argument *)
  | Float of (float -> unit)   (* Call the function with a float 
                                argument *)
  | Rest of (string -> unit)   (* Stop interpreting keywords and call the
                                function with each remaining argument*)

The command line parsing function is:


# Arg.parse ;;
- : (string * Arg.spec * string) list -> (string -> unit) -> string -> unit =
<fun>

Its first argument is a list of triples of the form (key, spec, doc) such that:

key is a character string corresponding to the keyword. It starts with the reserved character '_'.
spec is a value of type spec specifying the action associated with key.
doc is a character string describing the option key. It is displayed upon a syntax error.

The second argument is the function to process the anonymous command line arguments. The last argument is a character string displayed at the beginning of the command line documentation.

The Arg module also includes:

Bad: an exception taking as its argument a character string. It can be used by the processing functions.
usage: of type (string * Arg.spec * string) list -> string -> unit, this function displays the command line documentation. One preferably provides it with the same arguments as those of parse.
current: of type int ref that contains a reference to the current value of the index in the Sys.argv array. One can therefore modify this value if necessary.

By way of an example, we show a function read_args that initializes the configuration of the Minesweeper game seen in chapter 6, page ??. The possible options will be -col, -lin and -min. They will be followed by an integer indicating, respectively: the number of columns, the number of lines and the number of mines desired. These values must not be less than the default values, respectively 10, 10 and 15.

The processing functions are:


# let set_nbcols cf n = cf := {!cf with nbcols = n} ;;
# let set_nbrows cf n = cf := {!cf with nbrows = n} ;;
# let set_nbmines cf n = cf := {!cf with nbmines = n} ;;

All three are of type config ref -> int -> unit. The command line parsing function can be written:


# let read_args() =
  let cf = ref default_config in
  let speclist = 
   [("-col", Arg.Int (set_nbcols cf), "number of columns (>=10)");
    ("-lin", Arg.Int (set_nbrows cf), "number of lines (>=10)");
    ("-min", Arg.Int (set_nbmines cf), "number of mines (>=15)")]
  in   
  let usage_msg = "usage : minesweep [-col n] [-lin n] [-min n]" in
   Arg.parse speclist (fun s -> ()) usage_msg; !cf ;;
val read_args : unit -> config = <fun>

This function calculates a configuration that will be passed as arguments to open_wcf, the function that opens the main window when the game is started. Each option is, as its name indicates, optional. If it does not appear on the command line, the corresponding parameter keeps its default value. The order of the options is unimportant.

Module `Filename`

The Filename module has operating system independant functions to manipulate the names of files. In practice, the file and directory naming conventions differ greatly between Windows, Unix and MacOS.

Module `Printexc`

This very short module (three functions described in figure 8.9) provides a general exception handler. This is particularly useful for programs executed in command mode³ to be sure not to allow an exception to escape that would stop the program.

catch : ('a -> 'b) -> 'a -> 'b

general exception handler

print : ('a -> 'b) -> 'a -> 'b

print and re-raise the exception

to_string : exn -> string

convert an exception to a string

Figure 8.9: Handling exceptions.

The catch function applies its first argument to its second. This launches the main function of the program. If an exception arrives at the level of catch, that is to say that if it is not handled inside the program, then catch will print its name and exit the program. The print function has the same behavior as catch but re-raises the exception after printing it. Finally the to_string function converts an exception into a character string. It is used by the two preceding functions. If we look again at the main function for displaying bitmaps, we might thus write an encapsulating function go in the following manner:


# let go () = 
   Printexc.catch main ();;
val go : unit -> unit = <fun>

This permits the normal termination of the program by printing the value of the uncaptured exception.

Type	Letter	Result
integer	`d` or `i`	signed decimal
	`u`	unsigned decimal
	`x`	unsigned hexadecimal, lower case form
	`X`	same, with upper case letters
character	`c`	character
string	`s`	string
float	`f`	decimal
	`e` or `E`	scientific notation
	`g` or `G`	same
boolean	`b`	`true` or `false`
special	`a` or `t`	functional parameter
		of type (out_channel -> 'a -> unit) -> 'a -> unit
		or out_channel -> unit

`fprintf`	:	out_channel -> ('a, out_channel, unit) format -> 'a
`printf`	:	('a, out_channel, unit) format -> 'a
`eprintf`	:	('a, out_channel, unit) format -> 'a
`sprintf`	:	('a, unit, string) format -> 'a
`bprintf`	:	Buffer.t -> ('a, Buffer.t, string) format -> 'a

`string`	:	string -> t
		returns the fingerprint of a string
`file`	:	string -> t
		returns the fingerprint of a file

`to_channel`	:	out_channel -> 'a -> extern_flag list -> unit
`to_string`	:	'a -> extern_flag list -> string
`to_buffer`	:	string -> int -> int -> 'a -> extern_flag list -> unit
`from_channel`	:	in_channel -> 'a
`from_string`	:	string -> int -> 'a

`header_size`	:	int
`data_size`	:	string -> int -> int
`total_size`	:	string -> int -> int

`OS_type`	:	string
		type of system
`interactive`	:	bool ref
		true if executing at the toplevel
`word_size`	:	string
		size of a word (32 or 64 bits)
`max_string_length`	:	int
		maximum size of a string
`max_array_length`	:	int
		maximum size of a vector
`time`	:	unit -> float
		gives the time in seconds since the start of the program

`argv`	:	string array
		contains the vector of parameters
`getenv`	:	string -> string
		retrieves the value of a variable
`command`	:	string -> int
		executes the command passed as an argument

`file_exists`	:	string -> bool
		returns `true` if the file exists
`remove`	:	string -> unit
		destroys a file
`rename`	:	string -> string -> unit
		renames a file
`chdir`	:	string -> unit
		change the current directory
`getcwd`	:	unit -> string
		returns the name of the current directory

`catch`	:	('a -> 'b) -> 'a -> 'b
		general exception handler
`print`	:	('a -> 'b) -> 'a -> 'b
		print and re-raise the exception
`to_string`	:	exn -> string
		convert an exception to a string