Standard Library
The standard library contains a group of stable modules. These are
operating system independent. There are currently 29 modules in the
standard library containing 400 functions, 30 types of which half are
abstract, 8 exceptions, 10 sub-modules, and 3 parameterized modules.
Clearly we will not describe all of the declarations in all of these
modules. Indeed, the reference manual [LRVD99] already does
that quite well. Only those modules presenting a new concept or a
real difficulty in use will be detailed.
The standard library can be divided into four distinct parts:
-
linear data structures (15 modules), some of which have already
appeared in the first part;
- input-output (4 modules), for the formatting of output,
the persistence and creation of cryptographic keys;
- parsing and lexical analysis (4 modules). They are described in
chapter 11 (page
??);
- system interface that permit communication and examination of
parameters passed to a command, directory navigation and file access.
To these four groups we add a fifth containing some utilities
for handling or creating structures such as
functions for text processing or generating pseudo-random numbers, etc.
Utilities
The modules that we have named "utilities" concern:
-
characters: the Char module primarily contains conversion functions;
- object cloning: OO will be presented in chapter
15 (page ??), on object oriented programming
- lazy evaluation: Lazy is first presented on page
??;
- random number generator: Random will be described below.
Generation of Random Numbers
The Random module is a pseudo-random number generator.
It establishes a random number generation function starting with a number
or a list of numbers called a seed. In order to ensure that the
function does not always return the same list of numbers, the programmer
must give it a different seed each time the generator is initialized.
From this seed the function generates a succession of seemingly random
numbers. Nevertheless, an initialization with the same seed will create
the same list. To correctly initialize the generator, you need to find
some outside resource, like the date represented in milliseconds, or
the length of time since the start of the program.
The functions of the module:
-
initialization: init of type int -> unit and
full_init of type int array -> unit
initialize the generator. The second function takes an array of seeds.
- generate random numbers: bits of type
unit -> int returns a positive integer,
int of type int -> int returns a positive integer
ranging from 0 to a limit given as a parameter,
and float returns a float between 0. and a limit given as a parameter.
Linear Data Structures
The modules for linear data structures are:
-
simple modules: Array, String,
List, Sort, Stack, Queue,
Buffer, Hashtbl (that is also parameterized) and
Weak;
- parameterized modules: Hashtbl (of HashedType
parameters), Map and Set (of OrderedType
parameters).
The parameterized modules are built from the other modules, thus
making them more generic. The construction of parameterized modules
will be presented in chapter 14, page
??.
Simple Linear Data Structures
The name of the module describes the type of data structures manipulated
by the module. If the type is abstract, that is to say, if the representation
is hidden, the current convention is to name it t inside the module.
These modules establish the following structures:
-
module Array: vectors;
- module List: lists;
- module String: character strings;
- module Hashtbl: hash tables (abstract type);
- module Buffer: extensible character strings (abstract type);
- module Stack: stacks (abstract type);
- module Queue: queues or FIFO (abstract type);
- module Weak: vector of weak pointers (abstract type).
Let us mention one last module that implements linear data structures:
-
module Sort: sorting on lists and vectors, merging of lists.
Family of common functions
Each of these modules (with the exception of Sort), has functions
for defining structures, creating/accessing elements (such as handler
functions), and converting to other types. Only the List module is
not physically modifiable.
We will not give a complete description of all these functions.
Instead, we will focus on families of functions that one finds
in these modules. Then we will detail the List and
Array modules that are the most commonly used structures in
functional and imperative programming.
One finds more or less the following functionality in all these modules:
-
a length function that takes the value of a type and
calculates an integer corresponding to its length;
- a clear function that empties the linear structure, if
it is modifiable;
- a function to add an element, add in general, but
sometimes named differently according to common practice, (for example,
push for stacks);
- a function to access the n-th element, often called
get;
- a function to remove an element (often the first)
remove or take.
In the same way, in several modules the names of functions for traversal and
processing are the same:
-
map: applies a function on all the elements
of the structure and returns a new structure containing the results
of these calls;
- iter: like map, but drops successive results,
and returns ().
For the structures with indexed elements we have:
-
fill: replaces (modifies in place) a part of the
structure with a value;
- blit: copies a part of one structure into another
structure of the same type;
- sub: copies a part of one structure into a
newly created structure.
Modules List and Array
We describe the functions of the two libraries while placing an emphasis
on the similarities and the particularities of each one.
For the functions common to both modules, t
designates either the 'a list or 'a array type. When a
function belongs to one module, we will use the dot notation.
Common or analogous functionality
The first of them is the calculation of length.
List.length |
: |
'a t -> int |
Two functions permitting the concatenation of two structures or all the
structures of a list.
List.append |
: |
'a t -> 'a t -> 'a t |
List.concat |
: |
'a t list -> 'a t |
Both modules have a function to access an element designated by its
position in the structure.
List.nth |
: |
'a list -> int -> 'a |
Array.get |
: |
'a array -> int -> 'a |
The function to access an element at index i of a
vector t, which is frequently used, has a syntactic shorthand:
t.(i).
Two functions allow you to apply an operation to all the elements
of a structure.
iter |
: |
('a -> unit) -> 'a t -> unit |
map |
: |
('a -> 'b) -> 'a t -> 'b t |
You can use iter to print the contents of a list or a vector.
# let
print_content
iter
print_item
xs
=
iter
(fun
x
->
print_string"("
;
print_item
x;
print_string")"
)
xs;
print_newline()
;;
val print_content : (('a -> unit) -> 'b -> 'c) -> ('a -> 'd) -> 'b -> unit =
<fun>
# print_content
List.iter
print_int
[
1
;2
;3
;4
;5
]
;;
(1)(2)(3)(4)(5)
- : unit = ()
# print_content
Array.iter
print_int
[|
1
;2
;3
;4
;5
|]
;;
(1)(2)(3)(4)(5)
- : unit = ()
The map function builds a new structure containing the
result of the application. For example, with vectors whose
contents are modifiable:
# let
a
=
[|
1
;2
;3
;4
|]
;;
val a : int array = [|1; 2; 3; 4|]
# let
b
=
Array.map
succ
a
;;
val b : int array = [|2; 3; 4; 5|]
# a,
b;;
- : int array * int array = [|1; 2; 3; 4|], [|2; 3; 4; 5|]
Two iterators can be used to compose successive applications of a function
on all elements of a structure.
fold_left |
: |
('a -> 'b -> 'a) -> 'a -> 'b t -> 'a |
fold_right |
: |
('a -> 'b -> 'b) -> 'a t -> 'b -> 'b |
You have to give these iterators a base case that supplies a default value
when the structure is empty.
fold_left f r [ v1; v2; ... ; vn] |
= |
f ... ( f (f r v1) v2 ) ... vn |
fold_right f [ v1; v2; ... ; vn] r |
= |
f v1 ( f v2 ... (f vn r) ... ) |
These functions allow you to easily transform binary operations into
n-ary operations. When the operation is commutative and associative,
left and right iteration are indistinguishable:
# List.fold_left
(+
)
0
[
1
;2
;3
;4
]
;;
- : int = 10
# List.fold_right
(+
)
[
1
;2
;3
;4
]
0
;;
- : int = 10
# List.fold_left
List.append
[
0
]
[[
1
]
;[
2
]
;[
3
]
;[
4
]]
;;
- : int list = [0; 1; 2; 3; 4]
# List.fold_right
List.append
[[
1
]
;[
2
]
;[
3
]
;[
4
]]
[
0
]
;;
- : int list = [1; 2; 3; 4; 0]
Notice that, for binary concatenation, an empty list is a neutral element
to the left and to the right. We find thus, in this specific case, the
equivalence of the two expressions:
# List.fold_left
List.append
[]
[[
1
]
;[
2
]
;[
3
]
;[
4
]]
;;
- : int list = [1; 2; 3; 4]
# List.fold_right
List.append
[[
1
]
;[
2
]
;[
3
]
;[
4
]]
[]
;;
- : int list = [1; 2; 3; 4]
We have, in fact, found the List.concat function.
Operations specific to lists.
It is useful to have the following list functions that are provided by
the List module:
List.hd |
: |
'a list -> 'a |
|
|
first element of the list |
List.tl |
: |
'a list -> 'a |
|
|
the list, without its first element |
List.rev |
: |
'a list -> 'a list |
|
|
reversal of a list |
List.mem |
: |
'a -> 'a list -> bool |
|
|
membership test |
List.flatten |
: |
'a list list -> 'a list |
|
|
flattens a list of lists |
List.rev_append |
: |
'a list -> 'a list -> 'a list |
|
|
is the same as append (rev l1) l2 |
The first two functions are partial. They are not defined on the empty
list and raise a Failure exception. There is a variant of
mem: memq that uses physical equality.
# let
c
=
(1
,
2
)
;;
val c : int * int = 1, 2
# let
l
=
[
c]
;;
val l : (int * int) list = [1, 2]
# List.memq
(1
,
2
)
l
;;
- : bool = false
# List.memq
c
l
;;
- : bool = true
The List module provides two iterators that generalize
boolean conjunction and disjunction (and / or):
List.for_all and List.exists that are defined by
iteration:
# let
for_all
f
xs
=
List.fold_right
(fun
x
->
fun
b
->
(f
x)
&
b)
xs
true
;;
val for_all : ('a -> bool) -> 'a list -> bool = <fun>
# let
exists
f
xs
=
List.fold_right
(fun
x
->
fun
b
->
(f
x)
or
b)
xs
false
;;
val exists : ('a -> bool) -> 'a list -> bool = <fun>
There are variants of the iterators in the List module that take
two lists as arguments and traverse them in parallel (iter2,
map2, etc.). If they are not the same size, the
Invalid_argument exception is raised.
The elements of a list can be searched using the criteria provided by
the following boolean functions:
List.find |
: |
('a -> bool) -> 'a list -> 'a |
List.find_all |
: |
('a -> bool) -> 'a list -> 'a
list |
The find_all function has an alias: filter.
A variant of the general search function is the partitioning of a list:
List.partition |
: |
('a -> bool) -> 'a list -> 'a list * 'a list |
The List module has two often necessary utility functions permitting
the division and creation of lists of pairs:
List.split |
: |
('a * 'b) list -> 'a list * 'b list |
List.combine |
: |
'a list -> 'b list -> ('a * 'b) list |
Finally, a structure combining lists and pairs is often used:
association lists. They are useful to store values associated to keys.
These are lists of pairs such that the first entry is a key and the second
is the information associated to the key. One has these data structures to
deal with pairs:
List.assoc |
: |
'a -> ('a * 'b) list -> 'b |
|
|
extract the information associated to a key |
List.mem_assoc |
: |
'a -> ('a * 'b) list -> bool |
|
|
test the existence of a key |
List.remove_assoc |
: |
'a -> ('a * 'b) list -> ('a
* 'b) list |
|
|
deletion of an element corresponding to a key |
Each of these functions has a variant using physical equality
instead of structural equality: List.assq,
List.mem_assq and List.remove_assq.
Handlers specific to Vectors.
The vectors that imperative programmers often use are physically modifiable
structures. The Array module furnishes a function to change the
value of an element:
Array.set |
: |
'a array -> int -> 'a -> unit |
Like get, the set function has a syntactic shortcut:
t.(i) <- a.
There are three vector allocation functions:
Array.create |
: |
int -> 'a -> 'a array |
|
|
creates a vector of a given size whose
elements are all initialized with the same value |
Array.make |
: |
int -> 'a -> 'a array |
|
|
alias for create |
Array.init |
: |
int -> (int -> 'a) -> 'a array |
|
|
creates a vector of a given size whose
elements are each initialized with the result of the application of
a function to the element's index |
Since they are frequently used, the Array module has two functions
for the creation of matrices (vectors of vectors):
Array.create_matrix |
: |
int -> int -> 'a -> 'a array array |
Array.make_matrix |
: |
int -> int -> 'a -> 'a array
array |
The set function is generalized as a function modifying the values
on an interval described by a starting index and a length:
Array.fill |
: |
'a array -> int -> int -> 'a -> unit |
One can copy a whole vector or extract a sub-vector (described by a starting
index and a length) to obtain a new structure:
Array.copy |
: |
'a array -> 'a array |
Array.sub |
: |
'a array -> int -> int -> 'a array |
The copy or extraction can also be done towards another vector:
Array.blit |
: |
'a array -> int -> 'a array -> int -> int -> unit |
The first argument is the index into the first vector, the second is
the index into the second vector and the third is the number of values copied.
The three functions
blit, sub and fill raise the
Invalid_argument exception.
The privileged use of indices in the vector manipulation functions
leads to the definition of two specific iterators:
Array.iteri |
: |
(int -> 'a -> unit) -> 'a array -> unit |
Array.mapi |
: |
(int -> 'a -> 'b) -> 'a array -> 'b array |
They apply a function whose first argument is the index of the
affected element.
#
let
f
i
a
=
(string_of_int
i)
^
":"
^
(string_of_int
a)
in
Array.mapi
f
[|
4
;
3
;
2
;
1
;
0
|]
;;
- : string array = [|"0:4"; "1:3"; "2:2"; "3:1"; "4:0"|]
Although the Array module does not have a function to modify the
contents of all the elements in a vector, this effect can be easily obtained
using iteri:
# let
iter_and_set
f
t
=
Array.iteri
(fun
i
->
fun
x
->
t.
(i)
<-
f
x)
t
;;
val iter_and_set : ('a -> 'a) -> 'a array -> unit = <fun>
# let
v
=
[|
0
;1
;2
;3
;4
|]
;;
val v : int array = [|0; 1; 2; 3; 4|]
# iter_and_set
succ
v
;;
- : unit = ()
# v
;;
- : int array = [|1; 2; 3; 4; 5|]
Finally, the Array module provides two list conversion functions:
Array.of_list |
: |
'a list -> 'a array |
Array.to_list |
: |
'a array -> 'a list |
Input-output
The standard library has four input-output modules:
-
module Printf: for the formatting of output;
- Format: pretty-printing facility to format text within
``pretty-printing boxes''. The pretty-printer breaks lines at specified
break hints, and indents lines according to the box structure.
- module Marshal: implements a mechanism for persistent values;
- module Digest: for creating unique keys.
The description of the Marshal module will be given later in the
chapter when we begin to discuss persistent data structures (see page
??).
Module Printf
The Printf module formats text using the rules of the
printf function in the C language library. The display format
is represented as a character string that will be decoded
according to the conventions of printf in C, that is to say,
by specializing the % character. This character followed by
a letter indicates the type of the argument at this position. The following
format "(x=%d, y=%d)" indicates that it should put two integers
in place of the %d in the output string.
Specification of formats.
A format defines the parameters for a printed string.
Those, of basic types: int, float,
char and string, will be converted to strings and will
replace their occurrence in the printed string.
The values 77 and 43 provided to the format
"(x=%d, y=%d)" will generate the complete printed string
"(x=77, y=43)"
. The principal letters indicating the type of
conversion to carry out are given in figure 8.1.
Type |
Letter |
Result |
integer |
d or i |
signed decimal |
|
u |
unsigned decimal |
|
x |
unsigned hexadecimal, lower case form |
|
X |
same, with upper case letters |
character |
c |
character |
string |
s |
string |
float |
f |
decimal |
|
e or E |
scientific notation |
|
g or G |
same |
boolean |
b |
true or false |
special |
a or t |
functional parameter |
|
|
of type (out_channel -> 'a -> unit) -> 'a -> unit |
|
|
or out_channel -> unit |
Figure 8.1:
Conversion conventions.
The format also allows one to specify the justification of the conversion,
which allows for the alignment of the printed values.
One can indicate the size in conversion characters. For this one
places between the % character and the type of conversion an
integer number as in %10d that indicates a conversion
to be padded on the right to ten characters. If the size of the result of the
conversion exceeds this limit, the limit will be discarded. A negative number
indicates left justification. For conversions of floating point
numbers, it is helpful to be able to specify the printed precision.
One places a decimal point followed by a number to indicate
the number of characters after the decimal point as in
%.5f that indicates five characters to the right of the decimal
point.
There are two specific format letters: a and
t that indicate a functional argument. Typically, a print function
defined by the user. This is specific to Objective CAML.
Functions in the module
The types of the five functions in this module are given in figure
8.2.
fprintf |
: |
out_channel -> ('a, out_channel, unit) format -> 'a |
printf |
: |
('a, out_channel, unit) format -> 'a |
eprintf |
: |
('a, out_channel, unit) format -> 'a |
sprintf |
: |
('a, unit, string) format -> 'a |
bprintf |
: |
Buffer.t -> ('a, Buffer.t, string) format -> 'a |
Figure 8.2:
Printf formatting functions.
The fprintf function takes a channel, a format and arguments of
types described in the format. The printf and eprintf
functions are specializations on standard output and standard error.
Finally, sprintf and bprintf do not print the result of
the conversion, but instead return the corresponding string.
Here are some simple examples of the utilization of formats.
# Printf.printf
"(x=%d, y=%d)"
3
4
7
8
;;
(x=34, y=78)- : unit = ()
# Printf.printf
"name = %s, age = %d"
"Patricia"
1
8
;;
name = Patricia, age = 18- : unit = ()
# let
s
=
Printf.sprintf
"%10.5f\n%10.5f\n"
(-.
1
2
.
2
4
)
(2
.
3
0
0
0
0
0
0
8
)
;;
val s : string = " -12.24000\n 2.30000\n"
# print_string
s
;;
-12.24000
2.30000
- : unit = ()
The following example builds a print function from a matrix of floats
using a given format.
# let
print_mat
m
=
Printf.printf
"\n"
;
for
i=
0
to
(Array.length
m)-
1
do
for
j=
0
to
(Array.length
m.
(0
))-
1
do
Printf.printf
"%10.3f"
m.
(i).
(j)
done
;
Printf.printf
"\n"
done
;;
val print_mat : float array array -> unit = <fun>
# print_mat
(Array.create
4
[|
1
.
2
;
-.
4
4
.
2
2
;
3
5
.
2
|]
)
;;
1.200 -44.220 35.200
1.200 -44.220 35.200
1.200 -44.220 35.200
1.200 -44.220 35.200
- : unit = ()
Note on the format type.
The description of a format adopts the syntax of character strings,
but it is not a value of type string. The decoding of a format,
according to the preceding conventions, builds a value of type
format where the 'a parameter is instantiated
either with unit if the format does not mention a parameter,
or by a functional type corresponding to a function able to receive
as many arguments as are mentioned and returning a value of type
unit.
One can illustrate this process by partially applying the
printf function to a format:
# let
p3
=
Printf.printf
"begin\n%d is val1\n%s is val2\n%f is val3\n"
;;
begin
val p3 : int -> string -> float -> unit = <fun>
One obtains thus a function that takes three arguments. Note that the
word begin had already been printed. Another format would have given
another type of function:
# let
p2
=
Printf.printf
"begin\n%f is val1\n%s is val2\n"
;;
begin
val p2 : float -> string -> unit = <fun>
In providing arguments one by one to p3, one
progressively obtains the output.
# let
p31
=
p3
4
5
;;
45 is val1
val p31 : string -> float -> unit = <fun>
# let
p32
=
p31
"hello"
;;
hello is val2
val p32 : float -> unit = <fun>
# let
p33
=
p32
3
.
1
4
;;
3.140000 is val3
val p33 : unit = ()
# p33
;;
- : unit = ()
From the last obtained value, nothing is printed: it is the value
() of type unit.
One cannot build a format using values of type string:
# let
f
d
=
Printf.printf
(d^
d);;
Characters 27-30:
This expression has type string but is here used with type
('a, out_channel, unit) format
The compiler cannot know the value of the string passed as an argument.
It thus cannot know the type that instantiates the 'a parameter
of type format.
On the other hand, strings are physically modifiable values,
it would thus be possible to replace, for example, the %d part
with another letter, thus dynamically changing the print format.
This conflicts with the static generation of the conversion function.
Digest Module
A hash function converts a character string of unspecified size
into a character string of fixed length, most often smaller.
Hashing functions return a fingerprint (digest)
of their entry.
Such functions are used for the construction of hash tables, as in the
Hashtbl module, permitting one to rapidly test if an element
is a member of such a table by directly accessing the fingerprint.
For example the function f_mod_n,
that generates the modulo n sum of the ASCII codes of the
characters in a string, is a hashing function. If one creates an
n by n table to arrange the strings, from the fingerprint one obtains
direct access. Nevertheless two strings can return the same fingerprint.
In the case of collisions, one adds to the hash table an extension
to store these elements. If there are too many collisions,
then access to the hash table is not very effective. If the fingerprint
has a length of n bits, then the probability of collision between two
different strings is 1/2n.
A non-reversible hash function has a very weak probability of
collision. It is thus difficult, given a fingerprint, to construct
a string with this fingerprint. The preceding function f_mod_n
is not, based on the evidence, such a function. One way hash functions
permit the authentification of a string, that it is for some text sent
over the Internet, a file, etc.
The Digest module uses the MD5 algorithm, short for
Message Digest 5. It returns a 128 bit fingerprint. Although
the algorithm is public, it is impossible (today) to carry out a
reconstruction from a fingerprint. This module defines the
Digest.t type as an abbreviation of the string type. The
figure 8.3 details the main functions of this module.
string |
: |
string -> t |
|
|
returns the fingerprint of a string |
file |
: |
string -> t |
|
|
returns the fingerprint of a file |
Figure 8.3: Functions of the Digest module.
We use the string function in the following example on a small
string and on a large one built from the first. The fingerprint is always
of fixed length.
# let
s
=
"The small cat is dead..."
;;
val s : string = "The small cat is dead..."
# Digest.string
s;;
- : Digest.t = "xr6\127\171(\134=\238`\252F\028\t\210$"
# let
r
=
ref
s
in
for
i=
1
to
1
0
0
do
r:=
s^
!
r
done;
Digest.string
!
r;;
- : Digest.t = "\232\197|C]\137\180{>\224QX\155\131D\225"
The creation of a fingerprint for a program allows one to guarantee the
contents and thus avoids the use of a bad version. For example, when code
is dynamically loaded (see page
??), a fingerprint is used to select the binary file
to load.
# Digest.file
"basic.ml"
;;
- : Digest.t = "\179\026\191\137\157Ly|^w7\183\164:\167q"
Persistence
Persistence is the conservation of a value outside the running
execution of a program. This is the case when one writes a value in a file.
This value is thus accessible to any program that has access to the file.
Writing and reading persistent values requires the definition of a format
for representing the coding of data. In effect, one must know how to go
from a complex structure stored in memory, such as a binary tree, to a
linear structure, a list of bytes, stored in a file.
This is why the coding of persistent values is called linearization
1.
Realization and Difficulties of Linearization
The implementation of a mechanism for the linearization of data structures
requires choices and presents difficulties that we describe below.
-
read-write of data structures.
Since memory can always be viewed as a vector of words, one value can
always correspond to the memory that it occupies, leaving us to
preserve the useful part by then compacting the value.
- share or copy.
Must the linearization of a data structure conserve sharing?
Typically a binary tree having two identical children (in the sense of
physical equality) can indicate, for the second child, that it has already
saved the first. This characteristic influences the size of the saved value
and the time taken to do it. On the other hand, in the presence of
physically modifiable values, this could change the behavior of
this value after a recovery depending on whether or not sharing was
conserved.
- circular structures.
In the case of a circular value, linearization without sharing is likely
to loop. It will be necessary to conserve sharing.
- functional values.
Functional values, or closures, are composed of an environment part and
a code part. The code part corresponds to the entry point (address) of
the code to execute. What must thus be done with code?
It is possible to uniquely store this address, but thus only the same
program will find the correct meaning of this address. It is also
possible to save the list of machine instructions of this function,
but that would require having a mechanism to dynamically load code.
- guaranteeing the type when reloading.
This is the main difficulty of this mechanism. Static typing guarantees
that typed values will not generate type errors at execution time.
But this is not true except for values belonging to the program
during the course of execution. What type can one give to a value outside
the program, that was not seen by the type verifier? Just to verify that
the re-read value has the
monomorphic type generated by the compiler, the type would have to be
transmitted at the moment the value was saved, then the type would have
to be checked when the value was loaded. Additionally, a mechanism
to manage the versions of types would be needed to be safe in case
a type is redeclared in a program.
Marshal Module
The linearization mechanism in the Marshal module allows you
to choose to keep or discard the sharing of values. It also allows for
the use of closures, but in this case, only the
pointer to the code is saved.
This module is mainly comprised of functions for linearization via a channel
or a string, and functions for recovery via a channel or a string.
The linearization functions are
parameterizable. The following type declares two possible options:
type external_flag =
No_sharing
| Closures;;
The No_sharing constant constructor
indicates that the sharing of values is not to be preserved, though the default is
to keep sharing. The Closures constructor allows the use of closures
while conserving its pointer to the code. Its absence will raise an exception
if one tries to store a functional value.
Warning
The Closures constructor is inoperative in interactive mode. It can
only be used in command line mode.
The reading and writing functions in this module are gathered in figure
8.4.
to_channel |
: |
out_channel -> 'a -> extern_flag list -> unit |
to_string |
: |
'a -> extern_flag list -> string |
to_buffer |
: |
string -> int -> int -> 'a -> extern_flag list -> unit |
from_channel |
: |
in_channel -> 'a |
from_string |
: |
string -> int -> 'a |
Figure 8.4: Functions of the Marshal module.
The to_channel function takes an output channel, a value,
and a list of options and writes the value to the channel. The
to_string function produces a string corresponding to the
linearized value, whereas to_buffer accomplishes the same
task by modifying part of a string passed as an argument.
The from_channel function reads a linearized value from a
channel and returns it. The from_string variant
takes as input a string and the position of the first character to read
in the string. Several linearized values can be stored in the same file
or in the same string. For a file, they can be read sequentially.
For a string, one must specify the right offset from the beginning of
the string to decode the desired value.
# let
s
=
Marshal.to_string
[
1
;2
;3
;4
]
[]
in
String.sub
s
0
1
0
;;
- : string = "\132\149\166\190\000\000\000\t\000\000"
Warning
Using this module one loses the safety of static typing
(see infra, page ??).
Loading a persistent object creates a value of indeterminate type:
# let
x
=
Marshal.from_string
(Marshal.to_string
[
1
;
2
;
3
;
4
]
[])
0
;;
val x : '_a = <poly>
This indetermination is denoted in Objective CAML by the weakly typed variable
'_a. You should specify the expected type:
# let
l
=
let
s
=
(Marshal.to_string
[
1
;
2
;
3
;
4
]
[])
in
(Marshal.from_string
s
0
:
int
list)
;;
val l : int list = [1; 2; 3; 4]
We return to this topic on page ??.
Note
The output_value function of the preloaded library
corresponds to calling to_channel with an empty list of options.
The input_value function in the Pervasives module
directly calls the from_channel function. These functions were
kept for compatibility with old programs.
Example: Backup Screens
We want to save the bitmap, represented as a matrix of colors,
of the whole screen. The save_screen function recovers the
bitmap, converts it to a table of colors and saves it in a file
whose name is passed as a parameter.
# let
save_screen
name
=
let
i
=
Graphics.get_image
0
0
(Graphics.size_x
())
(Graphics.size_y
())
in
let
j
=
Graphics.dump_image
i
in
let
oc
=
open_out
name
in
output_value
oc
j;
close_out
oc;;
val save_screen : string -> unit = <fun>
The load_screen function does the reverse operation.
It opens the file whose name is passed as a parameter, restores the value
stored inside, converts this color matrix into a bitmap, then displays the
bitmap.
# let
load_screen
name
=
let
ic
=
open_in
name
in
let
image
=
((input_value
ic)
:
Graphics.color
array
array)
in
close_in
ic;
Graphics.close_graph();
Graphics.open_graph
(" "
^
(string_of_int(Array.length
image.
(0
)))
^
"x"
^
(string_of_int(Array.length
image)));
let
image2
=
Graphics.make_image
image
in
Graphics.draw_image
image2
0
0
;
image2
;;
val load_screen : string -> Graphics.image = <fun>
Warning
Abstract typed values cannot be made persistent.
It is for this reason that the preceding example does not use the abstract
Graphics.image type, but instead uses the concrete
color array array type. The abstraction of types is presented in
chapter 14.
Sharing
The loss of sharing in a data structure can make the structure
completely lose its intended behavior. Let us revisit the example of
the symbol generator from page
??. For whatever reason, we want to save the
functional values new_s and reset_s,
and thereafter use the current value of their common counter.
We thus write the following program:
# let
reset_s,
new_s
=
let
c
=
ref
0
in
(
function
()
->
c
:=
0
)
,
(
function
s
->
c:=!
c+
1
;
s^
(string_of_int
!
c)
)
;;
# let
save
=
Marshal.to_string
(new_s,
reset_s)
[
Marshal.
Closures;Marshal.
No_sharing]
;;
# let
(new_s1,
reset_s1)
=
(Marshal.from_string
save
0
:
((string
->
string
)
*
(unit
->
unit)))
;;
# (* 1 *)
Printf.printf
"new_s : \%s\n"
(new_s
"X"
);
Printf.printf
"new_s : \%s\n"
(new_s
"X"
);
(* 2 *)
Printf.printf
"new_s1 : \%s\n"
(new_s1
"X"
);
(* 3 *)
reset_s1();
Printf.printf
"new_s1 (after reset_s1) : \%s\n"
(new_s1
"X"
)
;;
Characters 148-154:
Unbound value new_s1
The first two outputs in (* 1 *)
comply with our intent.
The output obtained in (* 2 *)
after re-reading
the closures also appears correct (after X2
comes X3
). But, in fact, the sharing of the c counter
between the re-read functions new_s1 and reset_s1
is lost, as the output of X4
attests that one of them
set the counter to zero. Each closure has a copy of the counter and
the call to reset_s1 does not reset the
new_s1 counter to zero. Thus we should not have used the No_sharing
option during the linearization.
It is generally necessary to conserve sharing. Nevertheless in certain
cases where execution speed is important, the absence of sharing
speeds up the process of saving. The following example demonstrates a
function that copies a matrix. In this case it might be preferable to break
the sharing:
# let
copy_mat_f
(m
:
float
array
array)
=
let
s
=
Marshal.to_string
m
[
Marshal.
No_sharing]
in
(Marshal.from_string
s
0
:
float
array
array);;
val copy_mat_f : float array array -> float array array = <fun>
One can also use it to create a matrix without sharing:
# let
create_mat_f
n
m
v
=
let
m
=
Array.create
n
(Array.create
m
v)
in
copy_mat_f
m;;
val create_mat_f : int -> int -> float -> float array array = <fun>
# let
a
=
create_mat_f
3
4
3
.
1
4
;;
val a : float array array =
[|[|3.14; 3.14; 3.14; 3.14|]; [|3.14; 3.14; 3.14; 3.14|];
[|3.14; 3.14; 3.14; 3.14|]|]
# a.
(1
).
(2
)
<-
6
.
2
8
;;
- : unit = ()
# a;;
- : float array array =
[|[|3.14; 3.14; 3.14; 3.14|]; [|3.14; 3.14; 6.28; 3.14|];
[|3.14; 3.14; 3.14; 3.14|]|]
Which is a more common behavior than that of
Array.create, and resembles that of
Array.create_matrix.
Size of Values
It may be useful to know the size of a persistent value. If sharing
is conserved, this size also reflects the amount of memory occupied by
a value. Although the encoding sometimes optimizes the size
of atomic values2, knowing
the size of their respective encodings permits us to compare different
implementations of a data structure.
In addition, for programs that will never stop themselves, like
embedded systems or even network servers; watching the size of
data structures can help detect memory leaks. The Marshal
module has two functions to calculate the size of a constant. They are
described in figure 8.5.
header_size |
: |
int |
data_size |
: |
string -> int -> int |
total_size |
: |
string -> int -> int |
Figure 8.5: Size functions of Marshal.
The total size of a persistent value is the same as the size of its
data structures plus the size of its
header.
Below is a small example of the use of MD5 encoding to compare
two representations of binary trees:
# let
size
x
=
Marshal.data_size
(Marshal.to_string
x
[])
0
;;
val size : 'a -> int = <fun>
# type
'a
bintree1
=
Empty1
|
Node1
of
'a
*
'a
bintree1
*
'a
bintree1
;;
type 'a bintree1 = | Empty1 | Node1 of 'a * 'a bintree1 * 'a bintree1
# let
s1
=
Node1(2
,
Node1(1
,
Node1(0
,
Empty1,
Empty1),
Empty1),
Node1(3
,
Empty1,
Empty1))
;;
val s1 : int bintree1 =
Node1
(2, Node1 (1, Node1 (0, Empty1, Empty1), Empty1),
Node1 (3, Empty1, Empty1))
# type
'a
bintree2
=
Empty2
|
Leaf2
of
'a
|
Node2
of
'a
*
'a
bintree2
*
'a
bintree2
;;
type 'a bintree2 =
| Empty2
| Leaf2 of 'a
| Node2 of 'a * 'a bintree2 * 'a bintree2
# let
s2
=
Node2(2
,
Node2(1
,
Leaf2
0
,
Empty2),
Leaf2
3
)
;;
val s2 : int bintree2 = Node2 (2, Node2 (1, Leaf2 0, Empty2), Leaf2 3)
# let
s1,
s2
=
size
s1,
size
s2
;;
val s1 : int = 13
val s2 : int = 9
The values given by the size function reflect well the
intuition that one might have of the size of s1 and
s2.
Typing Problem
The real problem with persistent values is that it is possible to
break the type system of Objective CAML. The creation functions return
a monomorphic type (unit or string). On the other
hand unmarshalling functions return a polymorphic type 'a.
From the point of view of types, you can do anything with a persistent
value. Here is the usage that can be done with it
(see chapter 2, page
??): create a function
magic_copy of type 'a -> 'b.
# let
magic_copy
a
=
let
s
=
Marshal.to_string
a
[
Marshal.
Closures]
in
Marshal.from_string
s
0
;;
val magic_copy : 'a -> 'b = <fun>
The use of such a function causes a brutal halt in the execution
of the program.
# (magic_copy 3 : float) +. 3.1;;
Segmentation fault
In interactive mode (under Linux), we even leave the toplevel
(interactive) loop with a system error signal corresponding
to a memory violation.
Interface with the System
The standard library has six system interface modules:
-
module Sys: for communication between the operating system
and the program;
- module Arg: to analyze parameters passed to the program
from the command line;
- module Filename: operations on file names
- module Printexc: for the interception and printing
of exceptions;
- module Gc: to control the mechanism that automatically
deallocates memory, described in chapter 9;
- module Callback: to call Objective CAML functions from C,
described in chapter 12.
The first four modules are described below.
Module Sys
This module provides quite useful functions for communication with
the operating system, such as handling the signals received by
a program. The values in figure 8.6 contain information
about the system.
OS_type |
: |
string |
|
|
type of system |
interactive |
: |
bool ref |
|
|
true if executing at the toplevel |
word_size |
: |
string |
|
|
size of a word (32 or 64 bits) |
max_string_length |
: |
int |
|
|
maximum size of a string |
max_array_length |
: |
int |
|
|
maximum size of a vector |
time |
: |
unit -> float |
|
|
gives the time in seconds since the start of
the program |
Figure 8.6: Information about the system.
Communication between the program and the system can go through the command
line, the value of an environmental variable, or through running
another program. These functions are described
in figure 8.7.
argv |
: |
string array |
|
|
contains the vector of parameters |
getenv |
: |
string -> string |
|
|
retrieves the value of a variable |
command |
: |
string -> int |
|
|
executes the command passed as an argument |
Figure 8.7: Communication with the system.
The functions of the figure 8.8 allow us to
navigate in the file hierarchy.
file_exists |
: |
string -> bool |
|
|
returns true if the file exists |
remove |
: |
string -> unit |
|
|
destroys a file |
rename |
: |
string -> string -> unit |
|
|
renames a file |
chdir |
: |
string -> unit |
|
|
change the current directory |
getcwd |
: |
unit -> string |
|
|
returns the name of the current directory |
Figure 8.8: File manipulation.
Finally, the management of signals will be described in the chapter on
system programming (18).
Here is a small program that revisits the example of saving a graphics
window as an array of colors. The main function verifies
that it is not started from the interactive loop, then reads from the
command line the names of files to display, then tests if they exist, then
displays them (with the load_screen function). We wait for a key to be
pressed between displaying two images.
# let
main
()
=
if
not
(!
Sys.interactive)
then
for
i
=
0
to
Array.length(Sys.argv)
-
1
do
let
name
=
Sys.argv.
(i)
in
if
Sys.file_exists
name
then
begin
ignore(load_screen
name);
ignore(Graphics.read_key)
end
done;;
val main : unit -> unit = <fun>
Module Arg
The Arg module defines a small syntax for command line
arguments. With this module, you can parse arguments and associate actions
with them.
The various elements of the command line are separated by one or more
spaces. They are the values stored in the Sys.argv array.
In the syntax provided by
Arg, certain elements are distinguished by starting with the
minus character (-). These are called command line
keywords or switches. One can associate a specific action with a keyword
or take as an argument a value of type string, int
or float. The value of these arguments is initialized with
the value found on the command line just after the keyword.
In this case one can call a function that converts character strings
into the expected type. The other elements on the command line are called
anonymous arguments. One associates an action with them that takes
their value as an argument. An undefined option causes the display
of some short documentation on the command line. The documentation's
contents are defined by the user.
The actions associated with keywords are encapsulated in the type:
type
spec
=
|
Unit
of
(unit
->
unit)
(* Call the function with unit argument*)
|
Set
of
bool
ref
(* Set the reference to true*)
|
Clear
of
bool
ref
(* Set the reference to false*)
|
String
of
(string
->
unit)
(* Call the function with a string
argument *)
|
Int
of
(int
->
unit)
(* Call the function with an int
argument *)
|
Float
of
(float
->
unit)
(* Call the function with a float
argument *)
|
Rest
of
(string
->
unit)
(* Stop interpreting keywords and call the
function with each remaining argument*)
The command line parsing function is:
# Arg.parse
;;
- : (string * Arg.spec * string) list -> (string -> unit) -> string -> unit =
<fun>
Its first argument is a list of triples of the form
(key, spec, doc) such that:
-
key is a character string corresponding to the keyword.
It starts with the reserved character '_'.
- spec is a value of type spec specifying the
action associated with key.
- doc is a character string describing the option
key. It is displayed upon a syntax error.
The second argument is the function to process the anonymous
command line arguments.
The last argument is a character string displayed at the beginning of
the command line documentation.
The Arg module also includes:
-
Bad: an exception taking as its argument a character string.
It can be used by the processing functions.
- usage: of type (string * Arg.spec * string)
list -> string -> unit, this function displays the command line
documentation. One preferably provides it with the same arguments as
those of parse.
- current: of type int ref that contains a
reference to the current value of the index in the Sys.argv array.
One can therefore modify this value if necessary.
By way of an example, we show a function read_args that
initializes the configuration of
the Minesweeper game seen in chapter 6, page ??.
The possible options will be
-col, -lin and -min. They will be followed
by an integer indicating, respectively: the number of columns,
the number of lines and the number of mines desired. These values must
not be less than the default values,
respectively 10, 10 and 15.
The processing functions are:
# let
set_nbcols
cf
n
=
cf
:=
{!
cf
with
nbcols
=
n}
;;
# let
set_nbrows
cf
n
=
cf
:=
{!
cf
with
nbrows
=
n}
;;
# let
set_nbmines
cf
n
=
cf
:=
{!
cf
with
nbmines
=
n}
;;
All three are of type config ref -> int -> unit.
The command line parsing function can be written:
# let
read_args()
=
let
cf
=
ref
default_config
in
let
speclist
=
[
("-col"
,
Arg.
Int
(set_nbcols
cf),
"number of columns (>=10)"
);
("-lin"
,
Arg.
Int
(set_nbrows
cf),
"number of lines (>=10)"
);
("-min"
,
Arg.
Int
(set_nbmines
cf),
"number of mines (>=15)"
)]
in
let
usage_msg
=
"usage : minesweep [-col n] [-lin n] [-min n]"
in
Arg.parse
speclist
(fun
s
->
())
usage_msg;
!
cf
;;
val read_args : unit -> config = <fun>
This function calculates a configuration that will be passed as arguments
to open_wcf, the function that opens the main window
when the game is started. Each option is, as its name indicates, optional.
If it does not appear on the command line, the corresponding parameter
keeps its default value. The order of the options is unimportant.
Module Filename
The Filename module has operating system independant functions to
manipulate the names of files. In practice, the
file and directory naming conventions differ greatly
between Windows, Unix and MacOS.
Module Printexc
This very short module (three functions described in figure 8.9)
provides a general exception handler. This is particularly useful
for programs executed in command mode3
to be sure not to allow an exception to escape that would stop the program.
catch |
: |
('a -> 'b) -> 'a -> 'b |
|
|
general exception handler |
print |
: |
('a -> 'b) -> 'a -> 'b |
|
|
print and re-raise the exception |
to_string |
: |
exn -> string |
|
|
convert an exception to a string |
Figure 8.9: Handling exceptions.
The catch function applies its first argument to its
second. This launches the main function of the program. If an
exception arrives at the level of catch, that is to say that
if it is not handled inside the program, then catch will print
its name and exit the program. The print function has the same
behavior as catch but re-raises the exception after printing it.
Finally the to_string function converts an exception
into a character string. It is used by the two preceding functions.
If we look again at the main function for displaying
bitmaps, we might thus write an encapsulating function
go in the following manner:
# let
go
()
=
Printexc.catch
main
();;
val go : unit -> unit = <fun>
This permits the normal termination of the program by printing the
value of the uncaptured exception.