Modifiable Data Structures
Values of the following types: vectors, character strings,
records with mutable fields, and references are the data structures whose parts can be
physically modified.
We have seen that an Objective CAML variable bound to a value keeps this value to
the end of its lifetime. You can only modify this binding with a
redefinition---in which case we are not really talking about the ``same''
variable; rather, a new variable of the same name now masks the old
one, which is no longer directly accessible, but which remains unchanged.
With modifiable values, you can change the value associated with a variable
without having to redeclare the latter. You have access to the value of a
variable for writing as well as for reading.
Vectors
Vectors, or one dimensional arrays, collect a known number of elements of
the same type. You can write a vector directly by listing its values
between the symbols [| and |], separated by semicolons as
for lists.
# let
v
=
[|
3
.
1
4
;
6
.
2
8
;
9
.
4
2
|]
;;
val v : float array = [|3.14; 6.28; 9.42|]
The creation function Array.create takes the number of elements in
the vector and an initial value, and returns a new vector.
# let
v
=
Array.create
3
3
.
1
4
;;
val v : float array = [|3.14; 3.14; 3.14|]
To access or modify a particular element, you give the index of that
element:
Syntax
expr1 . ( expr2 )
Syntax
expr1 . ( expr2 )
<- expr3
expr1 should be a vector (type array)
whose values have type expr3. The expression
expr2 must, of course, have type int. The modification
is an expression of type unit. The first element of a vector has
index 0 and the index of the last element is the length of the vector minus
1. The parentheses around the index expression are required.
# v.
(1
)
;;
- : float = 3.14
# v.
(0
)
<-
1
0
0
.
0
;;
- : unit = ()
# v
;;
- : float array = [|100; 3.14; 3.14|]
If the index used to access an element in an array is outside the range of
indices of the array, an exception is raised at the moment of access.
# v.
(-
1
)
+.
4
.
0
;;
Uncaught exception: Invalid_argument("Array.get")
This check is done during program execution, which can slow it down.
Nevertheless it is essential, in order to avoid writing to
memory outside the space allocated to a vector, which would cause serious
execution errors.
The functions for manipulating arrays are part of the Array module
in the standard library. We'll describe them in chapter
8 (page ??). In the examples
below, we will use the following three functions from the Array
module:
-
create which creates an array of the given size with the
given initial value;
- length which gives the length of a vector;
- append which concatenates two vectors.
Sharing of Values in a Vector
All the elements of a vector contain the value that was passed in when it
was created. This implies a sharing of this value, if it is a structured
value. For example, let's create a matrix as a vector of vectors using the
function create from the Array module.
# let
v
=
Array.create
3
0
;;
val v : int array = [|0; 0; 0|]
# let
m
=
Array.create
3
v;;
val m : int array array = [|[|0; 0; 0|]; [|0; 0; 0|]; [|0; 0; 0|]|]
Figure 3.1: Memory representation of a vector sharing its elements.
If you modify one of the fields of vector v, which was used in the
creation of m, then you automatically modify all the ``rows''
of the matrix together (see figures 3.1 and
3.2).
# v.
(0
)
<-
1
;;
- : unit = ()
# m;;
- : int array array = [|[|1; 0; 0|]; [|1; 0; 0|]; [|1; 0; 0|]|]
Figure 3.2: Modification of shared elements of a vector.
Duplication occurs if the initialization value of the vector (the second
argument passed to Array.create) is an atomic value
and there is sharing if this value is a structured value.
Values whose size does not exceed the standard size of Objective CAML
values---that is, the memory word---are called atomic values. These are
the integers, characters, booleans, and constant constructors. The other
values---structured values---are represented by a pointer into a memory area.
This distinction is detailed in chapter 9 (page
??).
Vectors of floats are a special case. Although floats are structured
values, the creation of a vector of floats causes the the initial value to
be copied. This is for reasons of optimization. Chapter 12, on
the interface with the C language (page ??), describes this
special case.
Non-Rectangular Matrices
A matrix, a vector of vectors, does not need not to be rectangular. In fact,
nothing stops you from replacing one of the vector elements with a vector of a
different length. This is useful to limit the size of such a matrix. The
following value t constructs a triangular matrix for the
coefficients of Pascal's triangle.
# let
t
=
[|
[|
1
|]
;
[|
1
;
1
|]
;
[|
1
;
2
;
1
|]
;
[|
1
;
3
;
3
;
1
|]
;
[|
1
;
4
;
6
;
4
;
1
|]
;
[|
1
;
5
;
1
0
;
1
0
;
5
;
1
|]
|]
;;
val t : int array array =
[|[|1|]; [|1; 1|]; [|1; 2; 1|]; [|1; 3; 3; 1|]; [|1; 4; 6; 4; ...|]; ...|]
# t.
(3
)
;;
- : int array = [|1; 3; 3; 1|]
In this example, the element of vector t with index i is a
vector of integers with size i+1. To manipulate such matrices, you have
to calculate the size of each element vector.
Copying Vectors
When you copy a vector, or when you concatenate two vectors, the result
obtained is a new vector. A modification of the original vectors does not
result in the modification of the copies, unless, as usual, there are
shared values.
# let
v2
=
Array.copy
v
;;
val v2 : int array = [|1; 0; 0|]
# let
m2
=
Array.copy
m
;;
val m2 : int array array = [|[|1; 0; 0|]; [|1; 0; 0|]; [|1; 0; 0|]|]
# v.
(1
)<-
3
5
2
;;
- : unit = ()
# v2;;
- : int array = [|1; 0; 0|]
# m2
;;
- : int array array = [|[|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|]|]
We notice in this example that copying m only copies the pointers
to v. If one of the elements of v is modified,
m2 is modified too.
Concatenation creates a new vector whose size
is equal to the sum of the sizes of the two others.
# let
mm
=
Array.append
m
m
;;
val mm : int array array =
[|[|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|];
[|1; 352; ...|]; ...|]
# Array.length
mm
;;
- : int = 6
# m.
(0
)
<-
Array.create
3
0
;;
- : unit = ()
# m
;;
- : int array array = [|[|0; 0; 0|]; [|1; 352; 0|]; [|1; 352; 0|]|]
# mm
;;
- : int array array =
[|[|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|];
[|1; 352; ...|]; ...|]
On the other hand, modification of v, a value shared by m
and mm, does affect both these matrices.
# v.
(1
)
<-
1
8
;;
- : unit = ()
# mm;;
- : int array array =
[|[|1; 18; 0|]; [|1; 18; 0|]; [|1; 18; 0|]; [|1; 18; 0|]; [|1; 18; ...|];
...|]
Character Strings
Character strings can be considered a special case of vectors of
characters. Nevertheless, for efficient memory usage1 their type is
specialized. Moreover, access to their elements has a special syntax:
Syntax
expr1 . [expr2]
The elements of a character string can be physically modified:
Syntax
expr1 . [expr2]
<- expr3
# let
s
=
"hello"
;;
val s : string = "hello"
# s.[
2
]
;;
- : char = 'l'
# s.[
2
]<-
'Z'
;;
- : unit = ()
# s;;
- : string = "heZlo"
Mutable Fields of Records
Fields of a record can be declared mutable. All you have to do is to show
this in the declaration of the type of the record using the keyword
mutable.
Syntax
type name = { ...;
mutable namei : t
; ...}
Here is a small example defining a record type for points in the plane:
# type
point
=
{
mutable
xc
:
float;
mutable
yc
:
float
}
;;
type point = { mutable xc: float; mutable yc: float }
# let
p
=
{
xc
=
1
.
0
;
yc
=
0
.
0
}
;;
val p : point = {xc=1; yc=0}
Thus the value of a field which is declared mutable can be modified
using the syntax:
Syntax
expr1 . name <- expr2
The expression expr1 should be a record type which has the field
name. The modification operator returns a value of type
unit.
# p.
xc
<-
3
.
0
;;
- : unit = ()
# p
;;
- : point = {xc=3; yc=0}
We can write a function for moving a point by modifying its components.
We use a local declaration with pattern matching in order to sequence the
side-effects.
# let
moveto
p
dx
dy
=
let
()
=
p.
xc
<-
p.
xc
+.
dx
in
p.
yc
<-
p.
yc
+.
dy
;;
val moveto : point -> float -> float -> unit = <fun>
# moveto
p
1
.
1
2
.
2
;;
- : unit = ()
# p
;;
- : point = {xc=4.1; yc=2.2}
It is possible to mix mutable and non-mutable fields in the definition of a
record. Only those specified as mutable may be modified.
# type
t
=
{
c1
:
int;
mutable
c2
:
int
}
;;
type t = { c1: int; mutable c2: int }
# let
r
=
{
c1
=
0
;
c2
=
0
}
;;
val r : t = {c1=0; c2=0}
# r.
c1
<-
1
;;
Characters 0-9:
The label c1 is not mutable
# r.
c2
<-
1
;;
- : unit = ()
# r
;;
- : t = {c1=0; c2=1}
On page ?? we give an example of using records with
modifiable fields and arrays to implement a stack structure.
References
Objective CAML provides a polymorphic type ref which can be seen
as the type of a pointer to any value; in Objective CAML terminology we call it
a reference to a value. A referenced value can be
modified. The type ref is defined as a record with one
modifiable field:
type 'a ref = {mutable contents:'a}
This type is provided as a syntactic shortcut. We
construct a reference to a value using the function ref. The
referenced value can be reached using the prefix function
(!). The function modifying the content of a reference is the
infix function (:=).
# let
x
=
ref
3
;;
val x : int ref = {contents=3}
# x
;;
- : int ref = {contents=3}
# !
x
;;
- : int = 3
# x
:=
4
;;
- : unit = ()
# !
x
;;
- : int = 4
# x
:=
!
x+
1
;;
- : unit = ()
# !
x
;;
- : int = 5
Polymorphism and Modifiable Values
The type ref is parameterized. This is what lets us use it to
create references to values of any type whatever. However, it is
necessary to place certain restrictions on the type of referenced
values; we cannot allow the creation of a reference to a value with a
polymorphic type without taking some precautions.
Let us suppose that there were no restriction; then someone could
declare:
let x = ref [] ;;
Then the variable x would have type 'a list ref and
its value could be modified in a way which would be inconsistent with
the strong static typing of Objective CAML:
x
:=
1
::
!
x
;;
x
:=
true
::
!
x
;;
Thus we would have one and the same variable having type int
list at one moment and bool list the next.
In order to avoid such a situation, Objective CAML's type inference
mechanism uses a new category of type variables: weak type
variables. Syntactically, they are distinguished by the underscore
character which prefixes them.
# let
x
=
ref
[]
;;
val x : '_a list ref = {contents=[]}
The type variable '_a is not a type parameter, but an unknown
type awaiting instantiation; the first use of x after its
declaration fixes the value that '_a will take in all
types that depend on it, permanently.
# x
:=
0
::!
x
;;
- : unit = ()
# x
;;
- : int list ref = {contents=[0]}
From here onward, the variable x has type int list
ref.
A type containing an unknown is in fact monomorphic even though its
type has not been specified. It is not possible to instantiate this
unknown with a polymorphic type.
# let
x
=
ref
[]
;;
val x : '_a list ref = {contents=[]}
# x
:=
(function
y
->
())::!
x
;;
- : unit = ()
# x
;;
- : ('_a -> unit) list ref = {contents=[<fun>]}
In this example, even though we have instantiated the unknown type
with a type which is a priori polymorphic ('a ->
unit), the type has remained monomorphic with a new unknown type.
This restriction of polymorphism applies not only to references, but
to any value containing a modifiable part: vectors, records having at
least one field declared mutable, etc. Thus all the type
parameters, even those which have nothing to do with a modifiable
part, are weak type variables.
# type
('a,
'b)
t
=
{
ch1
:
'a
list
;
mutable
ch2
:
'b
list
}
;;
type ('a, 'b) t = { ch1: 'a list; mutable ch2: 'b list }
# let
x
=
{
ch1
=
[]
;
ch2
=
[]
}
;;
val x : ('_a, '_b) t = {ch1=[]; ch2=[]}
Warning
This modification of the typing of application has consequences for
pure functional programs.
Likewise, when you apply a polymorphic value to a polymorphic function,
you get a weak type variable, because you must not exclude the
possibility that the function may construct physically modifiable
values. In other words, the result of the application is always
monomorphic.
#
(function
x
->
x)
[]
;;
- : '_a list = []
You get the same result with partial application:
# let
f
a
b
=
a
;;
val f : 'a -> 'b -> 'a = <fun>
# let
g
=
f
1
;;
val g : '_a -> int = <fun>
To get a polymorphic type back, you have to abstract the second
argument of f and then apply it:
# let
h
x
=
f
1
x
;;
val h : 'a -> int = <fun>
In effect, the expression which defines h is the functional
expression function
x
->
f
1
x. Its evaluation produces a
closure which does not risk producing a side effect, because the body
of the function is not evaluated.
In general, we distinguish so-called ``non-expansive'' expressions,
whose calculation we are sure carries no risk of causing a side effect,
from other expressions, called ``expansive.'' Objective CAML's type system
classifies expressions of the language according to their
syntactic form:
-
``non-expansive'' expressions include primarily
variables, constructors of non-mutable values, and
abstractions;
- ``expansive'' expressions include primarily
applications and constructors of modifiable values. We can also include
here control structures like conditionals and pattern matching.