Allocation and Deallocation of Memory
Most languages permit dynamic memory allocation, among them C,
Pascal, Lisp, ML, SmallTalk, C++, Java, ADA.
Explicit Allocation
We distinguish two types of allocation:
-
a simple allocation reserving a block of memory of a certain
size without concern of its contents;
- an allocation combining the reservation of space with its initialization.
The first case is illustrated by the function new in Pascal
or malloc in C. These return a pointer to a memory block
(i.e. its address), through which the value stored in memory can
be read or modified. The second case corresponds to the construction
of values in Objective CAML, Lisp, or in object-oriented languages.
Class instances in object-oriented languages are constructed by
combining new with the invocation of a constructor for the
class, which usually expects a number of parameters. In functional
languages, constructor functions are called in places where a
structural value (tuple, list, record, vector, or closure) is defined.
Let's examine an example of value construction in Objective CAML. The
representation of values in memory is illustrated in
Figure 9.1.
Figure 9.1: Memory representation of values.
# let
u
=
let
l
=
[
'c'
;
'a'
;
'm'
]
in
List.tl
l
;;
val u : char list = ['a'; 'm']
# let
v
=
let
r
=
(
[
'z'
]
,
u
)
in
match
r
with
p
->
(fst
p)
@
(snd
p)
;;
val v : char list = ['z'; 'a'; 'm']
A list element is represented by a tuple of two words, the first
containing a character and the second containing a pointer to the next
element of the list. The actual runtime representation differs
slightly and is described in the chapter on interfacing with C (see
page ??).
The first definition constructs a value named l by allocating
a cell (constructor ::) for each element of the list
[
'c'
;'a'
;'m'
]
. The global declaration u
corresponds to the tail of l. This establishes a sharing
relationship between l and u, i.e. between the
argument and the result of the function call to List.tl.
Only the declaration u is known after the evaluation of this
first statement.
The second statement constructs a list with only one element, then a
pair called r containing this list and the list u.
This pair is pattern matched
and renamed p by the matching. Next, the first element of
p is concatenated with its second element, which creates a
value [
'z'
;'a'
;'m'
]
tied to the global identifier
v. Notice that the result of snd (the list
[
'a'
;'m'
]
) is shared with its argument p whereas the
result of fst (the character 'z') is copied.
In each case memory allocation is explicit, meaning that it is requested
by the programmer (by a language command or instruction).
Note
Allocated memory stores information on the size of the object allocated
in order to be able to free it later.
Explicit Reclamation
Languages with explicit memory reclamation possess a freeing operator (free in C or dispose in Pascal) that take the address (a
pointer) of the region to deallocate. Using the information stored at
allocation time, the program frees this region and may re-use it later.
Dynamic allocation is generally used to manipulate data structures that
evolve, such as lists, trees etc.. Freeing the space occupied by
such data is not done in one fell swoop, but instead requires a
function to traverse the data. We call such functions destructors.
Although correctly defining destructors
is not too difficult, their use is quite delicate.
In fact, in order to free the space occupied by a structure, it is
necessary to traverse the structure's pointers and apply the language's
freeing operator.
Leaving the responsibility of freeing memory to the programmer has the
advantage that the latter is sure of the actions taken.
However, incorrect use of these operators can cause an error during the
execution of the program.
The principal dangers of explicit memory reclamation are:
-
dangling pointers: a memory region has been freed while there
are still pointers pointing at it. If the region is reused, access
to the region by these pointers risks being incoherent.
- Inaccessible memory regions (a memory ``leak''): a memory region is
still allocated, but no longer referenced by any pointer. There is no
longer any possibility of freeing the region. There is a clear loss of memory.
The entire difficulty with explicit memory reclamation is that of knowing
the lifetime of the set of values of a program.
Implicit Reclamation
Languages with implicit memory reclamation do not possess memory-freeing
operators. It is not possible for the programmer to free
an allocated value. Instead, an automatic
reclamation mechanism is engaged when a value is no longer referenced, or
at the time of an allocation failure, that is to say, when the heap is
full.
An automatic memory reclamation algorithm is in some ways a global
destructor. This characteristic makes its design and
implementation more difficult than that of a destructor dedicated to a
particular data structure. But, once this difficulty is overcome, the
memory reclamation function obtained greatly enhances the safety of
memory management. In particular, the risk of dangling pointers
disappears.
Furthermore, an automatic memory reclamation mechanism may bring good
properties to the heap:
-
compaction: all the recovered memory belongs to a single
block, thereby avoiding fragmentation of the heap, and allowing
allocation of objects of the size of the free space on the heap;
- localization: the different parts of the same value are
close to one another from the point of view of memory address,
permitting them to remain in the same memory pages during use,
and thereby avoiding their erasure from cache memory.
Design choices for a garbage collector must take certain
criteria and constraints into account:
-
reclamation factor: what percentage of unused memory is available?
- memory fragmentation: can one allocate a block the size of the
free memory?
- the slowness of allocation and collection;
- what freedom do we have regarding the representation of values?
In practice, the safety criterion remains primordial, and garbage
collectors find a compromise among the other constraints.