Introduction
The execution model of a program on a microprocessor corresponds to
that of imperative programming. More precisely, a program is a series of
instructions whose execution modifies the memory state of the
machine. Memory consists mainly of values created and manipulated by
the program. However, like any computer resource, available memory has a
finite size; a program trying to use more memory than the system
provides will be in an incoherent state. For this reason, it is
necessary to reuse the space of values that are at a given moment no
longer used by future computations during continued execution. Such memory
management has a strong influence on program execution and its
efficiency.
The action of reserving a block of memory for a certain use is called
allocation. We distinguish static allocation, which
happens at program load time, i.e. before execution starts, from
dynamic allocation, which happens during program
execution. Whereas statically allocated memory is never reclaimed
during execution, dynamically allocated regions are susceptible to
being freed, or to being reused during execution.
Explicit memory management is risky for two reasons:
-
if a block of memory is freed while it contains a value still
in use, this value may become corrupted before being
accessed. References to such values are called dangling
pointers;
- if the address of a memory block is no longer known to the
program, then the corresponding block cannot be freed before the end
of program execution. In such cases, we speak of a memory leak.
Explicit memory management by the programmer requires much care to
avoid these two possibilities. This task becomes rather difficult if
programs manipulate complicated data structures, and in particular if
data structures share common regions of memory.
To free the programmer from this difficult exercise, automatic memory
management mechanisms have been introduced into numerous programming
languages. The main idea is that at any moment during execution, the
only dynamically allocated values potentially useful to the program
are those whose addresses are known by the program, directly or
indirectly. All values that can no longer be reached at that moment
cannot be accessed in the future and thus their associated
memory can be reclaimed. This deallocation can be effected either
immediately when a value becomes unreachable, or later when the
program requires more free space than is available.
Objective CAML uses a mechanism called garbage collection (GC) to
perform automatic memory management. Memory is allocated at value
construction (i.e., when a constructor is applied) and it is
freed implicitly. Most programs do not have to deal with the garbage
collector directly, since it works transparently behind the scenes.
However, garbage collection can have an effect on efficiency for
allocation-intensive programs. In such cases, it is useful to control
the GC parameters, or even to invoke the collector
explicitly. Moreover, in order to interface Objective CAML with other
languages (see chapter 12), it is necessary to understand
what constraints the garbage collector imposes on data
representations.