Processes
Unix associates a process with each execution of a program.
In [CDM98] Card, Dumas and Mével describe the difference between
a program and a process:
``a program itself is not a process: a program is a passive entity
(an executable file on a disc), while a process is an active entity
with a counter specifying the next instruction to execute and a set
of associated resources.''
Unix is a multitasking operating system: many processes
may be executed at the same time. It is preemptive, which means
that the decision of which process runs, and for how long, is
entrusted to the system and not to the processes themselves.
A process is therefore not the sole master of its resources.
In particular, a process cannot determine when it will be executed.
A process has to be created.
Each process has its own private memory space. Processes can
communicate via files or communication channels. Thus the distributed memory
model of parallelism is simulated on a single machine.
The system gives each process a unique identifier: the PID
(Process IDentifier). Under Unix each process, except
the initial process, is created by another process, which is called
its parent.
The set of all active processes can be listed by the Unix
command ps:
$ ps -f
PID PPID CMD
1767 1763 csh
2797 1767 ps -f
The option -f displays, for each active process, its
identifier (PID), that of its parent (PPID) and the
name of the started program (CMD). Here we have two
processes, the command line interpreter csh and the command
ps itself. It can be seen that ps was started from
the command line interpreter csh: the parent of the ps process
is the process associated with the execution of csh.
Executing a Program
Execution Context
Three values are associated with an executing program that was started from the
command line:
- The command line used to start it. It is contained in
the value Sys.argv.
- The environment variables of the command line interpreter.
These can be accessed with the function Sys.getenv.
- An execution status, returned when the program terminates.
Command line.
The command line allows you to read arguments or options of a program call.
The behavior of the program may depend on these values.
Here is a small example. We write the following program
into the file argv_ex.ml:
if Array.length Sys.argv = 1 then
  Printf.printf "Hello world\n"
else if Array.length Sys.argv = 2 then
  Printf.printf "Hello %s\n" Sys.argv.(1)
else
  Printf.printf "%s : too many arguments\n" Sys.argv.(0)
We compile it:
$ ocamlc -o argv_ex argv_ex.ml
And we execute it:
$ ./argv_ex
Hello world
$ ./argv_ex reader
Hello reader
$ ./argv_ex dear reader
./argv_ex : too many arguments
Environment variables.
Environment variables may contain values necessary for execution.
The number and the names of these variables depend on the
operating system and on the user configuration. The values of these
variables can be accessed with the function Sys.getenv,
which takes as argument the name of a variable as a
character string:
# Sys.getenv "HOSTNAME";;
- : string = "zinc.pps.jussieu.fr"
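Sys.getenv raises the exception Not_found when the variable is not defined, which is easy to overlook because the set of variables depends on the user configuration. The following is a minimal sketch of a lookup with a fallback value. The helper name getenv_default is hypothetical, and the sketch assumes OCaml 4.05 or later, where Sys.getenv_opt is available.

```ocaml
(* Hypothetical helper: look up an environment variable and fall back to
   a default value when it is not defined.
   Assumption: OCaml >= 4.05, which provides Sys.getenv_opt. *)
let getenv_default name default =
  match Sys.getenv_opt name with
  | Some value -> value
  | None -> default

let () =
  (* HOSTNAME may well be undefined; Sys.getenv would then raise Not_found. *)
  Printf.printf "host: %s\n%!" (getenv_default "HOSTNAME" "(unknown)")
```

The same effect can be obtained on older versions by catching Not_found around a call to Sys.getenv.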
Execution Status
The return value of a program is generally an integer
indicating whether the program terminated with an error or not.
The exact values may differ from one operating system to another.
The programmer can always explicitly stop the program and return
an execution status with the function exit (Pervasives.exit,
called Stdlib.exit since OCaml 4.07):
# exit;;
- : int -> 'a = <fun>
Process Creation
A program is started by another process, which is called
the current process. The executed program becomes a new process.
There are three possible relations between the two processes:
- The two processes are independent of each other
and can be executed concurrently.
- The parent process waits for the child process to terminate.
- The created process replaces the parent process, which terminates.
It is also possible to duplicate the current process to obtain two instances.
The two instances of the process differ only in their PID.
This is the famous fork, which we will describe later.
Independent Processes
The Unix module offers a portable function to create a process.
# Unix.create_process;;
- : string ->
string array ->
Unix.file_descr -> Unix.file_descr -> Unix.file_descr -> int
= <fun>
The first argument is the name of the program (it may be a path).
The second is the array of arguments for the program. The last three
arguments are the descriptors indicating the standard input,
standard output and standard error output of the process. The
return value is the PID of the created process.
There also exists a variant of this function which allows you to indicate
the values of environment variables:
# Unix.create_process_env;;
- : string ->
string array ->
string array ->
Unix.file_descr -> Unix.file_descr -> Unix.file_descr -> int
= <fun>
These two functions can be used under Unix and Windows.
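As an illustration, here is a minimal sketch that starts an independent process with Unix.create_process and then collects its status. The helper name run_and_wait is hypothetical. The sketch also relies on Unix.waitpid, which is only presented further on in this chapter, and assumes that the programs echo and true can be found on the execution path and that the code is linked with the unix library.

```ocaml
(* Hypothetical helper: start prog as an independent process that shares
   our standard input, output and error descriptors, then wait for it
   and return its exit status. *)
let run_and_wait prog args =
  let pid = Unix.create_process prog args Unix.stdin Unix.stdout Unix.stderr in
  let (_, status) = Unix.waitpid [] pid in
  status

let () =
  (* By convention the first element of the argument array is the
     name of the program itself. *)
  match run_and_wait "echo" [| "echo"; "hello"; "from"; "a"; "child" |] with
  | Unix.WEXITED 0 -> print_endline "echo terminated normally"
  | _ -> print_endline "echo failed"
```

Between the call to create_process and the call to waitpid the two processes really are concurrent; the parent could do other work there instead of waiting at once.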
Process Stacks
It is not always useful for a created process to run concurrently
with its parent. The parent process may have to wait for the created
process to terminate. The two following functions take as argument the
name of a command and execute it.
# Sys.command;;
- : string -> int = <fun>
# Unix.system;;
- : string -> Unix.process_status = <fun>
They differ in the type of the return code. The type
process_status is explained in more detail below, together
with the function waitpid.
During the execution of the command the parent process is blocked.
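The difference between the two return types can be seen on a small example. This is only a sketch: it assumes a Unix system on which both functions hand the command to a POSIX shell, so that the command exit 3 makes that shell terminate with code 3.

```ocaml
let () =
  (* Sys.command blocks until the command ends and returns a plain integer. *)
  let code = Sys.command "exit 3" in
  Printf.printf "Sys.command returned %d\n%!" code;
  (* Unix.system also blocks, but returns a Unix.process_status. *)
  match Unix.system "exit 3" with
  | Unix.WEXITED n -> Printf.printf "Unix.system returned WEXITED %d\n%!" n
  | Unix.WSIGNALED s -> Printf.printf "Unix.system returned WSIGNALED %d\n%!" s
  | Unix.WSTOPPED s -> Printf.printf "Unix.system returned WSTOPPED %d\n%!" s
```

The structured status makes it possible to tell a normal termination apart from a death by signal, which the integer alone does not.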
Replacement of Current Processes
The replacement of current processes by freshly created processes
allows you to limit the number of concurrently executed processes.
The four following functions allow this:
# Unix.execv;;
- : string -> string array -> unit = <fun>
# Unix.execve;;
- : string -> string array -> string array -> unit = <fun>
# Unix.execvp;;
- : string -> string array -> unit = <fun>
# Unix.execvpe;;
- : string -> string array -> string array -> unit = <fun>
Their first argument is the name of the program. With execv and
execve this name must designate the executable file directly; with
execvp or execvpe the program is also searched for in the directories
of the execution path. The second argument contains the program arguments.
The last argument of the functions execve and
execvpe additionally allows you to indicate the values of
environment variables.
Creation of Processes by Duplication
The original system call to create processes under Unix is:
# Unix.fork;;
- : unit -> int = <fun>
The function fork starts a new process, not a new program.
Its effect is to duplicate the calling process. The code of
the new process is the same as that of its parent. Under Unix
the same code can be shared by several processes, each process possessing
its own execution context. Therefore we speak about
reentrant code.
Let's look at the following small program (we use the function
getpid which returns the PID of the process associated
with the execution):
Printf.printf "before fork : %d\n" (Unix.getpid ());;
flush stdout;;
Unix.fork ();;
Printf.printf "after fork : %d\n" (Unix.getpid ());;
flush stdout;;
We obtain the following output:
before fork : 10529
after fork : 10529
after fork : 10530
After the execution of fork, two processes execute the code.
This leads to the output of two PIDs ``after'' the
fork. We note that
one process has kept the PID of the beginning (the parent).
The other one has a new PID (the child), which corresponds to the
return value of the fork call in the parent. For the parent process
the return value of fork is the PID of the child, while for
the child, it is 0. Note that the output buffer is flushed before the
fork: otherwise its pending contents would be duplicated in the child
and printed twice.
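The relation between the two return values can be checked from within a program. The sketch below is an illustration only: the child reports its own PID to the parent, which compares it with the value fork returned. The function name fork_pids is hypothetical, and the sketch uses Unix.pipe as the communication channel, a function this section does not present.

```ocaml
(* Hypothetical helper: returns the pair
   (value returned by fork in the parent, PID the child reports for itself). *)
let fork_pids () =
  let (rd, wr) = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: fork returned 0; send our own PID through the pipe. *)
      Unix.close rd;
      let msg = string_of_int (Unix.getpid ()) in
      ignore (Unix.write_substring wr msg 0 (String.length msg));
      Unix.close wr;
      exit 0
  | child ->
      (* Parent: fork returned the PID of the child. *)
      Unix.close wr;
      let buf = Bytes.create 32 in
      let n = Unix.read rd buf 0 32 in
      Unix.close rd;
      ignore (Unix.waitpid [] child);
      (child, int_of_string (Bytes.sub_string buf 0 n))

let () =
  let (from_fork, from_child) = fork_pids () in
  Printf.printf "fork returned %d, the child reports %d\n%!" from_fork from_child
```

Both numbers should be equal, and different from the PID of the parent.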
It is this difference in the return value of fork which
makes it possible, within a single program source, to decide which
code is executed by the child and which by the parent:
Printf.printf "before fork : %d\n" (Unix.getpid ());;
flush stdout;;
let pid = Unix.fork ();;
if pid = 0 then
  (* -- Code of the child *)
  Printf.printf "I am the child: %d\n" (Unix.getpid ())
else
  (* -- Code of the father *)
  Printf.printf "I am the father: %d of child: %d\n" (Unix.getpid ()) pid;;
flush stdout;;
Here is the trace of the execution of this program:
before fork : 10539
I am the father: 10539 of child: 10540
I am the child: 10540
It is also possible to use the return value for matching:
match Unix.fork () with
  0 -> Printf.printf "I am the child: %d\n" (Unix.getpid ())
| pid -> Printf.printf "I am the father: %d of child: %d\n" (Unix.getpid ()) pid;;
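Matching on the result of fork combines naturally with the replacement functions described earlier: the child replaces itself with another program while the parent goes on. The following is a minimal sketch under several assumptions: the helper name spawn_and_wait is hypothetical, the exit code 127 for a failed replacement is a convention borrowed from command interpreters, the parent uses Unix.waitpid (presented later in this chapter), and a recent OCaml version is assumed, in which Unix.execvp has the result type 'a.

```ocaml
(* Hypothetical helper: run prog in a child process that replaces
   itself with it; the parent waits and returns the child's status. *)
let spawn_and_wait prog args =
  match Unix.fork () with
  | 0 ->
      (* Child: when execvp succeeds it never returns. If it fails
         it raises Unix_error and the child terminates with code 127. *)
      (try Unix.execvp prog args
       with Unix.Unix_error _ -> exit 127)
  | pid ->
      (* Parent: pid is the PID of the child. *)
      let (_, status) = Unix.waitpid [] pid in
      status

let () =
  match spawn_and_wait "echo" [| "echo"; "hello from the child" |] with
  | Unix.WEXITED 0 -> print_endline "the child terminated normally"
  | _ -> print_endline "the child failed"
```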
A process may have a great many descendants. The
number of descendants of a process is therefore limited by the
configuration of the operating system. The following example
creates two generations of processes with grandparent, parents,
uncles and cousins.
let pid0 = Unix.getpid ();;
let print_generation1 pid ppid =
  Printf.printf "I am %d, son of %d\n" pid ppid;
  flush stdout;;
let print_generation2 pid ppid pppid =
  Printf.printf "I am %d, son of %d, grandson of %d\n" pid ppid pppid;
  flush stdout;;
match Unix.fork () with
  0 ->
    let pid01 = Unix.getpid () in
    ( match Unix.fork () with
        0 -> print_generation2 (Unix.getpid ()) pid01 pid0
      | _ -> print_generation1 pid01 pid0 )
| _ ->
    match Unix.fork () with
      0 ->
        ( let pid02 = Unix.getpid () in
          match Unix.fork () with
            0 -> print_generation2 (Unix.getpid ()) pid02 pid0
          | _ -> print_generation1 pid02 pid0 )
    | _ -> Printf.printf "I am %d, father and grandfather\n" pid0;;
We obtain:
I am 10644, father and grandfather
I am 10645, son of 10644
I am 10648, son of 10645, grandson of 10644
I am 10646, son of 10644
I am 10651, son of 10646, grandson of 10644
Order and Moment of Execution
A sequence of process creations without synchronization may lead to
surprising effects. This is illustrated by the following poem writing
program à la M. Jourdain:
match Unix.fork () with
  0 -> Printf.printf "fair Marquise "; flush stdout
| _ ->
    match Unix.fork () with
      0 -> Printf.printf "your beautiful eyes "; flush stdout
    | _ ->
        match Unix.fork () with
          0 -> Printf.printf "make me die "; flush stdout
        | _ -> Printf.printf "of love\n"; flush stdout;;
It may produce the following result:
of love
fair Marquise your beautiful eyes make me die
We usually want our program to be able to guarantee the order of
execution of its processes. More generally, an application
which makes use of several processes may have to synchronize them.
Depending on the model of parallelism in use, the synchronization
is achieved by communication between the processes or by waiting
conditions. This subject is presented in more depth in the two
following chapters. For the moment, we can improve our poem writing
program in two ways:
- Give the child the time to write its phrase before writing
our own.
- Wait for the termination of the child, which will then have
written its phrase, before writing our own phrase.
Delays.
A process can suspend its activity by calling the function:
# Unix.sleep;;
- : int -> unit = <fun>
The argument provides the number of seconds during which the process wants to
suspend its activities.
Using this function, we write:
match Unix.fork () with
  0 -> Printf.printf "fair Marquise "; flush stdout
| _ ->
    Unix.sleep 1;
    match Unix.fork () with
      0 -> Printf.printf "your beautiful eyes "; flush stdout
    | _ ->
        Unix.sleep 1;
        match Unix.fork () with
          0 -> Printf.printf "make me die "; flush stdout
        | _ ->
            Unix.sleep 1;
            Printf.printf "of love\n"; flush stdout;;
And we can obtain:
fair Marquise your beautiful eyes make me die of love
Nevertheless, this method is not reliable. In theory, the system
could leave a child inactive for longer than the delay, so that
its parent sleeps, wakes up and writes its own output first. Therefore
we prefer the following method for ensuring the execution order
of our processes.
Waiting for the termination of the child.
A parent process may wait for its child to
terminate through a call to the function:
# Unix.wait;;
- : unit -> int * Unix.process_status = <fun>
The execution of the parent is suspended until one of its children
terminates. If wait is called by a process
that has no children, the exception Unix_error is raised.
We will discuss the return value of wait later.
For the moment, we will just use the function to recite our poem:
match Unix.fork () with
  0 -> Printf.printf "fair Marquise "; flush stdout
| _ ->
    ignore (Unix.wait ());
    match Unix.fork () with
      0 -> Printf.printf "your beautiful eyes "; flush stdout
    | _ ->
        ignore (Unix.wait ());
        match Unix.fork () with
          0 -> Printf.printf "make me die "; flush stdout
        | _ ->
            ignore (Unix.wait ());
            Printf.printf "of love\n"; flush stdout
Indeed, we obtain:
fair Marquise your beautiful eyes make me die of love
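The nested matches follow a regular pattern: create a child, let it write its phrase, wait for it, and continue. As a sketch only, this pattern can be written as an iteration over a list of phrases. The function name recite is hypothetical; the sketch assumes a Unix system, since it uses fork and wait.

```ocaml
(* Hypothetical helper: print each phrase in its own child process and
   wait for that child before creating the next one, so that the
   phrases come out in the order of the list. *)
let recite phrases =
  (* Empty the buffer first, so that no pending output is duplicated
     in the children. *)
  flush stdout;
  List.iter
    (fun phrase ->
       match Unix.fork () with
       | 0 ->
           print_string phrase;
           (* exit flushes the child's output buffer before it terminates *)
           exit 0
       | _ -> ignore (Unix.wait ()))
    phrases

let () =
  recite ["fair Marquise "; "your beautiful eyes "; "make me die "];
  print_string "of love\n";
  flush stdout
```

Because the parent waits after every fork, at most one child exists at any time, and none is left behind when recite returns.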
Warning
fork is specific to Unix systems.
Descent, Death and Funerals of Processes
The function wait is not only useful for waiting for
the termination of a child. It is also responsible
for completing the death of the child process.
Whenever a process is created, the system adds an entry to a table.
The table serves to keep track of all processes. When a process
terminates, its entry does not disappear from the table automatically.
It is the responsibility of the parent to ensure its deletion
by calling wait. Until this is done, the child process
keeps an entry in the table. This is called a
zombie process.
When the system is started, a first process called init is
started. After the initialization of some parameters, the essential
role of this ``forefather'' is to take care of orphan processes
and to call the wait which deletes them from the process
table after their termination.
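A parent can perform these funerals itself by calling wait until no child is left. The following minimal sketch relies on the fact, stated above, that wait raises Unix_error when the process has no children; it assumes that the error code in that case is Unix.ECHILD. The function name reap_all is hypothetical.

```ocaml
(* Hypothetical helper: call wait until no child is left and return
   the number of children whose entries were removed from the table.
   Assumption: with no child left, Unix.wait raises
   Unix_error (ECHILD, _, _). *)
let reap_all () =
  let rec loop count =
    match Unix.wait () with
    | _ -> loop (count + 1)
    | exception Unix.Unix_error (Unix.ECHILD, _, _) -> count
  in
  loop 0

let () =
  (* Create three children that terminate at once. *)
  for _i = 1 to 3 do
    if Unix.fork () = 0 then exit 0
  done;
  (* Terminated children remain zombies until the parent waits for them. *)
  Printf.printf "reaped %d children\n%!" (reap_all ())
```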
Waiting for the Termination of a Given Process
There is a variant of the function wait, named waitpid.
This function is supported under Unix and Windows:
# Unix.waitpid;;
- : Unix.wait_flag list -> int -> int * Unix.process_status = <fun>
The first argument specifies the waiting modalities. The second
indicates which process or which group of processes is concerned.
After the termination of a process, two pieces of information can be accessed
by its parent as the result of a call to wait or
waitpid: the number of the terminated process and its exit status.
The status is represented by a value of type
Unix.process_status.
This type has three constructors. Each of them takes an integer
as argument.
- WEXITED n: the process has terminated normally
with the return code n.
- WSIGNALED n: the process has been killed by the
signal n.
- WSTOPPED n: the process has been stopped by the signal
n.
The last value only makes sense for the function waitpid,
which can listen for such signals as indicated by its first argument.
We will discuss signals and their treatment on page
??.
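The three constructors are typically handled by pattern matching. Here is a small sketch that turns a status into a readable message and applies it to a child that terminates with code 2. The function name describe_status and the wording of the messages are illustrative choices, not part of the Unix module.

```ocaml
(* Hypothetical helper: describe a Unix.process_status in words. *)
let describe_status = function
  | Unix.WEXITED n -> Printf.sprintf "terminated normally with code %d" n
  | Unix.WSIGNALED n -> Printf.sprintf "was killed by signal %d" n
  | Unix.WSTOPPED n -> Printf.sprintf "was stopped by signal %d" n

let () =
  match Unix.fork () with
  | 0 -> exit 2                      (* the child terminates with code 2 *)
  | pid ->
      (* An empty list of flags means: block until this child terminates. *)
      let (terminated, status) = Unix.waitpid [] pid in
      Printf.printf "process %d %s\n%!" terminated (describe_status status)
```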
Management of Waiting by Ancestors
In order to avoid having to take care of the termination of child processes oneself,
it is possible to delegate this responsibility to an ancestor process.
The ``double fork'' allows a process not to take care of the funerals of all
its child processes, but to delegate this responsibility to the init
process. Here is the principle: a process P0 creates a process
P1, which in turn creates a third process P2. Then P1
terminates. P2 is then an orphan and will be adopted by init,
which waits for its termination. The initial process P0 can execute
a wait for P1, which will be of short duration. The idea is
to delegate to the grandchild the work which otherwise would have
been done by the child.
The schema is the following:
# match Unix.fork () with                    (* P0 creates P1 *)
    0 -> if Unix.fork () <> 0 then exit 0;   (* P1 creates P2 and terminates *)
         Printf.printf "P2 did its work\n";  (* only P2 gets here *)
         exit 0
  | pid -> ignore (Unix.waitpid [] pid);     (* P0 waits for P1 to terminate *)
           Printf.printf "P0 can do other things without waiting\n";;
P2 did its work
P0 can do other things without waiting
- : unit = ()
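The schema can be packaged as a function that takes the work of P2 as an argument. This is a sketch under assumptions: the name detach is hypothetical, the work is given as a function of type unit -> unit, and the status of P1 is returned so that the caller can check that the hand-over succeeded.

```ocaml
(* Hypothetical helper: run work in a grandchild (P2) that init will
   adopt; the caller (P0) only waits for the short-lived child (P1). *)
let detach work =
  (* Empty the buffer so that pending output is not duplicated. *)
  flush stdout;
  match Unix.fork () with
  | 0 ->
      (* P1: create P2 and terminate at once. *)
      if Unix.fork () <> 0 then exit 0;
      (* P2: do the real work, then terminate. *)
      work ();
      exit 0
  | pid ->
      (* P0: wait for P1, which terminates almost immediately. *)
      let (_, status) = Unix.waitpid [] pid in
      status

let () =
  match detach (fun () -> print_endline "P2 did its work") with
  | Unix.WEXITED 0 -> print_endline "P0 can do other things without waiting"
  | _ -> prerr_endline "P1 did not terminate normally"
```

Since P0 never waits for P2, the order of the two messages is not guaranteed.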
We will apply this principle to handle requests sent to a server
in chapter 20.