Processes

Unix associates a process with each execution of a program. In [CDM98] Card, Dumas and Mével describe the difference between a program and a process: ``a program itself is not a process: a program is a passive entity (an executable file on a disc), while a process is an active entity with a counter specifying the next instruction to execute and a set of associated resources.''

Unix is a multitasking operating system: many processes may be executed at the same time. It is preemptive, which means that the scheduling of processes is entrusted to the system rather than to the processes themselves. A process is therefore not totally master of its resources; in particular, it cannot determine when it will be executed. Finally, a process does not exist on its own: it has to be created.

Each process has its own private memory space. Processes can communicate via files or communication channels. Thus the distributed memory model of parallelism is simulated on a single machine.

The system gives each process a unique identifier: the PID (Process IDentifier). Under Unix each process, except the initial process, is created by another process, which is called its parent.

The set of all active processes can be listed by the Unix command ps³:

$ ps -f
PID    PPID   CMD
1767   1763   csh
2797   1767   ps -f

The option -f displays, for each active process, its identifier (PID), that of its parent (PPID) and the name of the program started (CMD). Here we have two processes: the command line interpreter csh and the command ps itself. We can see that ps was started from the command line interpreter csh: the parent of its process is the process associated with the execution of csh.
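
A process can also ask the system for these two identifiers itself. The following is a minimal sketch, assuming a Unix-like system and a program linked with the unix library; describe_self is a hypothetical helper introduced only for this illustration.

```ocaml
(* describe_self is a hypothetical helper, not a library function.
   The program must be linked with the unix library. *)
let describe_self () =
  Printf.sprintf "PID %d, PPID %d" (Unix.getpid ()) (Unix.getppid ())

(* Print the identifiers that ps would show for this process. *)
let () = print_endline (describe_self ())
```

Started from a command line interpreter, the PPID printed should be the PID that ps reports for that interpreter.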

Executing a Program

Execution Context

Three values are associated with a program started from the command line:

The command line used to start it. It is contained in the value Sys.argv.
The environment variables of the command line interpreter. These can be accessed by the function Sys.getenv.
An execution status, which is returned when the program terminates.

Command line.

The command line allows you to read the arguments or options of a program call. The behavior of the program may depend on these values. Here is a small example. We write the following program into the file argv_ex.ml:


if Array.length Sys.argv = 1 then
 Printf.printf "Hello world\n"
else if Array.length Sys.argv = 2 then
 Printf.printf "Hello %s\n" Sys.argv.(1)
else Printf.printf "%s : too many arguments\n" Sys.argv.(0)

We compile it:

$ ocamlc -o argv_ex argv_ex.ml

And we execute it:

$ ./argv_ex
Hello world
$ ./argv_ex reader
Hello reader
$ ./argv_ex dear reader
./argv_ex : too many arguments

Environment variables.

Environment variables may contain values necessary for execution. The number and the names of these variables depend on the operating system and on the user configuration. The values of these variables can be accessed by the function Sys.getenv, which takes as argument the name of a variable as a character string:


# Sys.getenv "HOSTNAME";;
- : string = "zinc.pps.jussieu.fr"
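
Sys.getenv raises Not_found when the variable is not defined, which is easy to overlook since the set of variables depends on the configuration. The sketch below assumes OCaml 4.05 or later, where Sys.getenv_opt is available; getenv_default is a hypothetical helper, not a library function.

```ocaml
(* getenv_default is a hypothetical helper: it returns the value of the
   environment variable [name], or [default] when it is not defined. *)
let getenv_default name default =
  match Sys.getenv_opt name with
  | Some value -> value
  | None -> default

let () =
  print_endline (getenv_default "HOSTNAME" "(HOSTNAME is not set)")
```

With older versions of the language, the same effect can be obtained by catching Not_found around a call to Sys.getenv.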

Execution Status

The return value of a program is generally an integer indicating whether the program terminated with an error or not. The exact values may differ from one operating system to another. The programmer can always explicitly stop the program and return an execution status value by calling the function:


# exit ;;
- : int -> 'a = <fun>

Process Creation

A program is started by another process, which is called the current process. The executed program becomes a new process. There are three different relations between the two processes:

The two processes are independent from each other and can be executed concurrently.
The parent process is waiting for the child process to terminate.
The created process replaces the parent process, which terminates.

It is also possible to duplicate the current process to obtain two instances. The two instances of the process differ only in their PID. This is the famous fork which we will describe later.

Independent Processes

The Unix module offers a portable function to create a process.


# Unix.create_process ;;
- : string ->
    string array ->
    Unix.file_descr -> Unix.file_descr -> Unix.file_descr -> int
= <fun>

The first argument is the name of the program (it may be a path). The second is the array of arguments for the program. The last three arguments are the descriptors indicating the standard input, standard output and standard error output of the process. The return value is the PID of the created process.

There also exists a variant of this function which allows you to indicate the values of environment variables:


# Unix.create_process_env ;;
- : string ->
    string array ->
    string array ->
    Unix.file_descr -> Unix.file_descr -> Unix.file_descr -> int
= <fun>

These two functions can be used under Unix and Windows.
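
As an illustration, here is a minimal sketch that starts a program as an independent process and then collects its status. It assumes a Unix-like system where echo is found on the search path, and a program linked with the unix library; run_and_wait is a hypothetical helper, and the function waitpid it relies on is presented later in this chapter.

```ocaml
(* run_and_wait is a hypothetical helper: it starts [prog] with [args],
   letting it share our standard descriptors, and returns its status. *)
let run_and_wait prog args =
  let argv = Array.of_list (prog :: args) in   (* argv.(0) is the program name *)
  let pid =
    Unix.create_process prog argv Unix.stdin Unix.stdout Unix.stderr in
  snd (Unix.waitpid [] pid)

let () =
  match run_and_wait "echo" ["hello"; "from"; "a"; "child"] with
  | Unix.WEXITED 0 -> print_endline "child finished normally"
  | _ -> print_endline "child failed"
```

Between the call to create_process and the call to waitpid, the parent is free to do other work: the two processes run concurrently.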

Process Stacks

It is not always useful for a created process to run concurrently with its parent. The parent process may have to wait for the created process to terminate. The two following functions take as argument the name of a command and execute it.


# Sys.command;;
- : string -> int = <fun>
# Unix.system;;
- : string -> Unix.process_status = <fun>

They differ in the type of the return code. The type process_status is explained in more detail below, together with the function waitpid. During the execution of the command the parent process is blocked.
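
The difference between the two return types can be observed with a command whose exit code is known. This is a minimal sketch, assuming a Unix-like system with a POSIX shell, and a program linked with the unix library.

```ocaml
let () =
  (* Sys.command returns the exit code directly, as an integer. *)
  let code = Sys.command "exit 3" in
  Printf.printf "Sys.command returned %d\n" code;
  (* Unix.system returns a structured status which must be inspected. *)
  match Unix.system "exit 3" with
  | Unix.WEXITED n -> Printf.printf "Unix.system: exited with code %d\n" n
  | Unix.WSIGNALED n -> Printf.printf "Unix.system: killed by signal %d\n" n
  | Unix.WSTOPPED n -> Printf.printf "Unix.system: stopped by signal %d\n" n
```

In both cases the calling process does nothing until the command has terminated.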

Replacement of Current Processes

The replacement of current processes by freshly created processes allows you to limit the number of concurrently executed processes. The four following functions allow this:


# Unix.execv ;;
- : string -> string array -> unit = <fun>
# Unix.execve ;;
- : string -> string array -> string array -> unit = <fun>
# Unix.execvp ;;
- : string -> string array -> unit = <fun>
# Unix.execvpe ;;
- : string -> string array -> string array -> unit = <fun>

Their first argument is the name of the program. With execv and execve, this name must be a path to the executable file; with execvp and execvpe, the program is also searched for in the directories of the search path. The second argument contains the program arguments. The last argument of the functions execve and execvpe additionally allows you to indicate the values of the environment variables.
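
A call to one of these functions only returns if the replacement fails. The following sketch wraps execvp accordingly; replace_with is a hypothetical helper, and the exit code 127 is merely the convention used by shells for a command that cannot be run, chosen here as an assumption. The program must be linked with the unix library.

```ocaml
(* replace_with is a hypothetical helper: it replaces the current process
   by [prog], searched for in the path, or exits with code 127 on failure. *)
let replace_with prog args =
  flush_all ();   (* pending output would be lost with the old program *)
  try Unix.execvp prog (Array.of_list (prog :: args))
  with Unix.Unix_error (err, _, _) ->
    prerr_endline (prog ^ ": " ^ Unix.error_message err);
    exit 127
```

A program would typically call it as its last action, for example replace_with "ls" ["-l"]: nothing written after this call is ever executed.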

Creation of Processes by Duplication

The original system call to create processes under Unix is:


# Unix.fork ;;
- : unit -> int = <fun>

The function fork starts a new process, not a new program. Its effect is to duplicate the calling process. The code of the new process is the same as that of its parent. Under Unix the same code can be shared by several processes, each process possessing its own execution context. This is why we speak of reentrant code.

Let's look at the following small program (we use the function getpid which returns the PID of the process associated with the execution):

Printf.printf "before fork : %d\n" (Unix.getpid ())  ;;
flush stdout ;;
Unix.fork () ;;
Printf.printf "after fork : %d\n" (Unix.getpid ())  ;;
flush stdout ;;

We obtain the following output:

before fork : 10529
after fork : 10529
after fork : 10530

After the execution of fork, two processes execute the code. This is why two PIDs are printed ``after'' the fork. We note that one process has kept the PID of the beginning (the parent). The other one has a new PID (the child), which corresponds to the return value of the fork call in the parent. For the parent process the return value of fork is the PID of the child, while for the child, it is 0.

It is this difference in the return value of fork which makes it possible, within a single program source, to decide which code is executed by the child and which by the parent:

Printf.printf "before fork : %d\n" (Unix.getpid ())  ;;
flush stdout ;;
let pid = Unix.fork () ;;
if pid=0 then  (* -- Code of the child *)
  Printf.printf "I am  the child: %d\n" (Unix.getpid ())  
else           (* -- Code of the father *)
  Printf.printf "I am the father: %d of child: %d\n"  (Unix.getpid ()) pid ;;
flush stdout ;;

Here is the trace of the execution of this program:

before fork : 10539
I am the father: 10539 of child: 10540
I am  the child: 10540

It is also possible to use the return value for matching:

match Unix.fork () with
   0  -> Printf.printf "I am the child: %d\n" (Unix.getpid ())
| pid -> Printf.printf "I am the father: %d of child: %d\n" 
                       (Unix.getpid ()) pid ;;

A process may create a great many children. Therefore the number of descendants of a process is limited by the configuration of the operating system. The following example creates two generations of processes with grandparent, parents, uncles and cousins.

let pid0 = Unix.getpid ();;
let print_generation1 pid ppid =
  Printf.printf "I am %d, son of %d\n" pid ppid;
  flush stdout ;;

let print_generation2 pid ppid pppid  =
  Printf.printf "I am %d, son of %d, grandson of %d\n" 
                 pid ppid pppid;
  flush stdout ;;

match Unix.fork() with
    0 -> let pid01 = Unix.getpid () 
         in ( match Unix.fork() with
                  0 -> print_generation2 (Unix.getpid ()) pid01 pid0 
                | _ -> print_generation1 pid01 pid0)
  | _ -> match Unix.fork () with
             0 -> ( let pid02 = Unix.getpid () 
                    in match Unix.fork() with
                           0 -> print_generation2 (Unix.getpid ()) pid02 pid0 
                         | _ -> print_generation1 pid02 pid0 )
           | _ -> Printf.printf "I am %d, father and grandfather\n" pid0 ;;

We obtain:

I am 10644, father and grandfather
I am 10645, son of 10644
I am 10648, son of 10645, grandson of 10644
I am 10646, son of 10644
I am 10651, son of 10646, grandson of 10644

Order and Moment of Execution

A sequence of process creations without synchronization may lead to surprising effects. This is illustrated by the following poem writing program à la M. Jourdain⁴:

match Unix.fork () with
  0 -> Printf.printf "fair Marquise " ; flush stdout
| _ -> match Unix.fork () with
           0 -> Printf.printf "your beautiful eyes " ; flush stdout
         | _ -> match Unix.fork () with
                     0 -> Printf.printf "make me die " ; flush stdout
                   | _ -> Printf.printf "of love\n" ;  flush stdout ;;

It may produce the following result:

of love
fair Marquise your beautiful eyes make me die

We usually want our program to be able to guarantee the order of execution of its processes. More generally speaking, an application which makes use of several processes may have to synchronize them. Depending on the model of parallelism in use, the synchronization is realized by communication between the processes or by waiting conditions. This subject is presented in more depth in the two following chapters. For the moment, we can improve our poem writing program in two ways:

Give the child the time to write its phrase before writing our own.
Wait for the termination of the child, which will then have written its phrase, before writing our own phrase.

Delays.

A process can suspend its activity by calling the function:


# Unix.sleep ;;
- : int -> unit = <fun>

The argument provides the number of seconds during which the process wants to suspend its activities.

Using this function, we write:

match Unix.fork () with
    0 -> Printf.printf "fair Marquise " ; flush stdout 
  | _ -> Unix.sleep 1 ;
         match Unix.fork () with
             0 -> Printf.printf"your beautiful eyes "; flush stdout
           | _ -> Unix.sleep 1 ;
             match Unix.fork () with
                 0 -> Printf.printf"make me die "; flush stdout
               | _ -> Unix.sleep 1 ; Printf.printf "of love\n" ; flush stdout ;;

And we can obtain:

fair Marquise your beautiful eyes make me die of love

Nevertheless, this method is not reliable. In theory, the system could let one of the processes sleep for its whole delay and write its output before the child it is waiting for has been given a chance to run. Therefore we prefer the following method for ensuring the execution order of our processes.

Waiting for the termination of the child.

A parent process may wait for its child to terminate through a call to the function:


# Unix.wait ;;
- : unit -> int * Unix.process_status = <fun>

The execution of the parent is suspended until one of its children terminates. If wait is called by a process not having any children, a Unix_error exception is raised. We will discuss later the return value of wait. For the moment, we will just use the function to pronounce our poem:

match Unix.fork () with
    0 -> Printf.printf "fair Marquise " ; flush stdout
  | _ -> ignore (Unix.wait ()) ;
         match Unix.fork () with
             0 -> Printf.printf "your beautiful eyes " ; flush stdout
           | _ -> ignore (Unix.wait ()) ;
                  match Unix.fork () with
                      0 -> Printf.printf "make me die " ; flush stdout
                    | _ -> ignore (Unix.wait ()) ;
                           Printf.printf "of love\n" ;
                           flush stdout ;;

Indeed, we obtain:

fair Marquise your beautiful eyes make me die of love
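
The same technique can be written once and for all for a list of tasks. This is only a sketch under assumptions: run_in_order is a hypothetical helper, each task is run in its own child, and the program is linked with the unix library on a Unix-like system.

```ocaml
(* run_in_order is a hypothetical helper: it runs each task in a child
   process and waits for that child before starting the next one. *)
let run_in_order tasks =
  List.iter
    (fun task ->
      flush_all ();   (* so that the child does not inherit pending output *)
      match Unix.fork () with
      | 0 -> task (); exit 0                    (* child: exit flushes stdout *)
      | pid -> ignore (Unix.waitpid [] pid))    (* parent: wait for this child *)
    tasks

let () =
  run_in_order
    (List.map (fun phrase () -> print_string phrase)
       ["fair Marquise "; "your beautiful eyes "; "make me die "; "of love\n"])
```

Here waitpid is used instead of wait so that the parent waits for precisely the child it has just created; this function is described further on.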

Warning

fork is specific to Unix systems; it is not available under Windows.

Descendence, Death and Funerals of Processes

The function wait is useful not only to wait for the termination of a child. It also has the responsibility to complete the death of the child process.

Whenever a process is created, the system adds an entry in a table. The table serves to keep track of all processes. When a process terminates, its entry does not disappear automatically from the table. It is the responsibility of the parent to ensure the deletion by calling wait. If this is not done, the child process keeps an entry in the table. This is called a zombie process.

When the system is started, a first process called init is started. After the initialization of some parameters, the essential role of this ``forefather'' is to take care of orphan processes and to call the wait which deletes them from the process table after their termination.
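
A parent which has created several children can avoid leaving zombies behind by calling wait until no child remains. The following is a minimal sketch, assuming a Unix-like system and a program linked with the unix library; reap_all is a hypothetical helper, and it relies on the error code ECHILD, which wait reports when the caller has no children left.

```ocaml
(* reap_all is a hypothetical helper: it calls wait until every child has
   been collected and returns the number of children reaped. *)
let rec reap_all count =
  match Unix.wait () with
  | (_pid, _status) -> reap_all (count + 1)
  | exception Unix.Unix_error (Unix.ECHILD, _, _) -> count          (* no child left *)
  | exception Unix.Unix_error (Unix.EINTR, _, _) -> reap_all count  (* interrupted: retry *)

let () =
  for _i = 1 to 3 do
    if Unix.fork () = 0 then exit 0   (* each child terminates at once *)
  done;
  Printf.printf "reaped %d children\n%!" (reap_all 0)
```

Until reap_all is called, the three terminated children keep their entries in the process table.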

Waiting for the Termination of a Given Process

There is a variation of the function wait, named waitpid. This function is supported on Unix and Windows:


# Unix.waitpid ;;
- : Unix.wait_flag list -> int -> int * Unix.process_status = <fun>

The first argument is a list of flags specifying how to wait, for example WNOHANG to return immediately when no child has terminated yet. The second indicates which process or which group of processes is concerned.
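
As an illustration of the first argument, the flag WNOHANG makes waitpid return at once, with a PID equal to 0, when the child has not terminated yet. The sketch below assumes a Unix-like system and a program linked with the unix library; poll_child is a hypothetical helper.

```ocaml
(* poll_child is a hypothetical helper: it returns None while the child
   [pid] is still running, and Some status once it has terminated. *)
let poll_child pid =
  match Unix.waitpid [Unix.WNOHANG] pid with
  | (0, _) -> None
  | (_, status) -> Some status

let () =
  match Unix.fork () with
  | 0 -> Unix.sleep 1; exit 0                   (* the child works for a second *)
  | pid ->
      (match poll_child pid with
       | None -> print_endline "child still running"
       | Some _ -> print_endline "child already finished");
      ignore (Unix.waitpid [] pid);             (* now wait for it for good *)
      print_endline "child reaped"
```

Polling in this way lets the parent carry on with its own work and check on its child from time to time.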

After the termination of a process, two pieces of information can be accessed by its parent as a result of the function calls wait or waitpid: the PID of the terminated process and its exit status. The status is represented by a value of type Unix.process_status. This type has three constructors. Each of them takes an integer as argument.

WEXITED n: the process has terminated normally with the return code n.
WSIGNALED n: the process has been killed by the signal n.
WSTOPPED n: the process has been stopped by the signal n.

The last value only makes sense for the function waitpid, which can report such stops when asked to by its first argument. We will discuss signals and their treatment on page ??.
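
The three constructors are naturally handled by pattern matching. Here is a minimal sketch, assuming a Unix-like system and a program linked with the unix library; string_of_status is a hypothetical helper and the exit code 3 is arbitrary.

```ocaml
(* string_of_status is a hypothetical helper describing a status. *)
let string_of_status = function
  | Unix.WEXITED n -> Printf.sprintf "exited with code %d" n
  | Unix.WSIGNALED n -> Printf.sprintf "killed by signal %d" n
  | Unix.WSTOPPED n -> Printf.sprintf "stopped by signal %d" n

let () =
  match Unix.fork () with
  | 0 -> exit 3                                  (* the child terminates with code 3 *)
  | pid ->
      let (_, status) = Unix.waitpid [] pid in   (* the parent collects the status *)
      Printf.printf "child %d %s\n%!" pid (string_of_status status)
```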

Managing of Waiting by Ancestors

In order to avoid having to take care of the termination of child processes oneself, it is possible to delegate this responsibility to an ancestor process. The ``double fork'' allows a process not to take care of the funerals of all its child processes, but to delegate this responsibility to the init process. Here is the principle: a process P₀ creates a process P₁, which in turn creates a third process P₂. Then P₁ terminates. P₂ is thus orphaned and will be adopted by init, which waits for its termination. The initial process P₀ can execute a wait for P₁, which will be of short duration. The idea is to delegate to the grandchild the work which otherwise would have been done by the child.

The schema is the following. Since P₀ and P₂ then run concurrently, the two output lines may appear in either order:


# match Unix.fork() with                       (* P0 creates P1 *)
   0 -> if Unix.fork() <> 0 then exit 0 ;     (* P1 creates P2 and terminates *)
        Printf.printf "P2 did its work\n" ;    (* only P2 gets here *)
        exit 0 
 | pid -> ignore (Unix.waitpid [] pid) ;      (* P0 waits for P1 to terminate *)
          Printf.printf "P0 can do other things without waiting\n" ;;
P2 did its work
P0 can do other things without waiting
- : unit = ()

We will apply this principle to handle requests sent to a server in chapter 20.