We are designing a container which will be used
to hold some number of items of a given set *K*.
In this context,
we call the elements of the set *K* *keys* .
The general approach is to store the keys in an array.
The position of a key in the array is given by a function ,
called a *hash function* ,
which determines the position of a given key directly from that key.

In the general case,
we expect the size of the set of keys, |*K*|,
to be relatively large or even unbounded.
E.g., if the keys are 32-bit integers, then .
Similarly, if the keys are arbitrary character strings
of arbitrary length, then |*K*| is unbounded.

On the other hand, we also expect that the actual number of items
stored in the container to be significantly less than |*K*|.
I.e., if *n* is the number of items actually
stored in the container, then .
Therefore, it seems prudent to use an array of size *M*,
where *M* is as least as great as the maximum number of items to be stored
in the container.

Consequently, what we need is a function .
This function maps the set of values to be stored in the container
to subscripts in an array of length *M*.
This function is called a *hash function* .

In general, since ,
the mapping defined by hash function will be a
*many-to-one mapping* .
I.e., there will exist many pairs of distinct keys *x* and *y*,
such that ,
for which *h*(*x*)=*h*(*y*).
This situation is called a *collision*.
Several approaches for dealing with collisions are explored
in the following sections.

What are the characteristics of a good hash function?

- A good hash function avoids collisions.
- A good hash function tends to spread keys evenly in the array.
- A good hash function is easy to compute.

Copyright © 1997 by Bruno R. Preiss, P.Eng. All rights reserved.