Cover Data Structures and Algorithms with Object-Oriented Design Patterns in Java
next up previous contents index

Keys and Hash Functions

We are designing a container which will be used to hold some number of items of a given set K. In this context, we call the elements of the set K keys . The general approach is to store the keys in an array. The position of a key in the array is given by a function tex2html_wrap_inline58138, called a hash function , which determines the position of a given key directly from that key.

In the general case, we expect the size of the set of keys, |K|, to be relatively large or even unbounded. For example, if the keys are 32-bit integers, then tex2html_wrap_inline61070. Similarly, if the keys are arbitrary character strings of arbitrary length, then |K| is unbounded.

On the other hand, we also expect that the actual number of items stored in the container to be significantly less than |K|. That is, if n is the number of items actually stored in the container, then tex2html_wrap_inline61078. Therefore, it seems prudent to use an array of size M, where M is as least as great as the maximum number of items to be stored in the container.

Consequently, what we need is a function tex2html_wrap_inline61084. This function maps the set of values to be stored in the container to subscripts in an array of length M. This function is called a hash function .

In general, since tex2html_wrap_inline61088, the mapping defined by hash function will be a many-to-one mapping . That is, there will exist many pairs of distinct keys x and y, such that tex2html_wrap_inline61094, for which h(x)=h(y). This situation is called a collision. Several approaches for dealing with collisions are explored in the following sections.

What are the characteristics of a good hash function?




next up previous contents index

Bruno Copyright © 1998 by Bruno R. Preiss, P.Eng. All rights reserved.