Data Structures and Algorithms with Object-Oriented Design Patterns in C#

Hashing-The Basic Idea

In this chapter we examine data structures which are designed specifically with the objective of providing efficient insertion and find operations. In order to meet the design objective, certain concessions are made. Specifically, we do not require that there be any specific ordering of the items in the container. In addition, while we still require the ability to remove items from the container, it is not our primary objective to make removal as efficient as the insertion and find operations.

Ideally we would build a data structure for which both the insertion and find operations are O(1) in the worst case. However, this kind of performance can only be achieved with complete a priori knowledge. We need to know beforehand specifically which items are to be inserted into the container. Unfortunately, we do not have this information in the general case. So, if we cannot guarantee O(1) performance in the worst case, then we make it our design objective to achieve O(1) performance in the average case.

The constant time performance objective immediately leads us to the following conclusion: Our implementation must be based in some way on an array rather than a linked list. This is because we can access the element of an array in constant time, whereas the same operation in a linked list takes O(k) time.

In the previous chapter, we consider two searchable containers--the ordered list and the sorted list. In the case of an ordered list, the cost of an insertion is O(1) and the cost of the find operation is O(n). For a sorted list the cost of insertion is O(n) and the cost of the find operation is for the array implementation.

Clearly, neither the ordered list nor the sorted list meets our performance objectives. The essential problem is that a search, either linear or binary, is always necessary. In the ordered list, the find operation uses a linear search to locate the item. In the sorted list, a binary search can be used to locate the item because the data is sorted. However, in order to keep the data sorted, insertion becomes O(n).

In order to meet the performance objective of constant time insert and find operations, we need a way to do them without performing a search. That is, given an item x, we need to be able to determine directly from x the array position where it is to be stored.