Logo Data Structures and Algorithms with Object-Oriented Design Patterns in C++
next up previous contents index

Removing Items

Removing items from a scatter table using open addressing has to be done with some care. The naïve approach would be to locate the item to be removed and then change the state of its location to empty. However, that approach does not work! Recall that the FindMatch routine which is used to locate an item stops its search when it encounters an empty cell. Therefore, if we change the state of a cell in the middle of a cluster to empty, all subsequent searches in that cluster will stop at the empty cell. As a result, subsequent searches for an object may fail even when the target is still in the table!

One way to deal with this is to make use of the third state, deleted. Instead of marking a location empty, we mark it deleted when an item is deleted. Recall that that the FindMatch routine was written in such a way that it continues past deleted cells in its search. Also, the FindUnoccupied routine was written to stop its search when it encounters either an empty or a deleted location. Consequently, the positions marked deleted are available for reuse when insertion is done.

Program gif gives the implementation of the Withdraw. The Withdraw function takes a reference to an Object and removes it from the scatter table. It does so by first locating the specific object instance using FindInstance and then marking the location deleted. The implementation of FindInstance has been elided. It is simply a trivial variation of the FindMatch routine.

   program14071
Program: OpenScatterTable Class Withdraw Member Function Definition

The running time of the Withdraw routine is determined by that of FindInstance. In the worst case FindInstance has to examine every array position. Therefore, the running time of Withdraw is tex2html_wrap_inline62946.

There is a very serious problem with the technique of marking locations as deleted. After a large number of insertions and deletions have been done, it will be very likely that there will be no cells left that are marked empty. This is because, nowhere in any of the routines (except Purge) is a cell ever marked empty! This has the very unfortunate consequence that an unsuccessful search, i.e., a search for an object which is not in the scatter table, is tex2html_wrap_inline63214. Recall that FindMatch examines at most M array locations and only stops its search early when an empty location is encountered. Since there are no more empty locations, the search must examine all M locations.

If we are using the scatter table in an application in which we know a priori that no items will be removed, or perhaps only a very small number of items will be removed, then the Withdraw routine given in Program gif will suffice. However, if the application is such that a significant number of withdrawals will be made, a better implementation is required.

Ideally, when removing an item the scatter table ends up exactly as it would have appeared had that item never been inserted in the first place. Note that exactly the same constraint is met by the Withdraw function for the ChainedScatterTable class given in Program gif. It turns out that a variation of that algorithm can be used to implement the Withdraw function for the OpenScatterTable class as shown in Program gif.

   program14099
Program: OpenScatterTable Class Alternate Withdraw Member Function Definition

The algorithm begins by checking that the scatter table is not empty. Then it calls FindInstance to determine the position i of the item to be removed. If the item to be removed is not in the scatter table FindInstance returns length and an exception is thrown. Otherwise, FindInstance falls between 0 and tex2html_wrap_inline63222, which indicates that the item was found.

In the general case, the item to be deleted falls in the middle of a cluster. Deleting it would create a hole in the middle of the cluster. What we need to do is to find another item further down in the cluster which can be moved up to fill in the hole that would be created when the item at position i is deleted. The purpose of the loop on lines 11-19 is to find the position j of an item which can be moved safely into position i. Note the implementation here implicitly assumes that a linear probing sequence is used--the C function is not called explicitly. An item at position j can be moved safely to position i only if the hash value of the item at position j is not cyclically contained in the interval between i and j.

If an item is found at some position j that can be moved safely, then that item is moved to position i on line 22. The effect of moving the item at position j to position i, is to move the hole from position i to position j (line 23). Therefore, another iteration of the main loop (lines 8-24) is needed to fill in the relocated hole in the cluster.

If no item can be found to fill in the hole, then it is safe to split the cluster in two. Eventually, either because no item can be found to fill in the hole or because the hole has moved to the end of the cluster, there is nothing more to do other than to delete the hole. Thus, on lines 25-26 the entry at position i is set to empty and the associated object pointer is set to zero. Notice that the third state deleted is not required in this implementation of Withdraw.

If we use the Withdraw implementation of Program gif, the scatter table entries will only ever be in one of two two states--occupied or empty. Consequently, we can improve the bound on the worst-case for the search from tex2html_wrap_inline63210 to tex2html_wrap_inline63226, where n is the number of items in the scatter table.

Determining the running time of Program gif is a little tricky. Assuming the item to be deleted is actually in the table, the running time to find the position of that item (lines 3-7) is tex2html_wrap_inline62846, where tex2html_wrap_inline61308 is the number of item actually in the scatter table. In the worst case, the scatter table is comprised of a single cluster of n items, and we are deleting the first item of the cluster. In this case, the main loop on lines 8-24 makes one pass through the entire cluster, in the worst case moving the hole to the end of the cluster one position at at time. Thus, the running time of the main loop is tex2html_wrap_inline63236. The remaining lines require a constant amount of additional time. Altogether, the running time for the Withdraw function is tex2html_wrap_inline63238 in the worst case.


next up previous contents index

Bruno Copyright © 1997 by Bruno R. Preiss, P.Eng. All rights reserved.