Data Structures and Algorithms with Object-Oriented Design Patterns in C#
next up previous contents index

Removing Items

Removing items from a scatter table using open addressing has to be done with some care. The naïve approach would be to locate the item to be removed and then change the state of its location to EMPTY. However, that approach does not work! Recall that the FindMatch method which is used to locate an item stops its search when it encounters an EMPTY cell. Therefore, if we change the state of a cell in the middle of a cluster to EMPTY, all subsequent searches in that cluster will stop at the empty cell. As a result, subsequent searches for an object may fail even when the target is still in the table!

One way to deal with this is to make use of the third state, DELETED. Instead of marking a location EMPTY, we mark it DELETED when an item is deleted. Recall that that the FindMatch method was written in such a way that it continues past deleted cells in its search. Also, the FindUnoccupied method was written to stop its search when it encounters either an EMPTY or a DELETED location. Consequently, the positions marked DELETED are available for reuse when insertion is done.

Program gif gives the implementation of the Withdraw. The Withdraw method takes a ComparableObject and removes that object from the scatter table. It does so by first locating the specific object instance using FindInstance and then marking the location DELETED. The implementation of FindInstance has been elided. It is simply a trivial variation of the FindMatch method.

   program13415
Program: OpenScatterTable Class Withdraw method.

The running time of the Withdraw method is determined by that of FindInstance. In the worst case FindInstance has to examine every array position. Therefore, the running time of Withdraw is tex2html_wrap_inline62097.

There is a very serious problem with the technique of marking locations as DELETED. After a large number of insertions and deletions have been done, it is very likely that there are no cells left that are marked EMPTY. This is because, nowhere in any of the methods (except Purge) is a cell ever marked EMPTY! This has the very unfortunate consequence that an unsuccessful search, i.e., a search for an object which is not in the scatter table, is tex2html_wrap_inline62357. Recall that FindMatch examines at most M array locations and only stops its search early when an EMPTY location is encountered. Since there are no more empty locations, the search must examine all M locations.

If we are using the scatter table in an application in which we know a priori that no items will be removed, or perhaps only a very small number of items will be removed, then the Withdraw method given in Program gif will suffice. However, if the application is such that a significant number of withdrawals will be made, a better implementation is required.

Ideally, when removing an item the scatter table ends up exactly as it would have appeared had that item never been inserted in the first place. Note that exactly the same constraint is met by the Withdraw method for the ChainedScatterTable class given in Program gif. It turns out that a variation of that algorithm can be used to implement the Withdraw method for the OpenScatterTable class as shown in Program gif.

   program13443
Program: OpenScatterTableV2 Withdraw method.

The algorithm begins by checking that the scatter table is not empty. Then it calls FindInstance to determine the position i of the item to be removed. If the item to be removed is not in the scatter table FindInstance returns -1 and an exception is thrown. Otherwise, FindInstance falls between 0 and M-1, which indicates that the item was found.

In the general case, the item to be deleted falls in the middle of a cluster. Deleting it would create a hole in the middle of the cluster. What we need to do is to find another item further down in the cluster which can be moved up to fill in the hole that would be created when the item at position i is deleted. The purpose of the loop on lines 12-20 is to find the position j of an item which can be moved safely into position i. Note the implementation here implicitly assumes that a linear probing sequence is used--the C method is not called explicitly. An item at position j can be moved safely to position i only if the hash value of the item at position j is not cyclically contained in the interval between i and j.

If an item is found at some position j that can be moved safely, then that item is moved to position i on line 23. The effect of moving the item at position j to position i, is to move the hole from position i to position j (line 24). Therefore, another iteration of the main loop (lines 10-25) is needed to fill in the relocated hole in the cluster.

If no item can be found to fill in the hole, then it is safe to split the cluster in two. Eventually, either because no item can be found to fill in the hole or because the hole has moved to the end of the cluster, there is nothing more to do other than to delete the hole. Thus, on line 26 the entry at position i is set to EMPTY and the associated obj is set to null. Notice that the third state DELETED is not required in this implementation of Withdraw.

If we use the Withdraw implementation of Program gif, the scatter table entries will only ever be in one of two two states--OCCUPIED or EMPTY. Consequently, we can improve the bound on the worst-case for the search from tex2html_wrap_inline62353 to tex2html_wrap_inline62371, where n is the number of items in the scatter table.

Determining the running time of Program gif is a little tricky. Assuming the item to be deleted is actually in the table, the running time to find the position of that item (line 7) is tex2html_wrap_inline61997, where tex2html_wrap_inline60465 is the number of item actually in the scatter table. In the worst case, the scatter table is comprised of a single cluster of n items, and we are deleting the first item of the cluster. In this case, the main loop on lines 10-25 makes a pass through the entire cluster, in the worst case moving the hole to the end of the cluster one position at at time. Thus, the running time of the main loop is tex2html_wrap_inline62381. The remaining lines require a constant amount of additional time. Altogether, the running time for the Withdraw method is tex2html_wrap_inline62383 in the worst case.


next up previous contents index

Bruno Copyright © 2001 by Bruno R. Preiss, P.Eng. All rights reserved.