Chapter 4 |
1. This construct makes <bst.h 24> idempotent (see idempotent), that is, including it many times has the same effect as including it once. This is important because some C constructs, such as type definitions with typedef, are erroneous if included in a program multiple times.
Of course, <Table assertion function control directives 593> is included outside the #ifndef-protected part of <bst.h 24>. This is intentional (see Exercise 2.9-1 for details).
1. Under many circumstances we often want to know how many items are in a binary tree. In these cases it's cheaper to keep track of item counts as we go instead of counting them each time, which requires a full binary tree traversal.
It would be better to omit it if we never needed to know how many items were in the tree, or if we only needed to know very seldom.
1. The purpose for conditional definition of BST_MAX_HEIGHT is not to keep it from being redefined if the header file is included multiple times. There's a higher-level “include guard” for that (see Exercise 4-1), and, besides, identical definitions of a macro are okay in C. Instead, it is to allow the user to set the maximum height of binary trees by defining that macro before <bst.h 24> is #included. The limit can be adjusted upward for larger computers or downward for smaller ones.
The main pitfall is that a user program will use different values of BST_MAX_HEIGHT in different source files. This leads to undefined behavior. Less of a problem are definitions to invalid values, which will be caught at compile time by the compiler.
2. The functions need to adjust the pointer from the rotated subtree's parent, so they take a double-pointer struct bst_node **. An alternative would be to accept two parameters: the rotated subtree's parent node and the bst_link[] index of the subtree.
/* Rotates right at *yp. */ static void
rotate_right (struct bst_node **yp)
{ struct bst_node *y = *yp; struct bst_node *x = y->bst_link[0]; y->bst_link[0] = x->bst_link[1]; x->bst_link[1] = y; *yp = x; }
/* Rotates left at *xp. */ static void
rotate_left (struct bst_node **xp)
{ struct bst_node *x = *xp; struct bst_node *y = x->bst_link[1]; x->bst_link[1] = y->bst_link[0]; y->bst_link[0] = x; *xp = y; }
1. This is a dirty trick. The bst_root member of struct bst_table is not a struct bst_node, but we are pretending that it is by casting its address to struct bst_node *. We can get away with this only because the first member of struct bst_node * is bst_link, whose first element bst_link[0] is a struct bst_node *, the same type as bst_root. ANSI C guarantees that a pointer to a structure is a pointer to the structure's first member, so this is fine as long as we never try to access any member of *p except bst_link[0]. Trying to access other members would result in undefined behavior.
The reason that we want to do this at all is that it means that the tree's root is not a special case. Otherwise, we have to deal with the root separately from the rest of the nodes in the tree, because of its special status as the only node in the tree not pointed to by the bst_link[] member of a struct bst_node.
It is a good idea to get used to these kinds of pointer cast, because they are common in libavl.
As an alternative, we can declare an actual instance of struct bst_node, store the tree's bst_root into its bst_link[0], and copy its possibly updated value back into bst_root when done. This isn't very elegant, but it works. This technique is used much later in this book, in <TBST main copy function 279>. A different kind of alternative approach is used in Exercise 2.
2. Here, pointer-to-pointer q traverses the tree, starting with a pointer to the root, comparing each node found against item while looking for a null pointer. If an item equal to item is found, it returns a pointer to the item's data. Otherwise, q receives the address of the NULL pointer that becomes the new node, the new node is created, and a pointer to its data is returned.
620. <BST item insertion function, alternate version 620> = void **
bst_probe (struct bst_table *tree, void *item)
{ struct bst_node **q; int cmp; assert (tree != NULL && item != NULL); for (q = &tree->bst_root; *q != NULL; q = &(*q)->bst_link[cmp > 0])
{ cmp = tree->bst_compare (item, (*q)->bst_data, tree->bst_param); if (cmp == 0) return &(*q)->bst_data; } *q = tree->bst_alloc->libavl_malloc (tree->bst_alloc, sizeof **q); if (*q == NULL) return NULL; (*q)->bst_link[0] = (*q)->bst_link[1] = NULL; (*q)->bst_data = item; tree->bst_count++; return &(*q)->bst_data; }
3. The first item to be inserted have the value of the original tree's root. After that, at each step, we can insert either an item with the value of either child x of any node in the original tree corresponding to a node y already in the copy tree, as long as x's value is not already in the copy tree.
4. The function below traverses tree in “level order”. That is, it visits the root, then the root's children, then the children of the root's children, and so on, so that all the nodes at a particular level in the tree are visited in sequence.
See also: [Sedgewick 1998], Program 5.16.
621. <Level-order traversal 621> = /* Calls visit for each of the nodes in tree in level order. Returns nonzero if successful, zero if out of memory. */ static int
bst_traverse_level_order (struct bst_table *tree, bst_item_func *visit)
{ struct bst_node **queue; size_t head, tail; if (tree->bst_count == 0) return 1; queue = tree->bst_alloc->libavl_malloc (tree->bst_alloc,
sizeof *queue * tree->bst_count); if (queue == NULL) return 0; head = tail = 0; queue[head++] = tree->bst_root; while (head != tail)
{ struct bst_node *cur = queue[tail++]; visit (cur->bst_data, tree->bst_param); if (cur->bst_link[0] != NULL) queue[head++] = cur->bst_link[0]; if (cur->bst_link[1] != NULL) queue[head++] = cur->bst_link[1]; } tree->bst_alloc->libavl_free (tree->bst_alloc, queue); return 1; }
622. <Root insertion of existing node in arbitrary subtree 622> = /* Performs root insertion of n at root within tree. Subtree root must not contain a node matching n. Returns nonzero only if successful. */ static int
root_insert (struct bst_table *tree, struct bst_node **root, struct bst_node *n)
{ struct bst_node *pa[BST_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[BST_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ struct bst_node *p; /* Traverses tree looking for insertion point. */ assert (tree != NULL && n != NULL); <Step 1: Search for insertion point in arbitrary subtree 623> <Step 2: Insert n into arbitrary subtree 624> <Step 3: Move BST node to root 36> return 1; }
623. <Step 1: Search for insertion point in arbitrary subtree 623> = pa[0] = (struct bst_node *) root; da[0] = 0; k = 1; for (p = *root; p != NULL; p = p->bst_link[da[k - 1]])
{ int cmp = tree->bst_compare (n->bst_data, p->bst_data, tree->bst_param); assert (cmp != 0); if (k >= BST_MAX_HEIGHT) return 0; pa[k] = p; da[k++] = cmp > 0; }
This code is included in 622.
624. <Step 2: Insert n into arbitrary subtree 624> = pa[k - 1]->bst_link[da[k - 1]] = n;
This code is included in 622 and 625.
2. The idea is to optimize for the common case but allow for fallback to a slower algorithm that doesn't require a stack when necessary.
625. <Robust root insertion of existing node in arbitrary subtree 625> = /* Performs root insertion of n at root within tree. Subtree root must not contain a node matching n. Never fails and will not rebalance tree. */ static void
root_insert (struct bst_table *tree, struct bst_node **root, struct bst_node *n)
{ struct bst_node *pa[BST_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[BST_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ int overflow = 0; /* Set nonzero if stack overflowed. */ struct bst_node *p; /* Traverses tree looking for insertion point. */ assert (tree != NULL && n != NULL); <Step 1: Robustly search for insertion point in arbitrary subtree 626> <Step 2: Insert n into arbitrary subtree 624> <Step 3: Robustly move BST node to root 627> }
If the stack overflows while we're searching for the insertion point, we stop keeping track of any nodes but the last one and set overflow so that later we know that overflow occurred:
626. <Step 1: Robustly search for insertion point in arbitrary subtree 626> = pa[0] = (struct bst_node *) root; da[0] = 0; k = 1; for (p = *root; p != NULL; p = p->bst_link[da[k - 1]])
{ int cmp = tree->bst_compare (n->bst_data, p->bst_data, tree->bst_param); assert (cmp != 0); if (k >= BST_MAX_HEIGHT)
{ overflow = 1; k–; } pa[k] = p; da[k++] = cmp > 0; }
This code is included in 625.
Once we've inserted the node, we deal with the rotation in the same way as before if there was no overflow. If overflow occurred, we instead do the rotations one by one, with a full traversal from *root every time:
627. <Step 3: Robustly move BST node to root 627> = if (!overflow) {
<Step 3: Move BST node to root 36>
} else
{ while (*root != n)
{ struct bst_node **r; /* Link to node to rotate. */ struct bst_node *q; /* Node to rotate. */ int dir; for (r = root; ; r = &q->bst_link[dir])
{ q = *r; dir = 0 < tree->bst_compare (n->bst_data, q->bst_data,
tree->bst_param); if (q->bst_link[dir] == n) break; } if (dir == 0)
{ q->bst_link[0] = n->bst_link[1]; n->bst_link[1] = q; }
else
{ q->bst_link[1] = n->bst_link[0]; n->bst_link[0] = q; } *r = n; } }
This code is included in 625.
3. One insertion order that does not require much stack is ascending order. If we insert 1...4 at the root in ascending order, for instance, we get a BST that looks like this:
If we then insert node 5, it will immediately be inserted as the right child of 4, and then a left rotation will make it the root, and we're back where we started without ever using more than one stack entry. Other obvious pathological orders such as descending order and “zig-zag” order behave similarly.
One insertion order that does require an arbitrary amount of stack space is to first insert 1...n in ascending order, then the single item 0. Each of the first group of insertions requires only one stack entry (except the first, which does not use any), but the final insertion uses n - 1.
If we're interested in high average consumption of stack space, the pattern consisting of a series of ascending insertions (n / 2 + 1)...n followed by a second ascending series 1...(n / 2), for even n, is most effective. For instance, each insertion for insertion order 6, 7, 8, 9, 10, 1, 2, 3, 4, 5 requires 0, 1, 1, 1, 1, 5, 6, 6, 6, 6 stack entries, respectively, for a total of 33.
These are, incidentally, the best possible results in each category, as determined by exhaustive search over the 10! == 3,628,800 possible root insertion orders for trees of 10 nodes. (Thanks to Richard Heathfield for suggesting exhaustive search.)
1. Add this before the top-level else clause in <Step 2: Delete BST node 39>:
628. <Case 1.5 in BST deletion 628> = else if (p->bst_link[0] == NULL)
q->bst_link[dir] = p->bst_link[1];
2. Be sure to look at Exercise 3 before actually making this change.
629. <Case 3 in BST deletion, alternate version 629> = struct bst_node *s = r->bst_link[0]; while (s->bst_link[0] != NULL)
{ r = s; s = r->bst_link[0]; } p->bst_data = s->bst_data; r->bst_link[0] = s->bst_link[1]; p = s;
We could, indeed, make similar changes to the other cases, but for these cases the code would become more complicated, not simpler.
3. The semantics for libavl traversers only invalidate traversers with the deleted item selected, but the revised code would actually free the node of the successor to that item. Because struct bst_traverser keeps a pointer to the struct bst_node of the current item, attempts to use a traverser that had selected the successor of the deleted item would result in undefined behavior.
Some other binary tree libraries have looser semantics on their traversers, so they can afford to use this technique.
1. It would probably be faster to check before each call rather than after, because this way many calls would be avoided. However, it might be more difficult to maintain the code, because we would have to remember to check for a null pointer before every call. For instance, the call to traverse_recursive() within walk() might easily be overlooked. Which is “better” is therefore a toss-up, dependent on a program's goals and the programmer's esthetic sense.
630. <Recursive traversal of BST, using nested function 630> = void
walk (struct bst_table *tree, bst_item_func *action, void *param)
{ void
traverse_recursive (struct bst_node *node)
{ if (node != NULL)
{ traverse_recursive (node->bst_link[0]); action (node->bst_data, param); traverse_recursive (node->bst_link[1]); } } assert (tree != NULL && action != NULL); traverse_recursive (tree->bst_root); }
1a. First of all, a minimal-height binary tree of n nodes has a height (see height) of about log2(n), that is, starting from the root and moving only downward, you can visit at most n nodes (including the root) without running out of nodes. Examination of the code should reveal to you that only moving down to the left pushes nodes on the stack and only moving upward pops nodes off. What's more, the first thing the code does is move as far down to the left as it can. So, the maximum height of the stack in a minimum-height binary tree of n nodes is the binary tree's height, or, again, about log2(n).
1b. If a binary tree has only left children, as does the BST on the left below, the stack will grow as tall as the tree, to a height of n. Conversely, if a binary tree has only right children, as does the BST on the right below, no nodes will be pushed onto the stack at all.
1c. It's only acceptable if it's known that the stack will not exceed the fixed maximum height (or if the program aborting with an error is itself acceptable). Otherwise, you should use a recursive method (but see part (e) below), or a dynamically extended stack, or a balanced binary tree library.
1d. Keep in mind this is not the only way or necessarily the best way to handle stack overflow. Our final code for tree traversal will rebalance the tree when it grows too tall.
631. <Iterative traversal of BST, with dynamically allocated stack 631> = static void
traverse_iterative (struct bst_node *node, bst_item_func *action, void *param)
{ struct bst_node **stack = NULL; size_t height = 0; size_t max_height = 0; for (;;)
{ while (node != NULL)
{ if (height >= max_height)
{ max_height = max_height * 2 + 8; stack = realloc (stack, sizeof *stack * max_height); if (stack == NULL)
{ fprintf (stderr, "out of memory\n"); exit (EXIT_FAILURE); } } stack[height++] = node; node = node->bst_link[0]; } if (height == 0) break; node = stack[–height]; action (node->bst_data, param); node = node->bst_link[1]; } free (stack); }
1e. Yes, traverse_recursive() can run out of memory, because its arguments must be stored somewhere by the compiler. Given typical compilers, it will consume more memory per call than traverse_iterative() will per item on the stack, because each call includes two arguments not pushed on traverse_iterative()'s stack, plus any needed compiler-specific bookkeeping information.
1. After calling bst_balance(), the structure of the binary tree may have changed completely, so we need to “find our place” again by setting up the traverser structure as if the traversal had been done on the rebalanced tree all along. Specifically, members node, stack[], and height of struct traverser need to be updated.
It is easy to set up struct traverser in this way, given the previous node in inorder traversal, which we'll call prev. Simply search the tree from the new root to find this node. Along the way, because the stack is used to record nodes whose left subtree we are examining, push nodes onto the stack as we move left down the tree. Member node receives prev->bst_link[1], just as it would have if no overflow had occurred.
A small problem with this approach is that it requires knowing the previous node in inorder, which is neither explicitly noted in struct traverser nor easy to find out. But it is easy to find out the next node: it is the smallest-valued node in the binary tree rooted at the node we were considering when the stack overflowed. (If you need convincing, refer to the code for next_item() above: the while loop descends to the left, pushing nodes as it goes, until it hits a NULL pointer, then the node pushed last is popped and returned.) So we can return this as the next node in inorder while setting up the traverser to return the nodes after it.
Here's the code:
632. <Handle stack overflow during BST traversal 632> = struct bst_node *prev, *iter; prev = node; while (prev->bst_link[0] != NULL) prev = prev->bst_link[0]; bst_balance (trav->table); trav->height = 0; for (iter = trav->table->bst_root; iter != prev; ) if (trav->table->bst_compare (prev->bst_data, iter->bst_data,
trav->table->bst_param) < 0)
{ trav->stack[trav->height++] = iter; iter = iter->bst_link[0]; } else
iter = iter->bst_link[1]; trav->node = iter->bst_link[1]; return prev->bst_data;
Without this code, it is not necessary to have member table in struct traverser.
2. It is possible to write prev_item() given our current next_item(), but the result is not very efficient, for two reasons, both related to the way that struct traverser is used. First, the structure doesn't contain a pointer to the current item. Second, its stack doesn't contain pointers to trees that must be descended to the left to find a predecessor node, only those that must be descended to the right to find a successor node.
The next section will develop an alternate, more general method for traversal that avoids these problems.
1. The bst_probe() function can't disturb any traversals. A change in the tree is only problematic for a traverser if it deletes the currently selected node (which is explicitly undefined: see Traversers) or if it shuffles around any of the nodes that are on the traverser's stack. An insertion into a tree only creates new leaves, so it can't cause either of those problems, and there's no need to increment the generation number.
The same logic applies to bst_t_insert(), presented later.
On the other hand, an insertion into the AVL and red-black trees discussed in the next two chapters can cause restructuring of the tree and thus potentially disturb ongoing traversals. For this reason, the insertion functions for AVL and red-black trees will increment the tree's generation number.
2. First, trav_refresh() is only called from bst_t_next() and bst_t_prev(), and these functions are mirrors of each other, so we need only show it for one of them.
Second, all of the traverser functions check the stack height, so these will not cause an item to be initialized at too high a height, nor will bst_t_next() or bst_t_prev() increase the stack height above its limit.
Since the traverser functions won't force a too-tall stack directly, this leaves the other functions. Only functions that modify the tree could cause problems, by pushing an item farther down in the tree.
There are only four functions that modify a tree. The insertion functions bst_probe() and bst_t_insert() can't cause problems, because they add leaves but never move around nodes. The deletion function bst_delete() does move around nodes in case 3, but it always moves them higher in the tree, never lower. Finally, bst_balance() always ensures that all nodes in the resultant tree are within the tree's height limit.
3. This won't work because the stack may contain pointers to nodes that have been deleted and whose memory have been freed. In ANSI C89 and C99, any use of a pointer to an object after the end of its lifetime results in undefined behavior, even seemingly innocuous uses such as pointer comparisons. What's worse, the memory for the node may already have been recycled for use for another, different node elsewhere in the tree.
This approach does work if there are never any deletions in the tree, or if we use some kind of generation number for each node that we store along with each stack entry. The latter would be overkill unless comparisons are very expensive and the traversals in changing trees are common. Another possibility would be to somehow only select this behavior if there have been no deletions in the binary tree since the traverser was last used. This could be done, for instance, with a second generation number in the binary tree incremented only on deletions, with a corresponding number kept in the traverser.
The following reimplements trav_refresh() to include this optimization. As noted, it will not work if there are any deletions in the tree. It does work for traversers that must be refreshed due to, e.g., rebalancing.
633. <BST traverser refresher, with caching 633> = /* Refreshes the stack of parent pointers in trav and updates its generation number. Will *not* work if any deletions have occurred in the tree. */ static void
trav_refresh (struct bst_traverser *trav)
{ assert (trav != NULL); trav->bst_generation = trav->bst_table->bst_generation; if (trav->bst_node != NULL)
{ bst_comparison_func *cmp = trav->bst_table->bst_compare; void *param = trav->bst_table->bst_param; struct bst_node *node = trav->bst_node; struct bst_node *i = trav->bst_table->bst_root; size_t height = 0; if (trav->bst_height > 0 && i == trav->bst_stack[0]) for (; height < trav->bst_height; height++)
{ struct bst_node *next = trav->bst_stack[height + 1]; if (i->bst_link[0] != next && i->bst_link[1] != next) break; i = next; } while (i != node)
{ assert (height < BST_MAX_HEIGHT); assert (i != NULL); trav->bst_stack[height++] = i; i = i->bst_link[cmp (node->bst_data, i->bst_data, param) > 0]; } trav->bst_height = height; } }
1. It only calls itself if it runs out of stack space. Its call to bst_balance() right before the recursive call ensures that the tree is short enough to fit within the stack, so the recursive call cannot overflow.
1. The assignment statements are harmless, but memcpy() of overlapping regions produces undefined behavior.
1a. Notice the use of & instead of && below. This ensures that both link fields get initialized, so that deallocation can be done in a simple way. If && were used instead then we wouldn't have any way to tell whether (*y)->bst_link[1] had been initialized.
634. <Robust recursive copy of BST, take 1 634> = /* Stores in *y a new copy of tree rooted at x. Returns nonzero if successful, or zero if memory was exhausted.*/ static int
bst_robust_copy_recursive_1 (struct bst_node *x, struct bst_node **y)
{ if (x != NULL)
{ *y = malloc (sizeof **y); if (*y == NULL) return 0; (*y)->bst_data = x->bst_data; if (!(bst_robust_copy_recursive_1 (x->bst_link[0], &(*y)->bst_link[0]) & bst_robust_copy_recursive_1 (x->bst_link[1],
&(*y)->bst_link[1])))
{ bst_deallocate_recursive (*y); *y = NULL; return 0; } } else
*y = NULL; return 1; }
Here's a needed auxiliary function:
635. <Recursive deallocation function 635> = static void
bst_deallocate_recursive (struct bst_node *node)
{ if (node == NULL) return; bst_deallocate_recursive (node->bst_link[0]); bst_deallocate_recursive (node->bst_link[1]); free (node); }
636. <Robust recursive copy of BST, take 2 636> = static struct bst_node error_node; /* Makes and returns a new copy of tree rooted at x. If an allocation error occurs, returns &error_node. */ static struct bst_node *
bst_robust_copy_recursive_2 (struct bst_node *x)
{ struct bst_node *y; if (x == NULL) return NULL; y = malloc (sizeof *y); if (y == NULL) return &error_node; y->bst_data = x->bst_data; y->bst_link[0] = bst_robust_copy_recursive_2 (x->bst_link[0]); y->bst_link[1] = bst_robust_copy_recursive_2 (x->bst_link[1]); if (y->bst_link[0] == &error_node || y->bst_link[1] == &error_node)
{ bst_deallocate_recursive (y); return &error_node; } return y; }
2. Here's one way to do it, which is simple but perhaps not the fastest possible.
637. <Robust recursive copy of BST, take 3 637> = /* Copies tree rooted at x to y, which latter is allocated but not
yet initialized. Returns one if successful, zero if memory was exhausted. In the latter case y is not freed but any partially allocated subtrees are. */ static int
bst_robust_copy_recursive_3 (struct bst_node *x, struct bst_node *y)
{ y->bst_data = x->bst_data; if (x->bst_link[0] != NULL)
{ y->bst_link[0] = malloc (sizeof *y->bst_link[0]); if (y->bst_link[0] == NULL) return 0; if (!bst_robust_copy_recursive_3 (x->bst_link[0], y->bst_link[0]))
{ free (y->bst_link[0]); return 0; } } else
y->bst_link[0] = NULL; if (x->bst_link[1] != NULL)
{ y->bst_link[1] = malloc (sizeof *y->bst_link[1]); if (y->bst_link[1] == NULL) return 0; if (!bst_robust_copy_recursive_3 (x->bst_link[1], y->bst_link[1]))
{ bst_deallocate_recursive (y->bst_link[0]); free (y->bst_link[1]); return 0; } } else
y->bst_link[1] = NULL; return 1; }
638. <Intermediate step between bst_copy_recursive_2() and bst_copy_iterative() 638> = /* Copies org to a newly created tree, which is returned. */ struct bst_table *
bst_copy_iterative (const struct bst_table *org)
{ struct bst_node *stack[2 * (BST_MAX_HEIGHT + 1)]; int height = 0; struct bst_table *new; const struct bst_node *x; struct bst_node *y; new = bst_create (org->bst_compare, org->bst_param, org->bst_alloc); new->bst_count = org->bst_count; if (new->bst_count == 0) return new; x = (const struct bst_node *) &org->bst_root; y = (struct bst_node *) &new->bst_root; for (;;)
{ while (x->bst_link[0] != NULL)
{ y->bst_link[0] =
org->bst_alloc->libavl_malloc (org->bst_alloc, sizeof *y->bst_link[0]); stack[height++] = (struct bst_node *) x; stack[height++] = y; x = x->bst_link[0]; y = y->bst_link[0]; } y->bst_link[0] = NULL; for (;;)
{ y->bst_data = x->bst_data; if (x->bst_link[1] != NULL)
{ y->bst_link[1] =
org->bst_alloc->libavl_malloc (org->bst_alloc, sizeof *y->bst_link[1]); x = x->bst_link[1]; y = y->bst_link[1]; break; } else
y->bst_link[1] = NULL; if (height <= 2) return new; y = stack[–height]; x = stack[–height]; } } }
1. bst_copy() can set bst_data to NULL when memory allocation fails.
1. Factoring out recursion is troublesome in this case. Writing the loop with an explicit stack exposes more explicitly the issue of stack overflow. Failure on stack overflow is not acceptable, because it would leave both trees in disarray, so we handle it by dropping back to a slower algorithm that does not require a stack.
This code also makes use of root_insert() from <Robust root insertion of existing node in arbitrary subtree 625>.
639. <BST join function, iterative version 639> = /* Adds to tree all the nodes in the tree rooted at p. */ static void
fallback_join (struct bst_table *tree, struct bst_node *p)
{ struct bst_node *q; for (; p != NULL; p = q) if (p->bst_link[0] == NULL)
{ q = p->bst_link[1]; p->bst_link[0] = p->bst_link[1] = NULL; root_insert (tree, &tree->bst_root, p); } else
{ q = p->bst_link[0]; p->bst_link[0] = q->bst_link[1]; q->bst_link[1] = p; } } /* Joins a and b, which must be disjoint and have compatible
comparison functions. b is destroyed in the process. */ void
bst_join (struct bst_table *ta, struct bst_table *tb)
{ size_t count = ta->bst_count + tb->bst_count; if (ta->bst_root == NULL) ta->bst_root = tb->bst_root; else if (tb->bst_root != NULL)
{ struct bst_node **pa[BST_MAX_HEIGHT]; struct bst_node *qa[BST_MAX_HEIGHT]; int k = 0; pa[k] = &ta->bst_root; qa[k++] = tb->bst_root; while (k > 0)
{ struct bst_node **a = pa[–k]; struct bst_node *b = qa[k]; for (;;)
{ struct bst_node *b0 = b->bst_link[0]; struct bst_node *b1 = b->bst_link[1]; b->bst_link[0] = b->bst_link[1] = NULL; root_insert (ta, a, b); if (b1 != NULL)
{ if (k < BST_MAX_HEIGHT)
{ pa[k] = &(*a)->bst_link[1]; qa[k] = b1; if (*pa[k] != NULL) k++; else
*pa[k] = qa[k]; }
else
{ int j; fallback_join (ta, b0); fallback_join (ta, b1); for (j = 0; j < k; j++) fallback_join (ta, qa[j]); ta->bst_count = count; free (tb); bst_balance (ta); return; } } a = &(*a)->bst_link[0]; b = b0; if (*a == NULL)
{ *a = b; break; }
else if (b == NULL) break; } } } ta->bst_count = count; free (tb); }
1. Functions not used at all are bst_insert(), bst_replace(), bst_t_replace(), bst_malloc(), and bst_free().
Functions used explicitly within test() or functions that it calls are bst_create(), bst_find(), bst_probe(), bst_delete(), bst_t_init(), bst_t_first(), bst_t_last(), bst_t_insert(), bst_t_find(), bst_t_copy(), bst_t_next(), bst_t_prev(), bst_t_cur(), bst_copy(), and bst_destroy().
The trav_refresh() function is called indirectly by modifying the tree during traversal.
The copy_error_recovery() function is called if a memory allocation error occurs during bst_copy(). The bst_balance() function, and therefore also tree_to_vine(), vine_to_tree(), and compress(), are called if a stack overflow occurs. It is possible to force both these behaviors with command-line options to the test program.
2. Some kinds of errors mean that we can keep going and test other parts of the code. Other kinds of errors mean that something is deeply wrong, and returning without cleanup is the safest action short of terminating the program entirely. The third category is memory allocation errors. In our test program these are always caused intentionally in order to test out the BST functions' error recovery abilities, so a memory allocation error is not really an error at all, and we clean up and return successfully. (A real memory allocation error will cause the program to abort in the memory allocator. See the definition of mt_allocate() within <Memory tracker 126>.)
1. The definition of size_t differs from one compiler to the next. All we know about it for sure is that it's an unsigned type appropriate for representing the size of an object. So we must convert it to some known type in order to pass it to printf(), because printf(), having a variable number of arguments, does not know what type to convert it into.
Incidentally, C99 solves this problem by providing a z modifier for printf() conversions, so that we could use "%zu" to print out size_t values without the need for a cast.
See also: [ISO 1999], section 7.19.6.1.
640. <Generate random permutation of integers 640> = /* Fills the n elements of array[] with a random permutation of the integers between 0 and n - 1. */ static void
permuted_integers (int array[], size_t n)
{ size_t i; for (i = 0; i < n; i++) array[i] = i; for (i = 0; i < n; i++)
{ size_t j = i + (unsigned) rand () / (RAND_MAX / (n - i) + 1); int t = array[j]; array[j] = array[i]; array[i] = t; } }
This code is included in 642.
2. All it takes is a preorder traversal. If the code below is confusing, try looking back at <Initialize smaller and larger within binary search tree 616>.
641. <Generate permutation for balanced tree 641> = /* Generates a list of integers that produce a balanced tree when inserted in order into a binary tree in the usual way. min and max inclusively bound the values to be inserted. Output is deposited starting at *array. */ static void
gen_balanced_tree (int min, int max, int **array)
{ int i; if (min > max) return; i = (min + max + 1) / 2; *(*array)++ = i; gen_balanced_tree (min, i - 1, array); gen_balanced_tree (i + 1, max, array); }
This code is included in 642.
642. <Insertion and deletion order generation 642> = <Generate random permutation of integers 640> <Generate permutation for balanced tree 641> /* Generates a permutation of the integers 0 to n - 1 into insert[] according to insert_order. */ static void
gen_insertions (size_t n, enum insert_order insert_order, int insert[])
{ size_t i; switch (insert_order)
{ case INS_RANDOM: permuted_integers (insert, n); break; case INS_ASCENDING: for (i = 0; i < n; i++) insert[i] = i; break; case INS_DESCENDING: for (i = 0; i < n; i++) insert[i] = n - i - 1; break; case INS_BALANCED: gen_balanced_tree (0, n - 1, &insert); break; case INS_ZIGZAG: for (i = 0; i < n; i++) if (i % 2 == 0)
insert[i] = i / 2; else
insert[i] = n - i / 2 - 1; break; case INS_ASCENDING_SHIFTED: for (i = 0; i < n; i++)
{ insert[i] = i + n / 2; if ((size_t) insert[i] >= n) insert[i] -= n; } break; case INS_CUSTOM: for (i = 0; i < n; i++) if (scanf ("%d", &insert[i]) == 0) fail ("error reading insertion order from stdin"); break; default: assert (0); } } /* Generates a permutation of the integers 0 to n - 1 into delete[] according to delete_order and insert[]. */ static void
gen_deletions (size_t n, enum delete_order delete_order, const int *insert, int *delete)
{ size_t i; switch (delete_order)
{ case DEL_RANDOM: permuted_integers (delete, n); break; case DEL_REVERSE: for (i = 0; i < n; i++) delete[i] = insert[n - i - 1]; break; case DEL_SAME: for (i = 0; i < n; i++) delete[i] = insert[i]; break; case DEL_CUSTOM: for (i = 0; i < n; i++) if (scanf ("%d", &delete[i]) == 0) fail ("error reading deletion order from stdin"); break; default: assert (0); } }
This code is included in 97.
4. The function below is carefully designed. It uses time() to obtain the current time. The alternative clock() is a poor choice because it measures CPU time used, which is often more or less constant among runs. The actual value of a time_t is not portable, so it computes a “hash” of the bytes in it using a multiply-and-add technique. The factor used for multiplication normally comes out as 257, a prime and therefore a good candidate.
See also: [Knuth 1998a], section 3.2.1; [Aho 1986], section 7.6.
643. <Random number seeding 643> = /* Choose and return an initial random seed based on the current time. Based on code by Lawrence Kirby <[email protected]>. */ unsigned
time_seed (void)
{ time_t timeval; /* Current time. */ unsigned char *ptr; /* Type punned pointed into timeval. */ unsigned seed; /* Generated seed. */ size_t i; timeval = time (NULL); ptr = (unsigned char *) &timeval; seed = 0; for (i = 0; i < sizeof timeval; i++) seed = seed * (UCHAR_MAX + 2u) + ptr[i]; return seed; }
This code is included in 97.
644. <Overflow testers 124> += static int
test_bst_t_last (struct bst_table *tree, int n)
{ struct bst_traverser trav; int *last; last = bst_t_last (&trav, tree); if (last == NULL || *last != n - 1)
{ printf (" Last item test failed: expected %d, got %d\n", n - 1, last != NULL ? *last : -1); return 0; } return 1; } static int
test_bst_t_find (struct bst_table *tree, int n)
{ int i; for (i = 0; i < n; i++)
{ struct bst_traverser trav; int *iter; iter = bst_t_find (&trav, tree, &i); if (iter == NULL || *iter != i)
{ printf (" Find item test failed: looked for %d, got %d\n", i, iter != NULL ? *iter : -1); return 0; } } return 1; } static int
test_bst_t_insert (struct bst_table *tree, int n)
{ int i; for (i = 0; i < n; i++)
{ struct bst_traverser trav; int *iter; iter = bst_t_insert (&trav, tree, &i); if (iter == NULL || iter == &i || *iter != i)
{ printf (" Insert item test failed: inserted dup %d, got %d\n", i, iter != NULL ? *iter : -1); return 0; } } return 1; } static int
test_bst_t_next (struct bst_table *tree, int n)
{ struct bst_traverser trav; int i; bst_t_init (&trav, tree); for (i = 0; i < n; i++)
{ int *iter = bst_t_next (&trav); if (iter == NULL || *iter != i)
{ printf (" Next item test failed: expected %d, got %d\n", i, iter != NULL ? *iter : -1); return 0; } } return 1; } static int
test_bst_t_prev (struct bst_table *tree, int n)
{ struct bst_traverser trav; int i; bst_t_init (&trav, tree); for (i = n - 1; i >= 0; i–)
{ int *iter = bst_t_prev (&trav); if (iter == NULL || *iter != i)
{ printf (" Previous item test failed: expected %d, got %d\n", i, iter != NULL ? *iter : -1); return 0; } } return 1; } static int
test_bst_copy (struct bst_table *tree, int n)
{ struct bst_table *copy = bst_copy (tree, NULL, NULL, NULL); int okay = compare_trees (tree->bst_root, copy->bst_root); bst_destroy (copy, NULL); return okay; }
1. Attempting to apply an allocation policy to allocations of zero-byte blocks is silly. How could a failure be indicated, given that one of the successful results for an allocation of 0 bytes is NULL? At any rate, libavl never calls bst_allocate() with a size argument of 0.
See also: [ISO 1990], section 7.10.3.
1. We'll use bsts_, short for “binary search tree with sentinel”, as the prefix for these functions. First, we need node and tree structures:
645. <BSTS structures 645> = /* Node for binary search tree with sentinel. */ struct bsts_node
{ struct bsts_node *link[2]; int data; }; /* Binary search tree with sentinel. */ struct bsts_tree
{ struct bsts_node *root; struct bsts_node sentinel; struct libavl_allocator *alloc; };
This code is included in 649.
Searching is simple:
646. <BSTS functions 646> = /* Returns nonzero only if item is in tree. */ int
bsts_find (struct bsts_tree *tree, int item)
{ const struct bsts_node *node; tree->sentinel.data = item; node = tree->root; while (item != node->data) if (item < node->data)
node = node->link[0]; else
node = node->link[1]; return node != &tree->sentinel; }
See also 647.
Insertion is just a little more complex, because we have to keep track of the link that we just came from (alternately, we could divide the function into multiple cases):
647. <BSTS functions 646> += /* Inserts item into tree, if it is not already present. */ void
bsts_insert (struct bsts_tree *tree, int item)
{ struct bsts_node **q = &tree->root; struct bsts_node *p = tree->root; tree->sentinel.data = item; while (item != p->data)
{ int dir = item > p->data; q = &p->link[dir]; p = p->link[dir]; } if (p == &tree->sentinel)
{ *q = tree->alloc->libavl_malloc (tree->alloc, sizeof **q); if (*q == NULL)
{ fprintf (stderr, "out of memory\n"); exit (EXIT_FAILURE); } (*q)->link[0] = (*q)->link[1] = &tree->sentinel; (*q)->data = item; } }
Our test function will just insert a collection of integers, then make sure that all of them are in the resulting tree. This is not as thorough as it could be, and it doesn't bother to free what it allocates, but it is good enough for now:
648. <BSTS test 648> = /* Tests BSTS functions. insert and delete must contain some permutation of values 0...n - 1. */ int
test_correctness (struct libavl_allocator *alloc, int *insert, int *delete, int n, int verbosity)
{ struct bsts_tree tree; int okay = 1; int i; tree.root = &tree.sentinel; tree.alloc = alloc; for (i = 0; i < n; i++) bsts_insert (&tree, insert[i]); for (i = 0; i < n; i++) if (!bsts_find (&tree, i))
{ printf ("%d should be in tree, but isn't\n", i); okay = 0; } return okay; } /* Not supported. */ int
test_overflow (struct libavl_allocator *alloc, int order[], int n,
int verbosity)
{ return 0; }
This code is included in 649.
Function test() doesn't free allocated nodes, resulting in a memory leak. You should fix this if you are concerned about it.
Here's the whole program:
649. <bsts.c 649> = <License 1> #include <assert.h> #include <stdio.h> #include <stdlib.h> #include "test.h" <BSTS structures 645> <Memory allocator; tbl => bsts 5> <Default memory allocator header; tbl => bsts 7> <Default memory allocation functions; tbl => bsts 6> <BSTS functions 646> <BSTS test 648>
See also:
[Bentley 2000], exercise 7 in chapter 13.