Comparison Function - GNU libavl 2.0.2

The C language provides the void * generic pointer for dealing with data of unknown type. We will use this type to allow our tables to contain a wide range of data types. This flexibility does keep the table from working directly with its data. Instead, the table's user must provide means to operate on data items. This section describes the user-provided functions for comparing items, and the next section describes two other kinds of user-provided functions.

There is more than one kind of generic algorithm for searching. We can search by comparison of keys, by digital properties of the keys, or by computing a function of the keys. In this book, we are only interested in the first possibility, so we need a way to compare data items. This is done with a user-provided function compatible with tbl_comparison_func, declared as follows:

2. <Table function types 2> =
/* Function types. */
typedef int tbl_comparison_func (const void *tbl_a, const void *tbl_b, 
                                 void *tbl_param);

See also 4.

This code is included in 14.

A comparison function takes two pointers to data items, here called a and b, and compares their keys. It returns a negative value if a < b, zero if a == b, or a positive value if a > b. It takes a third parameter, here called param, which is user-provided.

A comparison function must work more or less like an arithmetic comparison within the domain of the data. This could be alphabetical ordering for strings, a set of nested sort orders (e.g., sort first by last name, with duplicates by first name), or any other comparison function that behaves in a “natural” way. A comparison function in the exact class of those acceptable is called a strict weak ordering, for which the exact rules are explained in Exercise 5.

Here's a function that can be used as a comparison function for the case that the void * pointers point to single ints:

3. <Comparison function for ints 3> =
/* Comparison function for pointers to ints. 
   param is not used. */
int 
compare_ints (const void *pa, const void *pb, void *param) 
{
  const int *a = pa;
  const int *b = pb;

  if (*a < *b) 
    return -1;
  else if (*a > *b) 
    return +1;
  else 
    return 0;
}

This code is included in 134.

Here's another comparison function for data items that point to ordinary C strings:

/* Comparison function for strings. 
   param is not used. */
int 
compare_strings (const void *pa, const void *pb, void *param) 
{
  return strcmp (pa, pb);
}

See also: [FSF 1999], node “Defining the Comparison Function”; [ISO 1998], section 25.3, “Sorting and related operations”; [SGI 1993], section “Strict Weak Ordering”.

Exercises:

1. In C, integers may be cast to pointers, including void *, and vice versa. Explain why it is not a good idea to use an integer cast to void * as a data item. When would such a technique would be acceptable? [answer]

2. When would the following be an acceptable alternate definition for compare_ints()?

int 
compare_ints (const void *pa, const void *pb, void *param) 
{
  return *((int *) pa) - *((int *) pb);
}

[answer]

3. Could strcmp(), suitably cast, be used in place of compare_strings()? [answer]

4. Write a comparison function for data items that, in any particular table, are character arrays of fixed length. Among different tables, the length may differ, so the third parameter to the function points to a size_t specifying the length for a given table. [answer]

*5. For a comparison function f() to be a strict weak ordering, the following must hold for all possible data items a, b, and c:

Irreflexivity: For every a, f(a, a) == 0.
Antisymmetry: If f(a, b) > 0, then f(b, a) < 0.
Transitivity: If f(a, b) > 0 and f(b, c) > 0, then f(a, c) > 0.
Transitivity of equivalence: If f(a, b) == 0 and f(b, c) == 0, then f(a, c) == 0.

Consider the following questions that explore the definition of a strict weak ordering.

Explain how compare_ints() above satisfies each point of the definition.
Can the standard C library function strcmp() be used for a strict weak ordering?
Propose an irreflexive, antisymmetric, transitive function that lacks transitivity of equivalence.

[answer]

*6. libavl uses a ternary comparison function that returns a negative value for <, zero for ==, positive for >. Other libraries use binary comparison functions that return nonzero for < or zero for >=. Consider these questions about the differences:

Write a C expression, in terms of a binary comparison function f() and two items a and b, that is nonzero if and only if a == b as defined by f(). Write a similar expression for a > b.
Write a binary comparison function “wrapper” for a libavl comparison function.
Rewrite bst_find() based on a binary comparison function. (You can use the wrapper from above to simulate a binary comparison function.)

[answer]