Next: One-to-Many Joins
Up: Joining Tables
Previous: Additional Join Possibilities
Choosing a Join Key
The join key is the value used to link rows between tables. For example,
in Figure
, 648 is the
customer key, appearing in the customer table to uniquely identify
the row, and in the salesorder table to refer to that specific
customer row.
Some people might question whether an identification number is needed.
Should the customer name be used as a join key? Using it as the join
key is not a good idea for several reasons:
- Numbers are less likely to be entered incorrectly.
- Two customers with the same name would be impossible to distinguish
in a join.
- If the customer name changes, all references to that name would have
to change.
- Numeric joins are more efficient than joins of long character strings.
- Numbers require less storage space than character strings.
In the statename table, the two-letter state code is probably
a good join key for the following reasons:
- Two-letter codes are easy for users to remember and enter.
- State codes are always unique.
- State codes do not change.
- Joins of short two-letter codes are not significantly slower than
integer joins.
- Two-letter codes do not require significantly more storage space than
integers.
Essentially, two choices for join keys exist: identification numbers
and short character codes. If an item is referenced repeatedly, it
is best to use a short character code as a join key. You can display
this key to users and allow them to refer to customers and employees
using codes. Users prefer to identify items by short, fixed-length
character codes containing numbers and letters. For example, customers
might be identified by six-character codes (FLE001),
employees by their initials (BAW), and
parts by five-character codes (E7245). Codes are easy
to use and remember. In many cases, users can choose the codes, as
long as they are unique.
It is possible to allow users to enter short character codes and still
use identification numbers as join keys. Adding a code column
to the table accomplishes this goal. For the customer table,
a new column called code can be added to hold the customer
code. When the user enters a customer code, the query can find the
customer_id assigned to the customer code, then use that
customer_id in joins with other tables. Figure
shows a query using a customer code to find all order numbers for
that customer.
SELECT order_id
FROM customer, salesorder
WHERE customer.code = 'FLE001' AND
customer.customer_id = salesorder.customer_id;
In some cases, identification numbers work well and codes are unnecessary,
as in the following cases:
- Items with short lifespans, such as order numbers
- Items without appropriate codes, such as payroll batch numbers
- Items used internally and not referenced by users
Defining codes for such values would be useless. It is better to allow
the database to assign a unique number to each item. Chapter
discusses database support for assigning unique identifiers.
No universal rule dictates when you should choose codes or identification
numbers. U.S. states are clearly better keyed on codes, because
only 50 exist. The resulting codes are short, unique, and
well known by most users. At the other extreme, order numbers are
best used without codes because too many of them are possible and
codes would be of little use.
Next: One-to-Many Joins
Up: Joining Tables
Previous: Additional Join Possibilities
Bruce Momjian
2005-04-21