7. Input/Output and Command-line Processing

Contents:
I/O Redirectors
String I/O
Command-line Processing

The past few chapters have gone into detail about various shell programming techniques, mostly focused on the flow of data and control through shell programs. In this chapter, we'll switch the focus to two related topics. The first is the shell's mechanisms for doing file-oriented input and output. We'll present information that expands on what you already know about the shell's basic I/O redirectors.

Second, we'll "zoom in" and talk about I/O at the line and word level. This is a fundamentally different topic, since it involves moving information between the domains of files/terminals and shell variables. print and command substitution are two ways of doing this that we've seen so far.

Our discussion of line and word I/O will lead into a more detailed explanation of how the shell processes command lines. This information is necessary so that you can understand exactly how the shell deals with quotation, and so that you can appreciate the power of an advanced command called eval, which we will cover at the end of the chapter.

7.1 I/O Redirectors

In Chapter 1, Korn Shell Basics, you learned about the shell's basic I/O redirectors, >, <, and |. Although these are enough to get you through 95% of your UNIX life, you should know that the Korn shell supports a total of 16 I/O redirectors. Table 7.1 lists them, including the three we've already seen. Although some of the rest are useful, others are mainly for systems programmers. We will wait until the next chapter to discuss the last three, which, along with >|, are not present in most Bourne shell versions.

Table 7.1: I/O Redirectors
Redirector	Function
> file	Direct standard output to file
< file	Take standard input from file
cmd1 | cmd2	Pipe; take standard output of cmd1 as standard input to cmd2
>> file	Direct standard output to file; append to file if it already exists
>| file	Force standard output to file even if noclobber set
<> file	Use file as both standard input and standard output
<< label	Here-document; see text
n> file	Direct file descriptor n to file
n< file	Set file as file descriptor n
>&n	Duplicate standard output to file descriptor n
<&n	Duplicate standard input from file descriptor n
<&-	Close the standard input
>&-	Close the standard output
|&	Background process with I/O from parent shell
>&p	Direct background process' standard output to the parent shell's standard output
<&p	Direct parent shell's standard input to background process' standard input

Notice that some of the redirectors in Table 7.1 contain a digit n, and that their descriptions contain the term file descriptor; we'll cover that in a little while.

The first two new redirectors, >> and >|, are simple variations on the standard output redirector >. The >> appends to the output file (instead of overwriting it) if it already exists; otherwise it acts exactly like >. A common use of >> is for adding a line to an initialization file (such as .profile or .mailrc) when you don't want to bother with a text editor. For example:

cat >> .mailrc
alias fred [email protected]
^D

As we saw in Chapter 1, cat without an argument uses standard input as its input. This allows you to type the input and end it with [CTRL-D] on its own line. The alias line will be appended to the file .mailrc if it already exists; if it doesn't, the file is created with that one line.
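The difference between > and >> can be seen in a minimal sketch like the following. It works on a scratch file in a temporary directory rather than on a real .mailrc; the file name notes.txt is made up for illustration.

```shell
# Scratch directory so the demo never touches real startup files.
dir=$(mktemp -d)
f="$dir/notes.txt"            # hypothetical file name

echo "first line"  >> "$f"    # file does not exist yet: >> creates it
echo "second line" >> "$f"    # file exists: >> appends to it
appended=$(cat "$f")

echo "third line"  >  "$f"    # > overwrites whatever was there
overwritten=$(cat "$f")

echo "$appended"
echo "---"
echo "$overwritten"
rm -rf "$dir"
```

After the two >> commands the file holds both lines; after the > command it holds only the third.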

Recall from Chapter 3, Customizing Your Environment, that you can prevent the shell from overwriting a file with > file by typing set -o noclobber. >| overrides noclobber - it's the "Do it anyway, dammit!" redirector.
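Here is a small sketch of how noclobber and >| interact. The file name data.txt and the variable names are assumptions chosen for the example.

```shell
dir=$(mktemp -d)
f="$dir/data.txt"             # hypothetical file name

echo "original" > "$f"
set -o noclobber

# With noclobber set, plain > refuses to overwrite an existing file.
# The braces let us discard the shell's own error message.
if { echo "replaced" > "$f"; } 2>/dev/null; then
    plain_result="overwrote"
else
    plain_result="refused"
fi
after_plain=$(cat "$f")

# >| overrides noclobber for this one redirection.
echo "forced" >| "$f"
after_forced=$(cat "$f")

set +o noclobber              # restore the default behavior
echo "$plain_result: $after_plain, then $after_forced"
rm -rf "$dir"
```

The plain > leaves the file untouched, while >| replaces its contents even though noclobber is still in effect.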

The redirector <> is mainly meant for use with device files (in the /dev directory), i.e., files that correspond to hardware devices such as terminals and communication lines. Low-level systems programmers can use it to test device drivers; otherwise, it's not very useful. But if you use a windowing system like X, you can try the following to see how it works:

1. Create two terminal windows (e.g., xterms).
2. In one of them, type who am i to find out the name of the window's "pseudo-device." This will be the second word in the output.
3. In the other, type cat <> /dev/pty, where pty is the name you found in the last step.
4. Back in the first window, type some characters. You will see them appear in alternate windows.
5. Type [CTRL-C] in both windows to end the process.

7.1.1 Here-documents

The << label redirector essentially forces the input to a command to be the shell's standard input, which is read until there is a line that contains only label. The input in between is called a here-document. Here-documents aren't very interesting when used from the command prompt. In fact, using one there is the same as the normal use of standard input, except that the label (rather than [CTRL-D]) ends the input. We could have used a here-document in the previous example of >>, like this (EOF, for "end of file," is an often-used label):

cat >> .mailrc << EOF
alias fred [email protected]
EOF

Here-documents are meant to be used from within shell scripts; they let you specify "batch" input to programs. A common use of here-documents is with simple text editors like ed(1). Here is a programming task that uses a here-document in this way:

Task 7.1

The s file command in mail(1) saves the current message in file. If the message came over a network (such as the Internet), then it has several header lines prepended that give information about network routing. Write a shell script that deletes the header lines from the file.

We can use ed to delete the header lines. To do this, we need to know something about the syntax of mail messages; specifically, that there is always a blank line between the header lines and the message text. The ed command 1,/^[[:blank:]]*$/d does the trick: it means, "Delete from line 1 through the first blank line." The bracket expression matches blanks and TABs, so a line that contains only whitespace also counts as blank. We also need the ed commands w (write the changed file) and q (quit). Here is the code that solves the task:

ed "$1" << EOF
1,/^[[:blank:]]*$/d
w
q
EOF

The shell does parameter (variable) substitution and command substitution on text in a here-document, meaning that you can use shell variables and commands to customize the text. Here is a simple task for system administrators that shows how this works:

Task 7.2

Write a script that sends a mail message to a set of users saying that a new version of a certain program has been installed in a certain directory.

You can get a list of all users on the system in various ways; perhaps the easiest is to use cut to extract the first field of /etc/passwd, the file that contains all user account information. Fields in this file are separated by colons (:). [1]

[1] There are a few possible problems with this; for example, /etc/passwd usually contains information on "accounts" that aren't associated with people, like uucp, lp, and daemon. We'll ignore such problems for the purpose of this example.

Given such a list of users, the following code does the trick:

pgmname=$1
for user in $(cut -f1 -d: /etc/passwd); do
    mail $user << EOF
Dear $user,

A new version of $pgmname has been installed in $(whence $pgmname).

Regards,
Your friendly neighborhood sysadmin.
EOF
done

The shell will substitute the appropriate values for the name of the program and its directory.

The redirector << has two variations. First, you can prevent the shell from doing parameter and command substitution by surrounding the label in single or double quotes. In the above example, if you used the line mail $user << 'EOF', then $user, $pgmname, and the whence command substitution would all be passed to mail exactly as written.
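The effect of quoting the label can be sketched as follows. The program name myprog is hypothetical, and $(echo /usr/local/bin) is a portable stand-in for the whence call in the script above.

```shell
pgmname=myprog                # hypothetical program name
dir=$(mktemp -d)

# Unquoted label: the shell expands $pgmname and $(...) in the text.
cat > "$dir/expanded.txt" << EOF
A new version of $pgmname has been installed in $(echo /usr/local/bin).
EOF

# Quoted label: the text is passed through exactly as written.
cat > "$dir/literal.txt" << 'EOF'
A new version of $pgmname has been installed in $(echo /usr/local/bin).
EOF

expanded=$(cat "$dir/expanded.txt")
literal=$(cat "$dir/literal.txt")
echo "$expanded"
echo "$literal"
rm -rf "$dir"
```

The first line printed has the values filled in; the second still contains the dollar signs and parentheses.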

The second variation is <<-, which deletes leading TABs (but not blanks) from the here-document and the label line. This allows you to indent the here-document's text, making the shell script more readable:

pgmname=$1
for user in $(cut -f1 -d: /etc/passwd); do
    mail $user <<- EOF
	Dear $user,

	A new version of $pgmname has been installed in $(whence $pgmname).

	Regards,

	Your friendly neighborhood sysadmin.
EOF
done

Of course, you need to choose your label so that it doesn't appear as an actual input line.
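Because TABs and blanks look alike on the page, it can help to see the behavior of <<- with the whitespace spelled out. The sketch below is only an illustration: it writes a tiny script with printf, where each \t is a TAB, and then runs it. The script name demo.sh is an assumption.

```shell
dir=$(mktemp -d)
script="$dir/demo.sh"         # hypothetical script name

# Build the script with printf so that each \t is an unmistakable TAB.
# Line 2 starts with a TAB, line 3 with four blanks, and the label
# line itself is also indented with a TAB.
printf 'cat <<- EOF\n\tindented with a TAB\n    indented with blanks\n\tEOF\n' > "$script"

output=$(sh "$script")
echo "$output"
rm -rf "$dir"
```

The leading TAB disappears from the first text line and from the label line, while the four blanks on the second text line survive.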

7.1.2 File Descriptors

The next few redirectors in Table 7.1 depend on the notion of a file descriptor. Like the device files used with <>, this is a low-level UNIX I/O concept that is of interest only to systems programmers - and then only occasionally. File descriptors are historical relics that really should be banished from the realm of shell use. [2] You can get by with a few basic facts about them; for the whole bloody story, look at the entries for read(), write(), fcntl(), and others in Section 2 of the UNIX manual.

[2] The C shell's set of redirectors contains no mention of file descriptors whatsoever.

File descriptors are integers starting at 0 that index an array of file information within a process. When a process starts, it usually has three file descriptors open. These correspond to the three standards: standard input (file descriptor 0), standard output (1), and standard error (2). If a process opens UNIX files for input or output, they are assigned to the next available file descriptors, starting with 3.

By far the most common use of file descriptors with the Korn shell is in saving standard error in a file. For example, if you want to save the error messages from a long job in a file so that they don't scroll off the screen, append 2> file to your command. If you also want to save standard output, append > file1 2> file2.
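As a minimal sketch, the function longjob below stands in for a real long-running command (it is an assumption made for the demo, as are the file names); it writes one line to standard output and one to standard error.

```shell
# Stand-in for a long job: one line to stdout, one line to stderr.
longjob() {
    echo "result"
    echo "warning" >&2
}

dir=$(mktemp -d)

# Standard output goes to file1, standard error to file2.
longjob > "$dir/file1" 2> "$dir/file2"

out=$(cat "$dir/file1")
err=$(cat "$dir/file2")
echo "file1 contains: $out"
echo "file2 contains: $err"
rm -rf "$dir"
```

Each stream ends up in its own file, so error messages can be reviewed separately from normal output.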

This leads to another programming task.

Task 7.3

You want to start a long job in the background (so that your terminal is freed up) and save both standard output and standard error in a single log file. Write a script that does this.

We'll call this script start. The code is very terse:

"$@" > logfile 2>&1 &

This line executes whatever command and parameters follow start. (The command cannot contain pipes or output redirectors.) It sends the command's standard output to logfile.

Then, the redirector 2>&1 says, "send standard error (file descriptor 2) to the same place as standard output (file descriptor 1)." 2>&1 is actually a combination of two redirectors in Table 7.1: n> file and >&n. Since standard output is redirected to logfile, standard error will go there too. The final & puts the job in the background so that you get your shell prompt back.
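The order of the two redirectors matters, because the shell processes redirectors from left to right. The following sketch shows the difference; longjob is again a hypothetical stand-in for a real command, and the log file names are made up.

```shell
longjob() { echo "result"; echo "warning" >&2; }

dir=$(mktemp -d)

# Order used by start: stdout goes to the file first, then stderr is
# pointed at wherever stdout now goes, so both land in the file.
longjob > "$dir/log1" 2>&1

# Reversed order: stderr is pointed at the current stdout (captured
# here by the command substitution) before stdout moves to the file.
leaked=$(longjob 2>&1 > "$dir/log2")

log1=$(cat "$dir/log1")
log2=$(cat "$dir/log2")
echo "log1: $log1"
echo "log2: $log2"
echo "not logged: $leaked"
rm -rf "$dir"
```

With the reversed order, the error message never reaches the log file, which is why start puts 2>&1 after > logfile.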

As a small variation on this theme, we can send both standard output and standard error into a pipe instead of a file: command 2>&1 | ... does this. (Make sure you understand why.) Here is a script that sends both standard output and standard error to the logfile (as above) and to the terminal:

"$@" 2>&1 | tee logfile &

The command tee(1) takes its standard input and copies it to standard output and the file given as argument.
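A short sketch of the pipe-and-tee combination follows, run in the foreground so that the result can be inspected. As before, longjob is a made-up stand-in for a real command.

```shell
longjob() { echo "result"; echo "warning" >&2; }

dir=$(mktemp -d)

# Both streams enter the pipe; tee copies them to the file and also
# passes them on to its own standard output (captured here).
shown=$(longjob 2>&1 | tee "$dir/logfile")
logged=$(cat "$dir/logfile")

echo "passed through: $shown"
echo "in logfile:     $logged"
rm -rf "$dir"
```

The same two lines appear both in the log file and on tee's standard output.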

These scripts have one shortcoming: you must remain logged in until the job completes. Although you can always type jobs (see Chapter 1) to check on progress, you can't leave your office for the day unless you want to risk a breach of security or waste electricity. We'll see how to solve this problem in the next chapter.

The other file-descriptor-oriented redirectors (e.g., <&n) are usually used for reading input from (or writing output to) more than one file at the same time. We'll see an example later in this chapter. Otherwise, they're mainly meant for systems programmers, as are <&- (force standard input to close) and >&- (force standard output to close).
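To give a flavor of reading from two files at once, here is a minimal sketch that uses n< file to open each file on its own descriptor and <&n to read from it. The sample files and their contents are invented for the demo.

```shell
dir=$(mktemp -d)
printf 'alice\nbob\n'         > "$dir/names"    # hypothetical sample data
printf '555-1234\n555-9876\n' > "$dir/phones"

# Open each file on its own descriptor for the duration of the loop,
# then read one line from each per iteration with <&n.
paired=""
while read -r name <&3 && read -r phone <&4; do
    paired="$paired$name=$phone "
done 3< "$dir/names" 4< "$dir/phones"

echo "$paired"
rm -rf "$dir"
```

Descriptors 3 and 4 are used because 0, 1, and 2 are already taken by the three standard streams.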

Before we leave this topic, we should just note that 1> is the same as >, and 0< is the same as <. If you understand this, then you probably know all you need to know about file descriptors.
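A quick way to convince yourself of this equivalence is a sketch like the following, which writes and reads two scratch files (the names a and b are arbitrary) once with the explicit descriptor and once without.

```shell
dir=$(mktemp -d)

echo "hello" 1> "$dir/a"      # explicit descriptor 1
echo "hello"  > "$dir/b"      # the usual shorthand

from_a=$(cat 0< "$dir/a")     # explicit descriptor 0
from_b=$(cat  < "$dir/b")     # the usual shorthand

echo "$from_a / $from_b"
rm -rf "$dir"
```

Both files end up with the same contents, and both forms of input redirection read them back identically.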