12.2 Monitoring Your System

Another important aspect of firewall maintenance involves monitoring your system. Monitoring is intended to tell you several things:

Has your firewall been compromised?
What kinds of attacks are being tried against your firewall?
Is your firewall in working order?
Is your firewall able to provide the service your users need?

In order to answer these questions, you'll need to know what the normal pattern of usage is.

12.2.1 Special-Purpose Monitoring Devices

You'll do most of your monitoring using the tools and the logging provided by the existing parts of your firewall, but you may find it convenient to have some dedicated monitoring devices as well. For example, you may want to put a monitoring station on your perimeter net so you can be sure only the packets you expect are going across it. You can use a general-purpose computer with network snooping software on it, or you can use a special-purpose network sniffer.

How can you make certain that this monitoring machine can't be used by an intruder? In fact, you'd prefer that an intruder not even detect its existence. On some network hardware, you can disable transmission in the network interface (with sufficient expertise and a pair of wire cutters), which will make the machine impossible to detect and extremely difficult for an intruder to use (because it can't reply). If you have source for your operating system, you can always disable transmission there; however, in this case, it's much harder to be certain you've been successful. In most cases, you'll have to settle for extremely cautious configuration of the machine. Treat it like a bastion host that needs to do less and be more secure.

12.2.2 What Should You Watch For?

In a perfect world, you'd like to know absolutely everything that goes through your firewall - every packet dropped or accepted, every connection requested. In the real world, neither the firewall nor your brain can cope with that much information. To come up with a practical compromise, you'll want to turn on the most verbose logging that doesn't slow down your machines too much and that doesn't fill up your disks too fast; then, you'll want to summarize the logs that are produced.

You can ease the disk space problem by writing verbose logs to high-capacity tapes. DAT and 8mm tapes are cheap and hold a lot of data, but they have some drawbacks. They're not particularly fast; they can rarely write more than 800 KB per second, even under the best circumstances, and log entries are generally too short to achieve maximum performance. It's also annoying to read data back from them. If you're interested in using them, write summary logs to disk, and write everything to tape. If you find a situation where you need more data, you can go back to the tape for it. A tape drive can probably keep up with the packets on an average Internet connection, but it won't keep up with an internal connection at full LAN speeds or even with a T-1 connection to the Internet that's at close to its maximum performance.

In particular, you want to log the following cases:

All dropped packets, denied connections, and rejected attempts
At least the time, protocol, and user name for every successful connection to or through your bastion host
All error messages from your routers, your bastion host, and any proxying programs

NOTE: For security reasons, some information should never be logged where an intruder might be able to read it. For example, although you should log failed login attempts, you should not log the password that was used in the failed attempt. Users frequently mistype their own passwords, and logging these mistyped passwords would make it easier for a computer cracker to break into a user's account.
Some system administrators believe that the account name should also not be logged on failed login attempts, especially when the account typed by the user is nonexistent. The reason is that users occasionally type their passwords when they are prompted for their user names. If invalid accounts are logged, it might be possible for an attacker to use those logs to infer people's passwords.
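The note above can be sketched in a few lines of Python. This is only an illustration, not part of any particular login program: the logger name "auth", the function names, and the format of the log line are all assumptions. The password is never handed to the logging code at all, and the account name is recorded only when it matches a real account, since anything else may be a password typed at the wrong prompt.

```python
import logging

# Hypothetical logger name; substitute whatever your site uses.
log = logging.getLogger("auth")

def failed_login_record(source_host, typed_account, known_accounts):
    """Build a log line for a failed login without recording secrets.

    The password is deliberately not a parameter. The account name is
    recorded only if it is a real account; otherwise it may be a
    password typed at the user name prompt.
    """
    if typed_account in known_accounts:
        account_field = typed_account
    else:
        account_field = "<unknown account>"
    return f"login failed from {source_host} account={account_field}"

def log_failed_login(source_host, typed_account, known_accounts):
    """Write the sanitized record to the log."""
    log.warning(failed_login_record(source_host, typed_account, known_accounts))
```

Whether to record even valid account names on failure is a site policy decision; the sketch takes the middle position described above.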

What are you watching for? You want to know what your usual pattern is (and what trends there are in it), and you want to be alerted to any exceptions to that pattern. To recognize when things are going wrong, you have to understand what happens when things are going right. It's important to know what messages you get when everything is working. Most systems produce error messages that sound peculiar and threatening even when they're working perfectly well. For example, in the sample syslog output in Example 12.1, messages 10, 14, and 17 all look vaguely threatening, but are in fact perfectly OK.[1] (See the section in Chapter 5 called "Setting Up System Logs.") If you see those messages for the first time when you're trying to debug a problem, you're likely to leap to the conclusion that the messages have something to do with your problem and get thoroughly sidetracked. Even if you never do figure out what the messages are and why they're appearing, just knowing that certain messages appear even when things are working fine will save you time.

[1] Message 10 is a common network failure that will result in a retry, and how good do you expect your connection to Cameroon to be? 14 is traceroute running. 17 says there are no synonyms defined, which you presumably already know.

Example 12.1: A Sample syslog File (Line Numbers Added)

Most of your logging will probably be done via the UNIX syslog facility or some other similar file-based log mechanism. You'll need to develop log-scanning scripts to analyze each of these log files on a regular basis. Some firewall packages, such as the TIS FWTK, come with scripts to analyze and summarize their own logs. You could use these scripts as templates for your own logging, or you could write your own scripts from scratch in awk, perl, or some other suitable language. Chapter 5 discusses a package named SWATCH, often used for log monitoring and analysis.

As you can see, the log file is verbose and not particularly readable (even with better linebreaks inserted!). An unimportant error condition on a distant host (the server name mismatch on nhs-relay.ac.cv) is producing multiple error messages (11, 12, and 13, in this highly condensed version). The log file is also in chronological order, which is not particularly the order of importance. Example 12.2 shows a report based on a log file, with messages arranged in a more useful order, and somewhat summarized.

Example 12.2: A Report Based on a syslog File

In general, it's safer to write scripts to filter out messages to be ignored (leaving unusual stuff), rather than writing scripts to identify the unusual stuff directly. The reason for this is that you seldom know all of the different messages your firewall might produce. It's easier to ignore the benign messages than to recognize the dangerous ones.
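A minimal sketch of this filter-out approach, in Python, might look like the following. The patterns are hypothetical stand-ins for whatever benign messages your own systems produce; the point is only that the script lists what to ignore and passes everything else through.

```python
import re

# Hypothetical patterns for messages known to be benign at this site.
IGNORE_PATTERNS = [
    re.compile(r"login succeeded for user \w+"),
    re.compile(r"no synonyms defined"),
]

def unusual_lines(lines, ignore_patterns=IGNORE_PATTERNS):
    """Return the lines that match none of the ignore patterns."""
    return [
        line for line in lines
        if not any(pattern.search(line) for pattern in ignore_patterns)
    ]
```

A message the script has never seen before matches nothing in the ignore list, so it shows up in the output by default, which is the behavior you want.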

Log messages fall into three categories:

Known to be OK (e.g., "login succeeded for user smith"): You would like to ignore these. Message 3 is clearly in this category.
Known to be dangerous (e.g., "bad disk block at location 0x47c7a8"): You would like these to cause some action to happen; this may be anything from sending someone email, to submitting a trouble ticket, to paging you.
Unknown: You would like these to be sent off for a human to examine. Message 18 is one of these; why is someone sending UDP packets from port 20 to an arbitrary port above 1024? That doesn't match any common protocol.

Setting up the criteria is an iterative process; once a human has examined a mystery message, future examples of that message can probably be classified as either OK or dangerous without being examined again. You'll change the rules as time goes on.
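One way to picture the three categories and the iterative rule-setting is the following Python sketch. The rule table and the function names are assumptions made for illustration. Anything that matches no rule falls through to "unknown" and goes to a human, whose verdict can then be recorded as a new rule.

```python
import re

OK, DANGEROUS, UNKNOWN = "ok", "dangerous", "unknown"

# Hypothetical starting rules; the first matching rule wins.
RULES = [
    (re.compile(r"bad disk block"), DANGEROUS),
    (re.compile(r"login succeeded for user \w+"), OK),
]

def classify(message, rules=RULES):
    """Return OK, DANGEROUS, or UNKNOWN for one log message."""
    for pattern, category in rules:
        if pattern.search(message):
            return category
    return UNKNOWN

def add_rule(pattern, category, rules=RULES):
    """Record a human's verdict so the message is not examined again."""
    rules.append((re.compile(pattern), category))
```

In use, you would act on the dangerous messages, drop the OK ones, and review the unknown ones, calling `add_rule` as each mystery is resolved.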

Often, log entries must be considered in context. A message that's mildly mysterious if it occurs once is cause for serious worry if it occurs every minute. For example, "login succeeded for user smith" is good, unless it's preceded by three "login failed" messages for every user above "smith" in your password file; in that case, it's very bad indeed. In the syslog example, message 9 shows an unexceptional outbound TCP connection, logged just on general principles. It wouldn't be at all worrying if it weren't preceded by messages 6 through 8. In context, you know that someone made three failed tries at logging in as "admin," finally succeeded, and then immediately started up an outbound connection. This looks extremely suspicious. Message 7 doesn't mean anything at all without context.
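The "failures followed by a success" pattern can be checked mechanically. The sketch below assumes the log has already been parsed into chronological (kind, user) pairs, where kind is "fail" or "ok"; that shape, the function name, and the default threshold of three are assumptions, not the format of any real log.

```python
from collections import defaultdict

def suspicious_successes(events, threshold=3):
    """Return users whose successful login followed `threshold` or more
    consecutive failures.

    `events` is an iterable of (kind, user) pairs in chronological
    order, where kind is "fail" or "ok".
    """
    failures = defaultdict(int)  # consecutive failures per user
    flagged = []
    for kind, user in events:
        if kind == "fail":
            failures[user] += 1
        elif kind == "ok":
            if failures[user] >= threshold:
                flagged.append(user)
            failures[user] = 0  # a success resets the count
    return flagged
```

A fuller version would also look at what the flagged user did next, such as the immediate outbound connection in the example above.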

12.2.3 The Good, the Bad, and the Ugly

Once you go beyond the obvious (for example, it's OK for users to log in; it's not OK for the disk to be bad) how can you tell when you're in trouble? Some rules of thumb:

Once is an accident; twice is coincidence; three times is enemy action: One user who tries to log in at 2 A.M. and fails is up too late and can't type. Two users who try to log in at 2 A.M. may have been at the same party, but you're certainly going to be curious about the incident. Three or more attempts to log in at 2 A.M., and someone is trying to break in. This rule of thumb applies mostly to attempts on separate accounts; stubborn repeated attempts by the same user to do the same thing that doesn't work probably indicate merely that the user is single-minded - and wrong.
Accidents don't try to cover themselves up: If your log files are missing, if entries have been deleted, or if there is any other evidence that somebody has been covering his tracks, you probably have a break-in. If not, you have some other serious problem. (Either something is broken, or somebody administering the machine is deleting things inappropriately.)
Most mysteries don't mean anything: For everybody who sets out to track down a mysterious problem or a strange log entry, and finds an intruder, there are 99 people who set out to track down a mysterious problem or a strange log entry, and find an annoying but trivial bug. You should still try to track these things down, but there's no need to panic.
Straightforward explanations are usually correct: It's possible that you were broken into at the same time you had another known problem, but it's not likely. If you know that you had a hardware failure, or a person wandering around doing misguided things, you'll want to spend some time ruling out side effects of the known problem before you decide that you also have an intruder. On the other hand, if your files are mysteriously disappearing and there's nothing apparently wrong with your disk, somebody is probably deleting them, and you'll want to spend a very long time ruling out an intruder before you decide that your filesystem code is buggy.
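The first rule of thumb lends itself to a rough automated check. The following Python sketch counts how many distinct accounts had failed logins within a sliding time window; repeated failures on a single account count only once, in keeping with the caveat about single-minded users. The input shape, the one-hour window, and the function names are assumptions, and the verdicts are only as good as the rule of thumb itself.

```python
def distinct_accounts_in_window(failures, window_seconds=3600):
    """Return the largest number of distinct accounts that had a failed
    login inside any window of `window_seconds`.

    `failures` is a list of (timestamp_seconds, account) pairs.
    """
    failures = sorted(failures)
    best = 0
    start = 0
    for end in range(len(failures)):
        # Slide the start of the window forward until it fits.
        while failures[end][0] - failures[start][0] > window_seconds:
            start += 1
        accounts = {account for _, account in failures[start:end + 1]}
        best = max(best, len(accounts))
    return best

def verdict(failures, window_seconds=3600):
    """Apply the once/twice/three-times rule of thumb."""
    n = distinct_accounts_in_window(failures, window_seconds)
    if n >= 3:
        return "enemy action"
    if n == 2:
        return "coincidence"
    if n == 1:
        return "accident"
    return "nothing to report"
```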

You're going to end up classifying suspicious events into several categories:

You know what caused it, and it's not a security problem.
You don't know what caused it, you're probably never going to know what caused it, but whatever it was, it's not happening anymore.
Somebody was trying to break in, but not very hard; this is a probe.
Somebody made a serious attempt to get in; this is an attack.
Somebody actually broke in.

The boundaries between these categories are vague. Unless you're dealing with messages from the first category (i.e., a known nonproblem), it's going to come down to a judgment call most of the time. It's impossible to provide an exhaustive list of the symptoms of any of these situations, but here are some generalizations that may help.

You should suspect that someone's been probing your site if you see:

A few attempts to access services at insecure ports (e.g., attempts to contact portmapper or an X server).
Attempts to log in with common account names (e.g., guest or lp; most attempts to log in as "anonymous" are mistakes).
Requests to tftp files or to transfer NIS maps.
Somebody feeding the debug command to your SMTP server.

You should be more concerned if you see any of the following; an attack may be going on:

Multiple failed attempts to log in to valid accounts on your machines, particularly accounts that are used across the Internet, or attempts on accounts in the order in which they appear in your password file.
Unusual accepted packets or commands whose purpose you don't understand.
Packets sent to every port in a range.
Successful logins from an unexpected site.
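Of the symptoms above, packets sent to every port in a range is the easiest to spot mechanically. The Python sketch below assumes you have already extracted (source address, destination port) pairs from your packet filter's logs; that input shape, the function names, and the threshold of ten consecutive ports are all assumptions you would tune for your own site.

```python
from collections import defaultdict

def longest_port_run(ports):
    """Length of the longest run of consecutive port numbers."""
    longest = run = 0
    previous = None
    for port in sorted(set(ports)):
        if previous is not None and port == previous + 1:
            run += 1
        else:
            run = 1
        longest = max(longest, run)
        previous = port
    return longest

def sweeping_sources(packets, min_run=10):
    """Return sources that touched at least `min_run` consecutive ports.

    `packets` is an iterable of (source_address, destination_port) pairs.
    """
    ports_by_source = defaultdict(list)
    for source, port in packets:
        ports_by_source[source].append(port)
    return sorted(
        source for source, ports in ports_by_source.items()
        if longest_port_run(ports) >= min_run
    )
```

A patient attacker can spread a sweep over hours or skip ports, so a check like this catches only the unsubtle cases.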

You should suspect a successful break-in if you see:

Deleted or modified log files.
Programs that suddenly omit expected information (this suggests that they have been replaced with versions that ignore the intruder's files and programs). On UNIX machines, the most frequent victims are ls, ps, and ifconfig.
New log files containing password information or packet traces that you can't explain.
Directories that contain more administrative entries than they should. For example, on UNIX machines, directories should contain two entries with names made out of periods ("." and "..", indicating "this directory" and "parent directory"), but there should not be more than two such entries (for instance, "..." or ".. "). If it looks as if there is more than one entry for each, the extra entry probably has spaces in it and is being used to conceal the file or directory from casual observation.
Unexpected logins as privileged users (for example, root) or unexpected users who are suddenly able to become privileged users.
Apparent probes or attacks coming from your own machines.
Extra processes with names that are variants of common system processes (e.g., both sendmail and Sendmail are running, or init and initd; this is another trick for sneaking things in where you won't notice them).
An unexpected change in the login prompt for your machine, or for other machines you reach from yours. This indicates the program that displays the prompt has been modified.
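The check for extra "administrative" directory entries is simple enough to script. The following Python sketch flags names made up only of periods and whitespace, other than "." and ".." themselves; the function names are hypothetical. Bear in mind that on a machine that has really been broken into, you can't trust the installed tools (including the Python interpreter) to report honestly, so run checks like this from known-good media.

```python
import os

def is_disguised_name(name):
    """True for names that imitate "." or ".." without being them,
    such as "..." or ".. " (periods with extra periods or whitespace)."""
    if name in (".", ".."):
        return False
    stripped = "".join(name.split())  # remove all whitespace
    return stripped != "" and set(stripped) == {"."}

def disguised_entries(directory):
    """List suspicious entries in one directory."""
    # os.listdir never returns "." or "..", so anything flagged is extra.
    return sorted(n for n in os.listdir(directory) if is_disguised_name(n))
```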

12.2.4 Responding to Probes

Inevitably, you're going to detect apparent probes of your firewall - packets sent to services you don't offer to the Internet, attempts to log in to nonexistent accounts, and so on. Probes are the Internet equivalent of someone walking down a line of doors and checking every door knob to see if it's locked. Probers generally try one or two things, and, if they don't get an interesting response, they move on. If you're inclined to do so, you can spend a lot of time chasing down incidents like this, attempting to figure out where the probes are coming from and who is behind them. However, in most situations, it probably isn't worth the effort. The novelty of chasing down probes of this kind fades quickly. If you're getting persistent probes from some site, you might contact the management of that site to let them know what's going on, but that's usually about as far as folks need to go in responding to these probes.

It's unfortunate that on the Internet today, probes are so frequent that the laissez faire attitude we've described is often an appropriate one. In good neighborhoods, people don't get away with trying doorknobs. You have a right to be unhappy with people who behave this way, and trying to get them to stop is perfectly reasonable. However, you do need to decide where you're going to spend your energy. Save extreme responses for extreme situations. Treating probers with maximum harshness is just going to convince people that you are unreasonable.

Some people amuse themselves by setting up firewall machines to lead on people who try common probes. For example, they put a password file in the anonymous FTP area that appears to contain user account data. However, if the prober breaks the encrypted passwords, he sees a snide message. This is a harmless way to spend your spare time, and it provides a satisfactory feeling of revenge, but it doesn't actually improve your security much. It simply annoys attackers, and doing so may cause them to take a personal interest in breaking into your site.

Different sites have different opinions about what constitutes a probe, and what constitutes a full-fledged attack. Most people call something a probe as long as they know it's not going to work, even if it is determined and drawn out. For example, somebody who determinedly tries every possible combination of lowercase alphabetic characters as your root password is not going to succeed, and can probably be ignored as a probe until you get tired of reading the log messages. (Provided your root password contains something other than lowercase letters, that kind of attack won't succeed, no matter how many combinations are tried.) However, if you have the time and the energy, it's probably worth pursuing people who are making determined attempts, even when you know they'll fail.

There are several freely available packages that probe for known vulnerabilities in a system. The most famous one these days is SATAN, developed by Dan Farmer and Wietse Venema. SATAN, as distributed, does nothing but probe; it does not take advantage of the vulnerabilities it looks for.[2] On the other hand, there is no benign reason for anybody but you to be running SATAN against your site. The program is highly configurable, and, therefore, it might have been configured to probe for more obscure vulnerabilities. SATAN's probes will be detected by normal firewall logging, either by packet filters rejecting the packets or by the servers on the bastion host doing the same. Specialized SATAN detectors are now available, but most of them rely on the ability to start up promiscuous mode on an Ethernet interface (which you should have disabled on your bastion hosts). These detectors also distinguish SATAN from random attempts by timing, which is easily modified. If you already have reasonable logging turned on, running a SATAN detector will not increase your security.

[2] These are well-known vulnerabilities that you will be protected against if you follow this book's advice and keep up with CERT-CC advisories.

Because SATAN is widely available and does not pose a threat to a firewalled site, it is reasonable to regard people who run it as merely probing your system, rather than mounting a determined attack. They have expended little effort and have little chance of success. Probes based on the use of SATAN will appear in your logs like any other probe - as a cluster of rejected packets from the same source.

12.2.5 Responding to Attacks

If your logs show that someone is making a determined attack against your system (see the rules of thumb we presented in "The Good, the Bad, and the Ugly," earlier in this chapter), you probably want to do a little more than sit back and watch. Chapter 13 describes in detail how you should respond to a real security incident.

