Monitoring Server Health (Perl for System Administration) Book Home Perl for System AdministrationSearch this book

7.6. Monitoring Server Health

For our final set of examples, we'll take a look at several ways to monitor the health of an SQL server. This sort of routine monitoring is similar in nature to the network service monitoring we saw in Chapter 5, "TCP/IP Name Services".

7.6.1. Space Monitoring

To get technical for a moment, database servers are places to hold stuff. If you run out of space to hold stuff, this is known as either "a bad thing" or "a very bad thing." As a result, programs that help us monitor the amount of space allocated and used on a server are very useful indeed. Let's look at a DBI program designed to look at the space situation on a Sybase server.

Here's a snippet of output from a program that shows graphically how space is used in each database on the server. Each section shows a bar chart of the percentage of allocated data and log space in use in a database. In the following chart, d stands for data space and l stands for log space. For each bar the percentage of space used and the total available space is indicated:

|ddddddd                                          |15.23%/5MB
hpotter--------|                                                 |
               |                                                 |0.90%/5MB

               |ddddddd                                          |15.23%/5MB
dumbledore-----|                                                 |
               |                                                 |1.52%/5MB


               |dddddddd                                         |16.48%/5MB
hgranger-------|                                                 |
               |                                                 |1.52%/5MB

               |ddddddd                                          |15.23%/5MB
rweasley-------|                                                 |
               |l                                                |3.40%/5MB

               |ddddddddddddddddddddddddddd                      |54.39%/2MB
hagrid---------|                                                 |
               |- no log                                         |

Here's how we generated this output:

use DBI;

$admin = 'sa';
print "Enter passwd for $admin: ";
chomp($pw = <STDIN>);
$pages = 2; # data is stored in 2k pages

# connect to the server
$dbh = DBI->connect('dbi:Sybase:',$admin,$pw);
die "Unable to connect: $DBI::errstr\n" 
  unless (defined $dbh);

# get the name of the databases on the server
$sth = $dbh->prepare(q{SELECT name from sysdatabases}) or
  die "Unable to prepare sysdatabases query: ".$dbh->errstr."\n";
$sth->execute or
  die "Unable to execute sysdatabases query: ".$dbh->errstr."\n";

while ($aref = $sth->fetchrow_arrayref) {
    push(@dbs, $aref->[0]);
}
$sth->finish;

# retrieve stats for each of the databases
foreach $db (@dbs) {
    
    # get and total the size column from all non-log segments
    $size    = &querysum(qq{SELECT size FROM master.dbo.sysusages 
                           WHERE  dbid = db_id(\'$db\') 
                           AND    segmap != 4});
    # get and total the size column for the log segment
    $logsize = &querysum(qq{SELECT size FROM master.dbo.sysusages
                           WHERE  dbid = db_id(\'$db\') 
                           AND    segmap = 4});		      

    # change to the database and retrieve usage stats
    $dbh->do(q{use $db}) or
      die "Unable to change to $db: ".$dbh->errstr."\n";

    # we used the reserved_pgs function to return the number of pages
    # used by both the data (doampg) and index (ioampg) part 
    # of the database
    $used=&querysum(q{SELECT reserved_pgs(id,doampg)+reserved_pgs(id,ioampg) 
                     FROM sysindexes 
                     WHERE id != 8});

    # same, except this time we look at the log usage
    $logused=&querysum(q{SELECT reserved_pgs(id, doampg) 
                        FROM sysindexes 
                        WHERE id=8});
    
    # show this information graphically
    &graph($db,$size,$logsize,$used,$logused);
}
$dbh->disconnect;

# prepare/exec a given single-column SELECT query, return 
# the sum of the results
sub querysum {
    my($query) = shift;
    my($sth,$aref,$sum);

    $sth = $dbh->prepare($query) or
      die "Unable to prepare $query: ".$dbh->errstr."\n";
    $sth->execute or 
      die "Unable to exec $query: ".$dbh->errstr."\n";

    while ($aref=$sth->fetchrow_arrayref) {
        $sum += $aref->[0];
    }
    $sth->finish;

    $sum;
}

# print out nice chart given database name, size, log size, 
# and usage info
sub graph {
    my($dbname,$size,$logsize,$used,$logused) = @_;

    # line for data space usage
    print ' 'x15 . '|'.'d'x (50 *($used/$size)) .
          ' 'x (50-(50*($used/$size))) . '|';

    # percentage used and total M for data space
    printf("%.2f",($used/$size*100));
    print "%/". (($size * $pages)/1024)."MB\n";
    print $dbname.'-'x(14-length($dbname)).'-|'.(' 'x 49)."|\n";

    if (defined $logsize) { # line for log space usage  
        print ' 'x15 . '|' . 'l'x (50 *($logused/$logsize)) .
	           ' 'x (50-(50*($logused/$logsize))) . '|';
        # percentage used and total M for log space
        printf("%.2f",($logused/$logsize*100));
        print "%/". (($logsize * $pages)/1024)."MB\n";
    }
    else { # some databases do not have separate log space
	     print ' 'x15 . "|- no log".(' 'x 41)."|\n";
    }
    print "\n";
}

SQL-savvy folks will probably wonder why a special subroutine (querysum) is being invoked to total the contents of a single column instead of using the perfectly good SUM calculation operator in SQL. querysum( ) is just meant to be an example of the sort of manipulation on the fly one can do from Perl. A Perl subroutine is probably more appropriate for more complex jobs. For instance, if we need to keep separate running totals based on a regular expression, that's probably best done from Perl rather than asking the server to perform the tabulation (even if it could).

7.6.2. Monitoring the CPU Health of a SQL Server

For the final example of this chapter, we'll use DBI to show us a minute-by-minute status line display of the CPU health of a SQL server. Just to make it more interesting, we'll monitor two separate servers simultaneously from the same script. We'll comment on this code in a moment:

use DBI;

$syadmin = "sa";
print "Sybase admin passwd: ";
chomp($sypw = <STDIN>);

$msadmin = "sa";
print "MS-SQL admin passwd: ";
chomp($mspw = <STDIN>);

# connect to Sybase server
$sydbh = DBI->connect("dbi:Sybase:server=SYBASE",$syadmin,$sypw);
die "Unable to connect to sybase server: $DBI::errstr\n" 
  unless (defined $sydbh);
# turn on the ChopBlanks option to remove trailing whitespace in columns
$sydbh->{ChopBlanks} = 1;

# connect to MS-SQL server (handy that we can use DBD::Sybase for this!)
$msdbh = DBI->connect("dbi:Sybase:server=MSSQL",$msadmin,$mspw);
die "Unable to connect to mssql server: $DBI::errstr\n" 
  unless (defined $msdbh);
# turn on the ChopBlanks option to remove trailing whitespace in columns 
$msdbh->{ChopBlanks} = 1;

$|=1; # turn off STDOUT IO buffering

# initialize the signal handler so we can cleanup nicely
$SIG{INT} = sub {$byebye = 1;};

# infinitely loop unless our interrupt flag has been set
while (1) {
    last if ($byebye);
    
    # run the stored procedure sp_monitor
    $systh = $sydbh->prepare(q{sp_monitor}) or
       die "Unable to prepare sy sp_monitor:".$sydbh->errstr."\n";
    $systh->execute or 
       die "Unable to execute sy sp_monitor:".$sydbh->errstr."\n";
    # loop to retrieve the lines from the output we need. 
    # We know we have all of it when we see the cpu_busy information
    while($href = $systh->fetchrow_hashref or 
          $systh->{syb_more_results}) {
             # got what we needed, stop asking
             last if (defined $href->{cpu_busy}); 
    }
    $systh->finish;
    
    # substitute out everything but the % number from 
    # the values we receive 
    for (keys %{$href}) {
	    $href->{$_} =~ s/.*-(\d+%)/\1/;
    }

    # collect all the data we need into a single line
    $info = "Sybase: (".$href->{cpu_busy}." CPU), ".
                    "(".$href->{io_busy}." IO), ".
                    "(".$href->{idle}." idle)   ";

    # ok, now let's do it all over again for the second server (MS-SQL)
    $mssth = $msdbh->prepare(q{sp_monitor}) or
       die "Unable to prepare ms sp_monitor:".$msdbh->errstr."\n";
    $mssth->execute or
       die "Unable to execute ms sp_monitor:".$msdbh->errstr."\n";
    while($href = $mssth->fetchrow_hashref or 
            $mssth->{syb_more_results}) {
              # got what we needed, stop asking
	          last if (defined $href->{cpu_busy}); 
    }
    $mssth->finish;

    # substitute out everything but the % number from 
    # the values we receive
    for (keys %{$href}) {	
	    $href->{$_} =~ s/.*-(\d+%)/\1/;
    }

    $info .= "MSSQL: (" . $href->{'cpu_busy'}." CPU), ".
                    "(".$href->{'io_busy'}." IO), ".
                    "(".$href->{'idle'}." idle)";
    print " "x78,"\r";
    print $info,"\r";
    
    sleep(5) unless ($byebye);
}

# only end up here if we've broken out of the loop thanks to an interrupt
$sydbh->disconnect;
$msdbh->disconnect;

This script keeps this line on your screen, which is refreshed every five seconds:

Sybase: (33% CPU), (33% IO), (0% idle)   MSSQL: (0% CPU), (0% IO), (100% idle)

The heart of this program is the stored procedure sp_monitor that exists both on Sybase and MS-SQL. sp_monitor's output looks like this:

last_run                    current_run                 seconds     
--------------------------- --------------------------- ----------- 
Aug 3 1998 12:05AM          Aug 3 1998 12:05AM          1      
     
cpu_busy                  io_busy                   idle                      
------------------------- ------------------------- ------------------------- 
0(0)-0%                   0(0)-0%                   40335(0)-0%      
         
packets_received          packets_sent              packet_errors             
------------------------- ------------------------- ------------------------- 
1648(0)                   1635(0)                   0(0)              
        
total_read          total_write         total_errors        connections        
------------------- ------------------- ------------------- ------------------ 
391(0)              180(0)              0(0)                11(0)

Unfortunately, sp_monitor exposes a nonportable Sybase-ism that was carried over to MS-SQL: multiple result sets. Each of the lines returned comes back as a separate result set. DBD::Sybase handles this by setting a special statement attribute. That's why you see this test:

while($href = $systh->fetchrow_hashref or 
      $systh->{syb_more_results}) {

and why we exit this loop early once we've seen the columns we're looking for:

# got what we needed, stop asking
	   last if (defined $href->{cpu_busy});

The program itself loops forever until it receives an interrupt signal (most likely from the user pressing Ctrl-C). When we receive this signal, we do the safest thing possible in a signal handler and set an exit flag. This is the technique recommended by the perlipc manpage for safe signal handling. Receiving the INT signal will set a flag that punts us out of the loop on the next iteration. Catching this signal allows the program to nicely close its database handles before shuffling off this mortal coil.

This small program just scratches the surface of the sort of server monitoring we can do. It would be easy to take the results we get from sp_monitor and graph them over time to get a better notion of how our server is being used. Let creeping featurism be your muse.



Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.