Wednesday, January 25, 2006

And now for something different: scripts

It's been a while (year-end releases, 80-hour weeks...), but here's something I think more people will find useful.
It's a script, called check_alert, and it will inspect and rotate your alert log file(s). If you want just rotation, take a serious look at logrotate, a system utility that comes with most *nix distributions nowadays (except HP-UX, but you can download it - or search here).
I made check_alert as configurable as possible, by means of external files. Feel free to use and alter, though I'd appreciate that my name remains mentioned.
Anyway here's the link for the zipped file, and just to make you curious, here's the README:
README file for check_alert

Purpose: checl_alert will check the alert log files of Oracle
databases for errors, and will notify DBAs by email

Written: Januari 2006, by Frank van Bortel

check_alert is configurable by means of two external files, check_alert.errors, and
check_alert.conf

check_alert.conf contains the following variables, used by check_alert:

MAILLIST:
contains the list of people that are mailed with the findings of check_alert.
It should be compliant with the mail program you have defined in MAILPROG.
Example: MAILLIST=f.w.j.van.bortel@nospam.com


ERRORSONLY:
a switch [Y,N] designating whether only errors should be mailed (Y), or
whether findings should always be mailed (N), even when no errors are found.
The errors that are reported can be configured in check_alert.errors
Example: ERRORSONLY=Y

KEEPREPORT:
a switch [Y,N] designating whether the log reporting file should be kept (Y),
or deleted (N), when done.

TMPDIR:
variable designates a temporary directory; no files will be left there, but
check_alert must have read/write permissions there.
Example: TMPDIR=/tmp

ORATAB: designates the location of the oratab file, as maintained by Oracle installation tools
Note that different platforms use different locations
Example: ORATAB=/etc/oratab

EXTDATE:
defines how long the logs should be kept; format mask according to the date
executable. Most date executables support these formats:
+%m%d will append MMDD to the logfile, keeping the logfiles for a year
+%d will append the day of the month (DD) format, keeping logfiles for a month
+%w will append the day-of-the week [0-6], keeping the logfiles for a week
+%u will append the day-of-the week [1-7], starting monday, rotating logfiles weekly
see also man date for more formats
Example: EXTDATE=`date '+%u'`

MAILPROG:
The name of the executable to send email from this environment. The program must support
the -s (subject) switch. Must distributions will use mail, that also supports the -s switch,
HP-UX does not, and has mailx for this purpose.
Example: MAILPROG=mailx

ERRORS:
The file where error patterns are kept. check_alert uses grep -if to scan for errors, making
the matching case INsensitive.
NOTE: DO NOT LEAVE THIS FILE EMPTY
Example: ERRORS=check_alert.errors

Example of check_alert.errors:
ora-
error

check_alert will read the contents of the file, defined in ORATAB, ignore all empty lines, comment
lines (lines, starting with the # sign), and undefined entries (lines, starting with *).
For each valid entry, the ORACLE_SID is determined, as well as the value of PACKAGE, if not defined.
Based on that, ORACLE_BASE is defined. ORACLE_BASE will be /oracle/$PACKAGE, if PACKAGE is defined,
or the first two subdirectories of ORACLE_HOME as defined in ORATAB.
According to the standards in place, the alert log file for ORACLE_SID can be found at
ORACLE_BASE/admin/dbORACLE_SID/bdump/alert_ORACLE_SID.log

If the alert log can not be found, an entry is made in the log report and the following
three possible locations are tried:
# $ORACLE_BASE/admin/oracle/db$ORACLE_SID
# $ORACLE_HOME/rdbms/log
# $ORACLE_HOME/admin/rdbms/log
If a directory exists, the log file is assumed to be in that directory.
If it still does not exist, an entry is made in the log report and checking is
skipped.
The log report is a temporary file in TMPDIR, named TMPDIR/check_alert_ORACLE_SID.log

If the alert log file is found, it is checked for errors. The error patterns to be checked
on are defined in check_alert.errors. Matching lines are copied to the log report file.
If no errors are found, a note is made in the log report as well.

As a last step, the alert log file contents is copied to a backup file, with an extension,
defined in EXTDATE. This will govern how long rotated alert log files are kept: if a single
day-of-week number is used, files will be overwritten after a week. Subsequently, if a
day-of-month schema is used, logfiles will be rotated monthly.
Using a naming schema with month or day-of-year naming schema is not advised.
The original alert log file is deleted (as Oracle opens and closes the alert log file on
writes, this can be doen without the need of recreating the file, and setting ownership).

The log report is then sent to the receipties, defined in MAILLIST, using the program, defined
in MAILPROG, based on the fact whether ERRORSONLY is set to Y or N.
If ERRORSONLY is set to Y, the report will only be sent if errors are found, as defined in
check_alert.errors. This is the 'no news means good news' option.

If ERRORSONLY is set to N, the report will be sent regardless of any errors found.

When done, the log report file is removed when the KEEPREPORT flag is set to N.
When testing, or in new environments, it may be beneficial to leave ERRORSONLY=Y, but
set KEEPREPORT=Y. In that manner, relevant files can still be viewed on the system, but
the DBA is not overwhelmed by emails, most of which just indicate all is well (I hope).