Consolidated Console and System Management of the HP3000 Using HP OpenView ITO

Ernest M. Wolshin

Harvard Pilgrim Health Care
100 Hayden Avenue
Lexington, Mass 02173
781-676-3099
781-676-3832
E-mail: ernie_wolshin@hphc.org


Introduction

Today I would like to review Harvard Pilgrim’s experience with the implementation of OpenView ITO to control and monitor its HP3000 servers. I will start with a short history of the data processing environment at HPHC and the initiation of the OpenView project. Then I will review the role of the console in computer operations for those who may not be Operators or may be somewhat removed from day-to-day operations.

Next we will look at OpenView for the HP3000 in detail. We will examine how OpenView interfaces to the HP3000, how it collects data, and message processing in OpenView.

Lastly we will look at viewing messages using the Active Message Browser, the View/Some command, and message browsers derived from node icons and message group icons. We will end with an example of setting up message filters in OpenView and some tips on the construction of message filters.

Background

Harvard Pilgrim Health Care is the top HMO in the Northeastern United States. It was formed from the merger of Harvard Community Health Plan and Pilgrim Health Care in 1995. Both organizations, at the time, used the HP3000 for their data processing needs. Pilgrim Health Care employed seven HP3000’s and Harvard Community Health Plan used two HP3000’s. After the merger, five additional HP3000’s were purchased. Two data centers were maintained until June of 1998. In the fall of that year, they were combined into a single data center.

The new data center housed fourteen HP3000’s under one roof, along with a host of other technologies including IBM, VAX, HP-UX, NT and Digital Unix systems. Only one or two Operators focus on the HP3000 and its operation during a shift.

Those of you who have worked closely with the HP3000 know that the Operators “live” on the console. In our data center, there are also several workstations and terminals that are used to perform tasks on the HP3000 that are not specific to the console.

The need for an enterprise management solution was recognized as part of the Data Center consolidation. At that time, we chose HP OpenView as the tool that would give us the functionality and flexibility to manage our operations. We contracted with HP for a consulting engagement in the fall of 1998. The consultant was on site for two periods: first for five days and later for eight, with a week's hiatus in between. HP provided tools in the form of scripts that could be used to monitor HP3000 resources and processes. There was also extensive one-on-one training on how to configure and use HP OpenView. Later, two HPHC staff members were sent to an OpenView course at HP. During the fall of 1998, we gained practical experience in the configuration of HP OpenView. It was a time of experimentation in which we learned how to use many of OpenView’s features.

During the fall and the winter of 1998-1999 certain important lessons were learned about enterprise management.

1. Enterprise management is not for the Timid!

2. Enterprise management is not a turnkey system.

3. Each site must do a significant amount of configuration work to tailor any enterprise management solution to their operation.

4. Training the operators and technical staff is an essential key to success.

5. In-house staff must serve the role of on-going development and training.

6. Development is on-going. There is no final end point, only the next milestone.

7. Management must strongly support the effort if it is going to succeed.

The HP3000 Console

Next, let’s take a look at the HP3000 console. The Operators “live” on the console. Almost all software, including operating system software, custom software, and third party software, sends messages to the console. The messages that can be observed on the console are limited because the scrollable memory of the console is limited. This is true whether the console is a PC or a terminal. New messages push the old messages out of memory at a fairly rapid rate.

There are certain key events you don’t want to miss in the life of an HP3000 or any other computer for that matter. For our operation, the key events we monitor with OpenView are the following:

Resource or Process                  Critical Events

 1. Backups                          Bad tapes, mount requests
 2. Tape drives                      Tapes stuck in drive
 3. Printers                         Forms requests, out-of-paper statuses, mechanical problems
 4. Job schedulers                   Hung schedules, abended jobs
 5. Network software and hardware    Connection problems
 6. Mirrored disks                   Lost mirrored pairs
 7. Disks                            Bad sectors
 8. Disk space                       Low disk space
 9. Databases                        Database shadow problems, full datasets
10. Programs                         Critical success or failure messages

How OpenView Connects to the HP3000

Next, let’s take a look at how HP OpenView connects to the HP3000. There are two parts to the OpenView software. The first part runs on a Management Server. In our case, this is an HP9000/K420. It is on the same network segment as the HP3000s it manages. This is not a requirement for OpenView, only the way we have chosen to implement it. The OpenView management server could be far away in some other city in the case of a widely distributed system. The Management Server is accessed using an HP700 workstation, an HP X-term, or a PC running Reflection-X. The second part is an agent that runs on each HP3000.

On the HP3000 there is an OpenView account called OVOPC that contains the HP OpenView software. The OpenView job that runs continuously is called OPCAGTJ,AGENT.OVOPC. You can check on the status of the OpenView agent by doing a SHOWJOB to see if it is running and by running the program

opcagt.bin.ovopc -status

This will show you what OpenView processes are running on the HP3000.

How OpenView Collects Information

On the HP3000 there are four primary processes by which OpenView collects information about the system. First is the console logfile interceptor. Console messages are intercepted, copied and sent to the OpenView agent. The messages still appear in their entirety on the physical console and the physical console still has all its functionality. The OpenView agent runs the console messages through a series of filters. These filters pick out important messages that are then sent to the management server for display.

Next is the logfile reader. Any ASCII file can be designated as a logfile for OpenView to read. The logfile must have sequential records. OpenView can run a script to perform an evaluation, generate a record from the evaluation process and add it to the logfile. Then it reads the entry and sends it to the OpenView agent. The logfile can also be read by sensing a change in the modification time and date. MPE utilities such as SHOWJOB, DISCFREE, SHOWOUT, DBUTIL, VOLUTIL, etc. can be run and the output reformatted to produce one line of text per event or resource. Also, a job can send output to a disk file which can be designated a logfile as well. Logfile readers are used to monitor disk space, volume set status, mirrored disk status, and spoolfile queues.
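The logfile reader's behaviour (sense a change in the file, then pick up only the records added since the last read) can be illustrated with a short sketch. This is not OpenView code. The class name `LogfileReader`, the file name and the record layout are hypothetical, and checking the file size in addition to the modification time is an assumption added here to cope with coarse timestamps.

```python
import os
import tempfile


class LogfileReader:
    """Remember how far a sequential ASCII logfile has been read."""

    def __init__(self, path):
        self.path = path
        self.offset = 0         # byte position just past the last complete record read
        self.last_stamp = None  # (modification time, size) seen at the last poll

    def poll(self):
        """Return the records appended since the previous poll."""
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp == self.last_stamp:
            return []           # file unchanged, nothing to send to the agent
        self.last_stamp = stamp
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Only hand over complete lines; a partial last line waits for the next poll.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("ascii").splitlines()


# Hypothetical one-line-per-resource records, as a reformatting script might write them.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "diskspace.log")
    with open(path, "w", encoding="ascii") as f:
        f.write("MEMBER1 FREE 12%\n")
    reader = LogfileReader(path)
    first = reader.poll()
    with open(path, "a", encoding="ascii") as f:
        f.write("MEMBER2 FREE 48%\n")
    second = reader.poll()
    third = reader.poll()

print(first, second, third)
```

Each record returned by `poll` would then be run through the filters in the same way as a console message.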

The third way OpenView monitors activities on the HP3000 is through the use of threshold monitors. Threshold monitors are scripts that return a value to OpenView about the status of a resource or process. OpenView can run these scripts at intervals and compare the returned value to a threshold. Thresholds can be descending, ascending, or triggered by a transition in value in either direction. We use this method to monitor dataset fullness, outstanding tape replies, forms requests, the presence of standing jobs, and Netbase status.
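The three kinds of threshold comparison can be sketched as follows. This is only an illustration of the idea, not OpenView's implementation. The names `Direction` and `threshold_exceeded` are hypothetical, the sample values are invented, and whether the real product compares with "greater than" or "greater than or equal" is an assumption.

```python
from enum import Enum


class Direction(Enum):
    ASCENDING = "ascending"    # alarm when the value rises to or above the threshold
    DESCENDING = "descending"  # alarm when the value falls to or below the threshold
    TRANSITION = "transition"  # alarm when the value crosses the threshold either way


def threshold_exceeded(value, threshold, direction, previous=None):
    """Return True when a monitored value should raise a message."""
    if direction is Direction.ASCENDING:
        return value >= threshold
    if direction is Direction.DESCENDING:
        return value <= threshold
    # TRANSITION: compare which side of the threshold the last two samples fall on.
    if previous is None:
        return False  # first sample, nothing to compare against yet
    return (previous >= threshold) != (value >= threshold)


# Hypothetical samples: percent fullness of a dataset, collected at each interval.
samples = [71, 78, 86, 91, 84]
alarms = [threshold_exceeded(v, 85, Direction.ASCENDING) for v in samples]
print(alarms)
```

An ascending threshold suits something like dataset fullness, while a descending one suits a count that should never drop, such as the number of standing jobs.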

The fourth method by which OpenView gets information from the HP3000 is by messages sent from the opcmsg facility. Messages can be sent directly to the OpenView agent from a job stream. The syntax is as follows:

opcmsg.bin.ovopc "a=app o=msggroup s=severity 'msg_text=text'"

Where

This program can be run in a job stream, inside an “if” statement that tests for an error condition and spawns an OpenView message in response to that error condition. For example, consider the following portion of a job:

There is one other way OpenView can ascertain the state of the HP3000. This is through the use of SNMP traps. SNMP services can be run on the HP3000 and be polled by the management server. This is the method used to tell if the HP3000 is up or down, at least as far as the network is concerned. It is even possible to write your own MIBs for specialized monitoring. We have not implemented this as yet, however.

Display of Information in OpenView

Let’s take a look at how information is displayed on the management server. This is the OpenView desktop that is seen on an HP700 workstation or X-term when you log into the Management Server and bring up OpenView. If you use Reflection-X on a PC, only one or two windows at a time can be displayed with any degree of detail unless you use a large monitor. There are four basic windows that display information in OpenView.

First and foremost is the Active Message Browser window. This is where all the messages from all the HP3000 systems are displayed. This is our consolidated console screen. The Active Message Browser displays all the messages that were not suppressed by the OpenView agent on the HP3000. Messages from the console interceptor, logfile reader, threshold monitor, opcmsg messages and SNMP statuses are displayed.

There are also three windows with various icons. These are the Node Bank window, the Message Group window and the Application Group window. The Node Bank window contains an icon for each system that is monitored. The Message Group window represents categories of messages. When a message is filtered and sent to the management server, a message group label is attached. It is a useful handle by which groups of like messages can be displayed together. The Application Group window contains icons that can perform tasks on a managed node. Click on the MPE icon to open up a new window of MPE applications. Drag an HP3000 icon over the application icon for disk space and a window will open up showing the output from a DISCFREE on that system.

How you arrange these windows is up to you. This is how we configured ours.

In the Browser, on the left-hand side, are colored bars under the heading of Sev. These are the message severity labels. A severity label is attached when a message is filtered on the HP3000 and sent to the Management Server. The label is an attribute that is determined when you design a message filter. Red indicates critical messages like a downed system or bad disks. Orange indicates a major problem such as a halted production schedule. Yellow is a minor problem such as several unanswered tape replies. Blue is a warning label such as low disk space. Green means that this is a “normal” message. Green messages usually are informational and do not represent the presence of a problem. The next column from the left is headed by a funny acronym, SUIAONE. Each letter in this acronym stands for a message attribute that was attached by the filter that matched the message. The letters signify the following:

Next are columns for the date, time and sources of the message.

The sixth, seventh and eighth columns from the left contain three message labels that are assigned by the message filter on the HP3000. These are Application, MsgGroup and Object. You can use these labels for any short label information you want the message browser to display. Keep in mind that these labels are useful for grouping messages into like categories.

On the far right side of the browser window is the message that was passed to the Management Server from the HP3000. Many of these messages start with a label such as RRUN037 or SYS012. When the message filter processes the message on the HP3000, it is possible to rewrite the message that is sent to the message browser. This can make an esoteric message much more readable. One item I add to each re-written message is a label that tells which filter did the processing. When you have over 350 filters processing messages, it can be hard to tell which filter is responsible if you don’t add a filter designation to the message displayed. Don’t confuse this label with the MsgGroup, Application or Object label that OpenView assigns.

The message browser displayed when an OpenView session is first started is not the only way to view system messages. Other message browsers can be opened that display only selected messages. The icons in the Node Bank and MsgGroup windows can be used to launch browser windows with selected messages. Right click on a node icon and a pull down menu will appear. Drag down to message browser. A new browser window will open up with only messages from that node. Right click on a message group icon and drag down to message browser. A new browser window will open up that contains only messages that have been labeled with that message group name. Now you can see how useful the message group attribute can be. These additional browsers are updated in real time just like the main browser.

This works well if you are interested in all messages from a node or message group, but what if you want to select by date or even a particular message phrase? This is possible too. In the main browser note the menu items across the top. Left click on the View menu and drag down to Some….

A window opens up that allows you to select messages by any number of message attributes. Messages can be selected by node, message group, application, object, severity code, a message phrase, date and time, matched or unmatched (by a filter) and owned or unowned. Select multiple criteria to create a browser window of your own design. It can stay up continuously and display new messages that meet your criteria. This is a great way to monitor important events or processes. It also demonstrates the use of message labels such as MsgGroup, Application and Object. If all messages from backups are assigned the label “Backup” by the filters on the HP3000, you can easily monitor backups for a particular system or systems. If you want to see all the NETIPC error messages from a point in time when you knew there was a problem, just enter the phrase NETIPC and select the date and time of the event. Browsers can find the critical messages that you need to know about.

There is one more type of browser that is used. This is the History Browser. Messages in the Active Message Browser can be acknowledged. When an Operator acknowledges a message in the Active Message Browser, that message is sent to the history database. From this location, only the History Browser can view it.

The History Browser allows you to look back in time even further than the Active Message Browser. From the Active Message Browser window, click on View and drag down to History. A new browser window opens up that shows all the acknowledged messages on the system. We keep about 50,000 messages in our database. This should allow a retrospective view of approximately one month, depending on the number of systems that are monitored.

In summary, there are three types of browser windows: the main browser or Active Message Browser, the filtered browser and the History Browser. It is not unusual to have several browser windows open simultaneously.

The Message Unit in OpenView

Now that we have seen how to look at messages, let’s take a closer look at the whole life cycle of a message. Here is the sequence:

1. Messages are created on the HP3000.

2. The OpenView agent on the HP3000 picks up messages.

3. Messages are matched against a set of selection filters.

4. If the message filter does not suppress the message, it is sent to the Management Server.

5. Automatic actions are performed if any are attached to the message, e.g. execute a script, send a helpdesk ticket, beep someone.

6. Messages are displayed in the “Active Message Browser” with date, time, severity, and system of origin. The original message text may be re-written to produce text that is more meaningful.

7. The Operator observes the message.

8. The Operator decides if any operator initiated actions should be performed, for example replying to a tape request or forms request.

9. The Operator may read any attached instructions in the message.

10. The operator adds any annotations to the message such as who was called and what was done.

11. The Operator acknowledges the message.

12. The message leaves the active message browser and goes to the history database.

13. The message can be viewed in the History Browser if there is unfinished business.
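The sequence above can be pictured as a small data model: the agent either suppresses a line or forwards it, the server holds it in the active list, and acknowledgement moves it to history. The sketch below is purely illustrative. The class and function names, the node name `HP3000A`, the sample lines, and the use of simple phrase matching in place of real filters are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    node: str
    text: str
    severity: str = "normal"
    annotations: list = field(default_factory=list)


class ManagementServer:
    def __init__(self):
        self.active = []   # what the Active Message Browser would show
        self.history = []  # what the History Browser would show

    def receive(self, message):
        self.active.append(message)

    def acknowledge(self, message, note=None):
        """Steps 10 to 12: annotate, acknowledge, move to the history database."""
        if note:
            message.annotations.append(note)
        self.active.remove(message)
        self.history.append(message)


def agent_process(node, line, suppress_phrases, server):
    """Steps 2 to 4: match a line against filters, forward it unless suppressed."""
    if any(phrase in line for phrase in suppress_phrases):
        return None
    message = Message(node=node, text=line)
    server.receive(message)
    return message


server = ManagementServer()
agent_process("HP3000A", "routine logon message", ["logon"], server)
msg = agent_process("HP3000A", "tape mount request pending", ["logon"], server)
server.acknowledge(msg, note="Operator mounted the tape")
print(len(server.active), len(server.history))
```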

We talked about message attributes when we discussed the various browser windows. Now that we have discussed the life cycle of a message and touched upon more attributes, it is a good time to summarize all the attributes and objects of a message. These are as follows:

Message Attributes and Objects

Message Attribute            Message Object

Node                         Automatic actions
Time                         Operator initiated actions
Date                         Instructions
Severity (importance)        Annotations or Notes
Application label
Message group label
Object label
Ownership

When you click on a message in the message browser, a window opens up showing these message attributes. This window contains fields that correspond to all the various message attributes and objects listed above.

Message Filters in OpenView

We have talked a lot about message filtering to this point. Let’s take a closer look at message filters.

The construction of message filters is the most important task that you will perform in OpenView.

A message is one line of text on the console. Each line of a multi-line message on the console is a single message to OpenView. We only want to see a fraction of all the console messages. Just how selective do we want to be? On average, 7,000-15,000 messages pass over the console of each HP3000 in our data center every day. At present, there are fourteen HP3000’s that we wish to monitor. That equates to about 140,000 lines of messages a day! Excluding backups, we only want to see 5-10 important messages a day. Therefore OpenView should reject at least 99.9% of all console messages and pass no more than 0.1% of them to the Active Message Browser. Message filtering has to be very selective to do this. The criteria for critical message selection have to be even more selective.
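The volume figures can be checked with a few lines of arithmetic. The system count, the per-system range and the 140,000 total come from the text. Reading the 5-10 target as a per-system figure is an assumption made here, because it is the reading under which a 0.1% pass rate lines up with the target.

```python
systems = 14
per_system_low, per_system_high = 7_000, 15_000  # console messages per system per day

total_low = systems * per_system_low    # lower bound on daily console lines
total_high = systems * per_system_high  # upper bound on daily console lines
typical_total = 140_000                 # figure quoted in the text (about 10,000 per system)

# Passing 0.1% of the typical volume, using integer arithmetic.
passed_total = typical_total // 1000
passed_per_system = passed_total // systems

print(total_low, total_high, passed_total, passed_per_system)
```

At 0.1%, about 140 messages a day would reach the Active Message Browser across all fourteen systems, or roughly ten per system.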

Let’s look at the process for building message filters. First, collect some console logfiles. Next, look for message patterns. In the message patterns, there are variable parts and constant parts. The variable parts are job numbers, job names, error numbers, and some message phrases. Next, decide on the messages to detect and the messages to ignore. There should be far more messages to ignore than messages that are important.

Select the kind of wild cards you will use. They are as follows:

Wild Card Character    Type of Character

*                      Wild card for any part of a message: spaces, words, numbers
@                      Wild card for a character string
#                      Wild card for a number or string of numbers
S                      Wild card for separators such as spaces or tabs

There are other operators that are used in creating a message matching expression.

Operator    Function

[ ]         Encloses groups of expressions
|           Logical “or”
!           Logical “not”
< >         Encloses a wild card and declares a variable

With these operators and wild cards we can build our message filter. Consider the following representative message:

NetIPC ERROR in VT; Job: #J1543; PIN 763; Info: 1

The phrase “NetIPC ERROR in VT; Job:” is constant from instance to instance. The job number, PIN and info number will vary, however. Substitute wild cards for the variable portions. After substitution, the expression looks as follows:

NetIPC ERROR in VT; Job: #<*>; PIN <#>; Info: <#>

A “*” has replaced the letter J and the job number, and “#” has replaced the PIN and the info number.

Next, assign variables to the wild card portions. The expression now looks like this:

NetIPC ERROR in VT; Job: #<*.sess>; PIN <#.pin>; Info: <#.info>

After each wild card, a period and the name of the variable have been placed. The wild card and the variable name must be inside the “< >”. If this expression matches a message, you can use the variables to print a modified message in the Active Message Browser. For example:

A NetIPC error has occurred in a VT session for Job #<sess> and Pin number <pin>. The info number is <info>

This expression will be displayed with the session number, pin number and info number from the original message. The message has been rewritten to be more user friendly. This is only a brief introduction to the pattern matching language for message filtering. There are more advanced expressions that use inequalities and more complex substitution. The documentation from HP for the OpenView software reviews this. Below is an example of a screen used to specify the above message filter. The whole re-written message scrolls beyond the Message text window.

Harvard Pilgrim employs over 350 message filters in the console filter template for the HP3000. About 120 pass messages to the Active Message Browser and the rest suppress messages.
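The wild-card substitution and variable assignment described for the NetIPC filter can be imitated with ordinary regular expressions. The sketch below is not OpenView's pattern matcher. It translates the four wild cards into approximate regular-expression equivalents (the mappings, especially for “@” and “S”, are assumptions), and the helper names `compile_pattern` and `rewrite` are hypothetical. The pattern is written with the same punctuation as the representative message so that the constant parts match literally.

```python
import re

# Approximate regular-expression stand-ins for the wild cards (assumed semantics).
WILDCARDS = {
    "*": r".*?",  # any part of a message
    "@": r"\S+",  # a character string without separators
    "#": r"\d+",  # a number or string of digits
    "S": r"\s+",  # separators such as spaces or tabs
}

# Matches <*>, <#.pin>, <@.name> and so on.
TOKEN = re.compile(r"<([*@#S])(?:\.(\w+))?>")


def compile_pattern(pattern):
    """Turn a filter expression into a compiled regular expression."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # constant text
        body = WILDCARDS[m.group(1)]
        name = m.group(2)
        parts.append(f"(?P<{name}>{body})" if name else f"(?:{body})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def rewrite(template, variables):
    """Replace <name> references in the new message text with captured values."""
    return re.sub(r"<(\w+)>", lambda m: variables[m.group(1)], template)


message = "NetIPC ERROR in VT; Job: #J1543; PIN 763; Info: 1"
pattern = "NetIPC ERROR in VT; Job: #<*.sess>; PIN <#.pin>; Info: <#.info>"
template = ("A NetIPC error has occurred in a VT session for Job #<sess> "
            "and Pin number <pin>. The info number is <info>")

match = compile_pattern(pattern).search(message)
rewritten = rewrite(template, match.groupdict())
print(rewritten)
```

Working through a filter this way against a few real console lines is a cheap check on which parts of a message are truly constant before the filter is entered in OpenView.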

From our experience, I would like to offer the following tips and warnings concerning message filters.

  • Configure the console template to pass unmatched messages. That way you won’t miss critical but unexpected messages.
  • Passing all unmatched messages means you will have to do a very thorough job of blocking extraneous messages.
  • Group your filters by function or source. Make it easy to find the filter you need to modify or verify.
  • Test every filter against a logfile extract to make sure it will work. OpenView allows you to test a single filter or a whole template against a logfile and see what is left unmatched.
  • After adding several new filters, test the whole template to make sure it will still pass unmatched messages.
  • Make use of message labels such as MsgGroup, Application and Object to simplify the display of groups of like messages.

QA in OpenView

QA in OpenView should not be overlooked. If no messages appeared on an HP3000 console for even five minutes, this would appear strange to most Operators. This is especially true on busy systems. But what if no messages for a particular system are seen in OpenView? Since we only allow important messages to be displayed, no messages might mean everything is fine and nothing needs attention. How can you be sure this is the case?

I would recommend sending test messages to the OpenView agent on a regular basis. Depending on your operation, you might want to send a series of test messages daily. Unexpected messages are the most important messages to test. Send a TELLOP message that contains a string that is not matched by any message filter. Is it displayed? Hopefully it is. You don’t want to miss any important but unexpected messages.

To test the message filters, run a job that sends messages to the console that contain critical phrases. You can also place test entries in a logfile and test your logfile readers. It is also important to test your notification systems, such as computer generated beeper messages, automatic helpdesk tickets or E-mail messages. You should do this to have more confidence that your system works when you need it.
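The same checks can be rehearsed offline before a filter set goes live. The sketch below runs a small filter list over a logfile extract, lets unmatched lines pass, and confirms that a deliberately unmatched test string gets through. It only mimics the idea and does not use OpenView's own template test facility. The filter names, the regular expressions, the `classify` and `run_extract` helpers and all sample lines except the NetIPC message are hypothetical.

```python
import re

# Hypothetical filter template: (filter name, pattern, action).
FILTERS = [
    ("NETIPC01", re.compile(r"NetIPC ERROR"), "pass"),
    ("LOGON01", re.compile(r"LOGON FOR:"), "suppress"),
]


def classify(line, filters, pass_unmatched=True):
    """Return (filter name or None, action) for one console line."""
    for name, pattern, action in filters:
        if pattern.search(line):
            return name, action
    default_action = "pass" if pass_unmatched else "suppress"
    return None, default_action


def run_extract(lines, filters):
    """Sort a logfile extract into passed, suppressed and unmatched lines."""
    report = {"pass": [], "suppress": [], "unmatched": []}
    for line in lines:
        name, action = classify(line, filters)
        report[action].append(line)
        if name is None:
            report["unmatched"].append(line)
    return report


canary = "OVTEST string that no filter should match"
extract = [
    "NetIPC ERROR in VT; Job: #J1543; PIN 763; Info: 1",
    "LOGON FOR: hypothetical user on a hypothetical LDEV",
    canary,
]

report = run_extract(extract, FILTERS)
print(report["unmatched"])
```

Reviewing the unmatched list after each batch of new filters shows both what still needs a suppress filter and whether unexpected messages would still reach the browser.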
