The usual method of identifying problems is to characterize the situation in
which the problem occurs and then investigate which of the possible causes are
actually responsible for the problem. Finding the cause is often sufficient to
suggest the resolution of the problem. For example, assume that the problem is
characterized as "the user is unable to open a line with the DSLINE
command." A possible cause is that the user entered a command using incorrect
syntax. You would resolve the problem by correcting the command and reissuing
it. However, if the syntax was correct, you would have to look for another
possible cause, such as an inactive link or a failure of the remote node.
Thus, in most cases you start with the characterization of
the problem and investigate the possible causes. The difficult part
of troubleshooting is to identify the actual cause of the problem.
Once you know the actual cause, you can take the appropriate action
to resolve the problem.
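The elimination process described above can be sketched in a few lines of Python. This is only an illustration, not part of NS 3000/iX: the `Cause` class, the `find_cause` function, and the `observations` dictionary are hypothetical names, and the checks stand in for whatever manual investigation you would actually perform. The actions listed for the link and the remote node are placeholders, because the text names only the resolution for a syntax error.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Cause:
    """One possible cause of a characterized problem (hypothetical structure)."""
    description: str
    is_present: Callable[[], bool]  # check performed by the person troubleshooting
    action: str                     # what to do if this cause is confirmed


def find_cause(causes: List[Cause]) -> Optional[Cause]:
    """Return the first possible cause whose check confirms it, or None."""
    for cause in causes:
        if cause.is_present():
            return cause
    return None


# Problem characterization: "the user is unable to open a line with the
# DSLINE command." The observations are made-up values for illustration.
observations = {"syntax_ok": True, "link_active": False, "remote_node_up": True}

causes = [
    Cause("command entered with incorrect syntax",
          lambda: not observations["syntax_ok"],
          "correct the command and reissue it"),
    Cause("inactive link",
          lambda: not observations["link_active"],
          "investigate the link (placeholder action)"),
    Cause("failure of the remote node",
          lambda: not observations["remote_node_up"],
          "investigate the remote node (placeholder action)"),
]

actual = find_cause(causes)
if actual is None:
    print("No listed cause confirmed; look for other possible causes.")
else:
    print(f"Actual cause: {actual.description}; action: {actual.action}")
```

The order of the list reflects the order in which the text checks the causes: syntax first, then the link and the remote node.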
Characterize the Problem
It is important to ask questions when you are trying to characterize a problem.
Start with global questions and gradually get more specific. Depending
on each response, ask another series of questions until you have
enough information to understand exactly what happened.
Key questions to ask are as follows:
1. Was an error message generated? Use the NS 3000/iX Error
   Messages Reference Manual to look up the cause of the error and
   take the action suggested. If this does not resolve the problem,
   continue with the next question.
2. Is the problem isolated to one user or program? If so, continue to
   the next question. If more than one user is involved, proceed to
   question 6.
3. Did the user perform the operation correctly? Was syntax correct?
   Does the user have the correct logon and authority to use the command
   or service? Correct any problems found. If the operation was correct,
   continue with the next question.
4. Did the problem occur while the user was running a program? Were
   there program errors? If so, investigate and correct the program
   errors. Otherwise, continue with the next question.
5. Did the problem occur while attempting to open a line or transmit
   data? If so, investigate the connection between this system and the
   remote system.
6. If more than one user is involved, does the problem affect all
   users? The entire node? If so, has anything changed recently? Some
   possibilities are:
   - New software and hardware installation.
   - Same hardware but changes to the software. Has the
     configuration file been modified? Has the MPE/iX configuration
     been changed?
   - Same software but changes to the hardware.
7. Do you suspect hardware or software?
It is often difficult to determine whether the problem is hardware or
software related. Symptoms that mean you should suspect the hardware
are:
- Bad LAN card or PSI dumps.
- Link-level errors, either returned to the user or logged to
  the console. This includes CI errors, NMERR errors, power fails,
  and link shutdowns.
- Lost data: data is sent but not received at the link
  destination. (This could also be caused by a software
  problem.)
Symptoms that mean you should suspect the software are:
- Logging messages at the console.
- Network Services errors returned to users or programs.
- MPE/iX file system (FSERR) or command interpreter (CIERR) errors
  (except "Remote Not Responding" errors).
- Data corruption.
- Terminal hangs.
- Intermittent errors.
- Network-wide problems.
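As a rough aid, the two symptom lists above can be expressed as a simple tally. The following Python sketch is an assumption-laden illustration only: the symptom labels and the `suspect_area` function are hypothetical names invented here, and counting symptoms is a simplification, since the text itself notes that lost data can also have a software cause.

```python
from typing import Iterable

# Hypothetical labels for the hardware symptoms listed above.
HARDWARE_SYMPTOMS = {
    "lan_card_or_psi_dump",
    "link_level_error",   # includes CI errors, NMERR errors, power fails, link shutdowns
    "lost_data",          # may also be caused by a software problem
}

# Hypothetical labels for the software symptoms listed above.
SOFTWARE_SYMPTOMS = {
    "console_logging_message",
    "network_services_error",
    "fserr_or_cierr",     # excluding "Remote Not Responding" errors
    "data_corruption",
    "terminal_hang",
    "intermittent_error",
    "network_wide_problem",
}


def suspect_area(observed: Iterable[str]) -> str:
    """Return 'hardware', 'software', or 'undetermined' from a symptom tally."""
    seen = set(observed)
    hardware_hits = len(seen & HARDWARE_SYMPTOMS)
    software_hits = len(seen & SOFTWARE_SYMPTOMS)
    if hardware_hits > software_hits:
        return "hardware"
    if software_hits > hardware_hits:
        return "software"
    return "undetermined"


print(suspect_area({"lan_card_or_psi_dump", "link_level_error"}))
print(suspect_area({"lost_data", "terminal_hang"}))
```

A tie, as in the second call, returns "undetermined" rather than forcing a guess, which matches the caution that the distinction is often difficult to make.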
Identify the Cause of Problems
How you investigate the possible causes of a problem depends on whether
the problem affects a single user or situation or is node-wide.
Once you have the answers to the
questions listed previously, use the flowchart in
Figure 4-1 "Characterizing the Problem" as
a guide and see Chapter 5 "Common Network Problems" for
a problem resolution strategy.
Figure 4-1 Characterizing the Problem
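The question sequence listed under "Characterize the Problem" can also be approximated in code. The Python sketch below is not a transcription of Figure 4-1. It only follows the order of the key questions as written, and the `Answers` class, its field names, and the `next_step` function are hypothetical. Falling through to the hardware-or-software question when no earlier question yields an action is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Answers to the key questions (hypothetical field names)."""
    error_message: bool = False             # was an error message generated?
    error_action_tried: bool = False        # was the manual's suggested action already taken?
    single_user_or_program: bool = True     # is the problem isolated to one user or program?
    operation_correct: bool = True          # syntax, logon, and authority all correct?
    program_errors: bool = False            # program errors while running a program?
    line_or_transfer_failure: bool = False  # failed opening a line or transmitting data?
    recent_change: bool = False             # recent software, hardware, or configuration change?


def next_step(a: Answers) -> str:
    """Suggest the next troubleshooting step from the answers given."""
    if a.error_message and not a.error_action_tried:
        return "Look up the error in the error messages manual and take the suggested action."
    if a.single_user_or_program:
        if not a.operation_correct:
            return "Correct the syntax, logon, or authority problem."
        if a.program_errors:
            return "Investigate and correct the program errors."
        if a.line_or_transfer_failure:
            return "Investigate the connection between this system and the remote system."
    elif a.recent_change:
        return "Review the recent software, hardware, or configuration changes."
    # Assumption: with no earlier action indicated, move on to the last question.
    return "Decide whether the symptoms point to hardware or software."


print(next_step(Answers(error_message=True)))
print(next_step(Answers(single_user_or_program=False, recent_change=True)))
```

Each call returns a single suggested step; after acting on it, you would update the answers and call the function again.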