Rich Trapp
Manage Business Solutions
P.O. Box 2281
Fort Collins, CO 80522
970/224.1016
800/421.1016
fax: 970/416.1543
Reactive Troubleshooting
Finding the problem
Capturing output
Duplicating Aborts
Find what’s running
Recreate environment
Using command files vs. JCL
Image logging & roll forward recovery
Source Code Options
Compiler directives
COBOLII’s builtin Debug Module
Debugging displays in lower case for easy deletion
CAUSEBREAK Intrinsic
DEBUG Intrinsic
Proactive Troubleshooting
Source Code Options
Use compiler directives
Use Builtin Debug Module
Save Compiler/Prep Listings
Manage Source Code
Don’t abstract errors! Keep the specifics!
Environmental Options
System wide
;SETDUMP=”loadinfo;tr,I,d”
Common abort routine in system libraries
Create spoolfiles/log files with error details
Common Gotchas
Code changes around IF/THEN/ELSE blocks
Check the Compiler listing
Copylibs, Include Files and Dictionaries
Alignment Issues
Switch Stubs
Common Problems
Errors
File Errors (FSERR)
Loader Errors
Image errors
Program error messages vs. System error messages
Aborts
Stack overflow
Bounds violation
Data/Instruction Memory Protection Traps
Hangs/Strange behavior
Is it looping or hanging?
How to Read an Abort Trace
Overview
Examples
Run Time Options
;SETDUMP=”cmds”
RUN options
Compatibility Mode
Native Mode
;DEBUG
Resources
On MPE/iX
FOS Utilities/Free stuff
Add On Products
On the Internet
HP3000-L
Online Manuals
Electronic Support Center
I will discuss reactive and proactive troubleshooting, along with several resources that are available on MPE/iX. I will focus on COBOLII/XL, but most of the techniques described are easily ported to other languages.
What do I do once I’ve had an abort?
FILE myfile;DEV=LP
or
FILE myfile;DEV=DISC;SAVE;ACC=APPEND
:RUN program ;STDLIST=*myfile
or
:RUN program > *myfile
Build mymsg;REC=-80,1,F,ASCII;DISC=500;MSG
On terminal #1:
FILE mymsg,OLD;SHR;ACC=APPEND
RUN myprog > *mymsg
On terminal #2:
FILE mymsg;SHR
PRINT *mymsg
If $STDLIST is redirected, any terminal status request is written to the file instead of the terminal. This will cause VPLUS to hang waiting for the terminal response. If this happens:
1 Break and Abort the program
2 Reset the terminal to get out of block mode (Alt-R in Reflection 1)
3 :PRINT the $STDLIST file
4 Copy the string of characters into your clipboard
5 Restart the program as before
6 Paste the contents of clipboard onto the hung screen
Configure your terminal (or emulator) to either print or save output to a file.
Enable LOG BOTTOM and run the program. Anything going to $STDLIST should be logged.
:SHOWPROC;JOB=#Snnn
UDCs: Use SHOWCATALOG;USER=!HPUSER to see what UDC files are in use.
File Equations: Use LISTEQ to see what file equations are set; copy these to a command file.
Save Intermediate files: LISTFTEMP @.@,2 to see what (if any) temp files may be being used.
Use IMAGE logging to see what transactions happen and in what order (see also RAT/3000).
COBOL
$CONTROL VERBS, BOUNDS
Native Mode Pascal & Fortran
$CODE_OFFSETS ON, RANGE ON$
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SOURCE-COMPUTER. HP3000 WITH DEBUGGING MODE.
       PROCEDURE DIVISION.
       DECLARATIVES.
       REQD-section-name SECTION.
           USE FOR DEBUGGING ON ALL PROCEDURES.
       REQD-paragraph-name.
           DISPLAY "mypgm:" DEBUG-LINE " " DEBUG-NAME " "
                   DEBUG-CONTENTS.
       END DECLARATIVES.
       REQD-if-none-already SECTION.
      D    DISPLAY "Pgm was compiled with DEBUGGING MODE specified."
           DISPLAY "NORMAL FIRST LINE OF CODE..."
           STOP RUN.
To invoke the debug procedure, you must use ;PARM=1 on the run:
:Run mypgm;parm=1
Make quick & dirty/temporary code changes in lower case so they can be easily located later.
Acts as if you’ve hit the <BREAK> key at this point in the program.
Drops into DEBUG at this point in the program if you have PM (or write access).
What can I do to minimize resolution time before I have a problem?
You may want to modify COBCNTL.PUB.SYS & use global file equation to point to modified version.
This compiler directive enables lots of extra runtime checking on data.
Uses the variable COBRUNTIME, which is set by SETVAR. Set this in a system-level logon UDC to “MMAAAAANN”.
Perhaps use compile jobs or command files so others will be able to recompile & link properly. Save listings somehow so a recompile won’t be required if you have to read a stack trace.
If you need to recompile, you need to be sure you’re not using old code and re-introducing old bugs.
The more accurate the error message, the easier it is to determine the cause. (Even if it’s not “user friendly”)
Will cause every abort to output useful information.
Makes it easy to modify, since it’s only in one place.
Can make locating errors faster than waiting for users to notify you.
Be careful not to end the block of code with a premature period.
Did that 1 line change you made really compile?
Source may exist that you don’t see directly.
16 vs 32 bit alignment & slack bytes
Don’t forget to modify the switch stubs if the calling or called routines change!
These are messages which may or may not cause the program to stop running. The program may continue running or may choose to exit based on its error handling.
Errors that occur regarding files.
Errors that occur when trying to load and run a program.
Errors coming from the IMAGE/XL subsystem intrinsics.
It’s important to know if an error is coming from a subsystem used by a program or from the program itself. Is this an error generated by the program or a system generated error which is triggered by something the program does?
These are errors which cause the system to terminate the process and/or job which induced the error.
This is caused by trying to use more stack space than the process is allowed.
This occurs when an instruction tries to access memory that is not allocated to this process. It is usually an indication of a bad parameter or corrupted variables being used after they have been corrupted. This is a compatibility mode error.
These errors indicate that a native mode program is having a similar problem to a CM Bounds Violation.
Repeat :SHOWPROC;JOB=#Snnnn & watch the CPU totals.
Does it lock any files or databases?
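The CPU check above can be made mechanical by noting the CPU total on each pass and comparing the readings. The sketch below uses Python purely for illustration; the function name and the sample numbers are hypothetical, and the readings are assumed to be typed in by hand from the repeated :SHOWPROC output rather than read by any MPE/iX tool.

```python
def classify_process(cpu_samples):
    """Guess whether a process is looping or hanging from CPU totals.

    cpu_samples: CPU totals (in seconds) noted on repeated checks,
    oldest first. Hypothetical input, entered by hand.
    """
    if len(cpu_samples) < 2:
        return "unknown"      # need at least two readings to compare
    if cpu_samples[-1] > cpu_samples[0]:
        return "looping"      # CPU is still being consumed
    return "hanging"          # no CPU used: waiting on something


print(classify_process([12.4, 15.9, 19.3]))  # CPU total climbing
print(classify_process([12.4, 12.4, 12.4]))  # CPU total flat
```

A climbing total only shows the process is consuming CPU; it may be doing legitimate work rather than looping, so treat the label as a first guess. A flat total points toward a wait, which is where the question about locked files or databases comes in.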
Stack trace is written from the bottom up. The oldest code is at the bottom, the most recent code is on the top. The last column of the trace is routine_name+$nnnn where nnnn is the offset from beginning of this routine.
Use :SETDUMP to get full trace.
Find the ‘highest’ (most recent) level on the trace that’s not error handling code (skip any level where “routine_name” isn’t yours or is obviously trap handling code).
Find verbs/code_offsets listing on compile for that code
Find the pb_loc value closest to (without going over) the offset listed in the abort trace. This will be the beginning of the line of code on which the error occurred (or the call to the routine which actually had the problem).
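The "closest without going over" lookup can be sketched as a small search. This is an illustration in Python, not a tool that exists on MPE/iX: the function names, the listing rows, and the trace entry are all made up, and both the trace offset and the pb_loc values are assumed to be hexadecimal.

```python
import bisect


def parse_offset(trace_entry):
    """Split a 'routine_name+$nnnn' trace column into name and offset."""
    routine, _, offset = trace_entry.partition("+$")
    return routine, int(offset, 16)  # assumes the offset is hex


def find_statement(listing, offset):
    """Return the listing row with the largest pb_loc that is <= offset.

    listing: (pb_loc, source_line) pairs sorted by pb_loc.
    """
    locs = [loc for loc, _ in listing]
    i = bisect.bisect_right(locs, offset) - 1
    if i < 0:
        raise ValueError("offset is before the first listed statement")
    return listing[i]


# Made-up listing rows and trace entry, for illustration only.
listing = [(0x10, "MOVE A TO B"), (0x2C, "ADD 1 TO CTR"), (0x48, "CALL 'SUB1'")]
routine, offset = parse_offset("mypgm+$3a")
print(routine, find_statement(listing, offset))
```

Here offset $3a falls between pb_loc $2C and $48, so the statement starting at $2C is the one reported.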
No examples yet.
Defines how much memory the program can use. Maximum setting is 31232; larger numbers are accepted, but rounded down to the maximum. If this setting is too low, stack overflow errors can occur. If it’s too big you may see FSERR 74 “No room left in stack segment for another file entry” (very rare!).
This will move some file stuff out of your program’s stack space and give you more room back. Also helps FSERR 74.
Using ;MAXDATA=31232;NOCB on the run is the biggest stack you can get. If you still get stack overflow problems with this on the run, it’s time to tweak the source code.
This tells the loader to try to resolve external calls the program makes in the order specified:
LIB=G means search in SL.<program group>.<program account> first; then search SL.PUB.<program account>; then search SL.PUB.SYS
LIB=P means to start searching at SL.PUB.<program account>; then search SL.PUB.SYS
LIB=S means to only search SL.PUB.SYS
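The three search orders can be written out as a lookup. This is a Python sketch for illustration only; the function name and the example group and account names are hypothetical, and it models just the order described here, not the loader itself.

```python
def sl_search_order(lib, group, account):
    """List the SL files searched for a CM program, per the LIB= option."""
    group_sl = f"SL.{group}.{account}"
    pub_sl = f"SL.PUB.{account}"
    system_sl = "SL.PUB.SYS"
    orders = {
        "G": [group_sl, pub_sl, system_sl],  # group, then PUB, then system
        "P": [pub_sl, system_sl],            # PUB, then system
        "S": [system_sl],                    # system library only
    }
    return orders[lib.upper()]


# Hypothetical program living in group DEV of account ACCTG.
print(sl_search_order("G", "DEV", "ACCTG"))
```

Listing the order this way makes it easy to see which library supplies a routine when the same name exists in more than one SL.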
This causes a listing of what routines are called & where they are found. This is very useful for determining where a certain routine is coming from. The formal file designator is SEGLIST and defaults to $STDLIST.
A list of XL names which are searched in order for called external routines.
Similar to the behavior of this for CM programs except the library must be called “XL” instead of “SL”.
Similar to the CM version, but much more verbose.
This entry specifies a routine to be used to resolve any unresolved external calls rather than generating a loader error. A “safe” routine to use here is “resetcontrol”.
This is rarely needed, but can be used to solve a valid “NM STACK OVERFLOW” error. Similar to ;MAXDATA for CM programs.
Displays the error message corresponding to a subsystem and error message number. Enter “M” for messages, then the subsystem number, then the actual error number. You must know (or guess) the subsystem number.
2 CI Messages (CIERR xx)
8 CM File system error Messages (FSERR xx)
9 CM Loader Messages (LDERR xx)
143 NM File System
Converting between hex, octal or decimal
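For readers away from the system, the same conversion can be sketched in a few lines of Python. The function name is hypothetical, and the prefixes are an assumption based on the notation used in this talk: "$" for hex (as in the trace offsets), "%" for octal, and "#" or no prefix for decimal.

```python
def show_all_bases(text):
    """Parse a number and return it in decimal, hex, and octal.

    Assumed prefixes: '$' hex, '%' octal, '#' or none for decimal.
    """
    bases = {"$": 16, "%": 8, "#": 10}
    base = bases.get(text[0], 10)
    digits = text[1:] if text[0] in bases else text
    value = int(digits, base)
    return f"#{value} ${value:X} %{value:o}"


print(show_all_bases("$1F"))  # hex input
print(show_all_bases("%17"))  # octal input
print(show_all_bases("255"))  # decimal input
```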
;CHAR;HEX to look at data
;SUBSET= to see subsets of records
Query’s SAVE function to see buffers
Somewhat painful, but good method of testing specific IMAGE calls without writing a program. Good for duplicating IMAGE errors.
The poor man’s glance!
with “SUBSCRIBE HP3000-L firstname lastname”
Electronic Support Center