STREAMS/UX for the HP 9000 Reference Manual > Chapter 6 Debugging STREAMS/UX Modules and Drivers

Using adb

This section describes how to use adb on core dumps obtained following a system crash. See "Generating and Retrieving System Core Dumps" for information on how these dumps are obtained. adb can also be used to examine a system that is currently running.

See the adb(1) man page or the ADB Tutorial for more information.

Invoking adb

When using adb on a system core dump, you must use the "-k" option. This option tells adb to treat the core dump as a system core dump instead of a user process core dump, which is organized differently. For example, to invoke adb on the dump pair vmunix.1 and vmcore.1, enter the following:

adb -k vmunix.1 vmcore.1

When using adb on a running HP-UX system, you also use the "-k" option, and use /stand/vmunix as the object file and /dev/mem as the core file:

adb -k /stand/vmunix /dev/mem

You will probably need to be superuser to access /dev/mem. Because you are examining a running (and continuously changing) system, adb will not be able to set you up in any specific process context, but you will be able to examine kernel global variables.

Context on Entry to adb

adb maintains a set of registers corresponding to the registers of the machine. The adb command $r prints the values of these registers. When adb is invoked on a system core dump with the -k option, it sets these registers to the values of the machine registers at the time the system core dump was taken. These are not the values the registers contained when the panic or trap occurred. They are the values the registers contained when the kernel started dumping a copy of physical memory to the swap area. How to use these "dump time" register values to determine the "panic time" register values (the register contents at the moment the trap or panic occurred) is described later. The "panic time" register values let you examine the context of the process that was running at the time of the system crash.

Debugging Hung Systems

If the system core dump is from a transfer of control (TOC) of a hung system, adb will be unable to determine the "dump time" or "panic time" register values. In these cases, adb can still be used to determine the contents of the kernel message buffer (see "Finding the Panic Message"), and to examine kernel global variables (see "Obtaining Important Kernel Global Variables"), but it will not be able to give you a stack trace or context for the process that was running at the time of the system crash.

It is especially important, when looking at a dump from a system which appeared to be hung, to check the kernel globals freemem, freemem_cnt, and avenrun. These variables may indicate that your system was out of memory or was overloaded. (See "Obtaining Important Kernel Global Variables" for more information.)

It can also be helpful, before doing a TOC on a system which appears hung, to determine how complete the system paralysis is. The following table describes hang symptoms, from the least severe to the most severe. This table may help you determine where your system fits on this continuum.

Symptom

Explanation

Some processes, like your shell or your tests, do not run, but other processes are running.

Your system is not hung, but there is some other problem holding back your processes. If you have a terminal session that is working, use strdb and adb to look at the kernel and the STREAMS/UX subsystem state.

You cannot log in, either locally or remotely.

Your system may not be hung. Its networking software state, terminal I/O, or getty processes may be deadlocked in some way. If you have a terminal session that is working, use strdb and adb to look at the kernel and the STREAMS/UX subsystem state.

You cannot ping your system.

Your system may not be hung. Its networking software state may be deadlocked in some way. If you have a terminal session that is working, use strdb and adb to look at the kernel and the STREAMS/UX subsystem state.

Carriage returns do not echo on the console or on other login sessions.

Your system is hung, but is probably TOC-able. TOC the system and examine the kernel globals in the dump.

Your system has an LED activity display which is not being updated; it is showing no system activity at all.

Your system is hung, but is probably TOC-able. TOC the system and examine the kernel globals in the dump.

Your system has an access port enabled, and typing CTRL-b on the console gives no response. Alternatively, your system has no access port and your attempts to TOC it fail.

Your system is ignoring very high-level interrupts, and it is so thoroughly hung that you will probably be unable to TOC it. Hangs as severe as this are extremely rare. Hit the system reset button, and try to debug the problem using other methods such as code reviews, panics, or printfs.

Finding the Panic Message

The kernel maintains a circular message buffer into which text can be printed using the kernel printf, msg_printf, and cmn_err routines. At the time of a panic, a panic message is printed to this buffer. A stack trace consisting of instruction addresses in hexadecimal is also printed out, as well as the current instruction and data addresses being accessed at the time of the crash. Other interesting information may also be located in the buffer, such as system boot-up messages and kernel error messages that may help pin down the cause of the panic. To print out this buffer, invoke adb on the system dump and type the following:

msgbuf+10/s

Examples of msgbuf contents are included in the examples at the end of this chapter.

Interpreting the Panic Stack Trace

adb can be used to translate the hexadecimal stack trace printed after the panic message into procedure names and offsets. For each hexadecimal number in the stack trace, use the adb i command to determine where in the kernel the address occurs. For example, the hex stack trace below can be deciphered as follows:

PC-Offset Stack Trace (read across, most recent is 1st):
0x0016da70 0x000e5a68 0x000d34cc 0x0009ea14 0x00099714 0x00092fdc
0x0006e0c8 0x0006dbb8 0x0006d2a8 0x001954e8 0x00194fa4 0x000b7e24
0x001846d4 0x00181730 0x00156538 0x00156af8 0x001567b8 0x000e6d80
0x000d3aac
End Of Stack

In adb (text preceded by "#" is a comment):

0x0016da70/i                          # use of adb i command
panic+30:       addil   -1000,dp      # adb's response
0x000e5a68/i
trap+0xADC:     b       trap+1004
0x000d34cc/i
$call_trap+20:  rsm     1,r0
0x0009ea14/i
flushq+60:      ldbs    0xD(r21),r22
0x00099714/i
q_free+1C:      ldw     -0xA4(sp),r31
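The lookup that the i command performs on each address can be sketched in Python. This is a minimal illustration and not part of adb. It rejoins hex numbers that the console wrapped across lines, then reports each address as the nearest preceding symbol plus a hex offset. The symbol table `SYMBOLS` is hypothetical: its start addresses were back-computed from the adb responses above (for example, panic = 0x16DA70 - 0x30). On a real system they would come from the kernel's symbol table in vmunix.

```python
import bisect

# Hypothetical symbol table: start address -> procedure name.
# Addresses are back-computed from the adb responses shown above.
SYMBOLS = {
    0x996F8: "q_free",
    0x9E9B4: "flushq",
    0xD34AC: "$call_trap",
    0xE4F8C: "trap",
    0x16DA40: "panic",
}


def parse_stack_trace(text):
    """Return the PC offsets listed in a panic 'PC-Offset Stack Trace' block.

    The console can wrap a line in the middle of a number (for example
    '0x0009' followed by '2fdc' on the next line), so any token that does
    not start with '0x' is appended to the previous number.
    """
    tokens = []
    in_trace = False
    for line in text.splitlines():
        if "Stack Trace" in line:
            in_trace = True
            continue
        if line.strip() == "End Of Stack":
            break
        if not in_trace:
            continue
        for token in line.split():
            if token.lower().startswith("0x"):
                tokens.append(token)
            elif tokens:
                tokens[-1] += token  # tail of a wrapped number
    return [int(t, 16) for t in tokens]


def symbolize(addr, symbols):
    """Format addr as 'name+0xoffset' using the closest symbol at or below it."""
    starts = sorted(symbols)
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)  # below every known symbol
    return f"{symbols[starts[i]]}+{addr - starts[i]:#x}"


if __name__ == "__main__":
    trace = """PC-Offset Stack Trace (read across, most recent is 1st):
0x0016da70 0x000e5a68 0x000d34cc 0x0009ea14 0x00099714 0x0009
2fdc
End Of Stack"""
    for addr in parse_stack_trace(trace)[:5]:
        print(f"{addr:#010x}  {symbolize(addr, SYMBOLS)}")
```

With the table above, the first five addresses come out as panic+0x30, trap+0xadc, $call_trap+0x20, flushq+0x60, and q_free+0x1c. These match adb's responses.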

Manual Stack Back-Tracing

You may need to use adb to manually back-trace your stack. This is necessary when the hexadecimal stack trace printed by panic is incomplete. For example, panic may print a few hex addresses and then the message:

stktrc:  cannot find descriptor

or

stktrc: cannot find rp

You may also need to do a manual stack back-trace if you want to find out what arguments the routines in your stack trace were called with. To do so, you need the value of the stack pointer for each routine in the stack, and a manual stack back-trace gives you these values.

PA-RISC Procedure Calling Conventions Overview

The following is a very brief overview of the PA-RISC procedure calling convention. More information can be obtained from the PA-RISC Procedure Calling Conventions Reference Manual.

PA-RISC machines have 32 general use registers. Apart from r0, which always contains zero, these registers are physically identical. However, the PA-RISC operating systems and compilers assign them different roles so that procedure calls can take place efficiently and consistently. The following table lists these special roles:

Table 6-1 General Use Register Roles

Register

Role

r0

Value is always zero.

r1

Scratch register.

r2

Return pointer, also known as rp. This is the instruction address the called procedure will return to when it is finished executing.

r3 - r18

Callee saves. If the called procedure wishes to modify any of these registers, it must save the original contents on its stack and restore the contents before returning to the caller.

r19 - r22

Caller saves. The called procedure is free to modify these registers without saving the original contents. If the calling procedure wants to retain the contents, it must save them before making the procedure call and restore them after the call returns.

r23 - r26

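First four procedure arguments, also known as arg0, arg1, arg2, and arg3. The argument registers are assigned in descending order: arg0 is r26, arg1 is r25, arg2 is r24, and arg3 is r23. The calling procedure loads the first four procedure arguments into these registers before making the procedure call.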

r27

Global data pointer, also known as dp.

r28 - r29

Procedure return values, also known as ret0 and ret1. The called procedure loads the return values into these registers before returning.

r30

Stack pointer, also known as sp.

r31

Millicode return pointer, or scratch register.

 

The only registers you need to be concerned with for manual stack back-tracing are r2 (rp) and r30 (sp), although the other registers become important when trying to determine what arguments a procedure in the trace was called with.
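When reading disassembly, it helps to translate between register numbers and the software names that adb prints. The Python sketch below encodes Table 6-1. Note that the argument registers are numbered downward, so arg0 is r26 and arg3 is r23. The helper names are illustrative only.

```python
# Software names adb uses for PA-RISC general registers (see Table 6-1).
# The argument registers count down: arg0 is r26, arg3 is r23.
ALIASES = {
    2: "rp",
    23: "arg3", 24: "arg2", 25: "arg1", 26: "arg0",
    27: "dp",
    28: "ret0", 29: "ret1",
    30: "sp",
}
NUMBERS = {name: num for num, name in ALIASES.items()}

# (first register, last register, role) ranges from Table 6-1.
ROLES = [
    (0, 0, "always zero"),
    (1, 1, "scratch"),
    (2, 2, "return pointer"),
    (3, 18, "callee saves"),
    (19, 22, "caller saves"),
    (23, 26, "procedure argument"),
    (27, 27, "global data pointer"),
    (28, 29, "procedure return value"),
    (30, 30, "stack pointer"),
    (31, 31, "millicode return pointer or scratch"),
]


def reg_name(num):
    """Return the name adb is likely to print for general register num."""
    if not 0 <= num <= 31:
        raise ValueError(f"no general register r{num}")
    return ALIASES.get(num, f"r{num}")


def reg_number(name):
    """Return the register number for names such as 'r5', 'sp', or 'arg0'."""
    if name in NUMBERS:
        return NUMBERS[name]
    if name.startswith("r") and name[1:].isdigit() and int(name[1:]) <= 31:
        return int(name[1:])
    raise ValueError(f"unknown register name {name!r}")


def reg_role(num):
    """Summarize the calling-convention role of general register num."""
    for low, high, role in ROLES:
        if low <= num <= high:
            return role
    raise ValueError(f"no general register r{num}")
```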

To implement these register roles, each procedure allocates a stack frame when it starts and stores in it any callee save registers that it plans to modify. On PA-RISC the stack grows toward higher addresses, so the stack frame is allocated simply by incrementing sp by the size of the stack frame needed, using either the stwm or the ldo instruction. For example, below are the instructions that create the stack frame for ioctl. Numbers in brackets ([ ]) refer to the notes below.

ioctl:          stw     rp,-14(sp)      [1]
ioctl+4:        stwm    r3,100(sp)      [2]
ioctl+8:        stw     r4,-0xFC(sp)    [3]
ioctl+0xC:      stw     r5,-0xF8(sp)    [4]
ioctl+10:       stw     r6,-0xF4(sp)    [5]

[1] Store the return instruction address (rp) at sp - 0x14. Because sp has not been incremented yet, this location is in the caller's stack frame, not the callee's stack frame.

[2] Store the contents of r3 at the current sp, then allocate the stack frame by adding 0x100 to sp. The stwm instruction stands for store word and modify.

[3] Store the contents of r4 at sp - 0xFC, just below where you stored r3.

[4] Store the contents of r5 at sp - 0xF8, just below where you stored r4.

[5] Store the contents of r6 at sp - 0xF4, just below where you stored r5.

The instruction ldo (load offset) can be used instead of stwm to allocate the stack frame. For example:

doadump:        stw     rp,-14(sp)      [1]
doadump+4:      ldo     30(sp),sp       [2]

[1] Store return instruction address in caller's stack frame.

[2] Add 0x30 to the current value in register sp and store the result in sp, allocating stack frame.
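The two prologue patterns above are regular enough to analyze mechanically. The Python sketch below is an illustration under simplifying assumptions. It understands only stw, stwm with a positive displacement, and ldo into sp, and it reads displacements as hexadecimal, the way adb prints them. It returns the frame size and, for each saved register, that register's offset from the callee's sp after the frame has been allocated.

```python
def analyze_prologue(listing):
    """Return (frame_size, slots) for a procedure's entry sequence.

    listing -- prologue instructions as adb prints them, with or without
               a leading 'proc+offset:' label.
    slots   -- register name -> offset from the callee's sp *after* the
               frame is allocated (rp lands in the caller's frame, so its
               offset is below the whole new frame).
    """
    frame = 0         # bytes added to sp so far
    entry_slots = {}  # register -> offset from the sp at procedure entry
    for line in listing:
        text = line.split(":", 1)[-1].strip()  # drop an adb label
        parts = text.split(None, 1)
        if len(parts) != 2:
            continue
        mnemonic, operands = parts
        ops = operands.split(",")
        if mnemonic == "stw" and ops[1].endswith("(sp)"):
            # stw reg,d(sp): store at the current sp + d
            entry_slots[ops[0]] = frame + int(ops[1][:-4], 16)
        elif mnemonic == "stwm" and ops[1].endswith("(sp)"):
            # stwm reg,d(sp) with d > 0: store at sp, then add d to sp
            entry_slots[ops[0]] = frame
            frame += int(ops[1][:-4], 16)
        elif mnemonic == "ldo" and ops[0].endswith("(sp)") and ops[1] == "sp":
            # ldo d(sp),sp: add d to sp
            frame += int(ops[0][:-4], 16)
    slots = {reg: off - frame for reg, off in entry_slots.items()}
    return frame, slots
```

For ioctl's prologue the sketch reports a frame size of 0x100, with r3 at sp - 0x100 and r4 at sp - 0xFC, as in notes [2] and [3]. For doadump it reports 0x30.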

Basic Stack Back-Tracing

Given the stack pointer, sp, and the current instruction address, pcoqh, it is possible to get the previous stack pointer and instruction address. The starting values for sp and pcoqh are obtained from the adb $r command. As mentioned above, when adb is invoked on a system core with the -k option, it sets these registers to the values of the machine registers at the time the system core dump was taken. The $r command prints out these registers. Below are the first few lines of the $r display.

pcsqh 0         pcoqh     24B34 doadump+0xEC
pcsqt 0         pcoqt     0 _fp_status
rp    0xDBF48   panic_boot+354

arg0  1         arg1      0xC57B    arg2  2000    arg3  9BD70152
sp    20F380    ret0      303847    ret1  797     dp    1F6000

There are four steps to back-tracing a stack:

  1. Determine the size of the current stack frame.

    The size of the current stack frame is simply the amount the sp is incremented at the entry to the current procedure. To find that number, use adb to print out the first few instructions of the current procedure. To determine the initial current procedure, look at the value of the register pcoqh, which appears at the end of the first line of the $r output. In most cases, this initial procedure will be doadump.

    doadump/3i
    doadump:        stw     rp,-14(sp)
                    ldo     30(sp),sp
                    mfctl   iva,r22

    doadump's second instruction is an ldo which increments the stack pointer by 0x30, so doadump's stack frame size is 0x30.

  2. Determine the previous stack pointer.

    The previous stack pointer is the current stack pointer, minus the current stack frame size. adb can be used to keep track of the sp register by calculating the previous stack pointer using the following adb commands:

    <sp-0x30>sp     [1]
    .=X             [2]
    20F350          [3]

    [1] Take the current value of the sp register, decrement it by 0x30, and store the result back into the sp register. See adb documentation for more information on adb registers and the "<" and ">" operators.

    [2] Print out the new value of sp. This information should be saved in case you need to find out the contents of registers which have been pushed onto the stack frame. See adb documentation for more information about the concept of ".", the current location in the core file.

    [3] adb output in response to the previous command, .=X

  3. Find the current return pointer.

    Your current procedure is doadump, and you have just set sp so that it is the same value it was when doadump was first entered, before the ldo instruction was executed. Recall that doadump's first instruction is:

    stw     rp,-14(sp)

    Because you have just set sp to the same value it had when doadump's first instruction was executed, you can find the rp by looking at what is in sp-0x14:

    <sp-0x14/X                                    [1]
    crash_monarch_stack+1EC: 0xDBF48 [2]

    [1] Print out the value of the location sp-0x14 in hexadecimal.

    [2] adb's response. crash_monarch_stack+1EC can safely be ignored. 0xDBF48 is the instruction address which was in rp.

  4. Find out which procedure the return pointer points to.

    The adb i command will tell you this:

    0xDBF48/i                                              [1]
    panic_boot+354: comibt,=,n 0,ret0,panic_boot+368 [2]

    [1] use of the i command

    [2] adb's response

Notice that the $r command has already indicated that rp corresponds to panic_boot+354.

To continue back-tracing the stack, iterate the four steps shown above. Here is the adb sequence of commands and responses to trace the next two levels back in this stack. Text preceded by "#" is a comment.

panic_boot/3i                   # look at beginning of panic_boot for stack frame size
panic_boot:
panic_boot:     stw     rp,-14(sp)
                stwm    r3,80(sp)       # stack frame size is 0x80
                stw     r4,-7C(sp)
<sp-0x80>sp                     # calculate new sp
.=X                             # print out new sp
20F2D0
<sp-0x14/X                      # find rp in caller's stack frame
crash_monarch_stack+16C:        0xDB938
0xDB938/i                       # what instruction address does rp correspond to?
boot+24:        addil   0,dp
boot/3i                         # look at beginning of boot for stack frame size
boot:
boot:           stw     rp,-14(sp)
                stwm    r3,80(sp)       # stack frame size is 0x80
                stw     r4,-7C(sp)
<sp-0x80>sp                     # calculate new sp
.=X                             # print out new sp
20F250
<sp-0x14/X                      # find rp in caller's stack frame
crash_monarch_stack+0xEC:       1518A4
1518A4/i                        # what instruction address does rp correspond to?
panic+0xF0:     ldw     -94(sp),rp
panic/3i                        # look at beginning of panic for stack frame size
panic:
panic:          stw     rp,-14(sp)
                stwm    r3,80(sp)       # stack frame size is 0x80
                stw     r4,-7C(sp)

If you are doing a manual stack back-trace in order to find out values of registers which have been pushed onto the stack, it is useful to save the results of the four steps at each iteration for future reference. A table such as the following can be helpful:

sp          pcoqh       Procedure Address   Frame Size
20F380      24B34       doadump+0xEC        0x30
20F350      0xDBF48     panic_boot+354      0x80
20F2D0      0xDB938     boot+24             0x80
20F250      1518A4      panic+0xF0          0x80
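Iterated, the four steps reduce to a short loop. The Python sketch below is illustrative only. Dictionaries stand in for the dump and for adb's answers, and they hold only the values read in the session above: the words at each sp - 0x14, the procedure that adb's i command reported for each address, and each procedure's frame size. The sketch ignores the exceptions described next and stops when it needs a word it was not given.

```python
# Words read from the dump with <sp-0x14/X in the session above.
MEMORY = {0x20F33C: 0xDBF48, 0x20F2BC: 0xDB938, 0x20F23C: 0x1518A4}
# Procedure that adb's i command reported for each instruction address.
PROC_OF = {0x24B34: "doadump", 0xDBF48: "panic_boot",
           0xDB938: "boot", 0x1518A4: "panic"}
# Frame sizes read from each procedure's prologue with <proc>/3i.
FRAME_SIZE = {"doadump": 0x30, "panic_boot": 0x80, "boot": 0x80, "panic": 0x80}


def back_trace(sp, pcoqh, memory, proc_of, frame_size, max_depth=32):
    """Return (sp, pcoqh, procedure, frame size) rows, newest frame first."""
    rows = []
    pc = pcoqh
    while pc in proc_of and len(rows) < max_depth:
        proc = proc_of[pc]          # step 4: which procedure is this?
        size = frame_size[proc]     # step 1: size of its stack frame
        rows.append((sp, pc, proc, size))
        sp -= size                  # step 2: previous stack pointer
        rp_slot = sp - 0x14         # step 3: rp saved in the caller's frame
        if rp_slot not in memory:
            break                   # no more dump data in this example
        pc = memory[rp_slot]
    return rows


if __name__ == "__main__":
    for sp, pc, proc, size in back_trace(0x20F380, 0x24B34, MEMORY, PROC_OF, FRAME_SIZE):
        print(f"{sp:X}  {pc:X}  {proc}  {size:#x}")
```

Starting from the $r values (sp 20F380, pcoqh 24B34), the loop produces the four rows of the table above.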

Exceptions to the Four Steps

The four basic steps of stack back-tracing have some exceptions:

  • panic: If your procedure address is in panic, you need to take special steps to find out the true value of your current stack pointer. Instead of being the previous sp minus the previous frame size, panic's sp can be found at location panic_save_state. Do the following to find the value using adb and reset adb's copy of sp:

    panic_save_state/X                        [1]
    panic_save_state:                         [2]
    panic_save_state: 7FFE6F48
    7FFE6F48>sp                               [3]

    [1] Ask adb to print out location panic_save_state in hex.

    [2] These two lines are adb's response. panic's actual sp is 7FFE6F48.

    [3] Reset sp to the correct address.

    Now that you have panic's real stack pointer, the other steps in the back-tracing process can be executed normally. Text preceded by "#" is a comment.

    <sp-0x80>sp                 # calculate new sp
    .=X                         # print out new sp
    7FFE6EC8
    <sp-0x14/X                  # find rp in caller's stack frame
    7FFE6EB4:       0xDF108
    0xDF108/i                   # what instruction address does rp correspond to?
    trap+0xA28:     b       trap+0xF18
    trap/3i                     # look at beginning of trap for stack frame size
    trap:
    trap:           stw     rp,-14(sp)
                    stwm    r3,100(sp)      # stack frame size is 0x100
                    stw     r4,-0xFC(sp)
    <sp-0x100>sp                # calculate new sp
    .=X                         # print out new sp
    7FFE6DC8
    <sp-0x14/X                  # find rp in caller's stack frame
    7FFE6DB4:       0xD0BD4
    0xD0BD4/i                   # what instruction address does rp correspond to?
    $call_trap+20:  rsm     1,r0

  • $call_trap, $call_int, $ihndlr_rtn, $thndlr_rtn, $RDB_trap_patch, $RDB_int_patch: These procedures do not follow the ordinary procedure calling conventions. They are written in assembly language and create a save state structure that records the values of all registers at the time of a trap or an interrupt. The save state is then passed to trap() or the appropriate interrupt routine. The save state starts at sp - 0x230, and you can retrieve the previous stack pointer and pcoqh from the save state, as shown below. The offsets into the save state are for the 10.0 release, and may change from release to release.

    <sp-0x230>sp                         [1]
    <sp+0x84/X                           [2]
    7FFE6C1C:       96B70                [3]
    <sp+0x78/X                           [4]
    7FFE6C10:       7FFE6B98             [5]
    7FFE6B98>sp                          [6]
    96B70/i                              [7]
    qenable+10:     ldws    0(r20),r21
    qenable/3i
    qenable:
    qenable:        stw     rp,-14(sp)
                    ldo     80(sp),sp
                    stw     arg0,-0xA4(sp)

    [1] Reset sp to point to the top of the save state structure.

    [2] Save state structure + 0x84 is the location of the saved pcoqh.

    [3] adb's response -- 96B70 is the instruction address at which the trap or interrupt occurred.

    [4] Save state structure + 0x78 is the location of the saved sp.

    [5] adb's response -- 7FFE6B98 is the stack pointer of the interrupted procedure.

    [6] Reset sp to the correct value.

    [7] Continue to iterate the four basic stack back-tracing steps.

The table of results from the back-tracing so far should look like this:

sp          pcoqh       Procedure Address   Frame Size
20F380      24B34       doadump+0xEC        0x30
20F350      0xDBF48     panic_boot+354      0x80
20F2D0      0xDB938     boot+24             0x80
7FFE6F48    1518A4      panic+0xF0          0x80
7FFE6EC8    0xDF108     trap+0xA28          0x100
7FFE6DC8    0xD0BD4     $call_trap+20       -
7FFE6B98    96B70       qenable+10          0x80
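The basic loop can be extended to cover both exceptions. In the Python sketch below, which is illustrative only, a procedure named panic has its sp replaced by the value read with panic_save_state/X. The trap and interrupt stubs are unwound through the save state using the 10.0 offsets quoted above (0x230, 0x78, and 0x84), which may differ on other releases. The dictionaries hold only values that appear in the examples above.

```python
# Save-state layout for the 10.0 release, as quoted in the text.
SAVE_STATE_SIZE = 0x230   # save state starts at sp - 0x230
SS_SP = 0x78              # saved sp of the interrupted procedure
SS_PCOQH = 0x84           # saved pcoqh of the interrupted procedure
TRAP_STUBS = {"$call_trap", "$call_int", "$ihndlr_rtn", "$thndlr_rtn",
              "$RDB_trap_patch", "$RDB_int_patch"}

# Words read from the dump in the examples above (address -> value).
MEMORY = {
    0x20F33C: 0xDBF48, 0x20F2BC: 0xDB938, 0x20F23C: 0x1518A4,
    0x7FFE6EB4: 0xDF108, 0x7FFE6DB4: 0xD0BD4,
    0x7FFE6C10: 0x7FFE6B98, 0x7FFE6C1C: 0x96B70,
}
PROC_OF = {0x24B34: "doadump", 0xDBF48: "panic_boot", 0xDB938: "boot",
           0x1518A4: "panic", 0xDF108: "trap", 0xD0BD4: "$call_trap",
           0x96B70: "qenable"}
FRAME_SIZE = {"doadump": 0x30, "panic_boot": 0x80, "boot": 0x80,
              "panic": 0x80, "trap": 0x100, "qenable": 0x80}


def back_trace(sp, pc, memory, proc_of, frame_size, panic_sp, max_depth=32):
    """Back-trace that handles the panic and save-state exceptions.

    panic_sp is the value printed by panic_save_state/X.
    Rows are (sp, pcoqh, procedure, frame size, or None for a stub).
    """
    rows = []
    while pc in proc_of and len(rows) < max_depth:
        proc = proc_of[pc]
        if proc == "panic":
            sp = panic_sp                      # panic's real sp
        if proc in TRAP_STUBS:
            rows.append((sp, pc, proc, None))
            base = sp - SAVE_STATE_SIZE        # start of the save state
            sp, pc = memory[base + SS_SP], memory[base + SS_PCOQH]
            continue
        size = frame_size[proc]
        rows.append((sp, pc, proc, size))
        sp -= size
        if sp - 0x14 not in memory:
            break                              # no more dump data here
        pc = memory[sp - 0x14]
    return rows
```

Called with sp 20F380, pcoqh 24B34, and panic_sp 7FFE6F48, the sketch reproduces the seven rows of the table above, including the switch to 7FFE6B98 and qenable+10 through the save state.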

Mapping Assembly Language Locations to Source Code Lines

Once you know the instruction address where the system panic or trap occurred, the next troubleshooting step is to find where in the source code the panic or trap occurred. For panics, search the source code for the call to panic that uses the same string that was printed when the kernel panicked. This tells you exactly where the panic occurred in the source code. For traps, use adb to print the procedure in which the trap occurred in assembly language. Then work backward from the instruction address, looking for clues in the assembly instructions that help pinpoint the corresponding location in the source. The most useful clue is a call to another procedure. In PA-RISC, procedure calls are made with the branch and link instruction, bl, and in assembly a call looks like this:

bl      copen,rp   [1]

[1] a procedure call to copen()

or:

bl      creat+34,rp  (save_pn_info)  [1]

[1] a procedure call to save_pn_info()

By comparing the branches in the assembly code before and after the instruction where the trap occurred with the procedure calls in the source code, the corresponding source code line can often be determined. See the examples at the end of this chapter for more details.

Other useful assembly code landmarks are the use of the extru, extrs, zdep, and ldws instructions in checking and setting flag bits, and the use of the compare and branch instructions, comb, combf, combt, comib, comibf, and comibt, to implement if statements. For example, the ioctl() source code:

if ((fp->f_flag & (FREAD|FWRITE)) == 0)

is implemented by the assembly code:

ioctl+60:       ldws    0(r8),r13               [1]
ioctl+64:       extru   r13,1F,2,r14            [2]
ioctl+68:       comibf,=,n 0,r14,ioctl+80       [3]

[1] Load from memory address pointed to by r8, into r13.

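[2] Extract the 2-bit field that ends at bit 1F of r13 and place it in r14. PA-RISC numbers bits from 0 (most significant) to 1F (least significant), so this is the two rightmost bits of r13.

[3] If r14 is not zero, branch to ioctl+0x80.

In the example above, fp is in r8. If fp were null, a trap type 15 would occur at ioctl+60, when the kernel attempted to load through the null pointer.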
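The bit numbering is the usual stumbling block with extru and extrs. Bits run from 0 at the most significant end to 31 at the least significant end, and the position operand names the rightmost bit of the field. The Python sketch below models both instructions on a 32-bit value to make the ioctl example concrete. It assumes, as the assembly implies, that FREAD and FWRITE occupy the two low-order bits of f_flag. The helper ioctl_branch_taken is hypothetical.

```python
def extru(value, pos, length):
    """Model 'extru r,pos,length,t' on a 32-bit register.

    Bits are numbered 0 (most significant) to 31 (least significant);
    pos is the rightmost bit of the field, so extru r,1F,2,t takes the
    two low-order bits.
    """
    shift = 31 - pos
    return (value >> shift) & ((1 << length) - 1)


def extrs(value, pos, length):
    """Model 'extrs': like extru, but sign-extend the extracted field."""
    field = extru(value, pos, length)
    if field & (1 << (length - 1)):
        field -= 1 << length
    return field


def ioctl_branch_taken(f_flag):
    """Model ioctl+64..68: extru r13,1F,2,r14 then comibf,=,n 0,r14,ioctl+80.

    The branch to ioctl+80 is taken when r14 != 0, that is, when the
    file is open for reading or writing and the EBADF path is skipped.
    """
    r14 = extru(f_flag & 0xFFFFFFFF, 0x1F, 2)
    return r14 != 0
```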

For more information about PA-RISC assembly language, see the Assembly Language Reference Manual (part number 92432-90001), the PA-RISC 1.1 Architecture and Instruction Set Reference Manual (part number 09740-90039), or the PA-RISC Procedure Calling Conventions Reference Manual (part number 09740-90015).

Obtaining Procedure Argument Values

It is often useful in debugging a problem to know what parameter values a procedure in the stack trace was called with. For example, in the following stack trace it would be useful to know the arguments flushq() was called with.

panic+30:       addil   -1000,dp
trap+0xADC:     b       trap+1004
$call_trap+20:  rsm     1,r0
flushq+60:      ldbs    0xD(r21),r22
q_free+1C:      ldw     -0xA4(sp),r31

Obtaining the First Four Arguments

Arguments 0 through 3 are passed from the calling procedure to the called procedure in registers r26 through r23, also known as arg0, arg1, arg2, and arg3 (arg0 is r26 and arg3 is r23). For example, here is bmap() preparing to call realloccg(). It moves realloccg()'s arguments from the registers they are in to the argument registers by doing an or of each source register with r0, which is always zero:

bmap+16C:       or      r10,r0,arg1
bmap+170:       or      ret0,r0,arg2
bmap+174:       or      r8,r0,arg3
bmap+178:       or      r4,r0,arg0
bmap+17C:

Next, here is flushq() preparing to call rmvq() by loading arg0 and arg1 from its stack frame. Note that arg1 is loaded in the delay slot of the branch instruction bl. The instruction in the delay slot executes before control reaches rmvq. See the Assembly Language Reference Manual or the PA-RISC 1.1 Architecture and Instruction Set Reference Manual for more information on branch delay slots.

flushq+0xE0:    ldw     -64(sp),arg0
flushq+0xE4:    bl      rmvq,rp
flushq+0xE8:    ldw     -34(sp),arg1
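Finding where each argument register was set in the caller is mechanical enough to sketch. The Python function below is a hypothetical helper, not an adb feature. It scans the instructions a caller executes before its bl, including the delay-slot instruction, and reports for each of arg0 through arg3 the last instruction that set it. For an or with r0 it reports the source register that was copied. For a load it reports the address operand.

```python
ARG_REGS = ("arg0", "arg1", "arg2", "arg3")


def arg_sources(listing):
    """Map each argument register to (mnemonic, source) for its last setter.

    listing -- adb disassembly of the caller's setup code, including the
               instruction in the branch delay slot.
    'or x,r0,argN' and 'or r0,x,argN' copy register x; for loads (ldw,
    ldo, ...) the source reported is the address operand.
    """
    sources = {}
    for line in listing:
        text = line.split(":", 1)[-1].strip()  # drop a 'proc+offset:' label
        parts = text.split(None, 1)
        if len(parts) != 2:
            continue
        mnemonic, operands = parts
        # Keep only the first word of each operand, dropping annotations
        # such as '(lookuppn)'.
        ops = [op.split()[0] for op in operands.split(",")]
        dest = ops[-1]
        if dest not in ARG_REGS:
            continue
        if mnemonic == "or" and len(ops) == 3 and "r0" in ops[:2]:
            src = ops[0] if ops[1] == "r0" else ops[1]
        else:
            src = ops[0]
        sources[dest] = (mnemonic, src)
    return sources
```

Applied to the flushq() listing above, the sketch reports arg0 from -64(sp) and arg1 from -34(sp). It picks up the delay-slot load because that instruction is included in the listing.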

After allocating its stack frame and saving any callee save registers, the called procedure usually copies the argument registers into some of the callee save registers whose original contents it has just saved. For example, here is realloccg() saving the contents of the callee save registers r3 - r10 and copying arg0 - arg3 into some of them.

realloccg:      stw     rp,-14(sp)
realloccg+4:    stwm    r3,80(sp)
realloccg+8:    stw     r4,-7C(sp)
realloccg+0xC:  stw     r5,-78(sp)
realloccg+10:   stw     r6,-74(sp)
realloccg+14:   stw     r7,-70(sp)
realloccg+18:   stw     r8,-6C(sp)
realloccg+1C:   stw     r9,-68(sp)
realloccg+20:   stw     r10,-64(sp)
realloccg+24:   or      arg0,r0,r3
realloccg+28:   or      arg1,r0,r6
realloccg+2C:   or      arg2,r0,r7
realloccg+30:   or      arg3,r0,r4

Here is rmvq() storing its arguments away in its stack frame:

rmvq:           stw     rp,-14(sp)
rmvq+4:         ldo     80(sp),sp
rmvq+8:         stw     arg0,-0xA4(sp)
rmvq+0xC:       stw     arg1,-0xA8(sp)

If the arguments were put into callee save registers, the next procedure up in the stack trace will save these registers in its stack frame. You can retrieve these values from the stack. If the arguments are stored on the stack frame, you can also retrieve them from the stack. But first you must make sure that the contents of the callee save registers or the stack frame locations you are interested in were not modified between the time the arguments were loaded at the beginning of the procedure and the time the next procedure call on the stack trace took place. The easiest way to determine this is to have adb print out the assembly code for the procedure into a file and use an editor such as vi to find all references to the register between the beginning of the procedure and the branch to the next procedure in the stack trace. If none of these references modify the register, the value which the next procedure has saved in its stack frame is valid.

To print the assembly of a procedure to a file using adb:

$>filename              [1]
procedure,100/ia        [2]
$>                      [3]

[1] Tell adb to direct stdout to the file filename. There should be no space between $> and the filename.

[2] Print 0x100 instructions (0x400 bytes) starting at procedure, each labeled with its symbolic address.

[3] Set stdout back to the terminal.

Now, edit filename, and search for all instances of the register or stack frame location of interest. Any instruction that modifies the contents of the register could overwrite the information you are trying to get. Below are some examples of modifying instructions. Note that in all cases the location being modified, also known as the target, is the last operand of the instruction. This is a register for most instructions, or a memory location for a store such as stw.

ldw     10(r3),r4           will overwrite r4
ldhs    4(r3),rp            will overwrite rp
ldo     -1(r20),r22         will overwrite r22
ldwx    r31(arg3),r21       will overwrite r21
or      r3,r0,arg0          will overwrite arg0
extrs   ret1,1F,10,r21      will overwrite r21
zdep    r20,1A,1B,r31       will overwrite r31
sub     r31,arg1,r31        will overwrite r31
sh3add  arg1,r0,r31         will overwrite r31
stw     r19,-38(sp)         will overwrite memory location sp - 0x38
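The last-operand rule can be applied mechanically to a listing saved with $>filename. The Python sketch below is a rough heuristic, not a PA-RISC decoder. For a store (st*) it treats the last operand as the memory location written. For any other instruction it treats the last operand as the target when that operand is a register name, so bl counts as writing its link register. It does not detect registers modified in other ways, such as the sp update performed by stwm or the register updated by an add-and-branch instruction such as addib. Each match is only a candidate: you still need to check whether it lies on a path that was actually executed.

```python
import re

# Register names as adb prints them in disassembly.
REGISTER = re.compile(r"r\d+|rp|sp|dp|arg[0-3]|ret[01]")


def target(line):
    """Return what an adb disassembly line writes, or None.

    Stores report their memory operand (for example '-38(sp)'); other
    instructions report their last operand if it is a register name.
    """
    text = line.split(":", 1)[-1].strip()   # drop a 'proc+offset:' label
    parts = text.split(None, 1)
    if len(parts) != 2:
        return None
    mnemonic, operands = parts
    # Last operand, ignoring an annotation such as '(save_pn_info)'.
    last = operands.split(",")[-1].split()[0]
    if mnemonic.startswith("st"):
        return last
    return last if REGISTER.fullmatch(last) else None


def modifiers(listing, location):
    """Return the lines of listing that appear to write location."""
    return [line for line in listing if target(line) == location]
```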

Sometimes an instruction which modifies the register of interest can appear to occur between the beginning of the procedure and the call to the next procedure in the stack because of how the assembly code is laid out. However, the modifying instruction actually would not have been executed because it was part of a conditional code path that was not taken. For example, this C code from ioctl():

if ((fp->f_flag & (FREAD|FWRITE)) == 0) {
        u.u_error = EBADF;
        return;
}

compiles into this assembly:

ioctl+60:       ldws    0(r8),r13
ioctl+64:       extru   r13,1F,2,r14
ioctl+68:       comibf,=,n 0,r14,ioctl+80
ioctl+6C:       ldw     68(r3),r19
ioctl+70:       ldo     9(r0),r21
ioctl+74:       sth     r21,312(r19)
ioctl+78:       b       ioctl+7F0
ioctl+7C:       ldw     -1D4(sp),rp
ioctl+80:       ldws    4(r5),r7

If the if statement is false, the branch at ioctl+68 is taken, and instruction ioctl+6C is never executed, because the ,n in ioctl+68 causes the instruction in the branch delay slot to be nullified (not executed). ioctl+70 through ioctl+7C are never executed because the branch at ioctl+68 branches past these instructions to ioctl+80. If ioctl+6C through ioctl+7C had been executed, r19, r21, and rp would have been modified.

Suppose you have determined that the procedure whose arguments you are interested in does not modify the registers it loaded the arguments into before the next procedure call in your stack. You can look at the appropriate location in the stack frame of the next procedure call in the stack to get the value. For example, if a routine whose registers you are interested in has called panic, you look at the beginning of panic's assembly to see which callee save registers it saves in its stack.

panic:          stw     rp,-14(sp)
panic+4:        stwm    r3,40(sp)
panic+8:        stw     r4,-3C(sp)
panic+0xC:      stw     r5,-38(sp)
panic+10:       stw     r6,-34(sp)

Obtain panic's sp by manual stack back-tracing, and then r3 is at sp - 0x40, r4 at sp - 0x3C, and so on.
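Turning prologue offsets into adb commands is simple arithmetic. The Python sketch below uses hypothetical helper names. It takes a procedure's sp from a manual back-trace and the offsets read from its prologue, and builds the adb command for each saved register in the sp-offset/X form used elsewhere in this chapter. The sp value in the example is only a sample.

```python
def saved_register_commands(sp, offsets):
    """Return register -> adb command that prints its saved value.

    sp      -- the procedure's stack pointer (after frame allocation)
    offsets -- register -> negative offset from sp, read off the prologue
    """
    commands = {}
    for reg, off in offsets.items():
        if off >= 0:
            raise ValueError(f"{reg}: expected a negative offset, got {off:#x}")
        commands[reg] = f"{sp:X}-0x{-off:X}/X"
    return commands


# panic's prologue above: stwm r3,40(sp), then r4..r6 at -3C, -38, -34.
PANIC_OFFSETS = {"r3": -0x40, "r4": -0x3C, "r5": -0x38, "r6": -0x34}

if __name__ == "__main__":
    for reg, cmd in saved_register_commands(0x7FFE6F48, PANIC_OFFSETS).items():
        print(reg, cmd)
```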

Obtaining Arguments 5 through N

Only the first four arguments to a procedure are passed in registers. Any remaining arguments are stored in the calling procedure's stack frame, where the called procedure retrieves them. If you have the calling procedure's sp, you can use adb to get the values of these arguments. For example, symlink() calls lookuppn(), which has six arguments. Here is the assembly code that sets up the six arguments:

symlink+40:     stw     r4,-34(sp)
symlink+44:     stw     r3,-38(sp)
symlink+48:     ldo     -3C(sp),arg2
symlink+4C:     ldo     -9C(sp),arg0
symlink+50:     or      r0,r0,arg1
symlink+54:     bl      rename+34,rp  (lookuppn)
symlink+58:     or      r0,r0,arg3

If you want to get the fifth argument, you can see that symlink() places it in its stack frame at sp - 0x34. The calling convention places each stack argument at an address one word (4 bytes) lower than the one before it. Argument 6 is therefore at sp - 0x38, just above argument 5, and if lookuppn() had seven arguments, argument 7 would be at sp - 0x3C. If you know symlink()'s sp from doing a manual stack back-trace, you can use it to get the value of argument 5:

7FFE6B98-0x34/X
7FFE6B64: 2D7298 # adb's response
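The layout for stack arguments reduces to one formula: for n of 5 or more, argument n is at the caller's sp - (0x34 + 4 × (n - 5)). The Python sketch below applies the formula and builds the matching adb command. It assumes 4-byte words and the layout described above, and the helper names are hypothetical.

```python
def stack_arg_address(caller_sp, n):
    """Address of argument n (counting from 1) in the caller's frame, n >= 5.

    Arguments 1-4 travel in arg0-arg3; argument 5 is at sp - 0x34 and
    each later argument is one 4-byte word lower.
    """
    if n < 5:
        raise ValueError("arguments 1-4 are passed in arg0-arg3, not on the stack")
    return caller_sp - (0x34 + 4 * (n - 5))


def stack_arg_command(caller_sp, n):
    """Return the adb command that prints argument n in hexadecimal."""
    return f"{stack_arg_address(caller_sp, n):X}/X"
```

For symlink()'s sp of 7FFE6B98, argument 5 comes out at 7FFE6B64, the address adb reported above.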

Obtaining Register Contents from Trap save_state or panic_save_state Areas

If the system core dump was produced by a panic or a trap, copies of all the registers at the time of the trap or panic were saved in memory and are available in the core dump. For a trap, the registers are saved on the stack, in the order specified in the struct save_state, which is defined in /usr/include/machine/save_state.h. For a panic, the registers are saved in a statically allocated memory location called panic_save_state, in the order specified in the struct rpb, which is defined in /usr/include/machine/rpb.h. See the examples at the end of this chapter for details of how to access registers in the trap save_state area. The mechanics of accessing panic_save_state fields are similar, though the offsets into the save area are different. For example, if you want to get r3 out of the panic_save_state area, look at /usr/include/machine/rpb.h and note that the field rp_gr3 is the sixth word in struct rpb. Because each word is 4 bytes, it can be found at panic_save_state + 5 words, or panic_save_state + 0x14.

Not all registers in these save areas are guaranteed to be the same as at the time of the panic or trap, because some registers must be used by the system to execute the panic or trap path and save away the other registers. Registers which may not be preserved are r1, r19 - r22, r31, arg0, arg1, arg2, and arg3. Use your judgment with the contents of these registers in the save areas. If they look odd, they may have been overwritten.

If your stack trace includes a call to trap(), it will also have a call to panic() higher up (later in time) than the trap. In this case, it is safer to look in the trap save_state structure on the stack than the panic_save_state area for registers you are curious about, because the trap saved the registers closer in time to when the problem which caused the system crash occurred.

Obtaining Important Kernel Global Variables

To print out the value of a kernel global variable, simply use the symbol name with the appropriate formatting option (see adb(1) and the ADB Tutorial for more information). The following table lists some of the more interesting kernel globals, with the appropriate adb format for printing them, and brief descriptions of what they mean.

adb Command

Description

msgbuf+0xc/sD

Kernel's circular printf buffer.

freemem/D

Amount of free memory, in pages. If this is zero or a small number, the system is out of memory.

physmem/D

Size of physical memory, in pages.

maxfree/D

Number of free pages soon after system boot.

desfree/D

Number of free pages the system tries to keep available.

minfree/D

Minimum free pages before system starts swapping processes out.

avefree/D

Average number of free pages over past 5 seconds.

avefree30/D

Average number of free pages over past 30 seconds.

freemem_cnt/D

Number of processes currently waiting for memory. If this is a large number, many processes are stopped waiting for memory.

avenrun/3F

System load averages for the last 1, 5, and 15 minutes, printed in floating point notation. Large values may indicate that the system is too heavily loaded.

lbolt/X

Seconds since boot.

time/Y

Current time, printed out in ctime(3C) format.

_release_version/s

HP-UX version string.

utsname+0x9/s

System hostname.

utsname+0x12/s

HP-UX release number.

utsname+0x24/s

System hardware model number.
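Several of these globals are page counts rather than sizes. The sketch below converts the decimal values adb prints into kilobytes and a percentage, and gives a rough reading of freemem against the desfree and minfree thresholds described in the table. It assumes a 4096-byte page, which is typical for HP-UX on PA-RISC but is an assumption here; check the page size defined in <sys/param.h> for your release. All function and enum names are hypothetical.

```c
/* Assumption: 4096-byte pages; verify against <sys/param.h> for your release. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Convert a page count printed by adb (freemem/D, physmem/D, ...) to KB. */
unsigned long pages_to_kb(unsigned long pages)
{
    return pages * (ASSUMED_PAGE_SIZE / 1024UL);
}

/* Percentage of physical memory that was free when the dump was taken. */
double percent_free(unsigned long freemem, unsigned long physmem)
{
    if (physmem == 0)
        return 0.0;                   /* guard against a bad or zero read */
    return 100.0 * (double)freemem / (double)physmem;
}

enum mem_state { MEM_OK, MEM_BELOW_DESFREE, MEM_BELOW_MINFREE };

/* Rough reading based on the table: below minfree the system starts
 * swapping processes out; below desfree it is short of its target. */
enum mem_state classify_memory(unsigned long freemem, unsigned long desfree,
                               unsigned long minfree)
{
    if (freemem < minfree)
        return MEM_BELOW_MINFREE;
    if (freemem < desfree)
        return MEM_BELOW_DESFREE;
    return MEM_OK;
}
```

Under the 4096-byte page assumption, a freemem value of 256 corresponds to 1024 KB.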

Obtaining Values from the Process Table Entry and User Area

It is possible to use adb to print out fields of interest from the process table entry and user area of the process that was running when the system crashed. The following subsection describes how to print certain important fields and gives a very brief description of each field. For more information on the meaning of these fields, see The Design of the UNIX Operating System by Maurice Bach, pub. Prentice-Hall, or The Design and Implementation of the 4.3 BSD UNIX Operating System by Leffler, McKusick, Karels and Quarterman, pub. Addison-Wesley.

When called with the -k option, adb should print the addresses of the user area and the process table entry of the process that was running when the system crashed. adb prints these as soon as it starts, so the first output you see from adb should look like this:

u 7FFE6000 u.u_procp 4D2F20

u is the location of the user area and should always be virtual address 7FFE6000: when the kernel switches to a new process, it always maps the physical address of that process's user area to virtual address 7FFE6000. u.u_procp is the location of this process's process table entry; this address varies from process to process. If adb does not print the u and u.u_procp values on entry, it was unable to determine which process was running at crash time. The most likely cause is that the core dump was the result of a Transfer of Control (TOC).
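If you capture adb output to a file and post-process it, it can help to pull the two addresses out of the entry banner programmatically. The hypothetical helper below assumes the banner has exactly the form shown above; a missing or different banner (for example, after a TOC) is reported as a failure.

```c
#include <stdio.h>

/* Parse adb's entry banner, e.g. "u 7FFE6000 u.u_procp 4D2F20".
 * Returns 1 and stores both hex addresses on success; returns 0 if the
 * line does not have that form. */
int parse_adb_banner(const char *line, unsigned long *u, unsigned long *procp)
{
    return sscanf(line, "u %lx u.u_procp %lx", u, procp) == 2;
}
```

For the banner shown above, this yields u = 0x7FFE6000 and procp = 0x4D2F20.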

If the panic occurred while the kernel was running on the Interrupt Control Stack (ICS), the u and u.u_procp pointers do not contain valid information about the code that panicked. When an interrupt occurs, the kernel executes the code that handles the interrupt without switching to a new user context, so the u and u_procp addresses that adb prints belong to the process that was running when the interrupt occurred, that is, the process the interrupt interrupted. To tell whether the panic occurred on the ICS, look at the panic message in msgbuf. If a message like the following appears after the hex stack trace, the system was on the ICS:

NOT sync'ing disks (on the ICS) (0 buffers to flush):
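Because msgbuf is a circular buffer, printing it from msgbuf+0xc as a single string may show the newest text before the oldest once the buffer has wrapped. The sketch below copies a saved copy of the buffer out in chronological order and then searches it for the ICS marker shown above. It assumes a 4.3BSD-style msgbuf layout (three 4-byte words holding a magic number, the write index msg_bufx, and a read index, followed by the character array); that layout is consistent with the +0xc offset used in the globals table but is not confirmed by this manual. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a copied-out msgbuf, assuming a 4.3BSD-style layout:
 * magic, write index (msg_bufx), read index, then the character array,
 * which would start at msgbuf+0xc. */
struct msgbuf_view {
    const char *text;   /* copy of the character array */
    size_t size;        /* length of that array */
    size_t bufx;        /* write index: where the next character would go */
};

/* Copy the circular buffer into out[] oldest-first. Bytes from bufx to the
 * end are older than bytes from 0 up to bufx. NUL bytes (never-written
 * space) are skipped. out must hold at least size + 1 bytes.
 * Returns the number of characters copied. */
size_t msgbuf_unroll(const struct msgbuf_view *mb, char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < mb->size; i++) {
        char c = mb->text[(mb->bufx + i) % mb->size];
        if (c != '\0')
            out[n++] = c;
    }
    out[n] = '\0';
    return n;
}

/* True if the panic output contains the ICS marker quoted in the text. */
bool panicked_on_ics(const char *log)
{
    return strstr(log, "(on the ICS)") != NULL;
}
```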

Important User Area Fields

The table below gives the adb command for printing each important user area field. In these commands, u stands for the u value that adb printed on entry (see the example above); substitute that value for the letter u when you type the command.

Field Name   Series 700     Series 800     Description
u_procp      u+0x258/X      u+0x258/X      Pointer to the process table entry.
u_comm       u+0x260/s      u+0x264/s      Name of the command used to start this process. For STREAMS/UX, this is usually strsched.
u_arg        u+0x270/10X    u+0x274/10X    Arguments to the current system call. For STREAMS/UX service routines being run by strsched, these should all be zero.

For example, to print u_comm on a Series 700 system, given the adb entry printout u 7FFE6000 u.u_procp 4D2F20, type:

0x7FFE6000+0x260/s

See /usr/include/sys/user.h for more information on fields in the user area. These offset values are for HP-UX release 10.0, and may change from release to release.
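If you examine several dumps, a small table-driven helper can generate these commands for either series instead of substituting by hand. The sketch below is hypothetical; its offsets are copied from the user area table and, like the table, are valid only for HP-UX release 10.0.

```c
#include <stdio.h>
#include <string.h>

enum hp_series { SERIES_700, SERIES_800 };

struct u_field {
    const char *name;
    unsigned int off700, off800;   /* offsets from the table (HP-UX 10.0) */
    const char *fmt;               /* adb format suffix */
};

static const struct u_field u_fields[] = {
    { "u_procp", 0x258, 0x258, "X"   },
    { "u_comm",  0x260, 0x264, "s"   },
    { "u_arg",   0x270, 0x274, "10X" },
};

/* Write the adb command for field `name` into buf.
 * Returns the snprintf result, or -1 if the field is not in the table. */
int u_field_cmd(char *buf, size_t len, unsigned long u,
                const char *name, enum hp_series series)
{
    for (size_t i = 0; i < sizeof u_fields / sizeof u_fields[0]; i++) {
        if (strcmp(u_fields[i].name, name) == 0) {
            unsigned int off = (series == SERIES_700) ? u_fields[i].off700
                                                      : u_fields[i].off800;
            return snprintf(buf, len, "0x%lX+0x%X/%s", u, off, u_fields[i].fmt);
        }
    }
    return -1;
}
```

For example, requesting u_comm on a Series 700 with u = 0x7FFE6000 produces 0x7FFE6000+0x260/s, the command shown in the text.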

Important Process Table Fields

The table below gives the adb command for printing each important process table field. In these commands, p stands for the u.u_procp value that adb printed on entry (see the example above); substitute that value for the letter p when you type the command. For example, to print p_flag on a Series 700 system, given the adb entry printout shown at the beginning of this section, type:

0x4D2F20+0x20/X

See /usr/include/sys/proc.h for more information on fields in the proc structure. These offset values are for HP-UX release 10.0, and may change from release to release.

Field Name       Series 700    Series 800    Description
p_flag           p+0x20/X      p+0xc/X       Per-process flags; see proc.h.
p_flag2          p+0x24/X      p+0x48/X      Per-process flags; see proc.h.
p_mpflag         n/a           p+0x10/X      Per-process flags; see proc.h.
p_stat           p+0xc/b       p+0x32/b      Current process state; see proc.h.
p_uid            p+0x2c/D      p+0x50/D      Real user ID, used to direct tty signals.
p_suid           p+0x30/D      p+0x54/D      Set effective user ID.
p_pid            p+0x38/D      p+0x5c/D      Process ID.
p_ppid           p+0x3c/D      p+0x60/D      Process ID of the parent.
p_pgrp           p+0x34/D      p+0x58/D      Process ID of the process group leader.
p_wchan          p+0x40/X      p+0x1c/X      Event the process is sleeping on; should be zero if the process is currently running.
p_sleeptime      n/a           p+0x24/X      Time of last sleep or wakeup, in seconds.
p_cptickstotal   p+0x4c/X      p+0x14/X      CPU ticks (total for the life of the process).
p_cursig         p+0xe/b       p+0x34/b      Number of the current pending signal, if any.
p_sig            p+0x10/X      p+0x38/X      Signals pending to this process.
p_sigmask        p+0x14/X      p+0x3c/X      Current signal mask.
p_sigignore      p+0x18/X      p+0x40/X      Signals being ignored.
p_sigcatch       p+0x1c/X      p+0x44/X      Signals being caught by the user.
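adb prints p_sig, p_sigmask, p_sigignore, and p_sigcatch as hexadecimal bit masks. The sketch below lists the signal numbers set in such a mask. It assumes the traditional BSD-style encoding in which signal n is represented by bit n-1 (the sigmask() convention); confirm this against proc.h and signal.h for your release before relying on it. The function name is hypothetical.

```c
#include <stddef.h>

/* Assumption: signal n is represented by bit (n - 1), as with the
 * traditional BSD sigmask(n) macro. Stores up to max signal numbers found
 * in mask into sigs[], lowest first, and returns the total number of
 * signals set (which may exceed max). */
size_t decode_sigmask(unsigned long mask, int *sigs, size_t max)
{
    size_t count = 0;

    for (int n = 1; n <= 32; n++) {
        if (mask & (1UL << (n - 1))) {
            if (count < max)
                sigs[count] = n;
            count++;
        }
    }
    return count;
}
```

For example, under this assumption a mask printed as 4100 decodes to signals 9 and 15.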

© 1995 Hewlett-Packard Development Company, L.P.