STREAMS/UX for the HP 9000 Reference Manual > Chapter 6 Debugging STREAMS/UX Modules and Drivers

Using adb
This section describes how to use adb on core dumps obtained following a system crash. See "Generating and Retrieving System Core Dumps" for information on how these dumps are obtained. adb can also be used to examine a system that is currently running. See the adb(1) man page or the ADB Tutorial for more information.

When using adb on a system core dump, you must use the "-k" option. This option tells adb to treat the core dump as a system core dump instead of a user process core dump, which is organized differently. For example, to call adb on the dump pair vmcore.1 and vmunix.1, perform the following:
When using adb on a running HP-UX system, you also use the "-k" option, and use /stand/vmunix as the object file and /dev/mem as the core file:
You will probably need to be superuser to access /dev/mem. Because you are examining a running (and continuously changing) system, adb will not be able to set you up in any specific process context, but you will be able to examine kernel global variables.

adb maintains a set of registers corresponding to the registers of the machine. The adb command $r prints the values of these registers. When adb is invoked on a system core with the -k option, it sets these registers to the values of the machine registers at the time the system core dump was taken. These register values are not the values the registers contained at the point the panic or trap occurred. Instead, they are the values the registers contained at the time the kernel started dumping a copy of physical memory to the swap area. How to use these "dump time" register values to determine the state of the registers at the time the trap or panic occurred is described later. The resulting "panic time" register values enable you to examine the context of the process that was running at the time of the system crash.

If the system core dump is from a transfer of control (TOC) of a hung system, adb will be unable to determine the "dump time" or "panic time" register values. In these cases, adb can still be used to determine the contents of the kernel message buffer (see "Finding the Panic Message"), and to examine kernel global variables (see "Obtaining Important Kernel Global Variables"), but it will not be able to give you a stack trace or context for the process that was running at the time of the system crash.

It is especially important, when looking at a dump from a system which appeared to be hung, to check the kernel globals freemem, freemem_cnt, and avenrun. These variables may indicate that your system was out of memory or was overloaded. (See "Obtaining Important Kernel Global Variables" for more information.)
It can also be helpful, before doing a TOC on a system which appears hung, to determine how complete the system paralysis is. The following table describes hang symptoms, from the least severe to the most severe. This table may help you determine where your system fits on this continuum.
The kernel maintains a circular message buffer into which text can be printed using the kernel printf, msg_printf, and cmn_err routines. At the time of a panic, a panic message is printed to this buffer. A stack trace consisting of instruction addresses in hexadecimal is also printed out, as well as the current instruction and data addresses being accessed at the time of the crash. Other interesting information may also be located in the buffer, such as system boot-up messages and kernel error messages that may help pin down the cause of the panic. To print out this buffer, invoke adb on the system dump and type the following:
Examples of msgbuf contents are included in the examples at the end of this chapter. adb can be used to translate the hexadecimal stack trace printed after the panic message into procedure addresses. For each hexadecimal number in the stack trace, use the adb i command to determine where in the kernel the address occurs. For example, the hex stack trace below can be deciphered as follows:
In adb (text preceded by "#" is a comment):
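Conceptually, translating a stack trace address means finding the nearest kernel symbol at or below that address and reporting the remaining distance as an offset. The following Python sketch models only that lookup; it is not adb output. The symbol names and addresses are invented for illustration, and the symbol+offset output format is an assumption modeled on the panic_boot+354 notation used in this chapter.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
# These values are invented for illustration; they are not from a real kernel.
SYMBOLS = [
    (0x1000, "func_a"),
    (0x1400, "func_b"),
    (0x2000, "func_c"),
]

def resolve(addr, symbols=SYMBOLS):
    """Return 'name+OFFSET' (hex offset) for the nearest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)  # address lies below the first known symbol
    start, name = symbols[i]
    offset = addr - start
    return name if offset == 0 else f"{name}+{offset:X}"

# Resolve a made-up stack trace, one address per line.
for a in (0x1000, 0x1454, 0x2010):
    print(f"{a:#x} -> {resolve(a)}")
```

An address that falls below every known symbol is returned unchanged in hex, so that an unresolvable entry stays visible in the translated trace.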
You may need to use adb to manually back-trace your stack. This is necessary when the hexadecimal stack trace printed by panic is incomplete. For example, panic may print a few hex addresses and then the message:
or
You may also need to do manual stack back-tracing if you wish to find out what arguments the routines in your stack trace were called with. You will need the value of the stack pointer for each routine in the stack, and manual stack back-tracing will tell you these values.

The following is a very brief overview of the PA-RISC procedure calling convention. More information can be obtained from the PA-RISC Procedure Calling Conventions Reference Manual.

PA-RISC machines have 32 general use registers. These registers are identical physically, but are assigned different roles by the PA-RISC operating systems and compilers in order to enable procedure calls to take place efficiently and consistently. The following table lists these special roles:

Table 6-1 General Use Register Roles
The only registers you need to be concerned with for manual stack back-tracing are r2 (rp) and r30 (sp), although the other registers become important when trying to determine what arguments a procedure in the trace was called with.

In order to implement these register roles, at the start of each procedure a stack frame is allocated and callee save registers which the called procedure is planning to modify are stored in the stack frame. The stack frame is allocated simply by incrementing the sp by the size of the stack frame needed, using either the stwm or ldo instruction. For example, below are the instructions which create the stack frame for ioctl. Numbers in brackets ([ ]) refer to the notes below.
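To make the bookkeeping in the numbered notes easier to follow, here is a small Python model of the effect such a stwm-based prologue has on memory and on sp. It is a sketch only, not the ioctl assembly itself. The toy memory dictionary, the function name run_prologue, and all register values are hypothetical. The offsets (0x14, 0x100, 0xFC, 0xF8, 0xF4) are the ones given in notes [1] through [5], with "0x14 above the caller's stack pointer" read as sp - 0x14.

```python
def run_prologue(mem, regs, frame_size=0x100):
    """Apply the effect of a stwm-style prologue to a toy machine state.

    mem  : dict mapping address -> word (stand-in for memory)
    regs : dict holding sp, rp and r3..r6 (all values hypothetical)
    """
    sp = regs["sp"]
    mem[sp - 0x14] = regs["rp"]    # [1] rp is saved in the caller's frame
    mem[sp] = regs["r3"]           # [2] stwm: store r3 at the old sp ...
    sp += frame_size               #     ... then add the frame size to sp
    mem[sp - 0xFC] = regs["r4"]    # [3] r4 lands next to r3
    mem[sp - 0xF8] = regs["r5"]    # [4] r5 lands next to r4
    mem[sp - 0xF4] = regs["r6"]    # [5] r6 lands next to r5
    regs["sp"] = sp
    return mem, regs

# Made-up starting state, chosen only so the arithmetic is easy to check.
regs = {"sp": 0x1000, "rp": 0xABC, "r3": 3, "r4": 4, "r5": 5, "r6": 6}
mem, regs = run_prologue({}, regs)
for addr in sorted(mem):
    print(f"{addr:#x}: {mem[addr]:#x}")
print(f"new sp = {regs['sp']:#x}")
```

Because sp has already been advanced by 0x100 when r4 through r6 are stored, the offsets sp - 0xFC, sp - 0xF8 and sp - 0xF4 resolve to the three words immediately following the slot that holds r3.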
[1] Store return instruction address at 0x14 above the caller's stack pointer (that is, at sp - 0x14). Note that the return address is stored in the caller's stack frame, not the callee's stack frame.
[2] Store the contents of r3 at the current sp, then allocate the stack frame by adding 0x100 to sp. The stwm instruction stands for store word and modify.
[3] Store the contents of r4 at sp - 0xFC, just below where you stored r3.
[4] Store the contents of r5 at sp - 0xF8, just below where you stored r4.
[5] Store the contents of r6 at sp - 0xF4, just below where you stored r5.

The instruction ldo (load offset) can be used instead of stwm for allocating the stack. For example:
[1] Store return instruction address in caller's stack frame.
[2] Add 0x30 to the current value in register sp and store the result in sp, allocating stack frame.

Given the stack pointer, sp, and the current instruction address, pcoqh, it is possible to get the previous stack pointer and instruction address. The starting values for sp and pcoqh are obtained from the adb $r command. As mentioned above, when adb is invoked on a system core with the -k option, it sets these registers to the values of the machine registers at the time the system core dump was taken. The $r command prints out these registers. Below are the first few lines of the $r display.
There are four steps to back-tracing a stack:
Notice that the $r command has already indicated that rp corresponds to panic_boot+354. To continue back-tracing the stack, iterate the four steps shown above. Here is the adb sequence of commands and responses to trace the next two levels back in this stack. Text preceded by "#" is a comment.
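The iteration can also be pictured as a loop. The Python sketch below models it on a toy memory image. It is one reading of the calling convention described earlier (the caller's sp is the current sp minus the callee's frame size, and the return address is saved at the caller's sp - 0x14), not a transcription of the four steps or of adb output. The function names, frame sizes and addresses are all hypothetical, and the exceptions to the basic steps are not modeled.

```python
RP_SLOT = 0x14  # assumed: rp is saved at (caller's sp - 0x14)

def backtrace(mem, sp, pc, frame_size_of, max_depth=16):
    """Return a list of (pc, sp) pairs, newest frame first.

    mem           : dict address -> word, standing in for the core dump
    frame_size_of : function mapping a pc to the frame size of the procedure
                    containing it (in practice read from the stwm or ldo at
                    the start of that procedure); returns None if unknown
    """
    frames = []
    for _ in range(max_depth):
        frames.append((pc, sp))
        size = frame_size_of(pc)
        if size is None:
            break                      # cannot go further back
        caller_sp = sp - size          # undo the callee's frame allocation
        saved_rp = mem.get(caller_sp - RP_SLOT)
        if saved_rp is None:
            break                      # slot not present in the toy image
        pc, sp = saved_rp, caller_sp
    return frames

def toy_frame_size(pc):
    """Hypothetical frame sizes for two made-up procedures."""
    if 0x100 <= pc < 0x200:
        return 0x30
    if 0x200 <= pc < 0x300:
        return 0x100
    return None

# Toy memory: only the two saved-rp slots needed for the walk.
toy_mem = {0x2100 - RP_SLOT: 0x250, 0x2000 - RP_SLOT: 0x400}

trace = backtrace(toy_mem, sp=0x2130, pc=0x110, frame_size_of=toy_frame_size)
for frame_pc, frame_sp in trace:
    print(f"pc={frame_pc:#x} sp={frame_sp:#x}")
```

Keeping every (pc, sp) pair, rather than only the return addresses, matches the advice to record the results of each iteration, since the sp values are needed later to read saved registers and arguments out of each frame.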
If you are doing a manual stack back-trace in order to find out values of registers which have been pushed onto the stack, it is useful to save the results of the four steps at each iteration for future reference. A table such as the following can be helpful:
The four basic steps of stack back-tracing have some exceptions:
The table of results from the back-tracing so far should look like this:
Once you know the instruction address where the system panic or trap occurred, the next troubleshooting step is to find where in the source code the panic or trap occurred.

For panics, search the source code for the panic call which uses the same string that was printed out when the kernel panicked. This will tell you exactly where the panic occurred in the source code.

The method for traps is to use adb to print out, in assembly language, the procedure in which the trap occurred. Then, work backwards from the instruction address, looking for clues in the assembly instructions which will help pinpoint the corresponding location in the source. The most useful clue is a branch to another procedure. In PA-RISC, branches are done with the branch and link instruction, bl, and in assembly a branch will look like this:
[1] a procedure call to copen()

or:
[1] a procedure call to save_pn_info()

By comparing the branches in the assembly code before and after the instruction where the trap occurred with the procedure calls in the source code, the corresponding source code line can often be determined. See the examples at the end of this chapter for more details.

Other useful assembly code landmarks are the use of the extru, extrs, zdep, and ldws instructions in checking and setting flag bits, and the use of the compare and branch instructions, comb, combf, combt, comib, comibf, and comibt, to implement if statements. For example, the ioctl() source code:
is implemented by the assembly code:
[1] Load from memory address pointed to by r8, into r13.
[2] Extract 2 bits from r13, starting at bit 1F, place bits in r14.
[3] If r14 is not zero, branch to ioctl+0x80.

In the example above, fp is in r8. If fp were null, a trap type 15 would occur at ioctl+60, when attempting to load off of a null pointer.

For more information about PA-RISC assembly language, see the Assembly Language Reference Manual (part number 92432-90001), the PA-RISC 1.1 Architecture and Instruction Set Reference Manual (part number 09740-90039), or the PA-RISC Procedure Calling Conventions Reference Manual (part number 09740-90015).

It is often useful in debugging a problem to know what parameter values a procedure in the stack trace was called with. For example, in the following stack trace it would be useful to know the arguments flushq() was called with.
Arguments 0 through 3 are passed from the calling procedure to the called procedure by loading the values into registers 23 - 26. These registers are also known as arg3, arg2, arg1, and arg0 respectively; that is, arg0 is r26 and arg3 is r23. For example, here is bmap() preparing to call realloccg() by moving realloccg()'s arguments from the registers they are in to the argument registers by doing an or on the source registers with r0, which is always zero:
Next, here is flushq() preparing to call rmvq() by loading arg0 and arg1 from its stack frame. Note that arg1 gets loaded in the delay slot of the branch instruction bl. See the Assembly Language Reference Manual or the PA-RISC 1.1 Architecture and Instruction Set Reference Manual for more information on branch delay slots.
After allocating its stack frame and saving any callee save registers, the called procedure will usually load the argument registers into some of the callee save registers that it just saved the values of. For example, here is realloccg() saving the contents of the callee save registers r3 - r10 and loading arg0 - arg3 into some callee save registers.
Here is rmvq() storing its arguments away in its stack frame:
If the arguments were put into callee save registers, the next procedure up in the stack trace will save these registers in its stack frame. You can retrieve these values from the stack. If the arguments are stored on the stack frame, you can also retrieve them from the stack.

But first you must make sure that the contents of the callee save registers or the stack frame locations you are interested in were not modified between the time the arguments were loaded at the beginning of the procedure and the time the next procedure call on the stack trace took place. The easiest way to determine this is to have adb print out the assembly code for the procedure into a file and use an editor such as vi to find all references to the register between the beginning of the procedure and the branch to the next procedure in the stack trace. If none of these references modify the register, the value which the next procedure has saved in its stack frame is valid.

To print the assembly of a procedure to a file using adb:
[1] Tell adb to direct stdout to the file filename. There should be no space between $> and the filename.
[2] Print the first 0x400 instructions of procedure.
[3] Set stdout back to the terminal.

Now, edit filename, and search for all instances of the register or stack frame location of interest. Any instruction which would modify the contents of the register could potentially overwrite the information you are trying to get. Below are some examples of modifying instructions. Note that in all cases the register being modified, also known as the target register, is the last register in the instruction.
Sometimes an instruction which modifies the register of interest can appear to occur between the beginning of the procedure and the call to the next procedure in the stack because of how the assembly code is laid out. However, the modifying instruction actually would not have been executed because it was part of a conditional code path that was not taken. For example, this C code from ioctl():
compiles into this assembly:
If the if statement is false, the branch at ioctl+68 is taken, and instruction ioctl+6C is never executed because the ,n in ioctl+68 causes the instruction in the branch delay slot to be nullified, or not executed. ioctl+70 through ioctl+7C are never executed because the branch at ioctl+68 branches past these instructions to ioctl+80. If ioctl+6C through ioctl+7C had been executed, r19, r21, and rp would have been modified.

Suppose you have determined that the procedure whose arguments you are interested in does not modify the registers it loaded the arguments into before the next procedure call in your stack. You can then look at the appropriate location in the stack frame of the next procedure call in the stack to get the value. For example, if a routine whose registers you are interested in has called panic, you look at the beginning of panic's assembly to see which callee save registers it saves in its stack.
Obtain panic's sp by manual stack back-tracing; r3 is then at sp - 0x40, r4 at sp - 0x3C, and so on.

Only the first four arguments to a procedure are passed via registers. Any remaining arguments are pushed onto the calling procedure's stack frame, where the called procedure will retrieve them. If you have the calling procedure's sp you can use adb to get the values of the arguments. For example, symlink() calls lookuppn(), which has six arguments. Here is the assembly code which sets up the six arguments:
If you want to get the fifth argument, you see that symlink() places it in its stack frame at sp - 0x34. Argument 5 is at sp - 0x34 because the procedure calling convention specifies that arguments get placed in the stack frame in reverse order, so argument 6 is at sp - 0x38, just above argument 5, and if lookuppn() had seven arguments, argument 7 would be placed at sp - 0x3C. If you know symlink()'s sp from doing a manual stack back-trace, you can use it to get the value of argument 5:
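The placement rule can be summarized in a small helper. This Python sketch is only an illustration of the arithmetic stated above (the first four arguments in arg0 through arg3, the fifth at the caller's sp - 0x34, and each later argument 4 bytes further on). The function names and the sample sp value are hypothetical, and arguments are counted from 1, as in the symlink() discussion.

```python
def arg_location(n):
    """Describe where the n-th argument (counting from 1) is passed."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= 4:
        return f"arg{n - 1}"            # first four go in arg0..arg3
    offset = 0x34 + 4 * (n - 5)         # fifth at sp-0x34, sixth at sp-0x38, ...
    return f"sp-{offset:#x}"

def arg_address(caller_sp, n):
    """Address of a stack-passed argument (n >= 5) given the caller's sp."""
    if n < 5:
        raise ValueError("arguments 1 through 4 are passed in registers")
    return caller_sp - (0x34 + 4 * (n - 5))

for n in range(1, 8):
    print(n, arg_location(n))

# With a made-up caller sp of 0x2000, argument 5 would be read from here:
print(hex(arg_address(0x2000, 5)))
```

The sp passed to arg_address must be the calling procedure's sp, as obtained by manual stack back-tracing, not the sp of the procedure that receives the arguments.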
If the system core dump was produced by a panic or a trap, copies of all the registers at the time of the trap or panic were saved in memory and are available in the core dump. For a trap, the registers are saved on the stack, in the order specified in the struct save_state, which is defined in /usr/include/machine/save_state.h. For a panic, the registers are saved in a statically allocated memory location called panic_save_state, in the order specified in the struct rpb, which is defined in /usr/include/machine/rpb.h.

See the examples at the end of this chapter for details of how to access registers in the trap save_state area. The mechanics of accessing panic_save_state fields are similar, though the offsets into the save area are different. For example, if you want to get r3 out of the panic_save_state area, look at /usr/include/machine/rpb.h and note that the field rp_gr3 is the sixth word in struct rpb. Therefore, it can be found at panic_save_state + 5 words == panic_save_state + 0x14.

Not all registers in these save areas are guaranteed to be the same as at the time of the panic or trap, because some registers must be used by the system to execute the panic or trap path and save away the other registers. Registers which may not be preserved are r1, r19 - r22, r31, arg0, arg1, arg2, and arg3. Use your judgment with the contents of these registers in the save areas. If they look odd, they may have been overwritten.

If your stack trace includes a call to trap(), it will also have a call to panic() higher up (later in time) than the trap. In this case, it is safer to look in the trap save_state structure on the stack than the panic_save_state area for registers you are curious about, because the trap saved the registers closer in time to when the problem which caused the system crash occurred.
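The word-to-offset arithmetic used for rp_gr3 can be checked with a few lines of Python. This is a sketch that assumes every field ahead of the one of interest occupies exactly one 32-bit word. The helper name word_offset is hypothetical, and the position of any other field must be read from the header file for your release.

```python
WORD_SIZE = 4  # bytes per 32-bit word

def word_offset(position):
    """Byte offset of the field that is the position-th word (counting from 1)."""
    if position < 1:
        raise ValueError("word positions are counted from 1")
    return (position - 1) * WORD_SIZE

# rp_gr3 is the sixth word of struct rpb, so it sits 5 words into the area.
print(f"panic_save_state+{word_offset(6):#x}")
```

The same helper applies to the trap save_state area, as long as the position is counted from the start of struct save_state instead.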
To print out the value of a kernel global variable, simply use the symbol name with the appropriate formatting option (see adb(1) and the ADB Tutorial for more information). The following table lists some of the more interesting kernel globals, with the appropriate adb format for printing them, and brief descriptions of what they mean.
It is possible to use adb to print out fields of interest from the process table entry and user area of the process that was running when the system crashed. The following subsection describes how to print certain important fields and gives a very brief description of each field. For more information on the meaning of these fields, see The Design of the UNIX Operating System by Maurice Bach (Prentice-Hall), or The Design and Implementation of the 4.3 BSD UNIX Operating System by Leffler, McKusick, Karels and Quarterman (Addison-Wesley).

adb, when called with the -k option, should print out the address of the user area and process table entry of the process that was running when the system crashed. adb prints this when it is first entered, so the first output you should see from adb is:
u is the location of the user area, and should always be at virtual address 7FFE6000. When the kernel switches to a new process, it always maps the physical address of the process' user area to virtual address 7FFE6000. u.u_procp is the location of this process' process table entry. This address will vary from process to process.

If adb does not print the u and u.u_procp values on entry, it was unable to determine the currently running process at crash time, probably because your core dump was the result of a Transfer of Control (TOC).

If the process that caused the panic was running on the Interrupt Control Stack (ICS), the u and u.u_procp pointers will not contain valid information for the process. When an interrupt occurs, the kernel executes the appropriate kernel code to process the interrupt without switching to a new user context. The u and u.u_procp addresses which adb prints will therefore be those of the process that was running when the interrupt occurred, that is, the process that was interrupted. Look at the panic message in msgbuf to tell if the panic occurred while on the ICS. If you see a message like the following after the hex stack trace, the process was on the ICS.
The table below describes the adb command to use to print important user area fields. u means the value marked u printed on adb entry (see example above). When executing the adb commands in the table below, substitute the u value printed on adb entry for the letter u.
For example, to print u_comm, given the adb entry printout u 7FFE6000 u.u_procp 4D2F20, type:
See /usr/include/sys/user.h for more information on fields in the user area. These offset values are for HP-UX release 10.0, and may change from release to release.

The table below describes the adb command to use to print important process table fields. p means the value marked u.u_procp printed on adb entry (see example above). When executing the adb commands in the table below, substitute the u.u_procp value printed out on adb entry for the letter p.

For example, to print out p_flag, given the adb entry printout at the beginning of this section, type:
See /usr/include/sys/proc.h for more information on fields in the proc structure. These offset values are for HP-UX release 10.0, and may change from release to release.