STREAMS/UX for the HP 9000 Reference Manual > Chapter 6 Debugging STREAMS/UX Modules and Drivers

HP-UX Kernel Debugging Tools and strdb

The strdb tool can be used with the other standard HP-UX kernel debugging tools to provide STREAMS/UX-specific information and data formatting. Generally, if your system is running normally except for STREAMS/UX, use strdb to debug the problem. If your system panics or hangs, use strdb together with adb on the resulting system core dump to diagnose the problem. strdb is documented earlier in this chapter, and examples of using adb and strdb together are given at the end of this chapter.

What Is a System Panic?

Unlike programming errors in user code, programming errors in kernel code can cause system panics. A system panic prints a panic message on the console and generates a system core dump, which is a copy of physical memory at the time of the panic. The panic message and core dump can be examined using adb and strdb to determine the cause of the panic.

There are three main categories of panics. The first category occurs when a kernel routine calls panic() because of a system inconsistency from which it cannot recover. In this case, the panic message contains a string from the routine that called panic(), explaining why panic() was called. In the example below, the panic string is "ifree: freeing free inode". A hexadecimal stack trace is also printed. Interpreting the stack trace is described later.

System Panic:
@(#)9245XA HP-UX (A.10.00) #1: Wed Sep 28 15:47:13 PDT 1994
panic: (display==0xb000, flags==0x0) ifree: freeing free inode

PC-Offset Stack Trace (read across, most recent is 1st):
0x0014766c 0x001480b0 0x000b3a38 0x000b411c 0x000b3b78 0x000b765c
0x000b10d8 0x000aefd0 0x0001c500
End Of Stack
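
As a starting point for working with a trace like the one above, the hexadecimal words can be collected into an array for later lookup. The following is a minimal sketch and is not part of HP-UX, adb, or strdb. The function name demo_parse_trace_line() is hypothetical, and the code assumes that every word carries a 0x prefix, as in the example.

```c
#include <stdlib.h>

/*
 * Collect up to max hexadecimal words from one line of a stack trace.
 * Only words that start with "0x" are accepted, so a line such as
 * "End Of Stack" yields no words. Returns the number of words stored.
 */
size_t demo_parse_trace_line(const char *line, unsigned long *out, size_t max)
{
    size_t n = 0;
    char *end;

    while (n < max) {
        /* Skip blanks between words. */
        while (*line == ' ' || *line == '\t')
            line++;

        /* Stop at the end of the line or at anything that is not 0x... */
        if (line[0] != '0' || line[1] != 'x')
            break;

        unsigned long value = strtoul(line, &end, 16);
        if (end == line)
            break;

        out[n++] = value;
        line = end;
    }
    return n;
}
```

Because the console reports the most recent entry first, out[0] from the first trace line corresponds to the most recent PC offset.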

The second category is the occurrence of a kernel-level trap or exception condition. These panics usually involve virtual memory and are described below. A hexadecimal stack trace is also printed.

The third category is the occurrence of a High Priority Machine Check (HPMC), which usually indicates a hardware problem. An HPMC is characterized by a total, sudden system halt and an HPMC "tombstone" printed on the console, which records the contents of the system's registers. If you encounter an HPMC, contact your HP service representative. Note that an HPMC tombstone is also printed after a Transfer of Control (TOC); see "Transfer of Control In Case of System Hang" for details. There is no need to contact an HP representative for an HPMC tombstone that is the result of a TOC.

Traps

Some very common panics originate in either the trap routing or interrupt routing routines. Whenever this low-level code detects a trap that it believes cannot be corrected, it panics the machine. The most common faults are described below.

Data Segmentation Faults

Usually, a data segmentation fault occurs when a process (in kernel mode) attempts to dereference a null pointer. If you receive a data segmentation fault, information similar to the following will be printed on the system console:

trap type 15, pcsq.pcoq = 0.85b7c, isr.ior = 0.4
@(#)9245XA HP-UX (A.10.00) #0: Sat Aug 13 23:17:54 PDT 1994
panic: (display==0xbf00, flags==0x0) Data segmentation fault

pcsq.pcoq is the current instruction address, and isr.ior is the current data address; each is printed as space.offset. This trap message means that the instruction at location 0x85b7c tried to reference address 4 in space 0. You can use adb to see what the instruction was trying to do. The instruction may have been attempting to get a value 4 bytes off of some pointer. A likely cause is a logic error that left the pointer uninitialized.
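
To see how a null pointer can produce a small nonzero data address such as 4, consider a field that sits 4 bytes into a structure. The following user-space sketch uses a hypothetical structure, struct demo_q, and a hypothetical helper, demo_count_address(); neither is a STREAMS/UX definition. It computes the address that a load of the second field would reference, without ever dereferencing an invalid pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, for illustration only; not a STREAMS/UX type. */
struct demo_q {
    int32_t dq_flags;   /* offset 0 */
    int32_t dq_count;   /* offset 4 */
};

/*
 * Address that a load of p->dq_count would reference when p holds the
 * value base. The address is computed arithmetically, so no invalid
 * pointer is dereferenced.
 */
uintptr_t demo_count_address(uintptr_t base)
{
    return base + offsetof(struct demo_q, dq_count);
}
```

With a base of 0, which models a null pointer, the result is 4. That matches the offset part of isr.ior in the message above, so a small data address is a hint to look for a structure field at that offset.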

Instruction Page Faults

An instruction page fault occurs when a process in kernel mode jumps to an address that is not mapped and tries to execute the instruction there. Because the page is not mapped, and the kernel is not paged, a fault is generated. The console output is similar to the following:

trap type 6 pcsq.pcoq = 0.0 isr.ior = 4.78
@(#)9245XA HP-UX (A.10.00) #0: Sat Aug 13 23:17:54 PDT 1994
panic: (display==0xbf00, flags==0x0) Instruction page fault

The pcsq.pcoq pair is the important value here; it shows that the code attempted to jump to page zero and start executing. Because the fault was an instruction page fault, the isr.ior pair is meaningless. The page fault may have occurred because of an indirect procedure call in which the address of the routine to be called was not initialized.
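
An indirect call of this kind typically goes through a function pointer stored in a table. The sketch below shows one defensive pattern: check the pointer before calling through it. All names here (struct demo_ops, demo_open(), demo_call_open()) are hypothetical and are not STREAMS/UX interfaces. The check only catches a pointer that is zero; it cannot detect a pointer that holds some other garbage value.

```c
#include <stddef.h>

/* Hypothetical operations table; illustrative only. */
struct demo_ops {
    int (*do_open)(int unit);
};

/* Hypothetical handler that simply echoes the unit number. */
int demo_open(int unit)
{
    return unit;
}

/*
 * Call ops->do_open(unit) only if the table and the function pointer
 * are both set. Returns -1 instead of jumping to address zero.
 */
int demo_call_open(const struct demo_ops *ops, int unit)
{
    if (ops == NULL || ops->do_open == NULL)
        return -1;
    return ops->do_open(unit);
}
```

A table whose do_open member was left as zero returns -1 from demo_call_open() rather than transferring control to page zero.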

Protection Violations

A third common panic is the protection violation. This type of panic occurs when the kernel tries to reference a data structure that does not belong to the current process. It also occurs if the kernel attempts to reference an object in a way that is not permitted by the access rights assigned to the page where the object resides, for example, an attempt to write to a read-only page. Another frequently overlooked source of protection faults is unaligned access. Such violations appear to be protection faults, but are caused by performing an operation on an unaligned address, for example, a load word on an address that is not word aligned. In each of these cases, trap type 18 or 7 is generated. The pcsq.pcoq pair gives the offending instruction, and the isr.ior pair gives the offending data address.
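
The unaligned case can be checked with simple address arithmetic. The following is a minimal sketch that assumes a 4-byte word; the names demo_is_word_aligned() and demo_load32() are hypothetical. The first function tests whether the offset from an isr.ior pair is word aligned, and the second shows one portable way to read a 32-bit value from an address that may not be aligned.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if addr is a multiple of 4, that is, aligned for a 4-byte word. */
int demo_is_word_aligned(uintptr_t addr)
{
    return (addr & 3u) == 0;
}

/*
 * Read a 32-bit value from any address by copying its bytes, which
 * avoids issuing a word load on a possibly unaligned address.
 */
uint32_t demo_load32(const void *src)
{
    uint32_t value;

    memcpy(&value, src, sizeof value);
    return value;
}
```

If the reported data address fails the alignment test and the faulting instruction is a word load or store, alignment is a more likely cause than page access rights.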

© 1995 Hewlett-Packard Development Company, L.P.