HP 3000 Manuals

Identifying Wait States and Areas for Optimization [ HP SPT/XL User's Manual: Analysis Software ] MPE/iX 5.0 Documentation


HP SPT/XL User's Manual: Analysis Software

Identifying Wait States and Areas for Optimization 

The life cycle of a process in MPE involves three fundamental states:

 *  Using CPU time.

 *  Waiting for CPU time, in priority order.

 *  Waiting for event completion--waiting for disk I/O, memory, and
    locks.

Many MPE XL applications spend a majority of time in the first two
states, either using or waiting for the CPU. These are commonly referred
to as CPU-intensive applications.  For these applications profiling
should concentrate on the dominant transaction and on the intrinsics that
are the primary contributors to use of CPU time.

But there will be instances where an application is spending significant
time waiting for an event to be completed.  Factors such as programming
techniques and locality of data can contribute to these wait states.
Time spent here is an area for potential improvement, because achieving a
CPU-intensive application is a fundamental goal of program tuning.

The following are examples of waiting for event completion:

 *  MPE XL lets a process request that logical file-writes be considered
    incomplete until physically written to disk using the MPE File System
    Intrinsic FCONTROL. While this technique is valid, it works against
    the design of fundamental parts of MPE XL, and results in programs
    spending increased amounts of time waiting for disk I/O writes to be
    completed.

 *  Database loading characteristics that increase the randomness with
    which data is accessed by a program make it more difficult for MPE/XL
    to fetch required data--a program may thus experience a higher degree
    of waiting for disk I/O reads to complete.

 *  The amount of time that programs spend waiting for locks is largely
    due to the implementation of locking strategies for programs which
    share access to MPE files, KSAM files, or TurboIMAGE databases, and
    for explicit use of Resource Identification Numbers (RINs.)

    If two or more processes sharing a common resource use an
    inconsistent locking strategy or apply locks around indefinite events
    such as terminal reads, they spend increasing amounts of their
    transaction processing time waiting for locks to be applied.


NOTE An important part of application tuning is to maximize the CPU-intensive nature of a program by minimizing the time spent waiting for disk I/O, waiting for memory, and waiting for locks.
Is Processing Time Spent Efficiently? PROCESS_ORDER_LINE's Transaction Summary Information in figure 4-8 shows: * The transaction is only 47.6 percent CPU intensive. The other 52.4 percent of processing time is spent waiting for event completion. * Of the 52.4 percent of time this transaction spends waiting for event completion, 24.3 percent is waiting for disk I/O. * No time was spent waiting for memory or locks. (This can be expected when data is collected in a single-user environment.)
[]
Figure 4-8. Transaction Summary Information
NOTE Our goal is to reduce the time spent in wait states to insignificant levels. Since a transaction that shares hardware and software resources always spends some of its time waiting for disk I/O, memory, and locks, these are our primary areas of concentration.
Let's take a look at the 24.3 percent of time PROCESS_ORDER_LINE spent waiting for disk I/O. Waiting for Disk I/O A good starting point for isolating the cause of wait for disk I/O is to study the Transaction Intrinsic Profile as shown in figure 4-9. It identifies the three intrinsics where the wait states dominate. For example, although FCONTROL contributes only 14 percent of this transaction's processing time, the histogram shows that most of this time is spent waiting for event completion (* - * indicates wait states).
[]
Figure 4-9. Transaction Intrinsic Profile We look at Detailed Intrinsic Information for FCONTROL in figure 4-10 to obtain location and average timing information for unique intrinsic calls. It shows that FCONTROL was called 250 times from statement 11 within the procedure save_single_order_line, within the HPOE program file.
[]
Figure 4-10. Detailed Intrinsic Information Of the time spent within FCONTROL, 48.1 percent is spent waiting for disk I/O within the program file HPOE. This intrinsic's wait states suggest that FCONTROL's contribution to waiting for disk I/O may be the major contributor to the overall characteristic for PROCESS_ORDER_LINE. HP SPT/XL has given us a clue to identifying and correcting the problem. Reducing the Wait for Disk I/O. HP SPT/XL points us to a location or statement number in the Order Entry program code. Analyzing the Pascal compiler's information points the programmer (through the code offsets listing) to the statement containing the FCONTROL call. In this example, the FCONTROL call was an FCONTROL(2) which explicitly forces physical I/O completion. Examining the code around this call showed that PROCESS_ORDER_LINE saves each order line in a temporary work file that is summarized during the COMPLETE_ORDER transaction. Each FWRITE in our example is accompanied by an FCONTROL(2) call to force physical I/O completion. Physical posting is a valid way to preserve the integrity of critical files in case of an abnormal interruption, but it is expensive overhead. It increases the physical disk I/O and the time the calling process waits for disk I/O. Key parts of program tuning are identifying primary contributors to processing time and deciding if their functions are worth the overhead. We removed the FCONTROL(2) calls from Order Entry and monitored it again with HP SPT/XL, storing the collected data in a new logfile--SPTLOG3. Verify The Improvement. In the Transaction List in figure 4-11, transaction process times in SPTLOG3 are compared with the previous ones in SPTLOG2. It shows that the average processing time per transaction for PROCESS_ORDER_LINE is reduced from .496 seconds to .424 seconds as a result of removing the FCONTROL (2) calls.
[]
Figure 4-11. Transaction List Compare In the Transaction Summary Information screen in figure 4-12 shows that the modification reduced the waiting for disk I/O component from 24.3 percent (see figure 4-8) to 19.8 percent.
[]
Figure 4-12. Transaction Summary Information Again, it would be unwise to assume that a significant wait for disk I/O, characteristically has only a single cause. To continue this analysis, the Transaction Intrinsic Profile and Detailed Intrinsic Information screens could be reviewed to identify other potential contributors to the significant wait for the I/O component. The Transaction Intrinsic Profile from SPTLOG3 data would indicate that both DBGET and DBFIND calls within PROCESS_ORDER_LINE spend a significant proportion of their processing time in wait states. These should be investigated in a similar manner to that for FCONTROL.
NOTE In extreme cases, eliminating or reducing the wait for disk I/O for a particular transaction type could reduce its transaction processing time so much that it is no longer the dominant transaction in a program. As you tune your program, check the Transaction List to verify that the dominant transaction does not change.
Waiting for Locks Now that we've checked the dominant transaction for disk I/O waits, we'll study the effect of contention for software resources. MPE-based programs use two types of flags to guard the integrity of shared entities--locks and latches: * Locks are flags acquired and released directly by an application program's use of MPE or subsystem intrinsics, for example, FLOCK, DBLOCK, CKLOCK, LOCKLOCRIN, and LOCKGLORIN and their unlocking counterparts. * Latches are flags used by MPE and the subsystems to protect their internal data structures, such as the MPE directory and TurboIMAGE buffers. Latches are manipulated by MPE and subsystem calls to internal procedures and are not normally accessible to applications program code. The time a program or transaction waits for locks includes the time it spends waiting to acquire both locks and latches. As with disk I/O wait, the goal is to identify the causes of significant time waiting for locks and reduce their effect until a program's transaction throughput reaches desired levels. Most of this program tuning effort is limited to minimizing waits for explicit locking of entities controlled by program code, such as file, database, and RIN locks. The approach for examining wait for locks is similar to that used for checking wait for disk I/O. The primary difference affects the nature of the data collection. Excessive time spent in a Wait-for- disk state that results from inappropriate programming techniques or poor data locality can usually be detected by monitoring a single process. Wait for locks always implies competition for a common resource by multiple processes. Representative data suitable for investigation of locking contention can usually be acquired only through collection performed in a multi-user test or live environment. Comparing Single and Multi-User Data. The HP SPT/XL logfile, SPTLOG4, contains data collected from observing one of three concurrent order entry processes. This Transaction List in figure 4-13 compares transaction characteristics between single-user and multiple-user collection environments contained in SPTLOG3 and SPTLOG4, respectively.
[]
Figure 4-13. Transaction List Compare The PROCESS_ORDER_LINE transaction still dominates overall processing time. The processing time per transaction in a multiple-user environment--2.189 seconds average--is much higher than its single-user processing time--.424 seconds average--suggesting a contention problem. The Transaction Summary Information in figure 4-14 shows the degree to which PROCESS_ORDER_LINE waits for event completion. In this transaction, 71.1% of the processing time (2.189 seconds) is incurred waiting for locks.
[]
Figure 4-14. Transaction Summary Information Isolating the causes of waiting for locks due to application programming techniques is simpler because fewer intrinsics are involved than in isolating the causes of waiting for disk. The causes that may be responsible occur in pairs and include FLOCK/FUNLOCK, DBLOCK/DBUNLOCK, LOCKLOCRIN/UNLOCKLOCRIN, and LOCKGLORIN/UNLOCKGLORIN. The Transaction Intrinsic Profile for PROCESS_ORDER_LINE transactions (using SPTLOG4) is a good place to start the investigation into the cause of the time spent waiting for locks.
[]
Figure 4-15. Transaction Intrinsic Profile Figure 4-15 shows that 72.4% of the average processing time for PROCESS_ORDER_LINE is incurred in DBLOCK. The Detailed Intrinsic Information screen in figure 4-16 shows detailed data for the DBLOCK intrinsic within PROCESS_ORDER_LINE. DBLOCK, called from statement 17 within the procedure start_commit_single_order_line, averaged over 250 calls, and incurred 98.2% of an overall intrinsic processing time of 1.617 seconds waiting for locks. This shows a severe locking issue generated at a single location in the code.
[]
Figure 4-16. Detailed Intrinsic Information HP SPT/XL has pinpointed the main contribution to a locking problem. To further investigate, we'll use the Intrinsic Trace function to look at representative transactions. The trace shows the code's actual behavior, not only the likely behavior we might predict from studying the source code. Scrolling through the data, the trace shows both the DBLOCK call in figure 4-17 and the DBUNLOCK call in figure 4-18. The transaction spent .961 seconds using DBLOCK.
[]
Figure 4-17. Intrinsic Trace--DBLOCK
[]
Figure 4-18. Intrinsic Trace--DBUNLOCK Locking contention problems usually have either of two primary causes: * Inconsistent locking strategies between two or more processes that share a common resource. * Locking entities around indefinite events such as terminal reads. Detecting incompatible locking strategies may not be possible, as in our example, because we see only what the process was trying to lock, not why it was waiting or what was impeding it. For example, by using HP SPT/XL we may be able to detect that an application process waited a certain period of time to acquire a database lock, but we would not be able to identify that the wait was due to waiting for another process to relinquish a conflicting record level-lock. We can, however, get some clues about the problem by examining the transaction processing time of the events that were covered by a lock/unlock pair. Studying the intervening events shows that this locking pair covers two calls to VREADFIELDS, the VPLUS terminal read intrinsic. This intrinsic involves user "think time," and thus, could be an indefinite event. HP SPT/XL has given us a clue to the nature of the problem. Reducing the Wait for Locks. A study of the source code in this case showed that PROCESS_ORDER_LINE was imposing Mode-1 TurboIMAGE DBLOCKS (database level) around VPLUS terminal, reducing concurrent database access to a minimum. We reduced locking to the record level. As before, it is unwise to assume that all causes of wait for locks were attributable to a single source. To continue this analysis we would perform another collection and repeat the various stages of investigation, including checking for consistency and examining various wait states for the dominant transaction. Using/Waiting for CPU Profiling a CPU-intensive program calls for concentrating on the dominant transaction and the intrinsics that contribute to CPU transaction processing, and making some decisions: * Are all the intrinsic calls constituting the dominant transaction necessary? * Are there alternatives? * For the intrinsics that perform data access, would alternative characteristics for file, database structure, or loading help? For each case, we need to assess cost, effort, and risk. For example, changing database structure or loading characteristics would probably affect programs and transactions other than the one we're investigating, so they must be considered, too. Which Intrinsics Use the Most CPU Time?. We can look at the Transaction Intrinsic Profile for Order Entry's dominant transaction PROCESS_ORDER_LINE (in SPTLOG3), sorted by CPU time. In figure 4-19 we see 44.2% of the CPU time was consumed within VPLUS intrinsics for terminal reads and writes, VREADFIELDS 30.4% plus VSHOWFORM 13.8%. This would be the area of greatest potential optimization. Another 13.5% of the CPU time is consumed within TurboIMAGE access intrinsics (DBPUT, DBGET, DBFIND, and DBUPDATE).
[]
Figure 4-19. Transaction Intrinsic Profile (By CPU Time) * VPLUS There is no optimum way to call VREADFIELDS, since by default VSHOWFORM eliminates unnecessary painting of screens, field enhancements, error messages etc. In this case, t is unlikely that this contribution could be reduced without significantly rewriting the transaction. However, given nothing more than SPT data, we cannot be certain. At least the area of greatest potential return has been identified. VPUTIEEEREAL uses 9.9% of transaction CPU processing time. Intrinsics that perform data-type conversion are usually costly in CPU time per intrinsic--especially those converting floating-point values. They are prime candidates for alternative techniques. In addition, the same profile shows that 8.8% of the CPU time per transaction incurred within PROCESS_ORDER_LINE is spent in the VFIELDEDITS intrinsic. VPLUS allows editing specifications for data fields to be stored as an integral component of the forms file, as opposed to coding them within the application program. Consequently, application optimization may extend beyond source code to examining auxiliary components, such as, the complexity of VPLUS field editing contained in a VPLUS forms file. For this example, some economies in CPU time per transaction may be obtained by either eliminating unnecessary or redundant edits, or replacing general purpose VPLUS field editing by customized applications code. * Compatibility Mode You can also see that 10.0% of CPU time per transaction for PROCESS_ORDER_LINE is spent in the intrinsic HPSWITCHTOCM, which is called once per transaction. The majority of any time attributed to HPSWITCHTOCM is probably spent in the compatibility mode user Segmented Library procedures which may be referenced by the application and not in the HPSWITCHTOCM intrinsic itself. * TurboIMAGE Another 13.5% of the transaction CPU processing time is used in the TurboIMAGE access intrinsics DBPUT, DBFIND, DBGET, and DBUPDATE. To investigate the use of TurboIMAGE intrinsics, look at Summary Data Set Information and Detailed Data Set Information.
[]
Figure 4-20. Summary TurboIMAGE Data Set Information
[]
Figure 4-21. Detailed TurboIMAGE Data Set Information Internally, each TurboIMAGE database uses a common set of intermediate buffers to accommodate the most frequently accessed blocks of a database. The intent is to minimize the frequency with which application processes will incur MPE File System requests as a result of logical database access. The HP SPT/XL analyzer provides statistical information on the access rates and hit rates to this pool of buffers as shown in figure 4-21. Maximizing the buffer hit rates through adjustment of the number of database buffers allocated to a particular database may in itself result in reduced CPU/Intrinsic, thus contributing to the application's optimization process. This information further qualifies the use of database and file access intrinsics. Reducing the frequency of these intrinsics is usually possible only if alternative processing paths are available, or if the historical information in a file or database can be reduced by archiving or housekeeping. To reduce CPU processing per intrinsic, you can optimize database loading characteristics such as reduced percentage occupation of masters, chained loading, and reloading of details.


MPE/iX 5.0 Documentation