HP 3000 Manuals

Identifying Wait States and Areas for Optimization [ HP SPT/XL User's Manual: Analysis Software ] MPE/iX 5.0 Documentation
HP SPT/XL User's Manual: Analysis Software

Identifying Wait States and Areas for Optimization 

The life cycle of a process in MPE involves three fundamental states:

 *  Using CPU time.

 *  Waiting for CPU time, in priority order.

 *  Waiting for event completion--waiting for disk I/O, memory, and
    locks.

Many MPE XL applications spend a majority of time in the first two
states, either using or waiting for the CPU. These are commonly referred
to as CPU-intensive applications.  For these applications profiling
should concentrate on the dominant transaction and on the intrinsics that
are the primary contributors to use of CPU time.

But there will be instances where an application is spending significant
time waiting for an event to be completed.  Factors such as programming
techniques and locality of data can contribute to these wait states.
Time spent here is an area for potential improvement, because achieving a
CPU-intensive application is a fundamental goal of program tuning.

The following are examples of waiting for event completion:

 *  MPE XL lets a process request that logical file-writes be considered
    incomplete until physically written to disk using the MPE File System
    Intrinsic FCONTROL. While this technique is valid, it works against
    the design of fundamental parts of MPE XL, and results in programs
    spending increased amounts of time waiting for disk I/O writes to be
    completed.

 *  Database loading characteristics that increase the randomness with
    which data is accessed by a program make it more difficult for MPE/XL
    to fetch required data--a program may thus experience a higher degree
    of waiting for disk I/O reads to complete.

 *  The amount of time that programs spend waiting for locks is largely
    due to the implementation of locking strategies for programs which
    share access to MPE files, KSAM files, or TurboIMAGE databases, and
    for explicit use of Resource Identification Numbers (RINs.)

    If two or more processes sharing a common resource use an
    inconsistent locking strategy or apply locks around indefinite events
    such as terminal reads, they spend increasing amounts of their
    transaction processing time waiting for locks to be applied.


NOTE  An important part of application tuning is to maximize the
      CPU-intensive nature of a program by minimizing the time spent
      waiting for disk I/O, waiting for memory, and waiting for locks.


Is Processing Time Spent Efficiently? 

PROCESS_ORDER_LINE's Transaction Summary Information in figure 4-8 shows:

 *  The transaction is only 47.6 percent CPU intensive.  The other 52.4
    percent of processing time is spent waiting for event completion.

 *  Of the 52.4 percent of time this transaction spends waiting for event
    completion, 24.3 percent is waiting for disk I/O.

 *  No time was spent waiting for memory or locks.  (This can be expected
    when data is collected in a single-user environment.)


          Figure 4-8.  Transaction Summary Information 


NOTE  Our goal is to reduce the time spent in wait states to
      insignificant levels.  Since a transaction that shares hardware and
      software resources always spends some of its time waiting for disk
      I/O, memory, and locks, these are our primary areas of
      concentration.


Let's take a look at the 24.3 percent of time PROCESS_ORDER_LINE spent
waiting for disk I/O.

Waiting for Disk I/O 

A good starting point for isolating the cause of wait for disk I/O is to
study the Transaction Intrinsic Profile as shown in figure 4-9.  It
identifies the three intrinsics where the wait states dominate.  For
example, although FCONTROL contributes only 14 percent of this
transaction's processing time, the histogram shows that most of this time
is spent waiting for event completion (* - * indicates wait states).


          Figure 4-9.  Transaction Intrinsic Profile 

We look at Detailed Intrinsic Information for FCONTROL in figure 4-10 to
obtain location and average timing information for unique intrinsic
calls.  It shows that FCONTROL was called 250 times from statement 11
within the procedure save_single_order_line, within the HPOE program
file.


          Figure 4-10.  Detailed Intrinsic Information 

Of the time spent within FCONTROL, 48.1 percent is spent waiting for disk
I/O within the program file HPOE. This intrinsic's wait states suggest
that FCONTROL's contribution to waiting for disk I/O may be the major
contributor to the overall characteristic for PROCESS_ORDER_LINE.

HP SPT/XL has given us a clue to identifying and correcting the problem.

Reducing the Wait for Disk I/O.  HP SPT/XL points us to a location or
statement number in the Order Entry program code.  Analyzing the Pascal
compiler's information points the programmer (through the code offsets
listing) to the statement containing the FCONTROL call.

In this example, the FCONTROL call was an FCONTROL(2) which explicitly
forces physical I/O completion.  Examining the code around this call
showed that PROCESS_ORDER_LINE saves each order line in a temporary work
file that is summarized during the COMPLETE_ORDER transaction.

Each FWRITE in our example is accompanied by an FCONTROL(2) call to force
physical I/O completion.  Physical posting is a valid way to preserve the
integrity of critical files in case of an abnormal interruption, but it
is expensive overhead.  It increases the physical disk I/O and the time
the calling process waits for disk I/O.

Key parts of program tuning are identifying primary contributors to

processing time and deciding if their functions are worth the overhead.

We removed the FCONTROL(2) calls from Order Entry and monitored it again
with HP SPT/XL, storing the collected data in a new logfile--SPTLOG3.

Verify The Improvement.  In the Transaction List in figure 4-11,
transaction process times in SPTLOG3 are compared with the previous ones
in SPTLOG2.  It shows that the average processing time per transaction
for PROCESS_ORDER_LINE is reduced from .496 seconds to .424 seconds as a
result of removing the FCONTROL (2) calls.


          Figure 4-11.  Transaction List Compare 

In the Transaction Summary Information screen in figure 4-12 shows that
the modification reduced the waiting for disk I/O component from 24.3
percent (see figure 4-8) to 19.8 percent.


          Figure 4-12.  Transaction Summary Information 

Again, it would be unwise to assume that a significant wait for disk I/O,
characteristically has only a single cause.  To continue this analysis,
the Transaction Intrinsic Profile and Detailed Intrinsic Information
screens could be reviewed to identify other potential contributors to the
significant wait for the I/O component.

The Transaction Intrinsic Profile from SPTLOG3 data would indicate that
both DBGET and DBFIND calls within PROCESS_ORDER_LINE spend a significant
proportion of their processing time in wait states.  These should be
investigated in a similar manner to that for FCONTROL.


NOTE  In extreme cases, eliminating or reducing the wait for disk I/O for
      a particular transaction type could reduce its transaction
      processing time so much that it is no longer the dominant
      transaction in a program.  As you tune your program, check the
      Transaction List to verify that the dominant transaction does not
      change.


Waiting for Locks 

Now that we've checked the dominant transaction for disk I/O waits, we'll
study the effect of contention for software resources.

MPE-based programs use two types of flags to guard the integrity of
shared entities--locks and latches:

 *  Locks are flags acquired and released directly by an application
    program's use of MPE or subsystem intrinsics, for example, FLOCK,
    DBLOCK, CKLOCK, LOCKLOCRIN, and LOCKGLORIN and their unlocking
    counterparts.

 *  Latches are flags used by MPE and the subsystems to protect their
    internal data structures, such as the MPE directory and TurboIMAGE
    buffers.  Latches are manipulated by MPE and subsystem calls to
    internal procedures and are not normally accessible to applications
    program code.

The time a program or transaction waits for locks includes the time it
spends waiting to acquire both locks and latches.  As with disk I/O wait,
the goal is to identify the causes of significant time waiting for locks
and reduce their effect until a program's transaction throughput reaches
desired levels.  Most of this program tuning effort is limited to
minimizing waits for explicit locking of entities controlled by program
code, such as file, database, and RIN locks.

The approach for examining wait for locks is similar to that used for
checking wait for disk I/O. The primary difference affects the nature of
the data collection.  Excessive time spent in a Wait-for- disk state that
results from inappropriate programming techniques or poor data locality
can usually be detected by monitoring a single process.  Wait for locks
always implies competition for a common resource by multiple processes.
Representative data suitable for investigation of locking contention can
usually be acquired only through collection performed in a multi-user
test or live environment.

Comparing Single and Multi-User Data.  The HP SPT/XL logfile, SPTLOG4,
contains data collected from observing one of three concurrent order
entry processes.

This Transaction List in figure 4-13 compares transaction characteristics
between single-user and multiple-user collection environments contained
in SPTLOG3 and SPTLOG4, respectively.


          Figure 4-13.  Transaction List Compare 

The PROCESS_ORDER_LINE transaction still dominates overall processing
time.  The processing time per transaction in a multiple-user
environment--2.189 seconds average--is much higher than its single-user
processing time--.424 seconds average--suggesting a contention problem.

The Transaction Summary Information in figure 4-14 shows the degree to
which PROCESS_ORDER_LINE waits for event completion.  In this
transaction, 71.1% of the processing time (2.189 seconds) is incurred
waiting for locks.


          Figure 4-14.  Transaction Summary Information 

Isolating the causes of waiting for locks due to application programming
techniques is simpler because fewer intrinsics are involved than in
isolating the causes of waiting for disk.  The causes that may be
responsible occur in pairs and include FLOCK/FUNLOCK, DBLOCK/DBUNLOCK,
LOCKLOCRIN/UNLOCKLOCRIN, and LOCKGLORIN/UNLOCKGLORIN.

The Transaction Intrinsic Profile for PROCESS_ORDER_LINE transactions
(using SPTLOG4) is a good place to start the investigation into the cause
of the time spent waiting for locks.


          Figure 4-15.  Transaction Intrinsic Profile 

Figure 4-15 shows that 72.4% of the average processing time for
PROCESS_ORDER_LINE is incurred in DBLOCK.

The Detailed Intrinsic Information screen in figure 4-16 shows detailed
data for the DBLOCK intrinsic within PROCESS_ORDER_LINE. DBLOCK, called
from statement 17 within the procedure start_commit_single_order_line,
averaged over 250 calls, and incurred 98.2% of an overall intrinsic
processing time of 1.617 seconds waiting for locks.  This shows a severe
locking issue generated at a single location in the code.


          Figure 4-16.  Detailed Intrinsic Information 

HP SPT/XL has pinpointed the main contribution to a locking problem.  To
further investigate, we'll use the Intrinsic Trace function to look at
representative transactions.  The trace shows the code's actual behavior,
not only the likely behavior we might predict from studying the source
code.

Scrolling through the data, the trace shows both the DBLOCK call in
figure 4-17 and the DBUNLOCK call in figure 4-18.  The transaction spent
.961 seconds using DBLOCK.


          Figure 4-17.  Intrinsic Trace--DBLOCK 


          Figure 4-18.  Intrinsic Trace--DBUNLOCK 

Locking contention problems usually have either of two primary causes:

 *  Inconsistent locking strategies between two or more processes that
    share a common resource.

 *  Locking entities around indefinite events such as terminal reads.

Detecting incompatible locking strategies may not be possible, as in our
example, because we see only what the process was trying to lock, not why
it was waiting or what was impeding it.  For example, by using HP SPT/XL
we may be able to detect that an application process waited a certain
period of time to acquire a database lock, but we would not be able to
identify that the wait was due to waiting for another process to
relinquish a conflicting record level-lock.

We can, however, get some clues about the problem by examining the
transaction processing time of the events that were covered by a
lock/unlock pair.

Studying the intervening events shows that this locking pair covers two
calls to VREADFIELDS, the VPLUS terminal read intrinsic.  This intrinsic
involves user "think time," and thus, could be an indefinite event.

HP SPT/XL has given us a clue to the nature of the problem.

Reducing the Wait for Locks.  A study of the source code in this case
showed that PROCESS_ORDER_LINE was imposing Mode-1 TurboIMAGE DBLOCKS
(database level) around VPLUS terminal, reducing concurrent database
access to a minimum.

We reduced locking to the record level.

As before, it is unwise to assume that all causes of wait for locks were
attributable to a single source.  To continue this analysis we
would perform another collection and repeat the various stages of
investigation, including checking for consistency and examining various
wait states for the dominant transaction.

Using/Waiting for CPU 

Profiling a CPU-intensive program calls for concentrating on the dominant
transaction and the intrinsics that contribute to CPU transaction
processing, and making some decisions:

 *  Are all the intrinsic calls constituting the dominant transaction
    necessary?

 *  Are there alternatives?

 *  For the intrinsics that perform data access, would alternative
    characteristics for file, database structure, or loading help?

For each case, we need to assess cost, effort, and risk.  For example,
changing database structure or loading characteristics would probably
affect programs and transactions other than the one we're investigating,
so they must be considered, too.

Which Intrinsics Use the Most CPU Time?.  We can look at the
Transaction Intrinsic Profile for Order Entry's dominant transaction
PROCESS_ORDER_LINE (in SPTLOG3), sorted by CPU time.

In figure 4-19 we see 44.2% of the CPU time was consumed within VPLUS
intrinsics for terminal reads and writes, VREADFIELDS 30.4% plus
VSHOWFORM 13.8%.  This would be the area of greatest potential
optimization.  Another 13.5% of the CPU time is consumed within
TurboIMAGE access intrinsics (DBPUT, DBGET, DBFIND, and DBUPDATE).


          Figure 4-19.  Transaction Intrinsic Profile (By CPU Time) 

 *  VPLUS

    There is no optimum way to call VREADFIELDS, since by default
    VSHOWFORM eliminates unnecessary painting of screens, field
    enhancements, error messages etc.  In this case, t is unlikely that
    this contribution could be reduced without significantly rewriting
    the transaction.  However, given nothing more than SPT data, we
    cannot be certain.  At least the area of greatest potential return
    has been identified.

    VPUTIEEEREAL uses 9.9% of transaction CPU processing time.
    Intrinsics that perform data-type conversion are usually costly in
    CPU time per intrinsic--especially those converting floating-point
    values.  They are prime candidates for alternative techniques.

    In addition, the same profile shows that 8.8% of the CPU time per
    transaction incurred within PROCESS_ORDER_LINE is spent in the
    VFIELDEDITS intrinsic.  VPLUS allows editing specifications for data
    fields to be stored as an integral component of the forms file, as
    opposed to coding them within the application program.  Consequently,
    application optimization may extend beyond source code to examining
    auxiliary components, such as, the complexity of VPLUS field editing
    contained in a VPLUS forms file.  For this example, some economies in
    CPU time per transaction may be obtained by either eliminating
    unnecessary or redundant edits, or replacing general purpose VPLUS
    field editing by customized applications code.

 *  Compatibility Mode

    You can also see that 10.0% of CPU time per transaction for
    PROCESS_ORDER_LINE is spent in the intrinsic HPSWITCHTOCM, which is
    called once per transaction.  The majority of any time attributed to
    HPSWITCHTOCM is probably spent in the compatibility mode user
    Segmented Library procedures which may be referenced by the
    application and not in the HPSWITCHTOCM intrinsic itself.

 *  TurboIMAGE

    Another 13.5% of the transaction CPU processing time is used in the
    TurboIMAGE access intrinsics DBPUT, DBFIND, DBGET, and DBUPDATE. To
    investigate the use of TurboIMAGE intrinsics, look at Summary Data
    Set Information and Detailed Data Set Information.


          Figure 4-20.  Summary TurboIMAGE Data Set Information 


          Figure 4-21.  Detailed TurboIMAGE Data Set Information 

Internally, each TurboIMAGE database uses a common set of intermediate
buffers to accommodate the most frequently accessed blocks of a database.
The intent is to minimize the frequency with which application processes
will incur MPE File System requests as a result of logical database
access.  The HP SPT/XL analyzer provides statistical information on the
access rates and hit rates to this pool of buffers as shown in figure
4-21.  Maximizing the buffer hit rates through adjustment of the number
of database buffers allocated to a particular database may in itself
result in reduced CPU/Intrinsic, thus contributing to the application's
optimization process.

This information further qualifies the use of database and file access
intrinsics.  Reducing the frequency of these intrinsics is usually
possible only if alternative processing paths are available, or if the
historical information in a file or database can be reduced by archiving
or housekeeping.  To reduce CPU processing per intrinsic, you can
optimize database loading characteristics such as reduced percentage
occupation of masters, chained loading, and reloading of details.
MPE/iX 5.0 Documentation