HP COBOL II/XL Programmer's Guide
Run-Time Efficiency
You can improve your program's run-time efficiency with the following:
* An improved algorithm.
This is the most important way you can improve your program's
efficiency. Neither control options nor the optimizer can make up
for a slow algorithm. A program that uses a binary search
(without control options or optimization) is still faster than a
program that uses a linear search (with control options and
optimization).
* Coding heuristics.
* Control options.
* The optimizer.
This section discusses the last three ways of improving run-time
efficiency.
NOTE Coding heuristics and the optimizer do not significantly improve
the performance of I-O-bound programs.
Coding Heuristics
The following coding heuristics for run-time efficiency are guidelines,
not rules, for writing programs that run faster. They do not always
work, but programmers have learned from experience that they usually do.
* Put variables that are referenced most often (such as array
subscripts and counters) at the beginning of the WORKING-STORAGE
SECTION in the main program and nondynamic subprograms. Put them
at the end of the WORKING-STORAGE SECTION in each dynamic
subprogram.
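For example, the layout of a main program's WORKING-STORAGE SECTION
might look like the following (an illustrative sketch; the data names
are hypothetical):
WORKING-STORAGE SECTION.
01 TABLE-SUB    PIC S9(9) BINARY SYNC.   Referenced in every loop.
01 REC-COUNT    PIC S9(9) BINARY SYNC.   Incremented for each record.
:
01 RPT-HEADING  PIC X(80).               Referenced only once.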
* Avoid conversion of data to different types. If two or more
variables are operands in the same operation, declare them to be
of the same type. Making all the operands the same type is more
efficient than declaring just one of them with a "faster" type. See
the examples below.
* If fields are often used together as operands in arithmetic
statements, it is more efficient to define them with the same
PICTURE clause.
* The following examples illustrate the above two points.
* The first of the following COMPUTE statements is faster
than the other two because no conversion is necessary and
the intermediate result is the same as the receiving
operand, DISPLAY-4. The second COMPUTE requires a
conversion from BINARY to DISPLAY because the receiving
item is DISPLAY. The third statement requires a conversion
from BINARY to PACKED-DECIMAL because the receiving item is
PACKED-DECIMAL. These conversions take many machine
instructions.
01 DISPLAY-4 PIC S9(4). 4 bytes long.
01 BINARY-4 PIC S9(4) BINARY. 2 bytes long.
01 DECIMAL-9 PIC S9(9) PACKED-DECIMAL. 5 bytes long.
:
COMPUTE DISPLAY-4 = DISPLAY-4 + DISPLAY-4.
COMPUTE DISPLAY-4 = BINARY-4 + BINARY-4.
COMPUTE DECIMAL-9 = BINARY-4 + BINARY-4.
* Calculations involving multiplication, division, and
exponentiation can require conversions for intermediate
results. When the intermediate results of BINARY operands
exceed 18 digits, the operands are converted to
PACKED-DECIMAL. This takes many extra instructions.
The first of the following COMPUTE statements is faster
than the second because the intermediate result is 18
digits. The second requires conversion to PACKED-DECIMAL
because the intermediate result is 20 digits. The result
of the multiplication is then converted back to BINARY.
01 BINARY-9 PIC S9(9) BINARY. 4 bytes long.
01 BINARY-10 PIC S9(10) BINARY. 8 bytes long.
:
COMPUTE BINARY-9 = BINARY-9 * BINARY-9.
COMPUTE BINARY-9 = BINARY-10 * BINARY-10.
* The first of the following COMPUTE statements is faster
because the intermediate result is 16 bits. The
intermediate result of the second COMPUTE is 32 bits so the
operands must be converted to 32-bit values.
01 BINARY-2 PIC S9(2) BINARY. 16 bits long.
01 BINARY-3 PIC S9(3) BINARY. 16 bits long.
01 BINARY-4 PIC S9(4) BINARY. 16 bits long.
:
COMPUTE BINARY-4 = BINARY-2 * BINARY-2.
COMPUTE BINARY-4 = BINARY-2 * BINARY-3.
* When you MOVE BINARY data items from a larger field to a
smaller field, many instructions are required to truncate
the data. The first of the following MOVE statements is
faster than the second because the data must be truncated
in the second MOVE but not the first.
01 BINARY-3 PIC S9(3) BINARY. 16 bits long.
01 BINARY-4 PIC S9(4) BINARY. 16 bits long.
:
MOVE BINARY-4 TO BINARY-4.
MOVE BINARY-4 TO BINARY-3.
* If a variable is a subscript or a varying identifier in a PERFORM
loop, declare it to be of the type PIC S9(9) BINARY SYNC.
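For example (an illustrative sketch; the data and paragraph names are
hypothetical):
01 TABLE-SUB PIC S9(9) BINARY SYNC.
:
PERFORM 100-PROCESS-ENTRY
VARYING TABLE-SUB FROM 1 BY 1 UNTIL TABLE-SUB = 101.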
* When coding the UNTIL condition in a loop, keep this in mind: the
comparisons equal and not equal are faster than the comparisons
less than and greater than.
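For example, the first of the following loops should be faster than
the second (an illustrative sketch; the data and paragraph names are
hypothetical):
PERFORM 200-POST-RECORD UNTIL RECORDS-LEFT = 0.
PERFORM 200-POST-RECORD UNTIL RECORDS-LEFT < 1.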
* Compile subprograms with the SUBPROGRAM or ANSISUB control option.
Calls to such subprograms execute faster than calls to subprograms
compiled with PROGRAM-ID identifier IS INITIAL or the DYNAMIC
control option, which require that the subprogram data be
reinitialized whenever the subprogram is called.
For less code, use the control option SUBPROGRAM instead of
ANSISUB. Performance is the same, but an ANSISUB subprogram
contains extra code that reinitializes its data if the main
program executes a CANCEL statement.
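For example, a subprogram compiled with the SUBPROGRAM control option
might begin as follows (an illustrative sketch; the subprogram name is
hypothetical):
$CONTROL SUBPROGRAM
IDENTIFICATION DIVISION.
PROGRAM-ID. POSTSUB.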
* If paragraphs are performed close together in time, put them
close together physically in your source code.
* If a paragraph is performed from only one place, use an in-line
PERFORM statement for it. This applies especially to loops.
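For example, the following loop is coded with an in-line PERFORM
instead of a PERFORM of a separate paragraph (an illustrative sketch;
the data names are hypothetical):
PERFORM VARYING TABLE-SUB FROM 1 BY 1 UNTIL TABLE-SUB = 101
ADD ITEM-AMT (TABLE-SUB) TO TOTAL-AMT
END-PERFORM.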
* Use NOT phrases to minimize checking. NOT AT END is especially
useful. See the example in "NOT Phrases."
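A minimal sketch of the idea (the file, data, and paragraph names are
hypothetical):
READ INPUT-FILE
AT END MOVE "Y" TO EOF-FLAG
NOT AT END PERFORM 300-PROCESS-RECORD
END-READ.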
* Do not pass parameters BY CONTENT.
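For example, the first of the following CALL statements passes its
parameter BY REFERENCE (the default); the second copies the parameter
before passing it (an illustrative sketch; the names are hypothetical):
CALL "POSTSUB" USING CUST-RECORD.
CALL "POSTSUB" USING BY CONTENT CUST-RECORD.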
* Do not specify the ON EXCEPTION or ON OVERFLOW phrase in a CALL
statement when you use a literal to specify the program name. If
you do, the program is not bound to the subprogram until
run time. This slows the program by approximately 0.01 second per
CALL, and the loader cannot detect missing subprograms.
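For example, the loader can bind the first of the following CALL
statements to the subprogram, but the second is not bound until run
time (an illustrative sketch; the names are hypothetical):
CALL "POSTSUB" USING CUST-RECORD.
CALL "POSTSUB" USING CUST-RECORD
ON EXCEPTION PERFORM 500-NO-SUBPROGRAM
END-CALL.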
* Do not use NEXT SENTENCE as the ELSE clause in an IF statement.
Use CONTINUE or END-IF, or reverse the sense of the condition.
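For example (an illustrative sketch; the data and paragraph names are
hypothetical), instead of:
IF BALANCE-DUE > 0
PERFORM 400-PRINT-INVOICE
ELSE
NEXT SENTENCE.
write:
IF BALANCE-DUE > 0
PERFORM 400-PRINT-INVOICE
END-IF.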
* Computation is fastest with the following types of operands,
listed from the fastest to the slowest:
* PIC S9(9) BINARY, SYNCHRONIZED.
* PIC S9(4) BINARY, SYNCHRONIZED.
* PACKED-DECIMAL, the fewer digits the faster the
computation.
* Numeric DISPLAY, the fewer digits the faster the
computation.
Computation is faster with signed numbers than with unsigned numbers.
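For example, the following declarations are listed from the fastest to
the slowest for computation (an illustrative sketch; the data names are
hypothetical):
01 LOOP-SUB   PIC S9(9) BINARY SYNC.
01 ITEM-CNT   PIC S9(4) BINARY SYNC.
01 UNIT-COST  PIC S9(5)V99 PACKED-DECIMAL.
01 GROSS-PAY  PIC S9(5)V99.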
Coding Heuristics When Calling COBOL Functions.
The following are guidelines when your program calls COBOL functions.
For more information on the COBOL functions, see Chapter 10, "COBOL
Functions," in the HP COBOL II/XL Reference Manual.
* Some of the functions are implemented as calls to run-time
libraries. The rest are implemented simply as inline code.
Inline functions are generally faster than functions in the
run-time library. You might find that coding your own routine for
some library functions is faster than calling the COBOL function.
* The following functions convert the parameter values to floating
point values to calculate the function result. These functions
will execute faster on systems that have a floating point
coprocessor. Use the ROUNDED phrase when precision of these
function values is important.
* ACOS, ASIN, ATAN, COS, SIN, TAN.
* LOG, LOG10, RANDOM, SQRT, SUM.
* MAX, MIN (on numeric operands).
* NUMVAL, NUMVAL-C.
* ORD-MAX, ORD-MIN.
* ANNUITY, MEAN, MEDIAN, MIDRANGE, PRESENT-VALUE, RANGE,
STANDARD-DEVIATION, VARIANCE.
* The precision of functions that convert the parameter values to
floating point values is limited to 15 significant digits. Also,
fractional values may have rounding errors even if the total size
of the argument is less than or equal to 15 digits. Use of
equality comparisons, as shown below, is not recommended.
Not recommended:
IF FUNCTION COS(ANGLE-RADIANS) = 0.1 THEN
PERFORM P1
END-IF
Recommended alternative:
COMPUTE COS-NUM ROUNDED = FUNCTION COS(ANGLE-RADIANS)
IF COS-NUM = 0.1 THEN PERFORM P1.
where COS-NUM is defined as follows:
01 COS-NUM PIC S9V9 COMP.
Another alternative:
IF FUNCTION COS(ANGLE-RADIANS) >= 0.0999 AND <= 0.1001 THEN
PERFORM P1
END-IF
[REV END]
Control Options
These control options make a program run faster:
* OPTIMIZE=1, which invokes the optimizer. See the section, "The
Optimizer", for details.
* OPTFEATURES=LINKALIGNED[16], which generates code that accesses
variables in the LINKAGE SECTION more efficiently. (If the called
program specifies OPTFEATURES=LINKALIGNED[16], have the calling
program specify OPTFEATURES=CALLALIGNED[16].)
* SYNC32, which allows the compiler to align variables along the
optimum boundaries.
These control options make a program run more slowly:
* VALIDATE, which takes extra time to check that the digits
and signs of numeric items are valid.
* BOUNDS, which generates and executes extra code to check ranges.
* SYNC16, which specifies an alignment that is not optimum for
Series 900 systems.
* ANSISORT, which prevents SORT from reading or writing files
directly.
* SYMDEBUG, which generates Symbolic Debug information and stores it
in the program file.
(If a source program was compiled with SYMDEBUG, you can link the
object module with the NODEBUG option, causing the program to
ignore the Symbolic Debug information and improving its execution
speed).
This control option makes a program compile more slowly:
* CALLINTRINSIC, which causes the compiler to check the file SYSINTR
for each call literal.
The Optimizer
The optimizer is an optional part of the compiler that modifies your code
so that it uses machine resources more efficiently, using less space and
running faster. Note that the optimizer improves your code, not your
algorithm. Optimization is no substitute for improving your algorithm.
A good, unoptimized algorithm is always better than a poor, optimized
one.
You can compile your program with level one optimization by compiling it
with the control option OPTIMIZE=1.
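For example, one way to request level one optimization is a $CONTROL
line at the front of the source program (an illustrative sketch; the
program name is hypothetical):
$CONTROL OPTIMIZE=1
IDENTIFICATION DIVISION.
PROGRAM-ID. POSTMAIN.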
The advantages of level one optimization are:
* The program is approximately 3.1% smaller.
* The program runs 2.8% to 4.4% faster. (Programs that use the
Central Processor Unit intensively and those that do relatively
little I-O benefit most.)
The disadvantages of level one optimization are:
* The program compiles approximately 10% more slowly.
* The symbolic debugger cannot be used with the program.
* The statement numbers are not visible when using DEBUG or in
trap messages (see the example in Chapter 7 for details).
Level two optimization is not available for HP COBOL II/XL programs for
the following reasons:
* The most common COBOL data type is the multibyte string, which is
difficult to optimize. (The easiest types to optimize are level
01 or 77, 16- or 32-bit, binary data types.)
* HP COBOL II/XL programs call millicode routines far more often
than non-COBOL programs do, and the optimizer does not optimize
across routine calls. Level two optimization for COBOL would not
have improved performance enough to be worth the effort, which was
spent on improving millicode routines and code generation instead.
Millicode Routines.
Millicode routines are assembly language routines that deliver high
performance for common COBOL operations such as move and compare. They
support COBOL operations on MPE XL the way microcode supports them on
MPE V.
Millicode routines are very specialized, tuned to provide optimal
performance for specific data types of specific lengths. Based on
operation and data types and lengths, the COBOL compiler calls the
appropriate millicode routines.
The calling convention for a millicode routine differs from that of an
ordinary routine in the following ways:
* A millicode call requires fewer registers to be saved across a
call, making it faster than an ordinary call.
* A millicode call uses general register 31 as the return register,
rather than general register 2.
When to Use the Optimizer.
Compile your program with optimization only after you have debugged it.
The optimizer can transform legal programs only.
Once you have compiled your program with optimization, you cannot use the
symbolic debugger to debug it. This is because debug information will be
missing from it. The compiler does not generate debug information and
perform optimizations at the same time.
You can still use DEBUG on your program after you have compiled it with
optimization; however, the statement numbers will not appear in the code.
Transformations.
The five level one optimizer transformations are:
1. Basic block optimization.
The optimizer reads the machine code (specifically, the machine
instruction list). When it reads a branch instruction, it creates
a basic block of code. It joins two basic blocks into one
extended block in the following two cases:
a. If it can remove the branch instruction and append the
"branched to" block to the "branched from" block.
b. If the basic blocks are logically related and can be
optimized as a single unit.
Basic block optimization has these three components:
a. Removal of common expressions in basic blocks.
If more than one expression assigns the same value to the
same field within the block, the optimizer removes all but
the first expression.
b. Removal and optimization of load and copy instructions in
basic blocks.
c. Optimization of elementary branches in basic and extended
blocks. (Removal of unnecessary branches and more
efficient arrangement of necessary branches.)
2. Instruction scheduling.
Instruction scheduling reorders the instructions within a basic
block to accomplish the following:
a. Minimize load and store instructions.
b. Optimize the scheduling of instructions that follow
branches.
3. Dead code elimination.
Dead code elimination is the removal of all code that will never
be executed.
4. Branch optimization.
Branch optimization traces the flow of IF-THEN-ELSE-IF... statements
and reorganizes them more efficiently.
5. Peephole optimization.
Peephole optimization substitutes simpler, equivalent instruction
patterns for more complex ones.