HPlogo HP C/HP-UX Programmer's Guide: Workstations and Servers > Chapter 4 Optimizing HP C Programs

Controlling Specific Optimizer Features


Most of the time, specifying optimization level 1, 2, 3, or 4 should provide you with the control over the optimizer that you need. Additional parameters are provided when you require a finer level of control.

At each level, you can turn on and off specific optimizations using the +O[no]optimization option. The optimization parameter is the name of a specific optimization technique described below. The optional prefix [no] disables the specified optimization.

The options listed in Table 4-4 “HP C Advanced Optimization Options” are described in the sections that follow, along with their defaults and the optimization levels at which they may be used.

Table 4-4 HP C Advanced Optimization Options

+O[no]dataprefetch
+O[no]entrysched
+O[no]fail_safe
+O[no]fastaccess
+O[no]fltacc
+O[no]global_ptrs_unique
+O[no]initcheck
+O[no]inline
+Oinline_budget
+O[no]libcalls
+O[no]loop_transform
+O[no]loop_unroll
+O[no]moveflops
+O[no]parallel
+O[no]parallel_env
+O[no]parmsoverlap
+O[no]pipeline
+O[no]procelim
+O[no]ptrs_ansi
+O[no]ptrs_strongly_typed
+O[no]ptrs_to_globals
+O[no]regionsched
+O[no]regreassoc
+O[no]sideeffects
+O[no]signedpointers
+O[no]static_prediction
+O[no]vectorize
+O[no]volatile
+O[no]whole_program_mode

+O[no]dataprefetch

Optimization level(s): 2, 3, 4

Default: +Onodataprefetch

When +Odataprefetch is enabled, the optimizer will insert instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions will be inserted only for data structures referenced within innermost loops using simple loop varying addresses (that is, in a simple arithmetic progression). It is only available for PA-RISC 2.0 targets.

The math library contains special prefetching versions of vector routines. If you have a PA-RISC 2.0 application that contains operations on arrays larger than 1 megabyte in size, using +Ovectorize in conjunction with +Odataprefetch may improve performance substantially.

Use this option for applications that have high data cache miss overhead.

+O[no]entrysched

Optimization level(s): 1, 2, 3, 4

Default: +Onoentrysched

The +Oentrysched option optimizes instruction scheduling on a procedure's entry and exit sequences. Enabling this option can speed up an application. The option has undefined behavior for applications which handle asynchronous interrupts. The option affects unwinding in the entry and exit regions.

At optimization level +O2 and higher (where dataflow information is available), save and restore operations become more efficient.

This option can change the behavior of programs that perform exception-handling or that handle asynchronous interrupts. The behavior of setjmp() and longjmp() is not affected.

+O[no]fail_safe

Optimization level(s): 1, 2, 3

Default: +Ofail_safe

The +Ofail_safe option allows compilations with internal optimization errors to continue by issuing a warning message and restarting the compilation at +O0.

You can use +Onofail_safe at optimization levels 1, 2, 3, or 4 when you want the internal optimization errors to abort your build.

This option is disabled when compiling for parallelization.

+O[no]fastaccess

Optimization level(s): 0, 1, 2, 3, 4

Default: +Onofastaccess at optimization levels 0, 1, 2 and 3, +Ofastaccess at optimization level 4

The +Ofastaccess option optimizes for fast access to global data items.

Use +Ofastaccess to improve execution speed at the expense of longer compile times.

+O[no]fltacc

Optimization level(s): 2, 3, 4

The +Onofltacc option allows the compiler to perform floating-point optimizations that are algebraically correct but that may result in numerical differences. For example, this option may change the order of expression evaluation: if a, b, and c are floating-point variables, the expressions (a + b) + c and a + (b + c) may give slightly different results due to roundoff. In general, these differences will be insignificant.

The +Onofltacc option also enables the optimizer to generate fused multiply-add (FMA) instructions, the FMPYFADD and FMPYNFADD. These instructions improve performance but occasionally produce results that may differ from results produced by code without FMA instructions. In general, the differences are slight. FMA instructions are only available on PA-RISC 2.0 systems.

Specifying +Ofltacc disables the generation of FMA instructions as well as some other floating-point optimizations. Use +Ofltacc if it is important that the compiler evaluate floating-point expressions as it does in unoptimized code. The +Ofltacc option does not allow any optimizations that change the order of expression evaluation and therefore may affect the result.

If you are optimizing code at level 2 or higher and do not specify +Onofltacc or +Ofltacc, the optimizer will use FMA instructions, but will not perform floating-point optimizations that involve expression reordering or other optimizations that potentially impact numerical stability.

The table below identifies the actions taken by the optimizer according to whether you specify +Ofltacc, +Onofltacc, or neither option.

Optimization Options   Expression Reordering?   FMA?

+O2                    No                       Yes
+O2 +Ofltacc           No                       No
+O2 +Onofltacc         Yes                      Yes

+O[no]global_ptrs_unique [=name1,name2, ...nameN]

Optimization level(s): 2, 3, 4

Default: +Onoglobal_ptrs_unique

Use this option to identify unique global pointers, so that the optimizer can generate more efficient code in the presence of unique pointers, for example by using copy propagation and common sub-expression elimination. A global pointer is unique if it does not alias with any variable in the entire program.

This option supports a comma-separated list of unique global pointer variable names.

Refer to the HP C Online Reference for examples.

+O[no]initcheck

Optimization level(s): 2, 3, 4

Default: unspecified

The initialization checking feature of the optimizer has three possible states: on, off, or unspecified. When on (+Oinitcheck), the optimizer initializes to zero any local, scalar, non-static variables that are uninitialized with respect to at least one path leading to a use of the variable.

When off (+Onoinitcheck), the optimizer issues warning messages when it discovers definitely uninitialized variables, but does not initialize them.

When unspecified, the optimizer initializes to zero any local, scalar, non-static variables that are definitely uninitialized with respect to all paths leading to a use of the variable.

Use +Oinitcheck to look for variables in a program that may not be initialized.

+O[no]inline[=name1, name2, ...nameN]

Optimization level(s): 3, 4

Default: +Oinline

When +Oinline is specified without a name list, any function can be inlined. For inlining to be successful, ensure that function calls follow the prototype definitions in the appropriate header files.

When specified with a name list, the named functions are important candidates for inlining. For example, saying

+Oinline=foo,bar +Onoinline

indicates that inlining be strongly considered for foo and bar; all other routines will not be considered for inlining, since +Onoinline is given.

When this option is disabled with a name list, the compiler will not consider the specified routines as candidates for inlining. For example, saying

+Onoinline=baz,x

indicates that inlining should not be considered for baz and x; all other routines will be considered for inlining, since +Oinline is the default.

The +Onoinline option disables inlining for all functions or for a specific list of functions.

Use this option when you need to precisely control which subprograms are inlined. Use of this option can be guided by knowledge of the frequency with which certain routines are called and may be warranted by code size concerns.

+Oinline_budget[=n]

Optimization level(s): 3, 4

Default: +Oinline_budget=100

The value n is an integer in the range 1 to 1000000 that specifies the level of aggressiveness, as follows:

  • n = 100 Default level of inlining.

  • n > 100 More aggressive inlining. The optimizer is less restricted by compilation time and code size when searching for eligible routines to inline.

  • n = 1 Only inline if it reduces code size.

The +Onolimit and +Osize options also affect inlining. Specifying the +Onolimit option has the same effect as specifying +Oinline_budget=200. The +Osize option has the same effect as +Oinline_budget=1.

Note, however, that the +Oinline_budget option takes precedence over both of these options. This means that you can override the effect of +Onolimit or +Osize option on inlining by specifying the +Oinline_budget option on the same compile line.

+O[no]libcalls

Optimization level(s): 0, 1, 2, 3, 4

Default: +Onolibcalls

Use the +Olibcalls option to increase the runtime performance of code which calls standard library routines in simple contexts. The +Olibcalls option expands the following library calls inline:

  • strcpy()

  • sqrt()

  • fabs()

  • alloca()

Inlining will take place only if the function call follows the prototype definition in the appropriate header file. Fast subprogram linkage is also emitted to tuned millicode versions of the math library functions sin, cos, tan, atan2, log, pow, asin, acos, atan, exp, and log10. (See the HP-UX Floating-Point Guide for the most up-to-date listing of the math library functions.) The calling code must not expect to access errno after the function's return.

A single call to printf() may be replaced by a series of calls to putchar(). Calls to sprintf() and strlen() may be optimized more effectively, including elimination of some calls producing unused results. Calls to setjmp() and longjmp() may be replaced by their equivalents _setjmp() and _longjmp(), which do not manipulate the process's signal mask.

Use +Olibcalls to improve the performance of selected library routines only when you are not performing error checking for these routines.

Using +Olibcalls with +Ofltacc will give different floating point calculation results than those given using +Ofltacc without +Olibcalls.

The +Olibcalls option replaces the obsolete -J option.

+O[no]loop_transform

Optimization level(s): 3, 4

Default: +Oloop_transform

The +O[no]loop_transform option enables [disables] transformation of eligible loops for improved cache performance. The most important transformation is the reordering of nested loops to make the inner loop unit stride, resulting in fewer cache misses.

+Onoloop_transform may be a helpful option if you experience any problem while using +Oparallel.

+O[no]loop_unroll[=unroll factor]

Optimization level(s): 2, 3, 4

Default: +Oloop_unroll

The +Oloop_unroll option turns on loop unrolling. When you use +Oloop_unroll, you can also use the unroll factor to control the code expansion. The default unroll factor is 4, that is, four copies of the loop body. By experimenting with different factors, you may improve the performance of your program.

+O[no]moveflops

Optimization level(s): 2, 3, 4

Default: +Omoveflops

Allows [or disallows] moving conditional floating point instructions out of loops. The +Onomoveflops option replaces the obsolete +OE option. The behavior of floating-point exception handling may be altered by this option.

Use +Onomoveflops if floating-point traps are enabled and you do not want the behavior of floating-point exceptions to be altered by the relocation of floating-point instructions.

+O[no]parallel

Optimization level(s): 3, 4

Default: +Onoparallel

When a program is compiled with the +Oparallel option, the compiler looks for opportunities for parallel execution in loops and generates parallel code to execute the loop on the number of processors set by the MP_NUMBER_OF_THREADS environment variable discussed in the section "Parallel Execution" at the end of this chapter.

The +Oparallel option should not be used for programs that make explicit calls to the kernel threads library /usr/lib/libpthread.sl.


+Onoloop_transform and +Onoinline may be helpful options if you experience any problem while using +Oparallel.

You may use +Oparallel at optimization levels 3 and 4; the default is +Onoparallel at all levels. Specifying +Oparallel disables +Ofail_safe.

Parallelization is incompatible with the prof tool, so the -p option is disabled by +Oparallel. Parallelization is compatible with gprof. Special *crt0.o startup files are required for programs compiled for a parallel environment. The parallel runtime library, libmp.a, must be linked in.

For additional information, see the section "Parallel Execution" at the end of this chapter.

NOTE: At the HP-UX 10.20 release, if a program made of multiple files had any of its files compiled with the +Oparallel option, then the remaining files had to be compiled with either the +Oparallel or +O[no]parallel_env option. The +Oparallel_env option ensured a consistent execution environment for all files in the program, including those not to be compiled for parallel execution. At the HP-UX 10.30 release, it is no longer necessary or permissible to use the +O[no]parallel_env option.

+O[no]parmsoverlap

Optimization level(s): 2, 3, 4

Default: +Oparmsoverlap

The +Oparmsoverlap option optimizes with the assumption that the actual arguments of function calls overlap in memory.

The +Onoparmsoverlap option replaces the obsolete +Om1 option.

Use +Onoparmsoverlap if C programs have been literally translated from FORTRAN programs.

+O[no]pipeline

Optimization level(s): 2, 3, 4

Default: +Opipeline

Enables [or disables] software pipelining. The +Onopipeline option replaces the obsolete +Os option.

Use +Onopipeline to conserve code space.

+O[no]procelim

Optimization level(s): 0, 1, 2, 3, 4

Default: +Onoprocelim at levels 0-3, +Oprocelim at level 4

When +Oprocelim is specified, procedures that are not referenced by the application are eliminated from the output executable file. The +Oprocelim option reduces the size of the executable file, especially when optimizing at levels 3 and 4, at which inlining may have removed all of the calls to some routines.

When +Onoprocelim is specified, procedures that are not referenced by the application are not eliminated from the output executable file.

The default is +Onoprocelim at levels 0-3, and +Oprocelim at level 4.

If the +Oall option is enabled, the +Oprocelim option is enabled.

+O[no]ptrs_ansi

Optimization level(s): 2, 3, 4

Default: +Onoptrs_ansi

Use +Optrs_ansi to make the following two assumptions, which the more aggressive +Optrs_strongly_typed does not make:

  • An int *p is assumed to point to an int field of a struct or union.

  • char * is assumed to point to any type of object.

When both are specified, +Optrs_ansi takes precedence over +Optrs_strongly_typed.

For more information about type aliasing see the section "Aliasing Options" later in this chapter.

+O[no]ptrs_strongly_typed

Optimization level(s): 2, 3, 4

Default: +Onoptrs_strongly_typed

Use +Optrs_strongly_typed when pointers are type-safe. The optimizer can use this information to generate more efficient code.

Type-safe (that is, strongly-typed) pointers are pointers to a specific type that only point to objects of that type, and not to objects of any other type. For example, a pointer declared as a pointer to an int is considered type-safe if that pointer points to an object only of type int, but not to objects of any other type.

Based on the type-safe concept, objects are grouped by type: a given group includes all the objects of the same type.

Under type-inferred aliasing, a pointer to a type in a given group can point only to objects from that same group; it cannot point to a typed object from any other group.

For more information about type aliasing see the section "Aliasing Options" later in this chapter.

Type casting to a different type violates type-inferring aliasing rules. See Example 2 below.

Dynamic casting is allowed. See Example 3 below.


For finer control, see the "[NO]PTRS_STRONGLY_TYPED pragma" section later in this chapter.

Example 1: How Data Types Interact

The optimizer generally spills all global data from registers to memory before any modification to global variables or any loads through pointers. However, you can instruct the optimizer on how data types interact so that it can generate more efficient code.

If you have the following:

1   int *p;
2   float *q;
3   int a, b, c;
4   float d, e, f;
5   foo()
6   {
7       for (i = 1; i < 10; i++) {
8           d = e;
9           *p = ...;
10          e = d + f;
11          f = *q;
12      }
13  }

With +Optrs_strongly_typed turned on, the pointers p and q are assumed to be disjoint because the types they point to are different. Without type-inferred aliasing, *p is assumed to invalidate all the definitions, so the uses of d and f on line 10 have to be loaded from memory. With type-inferred aliasing, the optimizer can propagate the copies of d and f and thus avoid two loads and two stores.

This option can be used for any application involving the use of pointers, where those pointers are type safe. To specify when a subset of types are type-safe, use the [NO]PTRS_STRONGLY_TYPED pragma. The compiler issues warnings for any incompatible pointer assignments that may violate the type-inferred aliasing rules discussed in "Aliasing Options" later in this chapter.


Example 2: Unsafe Type Cast

Any type cast to a different type violates type-inferred aliasing rules. Do not use +Optrs_strongly_typed with code that has these "unsafe" type casts. Use the [NO]PTRS_STRONGLY_TYPED pragma to prevent the application of type-inferred aliasing to the unsafe type casts.

struct foo {
    int a;
    int b;
} *P;

struct bar {
    float a;
    int b;
    float c;
} *q;

P = (struct foo *) q;   /* Incompatible pointer assignment
                           through type cast */

Example 3: Generally Applying Type Aliasing

Dynamic casting is allowed with +Optrs_strongly_typed or +Optrs_ansi. A pointer dereference is called a dynamic cast if a cast to a different type is applied to the pointer at the point of dereference.

In the example below, type-inferred aliasing is applied to P generally, not just to the particular dereference; it will also be applied to any other dereferences of P.

struct s {
    short int a;
    short int b;
    int c;
} *P;

*(int *)P = 0;

For more information about type aliasing see the section "Aliasing Options" at the end of this chapter.

+O[no]ptrs_to_globals[=name1, name2, ...nameN]

Optimization level(s): 2, 3, 4

Default: +Optrs_to_globals

By default, global variables are conservatively assumed to be modified anywhere in the program. Use this option to specify which global variables are not modified through pointers, so that the optimizer can make your program run more efficiently by applying copy propagation and common sub-expression elimination.

This option can be used to specify all global variables as not modified via pointers, or to specify a comma-separated list of global variables as not modified via pointers.

Note that the on state for this option disables some optimizations, such as aggressive optimizations on the program's global symbols.

For example, use the command line option +Onoptrs_to_globals=a,b,c to specify global variables a, b, and c as not being accessed through pointers. No pointer can access these global variables. The optimizer will perform copy propagation and constant folding because storing to *p will not modify a or b.

int a, b, c;
float *p;
foo()
{
    a = 10;
    b = 20;
    *p = 1.0;
    c = a + b;
}

If no global variables are modified through pointers, use the following option without listing the global variables:

+Onoptrs_to_globals


In the example below, the address of b is taken. This means b can be accessed indirectly through the pointer. You can still use +Onoptrs_to_globals as: +Onoptrs_to_globals +Optrs_to_globals=b.

int b, c;
int *p;

foo()
{
    p = &b;
}

For more information about type aliasing see the section "Aliasing Options" at the end of this chapter.

+O[no]regionsched

Optimization level(s): 2, 3, 4

Default: +Onoregionsched

Applies aggressive scheduling techniques to move instructions across branches. This option is incompatible with the linker -z option. If used with -z, it may cause a SIGSEGV error at run-time.

Use +Oregionsched to improve application run-time speed. Compilation time may increase.

+O[no]regreassoc

Optimization level(s): 2, 3, 4

Default: +Oregreassoc

When disabled, this option turns off register reassociation.

Use +Onoregreassoc if register reassociation hinders the performance of your optimized application.

+O[no]sideeffects=[name1, name2, ...nameN]

Optimization level(s): 2, 3, 4

Default: assume all subprograms have side effects

The +Osideeffects option tells the optimizer to assume that the subprograms specified in the name list might modify global variables; the optimizer therefore limits global variable optimization around calls to them.

The default is to assume that all subprograms have side effects unless the optimizer can determine that there are none.

Use +Onosideeffects if you know that the named functions do not modify global variables and you wish to achieve the best possible performance.

+O[no]signedpointers

Optimization level(s): 0, 1, 2, 3, 4

Default: +Onosignedpointers

Perform [or do not perform] optimizations related to treating pointers as signed quantities. Applications that allocate shared memory and that compare a pointer to shared memory with a pointer to private memory may run incorrectly if this optimization is enabled.

Use +Osignedpointers to improve application run-time speed.

+O[no]static_prediction

Optimization level(s): 0, 1, 2, 3, 4

Default: +Onostatic_prediction

+Ostatic_prediction turns on static branch prediction for PA-RISC 2.0 targets.

PA-RISC 2.0 has two means of predicting which way conditional branches will go: dynamic branch prediction and static branch prediction. Dynamic branch prediction uses a hardware history mechanism to predict future executions of a branch from its last three executions. It is transparent and quite effective unless the hardware buffers involved are overwhelmed by a large program with poor locality.

With static branch prediction on, each branch is predicted based on implicit hints encoded in the branch instruction itself; the dynamic branch prediction is not used.

Static branch prediction's role is to handle large codes with poor locality for which the small dynamic hardware facility will prove inadequate.

Use +Ostatic_prediction to better optimize large programs with poor instruction locality, such as operating system and database code.

Use this option only when using profile-based optimization (PBO), as an amplifier to +P. It is allowed but silently ignored with +I, so makefiles need not change between the +I and +P phases.

+O[no]vectorize

Optimization level(s): 0, 1, 2, 3, 4

Default: +Onovectorize

+Ovectorize allows the compiler to replace certain loops with calls to vector routines.

Use +Ovectorize to increase the execution speed of loops.

When +Onovectorize is specified, loops are not replaced with calls to vector routines.

Because the +Ovectorize option may change the order of operations in an application, it may also change the results of those operations slightly. See the HP-UX Floating-Point Guide for details.

The math library contains special prefetching versions of vector routines. If you have a PA2.0 application that contains operations on very large arrays (larger than 1 megabyte in size), using +Ovectorize in conjunction with +Odataprefetch may improve performance substantially.

You may use +Ovectorize at levels 3 and 4. The +Ovectorize option is also included as part of +Oaggressive and +Oall.

This option is only valid for PA-RISC 1.1 and 2.0 systems.

+O[no]volatile

Optimization level(s): 1, 2, 3, 4

Default: +Onovolatile

The +Ovolatile option implies that memory references to global variables cannot be removed during optimization.

The +Onovolatile option implies that all globals are not of volatile class. This means that references to global variables can be removed during optimization.

The +Ovolatile option replaces the obsolete +OV option.

Use this option to control the volatile semantics for all global variables.

+O[no]whole_program_mode

Optimization level(s): 4

Default: +Onowhole_program_mode

The +Owhole_program_mode option enables the assertion that only the files that are compiled with this option directly reference any global variables and procedures that are defined in these files. In other words, this option asserts that there are no unseen accesses to the globals.

When this assertion is in effect, the optimizer can hold global variables in registers longer and delete inlined or cloned global procedures.

All files compiled with +Owhole_program_mode must also be compiled with +O4. If any of the files were compiled with +O4 but were not compiled with +Owhole_program_mode, the linker disables the assertion for all files in the program.

The default, +Onowhole_program_mode, disables the assertion.

Use this option to increase performance speed, but only when you are certain that only the files compiled with +Owhole_program_mode directly access any globals that are defined in these files.

© Hewlett-Packard Development Company, L.P.