HP C/HP-UX Programmer's Guide: Workstations and Servers > Chapter 5 Programming for Portability

General Portability Considerations

This section summarizes some general considerations to take into account when writing portable HP C programs. Some of the features listed here may be different on other implementations of C. Differences between the Series 300/400 implementation and the workstations and servers implementation are also noted in this section.

Data Type Sizes and Alignments

Table 2-1 in Chapter 2 shows the sizes and alignments of the C data types on the different architectures.

Differences in data alignment can cause problems when porting code or data between systems that have different alignment schemes. For example, if you write a C program on Series 300/400 that writes records to a file, then read the file using the same program on HP 9000 workstations and servers, it may not work properly because the data may fall on different byte boundaries within the file due to alignment differences. To help alleviate this problem, HP C provides the HP_ALIGN pragma, which forces a particular alignment scheme, regardless of the architecture on which it is used. The HP_ALIGN pragma is described in Chapter 2.

Accessing Unaligned Data

The HP 9000 workstations and servers, like all PA-RISC processors, require data to be accessed from locations that are aligned on multiples of the data size. The C compiler provides an option to access data from misaligned addresses using code sequences that load and store data in smaller pieces, but this option increases code size and reduces performance. A bus error handling routine is also available to handle misaligned accesses, but it can reduce performance severely if used heavily.

Here are your specific alternatives for avoiding bus errors:

  1. Change your code to eliminate misaligned data, if possible. This is the only way to get maximum performance, but it may be difficult or impossible to do. The more of this you can do, the less you'll need the next two alternatives.

  2. Use the +ubytes compiler option available at 9.0 to allow 2-byte alignment. However, the +ubytes option, as noted above, creates big, slow code compared to the default code generation which is able to load a double precision number with one 8-byte load operation. Refer to the HP C/HP-UX Reference Manual for more information.

  3. Finally, you can use allow_unaligned_data_access() to avoid alignment errors. allow_unaligned_data_access() sets up a signal handler for the SIGBUS signal. When the SIGBUS signal occurs, the signal handler extracts the unaligned data from memory byte by byte.

    To implement, just add a call to allow_unaligned_data_access() within your main program before the first access to unaligned data occurs. Then link with -lhppa. Any alignment bus errors that occur are trapped and emulated by a routine in the libhppa.a library in a manner that will be transparent to you. The performance degradation will be significant, but if it only occurs in a few places in your program it shouldn't be a big concern.

Whether you use alternative 2 or 3 above depends on your specific code.

The +ubytes option costs significantly less per access than the handler, but it costs you on every access, whether your data is aligned or not, and it can make your code quite a bit bigger. You should use it selectively if you can isolate the routines in your program that may be exposed to misaligned pointers.

There is a performance degradation associated with alternative 3 because each unaligned access has to trap to a library routine. You can use the unaligned_access_count variable to check the number of unaligned accesses in your program. If the number is fairly large, you should probably use alternative 2. If you only occasionally use a misaligned pointer, it is probably better just to use the allow_unaligned_data_access() handler. There is a stiff penalty per bus error, but it doesn't cause your program to fail and it won't cost you anything when you operate on aligned data.

The following is an example of the use of allow_unaligned_data_access() within a C program:

#include <stdio.h>

extern int unaligned_access_count;
/* This variable keeps a count
of unaligned accesses. */

char arr[]="abcdefgh";
char *cp, *cp2;
int i=99, j=88, k;
int *ip;

int main()
{
    allow_unaligned_data_access();
    cp = (char *)&i;
    cp2 = &arr[1];
    for (k=0; k<4; k++)
        cp2[k] = * (cp+k);   /* copy i, byte by byte, to &arr[1] */
    ip = (int *)&arr[1];
    j = *ip;   /* This line would normally result in a
                  bus error on workstations or servers */
    printf("%d\n", j);
    printf("unaligned_access_count is : %d\n", unaligned_access_count);
    return 0;
}

To compile and link this program, enter

cc filename.c -lhppa

This enables you to link the program with allow_unaligned_data_access() and the int unaligned_access_count that reside in /usr/lib/libhppa.a.

Note that there is a performance degradation associated with using this library since each unaligned access has to trap to a library routine. You can use the unaligned_access_count variable to check the number of unaligned accesses in your program. If the number is fairly large, you should probably use the compiler option.

Checking for Alignment Problems with lint

If invoked with the -s option, the lint command generates warnings for C constructs that may cause portability and alignment problems when moving between Series 300/400 and HP 9000 workstations and servers, in either direction. Specifically, lint checks for these cases:

  • Internal padding of structures. lint checks for instances where a structure member may be aligned on a boundary that is inappropriate according to the most-restrictive alignment rules. For example, given the code

     struct s1 { char c; long l; }; 

    lint issues the warning:

     warning:  alignment of struct 's1' may not be portable 
  • Alignment of structures and simple types. For example, in the following code, the nested struct would align on a 2-byte boundary on Series 300/400 and an 8-byte boundary on HP 9000 workstations and servers:

     struct s3 { int i; struct { double d; } s; }; 

    In this case, lint issues this warning about alignment:

    warning:  alignment of struct 's3' may not be portable
  • End padding of structures. Structures are padded to the alignment of the most-restrictive member. For example, the following code would pad to a 2-byte boundary on Series 300/400 and a 4-byte boundary for HP 9000 workstations and servers:

     struct s2 { int i; short s; }; 

    In this case, lint issues the warning:

    warning:  trailing padding of struct/union 's2' may not be portable

Note that these are only potential alignment problems. They cause problems only when a program writes raw files that are read by another system. This is why the capability is accessible only through a command line option; it can be switched on and off.

lint does not check the layout of bit-fields.

Ensuring Alignment without Pragmas

Another solution to alignment differences between systems would be to define structures in such a way that they are forced into the same layout on different systems. To do this, use padding bytes — that is, dummy variables that are inserted solely for the purpose of forcing struct layout to be uniform across implementations. For example, suppose you need a structure with the following definition:

struct S {
char c1;
int i;
char c2;
double d;
};

An alternate definition of this structure that uses filler bytes to ensure the same layout on Series 300/400 and workstations and servers would look like this:

struct S {
char c1; /* byte 0 */
char pad1,pad2,pad3; /* bytes 1 through 3 */
int i; /* bytes 4 through 7 */
char c2; /* byte 8 */
char pad9,pad10,pad11, /* bytes 9 */
pad12,pad13,pad14, /* through */
pad15; /* 15 */
double d; /* bytes 16 through 23 */
};
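One way to confirm that a padded definition really has the intended layout is to check the member offsets with the ANSI C offsetof macro. The following is a minimal sketch, not part of the original example; the function name layout_is_expected is hypothetical, and the expected offsets assume a 4-byte int and an 8-byte double:

```c
#include <stddef.h>

struct S {
    char c1;                          /* byte 0 */
    char pad1, pad2, pad3;            /* bytes 1 through 3 */
    int i;                            /* bytes 4 through 7 */
    char c2;                          /* byte 8 */
    char pad9, pad10, pad11, pad12,
         pad13, pad14, pad15;         /* bytes 9 through 15 */
    double d;                         /* bytes 16 through 23 */
};

/* Hypothetical helper: returns 1 if every member sits at the byte
   offset listed in the comments above, and 0 otherwise.
   Assumes sizeof(int) == 4 and sizeof(double) == 8. */
int layout_is_expected(void)
{
    return offsetof(struct S, c1) == 0
        && offsetof(struct S, i)  == 4
        && offsetof(struct S, c2) == 8
        && offsetof(struct S, d)  == 16
        && sizeof(struct S)       == 24;
}
```

Calling such a check once at program start-up on each target system can catch a layout mismatch before any records are read or written.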

Casting Pointer Types

Before understanding how casting pointer types can cause portability problems, you must understand how HP 9000 workstations and servers align data types. In general, a data type is aligned on a byte boundary equivalent to its size. For example, the char data type can fall on any byte boundary, the int data type must fall on a 4-byte boundary, and the double data type must fall on an 8-byte boundary. A valid location for a data type would then satisfy the following equation:

location mod sizeof(data_type) == 0

Consider the following program:

#include <string.h>
#include <stdio.h>
main()
{
struct chStruct {
char ch1; /* aligned on
an even boundary */
char chArray[9]; /* aligned on
an odd byte boundary */
} foo;

int *bar; /* must be aligned
on a word boundary */

strcpy(foo.chArray, "1234"); /* place a value
in the ch array */
bar = (int *) foo.chArray; /* type cast */
printf("*bar = %d\n",*bar); /* display the value */
}

Casting a smaller type (such as char) to a larger type (such as int) will not cause a problem. However, casting a char* to an int* and then dereferencing the int* may cause an alignment fault. Thus, the above program crashes on the call to printf() when bar is dereferenced.

Such programming practices are inherently non-portable because there is no standard for how different architectures reference memory. You should try to avoid such programming practices.
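A portable alternative to dereferencing a cast pointer is to copy the bytes into a properly aligned object with memcpy(). The following is a minimal sketch under that approach; the function name read_int_unaligned is hypothetical. It does not address byte ordering, only alignment:

```c
#include <string.h>

/* Hypothetical helper: reads an int stored at an arbitrary
   (possibly misaligned) address by copying it byte by byte
   into a local int, which the compiler aligns correctly. */
int read_int_unaligned(const char *src)
{
    int value;

    memcpy(&value, src, sizeof value);
    return value;
}
```

For example, read_int_unaligned(foo.chArray) reads the same bytes as the cast in the program above, but without dereferencing a misaligned int pointer.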

As another example, if a program passes a casted pointer to a function that expects a parameter with stricter alignment, an alignment fault may occur. For example, the following program causes an alignment fault on the HP 9000 workstations and servers:

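#include <stdio.h>

void intfunc (int *iptr);

int main (int argc, char *argv[])
{
    char pad;
    char name[8];

    intfunc((int *)&name[1]);   /* pointer has only char alignment */
    return 0;
}

void intfunc (int *iptr)
{
    printf("intfunc got passed %d\n", *iptr);   /* fault occurs here */
}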

Type Incompatibilities and typedef

The C typedef keyword provides an easy way to write a program to be used on systems with different data type sizes. Simply define your own type equivalent to a provided type that has the size you wish to use.

For example, suppose system A implements int as 16 bits and long as 32 bits. System B implements int as 32 bits and long as 64 bits. You want to use 32 bit integers. Simply declare all your integers as type INT32, and insert the appropriate typedef on system A:

typedef long INT32;

The code on system B would be:

typedef int INT32;
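Rather than editing the typedef by hand on each system, you may be able to let the preprocessor choose it from the limits in the ANSI C header <limits.h>. The following is a minimal sketch of that idea; the function add_counts is hypothetical and only illustrates use of the type:

```c
#include <limits.h>

/* Pick a type with at least 32 bits. ANSI C guarantees that
   long has at least 32 bits, so it is the fallback. */
#if INT_MAX >= 2147483647
typedef int INT32;
#else
typedef long INT32;
#endif

/* Hypothetical function that uses the portable type. */
INT32 add_counts(INT32 a, INT32 b)
{
    return a + b;
}
```

Note that this selects a type of at least 32 bits, not exactly 32 bits; on a system where int is wider, INT32 is wider too.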

Conditional Compilation

Using the #ifdef C preprocessor directive and the predefined symbols __hp9000s300, __hp9000s700, and __hp9000s800, you can group blocks of system-dependent code for conditional compilation, as shown below:

#ifdef  __hp9000s300
.
.
.
Series 300/400-specific code goes here...
.
.
.
#endif

#ifdef __hp9000s700
.
.
.
Series 700-specific code goes here...
.
.
.
#endif

#ifdef __hp9000s800
.
.
.
Series 700/800-specific code goes here...
.
.
.
#endif

If this code is compiled on a Series 300/400 system, the first block is compiled; if compiled on a Series 700 system, the second block is compiled; if compiled on either the Series 700 or Series 800, the third block is compiled. You can use this feature to ensure that a program will compile properly on either Series 300/400 or workstations or servers.

If you want your code to compile only on the Series 800 but not on the 700, surround your code as follows:

#if (defined(__hp9000s800) && !defined(__hp9000s700))
.
.
.
Series 800-specific code goes here...
.
.
.
#endif

Isolating System-Dependent Code with include Files

#include files are useful for isolating the system-dependent code like the type definitions in the previous section. For instance, if your type definitions were in a file mytypes.h, to account for all the data size differences when porting from system A to system B, you would only have to change the contents of file mytypes.h. A useful set of type definitions is in /usr/include/model.h.

NOTE: If you use the symbolic debugger, xdb, include files used within union, struct, or array initialization will generate correct code. However, such use is discouraged because xdb may show incorrect debugging information about line numbers and source file numbers.

Parameter Lists

On the Series 300/400, parameter lists grow towards higher addresses. On the HP 9000 workstations and servers, parameter lists are usually stacked towards decreasing addresses (though the stack itself grows towards higher addresses). The compiler may choose to pass some arguments through registers for efficiency; such parameters will have no stack location at all.

ANSI C function prototypes provide a way of having the compiler check parameter lists for consistency between a function declaration and a function call within a compilation unit. lint provides an option (-Aa) that flags cases where a function call is made in the absence of a prototype.

The ANSI C <stdarg.h> header file provides a portable method of writing functions that accept a variable number of arguments. You should note that <stdarg.h> supersedes the use of the varargs macros. varargs is retained for compatibility with the pre-ANSI compilers and earlier releases of HP C/HP-UX. See varargs(5) and vprintf(3S) for details and examples of the use of varargs.
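As an illustration of the <stdarg.h> approach, the following minimal sketch defines a function that adds a variable number of int arguments. The function name sum_ints and its convention of passing the argument count first are assumptions made for this example:

```c
#include <stdarg.h>

/* Hypothetical example: returns the sum of 'count' int
   arguments that follow the count itself. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int n;

    va_start(ap, count);            /* start after the last named parameter */
    for (n = 0; n < count; n++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) works the same way regardless of how the system stacks parameters or passes them in registers.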

The char Data Type

The char data type defaults to signed. If a char is assigned to an int, sign extension takes place. A char may be declared unsigned to override this default. The line:

unsigned char   ch;

declares one byte of unsigned storage named ch. On some non-HP-UX systems, char variables are unsigned by default.
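When code must behave the same whether char is signed or unsigned by default, one approach is to convert through unsigned char before widening to int. The following is a minimal sketch; the function name byte_value is hypothetical:

```c
/* Hypothetical helper: returns the value of c as a number
   in the range 0 through UCHAR_MAX, regardless of whether
   plain char is signed on this system. */
int byte_value(char c)
{
    return (int)(unsigned char)c;
}
```

Without the cast, a char holding a byte with the high bit set would be sign-extended to a negative int on HP-UX but not on systems where char is unsigned.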

Register Storage Class

The register storage class is supported on Series 300/400 and on workstations and servers, and if properly used, can reduce execution time. Using this storage class should not hinder portability. However, its usefulness varies between systems, since some ignore it. Refer to the HP-UX Assembler and Supporting Tools for Series 300/400 for a more complete description of the use of the register storage class on Series 300/400.

Also, the register storage class declarations are ignored when optimizing at level 2 or greater on all Series.

Identifiers

To guarantee that code is portable to non-HP-UX systems, rely only on the minimums that the ANSI C standard requires. Identifier names without external linkage are required to be significant to 31 case-sensitive characters. Names with external linkage (identifiers that are defined in another source file) are guaranteed to be significant to only six case-insensitive characters. Typical C programming practice is to name variables with all lower-case letters, and #define constants with all upper case.

Predefined Symbols

The symbol __hp9000s300 is predefined on Series 300/400; the symbols __hp9000s800 and __hppa are predefined on Series 700/800; and __hp9000s700 is predefined on Series 700 only. The symbols __hpux and __unix are predefined on all HP-UX implementations.

This is only an issue if you port code to or from systems that have also predefined these symbols.

Shift Operators

On left shifts, vacated positions are filled with 0. On right shifts of signed operands, vacated positions are filled with the sign bit (arithmetic shift). Right shifts of unsigned operands fill vacated bit positions with 0 (logical shift). Integer constants are treated as signed unless cast to unsigned. Circular shifts are not supported in any version of C. Shifts greater than 32 bits give an undefined result.
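Although C has no circular shift operator, a rotate can be built from two ordinary shifts on an unsigned operand. The following is a minimal sketch for 32-bit values; the function name rotl32 is hypothetical. It masks the result so that it also behaves correctly where unsigned long is wider than 32 bits, and it never shifts by 32:

```c
/* Hypothetical helper: rotates the low 32 bits of x left by n
   positions. n is reduced modulo 32 so that no shift count
   reaches 32, which would give an undefined result. */
unsigned long rotl32(unsigned long x, unsigned int n)
{
    x &= 0xFFFFFFFFUL;
    n &= 31U;
    if (n == 0)
        return x;
    return ((x << n) | (x >> (32U - n))) & 0xFFFFFFFFUL;
}
```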

The sizeof Operator

The sizeof operator yields an unsigned int result, as specified in section 3.3.3.4 of the ANSI C standard (X3.159-1989). Therefore, expressions involving this operator are inherently unsigned. Do not expect any expression involving the sizeof operator to have a negative value (as may occur on some other systems). In particular, logical comparisons of such an expression against zero may not produce the object code you expect as the following example illustrates.

main()
{
int i;
i = 2;
if ((i-sizeof(i)) < 0) /* sizeof(i) is 4,
but unsigned! */
printf("test less than 0\n");
else
printf("an unsigned expression cannot be less than 0\n");
}

When run, this program will print

an unsigned expression cannot be less than 0

because the expression (i-sizeof(i)) is unsigned since one of its operands is unsigned (sizeof(i)). By definition, an unsigned number cannot be less than 0 so the compiler will generate an unconditional branch to the else clause rather than a test and branch.
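If a signed comparison is what you intend, one option is to convert the sizeof result to int before using it, or to compare the two values directly instead of comparing their difference against zero. The following is a minimal sketch; the function name is_less_than_size is hypothetical:

```c
/* Hypothetical helper: returns 1 if i is smaller than the size
   of an int in bytes. The cast makes the comparison signed, so
   negative values of i compare as expected. */
int is_less_than_size(int i)
{
    return i < (int)sizeof(i);
}
```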

Bit-Fields

The ANSI C definition does not prescribe bit-field implementation; therefore each vendor can implement bit-fields somewhat differently. This section describes how bit-fields are implemented in HP C.

Bit-fields are assigned from most-significant to least-significant bit on all HP-UX and Domain systems.

On all HP-UX implementations, bit-fields can be signed or unsigned, depending on how they are declared.

On the Series 300/400, a bit-field declared without the signed or unsigned keywords will be signed in ANSI mode and unsigned in compatibility mode by default.

On the workstations and servers, plain int, char, or short bit-fields declared without the signed or unsigned keywords will be signed in both compatibility mode and ANSI mode by default.

On the HP 9000 workstations and servers, and for the most part on the Series 300/400, bit-fields are aligned so that they cannot cross a boundary of the declared type. Consequently, some padding within the structure may be required. As an example,

struct foo
{
unsigned int a:3, b:3, c:3, d:3;
unsigned int remainder:20;
};

For the above struct, sizeof(struct foo) would return 4 (bytes) because none of the bit-fields straddle a 4 byte boundary. On the other hand, the following struct declaration will have a larger size:

struct foo2
{
unsigned char a:3, b:3, c:3, d:3;
unsigned int remainder:20;
};

In this struct declaration, the assignment of data space for c must be aligned so it doesn't cross a byte boundary, which is the normal alignment of unsigned char. Consequently, two undeclared bits of padding are added by the compiler so that c is aligned on a byte boundary. sizeof(struct foo2) returns 6 (bytes) on Series 300/400, and 8 on workstations and servers. (Note, however, that on Domain systems or when using #pragma HP_ALIGN NATURAL, which uses Domain bit-field mapping, 4 is returned because the char bit-fields are considered to be ints.)

Bit-fields on HP-UX systems cannot exceed the size of the declared type in length. The largest possible bit-field is 32 bits. All scalar types are permissible to declare bit-fields, including enum.

Enum bit-fields are accepted on all HP-UX systems. On Series 300/400 in compatibility mode they are implemented internally as unsigned integers. On workstations and servers, however, they are implemented internally as signed integers so care should be taken to allow enough bits to store the sign as well as the magnitude of the enumerated type. Otherwise your results may be unexpected. In ANSI mode, the type of enum bit-fields is signed int on all HP-UX systems.
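Because the default signedness of a plain bit-field differs between implementations and modes, portable code can state it explicitly. The following is a minimal sketch; the struct and function names are hypothetical:

```c
/* Each bit-field states its signedness, so its range does not
   depend on the implementation's default. */
struct flags {
    unsigned int mode  : 3;   /* holds 0 through 7 */
    signed int   delta : 4;   /* holds -8 through 7 */
};

/* Hypothetical helpers that store a value and read it back. */
int mode_after_store(unsigned int m)
{
    struct flags f;

    f.mode = m;
    f.delta = 0;
    return (int)f.mode;
}

int delta_after_store(int d)
{
    struct flags f;

    f.mode = 0;
    f.delta = d;
    return f.delta;
}
```

The same reasoning applies to enum bit-fields: allowing one bit more than the magnitude requires leaves room for a sign bit where the field is implemented as a signed integer.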

Floating-Point Exceptions

HP C on workstations and servers, in accordance with the IEEE standard, does not trap on floating point exceptions such as division by zero. By contrast, when using HP C on Series 300/400, floating-point exceptions will result in the run-time error message Floating exception (core dumped). One way to handle this error on workstations and servers is by setting up a signal handler using the signal system call, and trapping the signal SIGFPE. For details, see signal(2), signal(5), and "Advanced HP-UX Programming" in HP-UX Linker and Libraries Online User Guide.

For full treatment of floating-point exceptions and how to handle them, see HP-UX Floating-Point Guide.
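The following minimal sketch shows the general shape of installing a SIGFPE handler with the ANSI C signal() function. The names on_fpe, install_fpe_handler, and fpe_was_seen are hypothetical. The handler here only records that the signal arrived; a handler for a real arithmetic fault would typically report the error and exit rather than return, since returning from such a handler has undefined behavior:

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t fpe_seen = 0;

/* Hypothetical handler: records that SIGFPE was delivered. */
static void on_fpe(int sig)
{
    (void)sig;
    fpe_seen = 1;
}

/* Returns 1 if the handler was installed, 0 on failure. */
int install_fpe_handler(void)
{
    return signal(SIGFPE, on_fpe) != SIG_ERR;
}

/* Returns 1 if SIGFPE has been delivered since start-up. */
int fpe_was_seen(void)
{
    return fpe_seen != 0;
}
```

To exercise the handler without a real arithmetic fault, a program can call raise(SIGFPE) after install_fpe_handler().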

Integer Overflow

In HP C, as in nearly every other implementation of C, integer overflow does not generate an error. The overflowed number is "rolled over" into whatever bit pattern the operation happens to produce.
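Since overflow is not reported, a program that needs to detect it must test the operands before performing the operation. The following is a minimal sketch for addition, using the limits in <limits.h>; the function name safe_add is hypothetical:

```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *result and returns 1
   if the sum fits in an int; returns 0 and leaves *result
   unchanged if the addition would overflow. */
int safe_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *result = a + b;
    return 1;
}
```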

Overflow During Conversion from Floating Point to Integral Type

HP-UX systems report the run-time error Floating exception (core dumped) if a floating-point number is converted to an integral type and the value is outside the range of that integral type. As with the error described previously under "Floating-Point Exceptions," a program can trap the floating-point exception signal (SIGFPE). See signal(2) and signal(5) for details.
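An alternative to trapping the signal is to test the range before converting. The following is a minimal sketch; the function name double_fits_int is hypothetical. The test is deliberately conservative: it rejects a few values just outside the int limits that would truncate to a valid int, and it rejects NaN because both comparisons are false for NaN:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if d can be converted to int
   without going out of range, 0 otherwise. */
int double_fits_int(double d)
{
    return d >= (double)INT_MIN && d <= (double)INT_MAX;
}
```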

Structure Assignment

The HP-UX C compilers support structure assignment, structure-valued functions, and structure parameters. The structs in a struct assignment s1=s2 must be declared to be the same struct type as in:

struct s  s1,s2;

Structure assignment is in the ANSI standard. Prior to the ANSI standard, it was a BSD extension that some other vendors may not have implemented.

Structure-Valued Functions

Structure-valued functions support storing the result in a structure:

s = fs();

All HP-UX implementations allow direct field dereferences of a structure-valued function. For example:

x = fs().a;

Structure-valued functions are ANSI standard. Prior to the ANSI standard, they were a BSD extension that some vendors may not have implemented.

Dereferencing Null Pointers

Dereferencing a null pointer has never been defined in any C standard. Kernighan and Ritchie's The C Programming Language and the ANSI C standard both warn against such programming practice. Nevertheless, some versions of C permit dereferencing null pointers.

Dereferencing a null pointer returns a zero value on all HP-UX systems. The workstations and servers C compiler provides the -z compile line option, which causes the signal SIGSEGV to be generated if the program attempts to read location zero. Using this option, a program can "trap" such reads.

Since some programs written on other implementations of UNIX rely on being able to dereference null pointers, you may have to change code to check for a null pointer. For example, change:

if (*ch_ptr != '\0')

to:

if ((ch_ptr != NULL) && *ch_ptr != '\0')

Writes of location zero may be detected as errors even if reads are not. If the hardware cannot assure that location zero acts as if it was initialized to zero or is locked at zero, the hardware acts as if the -z flag is always set.

Expression Evaluation

The order of evaluation for some expressions will differ between HP-UX implementations. This does not mean that operator precedence is different. For instance, in the expression:

x1 = f(x) + g(x) * 5;

f may be evaluated before or after g, but g(x) will always be multiplied by 5 before it is added to f(x). Since there is no C standard for order of evaluation of expressions, you should avoid relying on the order of evaluation when using functions with side effects or using function calls as actual parameters. You should use temporary variables if your program relies upon a certain order of evaluation.
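The following minimal sketch shows the temporary-variable technique. The functions f and g here are hypothetical stand-ins with a shared side effect (a call counter), so their results depend on which is called first; assigning each result to a temporary in a separate statement fixes the order:

```c
static int calls = 0;

/* Hypothetical functions whose results depend on call order. */
static int f(int x) { calls++; return x + calls; }
static int g(int x) { calls++; return x * calls; }

/* Each call is a separate statement, so f always runs before g. */
int ordered_sum(int x)
{
    int fx = f(x);
    int gx = g(x);

    return fx + gx * 5;
}
```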

Variable Initialization

On some C implementations, auto (non-static) variables are implicitly initialized to 0. This is not the case on HP-UX and it is most likely not the case on other implementations of UNIX. Don't depend on the system initializing your local variables; it is not good programming practice in general and it makes for nonportable code.

Conversions between unsigned char or unsigned short and int

All HP-UX C implementations, when used in compatibility mode, are unsigned preserving. That is, in conversions of unsigned char or unsigned short to int, the conversion process first converts the number to an unsigned int. This contrasts with C implementations that are value preserving (that is, unsigned char and unsigned short terms are converted to int, provided int can represent all of their values, before they are used in an expression).

Consider the following program:

main()
{
int i = -1;
unsigned char uc = 2;
unsigned int ui = 2;

if (uc > i)
printf("Value preserving\n");
else
printf("Unsigned preserving\n");
if (ui < i)
printf("Unsigned comparisons performed\n");
}

On HP-UX systems in compatibility mode, the program will print:

Unsigned preserving
Unsigned comparisons performed

In contrast, ANSI C specifies value preserving; so in ANSI mode, all HP-UX C compilers are value preserving. The same program, when compiled in ANSI mode, will print:

Value preserving
Unsigned comparisons performed
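To get the same result in both modes, you can make the intended conversion explicit with a cast. The following is a minimal sketch; the function name uchar_greater_than is hypothetical:

```c
/* Hypothetical helper: compares an unsigned char with an int
   as signed values. The cast forces conversion to int, so the
   result is the same under unsigned-preserving and
   value-preserving rules. */
int uchar_greater_than(unsigned char uc, int i)
{
    return (int)uc > i;
}
```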

Temporary Files ($TMPDIR)

All HP-UX C compilers produce a number of intermediate temporary files for their private use during the compilation process. These files are normally invisible to you since they are created and removed automatically. If, however, your system is tightly constrained for file space, these files, which are usually created in /tmp or /usr/tmp, may exceed the available space. By assigning another directory to the TMPDIR environment variable, you can redirect these temporary files. See the cc manual page for details.

Input/Output

Since the C language definition provides no I/O capability, it depends on library routines supplied by the host system. Data files produced by using the HP-UX calls write(2) or fwrite(3) should not be expected to be portable between different system implementations. Byte ordering and structure packing rules will make the bits in the file system-dependent, even though identical routines are used. When in doubt, move data files using ASCII representations (as from printf(3)), or write translation utilities that deal with the byte ordering and alignment differences.
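As an illustration of a translation routine, the following minimal sketch stores a 32-bit value one byte at a time in a fixed order (most-significant byte first), so the bytes in the file do not depend on the byte ordering or alignment rules of the system that wrote them. The function names put_u32_be and get_u32_be are hypothetical:

```c
/* Hypothetical helper: writes the low 32 bits of v into out[0..3],
   most-significant byte first. */
void put_u32_be(unsigned char *out, unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Hypothetical helper: rebuilds the value from in[0..3]. */
unsigned long get_u32_be(const unsigned char *in)
{
    return ((unsigned long)in[0] << 24)
         | ((unsigned long)in[1] << 16)
         | ((unsigned long)in[2] << 8)
         |  (unsigned long)in[3];
}
```

The resulting 4-byte buffer can be written with fwrite(3) and read back on any system with the matching routine.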

Checking for Standards Compliance

To check for compliance with a particular standard, you can use the lint program with one of the following -D options:

  • -D_XOPEN_SOURCE

  • -D_POSIX_SOURCE

For example, the command

lint -D_POSIX_SOURCE file.c

checks the source file file.c for compliance with the POSIX standard.

If you have the HP Advise product, you can also check for C standard compliance using the apex command.

© Hewlett-Packard Development Company, L.P.