PA-RISC Procedure Calling Conventions Reference Manual > Chapter 5 Inter-Module Procedure Calls

5.1 External Calls on MPE XL


As mentioned earlier, there are three types of procedure calls, which fall into two groups: intra-module (local) and inter-module (external) calls. A local call is one in which the caller and the callee reside in the same load module; an external call is one in which that is not necessarily the case. There is one exception to this definition: calls that cross privilege level boundaries are always treated as external calls, even if the caller and callee reside in the same load module. Although external calls are closely related to local calls, they differ notably in their storage and access conventions; these differences are explained in the following material.

The inter-module (external) calling sequence is distinguished from the intra-module (local) calling sequence to take advantage of the system-wide global virtual addressing of the PA-RISC processor and to implement code sharing. Unlike most conventional systems where each process has a private virtual address space, all processes in a PA-RISC system share a finite, although large, number of spaces (2^16 currently, 2^32 potentially). Therefore, it is undesirable to assign virtual spaces to inactive program images (i.e. those on disk). Generally, the assignments of virtual spaces to a program will be delayed until the program is activated.

In order to avoid extensive linking at process creation time, it is desirable to have a central data structure (for each process) that will hold the SIDs assigned to the load modules used by the process. This data structure is the XRT (Inter-Module Cross Reference Table). Only the entries in the XRT need to be updated when virtual spaces are finally assigned to load modules.

Another use for the XRT is the location of the global data area of a load module. The offset of the global data area in the private data space of a process depends on the combination of load modules called by the process. Although each load module's code is assigned a unique SID, its data is placed at a process-dependent offset within the process' single private data space. In order to share a load module between multiple processes, all references to the global data area must be relative to the base register, DP (data pointer). The value of DP is stored in a load module's entry in the XRT. The XRT will be discussed and diagrammed in detail below.

An external call uses the same code sequence as a local call, with the addition of a "calling stub" (a.k.a. Import Stub) attached to the calling code, and a "called stub" (a.k.a. Export Stub) attached to the called code. Unlike a local call, execution is not transferred directly from the caller to the callee; in an external call, a millicode sequence (CALLX) is employed to locate and branch to the target procedure. These stubs and the millicode sequence are discussed in greater detail below.

5.1.1 Requirements of an External Call on MPE XL

An external procedure call requires several extra steps in addition to those necessary for a local call, as outlined below:
  1. The base register pointing to the global data area (DP), the SID contained in sr4, and value in gr2 (RP) must all be saved in the caller's allocated frame area. This RP value is referred to as RP' to distinguish it from the usual RP value associated with a local call. The difference between RP and RP' is as follows:

    RP is renamed RP' during the calling stub execution, at which time RP becomes the location in the called stub to which the target procedure (callee) must return. Finally, RP' is renamed back to RP, and DP and the SID are restored.

  2. The called load module's entry in the XRT must be located.

  3. The short pointer from the XRT entry must be loaded into DP (this is the DP value for the called load module).

  4. The SID from the XRT entry must be loaded into sr4, and the offset of the callee must be obtained from the XRT.

  5. An external branch and link to the called procedure must be performed. (Actually, this is an external branch to the CALLX millicode routine, which locates the called stub of the callee and branches to it; the called stub then does a local branch and link to the callee. Stubs, CALLX, and linking are discussed further in Section 5.1.6, Import and Export Stubs, and the sections that follow it.)

5.1.2 Requirements of an External Return

In order to return from an external call, it is required that DP, sr4, and RP be restored. The saved value of RP' will be loaded into RP (gr2), and the return to the caller will be via that register. The DP and SID of the caller will be restored from the caller-save area.
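
The save and restore requirements of 5.1.1 and 5.1.2 can be modeled as a small state machine. The following Python sketch is illustrative only: the Machine class, its field names, and the dictionary keys of the XRT entry are assumptions made for this example, and the frame offsets (-32, -28, -24) are borrowed from the stub listings later in this section.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    """Hypothetical model of the registers involved in an external call."""
    dp: int    # gr27, base of the current load module's global data
    sr4: int   # SID of the currently executing load module
    rp: int    # gr2, return pointer
    frame: dict = field(default_factory=dict)  # caller's frame, keyed by SP offset

def external_call(m, xrt_entry, stub_return):
    """Steps 1-5 of 5.1.1: save the caller's state, then load the callee's."""
    m.frame[-32] = m.dp        # save caller's DP
    m.frame[-28] = m.sr4       # save caller's SID
    m.frame[-24] = m.rp        # save RP (from here on called RP')
    m.dp = xrt_entry["dp"]     # DP of the called load module
    m.sr4 = xrt_entry["sid"]   # SID of the called load module
    m.rp = stub_return         # the callee returns to its called stub
    return xrt_entry["offset"] # branch target within the new space

def external_return(m):
    """5.1.2: restore DP, sr4, and RP from the caller's frame."""
    m.dp = m.frame[-32]
    m.sr4 = m.frame[-28]
    m.rp = m.frame[-24]        # RP' is renamed back to RP
    return m.rp
```

Calling external_call and then external_return on the same Machine leaves DP, sr4, and RP exactly as the caller had them, which is the property the external return must guarantee.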

5.1.3 Control Flow of an MPE XL External Call

Figure 5-1 shows a simplified external procedure call. It uses the same code sequence as a local call, but a "calling stub" is attached to the calling code (the caller) and a "called stub" is attached to the entry point of the called code (the callee). Execution is transferred from the calling code to the called code by executing a millicode sequence (CALLX) which uses an XRT to locate and branch to the target procedure. Note: All of these elements of an external call are covered in more detail in the following subsections.

To read the diagram, follow the arrows and numbers, beginning with number 1.

Figure 5-1 Simplified External Procedure Call

[Figure 5-1]

5.1.4 Calling Code

The calling code generated by the compiler to perform a procedure call will be the same regardless of whether the call is local or external. If the linker locates the procedure being called within the current executable load module, it will make the call local by patching the BL instruction to directly reference the entry point of the procedure. If the linker is unable to locate the procedure, it will make the call external by attaching a calling stub to the calling code, and patch the BL instruction to branch to the stub.

Before the call, the calling code must save any caller-saves registers that contain active data. The parameter list for the callee is stored in the current stack frame between the register spill area and the frame marker. As in a local call, the parameter list is stored in reverse order, such that the first word is at SP-36, the second is at SP-40, etc. Note: the first four words of the parameter list are passed in registers, but the space for the argument list is allocated in the stack frame, even if the first four words are unused. Also, by default all parameters are passed in general registers, with the linker including any necessary relocation stubs. See Chapter 3, Parameter Relocation.
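
The parameter list layout described above reduces to simple arithmetic. The following Python sketch is a minimal illustration; the function and constant names are hypothetical, and only the SP-36 starting point, the 4-byte word size, and the four-word minimum come from the text.

```python
WORD_BYTES = 4
FIRST_ARG_OFFSET = -36   # the first parameter word is at SP-36

def arg_word_offset(index):
    """SP-relative byte offset of parameter word `index` (0-based)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return FIRST_ARG_OFFSET - WORD_BYTES * index

def arg_area_bytes(num_words):
    """Bytes reserved for the parameter list.

    Space for the first four words is always allocated, even when
    those words travel in registers or are unused.
    """
    return WORD_BYTES * max(num_words, 4)
```

For example, arg_word_offset(1) yields -40, matching the second word at SP-40, and a two-word parameter list still reserves 16 bytes.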

5.1.5 Called Code

The called code is responsible for allocating a new stack frame on the top of the stack (the frame must be 64-byte aligned); the actual size of the frame will be determined by the compiler, and will be the summation of:
  • The amount of space needed by the register allocator for the register spill area;

  • The amount of space needed for the local variables of the current procedure;

  • The amount of space needed to store the longest parameter list of any procedure called by this procedure; and

  • The 32-byte frame marker.
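
The frame size computation can be sketched as follows. This Python example is an assumption-laden illustration: the function and parameter names are hypothetical, and rounding the sum up to a multiple of 64 bytes is inferred from the alignment requirement rather than stated as a rule in the text.

```python
FRAME_MARKER_BYTES = 32
FRAME_ALIGN = 64   # frames must be 64-byte aligned

def frame_size(spill_bytes, local_bytes, max_param_bytes):
    """Sum the four components, then round up to the frame alignment.

    The rounding step is an assumption: it keeps SP 64-byte aligned
    if every frame on the stack is sized this way.
    """
    total = spill_bytes + local_bytes + max_param_bytes + FRAME_MARKER_BYTES
    return (total + FRAME_ALIGN - 1) & ~(FRAME_ALIGN - 1)
```

A procedure with 8 bytes of spill area, 20 bytes of locals, and a 16-byte outgoing parameter list needs 76 bytes, which this sketch rounds up to 128.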

If this procedure is callable by a less-privileged procedure, each page of the stack frame must be PROBEd (a privilege-checking mechanism) before any information is stored into the frame. The PROBE instructions must be generated by the compiler. (This is not currently implemented).

When a procedure is entered, gr2 (RP) will contain the offset portion of the return address. Whether the procedure was called locally or externally, the two low order bits of RP will always contain the execution level of the caller. From the source code level it is difficult, if not impossible, to determine if RP is valid or if it has been stored into the stack frame. Therefore, compilers that support multiple privilege levels need to provide a mechanism for returning the execution level of the caller.
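
Because the two low-order bits of RP carry the caller's execution level, a return pointer can be separated into its address and level parts with a mask. The Python sketch below is illustrative; the function name is hypothetical.

```python
LEVEL_MASK = 0x3   # two low-order bits of RP

def split_rp(rp):
    """Return (word-aligned return offset, caller's execution level)."""
    offset = rp & ~LEVEL_MASK & 0xFFFFFFFF
    level = rp & LEVEL_MASK
    return offset, level
```

A compiler-provided mechanism for returning the caller's execution level would ultimately have to deliver the second value of this pair, whether RP is still in gr2 or has been stored in the stack frame.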

If the current procedure calls another procedure, RP must be saved sometime before the call, usually in the procedure entry sequence. The called code is responsible for ensuring the validity of its own input. sr5, sr6, and sr7 can only be set by privileged code and, therefore, can be assumed to be correct at all times. In addition, DP, LP (linkage pointer), and sr4 are either left unchanged during a local call or set by the procedure call millicode during an external call, and therefore may also be assumed to be correct. The value of SP and the parameters passed must be validated by the called code.

Only those fields necessary in the frame marker of the current procedure will contain valid data; others will be undefined. For example, during a local call, contents of the external return link pointer field will be undefined.

5.1.6 Import and Export Stubs

As previously mentioned, the compiler only generates a single type of procedure call (local), a characteristic that is made possible by the use of stubs. Stubs are pieces of code that are attached to the caller and/or callee that enable the original calling and called code to remain unchanged through the external call process. There are two types of stubs used in this procedure calling convention: Import (calling) and Export (called). These are defined here, and explained in detail in the following sections.
  1. Import Stub (Calling Stub) – a locally-linked stub that enables inter-module and OS/Subsystem calls to appear (to the compiler) as local calls. If the linker determines that a call is external, it will attach a calling stub to the procedure and will patch the BL (Branch and Link) instruction to branch to the stub. There is usually one calling stub for each procedure (and privilege level combination) referenced in the module (which can accommodate all calls to a specific procedure), but it is possible to have a separate stub for each CALL to a procedure.

  2. Export Stub (Called Stub) – enables a called routine to avoid the problem of having multiple return sequences (i.e. different for local and external calls). There is one called stub for each external procedure of a load module. Inter-module calls will enter the called stub, which in turn will enter the called procedure (callee). Thus, the callee can return to its called stub (which is local) rather than being concerned with the external return. The calling stubs can be generated by the linker (or obtained from a "stub library") and then linked to their respective routines.

5.1.7 Import Stubs

The import stub will load gr1 with a pointer to the procedure XRT table entry (XRT pointer) of the called procedure and then branch to an external procedure call millicode sequence. Since the location of the XRT for a load module may be different for separate executions of the load module, the XRT entry pointer will be computed in the import stub. The XRT entry pointer is computed by adding the XRT entry offset to the value of the LP, which is stored at DP-4, pointing to the base of the XRT for the current load module.

For permanently bound calls to the operating system, an import stub is not necessary; instead, the BL instruction in the call is replaced with a BLE instruction that branches to a system entry point branch table. This eliminates much of the linking that is normally performed when a load module is loaded. (This is currently not implemented.)

Although the external procedure call diagram (Figure 5-5) shows that DP, RP', and sr4 are saved by the CALLX millicode (see next section for discussion of CALLX), DP and RP' will actually be saved in the import stub, and sr4 will be copied to a general register in the import stub. This is done to eliminate two interlocks and fill a branch delay slot that would otherwise be left unused. The code sequence of the calling stub will be similar to that shown below:

  LDW         -4(DP),gr1          ; Load LP
  STW         DP,-32(SP)          ; Save DP
  ADDIL *     L'XRToff,gr1        ; Add XRT offset to LP
  LDO   *     R'XRToff(gr1),gr1   ;
  LDW         16(gr1),gr20        ; Load address of CALLX
  STW         RP,-24(SP)          ; Save RP'
  BE          (sr7,gr20)          ; Branch to CALLX
  MFSP        sr4,gr21            ; Move sr4 to gr21 (branch delay slot)

* These instructions can be eliminated by the linker in cases where they would effectively be NOPs. In these cases the stub is still padded to a total size of 8 words, because unwind descriptors assume fixed-length stubs.
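
The address arithmetic of the import stub can be modeled in a few lines. The Python sketch below assumes the usual meaning of the L' and R' field selectors (the upper 21 bits and the lower 11 bits of a 32-bit value, respectively); the function names and the dictionary standing in for memory are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def left21(value):
    """L' selector: the upper 21 bits of the value, low 11 bits cleared."""
    return value & 0xFFFFF800

def right11(value):
    """R' selector: the low 11 bits of the value."""
    return value & 0x7FF

def xrt_entry_pointer(memory, dp, xrt_off):
    """Mirror the first four instructions of the import stub."""
    gr1 = memory[dp - 4]                     # LDW   -4(DP),gr1   (load LP)
    gr1 = (gr1 + left21(xrt_off)) & MASK32   # ADDIL L'XRToff,gr1
    gr1 = (gr1 + right11(xrt_off)) & MASK32  # LDO   R'XRToff(gr1),gr1
    return gr1

def eliminable(xrt_off):
    """Which starred instructions add zero: (ADDIL, LDO)."""
    return left21(xrt_off) == 0, right11(xrt_off) == 0
```

With an XRT offset below 2048 the ADDIL adds zero, which is the kind of case the footnote describes; the stub would then be padded back to eight words.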

5.1.8 External Procedure Call Millicode (CALLX)

The CALLX millicode sequence is primarily a transition mechanism that facilitates the successful location and access of the desired external routine. The address of the CALLX routine is obtained from the XRT entry, and is assigned by the loader. Seven variations of CALLX are available, depending on the possible privilege promotions. CALLX is called from the calling stub, and operates as follows:
  1. Saves DP, RP', and sr4 (if necessary).

  2. Alters the privilege level if necessary (Gateway).

  3. Checks the XRT pointer to ensure that it points to a valid XRT table entry.

  4. Loads the LP, DP, Offset and sr4 (of new procedure) values.

  5. Branches to the called stub in the external module.

5.1.9 Export Stubs

An export stub is used to allow the compiler to generate the same exit code sequence regardless of whether the procedure will be called locally or externally. If the linker determines that a procedure can be called from another load module, it will attach a called stub to the procedure. The stub will be the external entry and return point for the procedure. Local calls to the procedure will be unaffected.

Although the stub is the external entry point, its primary purpose is to be executed during an external procedure call exit/return. The stub is entered before the procedure so that RP can be set to the address of the stub, which will cause the local return in the procedure to exit to the stub. When the stub is executed during the return, it will restore DP and sr4, and return to the caller.

The stub executes at the caller's execution level.

The code sequence for the called stub will be similar to the following:


  BL        disp,gr2          ; Branch to local entry point
  DEP       gr31,31,2,gr2     ; Deposit caller's Exec. Level in link
  LDW       -28(SP),gr21      ; Restore sr4 (part 1)
  LDW       -24(SP),gr2       ; Restore return address (RP')
  MTSP      gr21,sr4          ; Restore sr4 (part 2)
  BE        0(sr4,gr2)        ; Branch back to caller
  LDW       -32(SP),gr27      ; Restore DP
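
The first two instructions of the called stub build the return link that the callee will later use. The Python sketch below models them under two assumptions: that BL links to the instruction following its delay slot (8 bytes past the BL itself), and that the low two bits of gr31 hold the caller's execution level, as the listing's comment indicates. The function names are hypothetical.

```python
def dep_low2(source, target):
    """Model of DEP source,31,2,target: copy the two low bits of source."""
    return (target & ~0x3 & 0xFFFFFFFF) | (source & 0x3)

def export_stub_link(stub_addr, gr31):
    """Value left in gr2 when the callee's entry point is reached.

    stub_addr is the address of the BL at the start of the called stub.
    """
    link = stub_addr + 8          # BL links past its delay slot
    return dep_low2(gr31, link)   # DEP executes in the delay slot
```

The resulting link points at the first restore instruction of the stub, so the callee's ordinary local return resumes the stub, which then restores sr4, RP', and DP before branching back to the caller.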

5.1.10 Inter-Module Cross Reference Table (XRT)

The Cross Reference Table (XRT) is used to link the external procedure calls of a load module. Every process has an XRT area reserved from its process space (pointed to by sr5). The table contains a sub-table for each load module referenced during process execution. Each sub-table for a load module contains entries for all the procedures called by that load module. A sample XRT is shown in Figure 5-2 (in this example, the process has two load modules: 'A' and 'B'. 'A' calls procedures B1, B2 and B3; 'B' calls procedures A1 and A2).

5.1.11 The Layout of the XRT

One XRT might be visualized as shown in Figure 5-2. (This diagram corresponds to the calling situation described in the last sentence of the previous section.)

Figure 5-2 Layout of the XRT

[Figure 5-2]

5.1.12 Sub-table Header

Each sub-table of a load module in the XRT contains an eight-word header used to locate unwind tables for the module (see Chapter 7, Stack Unwinding). The sub-table header, as shown in Figure 5-3, contains the following information:
  • (The first four words are presently undefined and are reserved for future use.)

  • The starting address of the Unwind table.

  • The starting address of the Linker Stub Unwind table.

  • The starting address of the Recover table.

  • The starting address of the Auxiliary Unwind table.

Figure 5-3 Sub-table Header

[Figure 5-3]

If a load module contains no external references, its sub-table in the XRT will contain only the header.
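
The eight-word header can be pictured as a fixed 32-byte record. The Python sketch below packs and unpacks such a record; it assumes that the four table addresses occupy words 4 through 7 in the order listed above and that words are big-endian, and all names are hypothetical.

```python
import struct

HEADER_FORMAT = ">8I"   # eight big-endian 32-bit words (assumed layout)
HEADER_BYTES = struct.calcsize(HEADER_FORMAT)
FIELDS = ("unwind", "stub_unwind", "recover", "aux_unwind")

def pack_header(unwind, stub_unwind, recover, aux_unwind):
    """Build a header; the first four (reserved) words are written as zero."""
    return struct.pack(HEADER_FORMAT, 0, 0, 0, 0,
                       unwind, stub_unwind, recover, aux_unwind)

def unpack_header(raw):
    """Return the four table addresses, skipping the reserved words."""
    words = struct.unpack(HEADER_FORMAT, raw)
    return dict(zip(FIELDS, words[4:]))
```

A load module with no external references would have a sub-table consisting of exactly one such 32-byte record.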

An entry for a procedure within a sub-table of a load module in the XRT (e.g. the entry for B1) is eight words long, and contains the following information:
  1. The SID of the module to which it belongs.

  2. The entry offset for the procedure. This is a 32-bit offset, and is the address of the entry point (relative to the base of the SID of its load module) of the procedure's called stub. The last two bits (30 and 31) of this word must be zero in order to ensure word alignment of the address.

  3. The DP value for the load module to which it belongs (the value of the base register pointing to the load module's global data area).

  4. The LP value of the module in which the called procedure is contained. This is a pointer to the beginning of the XRT sub-table of that load module.

  5. The address of the CALLX millicode routine.

  6-8. (These three words are presently undefined and are reserved for future use.)

The XRT entry for procedure B1 (which is called from A and would appear in A's sub-table of the XRT) is shown in Figure 5-4.

Figure 5-4 One XRT Entry

[Figure 5-4]

** The last two bits of the Entry Point Offset must be set to zero in order to ensure word alignment.
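
The eight-word XRT entry can be sketched as a 32-byte record in the same way. In the Python example below, the field order follows the numbered list above, big-endian words are assumed, and the type and function names are hypothetical. Note that the fifth word falls at byte offset 16, which agrees with the LDW 16(gr1),gr20 instruction that the import stub uses to fetch the CALLX address.

```python
import struct
from collections import namedtuple

XrtEntry = namedtuple("XrtEntry", "sid entry_offset dp lp callx")

ENTRY_FORMAT = ">5I12x"   # five defined words, then three reserved words
ENTRY_BYTES = struct.calcsize(ENTRY_FORMAT)
CALLX_BYTE_OFFSET = 16    # fifth word of the entry

def pack_entry(entry):
    """Serialize an entry, rejecting a misaligned entry point offset."""
    if entry.entry_offset & 0x3:
        raise ValueError("entry offset must be word aligned")
    return struct.pack(ENTRY_FORMAT, *entry)

def unpack_entry(raw):
    """Recover the five defined words from a 32-byte entry."""
    return XrtEntry(*struct.unpack(ENTRY_FORMAT, raw))
```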

5.1.13 Linkage Pointer

A single value, the Linkage Pointer, resides in the word directly below the global data area of a load module, at the location pointed to by DP-4. This pointer is private to the load module, and is a short pointer to the beginning of the load module's XRT sub-table. An entry for a called procedure in the XRT is pointed to as follows:
  1. The LP points to the beginning of the sub-table in the XRT of the load module containing the called procedure.

  2. The import stub for the caller has the offset to the called procedure's entry relative to the XRT sub-table of the caller's load module. This offset, added to the LP value, provides a pointer to the called procedure's entry. This LP-relative XRT offset is assigned by the linker.

  3. The reason for the indirection employed by using the LP is that load modules can be shared by different processes whose XRTs may also be different. To allow the same code to reference the same load module in different processes' XRTs, it is necessary to provide a uniform interface to the XRT entries; this is provided by the LP.

In addition to the XRT area in the process space, there is an XRT area in the system space (pointed to by sr7) that is reserved for the XRTs of system load modules. Like any other load module, a system load module also uses LP to locate its XRT. The system XRT area can also contain a special XRT that is used for calling system procedures by intrinsic number.

5.1.14 System Security and the XRT

The XRT will be set up by the loader. The values in the XRT will be supplied by the loader, based on mapping the files relevant to the process into virtual memory (i.e. SID allocation, the data offsets in sr5 space, etc.). The linker may provide some of the values that are to be contained in sr5, based on the information it may have at link time concerning the specific load modules that are involved in the process' executable image.

When a process is loaded, the loader will protect all the pages in the XRT to read level 3, write level 0. Although it is not necessary, the process protection ID will be assigned to the pages of the process XRT area. Since the LP is located with the process' global data area, its protection is the same as that load module's global data area. It is not necessary to validate LP because it will be set by CALLX at every external call or privilege level change.

The XRT of every process and the system XRT must be at the same offset of their corresponding quadrant, and every XRT must be the same length. These two restrictions allow the procedure call millicode (CALLX) to use a very simple masking algorithm to perform bounds checking on any XRT pointer used with an external call. The location and size of the area can be changed when the system is restarted, but the new values must be reflected in the procedure call millicode (because it uses constant values to do bounds checking on the XRT entry pointer).
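
The text does not give the masking algorithm itself, so the following Python sketch shows only one way such a check could work. It assumes that the XRT area size is a power of two, that the area is aligned to its size within the quadrant, and that the top two bits of a short pointer select the quadrant (1 for sr5, 3 for sr7). The offset and size constants are invented for illustration.

```python
QUADRANT_MASK = 0xC0000000
XRT_AREA_OFFSET = 0x00100000   # hypothetical offset within each quadrant
XRT_AREA_SIZE = 0x00010000     # hypothetical size; must be a power of two

def xrt_pointer_ok(ptr):
    """Return True if ptr lies in the process or system XRT area."""
    quadrant = (ptr & QUADRANT_MASK) >> 30
    if quadrant not in (1, 3):          # sr5 (process) or sr7 (system) only
        return False
    within = ptr & ~QUADRANT_MASK       # offset within the quadrant
    in_area = (within & ~(XRT_AREA_SIZE - 1)) == XRT_AREA_OFFSET
    aligned = (ptr & 0x3) == 0          # entries are word aligned
    return in_area and aligned
```

Because the offset and size are compile-time constants in this sketch, changing the location or size of the area requires changing the constants, which parallels the restriction that new values must be reflected in the millicode.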

5.1.15 Interface Between Import and Export Stubs

The exact distribution of all operations between the import and export stubs, and whether the export stub uses a centralized system routine to accomplish these tasks, is not architected. Much more important, and specified in detail, is the work that the export stub expects to have been done before it is entered. Adhering to these requirements facilitates the loading of DP, LP, and SID (if desired) of the called load module. These requirements are as follows:
  1. gr1 must contain a pointer to the called procedure's XRT entry. (This is actually the called procedure's XRT entry in the sub-table of the calling load module.)

    Recall that the caller's LP points to the caller's XRT sub-table, which contains entries for all of the routines that may be called. The offset into that sub-table, which indexes to the called routine's entry, is bound as an immediate in the import stub. The pointer to the specific XRT entry is calculated as follows:

    (caller's LP value) + (offset to called procedure's entry) = (pointer to callee's entry in XRT sub-table of caller)

    This pointer is the value that should be found in gr1 when the export stub is entered.

  2. The SID of the called load module must be loaded into sr4. (The export stub is free to check the validity of that SID, to reload it, or to leave it as is. The important point is the assumption that it has already been loaded, and DOES NOT need to be checked.)

5.1.16 Summary of an External Procedure Call

Figure 5-5 shows a detailed picture of the flow of control associated with an external procedure call.

To read the diagram, begin at the upper left-hand corner ("Calling Code"), and read downward; whenever an arrow extends from a line, follow it, and continue downward in the box from the point where the arrow ends.

Figure 5-5 Summary of External Procedure Call

[Figure 5-5]

5.1.17 Dynamic Linking

Dynamic linking is the process of run-time linking to routines. Dynamic linking is required when the target procedure is unknown at compile time, or the target of a procedure call can change while the code is executing. Dynamic linking is carried out through explicit protocols (e.g. the HPGETPROCPLABEL intrinsic on MPE XL).

If the dynamically linked routine resides in a load module that has not yet been loaded, the load module is loaded dynamically. In order to dynamically load a load module, a global data area for it may need to be allocated in sr5 space. This data space is allocated by the loader, and may be allocated from any unused virtual space in sr5.

5.1.18 Procedure Labels

A procedure label is a specially-formatted variable that is used to link dynamic procedure calls. The format of a procedure label is shown below:

Procedure Label

[Procedure Label]

The X field in the address section of the procedure label is the XRT flag, which is used by compilers to determine if the procedure label is local (off) or external (on). In the case of a local procedure label, the address part will be a pointer to the entry point of the procedure, while in the external case, the address part will be a pointer to an XRT entry for the procedure.

For external procedure labels, the linker supplies an LP offset. Generated code must add LP to this value, but only when the X field is on, to produce an XRT address.

The L field in the address part of the procedure label is the shared library flag (used only on HP-UX).

The C field is mentioned for completeness. It is only relevant when the X field is on and it is used to indicate a Compatibility Mode procedure label. CM plabels are never called directly from native code.

The following is an example of the code generated by compilers to obtain the address of a procedure (a plabel):

  LDIL      L'func,1          ; get the address of the function
  LDO       R'func(1),31      ;
  EXTRU,=   31,31,1,19        ; check the X-field; nullify the LDW
                              ;   if it is off (not an XRT address)
  LDW       -4(0,27),19       ; if it's an XRT address, load LP
  ADD       31,19,20          ; add in LP (or zero if the X-field is off)
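
The conditional addition performed by this sequence can be traced in a short model. The Python sketch below assumes, as the EXTRU operands suggest, that the X field is the least significant bit of the address word; the function name and the dictionary standing in for memory are hypothetical.

```python
X_FLAG = 0x1   # assumed position of the X field (bit 31, the low-order bit)

def resolve_plabel(value, memory, dp):
    """Mirror the EXTRU,= / LDW / ADD sequence that forms a plabel."""
    gr31 = value
    gr19 = gr31 & X_FLAG          # EXTRU,= 31,31,1,19
    if gr19 != 0:                 # the LDW is nullified when the result is zero
        gr19 = memory[dp - 4]     # LDW -4(0,27),19  (load LP)
    return (gr31 + gr19) & 0xFFFFFFFF   # ADD 31,19,20
```

When the X field is off, gr19 still holds the zero produced by the extract, so the final ADD leaves the local address unchanged. When it is on, the result is LP plus the linker-supplied offset, with the X flag still set for later inspection.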

In the current implementation on MPE XL, the L-field is never turned "on" and is, therefore, effectively unused. In the future, either the specification or the implementation may change to use this field.

The dynamic procedure call millicode, $$dyncall, determines if a procedure label is local or external, and takes the appropriate action. (A local procedure label can only be used to call procedures within the current load module.) The following pseudo-code sequence demonstrates the process used for dynamic calls. Note the similarity between this sequence and the import stub sequence:

    IF (X-field in Plabel) = 0 THEN
      Branch Vectored using Plabel
    ELSE BEGIN
      Clear X-field;
      Save DP;
      Load address of CALLX;
      Save RP';
      Move sr4 to gr21;
      Branch to CALLX;
    END.

The X and L flags must be zero during an external call, or they will cause a misaligned data reference trap when accessing the XRT. (As mentioned earlier, the L flag is currently unused on MPE XL, so it is assumed to be zero.)
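
The dynamic call pseudo-code can be expressed as a runnable model. The Python sketch below follows the pseudo-code step by step; the register dictionary, the frame dictionary keyed by SP offset, and the xrt mapping from entry address to entry fields are all hypothetical stand-ins, and the X field is again assumed to be the low-order bit.

```python
X_FLAG = 0x1   # assumed position of the X field

def dyncall(plabel, regs, frame, xrt):
    """Model of $$dyncall; returns (kind, branch target, XRT entry pointer)."""
    if (plabel & X_FLAG) == 0:
        return ("local", plabel, None)     # branch vectored using the plabel
    entry_ptr = plabel & ~X_FLAG           # clear X-field
    frame[-32] = regs["dp"]                # save DP
    callx = xrt[entry_ptr]["callx"]        # load address of CALLX
    frame[-24] = regs["rp"]                # save RP'
    regs["gr21"] = regs["sr4"]             # move sr4 to gr21
    return ("external", callx, entry_ptr)  # branch to CALLX
```

Clearing the X field before the XRT entry is read is what keeps the access word aligned, which is the reason the flags must be zero during an external call.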

An external procedure label can be used in conjunction with the external procedure call millicode to call any procedure within the process or the operating system (subject to XLeast checking to ensure adequate execution level). The procedure call millicode only uses the address part of the procedure label, but it may point to either the process or system XRT.

The intrinsic HPGETPROCPLABEL may be used to get an external procedure label for any level 1 procedure in a process. If the compiler or linker determines the need for an external label, it is communicated to the loader by a normal import request or an explicit call to HPGETPROCPLABEL.

Although a procedure label pointing to a system XRT entry is valid for all processes, it will be unloaded when its reference count drops to zero. Therefore, these procedure labels should not be considered as global procedure labels. The procedure HPGETSYSPLABEL will return a global procedure label for any procedure in a system load module, but it requires privilege level 1 to be called.



