PA-RISC Procedure Calling Conventions Reference Manual > Appendix A Standard Procedure Calls

A.5 Code Description

The relevant assembler directives are summarized in Appendix B, and the other compiler-generated information is briefly explained following the code comments.

(Numbers below correspond to those accompanying the blocks of code; they appear in the order in which they would be executed. In other words, the code documentation follows the program flow of control. The code listed in section A.4 does not adhere to the requirement that stack frames be multiples of 64 bytes in size.)

The beginning of the main program block (note that the main program is handled in much the same manner as a standard procedure). Because other procedures will be subsequently called, it is necessary to store the Return Address and allocate a stack frame. The Return Pointer (RP), which is currently in gr2, is first stored onto the stack at SP-20, and then SP (gr30) is incremented (by 64 bytes) in order to create the new frame. A zero value is stored in the previous SP field of the frame marker in order to signify the termination point for stack unwinding. (In the compiled C code, this initialization would not appear because the outer block is handled differently.)
CALL to the procedure one. The return pointer (RP), which is the address of the second instruction following the BL, is put into gr2. The delay slot (i.e. the instruction following the branch) is NOP because there is no operation that the compiler could have inserted there.
ENTRY to procedure one. Again, this is a non-leaf procedure, so it is necessary to store RP onto the stack at SP-20 and then allocate a new frame by incrementing SP (this time the increment is 80 bytes in order to accommodate the local variables).
The immediate values 5 and 10 are loaded into gr22 and gr1 respectively, and these registers are stored onto the stack at SP-60 and SP-64. This block corresponds to the statements a:=5 and b:=10.
Loading arguments (into caller-saves registers). This can be divided into three categories. First, the values stored on the stack at SP-60 and SP-64 (corresponding to a and b) are loaded into arg0 and arg1 (gr26 and gr25). Second, the addresses SP-68 and SP-72 are loaded into arg2 and arg3 (gr24 and gr23). These two arguments correspond to variables c and d, and they are loaded with addresses rather than actual values because they are being passed by reference (i.e. VAR parameters). Third, the values stored on the stack at SP-76 and SP-80 (corresponding to e and f) are loaded into gr31 and gr19 respectively (these two are serving as scratch registers), and then stored onto the stack at SP-52 and SP-56. Note that these two parameters must be stored onto the stack because the four argument registers have already been filled. (In the compiled FORTRAN code, it would become evident that all parameters are passed by reference, as in the second category above, as is dictated by the FORTRAN language.)
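The placement rule described above can be sketched in Python. This is a minimal illustration, not part of the manual: the function name place_arguments is hypothetical, it handles only word-sized arguments, and it assumes that argument word n (counting from 0) has its stack slot at SP-(36 + 4n), which is consistent with the SP-52 and SP-56 slots used here for the fifth and sixth arguments.

```python
# Hypothetical helper: decide where each word-sized argument is passed.
ARG_REGISTERS = ["gr26", "gr25", "gr24", "gr23"]  # arg0..arg3


def place_arguments(names):
    """Map each argument name to a register or a caller-frame stack slot."""
    placement = {}
    for index, name in enumerate(names):
        if index < len(ARG_REGISTERS):
            # The first four argument words travel in arg0..arg3.
            placement[name] = ARG_REGISTERS[index]
        else:
            # Assumed layout: argument word n lives at SP-(36 + 4n).
            placement[name] = f"SP-{36 + 4 * index}"
    return placement


placement = place_arguments(["a", "b", "c", "d", "e", "f"])
for name, where in placement.items():
    print(name, "->", where)
```

With the six parameters of proca, the first four land in gr26 through gr23 and the last two at SP-52 and SP-56, matching the description.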
CALL to procedure proca. Note that the .CALL directive is followed by a note indicating that arguments will be passed to the procedure in gr23-26. The delay slot is filled with a NOP, although it could have been filled with another operation (e.g. one of the preceding STW or LDW instructions). As with all BL instructions, the return address is simultaneously loaded into gr2 (or gr31 for millicode).
ENTRY to procedure proca. As before, this is a non-leaf procedure, so it is necessary to store RP at SP-20 and allocate an additional stack frame by incrementing SP (48 bytes in this case).
The values held in the four argument registers (gr26-23) are stored onto the stack in the fixed arguments area of the PREVIOUS (caller's) frame. The location in the previous frame is determined by subtracting the size of the current frame (48 bytes) from the offset used here (84, 88, ...), which gives the offset into the previous frame (36, 40, ...). These words correspond to the parameters a through d. (Note that these store operations are actually unnecessary, and would probably be removed by the optimizer.)
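The offset arithmetic above can be checked with a short Python sketch. The names FRAME_SIZE and callee_to_caller_offset are illustrative only. The 48-byte frame size and the offsets 84 and 88 come from this step, while 92 and 96 are the offsets quoted for the addresses of c and d in later steps.

```python
FRAME_SIZE = 48  # size of proca's frame in bytes, as stated in the text


def callee_to_caller_offset(callee_offset, frame_size=FRAME_SIZE):
    """Convert an SP-relative offset seen inside the callee into the
    equivalent SP-relative offset in the caller's (previous) frame."""
    return callee_offset - frame_size


# Offsets at which proca addresses its first four parameters.
callee_offsets = {"a": 84, "b": 88, "c": 92, "d": 96}

caller_offsets = {
    name: callee_to_caller_offset(offset)
    for name, offset in callee_offsets.items()
}

for name in callee_offsets:
    print(f"{name}: SP-{callee_offsets[name]} in proca "
          f"= SP-{caller_offsets[name]} in the caller")
```

The resulting caller-side offsets (36 through 48) are the four words immediately above the SP-52 and SP-56 slots that the caller used for the fifth and sixth arguments.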
The words at SP-84 and SP-88 (parameters a and b) are loaded into gr1 and gr31 respectively and the add operation (a+b) is performed, with the result being put into gr19. After SP-92 (which contains the address of c) is loaded into gr20, the gr19 value is stored at that address.
CALL to function mul. After the two parameters (a and b) are loaded into arg0 and arg1 (gr26 and gr25), the branch is made to mul, and the return address is put into gr2. Note that in this case, the delay slot is filled with an operation (the loading of the b value).
ENTRY to function mul and CALL to millicode routine muloI. A frame is allocated because the function return value will later be temporarily stored onto the stack, but RP is not stored onto the stack because no additional procedure calls will be made. (This temporary frame is actually unnecessary, and would be removed by the optimizer.) The two arguments (a and b, in arg0 and arg1) are stored onto the stack, and then reloaded into registers in order to be sent to the millicode routine that will perform the multiply operation. Then the branch is made to the millicode routine ($$muloI), with the return address being stored in gr31 (MRP).
This code could be further optimized by accessing the arguments directly from the registers in which they enter mul, thereby eliminating the argument stores and loads.
The millicode return value (in gr29) is stored onto the stack, and subsequently loaded into gr28, which is the procedure return register (ret0). This sequence would probably be optimized to be a simple COPY 29,28 instruction.
EXIT from function mul. Deallocate the local frame, and return back to the caller (proca). The BV (Branch Vectored) instruction, which also uses gr2 as the return pointer, accomplishes this return.
RETURN from mul to proca. The value stored at SP-96 (the address of d) is loaded into gr21, and then the return value (in gr28) is stored at that address.
EXIT from procedure proca. The return address is loaded into gr2 from the RP field of the Previous frame, and the branch is made to that address. The delay slot is filled with the instruction that deallocates the local frame by decrementing SP.
RETURN from proca to one. The values stored at SP-68 and SP-72 (the current values of c and d) are loaded into gr20 and gr21, and the add operation (c+d) is performed, with the result being put into gr22. This result is then stored onto the stack at SP-76, which is the location assigned to e. Finally, the value stored in the e word is reloaded (into gr1), and then stored at SP-80, which is the location of f. (This is the f:=e operation.)
EXIT from one. The return address is taken from its memory location (SP-100) and loaded into gr2, the local frame is deallocated by decrementing SP, and the branch is taken to the return point in the main program.
EXIT from main program. The return address (i.e. to the system) is loaded into gr2, the local frame is deallocated by decrementing SP, and the branch is made to the system address.
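For orientation, the source-level behavior that the steps above walk through can be modeled in Python. This is a reconstruction from the description, not the original source listing. The one-element lists stand in for the VAR (by-reference) parameters c and d, and the initial value of 0 for c, d, e, and f is an assumption, since the text does not say how they are initialized.

```python
def mul(a, b):
    # Stands in for the call to the $$muloI millicode routine.
    return a * b


def proca(a, b, c_ref, d_ref, e, f):
    # e and f are passed by value; the steps described do not use them.
    c_ref[0] = a + b      # result stored through the address of c
    d_ref[0] = mul(a, b)  # result stored through the address of d


def one():
    a = 5
    b = 10
    c, d = [0], [0]  # one-element lists model VAR parameters (assumed init)
    e = f = 0        # assumed initial values
    proca(a, b, c, d, e, f)
    e = c[0] + d[0]
    f = e
    return c[0], d[0], e, f


result = one()
print(result)
```

Under these assumptions, c ends up as 15, d as 50, and both e and f as 65.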
