The beginning of the main program block (note that the main
program is handled in much the same manner as a standard procedure).
Because other procedures will be subsequently called, it is necessary
to store the Return Address and allocate a stack frame. The Return
Pointer (RP), which is currently in gr2, is first stored onto the
stack at SP-20, and then SP (gr30) is incremented (by 64 bytes) in
order to create the new frame. A zero value is stored in the previous
SP field of the frame marker in order to signify the termination
point for stack unwinding. (In the compiled C code, this
initialization would not appear because the outer block is handled
differently.)
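
The entry sequence just described can be modelled as three word-sized operations. The Python sketch below is illustrative only: the dictionary standing in for memory, the name `enter_main`, the example addresses, and the placement of the previous SP field at SP-4 are assumptions rather than details taken from the listing.

```python
# Toy model of the main-program entry sequence (not real PA-RISC code).
RP_OFFSET = 20        # RP slot, as stated in the text (SP-20)
PREV_SP_OFFSET = 4    # ASSUMED position of the previous SP field in the frame marker

def enter_main(memory, sp, rp, frame_size):
    """Save RP, allocate the frame, and zero the previous SP field."""
    memory[sp - RP_OFFSET] = rp        # store RP at SP-20, before SP moves
    sp += frame_size                   # the stack grows toward higher addresses
    memory[sp - PREV_SP_OFFSET] = 0    # zero marks the end of the unwind chain
    return sp

memory = {}
# Hypothetical starting SP and return address; 64 is the increment given in the text.
new_sp = enter_main(memory, sp=0x1000, rp=0x4008, frame_size=64)
```

An unwinder following previous SP fields from frame to frame would stop when it reads the zero written here.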
CALL to the procedure one. The return pointer (RP), which
is the address of the second instruction following the BL, is put
into gr2. The delay slot (i.e. the instruction following the branch)
is NOP because there is no operation that the compiler could have
inserted there.
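
As a worked example of the return-pointer rule, the sketch below computes the value a BL would leave in gr2. It assumes 4-byte instructions and ignores any low-order privilege bits; the call-site address and the function name are hypothetical.

```python
INSTRUCTION_BYTES = 4  # assumption: every instruction is one 32-bit word

def bl_return_address(bl_address):
    """Address placed in the return pointer by a BL located at bl_address."""
    # Skip the BL itself and the delay-slot instruction that follows it.
    return bl_address + 2 * INSTRUCTION_BYTES

call_site = 0x2000                           # hypothetical address of the BL
delay_slot = call_site + INSTRUCTION_BYTES   # executed before the branch target
resume_at = bl_return_address(call_site)     # where the callee returns to
```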
ENTRY to procedure one. Again, this is a non-leaf procedure,
so it is necessary to store RP onto the stack at SP-20 and then
allocate a new frame by incrementing SP (this time the increment is
80 bytes in order to accommodate the local variables).
The immediate values 5 and 10 are loaded into gr22 and gr1
respectively, and these registers are stored onto the stack at SP-60
and SP-64. This block corresponds to the statements a:=5 and
b:=10.
Loading arguments (into caller-saves registers). This can be
divided into three categories. First, the values stored on the stack
at SP-60 and SP-64 (corresponding to a and b) are
loaded into arg0 and arg1 (gr26 and gr25). Second, the addresses
SP-68 and SP-72 are loaded into arg2 and arg3 (gr24 and gr23).
These two arguments correspond to variables c and d, and
are loaded with addresses rather than actual values because
they are passed by reference (i.e. as VAR parameters). Third,
the values stored on the stack at SP-76 and SP-80 (corresponding to
e and f) are loaded into gr31 and gr19 respectively
(these two are serving as scratch registers), and then stored onto
the stack at SP-52 and SP-56. Note that these two parameters must be
stored onto the stack because the argument registers have already
been filled. (In the compiled FORTRAN code, it would become evident
that all parameters are passed by reference, as in the Second
category above, as is dictated by the FORTRAN language.)
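
The placement of the six arguments can be summarized by a small lookup. This Python sketch is a minimal model, not part of the listing: the function name is hypothetical, and the rule that argument word n has a stack slot at SP-(36 + 4n) is an assumption chosen because it reproduces the SP-52 and SP-56 slots named in the text.

```python
ARG_REGISTERS = ["gr26", "gr25", "gr24", "gr23"]  # arg0..arg3, per the text

def argument_location(index):
    """Return where argument word `index` (0-based) is passed."""
    if index < len(ARG_REGISTERS):
        return ARG_REGISTERS[index]
    # ASSUMED rule: argument word n lives at SP-(36 + 4n) in the caller's frame.
    return "SP-%d" % (36 + 4 * index)

# a, b, address of c, address of d, e, f
locations = [argument_location(i) for i in range(6)]
```

Under this model the first four words go to registers and the fifth and sixth spill to the stack, matching the three categories described above.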
CALL to procedure proca. Note that the .CALL directive is
followed by a note indicating that arguments will be passed to the
procedure in gr23-26. The delay slot is filled with a NOP, although
it could have been filled with another operation (e.g. one of the
preceding STW or LDW instructions). As with all BL instructions, the
return address is simultaneously loaded into gr2 (or gr31 for
millicode).
ENTRY to procedure proca. As before, this is a non-leaf
procedure, so it is necessary to store RP at SP-20 and allocate an
additional stack frame by incrementing SP (48 bytes in this
case).
The values held in the four argument registers (gr26-23) are
stored onto the stack in the fixed arguments area of the PREVIOUS
(caller's) frame. This is determined by subtracting the size of the
current frame (48 bytes) from the offset (84, 88, ..), and using the
result as the offset into the previous frame. These words correspond
to the parameters a through d. (Note that these Store
operations are actually unnecessary, and would probably be removed by
the optimizer.)
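
The offset arithmetic can be checked with a short calculation. The sketch below uses the 48-byte frame size and the offsets 84 through 96 from the text; the function name is hypothetical.

```python
FRAME_SIZE = 48  # size of the current frame, as given in the text

def caller_offset(offset_from_current_sp, frame_size=FRAME_SIZE):
    """Convert an offset below the current SP into an offset below the caller's SP."""
    return offset_from_current_sp - frame_size

current_offsets = [84, 88, 92, 96]   # slots for parameters a, b, c, d
caller_offsets = [caller_offset(o) for o in current_offsets]
```

The results (36, 40, 44, 48) are the four words of the fixed arguments area as seen from the caller's SP.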
The words at SP-84 and SP-88 (parameters a and b)
are loaded into gr1 and gr31 respectively and the add operation
(a+b) is performed, with the result being put into gr19. After
SP-92 (which contains the address of c) is loaded into gr20,
the gr19 value is stored at that address.
CALL to function mul. After the two parameters (a
and b) are loaded into arg0 and arg1 (gr26 and gr25), the
branch is made to mul, and the return address is put into gr2.
Note that in this case, the delay slot is filled with an operation
(the loading of the b value).
ENTRY to function mul and CALL to millicode routine
muloI. A frame is allocated because the function return
value will later be temporarily stored onto the stack, but RP is not
stored onto the stack because no additional procedure calls will be
made. (This temporary frame is actually unnecessary, and would be
removed by the optimizer.) The two arguments (a and b,
in arg0 and arg1) are stored onto the stack, and then reloaded into
registers in order to be sent to the millicode routine that will
perform the multiply operation. Then the branch is made to the
millicode routine ($$muloI), with the return address being
stored in gr31 (MRP).
This code could be further optimized by accessing the arguments
directly from the registers in which they enter mul, thereby
eliminating the argument stores and loads.
The millicode return value (in gr29) is stored onto the stack, and
subsequently loaded into gr28, which is the procedure return register
(ret0). This sequence would probably be optimized to be a simple
COPY 29,28 instruction.
EXIT from function mul. Deallocate the local frame, and
return back to the caller (proca). The BV (Branch Vectored)
instruction, which also uses gr2 as the return pointer, accomplishes
this return.
RETURN from mul to proca. The value stored at SP-96
(the address of d) is loaded into gr21, and then the return
value (in gr28) is stored at that address.
EXIT from procedure proca. The return address is loaded
into gr2 from the RP field of the previous frame, and the branch is
made to that address. The delay slot is filled with the instruction
that deallocates the local frame by decrementing SP.
RETURN from proca to one. The values stored at
SP-68 and SP-72 (the current values of c and d) are
loaded into gr20 and gr21, and the add operation (c+d) is
performed, with the result being put into gr22. This result is then
stored onto the stack at SP-76, which is the location assigned to
e. Finally, the value stored in the e word is reloaded
(into gr1), and then stored at SP-80, which is the location of
f. (This is the f:=e operation.)
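
For orientation, the data flow traced so far can be reconstructed at source level. The Python sketch below is inferred from the walkthrough and is not the original program text: one-element lists stand in for VAR parameters, `mul` stands in for the millicode multiply, and the initial values given to e and f are placeholders.

```python
def mul(a, b):
    return a * b                 # stands in for the $$muloI millicode call

def proca(a, b, c_ref, d_ref, e, f):
    # c_ref and d_ref model VAR parameters as one-element lists.
    c_ref[0] = a + b             # result stored through the address of c
    d_ref[0] = mul(a, b)         # return value stored through the address of d
    # e and f are received but not used in the steps described.

def one():
    a, b = 5, 10                 # a:=5 and b:=10
    c, d = [0], [0]
    e = f = 0                    # ASSUMED placeholders for the values passed to proca
    proca(a, b, c, d, e, f)
    e = c[0] + d[0]              # the c+d add, stored to e
    f = e                        # f:=e
    return c[0], d[0], e, f

result = one()
```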
EXIT from one. The return address is taken from its memory
location (SP-100) and loaded into gr2, the local frame is deallocated
by decrementing SP, and the branch is taken to the return point in
the main program.
EXIT from main program. The return address (i.e. to the system) is
loaded into gr2, the local frame is deallocated by decrementing SP,
and the branch is made to the system address.