HP-UX Process Management: White Paper, Chapter 1: Process Management

Context Switching

In a thread-based kernel, the kernel manages context switches between kernel threads, rather than processes. Context switching occurs when the kernel switches from executing one thread to executing another. The kernel saves the context of the currently running thread and resumes the context of the next thread that is scheduled to run. When the kernel preempts a thread, its context is saved. Once the preempted thread is scheduled to run again, its context is restored and it continues as if it had never stopped.
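The save-and-restore behavior described above can be illustrated in user space. The following is only an analogy, not HP-UX kernel code: it assumes a system that provides the getcontext(), makecontext(), and swapcontext() calls from <ucontext.h>, and every name in it (worker, record, run_context_switch_demo) is hypothetical. The worker gives up the CPU partway through, and when it is switched back in it continues from the same point.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;   /* saved contexts */
static char worker_stack[64 * 1024];      /* private stack for the worker */
static char trace[16];                    /* records the order of execution */
static int trace_len;

static void record(char c)
{
    if (trace_len < (int)sizeof trace - 1)
        trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void worker(void)
{
    record('a');
    /* Save the worker's context and resume the main context. */
    swapcontext(&worker_ctx, &main_ctx);
    /* Execution continues here when the worker is switched back in. */
    record('b');
}

const char *run_context_switch_demo(void)
{
    trace_len = 0;
    trace[0] = '\0';

    int rc = getcontext(&worker_ctx);
    assert(rc == 0);
    (void)rc;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;       /* resumed when worker() returns */
    makecontext(&worker_ctx, worker, 0);

    record('1');
    swapcontext(&main_ctx, &worker_ctx);  /* run the worker until it yields */
    record('2');
    swapcontext(&main_ctx, &worker_ctx);  /* switch the worker back in */
    record('3');
    return trace;
}
```

Calling run_context_switch_demo() returns the string "1a2b3": the worker records 'b' only after it is switched back in, at the point where it left off.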

The kernel allows a context switch to occur under the following circumstances:

  • The thread exits.

  • The thread's time slice has expired and a trap is generated.

  • The thread puts itself to sleep while awaiting a resource.

  • The thread puts itself into a debug or stop state.

  • The thread returns to user mode from a system call or trap.

  • A higher-priority thread becomes ready to run.

If a kernel thread has a higher priority than the running thread, it can preempt the currently running thread. This occurs when the higher-priority thread is awakened by a resource it has requested. Only user threads can be preempted. HP-UX does not allow preemption in the kernel except when a kernel thread is returning to user mode.

Where a single process can schedule multiple kernel threads (the 1 x 1 and M x N thread models), the kernel will preempt the running thread when it is executing in user space, but not when it is executing in kernel space (for example, during a system call).
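The preemption rule above can be summarized as a small predicate. This is a minimal sketch under stated assumptions, not the kernel's actual test: the type and function names (sk_thread, sk_may_preempt) are hypothetical, and the sketch assumes that a numerically lower priority value means a more deserving thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical execution modes for the running thread. */
enum sk_mode { SK_MODE_USER, SK_MODE_KERNEL, SK_MODE_RETURNING_TO_USER };

struct sk_thread {
    int priority;        /* assumption: lower value = higher priority */
    enum sk_mode mode;   /* where the thread is currently executing */
};

/* Returns true if 'ready' may preempt 'running' under the rule in the text:
 * the ready thread must have higher priority, and the running thread must
 * not be executing in kernel space (returning to user mode is allowed). */
bool sk_may_preempt(const struct sk_thread *running,
                    const struct sk_thread *ready)
{
    assert(running != NULL && ready != NULL);
    if (ready->priority >= running->priority)
        return false;                     /* not more deserving */
    return running->mode != SK_MODE_KERNEL;
}
```

With this sketch, a higher-priority thread that wakes while the running thread is inside a system call does not preempt it immediately; the switch becomes possible once the running thread reaches the return to user mode.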

The swtch() Routine

The swtch() routine finds the most deserving runnable thread, takes it off the run queue, and starts running it. The following figure shows the routines used by swtch().

Figure 1-38 process scheduling -- swtch()

[process scheduling -- swtch()]

Table 1-26 swtch() routines

Routine                             Purpose
swidle() (asm_utl.c)                Performs an idle loop while waiting to take action. Checks for a valid kt_link. On a uniprocessor machine without a threadlock thread, goes to spl7. Finds the thread's spu. Decrements the count of threads on run queues. Updates ndeactivated, nready_free, nready_locked in the mpinfo() structure. Removes the thread from its run queue. Restores the old spl level. Updates RTSCHED counts.
save() (resume.s)                   Routine called to save states. Saves the thread's process control block (pcb) marker.
find_thread_my_spu() (pm_policy.c)  For the current CPU, find the most deserving thread to run and remove the old. Search starts at bestq, an index into the table of run queues. When found, set up the new thread to run. Mark the interval timer in the spu's mpinfo. Set the processor state as MPSYS. Remove the thread from its run queue. Verify that it is runnable (kt_stat == TSRUN). Set the EIRR to MPSCHED_INT_ENABLE. Set the thread context bit to TSRUNPROC to indicate the thread is running.
resume() (resume.s)                 Restores the register context from pcb and transfers control to enable the thread to resume execution.
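The selection step described for find_thread_my_spu() can be sketched as follows. This is an illustration only, not the HP-UX source: the names sk_runq, sk_setrq, and sk_find_thread, the number of queues, and the status values are all hypothetical, and the sketch assumes that a lower queue index holds more deserving threads. It shows a search that starts at a bestq hint, removes the chosen thread from its run queue, verifies that it is runnable, and marks it as running.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NQS 8                     /* hypothetical number of run queues */

enum { SK_TSRUN = 1 };               /* stand-in for kt_stat == TSRUN */

struct sk_thread {
    int id;
    int stat;                        /* runnable when equal to SK_TSRUN */
    int running;                     /* stand-in for the TSRUNPROC bit */
    struct sk_thread *link;          /* stand-in for kt_link */
};

struct sk_runq {
    struct sk_thread *head[SK_NQS];  /* assumption: index 0 is most deserving */
    int bestq;                       /* lowest index that may be non-empty */
    int nready;                      /* count of threads on the run queues */
};

/* Append a thread to the tail of run queue q and update the bestq hint. */
void sk_setrq(struct sk_runq *rq, struct sk_thread *t, int q)
{
    assert(rq != NULL && t != NULL && q >= 0 && q < SK_NQS);
    struct sk_thread **pp = &rq->head[q];
    while (*pp != NULL)
        pp = &(*pp)->link;
    t->link = NULL;
    t->stat = SK_TSRUN;
    *pp = t;
    rq->nready++;
    if (q < rq->bestq)
        rq->bestq = q;
}

/* Find the most deserving runnable thread, starting the search at bestq.
 * Returns NULL when every queue is empty (the caller would then idle). */
struct sk_thread *sk_find_thread(struct sk_runq *rq)
{
    assert(rq != NULL);
    for (int q = rq->bestq; q < SK_NQS; q++) {
        struct sk_thread *t = rq->head[q];
        if (t == NULL)
            continue;
        assert(t->stat == SK_TSRUN); /* verify that it is runnable */
        rq->head[q] = t->link;       /* remove it from its run queue */
        t->link = NULL;
        rq->nready--;
        rq->bestq = q;               /* nothing better exists below q */
        t->running = 1;              /* mark the thread as running */
        return t;
    }
    rq->bestq = SK_NQS;              /* all queues are empty */
    return NULL;
}
```

For example, if one thread is queued at index 5 and two at index 2, successive calls to sk_find_thread() return the two index-2 threads in arrival order, then the index-5 thread, then NULL.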

 

Process and Processor Interval Timing

Timing intervals are used to measure user, system, and interrupt times for threads and idle time for processors. These measurements are taken and recorded in machine cycles for maximum precision and accountability. The algorithm for interval timing is described in pm_cycles.h.

Each processor maintains its own timing state in fields defined in struct mpinfo, found in mp.h.

Table 1-27 Processor timing states

Timing state    Purpose
curstate        The current state of the processor (spustate_t)
starttime       Start time (CR16) of the current interval
prevthreadp     Thread to attribute the current interval.
idlecycles      Total cycles the SPU has spent idling since boot (cycles_t)

 

Processor states are shown in the next table.

Table 1-28 Processor states

SPU state          Meaning
SPUSTATE_NONE      Processor is booting and has not yet entered another state
SPUSTATE_IDLE      Processor is idle.
SPUSTATE_USER      Processor is in user mode
SPUSTATE_SYSTEM    Processor is in syscall() or trap.

 

Time spent processing interrupts is attributed to the running process as user or system time, depending on the state of the process when the interrupt occurred. Each time the kernel calls wakeup() while on the interrupt stack, a new interval starts and the time of the previous interval is attributed to the running process. If the processor is idle, the interrupt time is added to the processor's idle time.

State Transitions

A thread leaves resume() after control is switched to it, either from another thread or from the idle loop. Protected by a lock, the routine resume_cleanup() notes the time, attributes the interval to the previous thread if there was one, or to the processor's idle time if not, marks the new interval's start time, and changes the current state to SPUSTATE_SYSTEM.

When the processor idles, the routine swtch(), protected by a currently held lock, notes the time, attributes the interval to the previous thread, marks the new interval as starting at the noted time, and changes the current state to SPUSTATE_IDLE.
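The two transitions above share one pattern: note the time, attribute the elapsed interval, and start a new interval in a new state. The following sketch shows that pattern under stated assumptions. It is not the code of resume_cleanup() or swtch(): the sk_ names are hypothetical, the fields only mirror the timing-state table, locking is omitted, and the choice to attribute nothing while the processor is still in the booting state is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sk_cycles_t;             /* stand-in for cycles_t */

typedef enum {
    SK_SPUSTATE_NONE,
    SK_SPUSTATE_IDLE,
    SK_SPUSTATE_USER,
    SK_SPUSTATE_SYSTEM
} sk_spustate_t;

struct sk_thread_times {
    sk_cycles_t user;                     /* cycles charged as user time */
    sk_cycles_t system;                   /* cycles charged as system time */
};

struct sk_mpinfo {
    sk_spustate_t curstate;               /* current state of the processor */
    sk_cycles_t starttime;                /* start of the current interval */
    struct sk_thread_times *prevthreadp;  /* thread to charge, or NULL */
    sk_cycles_t idlecycles;               /* total idle cycles */
};

/* Close the current interval at time 'now', attribute it, and open a new
 * interval in 'newstate' that will be charged to 'next' (NULL when idling). */
void sk_start_interval(struct sk_mpinfo *mp, sk_cycles_t now,
                       sk_spustate_t newstate, struct sk_thread_times *next)
{
    assert(mp != NULL && now >= mp->starttime);
    sk_cycles_t elapsed = now - mp->starttime;

    if (mp->curstate == SK_SPUSTATE_NONE) {
        /* Assumption: time spent booting is not attributed to anything. */
    } else if (mp->prevthreadp == NULL) {
        mp->idlecycles += elapsed;        /* no previous thread: idle time */
    } else if (mp->curstate == SK_SPUSTATE_USER) {
        mp->prevthreadp->user += elapsed;
    } else {
        mp->prevthreadp->system += elapsed;
    }

    mp->starttime = now;                  /* new interval starts at 'now' */
    mp->curstate = newstate;
    mp->prevthreadp = next;
}
```

In terms of this sketch, a thread leaving resume() corresponds to a call with SK_SPUSTATE_SYSTEM and a pointer to that thread's counters, and a processor going idle corresponds to a call with SK_SPUSTATE_IDLE and a NULL thread pointer.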

Figure 1-39 A user process makes a system call.

[A user process makes a system call]

A user process running in user mode at (a) makes a system call at (b). It returns from the system call at (e) to run again in user mode. Between (b) and (e) it is running in system mode. Toward the beginning of syscall() at (c), a new system-mode interval starts, and the previous interval is attributed to the thread as user time. Toward the end of syscall() at (d), a new user-mode interval starts, and the previous interval is attributed to the thread as system time.
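The attribution in the system call case can be worked through with concrete cycle counts. The sketch below is illustrative only: sk_usage and sk_account_syscall are hypothetical names, and the timestamps are assumed to be cycle counts taken at points (a), (c), and (d) in the figure, with (a) taken as the start of the user-mode interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_usage {
    uint64_t user_cycles;     /* cycles attributed as user time */
    uint64_t system_cycles;   /* cycles attributed as system time */
};

/* a: start of the user-mode interval
 * c: point near the beginning of syscall() where the system interval starts
 * d: point near the end of syscall() where the next user interval starts */
void sk_account_syscall(struct sk_usage *u, uint64_t a, uint64_t c, uint64_t d)
{
    assert(u != NULL && a <= c && c <= d);
    u->user_cycles += c - a;     /* interval ending at (c): user time */
    u->system_cycles += d - c;   /* interval ending at (d): system time */
    /* The user-mode interval that starts at (d) is still open here. */
}
```

With a = 1000, c = 1300, and d = 1900, the thread is charged 300 cycles of user time and 600 cycles of system time. Because the interval boundaries are (c) and (d) rather than (b) and (e), the short stretches from (b) to (c) and from (d) to (e) fall into the adjacent user-mode intervals.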

For timing purposes, traps are handled identically, with the following exceptions:

  • (c) and (d) are located in trap(), not syscall(), and

  • whether (d) starts a user-mode or a system-mode interval depends on the state of the thread at the time of the trap.

Figure 1-40 An interrupt occurs

[An interrupt occurs]

Interrupts are handled much like traps, but any wakeup that occurs while on the interrupt stack (such as w1 and w2 in the figure above) starts a new interval, and its time is attributed to the thread being awakened rather than the previous thread. Conceptually, the work being done is on behalf of the thread being awakened instead of the previously running thread.

Interrupt time attributed to processes is stored in the kt_interrupttime field of the thread structure. Concurrent writes to this field are prevented because wakeup() is the only routine (other than allocproc()) that writes to the field, and it does so only under the protection of a spinlock. Reads are performed (by pstat() and others) without locking, by using timecopy() instead.