Context Switching

In a thread-based kernel, the kernel manages context switches between kernel threads, rather than processes. Context switching occurs when the kernel switches from executing one thread to executing another. The kernel saves the context of the currently running thread and resumes the context of the next thread that is scheduled to run. When the kernel preempts a thread, its context is saved. Once the preempted thread is scheduled to run again, its context is restored and it continues as if it had never stopped.

The kernel allows a context switch to occur under the following circumstances:

Thread exits.
Thread's time slice expires and a trap is generated.
Thread puts itself to sleep while awaiting a resource.
Thread puts itself into a debug or stop state.
Thread returns to user mode from a system call or trap.
A higher-priority thread becomes ready to run.

If a kernel thread has a higher priority than the running thread, it can preempt the currently running thread. This occurs when the higher-priority thread is awakened because a resource it requested has become available. Only user threads can be preempted. HP-UX does not allow preemption in the kernel except when a kernel thread is returning to user mode.

In the case where a single process can schedule multiple kernel threads (1 x 1 and M x N), the kernel will preempt the running thread when it is executing in user space, but not when it is executing in kernel space (for example, during a system call).
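The preemption rule described above can be expressed as a small predicate. This is only an illustrative sketch: the structure, its field names, and the convention that a lower number means a higher priority are assumptions made for the example, not the actual HP-UX thread structure.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a thread (not the HP-UX layout). */
struct thread_view {
    int  priority;           /* assumption: lower value = higher priority */
    bool in_kernel_mode;     /* true while in a system call or trap */
    bool returning_to_user;  /* true on the way back to user mode */
};

/* Returns true if `ready` may preempt `running` under the rule above. */
bool may_preempt(const struct thread_view *running,
                 const struct thread_view *ready)
{
    /* The ready thread must have strictly higher priority. */
    if (ready->priority >= running->priority)
        return false;

    /* No preemption inside the kernel, except on return to user mode. */
    if (running->in_kernel_mode && !running->returning_to_user)
        return false;

    return true;
}
```

With this predicate, a higher-priority thread that wakes up while the running thread is inside a system call has to wait until that thread is about to return to user mode.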

The `swtch()` Routine

The swtch() routine finds the most deserving runnable thread, takes it off the run queue, and starts running it. The following figure shows the routines used by swtch().

Figure 1-38 process scheduling -- swtch()

Table 1-26 swtch() routines

Routine	Purpose
`swidle()` (`asm_utl.c`)	Performs an idle loop while waiting to take action. Checks for a valid `kt_link`. On a uniprocessor machine without a threadlock thread, goes to `spl7`. Finds the thread's spu. Decrements the count of threads on run queues. Updates `ndeactivated`, `nready_free`, and `nready_locked` in the `mpinfo` structure. Removes the thread from its run queue. Restores the old `spl` level. Updates `RTSCHED` counts.
`save()` (`resume.s`)	Called to save state. Saves the thread's process control block (`pcb`) marker.
`find_thread_my_spu()` (`pm_policy.c`)	Finds the most deserving thread to run on the current CPU and removes the old one. The search starts at `bestq`, an index into the table of run queues. When a thread is found, sets it up to run. Marks the interval timer in the spu's `mpinfo`. Sets the processor state to `MPSYS`. Removes the thread from its run queue. Verifies that it is runnable (`kt_stat == TSRUN`). Sets the EIRR to `MPSCHED_INT_ENABLE`. Sets the thread context bit to `TSRUNPROC` to indicate the thread is running.
`resume()` (`resume.s`)	Restores the register context from `pcb` and transfers control to enable the thread to resume execution.
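The selection step that the table attributes to find_thread_my_spu() can be pictured as a scan over an array of run queues, starting at bestq. The sketch below is a simplified illustration, not the kernel's code: the number of queues, the structure layouts, the `running` flag, and the assumption that a lower queue index holds more deserving threads are all invented for the example.

```c
#include <stddef.h>

#define NQS 8  /* hypothetical number of run queues */

/* Hypothetical thread record; `link` plays the role of kt_link. */
struct thread {
    struct thread *link;  /* next thread on the same run queue */
    int running;          /* set when selected, standing in for TSRUNPROC */
};

/* Hypothetical run queue table. */
struct runqs {
    struct thread *head[NQS];  /* assumption: lower index = more deserving */
    int bestq;                 /* first queue index worth searching */
    int nready;                /* count of threads on run queues */
};

/* Removes and returns the most deserving thread, or NULL if none is queued. */
struct thread *pick_next(struct runqs *rq)
{
    for (int q = rq->bestq; q < NQS; q++) {
        struct thread *t = rq->head[q];
        if (t == NULL)
            continue;            /* empty queue, keep searching */

        rq->head[q] = t->link;   /* take the thread off its run queue */
        t->link = NULL;
        rq->nready--;            /* one fewer thread on the run queues */
        rq->bestq = q;           /* remember where the search succeeded */
        t->running = 1;          /* mark the thread as running */
        return t;
    }
    return NULL;                 /* nothing runnable: the caller would idle */
}
```

A NULL return corresponds to the case in which the processor has nothing to run and falls into its idle loop.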

Process and Processor Interval Timing

Timing intervals are used to measure user, system, and interrupt times for threads and idle time for processors. These measurements are taken and recorded in machine cycles for maximum precision and accountability. The algorithm for interval timing is described in pm_cycles.h.

Each processor maintains its own timing state in fields defined in struct mpinfo, found in mp.h.

Table 1-27 Processor timing states

Timing state	Purpose
`curstate`	The current state of the processor (`spustate_t`)
`starttime`	Start time (`CR16`) of the current interval
`prevthreadp`	Thread to which the current interval is attributed
`idlecycles`	Total cycles the SPU has spent idling since boot (`cycles_t`)

Processor states are shown in the next table.

Table 1-28 Processor states

SPU state	Meaning
`SPUSTATE_NONE`	Processor is booting and has not yet entered another state
`SPUSTATE_IDLE`	Processor is idle
`SPUSTATE_USER`	Processor is in user mode
`SPUSTATE_SYSTEM`	Processor is in `syscall()` or trap

Time spent processing interrupts is attributed to the running process as user or system time, depending on the state of the process when the interrupt occurred. Each time the kernel calls wakeup() while on the interrupt stack, a new interval starts and the time of the previous interval is attributed to the running process. If the processor is idle, the interrupt time is added to the processor's idle time.

State Transitions

A thread leaves resume() after a switch from either another thread or the idle loop. Protected by a lock, the routine resume_cleanup() notes the time, attributes the interval to the previous thread if there was one or to the processor's idle time if not, marks the new interval's start time, and changes the current state to SPUSTATE_SYSTEM.

When the processor idles, the routine swtch(), protected by a currently held lock, notes the time, attributes the interval to the previous thread, marks the new interval as starting at the noted time, and changes the current state to SPUSTATE_IDLE.
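The two transitions just described share one pattern: note the time, charge the elapsed cycles to whoever owned the interval, then open a new interval. The sketch below illustrates that pattern with the field names from Table 1-27. Everything else is an assumption for the example: the `thread_acct` type, the function names `on_resume()` and `on_idle()`, the use of a NULL `prevthreadp` to mean "no previous thread", and the omission of locking.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cycles_t;

typedef enum {
    SPUSTATE_NONE, SPUSTATE_IDLE, SPUSTATE_USER, SPUSTATE_SYSTEM
} spustate_t;

/* Hypothetical per-thread accumulator (not the real thread structure). */
struct thread_acct {
    cycles_t cycles;  /* total cycles attributed to this thread */
};

/* Per-processor timing state, using the field names from Table 1-27. */
struct spu_timing {
    spustate_t curstate;
    cycles_t starttime;               /* start of the current interval */
    struct thread_acct *prevthreadp;  /* NULL here means the SPU was idle */
    cycles_t idlecycles;
};

/* Charge the interval ending at `now`, then start a new one at `now`. */
static void close_interval(struct spu_timing *s, cycles_t now)
{
    cycles_t elapsed = now - s->starttime;

    if (s->prevthreadp != NULL)
        s->prevthreadp->cycles += elapsed;  /* previous thread's time */
    else
        s->idlecycles += elapsed;           /* processor idle time */
    s->starttime = now;
}

/* A thread leaves resume(): the new interval belongs to `next`. */
void on_resume(struct spu_timing *s, struct thread_acct *next, cycles_t now)
{
    close_interval(s, now);
    s->prevthreadp = next;
    s->curstate = SPUSTATE_SYSTEM;
}

/* The processor goes idle: the new interval belongs to no thread. */
void on_idle(struct spu_timing *s, cycles_t now)
{
    close_interval(s, now);
    s->prevthreadp = NULL;
    s->curstate = SPUSTATE_IDLE;
}
```

In the kernel these updates happen under a lock, as the text notes; the sketch leaves that out to keep the accounting visible.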

Figure 1-39 A user process makes a system call.

A user process running in user mode at (a) makes a system call at (b). It returns from the system call at (e) to run again in user mode. Between (b) and (e) it is running in system mode. Toward the beginning of syscall() at (c), a new system-mode interval starts. The previous interval is attributed to the thread as user time. Toward the end of syscall() at (d), a new user-mode interval starts. The previous interval is attributed to the thread as system time.

For timing purposes, traps are handled identically, with the following exceptions:

(c) and (d) are located in trap(), not syscall(), and
whether or not (d) starts a user- or system-mode interval depends on the state of the thread at the time of the trap.
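The bookkeeping at points (c) and (d) can be illustrated with a short sketch that splits a thread's cycles into user and system time. The structure, its field names, and the function start_interval() are hypothetical and chosen only for this example; the cycle values in the usage note are made up.

```c
#include <stdint.h>

typedef uint64_t cycles_t;

enum run_mode { MODE_USER, MODE_SYSTEM };

/* Hypothetical per-thread timing record (not the HP-UX thread structure). */
struct thread_times {
    cycles_t usercycles;  /* cycles attributed as user time */
    cycles_t syscycles;   /* cycles attributed as system time */
    cycles_t starttime;   /* start of the current interval */
    enum run_mode mode;   /* mode of the current interval */
};

/*
 * Ends the current interval at `now`, attributes it according to the
 * mode it ran in, and starts a new interval in `newmode`.
 */
void start_interval(struct thread_times *t, enum run_mode newmode,
                    cycles_t now)
{
    cycles_t elapsed = now - t->starttime;

    if (t->mode == MODE_USER)
        t->usercycles += elapsed;
    else
        t->syscycles += elapsed;

    t->mode = newmode;
    t->starttime = now;
}
```

For example, if the user-mode interval began at cycle 1000, calling start_interval(&t, MODE_SYSTEM, 1300) at (c) attributes 300 cycles as user time, and calling start_interval(&t, MODE_USER, 1800) at (d) attributes 500 cycles as system time. For a trap, the mode passed at (d) would depend on the state of the thread when the trap occurred.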

Figure 1-40 An interrupt occurs

Interrupts are handled much like traps, but any wakeup that occurs while on the interrupt stack (such as w1 and w2 in the figure above) starts a new interval and its time is attributed to the thread being awakened rather than the previous thread. Conceptually, the work being done is on behalf of the thread being awakened instead of the previously running thread.

Interrupt time attributed to processes is stored in the kt_interrupttime field of the thread structure. Concurrent writes to this field are prevented because wakeup() is the only routine (other than allocproc()) that writes to the field, and it does so only under the protection of a spinlock. Reads are performed (by pstat() and others) without locking, by using timecopy() instead.
