HPlogo HP-UX Process Management: White Paper > Chapter 1 Process Management

Run Queues

» 

Technical documentation

Complete book in PDF

 » Table of Contents

A process must be on a queue of runnable processes before the scheduler can choose it to run.

Figure 1-27 Run queues of thread lists waiting to run

[Run queues of thread lists waiting to run]

Processes get linked into the run queue based on the process's priority, set in the process table. Run queues are link-listed in decreasing priority. The scheduler chooses the process with the highest priority to run for a given time-slice.

Each process is represented by its header on the list of run queue headers; each entry in the list of run queue headers points to the process table entry for its respective process.

The kernel maintains separate queues for system-mode and user-mode execution. System-mode execution takes precedence for CPU time. User-mode priorities can be preempted -- stopped and swapped out to secondary storage; kernel-mode execution cannot. Processes run until they have to wait for a resource (such as data from a disk, for example), until the kernel preempts them when their run time exceeds a time-slice limit, until an interrupt occurs, or until they exit. The scheduler then chooses a new eligible highest-priority process to run; eventually, the original process will run again when it has the highest priority of any runnable process.

When a timeshare process is not running, the kernel improves the process's priority (lowers its number). When a process is running, its priority worsens. The kernel does not alter priorities on real-time processes. Timeshared processes (both system and user) lose priority as they execute and regain priority when they do not execute

Figure 1-28 Run queue, showing priorities

[Run queue, showing priorities]

Run Queue Initialization

Run queues are initialized by the routine rqinit(), which is called from init_main.c after system monarch processor is established and before final kernel initialization.

rqinit examines all potential entries in the system global per-processor information structure (struct mpinfo), gets the run queue information and pointers to the linked list of running threads. It then clears the run queue data in bestq (an index into the array of run queue points which points to the highest priority non-empty queue), newavg_on_rq (the run queue average for the processor), nready_locked and nready_free (sums provided the total threads in the processor's run queues). rqinit then sets the current itimer value for all run queues, links the queue header as the sole element, and sets up the queue.

Next, the RTSCHED-related global run data structures are initialized with the global structure rtsched_info (defined in pm_rtsched.h), which describes the RTSCHED run queues.

Table 1-23 Entries in rtsched_info

EntryPurpose
rts_nreadyTotal number of threads on queues
rts_bestqHint of which queue to find threads
rts_numpriNumber of RTSCHED priorities
rts_rr_timesliceGlobal timeslice for SCHED_RR threads
*rts_timeslicepRound-robin timeslices for each priority (used by SCHED_RR2 threads)
*rts_qpPointer to run queues
*rts_lockSpinlock for the run queues

 

The tunable parameter rtsched_numpri determines how many run queues exist:

  • The minimum value allowed is 32, imposed by the POSIX.4 specification and defined as RTSCHED_NUMPRI_FLOOR.

  • The maximum supported value of 512 is a constant of the implementation, defined as RTSCHED_NUMPRI_CEILING and set equal to MAX_RTSCHED_PRI. If a higher maximum is required, the latter definition must be changed.

malloc() is called to allocate space for RTSCHED run queues. (rtsched_numpri * sizeof (struct mp_threadhd)) bytes are required. The resulting pointer is stored in rtsched_info.rts_qp.

Timeslice is checked to ensure that it is set to a valid value, which may be either -l (meaning no timeslicing) or positive integers. If it is invalid, it is set to the default, HZ/10. rtsched_info.rts_rr_timeslice is set to timeslice, which round-robins with that many clock ticks. For each of the rtsched_numpri run queues, the struct mp_threadhd header block is linked circularly to itself. Finally, a spinlock is allocated to lock the run queue.

Figure 1-29 Run queue initialization

[Run queue initialization]

Note, there is one RTSCHED run queue systemwide, though separate track is kept for each processor. The queue for given thread is based on how the scheduling policy is defined. One global set of run queues is maintained for RTSCHED (SCHED_FIFO, SCHED_RR, SCHED_RR2) threads. Run queues are maintained for each SPU for SCHED_TIMESHARE and SCHED_RTPRIO threads.

RTSCHED Run Queue

The following figure shows threads set to run at various RTSCHED priorities.

The global RTSCHED run queues are searched for the strongest (most deserving) thread to run; the best candidate is returned as a kthread_t. Each priority has one thread list. Any runnable thread may be in any thread list. Multiple scheduling policies are provided. Each nonempty list is ordered, and contains a head (th_link) at one end of its order and a tail (th_rlink) at the other.

  • rtsched_info.rts_qp points to the strongest RTSCHED queue.

  • rtsched_info.rts_bestq points to the queue to begin the search.

The search (by the routine find_thread_rtsched()) proceeds from rts_bestq downwards looking for non-empty run queues. When the first non-empty queue is found, its index is noted in the local first_busyq. All threads in that queue are checked to determine if they are truly runnable or blocked on a semaphore.

  • If there is a runnable thread, the rts_bestq value is updated to the present queue and a pointer to the thread found is returned to the caller.

  • If no truly runnable thread is found, threads blocked on semaphores are considered.

If first_busyq is set, the rts_bestq value is updated to it and the thread at the head of that queue is returned to the caller. If first_busyq did not get set in the loop, the routine panics, because it should be called only if rtsched_info.rts_nready is non-zero.

Figure 1-30 RTSCHED run queue detail

[RTSCHED run queue detail]

Although the threads scheduler is set to a default value of 32 (RTSCHED_NUMPRI_FLOOR), it can be expanded to a system limit of PRTSCHEDBASE (a value of 0).

Figure 1-31 Threads scheduler

[Threads scheduler]

The Combined SCHED_RTPRIO and SCHED_TIMESHARE Run Queue

As shown in the following figure, the SCHED_RTPRIO and SCHED_TIMESHARE priorities use the same queue.

The SCHED_RTPRIO and SCHED_TIMESHARE queue is searched with the same technique as the RTSCHED queue. The most deserving thread is found to run on the current processor. The search starts at bestq, which is an index into the table of run queues. There is one thread list for each priority. Any runable thread may be in any thread list. Multiple scheduling policies are provided. Each nonempty list is ordered, and contains a head (th_link) as one end of its order and a tail (th_rlink) as the other.

Figure 1-32 SCHED_RTPRIO and SCHED_TIMESHARE queue

[SCHED_RTPRIO and SCHED_TIMESHARE queue]

The mp_rq structure constructs the run queues by linking threads together. The structure qs is an array of pointer pairs that act as a doubly linked list of threads. Each entry in qs[] represents a different priority queue. sized by NQS, which is 160. The qs[].th_link pointer points to the first thread in the queue and the qs[].th_rlink pointer points to the tail.

RTPRIO Run Queue

Figure 1-33 SCHED_RTPRIO (HP-UX REAL TIME) run queue

[SCHED_RTPRIO (HP-UX REAL TIME) run queue]

Priorities 0 (highest realtime priority) through 127 (least realtime priority) are reserved for real time threads. The real time priority thread will run until it sleeps, exits, or is preempted by a higher priority real time thread. Equal priority threads will be run in a round robin fashion.

The rtprio(1) command may be used to give a thread a real time priority. To use the rtprio(1) command a user must belong in the PRIV_RTPRIO privilege group or be superuser (root). The priorities of real time threads are never modified by the system unless explicitly requested by a user (via a command or system call). Also a real time thread will always run before a time share thread.

The following are a few key points regarding a real-time thread:

  • Priorities are not adjusted by the kernel

  • Priorities may be adjusted by a system call

  • Real-time priority is set in kt_pri

  • The p_nice value has no effect

SCHED_TIMESHARE Run Queue

Figure 1-34 SCHED_TIMESHARE run queue

[SCHED_TIMESHARE run queue]

Timeshare threads are grouped into system priorities (128 through 177) and user priorities (178 through 255). The queues are four priorities wide. The system picks the highest priority timeshare thread, and lets it run for a specific period of time (timeslice). As the thread is running its priority decreases. At the end of the time slice, a new highest priority is chosen.

Waiting threads gain priority and running threads lose priority in order to favor threads that perform I/O and give lesser attention to compute-bound threads.

SCHED_TIMESHARE priorities are grouped as follows:

  • Real-time priority thread: range 0-127

  • Time-share priority thread: range 128-255

  • System-level priority thread: range 128-177

  • User-level priority thread: range 178-255

RTSCHED priority queues are one priority wide; timeshare priority queues are four priorities wide.