Commit 18cb13a6 authored by Ingo Molnar

sched_yield() is misbehaving.

  the current implementation does the following to 'give up' the CPU:

   - it decreases the task's priority by 1 until it reaches the lowest level
   - it requeues the task at the end of its priority queue

  this scheme works fine in most cases, but if sched_yield()-active tasks
  are mixed with CPU-using processes then it's quite likely that the
  CPU-using process is in the expired array. In that case the yield()-ing
  process only requeues itself in the active array - a true context-switch
  to the expired process will only occur once the timeslice of the
  yield()-ing process has expired: in ~150 msecs. As a result, the
  yield()-ing and the CPU-using process use up roughly the same amount of
  CPU time, which is arguably deficient.
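
  to see the misbehaviour in practice, something like the following
  userspace sketch should do, assuming both threads are pinned to the
  same CPU (e.g. run under "taskset -c 0"). The thread names, the
  5-second run time and the CLOCK_THREAD_CPUTIME_ID measurement are
  illustrative choices, not part of this commit. On an affected kernel
  both threads report roughly equal CPU time; with the fix the yielder
  should get almost none:

  #include <pthread.h>
  #include <sched.h>
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  static volatile int stop;

  /* print how much CPU time the calling thread consumed */
  static void report(const char *who)
  {
          struct timespec ts;

          clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
          printf("%s: %ld.%03ld s of CPU time\n", who,
                 (long)ts.tv_sec, ts.tv_nsec / 1000000);
  }

  /* yields in a loop - should get almost no CPU time once fixed */
  static void *yielder(void *arg)
  {
          while (!stop)
                  sched_yield();
          report("yielder");
          return NULL;
  }

  /* plain CPU-using loop */
  static void *burner(void *arg)
  {
          while (!stop)
                  ;
          report("burner");
          return NULL;
  }

  int main(void)
  {
          pthread_t y, b;

          pthread_create(&y, NULL, yielder, NULL);
          pthread_create(&b, NULL, burner, NULL);
          sleep(5);
          stop = 1;
          pthread_join(y, NULL);
          pthread_join(b, NULL);
          return 0;
  }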

  i've fixed this problem by extending sched_yield() in the following way:

  +        * There are three levels of how a yielding task will give up
  +        * the current CPU:
  +        *
  +        *  #1 - it decreases its priority by one. This priority loss is
  +        *       temporary, it's recovered once the current timeslice
  +        *       expires.
  +        *
  +        *  #2 - once it has reached the lowest priority level,
  +        *       it will give up timeslices one by one. (We do not
  +        *       want to give them up all at once, it's gradual,
  +        *       to protect the casual yield()er.)
  +        *
  +        *  #3 - once all timeslices are gone we put the process into
  +        *       the expired array.
  +        *
  +        *  (special rule: RT tasks do not lose any priority, they just
  +        *  roundrobin on their current priority level.)
  +        */
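
  in simplified form, the new policy is (a toy model for illustration,
  not the kernel code - MAX_PRIO and the field names mirror the patch,
  while the runqueue arrays are reduced to a flag; note that a
  numerically higher ->prio means a lower priority):

  #include <stdbool.h>

  #define MAX_PRIO 140

  struct toy_task {
          int prio;               /* dynamic priority, MAX_PRIO-1 is lowest */
          int time_slice;         /* remaining ticks */
          bool rt;                /* real-time task? */
          bool expired;           /* moved to the expired array? */
  };

  static void toy_yield(struct toy_task *t)
  {
          if (t->rt)
                  return;                 /* special rule: RT tasks only roundrobin */

          if (t->prio < MAX_PRIO - 1)
                  t->prio++;              /* #1: temporary one-step priority loss */
          else if (t->time_slice > 1)
                  t->time_slice--;        /* #2: give up timeslices one by one */
          else
                  t->expired = true;      /* #3: out of timeslices -> expired array */
  }
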
parent 6986d71d
@@ -1398,25 +1398,35 @@ asmlinkage int sys_sched_getaffinity(pid_t pid, unsigned int len,
 asmlinkage long sys_sched_yield(void)
 {
-        runqueue_t *rq;
-        prio_array_t *array;
-
-        rq = rq_lock(rq);
+        runqueue_t *rq = rq_lock(rq);
+        prio_array_t *array = current->array;
 
         /*
-         * Decrease the yielding task's priority by one, to avoid
-         * livelocks. This priority loss is temporary, it's recovered
-         * once the current timeslice expires.
+         * There are three levels of how a yielding task will give up
+         * the current CPU:
          *
-         * If priority is already MAX_PRIO-1 then we still
-         * roundrobin the task within the runlist.
-         */
-        array = current->array;
-        /*
-         * If the task has reached maximum priority (or is a RT task)
-         * then just requeue the task to the end of the runqueue:
+         *  #1 - it decreases its priority by one. This priority loss is
+         *       temporary, it's recovered once the current timeslice
+         *       expires.
+         *
+         *  #2 - once it has reached the lowest priority level,
+         *       it will give up timeslices one by one. (We do not
+         *       want to give them up all at once, it's gradual,
+         *       to protect the casual yield()er.)
+         *
+         *  #3 - once all timeslices are gone we put the process into
+         *       the expired array.
+         *
+         *  (special rule: RT tasks do not lose any priority, they just
+         *  roundrobin on their current priority level.)
          */
-        if (likely(current->prio == MAX_PRIO-1 || rt_task(current))) {
+        if (likely(current->prio == MAX_PRIO-1)) {
+                if (current->time_slice <= 1) {
+                        dequeue_task(current, rq->active);
+                        enqueue_task(current, rq->expired);
+                } else
+                        current->time_slice--;
+        } else if (unlikely(rt_task(current))) {
                 list_del(&current->run_list);
                 list_add_tail(&current->run_list, array->queue + current->prio);
         } else {
...