We have already learned a lot of details about the inner workings of the Linux scheduler, so there is not much left to cover. To complete the picture, in this chapter we will take a look at 2 important scheduler entry points:
- scheduler_tick() function, which is called at each timer interrupt.
- schedule() function, which is called each time when the current task needs to be rescheduled.
The third major thing that we are going to investigate in this chapter is the concept of a context switch. A context switch is the process that suspends the current task and runs another task instead. This process is highly architecture-specific and closely resembles what we have been doing when working with RPi OS.
The scheduler_tick function is important for 2 reasons:
- It provides a way for the scheduler to update time statistics and runtime information for the current task.
- The runtime information is then used to determine whether the current task needs to be preempted, and if so, the schedule() function is called.
Like most of the previously explored functions, scheduler_tick is too complex to be fully explained. Instead, as usual, I will just highlight the most important parts.
- The main work is done inside the CFS method task_tick_fair. This method calls entity_tick for the sched_entity corresponding to the current task. When looking at the source code, you may wonder why the for_each_sched_entity macro is used instead of just calling entity_tick for the current sched_entity. for_each_sched_entity doesn't iterate over all sched_entity structures in the system. Instead, it only traverses the sched_entity parent chain up to the root. This is useful when tasks are grouped: after the runtime information for a particular task is updated, the sched_entity corresponding to the whole group is updated as well.
- entity_tick does 2 main things:
  - Calls update_curr, which is responsible for updating the task's vruntime as well as the runqueue's min_vruntime. An important thing to remember here is that vruntime is always based on 2 things: how long the task has actually been executing and the task's priority.
  - Calls check_preempt_tick, which checks whether the current task needs to be preempted. Preemption happens in 2 cases:
    - If the current task has been running for too long (the comparison is made using normal time, not vruntime).
    - If there is a task with a smaller vruntime and the difference between the vruntime values is greater than some threshold.

    In both cases the current task is marked for preemption by calling the resched_curr function.
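To make the parent traversal and the vruntime update more concrete, here is a minimal user-space sketch in C. It is not kernel code: `toy_entity`, `toy_update_curr` and `toy_task_tick` are hypothetical names, the structure layout is invented for illustration, and the sketch assumes that vruntime grows by the elapsed real time scaled by the ratio of the nice-0 weight (1024) to the entity's weight. The min_vruntime bookkeeping is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024u  /* weight of a nice-0 task, used as the scaling base */

/* Simplified stand-in for the kernel's sched_entity (hypothetical layout). */
struct toy_entity {
    uint64_t vruntime;          /* weighted runtime, in nanoseconds */
    uint32_t weight;            /* larger weight means higher priority */
    struct toy_entity *parent;  /* group entity, or NULL at the root */
};

/* Advance vruntime by the elapsed real time scaled by NICE_0_WEIGHT / weight. */
static void toy_update_curr(struct toy_entity *se, uint64_t delta_exec_ns)
{
    assert(se->weight != 0);
    se->vruntime += delta_exec_ns * NICE_0_WEIGHT / se->weight;
}

/*
 * Walk from the task's entity up through its parents, in the spirit of
 * for_each_sched_entity. Returns the number of entities visited.
 */
static int toy_task_tick(struct toy_entity *se, uint64_t delta_exec_ns)
{
    int visited = 0;

    for (; se != NULL; se = se->parent) {
        toy_update_curr(se, delta_exec_ns);
        visited++;
    }
    return visited;
}
```

With a task of weight 2048 inside a group of weight 1024, a 4 ms tick advances the task's vruntime by 2 ms and the group's by 4 ms, while entities outside the parent chain are not touched at all.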
We have already seen in the previous chapter how calling resched_curr leads to the TIF_NEED_RESCHED flag being set for the current task and eventually to schedule being called.
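The two preemption cases and the "mark now, switch later" behavior can be sketched as follows. This is a simplified user-space model, not the kernel implementation: all `toy_` names are hypothetical, a plain boolean stands in for the TIF_NEED_RESCHED flag, and using the same value for the time slice and for the vruntime threshold is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping, loosely modeled on sched_entity fields. */
struct toy_task_stats {
    uint64_t vruntime;        /* weighted runtime, in nanoseconds */
    uint64_t slice_start_ns;  /* real time already consumed when this slice began */
    uint64_t total_exec_ns;   /* real time consumed so far */
    bool need_resched;        /* stands in for the TIF_NEED_RESCHED flag */
};

/* Stand-in for resched_curr: it only marks the task and never switches itself. */
static void toy_resched_curr(struct toy_task_stats *curr)
{
    curr->need_resched = true;
}

static void toy_check_preempt_tick(struct toy_task_stats *curr,
                                   const struct toy_task_stats *leftmost,
                                   uint64_t ideal_runtime_ns)
{
    assert(curr->total_exec_ns >= curr->slice_start_ns);
    uint64_t ran_ns = curr->total_exec_ns - curr->slice_start_ns;

    /* Case 1: real (not virtual) time spent in this slice exceeds its share. */
    if (ran_ns > ideal_runtime_ns) {
        toy_resched_curr(curr);
        return;
    }

    /* Case 2: another task lags behind in vruntime by more than the threshold. */
    if (curr->vruntime > leftmost->vruntime &&
        curr->vruntime - leftmost->vruntime > ideal_runtime_ns)
        toy_resched_curr(curr);
}
```

Note that neither branch switches tasks. The function only sets the flag, and the actual switch happens later, when the flag is checked and schedule is called.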
That's it about scheduler_tick. Now we are finally ready to take a look at the schedule function.
We have already seen so many examples where schedule is used, so now you are probably anxious to see how this function actually works. You will be surprised to know that the internals of this function are rather simple.
- The main work is done inside the __schedule function.
- __schedule calls pick_next_task, which redirects most of the work to the pick_next_task_fair method of the CFS scheduler.
- As you might expect, pick_next_task_fair in the normal case just selects the leftmost element from the red-black tree and returns it.
- __schedule calls context_switch, which does some preparation work and calls the architecture-specific __switch_to function, where low-level arch-specific task parameters are prepared for the switch.
- __switch_to first switches some additional task components, such as TLS (thread-local storage) and the saved floating-point and NEON registers.
- The actual switch takes place in the assembler cpu_switch_to function. This function should already be familiar to you because I copied it almost without any changes to the RPi OS. As you might remember, this function switches the callee-saved registers and the task stack. After it returns, the new task will be running using its own kernel stack.
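The overall flow of the steps above can be modeled in a few lines of C. This is only a sketch under several assumptions: the `toy_` names are hypothetical, a linear scan over a small array replaces the red-black tree (the task with the smallest vruntime plays the role of the leftmost node), and the context switch is reduced to updating a pointer, whereas the real one swaps registers and stacks in assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_task {
    int pid;
    uint64_t vruntime;
};

/* Hypothetical runqueue: a fixed array instead of a red-black tree. */
struct toy_rq {
    struct toy_task *tasks[8];  /* runnable tasks, including the current one */
    size_t nr_running;
    struct toy_task *curr;
    unsigned long nr_switches;
};

/* Stand-in for pick_next_task_fair: return the task with the smallest vruntime. */
static struct toy_task *toy_pick_next(const struct toy_rq *rq)
{
    struct toy_task *best = NULL;

    for (size_t i = 0; i < rq->nr_running; i++)
        if (best == NULL || rq->tasks[i]->vruntime < best->vruntime)
            best = rq->tasks[i];
    return best;
}

/* Stand-in for context_switch: here it only records the new current task. */
static void toy_context_switch(struct toy_rq *rq, struct toy_task *next)
{
    rq->curr = next;
    rq->nr_switches++;
}

/* Stand-in for __schedule: pick a task and switch only if it differs. */
static void toy_schedule(struct toy_rq *rq)
{
    struct toy_task *next = toy_pick_next(rq);

    assert(next != NULL);  /* this model requires at least one runnable task */
    if (next != rq->curr)
        toy_context_switch(rq, next);
}
```

Calling `toy_schedule` on a runqueue whose current task does not have the smallest vruntime switches to the task that does. Calling it again right away changes nothing, because the picked task is already running.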
Now we are done with the Linux scheduler. The good thing is that it turns out to be not so difficult if you focus only on the very basic workflow. After you understand the basic workflow, you might want to make another pass through the scheduler code and pay more attention to the details, because there are so many of them. But for now, we are happy with our current understanding and ready to move on to the following lesson, which describes user processes and system calls.