
Week 11

Linux Internals

Processes, scheduling

Lecture organization


• Kernel Structure
• Process structure
• Process creation
• Signals
• Process management, scheduling
• Linux for embedded systems

References

Linux Internals (to the power of -1), Simone Demblon, Sebastian Spitzner
http://www.tutorialized.com/view/tutorial/Linux-kernel-internals-from-Process-Birth-to-Death/40955

Linux Knowledge Base and Tutorial
http://linux-tutorial.info/modules.php?name=MContent&pageid=224

Inside the Linux scheduler, IBM developerWorks
http://www.ibm.com/developerworks/linux/library/l-scheduler/




The Linux Kernel

A Unix kernel fulfills four main management tasks:

• Memory management
• Process management
• File system management
• I/O management

For our study of real-time implications, we will examine process management.

A rather brief look at the kernel








Part 1

Process structure

Process data structure

A process is represented by a rather large structure called task_struct.


It contains all of the necessary data to represent the process, along with data for accounting and for maintaining relationships with other processes (parents and children).



The actual structure of task_struct is many pages long.

A short sample of task_struct is shown on the next slide.


task_struct

• Pointers to open files
• Memory map
• Signals: received, masked
• Register contents
• Everything defining the state of the computation





task_struct detail

The sample contains the state of execution, a stack, a set of flags, the parent process, the thread of execution (of which there can be many), and open files.



The state variable holds the state of the task.

Typical states:

• running, or on a run queue about to run (TASK_RUNNING),
• sleeping (TASK_INTERRUPTIBLE),
• sleeping but unable to be woken up (TASK_UNINTERRUPTIBLE),
• stopped (TASK_STOPPED), or a few others.



Flags: the process is being created (PF_STARTING) or exiting (PF_EXITING), or is currently allocating memory (PF_MEMALLOC).


The comm (command) field: the name of the executable.

Priority (called static_prio): the actual priority is determined dynamically, based on loading and other factors.

More task_struct fields

The tasks field is a linked list.


The mm and active_mm fields describe the process's address space: mm represents the process's memory descriptors, while active_mm holds the previous process's memory descriptors.



The thread_struct identifies the stored state of the process: the CPU state (hardware registers, program counter, etc.).

Part 2

Process creation

System call
functions

User-space tasks and kernel tasks both rely on a kernel function called do_fork to create the new process.

In the case of creating a kernel thread, the kernel calls a function called kernel_thread.

In user space, a program calls fork(), which results in a system call to the kernel function sys_fork.

The function relationships are shown graphically on the next slide.


Function
hierarchy for
process creation





do_fork



The do_fork function begins with a call to alloc_pidmap, which allocates a new PID.

do_fork then calls copy_process, passing the flags, stack, registers, parent process, and newly allocated PID.

The copy_process function is where the new process is created as a copy of the parent. It performs all actions except starting the process.

Copying the process

Next, dup_task_struct allocates a new task_struct and copies the current process's descriptors into it.

After a new thread stack is set up, control returns to copy_process. A sequence of copy functions is then invoked to copy open file descriptors, signal information, process memory, and finally the thread.

The new task is then assigned to a processor. The new process inherits the priority of the parent, and control returns to do_fork. At this point, your new process exists but is not yet running.

The do_fork function fixes this with a call to wake_up_new_task, which places the new process on a run queue and then wakes it up for execution.

Finally, upon returning to do_fork, the PID value is returned to the caller and process creation is complete.


Part 3

Process management

Process Management

A program is an executable file, whereas a process is a running program.

A process is an instance of a program in memory, executing instructions on the processor.

The only way to run a program on a Unix/Linux system is to request the kernel to execute it via an exec() system call.

Remember that the only things that can make system calls are processes (binary programs that are executing).

Scheduler

The scheduler manages processes and the fair distribution of CPU time between them.

The kernel classifies processes as being in one of two possible queues at any given time: the sleep queue and the run queue.

Run Queue

Processes in the run queue compete for access to the processor(s). It is the scheduler's job to allocate a time slice to each process and to let each process run on the processor for a certain amount of time in turn.

Each time slice is so short (fractions of a second) that each process in the run queue gets to run many times every second. It appears as though all of these processes are "running at the same time"; this is called round-robin scheduling.


On a uniprocessor system, only one process can execute instructions at any one time.

Only on a multiprocessor system can true multiprocessing occur, with more than one process (as many as there are CPUs) executing instructions simultaneously.



Classes
of scheduling

There are different classes of scheduling besides round-robin.

An example would be real-time scheduling.

Different Unix systems have different scheduling classes and features, and Linux is no exception.

Sleep queue

Processes waiting for a resource wait on the sleep queue. These processes do not take up a slot on the run queue.

Once the resource becomes available, it is reserved by the process, which is then moved back onto the run queue to wait for a turn on the processor.

Every process waiting for a resource is placed on the sleep queue, including new processes, whose resources still have to be allocated, even if they are readily available.

Queue management







Time-slicing

Every 10 milliseconds (this may change with the HZ value) an interrupt arrives on IRQ0, which helps us in a multitasking environment.

The interrupt signal to the CPU comes from the PIC (Programmable Interrupt Controller).

With each time slice, the current process's execution is interrupted (without task switching), the processor does housekeeping, and then the current process is re-activated, unless a higher-priority process is waiting.

When does switching occur?

Some examples are:

• When a time slice ends, the scheduler gives access to another process.

• If a process needs a resource, it has to go back into the sleep queue to wait for, or be given access to, that resource; only then is it ready to be scheduled onto the processor again.

• If a process is waiting for information from another process in the form of piped information, the other process has to run before this process can continue, so the other process is given priority for the processor.

Process destruction

Process destruction can be driven by several events:

• normal process termination,
• a signal, or
• a call to the exit function.

However process exit is driven, the process ends through a call to the kernel function do_exit.

This process is shown graphically on the next slide.


Function hierarchy for process destruction

do_exit

do_exit removes all references to the current process from the operating system.

The cycle of detaching the process from the various resources that it attained during its life is performed through a series of calls, from exit_mm (to remove memory pages) to exit_keys (which disposes of per-thread session and process security keys).

The do_exit function performs various accountings for the disposal of the process, followed by a series of notifications.

Finally, the process state is changed to PF_DEAD, and the schedule function is called to select a new process to execute.

Time slice

When a process's time slice has run out, or when for some other reason another process gets to run, the process is suspended (placed on the sleep or run queue).

Its state is stored in task_struct, so that when it gets a turn again, the process can return to the exact place where it left off.

System Processes

In addition to user processes, there are system processes running.

Some deal with managing memory and scheduling turns on the CPU; others deal with delivering mail, printing, and so on.

In principle, both of these kinds of processes are identical. However, system processes can run at much higher priorities and therefore run more often than user processes.

Typically a system process of this kind is referred to as a daemon process, because it runs without user intervention.

Filesystems

Under Linux, files and directories are grouped into units called filesystems. Filesystems exist within a section of the hard disk called a partition.

Each hard disk can be broken down into multiple partitions, and a filesystem is created within the partition. (Some dialects of UNIX allow multiple filesystems within a partition.)


The Life Cycle of Processes

http://linux-tutorial.info/modules.php?name=MContent&pageid=84

From the time a process is created with a fork() until it has completed its job and disappears from the process table, it goes through many different states. The state a process is in changes many times during its "life." These changes can occur, for example, when the process makes a system call, when it is someone else's turn to run, when an interrupt occurs, or when the process asks for a resource that is currently not available.


A commonly used model shows processes operating in one of six separate states:

1. executing in user mode
2. executing in kernel mode
3. ready to run
4. sleeping
5. newly created, not ready to run, and not sleeping
6. issued exit system call (zombie)


Device Files


In Linux, almost everything is either treated as a file or is only accessed through files.



For
example, to read the contents of a data file, the operating system must
access the hard disk. Linux treats the hard disk as if it were a file. It opens
it like a file, reads it like a file, and closes it like a file. The same applies to
other hardware such as tape drives and printers. Even memory is treated
as a file. The files used
to
access the physical hardware are
device files.



When Linux wants to access any hardware device, it first opens a file that "points" toward that device (the device node). Based on information it finds in the inode, the operating system determines what kind of device it is and can therefore access it in the proper manner. This includes opening, reading, and closing, just like any other file.


Process states

State Diagram

State transitions

A newly created process enters the system in state 5.

If an exec() is made, then this process will end up in kernel mode (2).

When a process is running, an interrupt may be generated (more often than not, this is the system clock) and the currently running process is pre-empted (3).

This is the same as state 3 because the process is still ready to run and in main memory.

More Transitions

When the process makes a system call while in user mode (1), it moves
into state 2 where it begins to run in kernel mode.


Assume at this point that the system call made was to read a file on the hard disk. Because the read is not carried out immediately, the process goes to sleep, waiting on the event that the system has read the disk and the data is ready. It is now in state 4.


When
the data is ready, the process is awakened. This does not mean
it runs immediately, but rather it is once again ready to run in main
memory (3).


If a process that was asleep is awakened (perhaps when the data is
ready), it moves from state 4 (sleeping) to state 3 (ready to run). This
can be in either user mode (1) or kernel mode (2).


End

A process can end its life by either explicitly calling the exit() system call or having it called for it.

The exit() system call releases all the data structures that the process was using.

One exception is the slot in the process table, which is the responsibility of the init process. The slot in the process table is used for the exit code of the exiting process. This can be used by the parent process to determine whether the process did what it was supposed to do or whether it ran into problems.

The process shows that it has terminated by putting itself into state 6, and it becomes a "zombie." Once here, it can never run again because nothing exists other than the entry in the process table.


This is why you cannot "kill" a zombie process. There is nothing there to kill.
To kill a process, you need to send it a signal (more on signals later). Because
there is nothing there to receive or process that signal, trying to kill it makes
little sense. The only thing to do is to let the system clean it up.


Access to a resource


Processes waiting for a common resource all wait on the same wait channel. All are woken up when the resource is ready. Only one process gets the resource; the others put themselves back to sleep on the same wait channel.

When a process puts itself to sleep, it voluntarily gives up the CPU. It may be that this process had just started its turn when it couldn't access the resource it needed.

Signals

Signals are a way of sending simple messages to processes.

Most of these messages are already defined and can be found in <linux/signal.h>. Most are generated in the kernel.

Signals can only be processed when the process is in user mode. If a signal has been sent to a process that is in kernel mode, it is dealt with immediately on returning to user mode.


Signals - 2

Signals are used to signal asynchronous events to one or more processes. A signal could be generated by a keyboard interrupt or an error condition, such as the process attempting to access a non-existent location in its virtual memory.


Signals
are also used by the shells to signal job control
commands to their child processes.


There are a set of defined signals that the kernel can generate
or that can be generated by other processes in the system,
provided that they have the correct privileges.

Signals - 3

Linux implements signals using information stored in the task_struct for the process. The number of supported signals is limited to the word size of the processor.


The
currently pending signals are kept in the signal field with a mask of blocked signals
held in blocked. With the exception of SIGSTOP and SIGKILL, all signals can be blocked.


If
a blocked signal is generated, it remains pending until it is unblocked.


Linux also holds information about how each process handles every possible signal; this is held in an array of sigaction data structures pointed at by the task_struct for each process. Amongst other things, it contains either the address of a routine that will handle the signal, or a flag which tells Linux that the process either wishes to ignore this signal or let the kernel handle the signal for it.


The process modifies the default signal handling by making system calls, and these calls alter the sigaction for the appropriate signal as well as the blocked mask.


Problems with earlier Linux schedulers


Before the 2.6 kernel, the scheduler had a significant limitation when
many tasks were active. This was due to the scheduler being implemented
using an algorithm with O(n)
complexity.
In other words, the more tasks
(n) are active, the longer it takes to schedule a task. At very high loads, the
processor can be consumed with scheduling and devote little time to the
tasks themselves.



The pre-2.6 scheduler also used a single runqueue for all processors in a symmetric multiprocessing (SMP) system. This meant a task could be scheduled on any processor -- which can be good for load balancing but bad for memory caches.

Finally, preemption wasn't possible in the earlier scheduler; this meant that a lower-priority task could execute while a higher-priority task waited for it to complete.

Introducing the Linux 2.6 scheduler

The 2.6 scheduler resolves the three primary issues found in the earlier scheduler (the O(n) and SMP scalability issues), as well as other problems. Now we'll explore the basic design of the 2.6 scheduler.


Major scheduling structures

Each CPU has a runqueue made up of 140 priority lists that are serviced in FIFO order.

Tasks that are scheduled to execute are added to the end of their respective runqueue's priority list.

Each task has a time slice that determines how much time it's permitted to execute.

The first 100 priority lists of the runqueue are reserved for real-time tasks, and the last 40 are used for user tasks (see next slide).

Priority lists

(figure: active and expired priority arrays)

Runqueues

In addition to the CPU's runqueue, which is called the active runqueue, there's also an expired runqueue.

When a task on the active runqueue uses all of its time slice, it's moved to the expired runqueue. During the move, its time slice is recalculated.

If no tasks exist on the active runqueue for a given priority, the pointers for the active and expired runqueues are swapped, thus making the expired priority list the active one.

The job of the scheduler is simple: choose the task on the highest-priority list to execute.

http://www.linuxjournal.com/article/7477

By Aseem R. Deshpande

With the release of kernel 2.6, Linux now poses serious competition to major RTOS vendors in the embedded market space.


Linux 2.6 introduces many new features that make it an excellent operating system for embedded computing.

Among these new features are enhanced real-time performance, easier porting to new computers, support for large memory models, support for microcontrollers, and an improved I/O system.

Characteristics of Embedded Systems


• Embedded systems often need to meet timing constraints reliably.

• Embedded systems have access to far fewer resources than a normal PC; they have to squeeze maximum value out of whatever is available.

• The OS should perform reliably and efficiently, if possible even under extreme load.

• If a crash occurs in one part of a module, it should not affect other parts of the system. Furthermore, recovery from crashes should be graceful.

How Linux 2.6 Satisfies the Requirements


Although Linux 2.6 is not yet a true real-time operating system, it does contain improvements that make it a worthier platform than previous kernels when responsiveness is desirable.

Three significant improvements are preemption points in the kernel, an efficient scheduler, and improved synchronization.


Kernel Preemption

As with most general-purpose operating systems, Linux has always forbidden the process scheduler from running when a process is executing in a system call. Therefore, once a task is in a system call, that task controls the processor until the system call returns, no matter how long that might take.

As of kernel 2.6, the kernel is preemptible. A kernel task now can be preempted, so that some important user application can continue to run.

In Linux 2.6, kernel code has been laced with preemption points: instructions that allow the scheduler to run and possibly block the current process so as to schedule a higher-priority process.


Kernel Preemption

Linux 2.6 avoids unreasonable delays in system calls by periodically testing a preemption point. During these tests, the calling process may block and let another process run.

Thus, under Linux 2.6, the kernel now can be interrupted mid-task, so other applications can continue to run even when something low-level and complicated is going on in the background.

Embedded software often has to meet deadlines that render it incompatible with virtual-memory demand paging, in which slow handling of page faults would ruin responsiveness.

To eliminate this problem, the 2.6 kernel can be built with no virtual memory system. Of course, it then becomes the software designer's responsibility to ensure enough real memory always is available to get the job done.


An Efficient Scheduler

The process scheduler has been rewritten in the 2.6 kernel to eliminate the slow algorithms of previous versions.

Formerly, in order to decide which task should run next, the scheduler had to look at each ready task and make a computation to determine that task's relative importance.

After all computations were made, the task with the highest score would be chosen.

Because the time required for this algorithm varied with the number of tasks, complex multitasking applications suffered from slow scheduling.

2.6 improvements

The scheduler in Linux 2.6 no longer scans all tasks every time. Instead, when
a task becomes ready to run, it is sorted into position on a queue, called the
current queue.


Then, when the scheduler runs, it chooses the task at the most favorable position in the queue. As a result, scheduling is done in a constant amount of time.

When the task is running, it is given a time slice, or a period of time in which it may use the processor, before it has to give way to another thread.

When its time slice has expired, the task is moved to another queue, called the expired queue.

The task is sorted into this expired queue according to its priority. This new procedure is substantially faster than the old one, and it works equally well whether there are many tasks or only a few in the queue. This new scheduler is called the O(1) scheduler.



New Synchronization Primitives

Applications
involving the use of shared resources, such as shared memory or
shared devices, have to be developed carefully to avoid race conditions.


The solution implemented in Linux, called a mutex, ensures that only one task is using the resource at a time. A mutex involved a system call to the kernel to decide whether to block the thread or allow it to continue executing.

But when the decision is to continue, the time-consuming system call was unnecessary. The new implementation in Linux 2.6 supports Fast User-Space Mutexes (futexes). These functions can check from user space whether blocking is necessary, and perform the system call to block the thread only when it is required.

When blocking is not required, avoiding the unneeded system call saves time. Futexes also support setting priorities, allowing applications or threads of higher priority to have first access to the contested resource.

Improved Threading Model and Support for NPTL

Threading in 2.6 is based on a 1:1 threading model: one kernel thread for one user thread. It also includes in-kernel support for the new Native POSIX Threading Library (NPTL).



NPTL brings an eight-fold improvement over its predecessor. Tests conducted by its authors have shown that Linux, with this new threading, can start and stop 100,000 threads simultaneously in about two seconds. This task took 15 minutes with the old threading model.



Along with POSIX threads, 2.6 provides POSIX signals and POSIX high-resolution timers as part of the mainstream kernel. POSIX signals are an improvement over UNIX-style signals, which were the default in previous Linux releases. Unlike UNIX signals, POSIX signals cannot be lost and can carry information as an argument. Also, POSIX signals can be sent from one POSIX thread to another, rather than only from process to process, like UNIX signals.



Embedded systems often need to poll hardware or do other tasks on a fixed
schedule. POSIX timers make it easy to arrange any task to be scheduled
periodically. The clock that the timer uses can be set to tick at a rate as fine as one
kilohertz, so software engineers can control the scheduling of tasks with precision.

Subarchitecture Support

Hardware designs in the embedded world often are customized for special applications.

For example, a purpose-built board may use different IRQ management than what a similar reference design uses.



In order to run on the new board, Linux has to be ported or altered to
support the new hardware. This porting is easier if the operating system is
made of components that are well separated, making it necessary to
change only the code that has to change.


The components of Linux 2.6 that are likely to be altered for a custom design have been refactored with a concept called subarchitecture. Components are separated clearly and can be modified or replaced individually, with minimal impact on other components of the board support package.



By formalizing Linux's support for slightly different hardware types, the kernel can be ported more easily to other systems, such as dedicated storage hardware and other components that use industry-dominant processor types.