
Operating Systems: Synchronization

Multi-processing/Multi-threading


- Improve efficient use of computing resources
- Non-interactive programs
  - Minimize time to complete a task by using multiple cores to accelerate computation.
    - Video encoding
    - Web-server processing
  - Effectively overlap computation and I/O operations
  - Harder class of problems to deal with
- Interactive programs
  - Permit long tasks to run in the background without impacting user experience
  - Essentially overlap CPU and I/O operations

Concurrency

- Concurrency is the ability to perform independent tasks simultaneously
- A bit different from parallelism, which requires two or more compute units
  - A compute unit could be a CPU, core, or GPU
- However, in most discussions the terms concurrency and parallelism are used interchangeably.
- In this course we will intentionally not distinguish between the two.

Broad classification of concurrency

- Data parallelism
  - Each thread/process performs the same computation
  - The data for each thread/process is different
- Task parallelism
  - Each thread/process performs different computations
  - The data for each thread/process is the same

[Figure: data parallelism. Input data is logically partitioned into independent subsets D1...Dn; each thread t1...tn processes one subset of data to generate a logically independent subset of outputs D1'...Dn'.]

[Figure: task parallelism. Same data but different processing; each thread t1...tn processes the same data D differently, generating different outputs D1'...Dn'.]

Data parallelism example (Part 1/2)

#include <thread>
#include <vector>
#include <algorithm>

bool isPrime(const int num);

void primeCheck(const std::vector<int>& numbers,
                std::vector<bool>& result,
                const int startIdx, const int count) {
    int end = (startIdx + count);
    for (int i = startIdx; (i < end); i++) {
        result[i] = isPrime(numbers[i]);
    }
}


Data parallelism example (Part 2/2)

void primeCheck(const std::vector<int>& numbers, std::vector<bool>& result,
                const int startIdx, const int count);

int main() {
    std::vector<int> numbers(10000);
    // Note: std::vector<bool> packs bits, so concurrent writes to
    // adjacent elements are not thread-safe; std::vector<char>
    // would be a safer result container.
    std::vector<bool> isPrime(10000);
    std::generate_n(numbers.begin(), 10000, rand);
    // Create 10 threads to process subsets of numbers.
    // Each thread processes 1000 numbers.
    std::vector<std::thread> threadGroup;
    for (int i = 0; (i < 10); i++) {
        threadGroup.push_back(
            std::thread(primeCheck, std::ref(numbers), std::ref(isPrime),
                        i * 1000, 1000));
    }
    std::for_each(threadGroup.begin(), threadGroup.end(),
                  [](std::thread& t){ t.join(); });
    return 0;
}


Task Parallelism Example

#include <thread>
#include <vector>
#include <algorithm>

void isPrime(const int num, bool& result);
void isPalindrome(const int num, bool& result);
void isEuclid(const int num, bool& result);

int main() {
    const int num = rand();
    // Create threads to perform various processing on 1 number.
    std::vector<std::thread> threadGroup;
    bool result1, result2, result3;
    threadGroup.push_back(std::thread(isPrime, num, std::ref(result1)));
    threadGroup.push_back(std::thread(isPalindrome, num, std::ref(result2)));
    threadGroup.push_back(std::thread(isEuclid, num, std::ref(result3)));
    // Wait for the threads to finish
    std::for_each(threadGroup.begin(), threadGroup.end(),
                  [](std::thread& t){ t.join(); });
    return 0;
}

Straightforward parallelism

- Several problems easily lend themselves to running on multiple processes or threads
- Data parallel
  - Use multiple threads/processes to change separate pixels in an image
  - Process multiple files concurrently using many threads or processes (see the sketch after this list)
- Task parallel
  - Run various data mining algorithms on a given piece of data using multiple threads
  - Update different indexes and relations in a database when a new record is added.
  - Convert a video to many different formats and resolutions using multiple threads or processes.
  - Search for a given search-term on multiple indexes using several threads or processes to provide instant search results.
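As a concrete illustration of the file-processing case above, here is a minimal sketch (not from the slides) that spawns one thread per file; the file names and the countLines helper are hypothetical.

#include <thread>
#include <vector>
#include <fstream>
#include <string>
#include <iostream>

// Hypothetical per-file task: count the lines in one file.
void countLines(const std::string path) {
    std::ifstream in(path);
    std::string line;
    long count = 0;
    while (std::getline(in, line)) {
        count++;
    }
    // Build one string so the output is a single insertion.
    std::cout << path + ": " + std::to_string(count) + " lines\n";
}

int main() {
    // Hypothetical list of files to process concurrently.
    const std::vector<std::string> files = {"a.txt", "b.txt", "c.txt"};
    std::vector<std::thread> threadGroup;
    for (const auto& file : files) {
        threadGroup.push_back(std::thread(countLines, file));
    }
    for (auto& t : threadGroup) {
        t.join();
    }
    return 0;
}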

Parallelism in Reality

- Both data and task parallel systems have to ultimately coordinate concurrent execution
  - Primarily because humans deal with serial information
- The threads have to be coordinated to generate the final information
  - Data parallel: an image has to be fully converted before it can be displayed
  - Task parallel: search results may need to be combined to prioritize higher quality results


Concurrency in practice

- Many applications involve a combination of data and task parallel operations
  - Applications may switch their mode of operation
- Programs require exchange of information between concurrent processes or threads
- Multiple processes or threads may be used to make effective use of hardware
  - Perform I/O on a different thread than the one performing computation
  - However, the threads need to coordinate to generate final results

Cooperating Processes & Threads

- Multiple threads or processes share resources
  - The most typical scenario in the real world
- Control
  - Two (or more) threads/processes need to alternate running
- Data
  - Threads share data
    - Either using the same object instance passed when the thread is created
    - Or using static or global objects
  - Processes share data using Inter-Process Communication (IPC)
    - Using shared memory
    - Using message queues
- However, all shared data, including IPC mechanisms, needs to be coordinated to ensure consistent operation.

Synchronization: Coordinating Concurrency

- The task of coordinating multiple processes or threads is called "synchronization"
- Synchronization is necessary to
  - Consistently access shared resources or data
  - Control/coordinate operations between multiple threads
- Synchronization is a necessary overhead
  - Different strategies are used in different situations to manage overheads better
  - The strategies essentially trade off CPU busy-waiting against CPU idling

Example of incorrect Multithreading

#include <thread>
#include <vector>
#include <algorithm>
#include <iostream>

#define THREAD_COUNT 50

int num = 0;

void threadMain() {
    for (int i = 0; (i < 1000); i++) {
        num++;
    }
}

int main() {
    std::vector<std::thread> threadGroup;
    for (int i = 0; (i < THREAD_COUNT); i++) {
        threadGroup.push_back(std::thread(threadMain));
    }
    std::for_each(threadGroup.begin(), threadGroup.end(),
                  [](std::thread& t){ t.join(); });
    std::cout << "Value of num = " << num << std::endl;
    return 0;
}

Output from multiple runs:

$ ./thread_test
Value of num = 50000
$ ./thread_test
Value of num = 50000
$ ./thread_test
Value of num = 49000
$ ./thread_test
Value of num = 49913
$ ./thread_test
Value of num = 49884
$ ./thread_test
Value of num = 49000
$ ./thread_test
Value of num = 50000

Compiled with: g++ -std=c++0x -g -Wall thread_test.cpp -o thread_test -lpthread

The variable num is read and modified by multiple threads.

Problem with the code on the previous slide

Expected behavior: the increments are serialized. With num == 1, Thread 1 runs num++ (reads 1, writes 2); Thread 2 then runs num++ (reads 2, writes 3), so num ends at 3.

Unexpected behavior (aka Race Condition): with num == 1, Thread 1 reads 1 and is context-switched out before writing; Thread n also reads 1. Each then writes 2, so one increment is lost and num ends at 2.

Race Condition

- Race condition is the term used to denote inconsistent operation of multi-process/multi-threaded programs
- Race conditions occur due to:
  - Inconsistent sharing of control
    - Invalid assumption that another thread will/will-not run
  - Inconsistent sharing of data
    - Overlapping reads and writes to shared data from multiple threads
- Typical symptoms of race conditions
  - Program runs fine most of the time
  - Occasionally, the program does not run correctly
- Root cause of race conditions
  - Non-deterministic thread scheduling
  - Invalid assumptions in the program

Race Conditions & Scheduling

- Threads are scheduled by the OS
  - Threads take turns to use the CPUs
  - Typically, the number of threads running in parallel is equal to the number of cores on a computer
- Modern OSes are preemptive
  - Threads run for at most a quantum of time
  - Threads are forcibly context-switched regardless of the operation they are performing
- Context switches occur at the instruction level
  - Each C++ statement maps to 1 or more instructions.
  - Consequently, context switches can occur in the middle of executing a C++ statement! (See the sketch below.)
- Seemingly straightforward code can suffer from race conditions when incorrectly used.
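To make the statement-versus-instruction point concrete, here is a minimal sketch (not from the slides) of the three machine-level steps that a single num++ typically expands to; a context switch between any two steps can lose an update.

#include <iostream>

int num = 0;

// What "num++" roughly compiles to: three separate steps.
// A preemptive context switch may occur between any two of them.
void incrementExpanded() {
    int temp = num;   // 1. Load num from memory into a register
    temp = temp + 1;  // 2. Increment the register
    num = temp;       // 3. Store the register back to memory
}

int main() {
    // If two threads both execute step 1 before either executes
    // step 3, both write the same value and one increment is lost.
    incrementExpanded();
    std::cout << "num = " << num << std::endl;  // prints 1
    return 0;
}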

Scheduling Cues

- The thread API includes methods for providing scheduling cues (see the sketch below)
  - yield(): the thread voluntarily relinquishes the CPU
    - Does not use a full quantum of time.
    - A suggestion to the OS to run some other thread
  - sleep_for() & sleep_until(): the thread does not need or use the CPU for a given time
    - Will be rescheduled after the time elapses
- The Pthread library permits setting relative priorities for threads
  - Higher priority threads are scheduled more frequently
- The OS may ignore scheduling cues
  - No guarantees on which thread runs next
- Scheduling cues do not prevent race conditions!
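A minimal sketch (not from the slides) of the two cues as exposed in C++ via std::this_thread; the loop bound and sleep interval are arbitrary.

#include <thread>
#include <chrono>
#include <iostream>

int main() {
    for (int i = 0; i < 5; i++) {
        // Hint to the OS that another thread may run now;
        // the OS is free to ignore this cue.
        std::this_thread::yield();

        // Give up the CPU for at least 10 milliseconds; the
        // thread is rescheduled some time after the interval.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::cout << "iteration " << i << std::endl;
    }
    return 0;
}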


How to avoid race conditions?

Critical Section (CS)

- A concept to avoid race conditions
- CS: the part of code where sharing occurs
  - Control is yielded
  - Shared data is modified
- Four rules to avoid race conditions:
  1. No 2 threads in the same CS simultaneously
  2. No assumptions about speed or number of cores/CPUs
  3. No thread outside a CS may block a thread in the CS
  4. No thread should wait forever to enter its CS

Satisfying the 4 Conditions

- Responsibility lies with the programmer
  - You have to carefully design and implement your code
- Several different approaches/solutions achieve critical sections:
  - Hardware approaches
    - Used for multiple-processor scenarios
    - Typically not directly controllable from a standard program
    - OSes use them internally
  - Software approaches
    - Applied to threads and processes running on the same machine
    - The OS exposes the necessary API to help with coordination
    - Various languages provide additional APIs for ease
      - Implemented using the OS API
- Normally a combination of hardware and software approaches is used together
  - Hardware approaches for multi-core/multi-CPU machines
  - Software approaches to facilitate coordination between multiple processes/threads.

Disabling Interrupts (Hardware approach)

- Context switches occur using interrupts
- Disabling interrupts
  - Disable interrupts before entering the CS
    - No other thread can now run
  - Quickly complete the CS
  - Re-enable interrupts
- Usage
  - Used only by the OS
    - Particularly when there are multiple CPUs
    - When performing very critical operations
  - The CS is very small and fast
    - Typically, no I/O in the CS

Test-Set-Lock (TSL) Instruction (Hardware approach)

- A special instruction in the processor
  - Used in multi-processor systems
- Guarantees that only 1 instruction accesses memory
  - Other processors are stalled
  - Busy-wait strategy: wastes CPU cycles
- The x86 instruction set uses the LOCK prefix
  - Can be added to selected instructions
  - Instructions are longer and different!
  - Consequently, different software is needed for single- and multiprocessor systems
    - Typically the OS performs this task
    - Consequently, different kernels for single- and multiprocessor systems
- A sketch of how this looks from C++ follows.
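A minimal sketch (not from the slides) of how a test-and-set style instruction is exposed in C++: std::atomic_flag::test_and_set is an atomic read-modify-write (typically a LOCK-prefixed instruction on x86), and a busy-waiting spin lock can be built on it.

#include <atomic>
#include <thread>
#include <iostream>

std::atomic_flag lockFlag = ATOMIC_FLAG_INIT;
int counter = 0;

// Spin until the old value of the flag was false (lock acquired).
void spinLock()   { while (lockFlag.test_and_set()) { /* busy-wait */ } }
void spinUnlock() { lockFlag.clear(); }

void threadMain() {
    for (int i = 0; i < 100000; i++) {
        spinLock();    // atomically test-and-set the flag
        counter++;     // critical section
        spinUnlock();  // release the lock
    }
}

int main() {
    std::thread t1(threadMain), t2(threadMain);
    t1.join();
    t2.join();
    std::cout << "counter = " << counter << std::endl;  // 200000
    return 0;
}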

Strict Alternation (Software approach)

- A software solution
  - Threads take turns to use the CPU
  - The other thread does a busy-wait (wastes CPU)
- Busy waiting often involves using a "spin lock"
  - The process spins in a tight loop waiting for the lock to open/release. See the example below.
- The spin lock is achieved using a shared (turn) variable
  - Changing the turn variable usually requires a special instruction on multi-core machines

Thread 1:

while (true) {
    while (turn != 0);
    // critical section
    turn = 1;
    // non-critical section
}

Thread 2:

while (true) {
    while (turn != 1);
    // critical section
    turn = 0;
    // non-critical section
}

Strict Alternation (Contd.)

- Usage
  - Critical sections take the same time on both threads
  - Non-critical sections take the same time on both threads
- Negatives
  - The busy-waiting strategy burns CPU cycles
  - Does not scale efficiently to many threads
- Advantage
  - Straightforward to implement (see the runnable sketch below)
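A runnable sketch (not from the slides) of strict alternation between two threads; std::atomic<int> stands in for the special instruction needed to change turn safely on multi-core machines, and the iteration count is arbitrary.

#include <atomic>
#include <thread>
#include <iostream>

std::atomic<int> turn(0);
int shared = 0;

void worker(const int self, const int other) {
    for (int i = 0; i < 5; i++) {
        while (turn != self) { /* busy-wait (spin lock) */ }
        shared++;  // critical section
        std::cout << "thread " << self << ": shared = " << shared << "\n";
        turn = other;  // hand the turn to the other thread
        // non-critical section would go here
    }
}

int main() {
    std::thread t0(worker, 0, 1);
    std::thread t1(worker, 1, 0);
    t0.join();
    t1.join();
    return 0;
}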

Peterson's Solution

- A combination of the earlier methods
  - Shared (lock) variables + alternation
- Resolves the issue with different CS timings
  - Faster threads get more turns using the CS
  - Threads do not necessarily alternate
- Threads first indicate interest in entering the CS
  - Threads interested in entering the CS take turns
- In the classical description of Peterson's solution, threads busy-wait (a sketch follows)
  - Busy waiting burns CPU cycles
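A minimal two-thread sketch (not from the slides) of the classical Peterson's solution; std::atomic with the default sequentially consistent ordering is used because plain variables would let the compiler and CPU reorder the accesses the algorithm depends on.

#include <atomic>
#include <thread>
#include <iostream>

std::atomic<bool> interested[2] = {{false}, {false}};
std::atomic<int> turn(0);
int shared = 0;

void enter(const int self) {
    const int other = 1 - self;
    interested[self] = true;  // indicate interest in entering the CS
    turn = other;             // politely give the other thread the turn
    // Busy-wait only while the other thread is interested AND it is
    // the other thread's turn; faster threads get more turns.
    while (interested[other] && turn == other) { /* spin */ }
}

void leave(const int self) {
    interested[self] = false;  // no longer interested
}

void worker(const int self) {
    for (int i = 0; i < 100000; i++) {
        enter(self);
        shared++;  // critical section
        leave(self);
    }
}

int main() {
    std::thread t0(worker, 0), t1(worker, 1);
    t0.join();
    t1.join();
    std::cout << "shared = " << shared << std::endl;  // 200000
    return 0;
}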

Sleep & Wakeup (Modification to Peterson's solution)

- Avoids busy-waiting to efficiently use the CPU
- Multiple threads try to enter the Critical Section (CS)
  - But only one thread can enter the CS
  - If a thread cannot enter the CS it blocks by calling wait()
    - In this context, the wait() method is a conceptual API
  - Blocked threads do not use the CPU
- The thread in the critical section wakes up sleeping threads by calling notify()
  - In this context, the notify() method is a conceptual API
  - Done as part of leaving the CS
- Sleep & Wakeup needs special support from the OS
  - For threads to block when they cannot enter the critical section
  - For threads to be notified when they can enter the critical section
- Generic programming/usage example:
  - Producer-Consumer problems
    - One thread generates data
    - Another thread uses the data

Problems with Sleep & Wakeup

- Notifies or wake-ups may get lost
  - Thread1 first enters the critical section
  - Thread2 tries to enter the critical section but cannot
  - Thread1 meanwhile leaves the critical section, calling notify()
    - No threads are waiting, as Thread2 has not yet called wait()
  - Thread2 calls wait()
  - Thread2 will never get notified
  - The key problem is that check-and-wait is not performed as a single atomic operation.
- Priority Inversion: a low priority thread blocks a high priority thread!
  - Assume 2 threads H (high priority) and L (low priority)
  - Assume the scheduling rule: H is scheduled if it is ready!
  - Priority inversion case:
    - L is in the critical section
    - H becomes ready and is trying to enter the critical section (CS)
    - The scheduler keeps scheduling H (but not L). So L does not leave the CS and H cannot enter the CS

Semaphores

- Address the problems with lost wakeups
- Semaphore
  - A shared variable that counts the number of wakeups
- Processes/threads check the semaphore before waiting
  - Check & wait is performed as one indivisible step
  - Threads wait only if the semaphore is 0
- Wakeups/notifies increment the semaphore
  - And un-block one or more threads
- Linux provides semaphores that can be used by multiple processes (a thread-level sketch follows)
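C++11 (the standard this course targets) has no semaphore class, so here is a minimal sketch (not from the slides) of a counting semaphore built from std::mutex and std::condition_variable; C++20 later added std::counting_semaphore.

#include <mutex>
#include <condition_variable>

class Semaphore {
    std::mutex mtx;
    std::condition_variable cv;
    int count;  // number of stored wakeups
public:
    explicit Semaphore(int initial) : count(initial) {}

    // Decrement; block while the count is 0.
    void wait() {
        std::unique_lock<std::mutex> lock(mtx);
        // Check-and-wait happens atomically under the mutex,
        // so wakeups cannot be lost.
        cv.wait(lock, [this]{ return count > 0; });
        count--;
    }

    // Increment and un-block one waiting thread.
    void notify() {
        std::lock_guard<std::mutex> guard(mtx);
        count++;
        cv.notify_one();
    }
};

Usage: with Semaphore sem(0), one thread blocks in sem.wait() until another thread calls sem.notify().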


Mutex (A binary Semaphore)

- A binary-valued semaphore (only one thread can be in the critical section)
- Typically used with conceptual lock and unlock methods to increment and decrement the binary semaphore

Mutex Classes in C++

- The C++ (2011) standard includes several different types of mutex classes
  - See http://en.cppreference.com/w/cpp/thread
- The std::mutex class
  - A simple mutex with an indefinitely blocking lock method
  - The unlock method is used to unlock the mutex.
- The std::timed_mutex class
  - Adds to the methods in std::mutex by including methods that enable timed/non-blocking locking.
- The std::recursive_mutex class
  - Enables a thread to repeatedly lock the same mutex. The locks are blocking.
  - The number of locks and unlocks must match.
- The std::recursive_timed_mutex class
  - The most comprehensive mutex class; permits repeated locking and timed/non-blocking locks.
  - The number of locks and unlocks must match.
- These classes also provide several types of locking strategies to ease developing programs with different requirements (a short sketch of timed locking follows).
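A minimal sketch (not from the slides) of the timed/non-blocking locking that std::timed_mutex adds: try_lock_for gives up after a timeout instead of blocking indefinitely like std::mutex::lock.

#include <mutex>
#include <chrono>
#include <iostream>

std::timed_mutex gate;

void tryWork() {
    // Attempt to lock, but give up after 100 milliseconds.
    if (gate.try_lock_for(std::chrono::milliseconds(100))) {
        std::cout << "Got the lock; doing work\n";  // critical section
        gate.unlock();
    } else {
        std::cout << "Timed out; doing something else\n";
    }
}

int main() {
    tryWork();
    return 0;
}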

std::lock_guard

- The number of locks and unlocks must match
  - Regardless of any exceptions that may arise in critical sections
  - If the locks and unlocks don't match, the program will deadlock.
- The std::lock_guard class eases this logic
  - The mutex is locked in the constructor
  - The mutex is unlocked in the destructor, which is always invoked regardless of exceptions or of which path the control flows through
- Such use of constructor and destructor is a common design pattern in C++ called RAII: "Resource Acquisition Is Initialization"


A simple multi-threaded example

#include <vector>
#include <algorithm>
#include <iostream>
#include <thread>
#include <mutex>

#define THREAD_COUNT 50

int num = 0;

// A mutex to synchronize access to num
std::mutex gate;

void threadMain() {
    // Automatically lock & unlock
    std::lock_guard<std::mutex> guard(gate);
    for (int i = 0; (i < 1000); i++) {
        num++;
    }
}

int main() {
    std::vector<std::thread> threadGroup;
    for (int i = 0; (i < THREAD_COUNT); i++) {
        threadGroup.push_back(std::thread(threadMain));
    }
    std::for_each(threadGroup.begin(), threadGroup.end(),
                  [](std::thread& t){ t.join(); });
    std::cout << "Value of num = " << num << std::endl;
    return 0;
}

Producer-Consumer Model

- Many multi-threaded programs fall into a Producer-Consumer model
- A shared, finite-size queue is used for interaction between producers and consumers
  - The shared queue enables producers and consumers to operate at varying speeds
- The producer adds entries (to be processed) to the queue
  - If the queue is full, the producer has to wait until there is space in the queue
  - Typically the consumer notifies the producer to add more entries.
- The consumer removes entries from the queue and processes them.
  - If the queue is empty, the consumer has to wait until some data is available to be processed

Producer-Consumer (Part 1/2)

#include <iostream>
#include <thread>
#include <mutex>
#include <queue>
#include <unistd.h>  // for usleep (used in Part 2/2)

// A shared queue
std::queue<int> queue;
// Mutex to synchronize access to the queue
std::mutex queueMutex;
// Max entries in the queue
const size_t MaxQSize = 5;

void producer(const int num);
void consumer(const int num);

int main() {
    std::thread prod(producer, 500);
    std::thread con(consumer, 500);
    prod.join();
    con.join();
    return 0;
}

[Figure: the producer and consumer threads interact through the shared queue; queueMutex guards the critical section, and MaxQSize == 5 bounds the queue.]

Producer-Consumer (Part 2/2)

void producer(const int num) {
    long idle = 0;
    int i = 0;
    while (i < num) {
        queueMutex.lock();
        // Critical section
        if (queue.size() < MaxQSize) {
            queue.push(rand() % 10000);
            i++;
        } else {
            idle++;
        }
        queueMutex.unlock();
    }
    std::cout << "Producer idled " << idle << " times."
              << std::endl;
}

void consumer(const int num) {
    long idle = 0;
    int i = 0;
    while (i < num) {
        int val = -1;
        queueMutex.lock();
        // Critical section
        if (!queue.empty()) {
            val = queue.front();
            queue.pop();
            i++;
        } else {
            idle++;
        }
        queueMutex.unlock();
        if (val > 0) {
            // Process the value?
            usleep(val);
        }
    }
    std::cout << "Consumer idled " << idle << " times\n";
}

Shortcomings of the previous producer-consumer solution

- The producer-consumer example in the previous 2 slides works correctly without race conditions:
  1. No 2 threads in the same CS simultaneously
     - The threads use a single lock to ensure only one thread is in the critical section at any given time.
  2. No assumptions about speed or number of cores/CPUs
     - No such assumptions in the code
  3. No thread outside a CS may block a thread in the CS
     - There is only a single mutex and a single CS. Consequently, a thread outside a CS cannot block a thread in the CS
  4. No thread should wait forever to enter its CS
     - That is why the usleep (representing work being done) is not inside the critical section.
- However, the solution is not efficient!
  - CPU cycles are wasted when:
    - The producer thread spins in the loop if the queue is full
    - The consumer thread spins in a loop if the queue is empty

Eliminating wasted CPU cycles

- The standard solution of using mutexes to share data is inefficient
  - Threads have to busy-wait for suitable operating conditions
    - In the previous example, the producer has to wait if the queue is full
    - In the previous example, the consumer has to wait if the queue is empty
  - Busy waiting burns CPU cycles, degrading the efficiency of the system
    - The CPU could be doing other tasks rather than wasting energy performing the same checks
- How to improve the efficiency of data sharing?
  - Waiting for suitable operating conditions cannot be controlled
  - Avoid busy-waiting
- Solution
  - Instead provide a blocked waiting mechanism
  - However, the mechanism needs to streamline managing critical sections

Monitors

- Address the problems with busy waiting
  - Use condition variables to exit from blocking waits
- Reduce overhead on programmers
  - Provide special language constructs
  - Streamline program development
  - The compiler or standard library handles other overheads
    - In collaboration with the OS
- Monitors are higher level concepts than a Mutex
  - Monitors do need a Mutex to operate
- C++ provides a std::condition_variable object that can be used as a monitor
  - It is used with std::unique_lock
- Java implementation of Monitor
  - Via the synchronized keyword (that provides a mutex)
  - Each java.lang.Object instance has a monitor that synchronized code uses for achieving critical sections
  - Additional methods wait and notify are used to release and reacquire locks as needed.

std::condition_variable

- A synchronization mechanism to conditionally block threads until:
  - A notification is received from another thread
  - A specified timeout value for waiting elapses
  - A spurious wakeup occurs (which is rare)
- A std::condition_variable requires two pieces of information
  - A std::unique_lock on a std::mutex, guaranteeing it is being used in a critical section
  - A predicate that indicates the wait condition

Producer-Consumer with a condition variable (Part 1/2)

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <unistd.h>  // for usleep (used in Part 2/2)

// A shared queue
std::queue<int> queue;
// Condition variable to avoid spin-locks
std::condition_variable data_cond;
// Mutex to synchronize access to the queue
std::mutex queueMutex;
// Max entries in the queue
const size_t MaxQSize = 5;

void producer(int);
void consumer(int);

int main() {
    std::thread prod(producer, 500);
    std::thread con(consumer, 500);
    prod.join();
    con.join();
    return 0;
}

The condition variable is the monitor construct that:
1. Avoids busy waiting
2. Enables blocking until a condition is met.
3. Notifies other waiting threads about a potential change in wait status.
4. Requires an already locked mutex for operation.

Producer-Consumer with a condition variable (Part 2/2)

void producer(int num) {
    for (int i = 0; (i < num); i++) {
        std::unique_lock<std::mutex> lock(queueMutex);
        // Block until there is room in the queue
        data_cond.wait(lock, []{ return queue.size() < MaxQSize; });
        queue.push(rand() % 10000);
        data_cond.notify_one();
    }
}

void consumer(const int num) {
    for (int i = 0; (i < num); i++) {
        std::unique_lock<std::mutex> lock(queueMutex);
        // Block until the queue has data
        data_cond.wait(lock, []{ return !queue.empty(); });
        int val = queue.front();
        queue.pop();
        data_cond.notify_one();
        // Unlock via the unique_lock (not the raw mutex) so the
        // lock object knows it no longer owns the mutex.
        lock.unlock();
        if (val > 0) {
            // Process the value?
            usleep(val);
        }
    }
}


Operation of the wait method

- The wait method causes the calling thread to block until
  - The condition variable is notified (by another thread) AND
  - An optional predicate (a method that returns a Boolean value) is satisfied
- Specifically, the wait method:
  1. Atomically releases the lock on a given locked mutex
  2. Adds the calling thread to the list of threads waiting on the condition variable (namely *this)
  3. Blocks the current thread until notify_all or notify_one is called on the condition variable (namely *this) by another thread.
  4. When notified, the thread unblocks, and the lock on the mutex is atomically reacquired
  5. An optional predicate is checked; if the predicate returns false, the wait method repeats from step 1. Otherwise the wait method returns control back.

Interaction between wait & notify

[Figure: a std::mutex (mutex) and a std::condition_variable (cv) shared by multiple threads.

Waiting thread:
    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock, predicate);
wait releases the lock on the mutex and blocks the thread until notified. When notified, the thread reacquires the lock and checks the predicate: if it is false, the lock is released and the thread waits again; if it is true, wait is done and control returns.

Notifying thread:
    std::unique_lock<std::mutex> lock(mutex);  // blocks to acquire the lock
    cv.notify_one();  // once the lock is acquired
    lock.unlock();]

Passing results between threads

- Threads cannot return values directly
  - Methods that return values can be run in a separate thread. However, there is no intrinsic mechanism to obtain the return values
  - You have to use some shared intermediate storage to obtain return values
    - The shared storage needs to be suitably guarded to avoid race conditions
  - Sharing values between threads can be a bit cumbersome
- Solution: std::future
  - Provides multi-threading-safe (MT-safe) storage to exchange values between threads
  - Futures can be created in two ways:
    1. Using the std::async method
    2. Using the std::promise class

std::async

- The method std::async runs a given function f asynchronously (potentially in a separate thread) and returns a std::future that will eventually hold the result of that function call.
- The std::future class can also report exceptions, just as-if it were a standard method call.

#include <future>
#include <iostream>
#include <string>
#include <unistd.h>

int gameOfLife(std::string s) {
    sleep(3);
    std::cout << s << ": finished\n";
    return 20;
}

int main() {
    std::future<int> result =
        std::async(std::launch::async, gameOfLife, "async");
    sleep(5);  // Pretend to do something important
    std::cout << "Result = " << result.get() << std::endl;
    return 0;
}

Other standard launch policies include:
- std::launch::deferred: the method runs lazily, only when get() is called on the future.
- std::launch::async | std::launch::deferred: the system decides whether it runs synchronously or asynchronously.
(A short sketch follows.)
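A minimal sketch (not from the slides) contrasting the two standard launch policies; with std::launch::deferred the function body does not run until get() is called.

#include <future>
#include <iostream>

int compute() {
    std::cout << "compute() running\n";
    return 42;
}

int main() {
    // Deferred: compute() has NOT run yet; it runs lazily,
    // on this thread, when get() is called below.
    std::future<int> lazy =
        std::async(std::launch::deferred, compute);

    std::cout << "before get()\n";
    std::cout << "Result = " << lazy.get() << std::endl;
    return 0;
}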

std::promise

- The std::async method provides a mechanism to intercept and return the values of methods
  - It does not provide a placeholder for setting and then getting values.
- The std::promise class provides a placeholder
  - The placeholder is multi-thread safe (MT-safe)
  - One thread can set a value
  - Another thread can get the value via a std::future.


Using the std::promise class

#include <future>
#include <iostream>
#include <cmath>

// Returns the highest prime number between 2 and max
int getPrime(const int max);

void thread1(int max, std::promise<int>& promise) {
    int prime1 = getPrime(max);
    std::cout << "prime1 = " << prime1 << std::endl;
    promise.set_value(prime1);
}

int thread2(int max, std::promise<int>& promise) {
    int prime2 = getPrime(max);
    std::cout << "prime2 = " << prime2 << std::endl;
    // Block until thread1 sets the value.
    int prime1 = promise.get_future().get();
    return prime2 * prime1;
}

int main() {
    std::promise<int> prom;
    // Keep the returned future; a discarded future from
    // std::async would block here until thread1 finishes.
    std::future<void> f1 =
        std::async(std::launch::async, thread1, 99999, std::ref(prom));
    std::future<int> result =
        std::async(std::launch::async, thread2, 50000, std::ref(prom));
    // Do some work here!
    std::cout << "Result = " << result.get() << std::endl;
    return 0;
}

[Figure: thread1 calls set_value on the promise; thread2 obtains the associated future, whose get call waits for the value to be ready.]

std::atomic

- The std::atomic class provides atomic, multi-threading safe (MT-safe) primitive types
- Examples
  - std::atomic<int> atInt = ATOMIC_VAR_INIT(123);
  - std::atomic<bool> atBool = ATOMIC_VAR_INIT(false);
  - std::atomic<double> atDouble = ATOMIC_VAR_INIT(M_PI);
- Specializations are provided for many primitive types
  - A specialization may provide a lock-free MT-safe implementation
- It can be used with objects that provide the necessary operator overloading

Example of std::atomic

#include <vector>
#include <algorithm>
#include <iostream>
#include <thread>
#include <atomic>

#define THREAD_COUNT 50

std::atomic<int> num = ATOMIC_VAR_INIT(0);

void threadMain() {
    for (int i = 0; (i < 1000); i++) {
        num++;
    }
}

int main() {
    std::vector<std::thread> threadGroup;
    for (int i = 0; (i < THREAD_COUNT); i++) {
        threadGroup.push_back(std::thread(threadMain));
    }
    std::for_each(threadGroup.begin(), threadGroup.end(),
                  [](std::thread& t){ t.join(); });
    std::cout << "Value of num = " << num << std::endl;
    return 0;
}

Increment, decrement, and assignment operations on atomic types are guaranteed to be MT-safe. Refer to the API documentation for more methods in std::atomic (http://en.cppreference.com/w/cpp/atomic/atomic). A short sketch of two of them follows.
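Beyond the overloaded operators, std::atomic exposes explicit methods; a minimal sketch (not from the slides) of fetch_add and compare_exchange_strong:

#include <atomic>
#include <iostream>

int main() {
    std::atomic<int> num = ATOMIC_VAR_INIT(10);

    // Atomically add 5 and return the PREVIOUS value.
    int old = num.fetch_add(5);

    // Atomically set num to 100, but only if it currently holds 15;
    // 'expected' is updated to the observed value if the exchange fails.
    int expected = 15;
    bool swapped = num.compare_exchange_strong(expected, 100);

    std::cout << "old = " << old              // 10
              << ", swapped = " << swapped    // 1 (true)
              << ", num = " << num            // 100
              << std::endl;
    return 0;
}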

Multi-process semaphore

- So far we have studied multi-threaded semaphores and monitors
- Linux supports the following APIs for semaphore operations between processes on the same machine:
  - semget: allocate one or more semaphores and obtain an identifier (an integer value) for further operations.
  - semop and semtimedop: increment, decrement, or wait for a semaphore value to become zero.
  - semctl: perform various control operations on semaphores, including deleting them immediately.

semget

- semget allocates one or more semaphores
  - int semget(int key, int nsems, int semflg)
- The same key value is used by all processes. A special key value of IPC_PRIVATE is handy for sharing semaphores between child processes.
- The nsems parameter indicates the number of semaphores to be allocated.
- The semflg flags can be:
  - 0 (zero): get existing semaphores with key key. If the semaphores are not found, semget returns -1, indicating an error.
  - IPC_CREAT: use existing semaphores with key key. If the semaphores do not exist, create new ones.
  - IPC_CREAT | IPC_EXCL: create semaphores with key key. If the semaphores already exist, return -1, indicating an error.
- The flags must include the S_IRUSR | S_IWUSR flags to enable read & write permissions for the user creating the semaphores
  - The flags may also include flags to enable read & write permissions for users in your group and others (rest of the world). See the man pages for the various flags.
- Return value: a non-negative semaphore set identifier (integer) for use with the other semaphore system calls.
  - On errors this call returns -1 and errno is set to indicate the cause of the error

semop

- This system call can be used to perform the following operations:
  - Add a positive value to a semaphore (never blocks)
  - Add a negative value to a semaphore, blocking if the result would be negative
- int semop(int semid, struct sembuf *sops, size_t nsops)
  - The semid value must be a valid identifier returned by the semget syscall.
  - The sops parameter is an array of struct sembuf that contains information about the type of operation to be performed
  - nsops indicates the number of semaphore operations to be performed.
- Return value: on success returns 0, and -1 on error.
  - On error, errno is set to indicate the cause of the error

semctl

- This system call controls or deletes semaphores
- int semctl(int semid, int semnum, int cmd, ...)
  - The semid value must be a valid identifier returned by the semget syscall.
  - semnum identifies the semaphore within the set to be operated on (it is ignored by some commands, such as IPC_RMID).
  - cmd indicates the command to be performed, such as IPC_SET or IPC_RMID.
- Return value: on success the return value is 0 (zero). On errors, the return value is -1 and errno is set to indicate the cause of the error.

Multi-process Semaphore (1/2)

#include <iostream>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

void source(int semID);
void sink(int semID);

int main() {
    const int semID = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    const int pid = fork();
    if (pid == 0) {
        sink(semID);
    } else {
        source(semID);
        int exitCode;
        wait(&exitCode);
        // Remove the semaphore set (semnum is ignored for IPC_RMID)
        semctl(semID, 0, IPC_RMID);
    }
    return 0;
}

Multi-process Semaphore (2/2)

void source(int semID) {
    for (int i = 0; (i < 100); i++) {
        // Operation on semaphore 0: add +1 (never blocks)
        struct sembuf sops[1] = {{0, +1, 0}};
        // Produce
        semop(semID, sops, 1);
        std::cout << "+Process(" << getpid()
                  << ") produced " << i << std::endl;
    }
}

void sink(int semID) {
    for (int i = 0; (i < 100); i++) {
        // Operation on semaphore 0: add -1 (blocks if value is 0)
        struct sembuf sops[1] = {{0, -1, 0}};
        // Consume!
        semop(semID, sops, 1);
        std::cout << "-Process(" << getpid()
                  << ") consumed " << i << std::endl;
    }
}