Pitfalls in Teaching Development and Testing of Concurrent Programs and How to Overcome them

Dec 1, 2013


Eitan Farchi

IBM Labs in Haifa

2

Contest

Objectives of the course I wanted to teach


Background


The process abstraction, mutual exclusion and conditional synchronization, scheduling policies
and fairness, the process life cycle, synchronization primitives (semaphores, monitors), message
passing, logical time, examples,…


Design the protocol through an abstraction


Use atomic and atomic wait primitives


(c1, s1) -> s2 => (c1 || c2, s1) -> (c2, s2)


(c1, s1) -> s2 => (<c1>, s1) -> s2


(b, s1) -> true and (c, s1) -> s2 => (<await b -> c>, s1) -> s2


The use of higher abstraction level synchronization primitives leads to


Lower number of possible interleavings


Mistakes are less likely


Design is validated through


Reviewing the important interleavings


Formal reasoning (invariants, proofs, model checking,…)


Higher abstraction level synchronization primitives are correctly translated to lower abstraction level
synchronization primitives


For example, an atomic primitive is carefully translated to locks and unlocks


Bug patterns are used to avoid mistakes


The implementation is tested using ConTest


At this stage a good test plan is readily available from the previous development phases





I taught this course for several years in various formats


To third year computer science students


To professional programmers


With and without experience in development of concurrent programs


At least first degree in computer science


To testers with varying degrees of programming skill



Real world description of the ticket algorithm (start with something concrete)


Some stores/government offices employ the following method to ensure
that customers are serviced in order of arrival


Upon entering the store, a customer draws a number that is larger
than the number held by any other customer


The customer then waits until all customers holding smaller numbers
have been serviced


This algorithm is implemented by a number dispenser and by a
display indicating which customer is being served


If the store has one employee behind the service counter, customers
are served one at a time in their order of arrival


High level implementation of the ticket algorithm

var number := 1, next := 1, turn[1:n] := ([n], 0)

P[i: 1..n]:: do true ->
    <turn[i] := number, number := number + 1>
    <await turn[i] == next>
    critical section
    <next := next + 1>
    non-critical section
od
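
One possible way to map the protocol above onto Java is sketched below. It is an illustration only, not the implementation used in the course: the class names TicketLock and TicketLockDemo are hypothetical, the two atomic blocks are approximated with AtomicInteger operations, and the await is approximated by a busy-wait loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Ticket lock sketch: number is the dispenser, next is the display. */
final class TicketLock {
    private final AtomicInteger number = new AtomicInteger(1); // next ticket to hand out
    private final AtomicInteger next = new AtomicInteger(1);   // ticket currently being served

    /** <turn := number, number := number + 1> followed by <await turn == next>. */
    int lock() {
        int turn = number.getAndIncrement(); // atomically draw a ticket
        while (next.get() != turn) {         // busy-wait until our ticket is displayed
            Thread.yield();
        }
        return turn;
    }

    /** <next := next + 1>; called only by the thread leaving the critical section. */
    void unlock() {
        next.incrementAndGet();
    }
}

final class TicketLockDemo {
    /** Each of `threads` workers increments a plain counter `rounds` times under the lock. */
    static int run(int threads, int rounds) {
        TicketLock lock = new TicketLock();
        int[] counter = {0}; // unsynchronized on purpose; protected only by the ticket lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    lock.lock();
                    try {
                        counter[0]++;        // critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter[0];
    }
}
```

With the lock in place, TicketLockDemo.run(4, 1000) should return 4000, since every increment happens inside the critical section.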


Mapping of the previous two abstraction levels (real world and
high level descriptions)


<turn[i] := number, number := number + 1> // customer obtains a ticket


<await turn[i] == next> // customers wait their turn


<next := next +1> // call for next customer


Testing/validating the protocol


Even if the synchronization primitives are high level there are typically too many
interleavings to review


This is addressed by inductive proof, invariants


Assuming process i entered the critical section then


turn[i] == next right after <await turn[i] == next>.


It is easy to prove that turn[i] <> turn[j] if i <> j and turn[i] <> 0 and turn[j] <> 0


Thus, as long as the critical section is not exited, any process that reaches <await turn[i] == next> will have to wait, and at most one process can enter the critical section.
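
For a small instance, the mutual exclusion argument above can also be checked mechanically by enumerating every interleaving. The sketch below is a hypothetical illustration (the class TicketModel and its state encoding are assumptions, not part of the course material): two processes each run the protocol once, and the search reports whether any reachable state has both of them in the critical section.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Explores every interleaving of two processes that each run the ticket protocol once. */
final class TicketModel {
    // State layout: {pc1, pc2, number, next, turn1, turn2}
    // pc: 0 = before drawing a ticket, 1 = at await, 2 = in critical section, 3 = done

    /** Successor state if process p (0 or 1) can take a step, or null if it is blocked or finished. */
    static int[] step(int[] s, int p, boolean awaitIsNoOp) {
        int[] n = s.clone();
        switch (s[p]) {
            case 0:                                   // <turn := number, number := number + 1>
                n[4 + p] = s[2];
                n[2] = s[2] + 1;
                n[p] = 1;
                return n;
            case 1:                                   // <await turn == next>
                if (!awaitIsNoOp && s[4 + p] != s[3]) return null; // blocked
                n[p] = 2;
                return n;
            case 2:                                   // <next := next + 1>
                n[3] = s[3] + 1;
                n[p] = 3;
                return n;
            default:
                return null;                          // process has finished
        }
    }

    /** True if no reachable state has both processes in the critical section. */
    static boolean mutualExclusionHolds(boolean awaitIsNoOp) {
        Deque<int[]> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(new int[] {0, 0, 1, 1, 0, 0});
        while (!work.isEmpty()) {
            int[] s = work.pop();
            if (!seen.add(Arrays.toString(s))) continue; // state already explored
            if (s[0] == 2 && s[1] == 2) return false;    // invariant violated
            for (int p = 0; p < 2; p++) {
                int[] n = step(s, p, awaitIsNoOp);
                if (n != null) work.push(n);
            }
        }
        return true;
    }
}
```

Calling mutualExclusionHolds(false) explores the protocol as written; passing true replaces the await with a no-op, which lets the search find a state that violates the invariant.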


Students




“We don’t like mathematics and we don’t like proofs, in fact, we hate them”





“And by the way, the ticket algorithm is ridiculously simple: it’s only a loop with four lines of code”


Maybe they don’t understand there is an exponential space of possible
interleavings?




Objectives of the course - updated


Background


The process abstraction, mutual exclusion and conditional synchronization, scheduling policies and fairness, the
process life cycle, synchronization primitives (semaphores, monitors), message passing, logical time, examples,…


Design the protocol through an abstraction


Use atomic and atomic wait primitives


(c1, s1) -> s2 => (c1 || c2, s1) -> (c2, s2)


(c1, s1) -> s2 => (<c1>, s1) -> s2


(b, s1) -> true and (c, s1) -> s2 => (<await b -> c>, s1) -> s2


The use of higher abstraction level synchronization primitives leads to


Lower number of possible interleavings


Mistakes are less likely


Design is validated through


Systematically representing the set of possible interleavings


Typically through the use of Cartesian product models


Reviewing the important interleavings


Higher abstraction level synchronization primitives are correctly translated to lower abstraction level synchronization
primitives


For example, an atomic primitive is carefully translated to locks and unlocks


Bug patterns are used to avoid mistakes


The implementation is tested using ConTest


At this stage a good test plan is readily available from the previous development phases





Helping the students realize that there is an exponential
interleaving space



First attempt - counting


The number of possible interleavings is enormous


For (a;b;c;e;f;g)||(h;i;j;k;l;m) of non-blocking atomic actions the number of possible traces is 12!/(6!*6!) = 924
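
The count above generalizes to two sequences of m and n non-blocking atomic actions, which have (m+n)!/(m!*n!) interleavings. A minimal sketch for computing it follows; the class name Interleavings is made up for illustration.

```java
import java.math.BigInteger;

final class Interleavings {
    /** Interleavings of two sequences of m and n non-blocking atomic actions: (m+n)!/(m!*n!). */
    static BigInteger count(int m, int n) {
        BigInteger result = BigInteger.ONE;
        for (int k = 1; k <= n; k++) {
            // After this step, result equals the binomial coefficient C(m + k, k).
            result = result.multiply(BigInteger.valueOf(m + k)).divide(BigInteger.valueOf(k));
        }
        return result;
    }
}
```

Interleavings.count(6, 6) gives 924, matching the slide, while two sequences of only 20 actions each already give 137846528820.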


Second attempt - riddles


100 threads are executing x++ on a shared variable initialized to 0,
what are the possible outcomes?
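
One way to explore the riddle without relying on the scheduler is to simulate schedules deterministically. The sketch below makes a simplifying assumption that x++ is two separate steps, a read followed by a write of the read value plus one; the class LostUpdate and the parameter overlapping are hypothetical names introduced for illustration.

```java
final class LostUpdate {
    /**
     * Simulates `threads` threads that each execute x++ once, modelled as read then write.
     * The first `overlapping` threads all read before any of them writes;
     * the remaining threads run one at a time.
     */
    static int simulate(int threads, int overlapping) {
        int x = 0;
        int[] local = new int[overlapping];
        for (int t = 0; t < overlapping; t++) local[t] = x;     // all overlapping threads read 0
        for (int t = 0; t < overlapping; t++) x = local[t] + 1; // each then writes back 1
        for (int t = overlapping; t < threads; t++) {           // the rest run serially
            int read = x;
            x = read + 1;
        }
        return x;
    }
}
```

Under this model, simulate(100, 1) yields 100 (no update is lost) and simulate(100, 100) yields 1 (all updates but one are lost); intermediate values of overlapping produce the outcomes in between.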


Students


“OK, there are many things happening together in parallel and they can occur in many ways, but it is hard, too hard, to think about things happening in parallel”



Serialization helps understand the algorithm

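Process 1                                  Process 2                                  number  next  turn[1]  turn[2]
-----------------------------------------  -----------------------------------------  ------  ----  -------  -------
(initial values)                                                                      1       1     0        0
<turn[1] := number, number := number + 1>                                             2       1     1        0
<await turn[1] == next>                    <turn[2] := number, number := number + 1>  3       1     1        2
                                           <await turn[2] == next> blocks
critical section
<next := next + 1>                                                                    3       2     1        2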


Serialization helps understand the algorithm (Continued)

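Process 1                                  Process 2                                  number  next  turn[1]  turn[2]
-----------------------------------------  -----------------------------------------  ------  ----  -------  -------
                                           returns
                                           critical section
                                           <next := next + 1>                         3       3     1        2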


Next we implement the protocol


Students




“Locks are easy to use, no need to read the instructions”


Avoid errors by understanding the synchronization primitives [precise-java]


In Java each object is associated with a lock


Consider the following class

class Conflict {
    Conflict(…) { synchronized (Conflict.class) {…} }

    synchronized static void f(…) {…}

    synchronized void g(…) {…}

    void h(…) {
        synchronized (this) {…}
    }

    void r(…) {…}
}


Which of the following pairs of methods when executing concurrently can cause
a conflict?


f || g, f || h, f || r, g || h, g || r, h || r


Pairs of the constructor method and one of the other methods
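
A way to reason about the question above is to ask which lock object each kind of method holds while it runs. The sketch below is a hypothetical probe class (LockProbe is not part of the original material) that mirrors the four locking styles of Conflict and reports, via Thread.holdsLock, whether the class lock and the instance lock are held.

```java
final class LockProbe {
    /** Like f: a static synchronized method. Returns {holds class lock, holds lock of p}. */
    static synchronized boolean[] inStatic(LockProbe p) {
        return new boolean[] {Thread.holdsLock(LockProbe.class), Thread.holdsLock(p)};
    }

    /** Like g: a synchronized instance method. */
    synchronized boolean[] inInstance() {
        return new boolean[] {Thread.holdsLock(LockProbe.class), Thread.holdsLock(this)};
    }

    /** Like h: a synchronized (this) block inside an ordinary method. */
    boolean[] inBlock() {
        synchronized (this) {
            return new boolean[] {Thread.holdsLock(LockProbe.class), Thread.holdsLock(this)};
        }
    }

    /** Like r: no synchronization at all. */
    boolean[] unsynchronizedMethod() {
        return new boolean[] {Thread.holdsLock(LockProbe.class), Thread.holdsLock(this)};
    }
}
```

Two methods can only exclude each other if they hold the same lock object, so comparing the four results pair by pair suggests which of the listed combinations may run at the same time.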





Translating from abstract to concrete - implementation pitfalls are explained


Difference between atomicity and locking


What is the protection provided by synchronized(o){x++} occurring in parallel to x++?


When translating from an atomic block to locks/unlocks we need to identify all program locations that contend on the shared resource


Check that the lock was obtained; this is not good:

lock()
unlock()


Check that the lock was released along all error paths


What happens if a signal is taken while in the critical section (pthreads)?


What happens if an interrupt exception is taken while in wait()?


try {
    synchronized (o) {
        o.wait();
    }
} catch (Exception e) {
}





When atomic conditional wait is implemented we typically introduce a race and we need to recheck the
condition once in the critical section


Teaching the pitfalls is highly effective in shortening the learning curve
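
Two of the pitfalls above, releasing the lock along all error paths and rechecking the condition after waking up, can be sketched together. The example is an assumption-laden illustration rather than the course's code: the class Gate and its methods are hypothetical, and java.util.concurrent.locks is used instead of synchronized/wait so that the unlock is explicit.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class Gate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition opened = lock.newCondition();
    private boolean open = false;

    /** Rough equivalent of <await open>: blocks until the gate is open. */
    void pass() throws InterruptedException {
        lock.lock();
        try {
            while (!open) {       // recheck: being woken does not guarantee the condition holds
                opened.await();
            }
        } finally {
            lock.unlock();        // released on every path, including InterruptedException
        }
    }

    void open() {
        lock.lock();
        try {
            open = true;
            opened.signalAll();
        } finally {
            lock.unlock();
        }
    }

    boolean lockIsFree() {
        return !lock.isLocked();
    }

    /** Interrupts a waiting thread and reports whether the lock was released afterwards. */
    static boolean lockReleasedAfterInterrupt() {
        Gate gate = new Gate();
        Thread waiter = new Thread(() -> {
            try {
                gate.pass();
            } catch (InterruptedException e) {
                // Expected here: the thread exits, and finally has already unlocked.
            }
        });
        waiter.start();
        waiter.interrupt();
        try {
            waiter.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return gate.lockIsFree();
    }
}
```

Replacing the while with an if, or moving unlock() out of the finally block, reintroduces the corresponding pitfall.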


Hiding the protocol implementation


Prepare general synchronization services for the system, located in a separate class (see picture on the right)


Students

“OK, but we’ll implement the protocol all over the place anyway”


Hard to teach without real life large
systems experience



Hard to suggest to engineers who maintain an existing system that is not built like that



If it’s not broken, don’t fix it…







Testing


Running a test that has a concurrency problem many times does not necessarily reproduce the problem


Especially in unit test environments


Easy to demonstrate through examples


Create an “empty test” in which the synchronization primitives used are mapped to no-ops and show that the protocol “works fine”


Best practice: your test should at least expose a problem with the “empty implementation”


Running black box tests that have the required contention (e.g., customers
accessing the ticketing system simultaneously) does not necessarily produce the
white box contention you are after




For example, the blocking in <await turn[i] == next> should both occur and not occur


A context switch should occur right before and right after <await turn[i] == next>


Defining the coverage tasks you are after and checking their “coverage” helps


E.g., ConTest synchronization coverage
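
The “empty implementation” practice above can be sketched as a small harness. Everything in it is hypothetical (the Mutex interface, the class name, the 500 ms timeout), and it does not use ConTest; instead it forces the interesting context switch by hand, holding each thread between its read and its write until the other thread has read too or the timeout expires.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class EmptyImplementationTest {
    interface Mutex { void lock(); void unlock(); }

    static Mutex real() {
        ReentrantLock l = new ReentrantLock();
        return new Mutex() {
            public void lock() { l.lock(); }
            public void unlock() { l.unlock(); }
        };
    }

    /** The "empty implementation": both primitives are no-ops. */
    static Mutex empty() {
        return new Mutex() {
            public void lock() { }
            public void unlock() { }
        };
    }

    /** Two threads increment a counter under `mutex`, pausing between the read and the write. */
    static int run(Mutex mutex) {
        int[] counter = {0};
        CountDownLatch bothHaveRead = new CountDownLatch(2);
        Runnable body = () -> {
            mutex.lock();
            try {
                int read = counter[0];
                bothHaveRead.countDown();
                // Wait until the other thread has read as well, or give up after 500 ms.
                bothHaveRead.await(500, TimeUnit.MILLISECONDS);
                counter[0] = read + 1;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                mutex.unlock();
            }
        };
        Thread a = new Thread(body), b = new Thread(body);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter[0];
    }
}
```

With real() the second thread cannot read until the first has written, so the result is 2; with empty() both threads read 0 before either writes, so an update is lost and the test exposes the missing synchronization.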


BACKUP


Exercises - knowing the synchronization primitives (Java)


100 threads execute i++ where i is a global variable. Describe all possible outcomes


The following thread is interrupted while waiting at the foo.wait() statement below (shown in blue on the slide)

try {
    synchronized (foo) {
        foo.wait();
    }
} catch (Exception e) {
}


Is the thread still holding the lock, and is the thread's interrupt bit turned on, at the catch clause above (shown in red on the slide)? What are the answers to the same questions if we change the program to:

synchronized (foo) {
    try {
        foo.wait();
    } catch (Exception e) {
    }
}
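
The two variants above can be probed experimentally. The sketch below is a hypothetical helper (WaitInterruptProbe is not from the original slides) that catches InterruptedException rather than Exception and records, inside the catch block, what Thread.holdsLock and isInterrupted report.

```java
final class WaitInterruptProbe {
    /** Returns {holdsLock(foo), interrupt flag} as observed inside the catch block. */
    static boolean[] probe(boolean catchInsideSynchronized) {
        Object foo = new Object();
        boolean[] seen = new boolean[2];
        Thread t = new Thread(() -> {
            if (catchInsideSynchronized) {
                synchronized (foo) {
                    try {
                        foo.wait();
                    } catch (InterruptedException e) {
                        seen[0] = Thread.holdsLock(foo);
                        seen[1] = Thread.currentThread().isInterrupted();
                    }
                }
            } else {
                try {
                    synchronized (foo) {
                        foo.wait();
                    }
                } catch (InterruptedException e) {
                    seen[0] = Thread.holdsLock(foo);
                    seen[1] = Thread.currentThread().isInterrupted();
                }
            }
        });
        t.start();
        t.interrupt(); // wait() throws whether the interrupt arrives before or during the wait
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

In both variants the interrupt flag is already cleared when the exception is caught; the variants differ only in whether the monitor of foo is still held at that point.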



Exercises - knowing the synchronization primitives (Java)


What happens if one thread executes the following method recursively, e.g., by executing factorial(7)?

synchronized int factorial(int i) {
    if (i == 0)
        return 1;
    else
        return i * factorial(i - 1);
}
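
The exercise can be tried directly. Java's intrinsic locks are reentrant, so a thread that already owns an object's monitor can enter another synchronized method on the same object; the wrapper class name below is an assumption added only to make the method runnable.

```java
final class SynchronizedFactorial {
    /** Each nested call re-enters the monitor that the calling thread already owns. */
    synchronized int factorial(int i) {
        if (i == 0) {
            return 1;
        } else {
            return i * factorial(i - 1);
        }
    }
}
```

new SynchronizedFactorial().factorial(7) returns 5040 without blocking.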


Will Parallel Programming Become Common Knowledge and the Parallel
Programmer the Programmer of the future?




It is hard to teach parallel programming development and verification to novices


Comprehending the space of possible interleavings is hard


Accurately and correctly defining the behavior of many threads acting in
parallel is hard


With the introduction of multi-core, there is an increasing need for programmers who are able to reliably develop parallel programs


But maybe a different solution is possible?


Can we avoid the need for the parallel programmer?


Can we have the compiler or the programming language encapsulate the
difficulties of parallelism and return the genie to the bottle?


Will parallel programming become common knowledge and the parallel
programmer the agent of the next revolution in programming paradigms?




Will Parallel Programming Become Common Knowledge and the Parallel Programmer the Programmer of the future? (continued)


How will future multi-core systems be programmed? How well do existing primitives address various application domains and how well do they coexist? (3)


What is the role of high-level primitives (e.g., the transaction model)? Can it hide performance? (3)


Is the major difficulty in programming parallel programs testing them? (2)


How do we address students' huge difficulties in predicting possible interleavings and, most especially, the unwanted/undesired ones? (2)


What courses should be added to the curriculum and what should be taught on the job? (2)


What is the minimum knowledge one needs if the underlying program is parallel? To be more specific, most programmers probably know close to nothing about compiler optimization and about the processor structure. Will they need more knowledge in the future, or can the details be hidden from them? (1)


What will be the minimum knowledge needed by a parallel programmer and how will he or she acquire it, with emphasis on testing/debugging? (1)