
Dec 1, 2013





Transactional Memory: How to Perform Load Adaption in a Simple and Distributed Manner

Johannes Schneider, David Hasenfratz, Roger Wattenhofer










Without easy and efficient parallel programming methods… “computer science will become washing machine science.”

How to handle access to shared data?


Locks, Monitors…


Coarse-grained vs. fine-grained locking

Coarse-grained: easy, but slow programs

Fine-grained: demanding and time-consuming, but fast programs




















Problems


difficult


error prone


Composability









Coarse-grained locking (only 1 thread can execute):

lock all data
modify/use data
unlock all data

Fine-grained locking:

Thread 1: lock A, lock B, modify/use A,B, lock C, modify/use A,B,C, unlock A, modify/use B,C, unlock B,C

Thread 2: lock B, lock A, modify/use A,B, unlock A,B

Deadlock!
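The deadlock between the two threads can be reproduced in a few lines. The following is a minimal Python sketch, not taken from the slides: the names `lock_a`, `lock_b` and `worker` are illustrative, and a timeout is used so that the demo terminates instead of blocking forever.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_hold_first = threading.Barrier(2)    # both threads hold their first lock
both_tried_second = threading.Barrier(2)  # both threads finished their attempt
results = {}

def worker(name, first, second):
    with first:
        both_hold_first.wait()
        # Without the timeout this acquire would block forever (deadlock).
        got_second = second.acquire(timeout=0.2)
        if got_second:
            second.release()
        results[name] = got_second
        # Keep holding the first lock until the other thread has also given up.
        both_tried_second.wait()

t1 = threading.Thread(target=worker, args=("thread 1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("thread 2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(results)  # neither thread obtained its second lock
```

Each thread holds one lock and waits for the other, so neither can proceed. This is the situation that fine-grained locking forces the programmer to rule out by hand.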

Transactional memory (TM): a possible solution


Simple for the programmer




Composable





Idea from database community


Many TM systems (internally) still use locks


But the TM system (not the programmer) takes care of


Performance


Correctness (no deadlocks...)



Begin transaction
  modify/use data
End transaction

Method A.x()
  Begin transaction
    B.y()
  End transaction

Method B.y()
  Begin transaction
  End transaction

Transactional memory systems


If transactions modify



different data, everything is ok




the same data, conflicts arise that must be resolved


Transactions might get delayed or aborted



Job of a contention manager



A transaction keeps track of all modified values


It restores all values, if it is aborted


A transaction successfully finishes with a commit
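The bookkeeping described above (track modified values, restore them on abort, discard the log on commit) can be sketched with an undo log. This is a toy illustration under stated assumptions: the class `Transaction` and the dict `memory` are hypothetical names, and conflict detection and locking are left out entirely.

```python
class Transaction:
    """Toy transaction over a shared dict, using an undo log (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (key, old value) pairs, in write order

    def write(self, key, value):
        # Remember the old value before overwriting it.
        self.undo_log.append((key, self.memory[key]))
        self.memory[key] = value

    def abort(self):
        # Restore old values in reverse order of the writes.
        while self.undo_log:
            key, old = self.undo_log.pop()
            self.memory[key] = old

    def commit(self):
        # The changes become permanent; the undo log is no longer needed.
        self.undo_log.clear()

memory = {"A": 1, "B": 1}

t2 = Transaction(memory)
t2.write("B", 2)
t2.abort()    # B is restored to 1

t1 = Transaction(memory)
t1.write("A", 2)
t1.commit()   # A stays 2

print(memory)  # {'A': 2, 'B': 1}
```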





Conflicts: a contention manager decides

Abort or delay a transaction, i.e. adapt load

Distributed

Each thread has its own manager

Example

Initially: A=1, B=1

Manager 1, Trans. 1: A:=2

Manager 2, Trans. 2: B:=2, then A:=3 (conflict on A)

Either Trans. 1 aborts (undo all changes, i.e. set A:=1) and restarts (after a while),

or Trans. 2 aborts (set B:=1) and restarts, OR waits and retries

Delay to adapt load!

Prior work


Contention Managers [PODC03, PODC05, ISAAC09 …]

System load was not (explicitly) considered

Load adaption (based on contention)

Estimate contention intensity: CI [SPAA08]

If abort: $CI = a \cdot CI + (1 - a)$ with parameter $a \in [0, 1]$

If commit: $CI = a \cdot CI$

If $CI >$ parameter $b$ then resort to central scheduler



Keep a transaction queue per core [PODC08]

Central dispatcher assigns transactions to a core, i.e. its queue

Each core iteratively executes transactions from queue

If transaction A on core 1 is aborted due to B on core 2, then A is appended to the queue of core 2

Central scheduler will become a bottleneck
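The contention intensity estimate from [SPAA08] listed above is an exponential moving average over abort/commit events. A minimal sketch follows; the function names and the values chosen for a and b are illustrative assumptions, not taken from the slides.

```python
def update_ci(ci, aborted, a):
    """Return the new contention intensity after one abort or commit."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if aborted:
        return a * ci + (1.0 - a)
    return a * ci

def use_central_scheduler(ci, b):
    """Resort to the central scheduler once CI exceeds the threshold b."""
    return ci > b

a, b = 0.5, 0.7  # illustrative parameter values (assumption)
ci = 0.0
history = []
for aborted in [True, True, True, False]:
    ci = update_ci(ci, aborted, a)
    history.append((ci, use_central_scheduler(ci, b)))

print(history)
```

With these values, three aborts in a row push CI from 0 to 0.875, above the threshold, and a single commit halves it again to 0.4375.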




[Figure: transaction queues of Core 1 and Core 2 holding A, B, C, D, before and after B aborts A]
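The per-core queue scheme from [PODC08] can be sketched with two double-ended queues. The initial queue contents below (A and C on core 1, B and D on core 2) and the helper `on_abort` are assumptions made for illustration; the slides only state that an aborted transaction is appended to the queue of the core that aborted it.

```python
from collections import deque

# Assumed initial assignment by the central dispatcher.
queues = {1: deque(["A", "C"]), 2: deque(["B", "D"])}

def on_abort(queues, aborted_tx, aborted_core, winner_core):
    """Move aborted_tx to the end of the queue of the core that aborted it."""
    queues[aborted_core].remove(aborted_tx)
    queues[winner_core].append(aborted_tx)

# B (on core 2) aborts A (on core 1).
on_abort(queues, "A", aborted_core=1, winner_core=2)

print(list(queues[1]))  # ['C']
print(list(queues[2]))  # ['B', 'D', 'A']
```

Because A now sits behind B in the same queue, the two can no longer run concurrently and conflict again.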

This paper


Theoretical analysis


Decentralized (simple) approaches to load adaption based on contention

Strategies


Ignore: Do not learn from conflicts


ImmediateRestart


Stay real: Remember faced conflicts


SerializeFacedConflicts


Do not schedule prior conflicting transactions concurrently

Be cautious: Assume additional conflicts

SerializeAll

All transactions in a subgraph are assumed to conflict

[Figure: conflict graph over transactions A, B, C, D. A conflicted with C; D conflicted with B]
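One way to read "all transactions in a subgraph are assumed to conflict" is to group transactions by connected component of the conflict graph and run each group serially. The sketch below follows that reading, which is an assumption; the function name `conflict_groups` is hypothetical.

```python
from collections import defaultdict

def conflict_groups(transactions, conflicts):
    """Group transactions into connected components of the conflict graph."""
    adjacency = defaultdict(set)
    for u, v in conflicts:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, groups = set(), []
    for start in transactions:
        if start in seen:
            continue
        # Depth-first search collecting everything reachable from start.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups

# Conflict graph from the slide: A conflicted with C, D conflicted with B.
groups = conflict_groups(["A", "B", "C", "D"], [("A", "C"), ("D", "B")])
print(groups)  # [['A', 'C'], ['B', 'D']]
```

Under this reading, transactions within one group would be executed one after another, while different groups may still run in parallel.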

Load Adaption Strategies


AbortBackoff


If aborted, wait for a random time in $[0, 2^{\#aborts}]$

Priority = number of aborts (#aborts)


Who wins a conflict?


2 strategies


Estimate the work done


Unrelated to work done
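The AbortBackoff rule above amounts to randomized exponential backoff. A minimal sketch, assuming the waiting time is drawn uniformly and measured in abstract time units (neither is specified on the slide); the function names are illustrative.

```python
import random

def backoff_delay(num_aborts, rng=random):
    """Draw a waiting time uniformly from [0, 2**num_aborts]."""
    return rng.uniform(0.0, 2.0 ** num_aborts)

def priority(num_aborts):
    """Priority of a transaction equals the number of times it was aborted."""
    return num_aborts

rng = random.Random(0)  # fixed seed so that the run is reproducible
delays = [backoff_delay(k, rng) for k in range(1, 6)]
print(delays)
```

Every abort doubles the upper bound of the interval, so a transaction that keeps conflicting waits longer on average and thereby reduces the load.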






Theory Part: Model

$n$ transactions (and threads)

Start concurrently on $n$ cores

Transaction

sequence of operations

operation takes 1 time unit

duration (number of operations) $t_T$ is fixed


2 types of operations


Write = modify (shared) resource and lock it until commit


Compute/abort/commit



Ignore overhead of load adaption


Remembering transactions, scheduling…

[Figure: transactions A, B, …, Z running on Core 1, Core 2, …, Core n]

Moderate parallelism


Shared counter


Conflicts directly after transaction start


Linked List


Conflicts at arbitrary time


Expected time span until all transactions committed

[Table: Policy (ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll) vs. Counter and List; legend: transaction run time, #transactions]

Speed-up $\log n$ (at best)

Substantial parallelism


Worst case


Conflict graph is a $d$-ary tree of logarithmic height

Exponential gap in worst case

SerializeAll and others

[Table: Policy (ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll) vs. time until transactions committed]

[Figure: transactions T1, T2, T3, T4, T5]



Practical investigation


Remembering conflicts causes too much overhead


Good for analysis but not for implementation


Quickadapter


Serializes transactions


Each core has a “waiting” flag


If aborted, set flag and wait until flag unset


If commit, unset some flag



AbortBackoff



(Also considered some variants)
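The waiting-flag scheme described above can be sketched as plain bookkeeping. This is a single-threaded illustration under stated assumptions: the class `WaitingFlags` is hypothetical, "unset some flag" is interpreted here as "unset the first set flag", and a real implementation would need atomic operations and actual waiting.

```python
class WaitingFlags:
    """One 'waiting' flag per core (illustrative, not thread-safe)."""

    def __init__(self, num_cores):
        self.waiting = [False] * num_cores

    def on_abort(self, core):
        # The aborted core sets its flag and waits until the flag is unset.
        self.waiting[core] = True

    def on_commit(self, core):
        # "Unset some flag": here simply the first set flag (assumption).
        for other, flag in enumerate(self.waiting):
            if flag:
                self.waiting[other] = False
                return other
        return None

    def may_run(self, core):
        return not self.waiting[core]

flags = WaitingFlags(4)
flags.on_abort(1)
flags.on_abort(3)
woken = flags.on_commit(0)  # core 0 commits and releases one waiting core
print(woken, flags.may_run(1), flags.may_run(3))  # 1 True False
```

Each commit lets at most one waiting core resume, so the number of concurrently running transactions shrinks under contention and grows back as transactions commit.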


Practical investigation


Evaluation on 16 core machine


DSTM2 system


Visible readers


Six benchmarks


Little parallelism


Shared counter, Sorted List (accessed objects not released), Listcounter

Considerable parallelism

Red Black Tree, LFUCache, RandomAccessArray

Compare new load adaption policies to existing contention managers





Discussion









Hard to keep maximum throughput, also in [SPAA08, PODC08]


Even without conflicts


Improvement for 1 benchmark worsens another


On average better than schemes without load adaption


Conclusion


Simple and distributed load adaption strategies



Theory


(For now) constants and parameters matter a lot



Practice


Hard to keep load at peak for all usage patterns











Thanks for your attention!

Questions?



Analysis of AbortBackoff for counter

Recall: If aborted, wait for a random time in $[0, 2^{\#aborts}]$

Assume $\#aborts \sim \log(n t_T) + x$ (for some $x$)

Define: $a(x)$ := fraction of active nodes

$a(0) = 1$ (after time $\sim 2^{\log(n t_T)} = n t_T$ a constant fraction is still active)

Chance of a conflict for interval $[0, 2^{\#aborts}] = [0, 2^{\log(n t_T) + x}]$:

$\sim a(x)\, n t_T / 2^{\log(n t_T) + x} = a(x) / 2^x$

$a(x+1) = a(x) / 2^x = 1 / 2^{\sum_{i=0}^{x} i} \sim 1 / 2^{x^2}$

$a(\sqrt{\log n}) = 1 / 2^{(\sqrt{\log n})^2} = 1/n$

$\sum_{i=0}^{\log(n t_T) + \sqrt{\log n}} (\text{length of interval } i) = \sum_{i=0}^{\log(n t_T) + \sqrt{\log n}} 2^i \approx n t_T\, 2^{\sqrt{\log n} + 1}$

[Figure: transactions T1, T2, T3; $a(x)\, n t_T = \frac{3}{n} \cdot n \cdot t_T = 3 t_T$]
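The recurrence for the fraction of active nodes and the geometric sum of the interval lengths can be checked numerically. The following sketch only evaluates the formulas from this slide; the function names are illustrative.

```python
def active_fraction(x):
    """a(x) from the recurrence a(0) = 1, a(x+1) = a(x) / 2**x."""
    a = 1.0
    for i in range(x):
        a /= 2.0 ** i
    return a

def total_interval_length(m):
    """Sum of the interval lengths 2**i for i = 0..m."""
    return sum(2 ** i for i in range(m + 1))

# a(x+1) equals 1 / 2**(0 + 1 + ... + x) = 1 / 2**(x * (x + 1) / 2).
for x in range(6):
    print(x + 1, active_fraction(x + 1), 2.0 ** (-(x * (x + 1) // 2)))

# The geometric sum is 2**(m+1) - 1, i.e. just below 2**(m+1).
print(total_interval_length(10), 2 ** 11)
```

The exponent grows quadratically in x, which is why about $\sqrt{\log n}$ additional aborts suffice to bring the active fraction down to roughly $1/n$.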