1
Johannes Schneider
Transactional Memory:
How to Perform Load Adaption
in a Simple And Distributed Manner
Johannes Schneider
David
Hasenfratz
Roger
Wattenhofer
2
Johannes Schneider
“computer science will become washing machine science.“
Without easy and efficient parallel programming methods…
How to handle access to shared data?
Locks, Monitors…
Coarse grained vs. fine grained locking
easy but slow program demanding, time consuming but fast programs
Problems
difficult
error prone
Composability
…
Johannes Schneider
lock all data
modify/use data
unlock all data
lock
A
lock
B
modify/use A,B
lock
C
modify/use A,B,C
unlock A
modify/use B,C
unlock B,C
lock B
lock A
modify/use
A,B
unlock A,B
Deadlock!
Only 1 thread can
execute
3
Thread 1 Thread 2
Transactional memory(TM)

a possible solution
Simple for the programmer
Composable
Idea from database community
Many TM systems (internally) still use locks
But the TM system (not the programmer) takes care of
Performance
Correctness (no deadlocks...)
Johannes Schneider
Begin transaction
modify/use data
End transaction
Method
A.x
()
Begin Transaction
B.y
()
…
End Transaction
Method
B.y
()
Begin
transaction
…
End transaction
4
Transactional memory systems
If transactions modify
different data, everything is ok
the same data, conflicts arise that must be resolved
Transactions might get delayed or aborted
Job of a contention manager
A transaction keeps track of all modified values
It restores all values, if it is aborted
A transaction successfully finishes with a commit
Johannes Schneider
5
Abort or delay a transaction, i.e. adapt load
Distributed
Each thread has its own manager
Example
Initially: A=1, B=1
Manager 1 Manager 2
T1
Trans. 1
T1
Trans. 2
B:=2
…
A:=
3
…
conflict
…
A:=
2
…
A
bort
(undo all changes, i.e. set A
:=
1
)
and
restart (after a while)
T1
Trans.1
…
A:=
2
…
Trans. 2
B:=2
…
A:=
3
…
conflict
A
bort
(set B
:=
1
)
and
restart
OR
wait and retry
Conflicts
–
A contention manager decides
Johannes Schneider
6
Manager 1
Manager 2
Delay to adapt load!
Prior work
Contention Managers
[PODC
03
,PODC
05
,ISAAC
09
…]
System load was not (explicitly) considered
Load adaption (based on contention)
Estimate contention intensity: CI
[SPAA
08
]
If abort:
CI =
a
CI + (
1

a
) with parameter
a
[
0
,
1
]
If commit: CI =
a
CI
If CI > parameter
b
then resort to central scheduler
Keep a transaction queue per core
[PODC
08
]
Central dispatcher assigns transactions to a core, i.e. its queue
Each core iteratively executes transactions from queue
If transaction A on core
1
is aborted due to B on core
2
then A is appended to the queue of core
2
Central scheduler will become a bottleneck
Johannes Schneider
7
Core 1
Core 2
A
B
C
D
Core 1
Core 2
A
B
C
D
B aborts A
This paper
Theoretical analysis
Decentralized (simple) approaches to load adaption
based on contention
Johannes Schneider
8
Strategies
Ignore: Do not learn from conflicts
ImmediateRestart
Stay real: Remember faced conflicts
SerializeFacedConflicts
Do not schedule prior conflicting transactions
concurrently
Be cautious: Assume additional conflicts
SerializeAll
All
transactions in a
subgraph
are assumed to conflict
Johannes Schneider
9
B
A
D
C
Conflict graph
A conflicted with C
D conflicted with B
A
D
C
B
A
D
C
B
A
C
B
D
Load Adaption Strategies
AbortBackoff
If aborted wait for a random time [0,2
#aborts
]
Priority = number of aborts
#aborts
Who wins a conflict?
2 strategies
Estimate the work done
Unrelated to work done
Johannes Schneider
10
Theory Part

Model
n
transactions (and threads)
Start concurrently on
n
cores
Transaction
sequence of operations
operation takes 1 time unit
duration (number of operations)
t
T
is fixed
2 types of operations
Write = modify (shared) resource and lock it until commit
Compute/abort/commit
Ignore overhead of load adaption
Remembering transactions, scheduling…
Johannes Schneider
11
Core 1
Core
2
B
A
Core n
Z
…
A
Moderate parallelism
Shared counter
Conflicts directly after transaction start
Linked List
Conflicts at arbitrary time
Expected time span until all transactions committed
Speed

up log n (at best)
Johannes Schneider
12
Policy
Counter
List
ImmediateRestart
AbortBackoff
SerializeFacedConflicts
SerializeAll
Transaction run time
#transactions
Substantial parallelism
Worst case
Conflict graph is d

ary
tree of logarithmic height
Exponential gap in worst case
SerializeAll
and others
Johannes Schneider
13
Policy
Time until transactions
committed
ImmediateRestart
AbortBackoff
SerializeFacedConflicts
SerializeAll
T1
T2
T3
T4
T5
…
Practical investigation
Remembering conflicts causes too much overhead
Good for analysis but not for implementation
Quickadapter
Serializes transactions
Each core has a “waiting” flag
If aborted, set flag and wait until flag unset
If commit, unset some flag
AbortBackOff
(Also considered some variants)
Johannes Schneider
14
Practical investigation
Evaluation on 16 core machine
DSTM2 system
Visible readers
Six benchmarks
Little parallelism
Shared counter, Sorted List (accessed objects not released),
Listcounter
Considerable parallelism
Red Black Tree,
LFUCache
,
RandomAccessArray
Compare new load adaption policies to existing contention
managers
Johannes Schneider
15
Discussion
Hard to keep maximum throughput,
also in [SPAA08, PODC08]
Even without conflicts
Improvement for 1 benchmark worsens another
On average better than schemes without load adaption
16
Johannes Schneider
Conclusion
Simple and distributed load adaption strategies
Theory
(For now) constants and parameters matter a lot
Practice
Hard to keep load at peak for all usage patterns
17
Johannes Schneider
18
Johannes Schneider
\
vspace
{10pt}
Thanks for your attention!
Questions?
???
Analysis
AbortBackoff
for counter
Recall: If aborted wait for a random time [0,2
#aborts
]
Assume #aborts ~ log (
nt
T
) + x
(for some x)
Define: a(x) := fraction of active nodes
a(0) = 1 (after time ~2
log (
nt
T
)
=
nt
T
a constant fraction still active)
Chance conflict for interval [0,2
#aborts
]
Interval [0, 2
log(
ntT
)+x
]
~ a(x)
nt
T
/ 2
log (
nt
T
) +x
= a(x) /2
x
a(x+1) = a(x)/2
x
= 1/2
∑
i=0..x
i
~ 1/2
x
2
a(√log n) = 1/2
(√log n)
2
= 1/n
∑
i
=0.. log (
nt
T
) +√log n
length interval =
∑
i
=0.. .. log (
nt
T
) +√log n
2
i
=
nt
T
2
√log n+1
Johannes Schneider
19
T1
T2
T3
a(x)
nt
T
= 3/n
n
t
T
= 3t
T
Comments 0
Log in to post a comment