QCDNA 2008, Sept 5, 2008, Rich Brower (Boston U.)
Disconnected Diagrams, Multi-grid, Nvidia & all that
Richard Brower (Boston University)
James Brannick (Penn)
Ron Babich (BU)
Kipton Barros (BU)
Mike Clark (BU)
George Fleming (Yale)
James Osborn (Argonne)
Claudio Rebbi (BU)
QCDNA 2008, Regensburg, Sept 5, 2008
WARNING: Much here is a FUTURE plan NOT proven results but .....
What is QCD?
Acronym   Definition
QCD       Qualified Charitable Distribution (IRS)
QCD       Quality, Cost, Delivery
QCD       Quantum Chromodynamics
QCD       QuarkCopyDesk (file extension)
QCD       Quasi-Cyclic Dyadic
QCD       Quick Change Directory
QCD       Quick Claim Deed (real estate)
QCD       Quintessential CD (PC media player)
QCD       Quit Claim Deed (real estate)
QCD       Quality Control Department
What do these mean?
QCD From Wikipedia, the free encyclopedia
Quintessential Player, formerly known as Quintessential CD
Quality, Cost, Delivery, a three-letter acronym used in lean manufacturing
Quad City DJ's, Southern rap group
Quick Control Dial, a control on many DSLR cameras, like the Canon EOS 40D
Quote-Comma-Delimited, known also as comma-separated values
Quantum chromodynamics, the theory describing the Strong Interaction
Outline
Physics: (How strange is the proton?)
Algorithms: (Multi-grid to the rescue?)
Hardware: (GPU propagator farm?)
Physics:
Disconnected Diagrams
Connected vs. Disconnected
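Want matrix element:
[Figure: connected and disconnected diagrams. The nucleon N propagates from t = 0 to t = t_f with an operator insertion X at t = t'; X acts on a u,d quark line (connected) or on a u,d,s quark loop (disconnected).]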
How strange† is the proton?
Who cares?
Violation of Standard Model:
Dark Matter (Neutralino scattering):
NuTeV anomaly:
Nucleon Physics (include u/d + s quarks):
iso-scalar form factors, nucleon structure function,
spin crisis for proton, matrix element etc.
† see Lattice 2008: http://conferences.jlab.org/lattice2008/parallel-bytopic-struct.html
S. Collins, G. Bali, A. Schafer, “Hunting for the strangeness ... nucleon”
Takumi Doi et al., “Strangeness and glue in the nucleon from lattice QCD”
Ron Babich et al., “Strange quark content of the nucleon”
Direct detection of dark matter
In SUSY, the neutralino scatters
from a nucleon via Higgs exchange:
The strange scalar matrix element is
a major uncertainty:
Uncertainty in $f_{Ts}$ gives up to a factor of 4 uncertainty in the cross-section!
Bottino et al., hep-ph/0111229; Ellis et al., hep-ph/0502001
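For reference, $f_{Ts}$ is conventionally defined as the strange-quark contribution to the nucleon mass. The expression below is the standard definition used in the dark-matter literature, given here on the assumption that it matches what the slide displayed; $m_s$ is the strange quark mass and $M_N$ the nucleon mass.

```latex
f_{Ts} = \frac{m_s \, \langle N | \bar{s} s | N \rangle}{M_N}
```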
Nuclear Experiment
Pate et al., arXiv:0805.2889 [hep-ex]
J. Liu et al., arXiv:0706.0226 [nucl-ex]
(see also Young et al., nucl-ex/0605010)
Parity-violating electron scattering (SAMPLE, HAPPEx, PVA4, G0)
PVES + BNL E734 (νp scattering)
Monte Carlo update (long autocorrelation times)
Global Heatbath, aka “Stochastic Estimator” (zero autocorrelations)
Find $\phi = D^{-1}\eta$ for $\eta$ Gaussian or Gauge or $Z_2$ (zero autocorrelations!)
with $\langle \eta_y \eta_x \rangle = \delta_{yx}$
Algorithm: $A_{xy}$
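The stochastic estimator described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the results: a small periodic 1D lattice operator stands in for the Dirac operator D, a dense solve stands in for the Krylov inverter, and the names `toy_operator` and `stochastic_trace_inverse` are hypothetical.

```python
import numpy as np

def toy_operator(n, mass_sq):
    """Periodic 1D lattice Laplacian plus mass term (an assumed stand-in for D)."""
    hop = np.roll(np.eye(n), 1, axis=1)        # nearest-neighbour hopping
    return (2.0 + mass_sq) * np.eye(n) - hop - hop.T

def stochastic_trace_inverse(D, n_noise, rng):
    """Estimate Tr D^{-1} as the average of eta^T D^{-1} eta over Z2 noise."""
    n = D.shape[0]
    samples = np.empty(n_noise)
    for k in range(n_noise):
        eta = rng.choice([-1.0, 1.0], size=n)  # Z2 noise: <eta_y eta_x> = delta_yx
        phi = np.linalg.solve(D, eta)          # phi = D^{-1} eta
        samples[k] = eta @ phi
    # Return the mean and its standard error over the noise ensemble.
    return samples.mean(), samples.std(ddof=1) / np.sqrt(n_noise)

rng = np.random.default_rng(0)
D = toy_operator(64, mass_sq=2.0)
estimate, error = stochastic_trace_inverse(D, n_noise=400, rng=rng)
exact = np.trace(np.linalg.inv(D))
print(f"exact {exact:.3f}, stochastic {estimate:.3f} +/- {error:.3f}")
```

Each noise vector is drawn independently, so successive samples carry no autocorrelation; the error comes only from the off-diagonal elements of $D^{-1}$.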
Improving Stochastic Estimate
Variance reduction:
  Dilution vs hopping parameter† (short distance)
  Multi-grid vs “deflation”/truncation† (long distance)
Curing volume divergence
Trace versus gauge fluctuations
Better and more sources (all to all?).
Full multi-grid O(N log N) trace?
† S. Collins, G. Bali, A. Schafer, “Hunting for the strangeness ... nucleon”
Trace estimation
Two sources of error: gauge noise and error in trace. In this calculation, we largely eliminate the second source by calculating a “nearly exact” trace on four time-slices.
864 sources (x 12 for color/spin). A given source is nonzero on 4 sites on each of 4 time-slices.
Minimal spatial separation between sites is [...]. Small residual contamination is gauge-variant and averages to zero.
Equivalent to using a single stochastic source with “extreme dilution.”
$4 \times 6^3 = 864$
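The effect of dilution can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the source layout used in the calculation: it uses a periodic 1D operator in place of D and splits one Z2 noise vector into `stride` interleaved sources, so that only sites separated by a multiple of `stride` contaminate each other. The function names are hypothetical.

```python
import numpy as np

def toy_operator(n, mass_sq):
    """Periodic 1D lattice Laplacian plus mass term (an assumed stand-in for D)."""
    hop = np.roll(np.eye(n), 1, axis=1)
    return (2.0 + mass_sq) * np.eye(n) - hop - hop.T

def trace_sample(D, rng, stride=1):
    """One noise sample of Tr D^{-1}; stride > 1 dilutes the source."""
    n = D.shape[0]
    eta = rng.choice([-1.0, 1.0], size=n)
    total = 0.0
    for offset in range(stride):
        source = np.zeros(n)
        source[offset::stride] = eta[offset::stride]   # keep every stride-th site
        total += source @ np.linalg.solve(D, source)   # one inversion per diluted source
    return total

rng = np.random.default_rng(1)
D = toy_operator(64, mass_sq=2.0)
exact = np.trace(np.linalg.inv(D))
plain = np.array([trace_sample(D, rng, stride=1) for _ in range(200)])
diluted = np.array([trace_sample(D, rng, stride=4) for _ in range(200)])
print(f"exact {exact:.4f}")
print(f"plain   mean {plain.mean():.4f}, variance {plain.var():.2e}")
print(f"diluted mean {diluted.mean():.4f}, variance {diluted.var():.2e}")
```

The price is one inversion per diluted source. In the limit where every source has a single nonzero site the trace is exact, which is the sense in which a widely spaced source set behaves like a single stochastic source with extreme dilution.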
Preliminary Methods
Configurations were provided by the LHPC “Spectrum Collaboration”
anisotropic lattice with 2 dynamical flavors, Wilson fermion and gauge actions
863 configurations
64 (x 12) inversions per configuration at the light quark mass, for the nucleon correlators
864 (x 12) inversions per configuration at the strange mass, for the trace
Strange scalar form factor
Ratio approach
Conventionally, one extracts the (e.g. zero-momentum) form factor from the large $t$ behavior of the ratio [...] (or from a similar expression integrated over time).
Instead, we fit the numerator directly, since this allows us
  to avoid contamination from backward-propagating states, which are problematic due to the short temporal extent of our lattice ([...]).
  to explicitly take into account the contribution of (forward-propagating) excited states.
In the following, we always treat the system symmetrically with [...]
Direct fit
First, we perform a fit to the nucleon two-point function, of the form [...]
The coefficients and masses are very well-determined, since we are required to calculate correlators from all initial times (a total of 863 x 64 = 55,232).
Next, we perform a fit to the three-point function, [...]
Here $j_1$ and $j_2$ are the form factors for the proton and its first excited state, and $j_{12}$ is a transition matrix element between them. In practice, we expect $j_2$ and $j_{12}$ to absorb the contribution of still higher states, and trust only $j_1$ to be reliable.
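The fit forms themselves did not survive in the text. As a hedged illustration only, a generic two-state ansatz with the features described here (forward-propagating ground and first excited states, source at $0$, operator at $t'$, sink at $t$) could be written as follows. The amplitudes $Z_i$, the masses $m_i$ and the normalization are assumptions, not the slide's own expressions.

```latex
\begin{align}
C_2(t) &= Z_1 e^{-m_1 t} + Z_2 e^{-m_2 t} \\
C_3(t, t') &= j_1 Z_1 e^{-m_1 t} + j_2 Z_2 e^{-m_2 t}
  + j_{12} \sqrt{Z_1 Z_2}
    \left[ e^{-m_1 t'} e^{-m_2 (t - t')} + e^{-m_2 t'} e^{-m_1 (t - t')} \right]
\end{align}
```

In this form the ratio $C_3/C_2$ tends to $j_1$ when both $t'$ and $t - t'$ are large, which is why only $j_1$ carries a clean physical interpretation.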
Strange scalar form factor
For the renormalization-invariant quantity $f_{Ts}$, we estimate [...]
where we have inserted the physical nucleon mass. The second error is the uncertainty in relating this mass to the lattice scale, the first error is statistical, and no other systematics are included.
Note that the matrix element in the numerator was calculated for a world with a 400 MeV pion. If we work consistently in such a world by inserting our calculated nucleon mass, the scale dependence drops out, and we find [...]
Momentum dependence of $G_S^s(q^2)$
PRELIMINARY
Strange axial form factor
PRELIMINARY
Results have not been renormalized.
Calculated value is distinct from zero at the 3σ level.
Error = $O(L^{3/2})$ as $L^3 \to \infty$
for exact trace in a connected correlator.
[Figure: correlator diagram with source at t = 0, operator insertion X at t = t', sink at t = t_f]
Most Important New Trick: Multi-grid Variance Reduction
The signal and variance of the first term are down by 1 to 2 orders of magnitude because $D_c \sim D$.
The coarse-level trace for $D_c^{-1}$ is as cheap to calculate as the level-down operator inverse.
This can of course be done recursively, giving (I think) an O(N log N) trace calculation to fixed tolerance.
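The idea can be checked on a toy problem. The sketch below assumes a two-level Galerkin splitting, Tr D^{-1} = Tr(D^{-1} - P D_c^{-1} P^T) + Tr(D_c^{-1} P^T P), which may differ in detail from the equation shown on the slide. It uses a periodic 1D operator with linear interpolation P rather than an adaptive multi-grid for the lattice Dirac operator, and evaluates the Z2 noise variance exactly instead of sampling it. All function names are hypothetical.

```python
import numpy as np

def toy_operator(n, mass_sq):
    """Periodic 1D lattice Laplacian plus mass term (an assumed stand-in for D)."""
    hop = np.roll(np.eye(n), 1, axis=1)
    return (2.0 + mass_sq) * np.eye(n) - hop - hop.T

def linear_prolongator(n):
    """Interpolate from n//2 coarse points (the even sites) to n fine sites."""
    nc = n // 2
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j, j] = 1.0                    # coarse point is copied
        P[2 * j + 1, j] = 0.5                # odd site: average of its neighbours
        P[2 * j + 1, (j + 1) % nc] = 0.5
    return P

def z2_variance(A):
    """Exact variance of eta^T A eta for Z2 noise and real symmetric A."""
    off_diagonal = A - np.diag(np.diag(A))
    return 2.0 * np.sum(off_diagonal ** 2)

D = toy_operator(64, mass_sq=0.01)           # light mass: long-range D^{-1}
P = linear_prolongator(64)
D_c = P.T @ D @ P                            # Galerkin coarse operator
D_inv = np.linalg.inv(D)
coarse_part = P @ np.linalg.inv(D_c) @ P.T   # P D_c^{-1} P^T

fine_term = np.trace(D_inv - coarse_part)    # would be estimated stochastically
coarse_term = np.trace(np.linalg.inv(D_c) @ (P.T @ P))  # trace on the coarse level
var_plain = z2_variance(D_inv)
var_fine = z2_variance(D_inv - coarse_part)
print(f"Tr D^-1 = {np.trace(D_inv):.4f} = {fine_term:.4f} + {coarse_term:.4f}")
print(f"Z2 variance: plain {var_plain:.1f}, fine-level remainder {var_fine:.3f}")
```

The coarse operator captures the long-distance modes, so the remainder that must still be estimated with noise on the fine level is short-ranged and has a much smaller variance. Applying the same splitting to the coarse trace gives the recursive version.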
HARDWARE
Graphics hardware is well suited to highly parallel numerical tasks.
Hardware vendors provide development tools to support high performance computing.
NVIDIA's CUDA offers direct access to graphics hardware through a programming language similar to C.
The Dirac-Wilson operator runs at an effective 68 Gigaflops on the Tesla C870 GPU.
The recently released GTX 280 GPU runs at 92 Gigaflops, and we expect improvement pending code optimization.
(Now 98 Gigaflops; hope to get O(150) Gigaflops.)
Nvidia GPU architecture
Two Generations: Consumer vs HPC GPUs
Consumer cards ⇒ High Performance (HPC) GPUs
I. 8800 GTX ⇒ Tesla C870 (16 multi-processors with 8 cores each)
II. GTX 280 ⇒ Tesla C1060 (30 multi-processors with 8 cores each)
C870 code using 60% of the memory bandwidth.
http://www.scala-lang.org/
Future software Plans
Need to find out why we are only saturating 60% of memory bandwidth
Further reduce memory traffic:
  8 real numbers per SU(3) matrix (2/3 of the 12 used now)
  share spinors in $4^3$ blocks (5/9 of that used now)
Generalize to clover Wilson & Domain Wall operator (slightly better flops/mem ratio).
DMA between GPUs on Quad system and network for cluster
Start to design SciDAC API for many-core technologies.
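The "12 used now" figure refers to storing only part of each SU(3) link and rebuilding the rest on the fly. A minimal sketch of that idea, assuming the common choice of keeping the first two rows and recovering the third from unitarity and unit determinant, is shown below. It is written in NumPy for clarity rather than as GPU code, the function names are hypothetical, and the 8-number parametrization mentioned in the plan is not shown.

```python
import numpy as np

def random_su3(rng):
    """Random SU(3) matrix from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    q, _ = np.linalg.qr(z)
    return q / np.linalg.det(q) ** (1.0 / 3.0)   # rescale the phase so det = 1

def compress12(u):
    """Keep the first two rows: 6 complex = 12 real numbers instead of 18."""
    return np.concatenate([u[0].real, u[0].imag, u[1].real, u[1].imag])

def reconstruct12(packed):
    """Rebuild the third row as the complex-conjugated cross product of rows 1 and 2."""
    a = packed[0:3] + 1j * packed[3:6]
    b = packed[6:9] + 1j * packed[9:12]
    c = np.conj(np.array([a[1] * b[2] - a[2] * b[1],
                          a[2] * b[0] - a[0] * b[2],
                          a[0] * b[1] - a[1] * b[0]]))
    return np.vstack([a, b, c])

rng = np.random.default_rng(2)
u = random_su3(rng)
packed = compress12(u)
u_back = reconstruct12(packed)
print(packed.size, np.max(np.abs(u_back - u)))
```

The reconstruction costs a handful of extra multiplications per link, which is a good trade when the kernel is limited by memory bandwidth rather than by arithmetic.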
Tesla 10-Series: What’s the Big Deal?
Consumer Chip GTX 280 ⇒ Tesla C1060
1U Quad S1070 System $8K
CUDA 2.0 (Compute Unified Device Architecture)
Can compile CUDA code into highly efficient SSE-based multi-threaded C code
Need a GPU Dirac Propagator Farm
The Clark-Kennedy RHMC Paradox: (The faster you go, the harder it is to keep up)
Analysis is now the “Ἀχιλλεύς heel”
Solution: Dedicated analysis farm.
GPU can deliver O(10) to O(100) gain in flops/$
Two quad Tesla ⇒ 1 sustained Teraflop!
Two quad Tesla @ 25K ≡ (?) One BG/L rack @ 2,000K
Commercial Break:
BOSTON POSTDOC IN SEPT 2009
PetaAPPS/SciDAC fellow
(QCDNA in Boston Fall 2009?)