Introduction
Self-Organizing Map
Algorithms
Results
Practical Tips
Multivariate Analysis Techniques at the LHC
Eric Malmi
Helsinki Institute of Physics/Adaptive Informatics Research Centre,
Aalto University (Helsinki University of Technology)
January 6, 2010
Outline
Introduction
Self-Organizing Map
Algorithms
Neural Networks
Support Vector Machines
Gene Expression Programming
Multiclass classification
Results
Practical tips
Introduction
Mikael Kuusela, Jerry W. Lamsa, Eric Malmi, Petteri Mehtala, and Risto Orava. Multivariate techniques for identifying diffractive interactions at the LHC. International Journal of Modern Physics A, to appear.
Data
Proton-proton scattering events in four categories:
Single diffractive (SD)
Double diffractive (DD)
Central diffractive (CD)
Non-diffractive processes (ND)
Generated by the PYTHIA (SD, ND) and PHOJET (DD, CD) Monte Carlo generators
12,000 events of each category: 10,000 for training and 2,000 for testing (SD x2)
23 variables => 23-dimensional data vectors to be classified
Instead of the usual signal/background separation, our task is to determine the diffraction types of the events, i.e. to give labels in the range 1-4 for the data vectors
This is a multiclass pattern recognition (= classification) problem
Variable   Comments
E_zdcl     ZDC energy left
E_casl     CASTOR energy left
E_hfl      HF energy left
t2ml       T2 multiplicity left
t1ml       T1 multiplicity left
fwdm1l     FSC multiplicity left, planes 1-2
fwdm2l     FSC multiplicity left, planes 3-8
fwdm3l     FSC multiplicity left, planes 9-10
fwd1stl    1st FSC plane hit left
fwdmaxl    FSC plane with the max. hits left
e_zdcr     ZDC energy right
e_casr     CASTOR energy right
e_hfr      HF energy right
t2mr       T2 multiplicity right
t1mr       T1 multiplicity right
fwdm1r     FSC multiplicity right, planes 1-2
fwdm2r     FSC multiplicity right, planes 3-8
fwdm3r     FSC multiplicity right, planes 9-10
fwd1str    1st FSC plane hit right
fwdmaxr    FSC plane with the max. hits right
endc_l     CMS endcap energy left
endc_r     CMS endcap energy right
barrel     CMS barrel energy
Exploratory data analysis by the self-organizing map
The self-organizing map (SOM) is a computational method which can be used, e.g., for dimensionality reduction and data visualization
SOM conducts a nonlinear mapping from the 23-dimensional space to a two-dimensional map
Gives us a qualitative view of the data:
  Which event types are easily distinguished and which are overlapping
  Which are the relevant features (detectors) for distinguishing certain event types
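A minimal sketch of how such a map could be trained, using only the Python standard library. The grid size, the linear decay schedules and the Gaussian neighbourhood are illustrative assumptions, not the settings used in the study; the function names are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the map node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM and return its grid of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per map node, initialised from randomly chosen data points.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    # Nodes close to the winner on the grid are pulled more strongly.
                    d2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-d2 / (2.0 * sigma ** 2))
                    w = weights[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, each 23-dimensional event can be placed on the two-dimensional grid with `best_matching_unit`, and colouring the nodes by event type gives the qualitative view described above.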
Neural Networks
We use the multilayer perceptron (MLP) network:
An MLP consists of an input layer, an output layer and one or more hidden layers of neurons
1. Data vector x is fed to the input layer consisting of 23 nodes.
2. From there it propagates to the hidden layer, where we apply the transfer function f(x) = tanh(x).
3. Finally it goes to the output node(s), which define the event category.
$y = B\,f(Ax + a) + b$
The network is trained by the backpropagation algorithm to give label 1 to the signal events and 0 to the background
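The forward pass y = B f(Ax + a) + b with f = tanh can be sketched in a few lines of plain Python. The function name and the toy weights below are hypothetical; in practice A, a, B and b would come from backpropagation training, which is not shown here.

```python
import math


def mlp_forward(x, A, a, B, b):
    """Evaluate y = B f(Ax + a) + b for one hidden layer with f = tanh.

    A and B are lists of rows (hidden x input and output x hidden),
    a and b are the corresponding bias vectors.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(A, a)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + bias
        for row, bias in zip(B, b)
    ]


# Toy example with 2 inputs, 2 hidden nodes and 1 output (made-up weights).
y = mlp_forward([1.0, 2.0],
                A=[[0.5, -0.25], [0.1, 0.3]], a=[0.0, 0.1],
                B=[[1.0, -1.0]], b=[0.2])
print(y)
```

For the real data the input vector would have 23 components, and the single output would be compared with a threshold to decide between label 1 and label 0.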
Support Vector Machines
The idea in SVM is to find a hyperplane that separates two different data samples with the largest possible margin
Usually the data vectors are first projected into a higher-dimensional space
However, we only need to define the dot product in the high-dimensional space, called the kernel function $K(x, y)$ (the kernel trick)
We use the popular radial basis function:
$K(x, y) = \exp(-\gamma \|x - y\|^2)$
Finding the hyperplane is a quadratic optimization problem
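As an illustration of the kernel trick, the radial basis function kernel can be evaluated directly on the original data vectors without ever constructing the high-dimensional space. The sketch below assumes a width parameter called `gamma` (its value is a free choice, not taken from the study) and hypothetical function names.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, gamma=1.0):
    """Matrix of all pairwise kernel values, the input to the quadratic optimization."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


points = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
for row in gram_matrix(points, gamma=0.5):
    print(["%.4f" % value for value in row])
```

The kernel equals 1 for identical vectors and falls towards 0 as the vectors move apart, with `gamma` controlling how quickly.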
Gene Expression Programming
Gene Expression Programming (GEP), introduced in 2001, is an evolutionary algorithm that has similarities with genetic algorithms (GA) and genetic programming (GP)
The main idea is to mimic biological evolution to evolve a population of simple text strings called chromosomes
The chromosomes, in turn, encode complex expression trees that can be used for classification
For each generation of chromosomes we select the best individuals and apply crossover and mutation to produce the offspring
Nodes of the expression trees consist of mathematical functions, input variables and random constants. E.g.
*/aQbcaacb
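The example chromosome can be decoded into its expression tree by reading the symbols breadth-first, filling each function's arguments from left to right. The sketch below is an illustration under two assumptions that the slide does not state: that `Q` denotes the square root (the usual GEP convention) and that `a`, `b`, `c` are input variables. The function names are hypothetical.

```python
import math
from collections import deque

# Assumed function set: number of arguments and implementation per symbol.
ARITY = {"*": 2, "/": 2, "+": 2, "-": 2, "Q": 1}
FUNCS = {
    "*": lambda u, v: u * v,
    "/": lambda u, v: u / v,
    "+": lambda u, v: u + v,
    "-": lambda u, v: u - v,
    "Q": math.sqrt,
}


def decode(chromosome):
    """Build the expression tree as nested [symbol, children] lists."""
    symbols = iter(chromosome)
    root = [next(symbols), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Terminals have arity 0, so they take no further symbols.
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root  # symbols left unread form the non-coding tail


def to_infix(node):
    symbol, children = node
    if len(children) == 2:
        return "(%s%s%s)" % (to_infix(children[0]), symbol, to_infix(children[1]))
    if len(children) == 1:
        return "%s(%s)" % (symbol, to_infix(children[0]))
    return symbol


def evaluate(node, variables):
    symbol, children = node
    if symbol in FUNCS:
        return FUNCS[symbol](*(evaluate(child, variables) for child in children))
    return variables[symbol]


tree = decode("*/aQbcaacb")
print(to_infix(tree))                              # ((Q(c)/b)*a)
print(evaluate(tree, {"a": 2.0, "b": 4.0, "c": 9.0}))  # 1.5
```

Only the first six symbols are used by the tree; the remaining `aacb` is never read. This unused tail is what lets a mutated or recombined chromosome still decode to a valid tree.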
Advantages of GEP are
  Every chromosome encodes a valid expression tree => efficiency
  It is not a black box in the same way as the NN
  We get an idea of which are the important variables (detectors)
  Mimics the natural evolution more consistently: chromosome <-> genotype, expression tree <-> phenotype
Multi-Class Classification
Ordered binarization
  We train the following classifiers: ND vs. {CD, SD, DD}, CD vs. {SD, DD} and SD vs. DD
  An event is fed to these classifiers one by one in the same order until one classifier outputs label 1
  Gives good results in case some events are easily distinguished (in our case the ND events)
For the MLP network we can use several output nodes and see which one gives the largest value
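Both decision rules can be sketched as follows. The three binary classifiers are replaced here by made-up threshold functions on hypothetical pre-computed scores, purely to make the example runnable; in the study they would be trained NN, SVM or GEP classifiers.

```python
def ordered_binarization(event, stages, fallback):
    """Feed the event to the binary classifiers in order; stop at the first label 1.

    stages   -- list of (label, classifier) pairs, classifier(event) returns 1 or 0
    fallback -- label assigned when no classifier fires (the last remaining class)
    """
    for label, classifier in stages:
        if classifier(event) == 1:
            return label
    return fallback


def argmax_label(outputs, labels):
    """Multi-output alternative: pick the label of the largest output node."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best]


# Hypothetical stand-ins for ND vs. {CD, SD, DD}, CD vs. {SD, DD} and SD vs. DD.
stages = [
    ("ND", lambda e: int(e["nd_score"] > 0.5)),
    ("CD", lambda e: int(e["cd_score"] > 0.5)),
    ("SD", lambda e: int(e["sd_score"] > 0.5)),
]

event = {"nd_score": 0.1, "cd_score": 0.2, "sd_score": 0.9}
print(ordered_binarization(event, stages, fallback="DD"))            # SD
print(argmax_label([0.1, 0.7, 0.15, 0.05], ["DD", "SD", "CD", "ND"]))  # SD
```

Putting the easiest separation first means that most ND events leave the chain at the first stage, so later classifiers only have to handle the harder diffractive cases.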
Results
Average efficiencies of different algorithms (first table) and the performance of the NN ordered binarization (second table).

Method   <Efficiency> (%)
GEP      92.49
SVM      94.21
NN       94.54

Real\Pred   DD      SD      CD      ND
DD          87.60   12.05   0.35    0.00
SD          2.15    95.20   2.58    0.07
CD          0.00    4.25    95.75   0.00
ND          0.15    0.25    0.00    99.60
Purities    97.44   85.19   97.03   99.93
The results have been obtained by optimizing the total accuracy (the probability that an event of a random category is classified correctly)
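The quoted purities and the NN average efficiency can be recomputed from the confusion matrix. The sketch below assumes that each row is normalised to 100% (true category in rows, predicted category in columns) and that the four categories are weighted equally, which reproduces the numbers in the table.

```python
labels = ["DD", "SD", "CD", "ND"]

# Rows: real category, columns: predicted category (percent of each real category).
confusion = [
    [87.60, 12.05, 0.35, 0.00],
    [2.15, 95.20, 2.58, 0.07],
    [0.00, 4.25, 95.75, 0.00],
    [0.15, 0.25, 0.00, 99.60],
]


def efficiencies(matrix):
    """Fraction of each real category that is classified correctly, in percent."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def purities(matrix):
    """Fraction of each predicted category that is truly of that category, in percent."""
    result = []
    for j in range(len(matrix)):
        column_total = sum(row[j] for row in matrix)
        result.append(100.0 * matrix[j][j] / column_total)
    return result


eff = efficiencies(confusion)
pur = purities(confusion)
average_efficiency = sum(eff) / len(eff)

for label, e, p in zip(labels, eff, pur):
    print("%s  efficiency %.2f  purity %.2f" % (label, e, p))
print("average efficiency %.4f" % average_efficiency)
```

For example, the SD purity is 95.20 / (12.05 + 95.20 + 4.25 + 0.25) = 85.19%: many DD events are mislabelled as SD, which lowers the SD purity even though the SD efficiency is high.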
Practical Tips for Data Analysis
1. Visualize with the self-organizing map
2. Normalize the data
3. Know your goals: do you want a high efficiency or a high purity?
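As an illustration of tip 2, one common way to normalize is to standardize each variable to zero mean and unit variance, using statistics from the training sample only. This is a sketch of that choice, not necessarily the normalization used in the study; the function names are hypothetical.

```python
import statistics


def fit_standardizer(rows):
    """Return per-variable means and standard deviations of the training sample."""
    columns = list(zip(*rows))
    means = [statistics.fmean(column) for column in columns]
    # A constant variable has zero spread; use 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(column) or 1.0 for column in columns]
    return means, stds


def standardize(rows, means, stds):
    """Shift and scale every variable with the previously fitted statistics."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


train = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(train)
print(standardize(train, means, stds))  # [[-1.0, 0.0], [1.0, 0.0]]
```

Applying the training-sample statistics unchanged to the test events keeps the two samples on the same scale. This matters for distance-based methods such as the SOM and the RBF kernel, where energies and multiplicities would otherwise contribute very unequally.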