Open Source Machine Learning
Open Source Probabilistic Network Library
Gary Bradski
Program Manager
Systems Technology Labs

Intel
Open Source ML
What are we announcing today?
Intel is releasing a library of Open Source
Software for Machine Learning
The first library is the Probabilistic Network Library (PNL),
comprising code for inference and learning using
Bayesian Networks
Research and development was conducted in
Intel research labs in the US, Russia, and China
Software is released as part of the Intel Open
Research Program
Tool for research in many application areas
Open Source under a BSD license
The code is free for academic and commercial use
More info:
http://www.intel.com/research/mrl/pnl
Why is Intel involved?
Statistical Computing and Machine Learning
can change computing applications in a
considerable way
Machine Learning requires high-powered
processors
Ties into Intel’s research in other areas such as
wireless networking, sensor networks and
Proactive Health
What is Machine Learning?
Machine Learning allows computers to learn from their
experiences and from gathered data
We’ve known for > 200 years that probability theory is
the right tool to model systems, but it has always been
too hard to compute. Recent advances in computing
allow calculation of complex models
Machines are good at gathering data and performing
complex analysis
Machine Learning is a sea change in development of
applications since it allows computers to be more
proactive and predictive
Applications of Machine Learning
Interface: Audio Visual Speech Recognition (AVSR), natural language processing, etc.
AI: robotics, computer games, entertainment, etc.
Data Analysis: information retrieval, data mining, etc.
Biological: gene sequencing, genomics, computational pharmacology
Computer: run time optimization
Industrial: fault diagnosis
Applications of machine learning cover a broad range
Genomics: matching of protein strands
Collaborative Filtering: personal “Google”
Drug Discovery: shortening of the drug discovery cycle
Patient and elder care: wireless camera and sensor networks help monitor patients
Open ML Components & Plan
Key: Optimized, Implemented, Not implemented
Modeless / Model based
Unsupervised / Supervised
• K-means
• K-NN
• Boosted decision trees
• SVM
• Agglomerative clustering
• Spectral clustering
• BayesNets: Classification
• Decision trees
• BayesNets: Parameter fitting
• Dependency Nets
• PCA
• Influence diagrams
• Bayesnet structure learning
Statistical Learning: OpenSL, 2004
Bayesian Networks: OpenPNL, 2003
OpenML
Model Based Machine Learning
Machine Learning can be based on models (model-based) or it can be model-less
In version 1.0 of OpenML, Intel is focusing on Bayesian Networks and Probabilistic Networks, which fall under the model-based category
The Bayesian approach provides a mathematical rule explaining how one should change existing beliefs in the light of new evidence
Model-less approaches are used for clustering and classification
Intel will release libraries using model-less approaches next year
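The "mathematical rule" for changing beliefs in the light of new evidence is Bayes' rule: P(H | e) = P(e | H) P(H) / P(e). A minimal Python sketch follows. The function name `bayes_update` and the fault/alarm numbers are hypothetical, chosen only for illustration; they are not part of PNL.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | e) for each hypothesis H.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(e | H) for the observed evidence e
    """
    # Unnormalized posterior: P(e | H) * P(H)
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # P(e): total probability of the evidence over all hypotheses
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical numbers: a machine is faulty 1% of the time; an alarm fires
# for 90% of faulty machines and for 5% of healthy ones.
prior = {"faulty": 0.01, "healthy": 0.99}
likelihood_alarm = {"faulty": 0.90, "healthy": 0.05}
posterior = bayes_update(prior, likelihood_alarm)
print(posterior)
```

With these assumed numbers, seeing the alarm raises the belief in a fault from 1% to roughly 15%, which is the kind of belief revision the slide describes.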
Applications of Model-less ML
• Suitable for applications such as Fault Diagnosis
• The system does not have a model
• It collects data, then clusters and classifies it
• Recognition is derived from these clusters
[Figure: Machine 18, Fab 11. Tolerance goes out when temperature > 87°]
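The collect, cluster, and classify loop above can be sketched with plain k-means followed by nearest-centroid classification. This is only one possible model-less method, not necessarily the one used in the fab example; the sensor readings (temperature, tolerance deviation) and the function names are made up for illustration.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns (centroids, assignments)."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


# Hypothetical readings: (temperature, tolerance deviation)
readings = [(70.0, 0.10), (72.0, 0.20), (71.0, 0.15),
            (90.0, 1.10), (92.0, 1.30), (89.0, 1.00)]
centroids, assignments = kmeans(readings, k=2)

# "Recognition is derived from these clusters": classify a new reading
# by the cluster whose centroid is nearest.
new_cluster = nearest((88.0, 0.9), centroids)
print(assignments, new_cluster)
```

No model of the machine is written down anywhere; the two groups (normal and out-of-tolerance readings) emerge from the data alone.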
Applications of Model-based ML
Our research has focused on Bayesian Networks
Hidden Markov Models (HMMs), a type of Bayesian Net, are widely used in speech recognition; coupled Hidden Markov Models are used in Audio Visual Speech Recognition (the use of visual data in speech recognition)
Open Source PNL is an optimized infrastructure for research and development in Model Based Machine Learning
Audio Visual Speech Recognition
Face Recognition & Tracking
Example: Vision Applications
Image super resolution: use a Bayesian method to produce a clear image from a low-resolution picture
Intel Systems Technology Lab
Santa Clara, CA, USA
Graphics Lab
Machine Learning
Architecture Lab
Hillsboro, OR, USA
Wireless Systems
Media
3D Graphics
Tech. Management
Beijing, PR China
China Research Center
Speech and Machine
Learning
Nizhny Novgorod, Russia
Architecture for Machine
Learning, Media, 3D Graphics,
Computer Vision
• One of three major labs of Intel Corporate Technology Group
• 300 researchers worldwide
• Focus on impact on Intel Architecture
• Drive university and industry initiatives
Why Open Source?
Expands our research base
Allows Intel researchers to collaborate easily
with thousands of colleagues worldwide
Remove barriers, speed up collaboration
Tap into a very large innovative community
Ability to get feedback from a large number of
developers to design future microprocessors
Chance to explore innovative usage models
Diffuse new technologies and usage
models to a wide group of early adopters
Open Research Program
Currently four open source projects
http://www.intel.com/software/products/opensource/index.htm
OpenCV: Computer Vision Library
http://www.intel.com/research/mrl/research/opencv/
OpenRC: Open Research Compiler
http://ipf-orc.sourceforge.net/ORC-overview.htm
OpenLF: Open Light Fields
http://www.intel.com/research/mrl/research/lfm/
OpenAVSR: Audio Visual Speech Recognition
http://www.intel.com/research/mrl/research/avcsr.htm
Example: OpenCV
Released in June 2000
A library of 500+ computer vision algorithms, including applications such as Face Recognition, Face Tracking, Stereo Vision, and Camera Calibration
Highly tuned for IA
Windows and Linux versions
Over 500,000 downloads
Broad use in academia (450) and industry (360)
More Information
Visit the Open Source ML web page and download at:
http://www.intel.com/research/mrl/pnl
Backup
Modeless and Model Based ML
Modeless
Classifiers
Clustering
Kernel estimators
Model Based
Bayesian Networks
Function fitters
Regression
Filters
We’ll use an example application from our current research
to describe two basic approaches to machine learning:
[Figure: example data labeled A, B, and C]
Quick view of Bayesian networks
What is a Bayesian Network?
A Bayesian network, or a belief network, is a graph in which the following holds:
• A set of random variables makes up the nodes of the network.
• A set of directed links connects pairs of nodes to denote causality relations between variables.
• Each node has a conditional probability distribution (CPD) that quantifies the effects that the parents have on the node.
Graphical Models are more general, allowing undirected links, mixed directed/undirected connections, and loops within the graph.
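The definition above (nodes, directed links, one CPD per node) maps directly onto a small data structure. The sketch below is illustrative only and does not use the PNL API: the three-node Rain / Sprinkler / WetGrass network, its probabilities, and the `joint` and `query` helpers are all assumptions made for the example.

```python
from itertools import product

# Each node lists its parents and a CPD: a dict from a tuple of parent
# values to P(node = True | parents). All numbers are hypothetical.
network = {
    "Rain":      {"parents": (), "cpd": {(): 0.2}},
    "Sprinkler": {"parents": (), "cpd": {(): 0.1}},
    "WetGrass":  {"parents": ("Rain", "Sprinkler"),
                  "cpd": {(True, True): 0.99, (True, False): 0.9,
                          (False, True): 0.8, (False, False): 0.0}},
}


def joint(assignment):
    """P(assignment) as the product of each node's CPD entry given its parents."""
    p = 1.0
    for name, node in network.items():
        parent_values = tuple(assignment[q] for q in node["parents"])
        p_true = node["cpd"][parent_values]
        p *= p_true if assignment[name] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), by brute-force summation over the joint."""
    names = list(network)
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(names)):
        a = dict(zip(names, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(a)
        denominator += p
        if a[target]:
            numerator += p
    return numerator / denominator


p_rain_given_wet = query("Rain", {"WetGrass": True})
print(p_rain_given_wet)
```

Brute-force enumeration is exponential in the number of nodes; it is used here only because it makes the role of the CPDs obvious.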
Computational Advantages of Bayesian Networks
Bayesian Networks graphically express conditional independence of probability distributions.
Independencies can be exploited for large computational savings.
EXAMPLE:
Joint probability of a system of 3 discrete variables (A, B, C) with 5 possible values each:
P(A,B,C) = 5x5x5 table: 125 parameters
But a graphical model factors the probabilities, taking advantage of the independencies: 55 parameters
[Figure: graphical models over the nodes A, B, C]
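The 125 versus 55 count can be checked with a few lines of Python. The sketch assumes, for illustration, a chain A → B → C; the original figure is not recoverable here, and other three-node structures (for example A → B and A → C) give the same total. Like the slide, it counts table entries rather than free parameters.

```python
from math import prod


def full_table_size(cardinalities):
    """Entries in the full joint table: the product of all cardinalities."""
    return prod(cardinalities.values())


def factored_size(cardinalities, parents):
    """Entries summed over per-node CPD tables P(node | parents)."""
    total = 0
    for node, k in cardinalities.items():
        # One row of k entries for each combination of parent values.
        total += k * prod(cardinalities[p] for p in parents[node])
    return total


cardinalities = {"A": 5, "B": 5, "C": 5}
chain = {"A": [], "B": ["A"], "C": ["B"]}  # assumed structure: A -> B -> C

full = full_table_size(cardinalities)
factored = factored_size(cardinalities, chain)
print(full, factored)
```

Under the chain assumption the factored count is P(A) with 5 entries, P(B | A) with 25, and P(C | B) with 25, and the gap widens quickly as variables are added.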
Causality and Bayesian Nets
[Figure: circuit drawn as a network with nodes Mains, Transf., Diode, Diode, Capac., Ammeter, Battery; nodes are marked Observed or Un-Observed]
Think of Bayesian Networks as a “Circuit Diagram” of Probability Models
• The links indicate causal effect, not direction of information flow.
• Just as we can predict effects of changes on the circuit diagram, we can predict consequences of “operating” on our probability model diagram.
Quick view of Decision Trees and Statistical Boosting
Statistical Classification
Cluster data to infer or predict properties
Example: Decision trees
Find splits that most “purify” the labeled data:
AACBAABBCBCC → AACACB | CBABBC
All the way down …
[Figure: the tree grown until the leaves are pure]
Prune the tree to minimize complexity
[Figure: the pruned tree]
The split rules are used to classify future data
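One common way to measure how well a split "purifies" the data is Gini impurity; the slides do not say which criterion is used, so this choice is an assumption. The sketch below searches for the best threshold on a single numeric feature. The feature values and the sorted A/B/C labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure set, larger for more mixed sets."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, labels):
    """Return (threshold, weighted_impurity) for the best rule x <= threshold."""
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split at all is the baseline
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            midpoint = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (midpoint, impurity)
    return best


# Hypothetical data: one numeric feature, twelve cases labeled A, B, or C.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
labels = list("AAAABBBBCCCC")
threshold, impurity = best_split(xs, labels)
print(threshold, impurity)
```

A full tree applies `best_split` recursively to each side until the leaves are pure, then prunes; the returned thresholds are the split rules used to classify future data.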
Statistical Classification
Boosting
Use a weak classifier such as a 1-level tree:
AACBAABBCBCC → AACACB | CBABBC
Re-weight the error cases and classify again; record the weight factor “W_i” for the “i-th” case.
Repeat until you have a “forest”
[Figure: the re-weighted data split by a sequence of 1-level trees]
Use the error-weighted forest to vote on the classification of new data:
Decision_1 * W_1 + Decision_2 * W_2 + … + Decision_N * W_N = Weighted Sum Decision
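The re-weight, repeat, and vote procedure above is the shape of AdaBoost. The sketch below is a simplified two-class version (labels +1 and -1 instead of A, B, C) using 1-level trees on one numeric feature; the data set and function names are hypothetical, and the slides do not state which boosting variant the library implements.

```python
import math


def train_stump(xs, ys, weights):
    """Best 1-level tree: predict `sign` if x <= threshold, else -sign."""
    best = None
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    for t in thresholds:
        for sign in (1, -1):
            # Weighted error: total weight of the misclassified cases.
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x <= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    forest = []  # list of (W_i, threshold, sign)
    for _ in range(rounds):
        err, t, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        w_i = 0.5 * math.log((1 - err) / err)  # weight factor W_i
        forest.append((w_i, t, sign))
        # Re-weight: misclassified cases gain weight, correct ones lose it.
        weights = [w * math.exp(-w_i * y * (sign if x <= t else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return forest


def predict(forest, x):
    """Weighted sum decision: sign of sum_i Decision_i * W_i."""
    score = sum(w_i * (sign if x <= t else -sign) for w_i, t, sign in forest)
    return 1 if score >= 0 else -1


# Hypothetical data that no single 1-level tree can classify correctly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
forest = adaboost(xs, ys, rounds=3)
predictions = [predict(forest, x) for x in xs]
print(predictions)
```

Each stump alone misclassifies at least three of the ten cases, yet three stumps combined by the weighted vote fit this training set exactly.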
Application areas and libraries
Applications of ML
Interface: Biometric ID; Lips+Speech AVSR; Vision Models; Speech Audio Models; Text Recog.; Natural Lang.
AI: Action Planning; Cognitive Modeling; Game Play; Robotics; Mapping
Data Analysis: Information Retrieval; Datamining; Sensor Fusion; Info Filtering; Collaborative Filtering
Biologic: Proteomics; Genomics; Metabolics; Gene Sequencing; Epidemiology; Computational Pharmacology
Computer: Trace Compression; Compiler Optimization; Binary Trans Adaptation; Run Time Optimization
Industrial: Fault Diagnosis; Process Control; Disposition; Supply Chain; Models of Manufacturing
TOOLS: Neural Nets; SVM; Trees, Boosting, Random forest; Reinforcement Learning; Statistical Regression, ANOVA, …; Stochastic Discrimination; Adaptive Filters; Relational Networks; Decision Theory, Influence Diagrams; Graphical Models/MRFs; Bayesian Networks; Genetic Algorithms
Key: Actively working on; External activity; Past work; Ramping
Probabilistic Network Library
Application Driven: Interface, Data Mining, “AI”, Industrial
Game Play; Cognitive Modeling; Lips+Speech AVSR; Information Retrieval; Trace Compression; Learned Control; Vision Models; Gene Sequencing; Epidemiology; Genomics; Robotics; Info Filtering; Speech Audio Models; Natural Lang.; Biometric ID; Process Control; Disposition; Supply Chain; Fault Diagnosis; Models of Manufacturing
Bayesian Network Engine
Theories & Algorithms: Structure Learning; Decision & Utility theory; Dynamic BN; MRFs; Gibbs Sampling; Particle Filter; Junction Tree; Factor Graph; EM; Reinforcement; Loopy Belief; Variational; Data Handling; Cross Validation; Plates
Drive into Future Hardware: Workload Analysis; Architecture; Intel; Universities
Drive into hardware: Chipset; Platform; CPU Instructions; cache
Create New Architectures; Modify Existing Architectures
Open Source Computer Vision (OpenCV)
Machine Learning Library (OpenMLL)
[Figure: decision tree splitting the labeled data AACBAABBCBCC]
CLASSIFICATION / REGRESSION
CART
Statistical Boosting
MART
Random Forests
Stochastic Discrimination
Logistic
SVM
K-NN
CLUSTERING
K-Means
Spectral Clustering
Agglomerative Clustering
LDA, SVD, Fisher Discriminant
TUNING/VALIDATION
Cross validation
Bootstrapping
Sampling methods
Alpha Q1’04, Beta Q4’04
Optimization (Lib?)
Large-scale Optimizations
Continuous, Mixed, Discrete
Constrained, Unconstrained
Linear, Nonlinear
LP, QP, NLP
Interior Point; Active Set; Branch and Bound; Conjugate Gradient, Newton; SQP; Simplex
Sim. Annealing, Genetic Alg, Stoch. Search; Network Programming, Dynamic Programming
Combinatorial Optimizations
Domain Reduction, Constraints Propagation
Problems looking at: Circuit layout; Device geometry; Chemical binding synthesis