Open Source Machine Learning

Uploaded by journeycart on 15 Oct 2013 (category: Artificial Intelligence and Robotics)




Open Source Probabilistic Network Library




Gary Bradski

Program Manager, Systems Technology Labs, Intel


Open Source ML

What are we announcing today?


Intel is releasing a library of open source software for machine learning

The first library is the Probabilistic Network Library (PNL), which comprises code for inference and learning using Bayesian networks

Research and development was conducted in Intel research labs in the US, Russia and China

The software is released as part of the Intel Open Research Program

A tool for research in many application areas

Open source under a BSD license

The code is free for academic and commercial use



More info:
http://www.intel.com/research/mrl/pnl


Why is Intel involved?


Statistical computing and machine learning can change computing applications considerably

Machine learning requires high-powered processors

Ties into Intel's research in other areas such as wireless networking, sensor networks and Proactive Health



What is Machine Learning?


Machine learning allows computers to learn from their experiences and from gathered data

We have known for more than 200 years that probability theory is the right tool to model systems, but it has always been too hard to compute. Recent advances in computing allow calculation of complex models

Machines are good at gathering data and performing complex analysis

Machine learning is a sea change in the development of applications, since it allows computers to be more proactive and predictive



Applications of Machine Learning

Interface: Audio Visual Speech Recognition (AVSR), natural language processing, etc.
AI: robotics, computer games, entertainment, etc.
Data Analysis: information retrieval, data mining, etc.
Biological: gene sequencing, genomics, computational pharmacology
Computer: run time optimization
Industrial: fault diagnosis

Applications of machine learning cover a broad range:

Genomics: matching of protein strands
Collaborative Filtering: personal "Google"
Drug Discovery: shortening of the drug discovery cycle
Patient and elder care: wireless camera and sensor networks help monitor patients



Open ML Components & Plan

[Figure: chart placing each OpenML component on two axes (modeless vs. model based, unsupervised vs. supervised), with a key marking components as optimized, implemented, or not implemented]

Components shown: K-means; K-NN; Boosted decision trees; SVM; Agglomerative clustering; Spectral clustering; BayesNets: Classification; Decision trees; BayesNets: Parameter fitting; Dependency Nets; PCA; Influence diagrams; Bayesnet structure learning

OpenML parts:
Statistical Learning: OpenSL, 2004
Bayesian Networks: OpenPNL, 2003


Model Based Machine Learning


Machine learning can be based on models (model-based) or it can be model-less

In version 1.0 of OpenML, Intel is focusing on Bayesian networks and probabilistic networks, which fall under the model-based category

The Bayesian approach provides a mathematical rule explaining how one should change existing beliefs in the light of new evidence

Model-less approaches are used for clustering and classification

Intel will release libraries using model-less approaches next year
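The rule for changing beliefs in the light of new evidence is Bayes' rule, P(H | E) = P(E | H) P(H) / P(E). A minimal sketch of one update step follows; the function name `bayes_update` and all the numbers are hypothetical and chosen only for illustration, not taken from PNL.

```python
def bayes_update(prior, likelihoods):
    """Return the posterior P(h | e) for each hypothesis h.

    prior:       dict mapping hypothesis -> P(h)
    likelihoods: dict mapping hypothesis -> P(e | h) for the observed evidence e
    """
    # Unnormalized posterior: P(e | h) * P(h)
    unnormalized = {h: likelihoods[h] * p for h, p in prior.items()}
    # P(e), the total probability of the evidence
    evidence = sum(unnormalized.values())
    return {h: v / evidence for h, v in unnormalized.items()}


# Hypothetical numbers: a machine is faulty 10% of the time; a high
# temperature reading occurs in 80% of faulty runs and 20% of healthy runs.
prior = {"faulty": 0.1, "ok": 0.9}
likelihood_high_temp = {"faulty": 0.8, "ok": 0.2}
posterior = bayes_update(prior, likelihood_high_temp)
print(posterior)
```

With these assumed numbers, one high reading raises the belief in "faulty" from 0.1 to 0.08 / 0.26, roughly 0.31.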


Applications of Model-less ML

Suitable for applications such as fault diagnosis

The system does not have a model

It collects data, then clusters and classifies them

Recognition is derived from these clusters

[Figure: example for Machine 18 in Fab 11: tolerance goes out when temperature > 87°]
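The collect, cluster, classify loop described on this slide can be sketched with plain k-means on one-dimensional readings. This is only an illustration under stated assumptions: the temperature values, the function names `kmeans_1d` and `classify`, and the choice of k-means itself are hypothetical, not the method used in the fab example.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster 1-D readings into k groups with plain k-means."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


def classify(value, centers):
    """Assign a new reading to the index of the nearest cluster center."""
    return min(range(len(centers)), key=lambda c: abs(value - centers[c]))


# Hypothetical temperature readings collected from one machine.
readings = [70.1, 71.3, 69.8, 72.0, 88.5, 90.2, 89.1, 91.0]
centers = kmeans_1d(readings, k=2)
print(centers, classify(86.0, centers))
```

No model of the machine is written down anywhere; the two clusters, and therefore the recognition of a new reading as "normal" or "hot", come only from the data.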


Applications of Model-based ML

Our research has focused on Bayesian networks

Hidden Markov Models (HMMs), a kind of Bayesian net, are widely used in speech recognition; coupled Hidden Markov Models are used in Audio Visual Speech Recognition (the use of visual data in speech recognition)

Open Source PNL is an optimized infrastructure for research and development in model-based machine learning

[Figures: Audio Visual Speech Recognition; Face Recognition & Tracking]
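To make the Hidden Markov Models mentioned on this slide concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence under a discrete HMM. The two-state model, its state and symbol names, and all probabilities are hypothetical; this is not PNL code and not a coupled HMM.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under a discrete HMM via the forward algorithm."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: hidden state is "quiet" or "speech"; the observed
# symbol is a coarse audio energy level, "low" or "high".
states = ("quiet", "speech")
symbols = ("low", "high")
start_p = {"quiet": 0.6, "speech": 0.4}
trans_p = {"quiet": {"quiet": 0.7, "speech": 0.3},
           "speech": {"quiet": 0.2, "speech": 0.8}}
emit_p = {"quiet": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.3, "high": 0.7}}

likelihood = forward(["low", "high", "high"], states, start_p, trans_p, emit_p)
print(likelihood)
```

The cost grows linearly with the sequence length instead of exponentially, which is what makes HMMs practical for speech.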


Example: Vision Applications

Image super-resolution: use a Bayesian method to develop a clear image from a low-resolution picture


Intel Systems Technology Lab

Santa Clara, CA, USA


Graphics Lab

Machine Learning

Architecture Lab

Hillsboro, OR, USA

Wireless Systems

Media

3D Graphics

Tech. Management

Beijing, PR China


China Research Center

Speech and Machine Learning

Nizhny Novgorod, Russia

Architecture for Machine Learning, Media, 3D Graphics, Computer Vision


One of three major labs of Intel Corporate Technology Group


300 researchers worldwide


Focus on impact on Intel Architecture


Drive university and industry initiatives


Why Open Source?

Expands our research base

Allows Intel researchers to collaborate easily with thousands of colleagues worldwide

Remove barriers, speed up collaboration

Tap into a very large innovative community

Ability to get feedback from a large number of developers to design future microprocessors

Chance to explore innovative usage models

Diffuse new technologies and usage models to a wide group of early adopters


Open Research Program


Currently four open source projects
http://www.intel.com/software/products/opensource/index.htm

OpenCV: Computer Vision Library
http://www.intel.com/research/mrl/research/opencv/

OpenRC: Open Research Compiler
http://ipf-orc.sourceforge.net/ORC-overview.htm

OpenLF: Open Light Fields
http://www.intel.com/research/mrl/research/lfm/

OpenAVSR: Audio Visual Speech Recognition
http://www.intel.com/research/mrl/research/avcsr.htm




Example: OpenCV

Released in June 2000

A library of 500+ computer vision algorithms, including applications such as Face Recognition, Face Tracking, Stereo Vision, Camera Calibration

Highly tuned for Intel Architecture (IA)

Windows and Linux versions

Over 500,000 downloads

Broad use in academia (450) and industry (360)


More Information

Visit the Open Source ML web page and download at:
http://www.intel.com/research/mrl/pnl


Backup


Modeless and Model Based ML


Modeless: Classifiers; Clustering; Kernel estimators

Model Based: Bayesian Networks; Function fitters; Regression; Filters

We'll use an example application from our current research to describe two basic approaches to machine learning:

[Figure: tree diagram over data labeled A, B and C, splitting mixed groups such as AACACB and CBABBC into smaller groups such as AAA, CCB, ABBC and CB]


Quick view of Bayesian networks


What is a Bayesian Network?


A Bayesian network, or belief network, is a graph in which the following holds:

A set of random variables makes up the nodes of the network.

A set of directed links connects pairs of nodes to denote causal relations between variables.

Each node has a conditional probability distribution (CPD) that quantifies the effects that the parents have on the node.

Graphical models are more general, allowing undirected links, mixed directed/undirected connections, and loops within the graph.
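The definition above maps directly onto a small data structure: each node stores its parents and a CPD, and the joint probability of a full assignment is the product of one CPD entry per node. The sketch below uses a hypothetical three-node network with made-up probabilities; the dictionary layout and function names are assumptions for illustration and do not reflect the PNL API.

```python
from itertools import product

# Each node lists its parents and a CPD: P(node = True | parent values).
network = {
    "Rain":      {"parents": (), "cpd": {(): 0.2}},
    "Sprinkler": {"parents": (), "cpd": {(): 0.1}},
    "WetGrass":  {"parents": ("Rain", "Sprinkler"),
                  "cpd": {(True, True): 0.99, (True, False): 0.9,
                          (False, True): 0.8, (False, False): 0.05}},
}


def joint_probability(assignment):
    """P(assignment) as the product of each node's CPD entry given its parents."""
    p = 1.0
    for node, spec in network.items():
        parent_values = tuple(assignment[parent] for parent in spec["parents"])
        p_true = spec["cpd"][parent_values]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def all_assignments():
    """Yield every True/False assignment to all nodes."""
    names = list(network)
    for values in product([True, False], repeat=len(names)):
        yield dict(zip(names, values))


# Inference by brute-force enumeration: P(Rain = True | WetGrass = True)
wet = [a for a in all_assignments() if a["WetGrass"]]
p_rain_given_wet = (sum(joint_probability(a) for a in wet if a["Rain"])
                    / sum(joint_probability(a) for a in wet))
print(p_rain_given_wet)
```

Enumeration is exponential in the number of nodes, so it only works for toy networks; it is shown here because it follows the definition line by line.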


Computational Advantages of Bayesian Networks

Bayesian networks graphically express conditional independence of probability distributions.

Independencies can be exploited for large computational savings.

EXAMPLE:

Joint probability of a system of 3 discrete variables (A, B, C) with 5 possible values each:

P(A,B,C) = 5x5x5 table: 125 parameters

But a graphical model factors the probabilities, taking advantage of the independencies: 55 parameters

[Figure: graphs over nodes A, B, C for the full joint table and for the factored model]
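The two counts can be checked with a few lines of arithmetic. The slide text does not say which factorization its figure used, so the chain A -> B -> C below is an assumption; a common-parent structure P(A) P(B | A) P(C | A) gives the same total. Like the slide, the sketch counts table entries rather than free parameters.

```python
from math import prod


def table_size(cardinalities):
    """Number of entries in a probability table over the given variables."""
    return prod(cardinalities)


values = 5  # each of A, B, C takes 5 possible values

# Full joint table P(A, B, C)
full_joint = table_size([values, values, values])

# Assumed factorization (a chain A -> B -> C): P(A) * P(B | A) * P(C | B)
factored = (table_size([values])              # P(A)
            + table_size([values, values])    # P(B | A)
            + table_size([values, values]))   # P(C | B)

print(full_joint, factored)
```

The gap widens quickly: the full table grows exponentially with the number of variables, while the chain grows linearly.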


Causality and Bayesian Nets

[Figure: a power supply circuit drawn as a network with nodes Mains, Transf., Diode, Diode, Capac., Ammeter and Battery; nodes are marked as observed or unobserved]

Think of Bayesian networks as a "circuit diagram" of probability models.

The links indicate causal effect, not direction of information flow.

Just as we can predict the effects of changes on the circuit diagram, we can predict the consequences of "operating" on our probability model diagram.


Quick view of Decision Trees and Statistical Boosting


Statistical Classification

Cluster data to infer or predict properties

Example: Decision trees

Find splits that most "purify" the labeled data

[Figure: the labeled set AACBAABBCBCC is split into AACACB and CBABBC]

All the way down …

[Figure: the tree is grown until the leaves are pure, for example AAA, B, CC, A, C, BB]

Prune the tree to minimize complexity

[Figure: pruned tree with leaves AAA, CCB, B, C]

The split rules are used to classify future data
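Finding the split that most "purifies" the labeled data can be sketched for a single numeric feature. The slide does not name a purity measure; Gini impurity, as used in CART, is assumed here, and the data points and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure set, larger for more mixed sets."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(points):
    """Find the threshold on x that most purifies (x, label) points.

    Returns (threshold, weighted_impurity); points with x <= threshold go left.
    """
    xs = sorted({x for x, _ in points})
    best = (None, gini([label for _, label in points]))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [label for x, label in points if x <= threshold]
        right = [label for x, label in points if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical one-feature data with labels A, B, C as on the slide.
data = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "C"), (6, "B"), (7, "C")]
threshold, impurity = best_split(data)
print(threshold, impurity)
```

Growing a full tree repeats `best_split` on each side until the leaves are pure; pruning then removes splits that add complexity without improving accuracy on held-out data.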


Statistical Classification

Boosting

Use a weak classifier such as a 1-level tree:

[Figure: the labeled set AACBAABBCBCC is split once into AACACB and CBABBC]

Re-weight the error cases and classify again; record weight factor "W_i" for the "i-th" case.

Repeat until you have a "forest"

[Figure: a series of 1-level trees, each splitting the same data with the previously misclassified cases emphasized]

Use the error-weighted forest to vote on the classification of new data

[Figure: Decision_1 * W_1, Decision_2 * W_2, ..., Decision_N * W_N are combined into a weighted sum decision]
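The boosting loop on this slide can be sketched in a few dozen lines. The slide does not name a specific variant; discrete AdaBoost with 1-level trees on a single feature and labels +1/-1 is assumed here, and the data and function names are hypothetical. The `vote` value plays the role of the weight factor W_i recorded for the i-th tree.

```python
import math


def train_stump(points, weights):
    """Pick the 1-level tree (threshold, sign) with the lowest weighted error."""
    xs = sorted({x for x, _ in points})
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for threshold in thresholds:
        for sign in (1, -1):
            # The stump predicts `sign` when x > threshold, otherwise -sign.
            error = sum(w for (x, y), w in zip(points, weights)
                        if (sign if x > threshold else -sign) != y)
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best


def adaboost(points, rounds):
    """Return a list of (vote_weight, threshold, sign) weak classifiers."""
    weights = [1.0 / len(points)] * len(points)
    forest = []
    for _ in range(rounds):
        error, threshold, sign = train_stump(points, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # avoid log(0)
        vote = 0.5 * math.log((1 - error) / error)  # W_i for this stump
        forest.append((vote, threshold, sign))
        # Re-weight: misclassified cases gain weight, correct ones lose it.
        weights = [w * math.exp(-vote * y * (sign if x > threshold else -sign))
                   for (x, y), w in zip(points, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return forest


def predict(forest, x):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(vote * (sign if x > threshold else -sign)
                for vote, threshold, sign in forest)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data that no single stump can separate.
data = [(1, 1), (2, 1), (3, -1), (4, -1), (5, 1), (6, 1)]
forest = adaboost(data, rounds=3)
print([predict(forest, x) for x, _ in data])
```

No single stump gets more than four of these six points right, but the weighted vote of three stumps classifies all six correctly, which is the point of the "forest".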


Application areas and libraries



Applications of ML

[Figure: application areas grouped by category around the ML tools that serve them; items are color-keyed as actively working on, external activity, past work, or ramping]

Interface: Biometric ID; Lips+Speech AVSR; Vision Models; Speech Audio Models; Text Recog.; Natural Lang.
AI: Action Planning; Cognitive Modeling; Game Play; Robotics; Mapping
Industrial: Fault Diagnosis; Process Control; Disposition; Supply Chain; Models of Manufacturing
Data Analysis: Information Retrieval; Datamining; Sensor Fusion; Info Filtering; Collaborative Filtering
Biologic: Proteomics; Genomics; Metabolics; Gene Sequencing; Epidemiology; Computational Pharmacology
Computer: Trace Compression; Compiler Optimization; Binary Trans Adaptation; Run Time Optimization

TOOLS: Neural Nets; SVM; Trees, Boosting, Random forest; Reinforcement Learning; Statistical Regression, ANOVA, …; Stochastic Discrimination; Adaptive Filters; Relational Networks; Decision Theory, Influence Diagrams; Graphical Models/MRFs; Bayesian Networks; Genetic Algorithms


Probabilistic Network Library

[Figure: block diagram centered on the Bayesian Network Engine, labeled "Application Driven" on one side and "Drive into Future Hardware" on the other, with blocks for Intel and Universities]

Application areas (Interface, Data Mining, "AI", Industrial): Game Play; Cognitive Modeling; Lips+Speech AVSR; Information Retrieval; Trace Compression; Learned Control; Vision Models; Gene Sequencing; Epidemiology; Genomics; Robotics; Info Filtering; Speech Audio Models; Natural Lang.; Biometric ID; Process Control; Disposition; Supply Chain; Fault Diagnosis; Models of Manufacturing

Theories & Algorithms: Structure Learning; Decision & Utility theory; Dynamic BN; MRFs; Gibbs Sampling; Particle Filter; Junction Tree; Factor Graph; EM; Reinforcement; Loopy Belief; Variational; Data Handling; Cross Validation; Plates

Drive into hardware: Workload Analysis; Architecture; Chipset; Platform; CPU Instructions; cache; Create New Architectures; Modify Existing Architectures


Open Source Computer Vision (OpenCV)


Machine Learning Library (OpenMLL)

[Figure: decision tree splitting the labeled set AACBAABBCBCC down to pure leaves]

CLASSIFICATION / REGRESSION: CART; Statistical Boosting; MART; Random Forests; Stochastic Discrimination; Logistic; SVM; K-NN

CLUSTERING: K-Means; Spectral Clustering; Agglomerative Clustering; LDA, SVD, Fisher Discriminant

TUNING/VALIDATION: Cross validation; Bootstrapping; Sampling methods

Alpha Q1'04, Beta Q4'04


Optimization (Lib ?)

[Figure: taxonomy of optimization problems and solution methods]

Large-scale optimizations: Continuous; Mixed; Discrete; Constrained; Unconstrained; Linear; Nonlinear; LP; QP; NLP; Interior Point; Active Set; Branch and Bound; Conjugate Gradient, Newton; SQP; Simplex

Combinatorial optimizations: Sim. Annealing, Genetic Alg, Stoch. Search, Network Programming, Dynamic Programming; Domain Reduction, Constraints Propagation

Problems looking at: Circuit layout; Device geometry; Chemical binding synthesis