4.30 Machine Learning


15 Oct 2013


Machine Learning Group

University College Dublin


Pádraig Cunningham

Intro to ML


Outline


Week 1


Introduction & General Overview of Matrix Decomposition


Nearest Neighbour Classifiers


Tutorial


Week 2: Neural Networks


Simple Perceptron, Backpropagation


Other Architectures: Hopfield, Self-Organising Maps


Tutorial


Week 3


Support Vector Machines


Kernel Methods & Evaluation


Tutorial


Week 4


Decision Trees


Naïve Bayes


Tutorial


Outline


Week 5: Ensemble Techniques


Bagging


Boosting


Tutorial


Week 6: Unsupervised Learning


Hierarchical Clustering


Other Clustering Algorithms: k-Means, Spectral Clustering


Tutorial


Week 7: Dimension Reduction


Principal Component Analysis, LSI, SVD


Feature Selection


Tutorial


Later


2 revision tutorials



Coursework

3-4 pieces, 15 hours, Weka & Java


Why Machine Learning


Recent progress in algorithms and theory


Loads of processing power


Computational power is available


Growing flood of online data


Amazon


Google




3 niches for ML


Data mining: using historical data to improve decisions


medical records


medical knowledge


Software applications that cannot be programmed by hand.


autonomous driving


speech recognition


i.e. weak theory domains.


Self-customising programs


Personalised Newspaper


E-mail filtering



Data mining in medical records

Quality Assurance in Maternity Care.

http://svr-www.eng.cam.ac.uk/projects/qamc/qamc.html


Rule Learning

The QAMC system uses decision trees (I think!)


It is also possible to extract rules from data:





If
    No previous normal delivery, and
    Abnormal 2nd Trimester Ultrasound, and
    Malpresentation at admission
Then
    Probability of Emergency C-Section is 0.6




Over training data: 26/41 = 0.63


Over test data: 12/20 = 0.6


<Rule taken from Machine Learning by Tom Mitchell>
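The rule's probabilities are simply the fraction of covered records (those satisfying all three conditions) that ended in an emergency C-section. The Python sketch below shows that calculation. The `Record` fields and the synthetic records are hypothetical stand-ins chosen only to reproduce the slide's counts; they are not data or code from QAMC or from Mitchell's book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical fields mirroring the rule's conditions (not the QAMC schema).
    previous_normal_delivery: bool
    abnormal_2nd_trimester_ultrasound: bool
    malpresentation_at_admission: bool
    emergency_c_section: bool


def rule_fires(r: Record) -> bool:
    """The rule's 'If' part: all three conditions must hold."""
    return (not r.previous_normal_delivery
            and r.abnormal_2nd_trimester_ultrasound
            and r.malpresentation_at_admission)


def rule_accuracy(records: list[Record]) -> tuple[int, int, float]:
    """Return (hits, covered, hits / covered) over the records the rule covers."""
    covered = [r for r in records if rule_fires(r)]
    hits = sum(r.emergency_c_section for r in covered)
    return hits, len(covered), (hits / len(covered) if covered else 0.0)


def synthetic(n_pos: int, n_neg: int) -> list[Record]:
    """Hypothetical covered records: n_pos with a C-section, n_neg without."""
    return ([Record(False, True, True, True)] * n_pos
            + [Record(False, True, True, False)] * n_neg)


# Records the rule does not cover (first field True) are ignored by the estimate.
train = synthetic(26, 15) + [Record(True, False, False, True)] * 10
test = synthetic(12, 8)

for name, data in (("training", train), ("test", test)):
    hits, covered, p = rule_accuracy(data)
    print(f"Over {name} data: {hits}/{covered} = {p:.2f}")
```

This prints `26/41 = 0.63` and `12/20 = 0.60`. The gap between the two figures is the usual reminder that an estimate made on training data tends to be optimistic.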


Spam Filtering


For Machine Learning…


Lots of training data


High dimensionality data (lots of features)


Email is a diverse concept


Porn, mortgage, religion, cheap drugs…


Work, family, play…


Spam Filtering is a challenge because…


Arms race: spammers vs filters


False Positives are unacceptable


Spam is a changing concept
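To see why email is high-dimensional, consider one common representation (assumed here, not prescribed by the slides): a bag of words, in which every distinct word seen in any email becomes one feature. Below is a minimal sketch with a naive regex tokeniser and three made-up messages.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokeniser: lower-case, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Sparse feature vector: word -> count (absent words are implicitly 0)."""
    return Counter(tokenize(text))


# Hypothetical messages, made up for illustration.
emails = [
    "Cheap drugs!!! Buy cheap drugs now",
    "Lunch with family on Sunday?",
    "Re-mortgage your home today, cheap rates",
]
vectors = [bag_of_words(e) for e in emails]

# Each distinct word is one dimension of the feature space.
vocabulary = sorted(set().union(*vectors))
dense = [[v[w] for w in vocabulary] for v in vectors]

print(len(vocabulary), "features")                   # 15 features
print([sum(1 for x in row if x) for row in dense])   # [4, 5, 7]
```

Even three short messages give 15 dimensions, and a real mailbox yields a far larger vocabulary while each email uses only a few of those words. This is why such vectors are normally stored sparsely, as the `Counter` objects are here.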



ALVINN

Problems too difficult to program by hand


ALVINN drives at 70 mph on motorways


Autonomous Vehicles


DARPA Grand Challenge 2005


Winner: Stanley from Stanford


Various modules use ML


SmartRadio


Internet-based music radio


Personalised


Collaborative Recommendation


Content-Based Recommendation


supported by knowledge discovery from log data


supported by feature extraction from sound files


feature selection


refinement
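Collaborative recommendation predicts a listener's interest in a track from the ratings of listeners with similar tastes. Below is a minimal user-based sketch. It assumes explicit numeric ratings and cosine similarity, with unrated tracks counted as zero. SmartRadio's actual algorithm is not described here, and the listener and track names are hypothetical.

```python
import math
from typing import Optional

# Hypothetical listener -> {track: rating} data (1-5 scale assumed).
Ratings = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two rating vectors, treating unrated tracks as 0."""
    dot = sum(r * b[t] for t, r in a.items() if t in b)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings: Ratings, user: str, track: str) -> Optional[float]:
    """Similarity-weighted average of other listeners' ratings for `track`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or track not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[track]
        den += abs(sim)
    return num / den if den else None


def recommend(ratings: Ratings, user: str, k: int = 1) -> list[str]:
    """Top-k tracks the user has not rated, ranked by predicted rating."""
    unseen = {t for r in ratings.values() for t in r} - set(ratings[user])
    scored = []
    for t in unseen:
        p = predict(ratings, user, t)
        if p is not None:
            scored.append((p, t))
    return [t for _, t in sorted(scored, reverse=True)[:k]]


ratings: Ratings = {
    "ann": {"track_a": 5, "track_b": 3, "track_c": 4},
    "bob": {"track_a": 4, "track_b": 3, "track_c": 5, "track_d": 5},
    "cat": {"track_a": 1, "track_d": 2, "track_e": 5},
    "dan": {"track_a": 5, "track_b": 2, "track_e": 1},
}

print(recommend(ratings, "ann"))  # ['track_d']
```

Here ann's tastes closely match bob's, so bob's high rating dominates the prediction for `track_d`. For `track_e`, dan (also similar to ann) rated it 1, which outweighs cat's 5. Content-based recommendation would instead compare features extracted from the tracks themselves.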


Smart Radio


Smart Radio is a web-based client-server music application which allows listeners to build, manage and share music programmes


The project was set up to look at a possible model for:


The regulated distribution of music on the web


A personalised music-streaming service


To provide an architecture and data to test our data mining and collaborative filtering algorithms



ML Dimensions


Lazy vs Eager


k-NN vs rule learning


Supervised vs Unsupervised


Symbolic vs Sub-symbolic
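The lazy/eager distinction is about when the work happens. The sketch below contrasts two learners. The first is a k-NN classifier whose `fit` only stores the examples. The second is a hypothetical `StumpRule` learner, a one-rule decision stump standing in for rule learning, which searches for its rule at training time and then keeps only that rule. Both are illustrative simplifications in Python, not the algorithms covered later in the course.

```python
import math
from collections import Counter

Point = tuple[float, ...]


class KNN:
    """Lazy learner: fit() only memorises the data; the work happens at query time."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, X: list[Point], y: list[str]) -> "KNN":
        self.X, self.y = list(X), list(y)  # no model is built
        return self

    def predict(self, q: Point) -> str:
        # Rank every stored example by Euclidean distance, then take a majority vote.
        order = sorted(range(len(self.X)), key=lambda i: math.dist(q, self.X[i]))
        votes = Counter(self.y[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]


class StumpRule:
    """Hypothetical eager learner: finds one 'if x[f] <= t then a else b' rule
    at training time and keeps only that rule, not the data."""

    def fit(self, X: list[Point], y: list[str]) -> "StumpRule":
        best = None
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                left = [lab for x, lab in zip(X, y) if x[f] <= t]
                right = [lab for x, lab in zip(X, y) if x[f] > t]
                if not left or not right:
                    continue
                a = Counter(left).most_common(1)[0][0]
                b = Counter(right).most_common(1)[0][0]
                errors = sum(lab != a for lab in left) + sum(lab != b for lab in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, a, b)
        if best is None:
            raise ValueError("no threshold splits the training data")
        _, self.f, self.t, self.a, self.b = best
        return self

    def predict(self, q: Point) -> str:
        return self.a if q[self.f] <= self.t else self.b


X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
y = ["a", "a", "a", "b", "b", "b"]

knn = KNN(k=3).fit(X, y)
stump = StumpRule().fit(X, y)
print(knn.predict((2.0, 2.0)), stump.predict((2.0, 2.0)))  # a a
print(f"learned rule: if x[{stump.f}] <= {stump.t} then {stump.a} else {stump.b}")
```

Both learners classify these points the same way. The difference is that `knn` still holds all six examples and scans them on every query. By contrast, `stump` did its search once in `fit` and kept only the rule `x[0] <= 2.0`.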