Machine Learning as a Service

milkygoodyear, Artificial Intelligence and Robotics

14 Oct 2013 (3 years and 5 months ago)


•  PSI is not a web service, but a specification for many web
services. It defines a small number of different kinds of resource
that can represent a wide variety of machine learning activities:
•  datasets (called relations) and their attributes
•  learners, which analyse and model data and produce predictors
•  predictors, which analyse new data to make inferences about it
•  To support different kinds of data and different learning tasks, each
resource must be able to express the form of data it emits or
ingests.
•  PSI uses a schema language to express differences between
resources. Schema is the 'glue' that allows different datasets to be
safely composed with learning algorithms, and for predictors to be
selected and compared—or combined—on the same data.
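
The role of schema as 'glue' can be illustrated with a small sketch. This is not the PSI schema language itself: the `Schema` class, the `accepts` rule and the attribute names below are all assumptions made for illustration. The idea shown is only that a consumer (such as a learner) states what it requires, a producer (such as an attribute) states what it emits, and the two are composed only when the emitted form fits the required one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Toy description of a value: a type name plus optional allowed values.

    Hypothetical stand-in for a real schema language; not PSI's own format.
    """
    type: str
    allowed: frozenset = frozenset()


def accepts(required: Schema, emitted: Schema) -> bool:
    """Return True if every value described by `emitted` is valid for `required`."""
    if required.type != emitted.type:
        return False
    if not required.allowed:
        return True  # the consumer places no restriction on values
    # A restricted consumer needs a producer restricted at least as tightly.
    return bool(emitted.allowed) and emitted.allowed <= required.allowed


def compatible_attributes(attributes: dict, required: Schema) -> list:
    """Names of the attributes whose emitted schema fits the required schema."""
    return sorted(name for name, s in attributes.items() if accepts(required, s))


# Hypothetical attributes of a relation and the target a learner requires.
attributes = {
    "petal_length": Schema("number"),
    "species": Schema("string", frozenset({"setosa", "versicolor"})),
    "notes": Schema("string"),
}
learner_target = Schema("string", frozenset({"setosa", "versicolor", "virginica"}))

usable = compatible_attributes(attributes, learner_target)
```

Here only `species` qualifies as a target: `petal_length` has the wrong type and `notes` is an unrestricted string, so it could emit values the learner does not accept.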
Flexible and Federated
⇒  Schema allows the PSI protocol to be flexible
⇒  Flexibility supports multiple, different services
⇒  The common interface allows these services to be combined into
federated data analysis and machine learning solutions.


The PSI approach has potential benefits for:
•  machine learning researchers, by allowing greater collaboration
and easier use of newly developed analysis techniques; and
•  industry, by supporting the wider use of machine learning and the
simpler delivery of commercial machine learning services.
Making machine learning and data analysis technologies easier to access and combine,
for both the research community and industry.
Machine Learning as a Service
Towards a Web of Inference
Research Excellence in ICT
Wealth Creation for Australia
•  Machine learning is used to model, deduce patterns in, and make
predictions about data or future events
•  The Protocols and Structures for Inference (PSI) project is
developing a flexible standard for presenting data, learning
algorithms and the predictive models they produce as resources
accessible over the web
•  The standard can support a federated collection of different
machine learning services
•  The project is a collaboration between Canon Information Systems
Research Australia and the ANU
•  Existing machine learning software packages are complex, so they are:
•  difficult for less technical people to use individually
•  difficult for everyone to use collectively to combine the best
features from different packages.
•  A simplified programming interface via the web can help
•  Existing web-based machine learning services offer only a tiny
fraction of the range of different techniques and cannot be
combined to solve difficult tasks that require the simultaneous use
of multiple techniques.
Approach
James Montgomery and Mark Reid are part of NICTA’s Machine
Learning Research Group, which aims to turn machine learning into an
engineering discipline and an everyday tool of ICT developers.


Technical Contacts:
James.Montgomery@nicta.com.au
Mark.Reid@nicta.com.au
[Figure: the PSI workflows (Train, Predict, Update). A relation resource exposes attribute resources (Attribute 1, Attribute 2, ..., Attribute n), each with an output schema and instance representations. A learner resource has a task schema. A predictor resource has an input schema, an update schema and an output schema, and returns a predicted value.]
Prediction API
Existing machine learning services each stand alone: they
individually solve only a small number of problems and
cannot communicate with other services.
A B C D
A
A
A
A
A
http://example.org/data
http://.example.com/predictor
http://example.com/learner
More information:
http://psi.cecs.anu.edu.au
Try it out!
http://psi.cecs.anu.edu.au/demo
This research is supported under the Australian Research Council's Linkage Projects
funding scheme (project number LP0991635) as a collaboration between the Australian
National University and Canon Information Systems Research Australia.
There are three typical workflows in PSI: training, prediction
and updating a predictive model. Schema describes the
output of attributes and the required inputs to learners and
predictors. It is used to safely compose parts of the system.
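
The three workflows can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the PSI API: the class and function names are hypothetical, resources are plain Python objects rather than web resources, and the model is a deliberately simple nearest-centroid classifier on one numeric attribute.

```python
class CentroidPredictor:
    """Toy predictor: assigns the label whose running mean is closest to x."""

    def __init__(self):
        self.sums = {}    # label -> sum of values seen
        self.counts = {}  # label -> number of values seen

    def update(self, x, label):
        """Updating workflow: fold one new labelled instance into the model."""
        self.sums[label] = self.sums.get(label, 0.0) + x
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        """Prediction workflow: return the label with the nearest centroid."""
        return min(
            self.sums,
            key=lambda label: abs(x - self.sums[label] / self.counts[label]),
        )


def train(instances):
    """Training workflow: a learner consumes a relation and emits a predictor."""
    predictor = CentroidPredictor()
    for x, label in instances:
        predictor.update(x, label)
    return predictor


# A hypothetical relation with one numeric attribute and one label attribute.
relation = [(1.0, "small"), (2.0, "small"), (9.0, "large")]

predictor = train(relation)           # training
first = predictor.predict(5.0)        # prediction
predictor.update(5.5, "large")        # updating with a new instance
second = predictor.predict(5.0)       # prediction after the update
```

Before the update the centroids are 1.5 and 9.0, so 5.0 is labelled "small"; after the update the "large" centroid moves to 7.25 and the same input is labelled "large".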
[Figure: Flexibility ⇒ Service variety ⇒ Federated solutions]
A common interface allows data sources to be connected
to data analysis (machine learning) algorithms, and for
clients to easily use the predictive models produced.
Fundamental research on:
•  Learning in large scale networks
•  Integrating adaptor grammars with graphical models
•  Efficient mechanisms to solve graphical models
•  Prediction markets
•  Convexification of machine learning problems
•  Loss functions and their relationships
•  Kernel methods for heterogeneous data
•  Large scale Bayesian inference
•  Evaluation of machine learning methods
•  Empirical psychology of using machine learning
•  Machine learning as a service.









Approach
•  Look for relations
•  Pluralistic not monistic
•  No one right solution
•  Work on specific projects as well as fundamental science
behind machine learning.
Democratising Machine Learning
Machine Learning
The Problem
Machine Learning (ML) – the science of big data analytics
•  It finds patterns in, and makes sense of data
•  ML is powerful and widely used, but it usually needs
PhD-trained experts to make it work.
Goal: Make Machine Learning usable by all
•  How to match problem to technique
•  How to avoid re-inventing the wheel.



NICTA’s ML group
•  Works on specific projects with particular classes of data
(see examples below).
•  Studies fundamental principles behind the use of ML:
•  how to formalise real problems
•  how to understand their relationship to each other
•  how people actually use the results of machine
learning.
This poster summarises some of the work of NICTA’s Machine Learning
Research Group. Specific project activities are presented separately.

Contact: Bob.Williamson@nicta.com.au
Project | Data Type | Real Problem | ML Problem | Progress
Geothermal | Spatio-temporal | Fuse multiple data sources to identify likely good geothermal energy sites. | Scaling up Bayesian methods; integrating data of different types. | Online data portal built; joint Bayesian inversion from multiple data sources.
Social Recommendation | Network data; structured; real-time | Exploit social network structure to make better recommendations. | Developing optimisation criteria that are effective and efficiently optimisable. | Beats state of the art; contract with e-book retailer.
Topic and Sentiment Mining | Unstructured natural language | Infer topics and sentiment from unstructured news, blogs, & tweets. | Segmentation of text; topic modelling; sentiment extraction. | Novel grammar-based and differential topic models improve topic extraction; faster (also GPU version).
Water Pipes | Heterogeneous (many factors associated with water pipes) | Predict water mains most likely to fail. | Building scalable complex probabilistic models. | Beats state of the art; trials with major water distributor.
Enterprise Big Data | Very large scale (Petabytes), multiple systems | Extract valuable business intelligence from customer records. | Scaling extant ML methods to work practically at this scale. | Methods deployed and used at scale in major financial institution.
Lens | Structured natural language (world’s patent corpus) | Enhance the lens (patent explorer) using advanced ML methods. | Computer-based understanding of patent claims; structured content extraction. | New project: initial work on author resolution, citation extraction, and topic visualisation.
Solar Forecasting | Spatio-temporal; distributed; real-time; heterogeneous; vision | Real-time prediction of the energy output of large-scale distributed photovoltaic systems. | Heterogeneity of data sources; scale; choice of error measures. | New project: initial work on data collection and infrastructure.
We're building an interactive visualisation
tool to sift through thousands of news
articles, blogs, and tweets to understand
the key topics and opinions they contain.
OpinionWatch
Our tool:
This OpinionWatch project is part of NICTA’s Machine Learning Group,
which is turning Machine Learning into an Engineering Discipline and an
everyday tool of ICT developers.


Technical Contacts:
Wray.Buntine@nicta.com.au
Scott.Sanner@nicta.com.au
Business Contact:
Michelle.Carden@nicta.com.au
Use cases:
•  Government: understanding public opinion
towards existing and proposed policies.
•  Marketing: mining consumer opinion about
products and those of competitors.
•  Corporate: tracking media mentions to better
understand corporate image.
How it works:
We’ve invested years of research into text analytics:
•  Topic modelling
•  Named entity recognition
•  Keyphrase extraction
•  Sentiment analysis.
We’ve integrated this analysis into a multi-perspective
visualisation that allows the user to
interactively drill down to find the specific information
they are seeking.
Present and future:
We’re actively trialing our software with industry
partners and we’ve worked with the DSTO to
use OpinionWatch to aggregate expert opinion
to better inform decision-makers.

Are you drowning in text? Do you want to trial
OpinionWatch?
Applying machine learning techniques for achieving real-
time condition-informed bridge structural health monitoring.
Machine Learning for Structural Health
Monitoring – Bridge Damage Prediction
This Structural Health Monitoring project is part of NICTA’s Machine
Learning Research Group, which is turning Machine Learning into an
Engineering Discipline and an everyday tool of ICT developers.
Technical Contact:
Fang.Chen@nicta.com.au

Business Contact:
Rob.Fitzpatrick@nicta.com.au
Problems and Opportunities
×  Current bridge maintenance strategy is time-based, failure-driven and
reactive
×  Regular human visual inspection (e.g., every 2 years)
×  Lack of remaining-life prediction and future maintenance cost estimation
✓  We offer: a real-time, condition-informed, proactive maintenance strategy
✓  Failure warning and prevention
✓  Prediction of remaining life and future cost
Challenges, Research Topics, Results and Impacts
•  Partners
✓  RMS (Roads and Maritime Services)
✓  ANSHM (Australian Network of Structural Health Monitoring)
•  Challenges
✓  How to cover all the possible failure patterns
✓  Environmental impact
✓  Remaining life estimation
✓  Sparse accident data
•  Research topics
✓  Survival analysis
✓  Vibration system
✓  Bayesian nonparametric method
✓  Unsupervised regression
•  Results on real-world data
✓  98% / 86% classification accuracies achieved by supervised / unsupervised
learning methods
•  Impacts
✓  Real-time monitoring
✓  Eased maintenance burden
✓  Multi-million-dollar savings on severe damage repair
✓  Warnings that help prevent failure
Scene Understanding by Labeling Pixels
[He et al., 2004; Shotton et al., 2006; Gould et al., 2009]

•  The goal of scene understanding algorithms is to
annotate every pixel in an image with its semantic
category (e.g., grass, road, sky, etc.)
•  Traditional approaches employ a conditional Markov
random field (CRF) to model the scene, i.e.,

•  The unary and pairwise terms in these models are
learned from training data
•  Training can be slow. Moreover, the models scale
poorly with the number of images and categories.
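
For orientation, a pairwise CRF of the kind these bullets describe is commonly written as an energy over the pixel labels with one unary term per pixel and one pairwise term per pair of neighbouring pixels. The notation below is an assumption chosen for illustration (image $\mathbf{x}$, labels $\mathbf{y}$, pixels $\mathcal{V}$, neighbouring pairs $\mathcal{E}$) and is not taken from the poster:

```latex
E(\mathbf{y}; \mathbf{x})
  = \sum_{i \in \mathcal{V}} \psi_i(y_i; \mathbf{x})
  + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j; \mathbf{x}),
\qquad
\hat{\mathbf{y}} = \operatorname*{arg\,min}_{\mathbf{y}} E(\mathbf{y}; \mathbf{x})
```

The unary terms $\psi_i$ score how well label $y_i$ fits the appearance at pixel $i$, and the pairwise terms $\psi_{ij}$ encourage neighbouring pixels to take consistent labels.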

A Label Transfer Approach
[Liu et al., 2009; Gould and Zhang, 2012]

•  We propose a model-free approach which transfers
labels across images
•  Our method builds a large graph of images where
edges link regions, or patches, of similar appearance
•  The graph can be thought of as representing a
nearest neighbour field.
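
The core idea of label transfer can be sketched with an exhaustive nearest-neighbour search. This is a simplification for illustration only: it is not the graph construction or the PatchMatch-based search described on this poster, and the feature vectors, the Euclidean distance and the function name are all assumptions. It uses `math.dist`, available from Python 3.8.

```python
from math import dist


def transfer_labels(labelled, queries):
    """Give each query patch the label of its nearest labelled patch.

    labelled: list of (feature_vector, label) pairs
    queries:  list of feature vectors
    Brute-force search: every query is compared with every labelled patch.
    """
    result = []
    for query in queries:
        _, label = min(labelled, key=lambda item: dist(item[0], query))
        result.append(label)
    return result


# Hypothetical patch descriptors (for example, mean colour) with known labels.
labelled_patches = [
    ((0.1, 0.8, 0.1), "grass"),
    ((0.5, 0.5, 0.5), "road"),
    ((0.4, 0.6, 0.9), "sky"),
]
query_patches = [(0.15, 0.75, 0.2), (0.45, 0.55, 0.85)]

labels = transfer_labels(labelled_patches, query_patches)
```

No model is trained here: adding an image only adds labelled patches, and changing the label set only changes the labels attached to them. The cost of the exhaustive search is what a faster approximate search would aim to avoid.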
Building the Graph
[Barnes et al., 2010; Gould and Zhang, 2012]

•  Formally, we solve the optimization problem:




•  We have developed a fast move-making technique
based on the PatchMatch algorithm for solving this
problem efficiently.








Advantages

•  Our model does not require any prior training (but
does require a well-tuned distance function)
•  Our method scales linearly with the number of images
and can be built incrementally
•  Our method is agnostic to the label set, which can be
updated or changed at any time.

How can we leverage the growing availability of online visual resources to improve machine
understanding of images?
Semantic Scene Understanding
Technical contact: Stephen Gould (stephen.gould@anu.edu.au)
The Problem
•  For the past 50 years the grand challenge of computer
vision has been to get machines to see.
•  In the very early days of computer vision, researchers
had little data and poor computational resources. As a
result, methods failed.

•  Technology has advanced tremendously, as have
computer vision and machine learning algorithms.
•  Today we have ready access to millions of images,
advanced machine learning algorithms and
significant computation power.
•  The goal of this project is to answer the question:

How can we leverage these resources to get
machines to see?
This Semantic Scene Understanding project is part of NICTA’s Machine
Learning Group, which is turning Machine Learning into an Engineering
Discipline and an everyday tool of ICT developers.


•  First ever objective and automatic measurement of
trust

•  Collaboration with AFRL United States and Sunway
University Malaysia
•  Funding from AFOSR USA
•  Long term: intelligent systems and robots that
understand their users better and so adapt their
response and behaviour dynamically.

•  Explore the impact of trust and its indicators on
human behaviour
•  Develop underlying assessment approach by
focussing on human-human and/or human-machine
interactions
•  Next steps:
•  Investigate wider scope of the kinds of trust
indicators that can be reliably detected
•  Expand scope of research through real-life field-
based validation.
•  Automatic, objective, real-time and non-obtrusive
assessment
•  Multidisciplinary: psychology / HCI, signal processing,
machine learning
•  Multiple sensors: speech, language, interaction,
gestures, EEG, GSR
•  Experimental methodology
•  Extensive lab-based validation of basic assessment
approach
•  Real-life data validation through industry
partnership.
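
A language-based indicator of the kind listed above could be as simple as the percentage of words in an utterance that belong to a given word list. The sketch below is an assumption-laden illustration, not the project's method: the word lists, the tokenisation and the function name are hypothetical.

```python
import re


def percent_lexicon_words(text, lexicon):
    """Percentage of word tokens in `text` that appear in `lexicon`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in lexicon)
    return 100.0 * hits / len(words)


# Hypothetical word lists; a real study would use a validated lexicon.
TRUST_WORDS = {"trust", "rely", "confident", "sure"}
DISTRUST_WORDS = {"doubt", "unsure", "suspicious"}

utterance = "I trust the system, but I doubt this one reading."
trust_pct = percent_lexicon_words(utterance, TRUST_WORDS)
distrust_pct = percent_lexicon_words(utterance, DISTRUST_WORDS)
```

The utterance has ten word tokens, one of which is in each list, so both percentages are 10.0. Features like these would be only one input among the several sensor channels listed above.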
Developing and applying dynamic and objective methods of measuring trust by analysing
human behaviour and their interaction with systems, to support their decision making in critical
situations and high-load workplaces.
Modelling Trust using
Multimodal Behavioural Analysis
The Problem
•  Trust - a critical factor in driving human behaviour and
decision making while using autonomous systems.

•  Trust - extremely subjective and severely affected by
other factors like cognitive load.

•  Go beyond psychology’s manual, subjective, post-hoc
measures of:

•  Mental state, like cognitive load, emotion,
depression, and trust

•  Causal and cultural effects of behaviour on trust and
performance.

•  Robust automatic measurement of trust has never
been attempted.
Impact
This project is part of NICTA’s Machine Learning Research Group, which
is turning Machine Learning into an Engineering Discipline and an
everyday tool of ICT developers.


Technical Contact:
Asif.Khawaja@nicta.com.au
Business Contact:
Fang.Chen@nicta.com.au
[Figure: trust feedback loop. The user interacts with the system (computer, robot); the user’s behavioural input is captured; the system processes the user’s multimodal behavioural trust features; the system calibrates its response to build trust.]
Approach
Next Steps
[Figure: four charts, each comparing Low and High conditions: Avg Freq of Total Pauses per Minute; Percent of Negative emotion words; Percent of Trust words; Percent of Distrust words.]