Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing


ACM SIGKDD Explorations, Volume 11, Issue 1, July 2009

Presenter: 黃啟智

Student ID: 69821503


Outline


- Introduction
- Interoperability and Open Standards
- Putting Models to Work
- Performance
- Conclusion




Introduction


- Deployment and practical application of predictive models:
  - Limited choice of options
  - It often takes months for models to be integrated and deployed (time-consuming)
  - Custom coding or proprietary processes (costly)
- Open standards and Internet-based technologies are available to provide a more effective end-to-end solution for deployment.



Introduction


- SOA (Service-Oriented Architecture)
  - For the design of loosely coupled IT systems (e.g. based on Web Services)
- SaaS (Software-as-a-Service)
  - A license model
  - Vendors deliver software solutions as a cost-effective service
- PMML (Predictive Model Markup Language)
  - An open standard that allows users to exchange predictive models among various software tools



Interoperability and Open Standards


Cloud Computing



[Diagram: Cloud Computing (a computing architecture: SaaS, IaaS, PaaS), SOA, and Web Services; labels include SOAP, WSDL, UDDI, RPC, REST (access), and SOA-related standards]


Interoperability and Open Standards


Cloud Computing


- Reduces cost and management overhead for IT
- Shift in the geography of computation
- The Internet as a platform
  - A set of services that provide computing resources
- A variety of services:
  - Storage capacity, processing power, business applications…
- Cloud infrastructures:
  - Amazon Web Services (AWS)
  - Sector/Sphere
  - Hadoop
- The OCC, Open Cloud Consortium (www.opencloudconsortium.org)




Interoperability and Open Standards


Web Service


- W3C definition
  - Provides the foundation of SOA
  - Uses XML to encode and decode data
  - Uses the SOAP (Simple Object Access Protocol) standard to transport data
  - Data can be easily exchanged between different applications and platforms
  - Can be described by a WSDL (Web Services Description Language) file
- UDDI (Universal Description, Discovery, and Integration): a platform-independent, XML-based registry for businesses to list themselves on the Internet

http://zh.wikipedia.org
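To make the "XML to encode and decode data" and "SOAP to transport data" points concrete, here is a minimal sketch using only Python's standard library (xml.etree.ElementTree). It wraps a flat record in a SOAP 1.1 envelope and decodes it again. The body element `ScoreRequest`, the `field` element and the namespace `urn:example:scoring` are hypothetical placeholders, not part of any real service; an actual service would define its messages in its WSDL file.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace (illustration only).
APP_NS = "urn:example:scoring"


def encode_record(record: dict) -> str:
    """Encode a flat dict of field values as XML inside a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}ScoreRequest")  # hypothetical element
    for name, value in record.items():
        field = ET.SubElement(request, f"{{{APP_NS}}}field", name=name)
        field.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def decode_record(xml_text: str) -> dict:
    """Decode an envelope produced by encode_record back into a dict of strings."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): f.text for f in root.iter(f"{{{APP_NS}}}field")}


if __name__ == "__main__":
    message = encode_record({"age": 42, "income": 55000})
    print(message)
    print(decode_record(message))  # {'age': '42', 'income': '55000'}
```

Values come back as strings because XML text is untyped; a real client would convert them using the types declared in the service description.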


Interoperability and Open Standards


A SOAP request for a PMML file


(The file/model was previously uploaded to the service provider.)


A JDM (Java Data Mining) call
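The screenshots on this slide did not survive extraction. As a hedged illustration of the transport side only, the sketch below builds (but does not send) a SOAP 1.1 HTTP POST that asks a remote scoring service to apply a previously uploaded model to one record. The operation `applyModel`, its child elements, the namespace, the endpoint URL and the SOAPAction value are all hypothetical; the real ADAPA/JDM message format is defined by the provider's WSDL and is not reproduced here.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical names: the provider's WSDL defines the real ones.
SVC_NS = "urn:example:jdm-scoring"
ENDPOINT = "https://scoring.example.com/service"


def build_apply_request(model_name: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a SOAP 1.1 POST asking the service to score
    one record with a model that was uploaded earlier."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}applyModel")  # hypothetical operation
    # Refer to the previously uploaded PMML model by name.
    ET.SubElement(op, f"{{{SVC_NS}}}modelName").text = model_name
    for name, value in record.items():
        ET.SubElement(op, f"{{{SVC_NS}}}input", name=name).text = str(value)
    payload = ET.tostring(env, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "text/xml; charset=utf-8",  # SOAP 1.1 media type
            "SOAPAction": '"urn:example:jdm-scoring#applyModel"',  # hypothetical
        },
    )
```

Passing the request to `urllib.request.urlopen` would send it; the response would be another SOAP envelope to decode.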

Interoperability and Open Standards


- SaaS
  - Software as a Service
  - A license model: users access software via the Internet (rather than actually “buying and installing” it)
  - Users pay only for the right to use the software for a certain time period (e.g. NT$100 for an hour)
  - No upfront costs in setting up servers or software
  - Minimizes the risk of purchasing costly software that may not provide an adequate return on investment
  - E.g. Salesforce.com, Google Apps.
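The pay-per-use point is easiest to see as a break-even calculation. In this sketch the NT$100 hourly rate comes from the slide, while the NT$300,000 upfront cost (license plus server setup) is a made-up figure for illustration: renting is cheaper as long as expected usage stays below upfront cost ÷ hourly rate.

```python
def saas_cost(hours: float, rate_per_hour: float = 100.0) -> float:
    """Pay-per-use cost in NT$: only the hours actually used are billed."""
    return hours * rate_per_hour


def break_even_hours(upfront_cost: float, rate_per_hour: float = 100.0) -> float:
    """Hours of use at which renting costs as much as buying outright."""
    return upfront_cost / rate_per_hour


if __name__ == "__main__":
    upfront = 300_000.0  # hypothetical license + server setup cost (NT$)
    print(saas_cost(40))              # 4000.0
    print(break_even_hours(upfront))  # 3000.0
```

The calculation ignores maintenance, upgrades and staff time, which in practice also affect the break-even point.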


Interoperability and Open Standards


- PMML (Predictive Model Markup Language)
  - Developed by the Data Mining Group (www.dmg.org)
  - An open standard for representing data mining models
  - An XML-based language
  - Can describe data preprocessing and predictive algorithms
  - Can represent input data and data transformations
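As an example of how PMML represents a data transformation, the sketch below parses a `DerivedField` that uses PMML's `NormContinuous` element (a piecewise-linear normalization defined by `LinearNorm` points with `orig` and `norm` attributes) and evaluates it in Python. The field names and breakpoints are invented. The evaluator only handles values inside the defined range; PMML also specifies outlier treatments via the `outliers` attribute, which this sketch does not implement.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_0"}

PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <Header description="illustrative example"/>
  <DataDictionary numberOfFields="1">
    <DataField name="income" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="income_norm" optype="continuous" dataType="double">
      <NormContinuous field="income">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="50000" norm="0.5"/>
        <LinearNorm orig="200000" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
</PMML>"""


def load_norm(pmml_text: str, derived_name: str):
    """Return (input field, sorted (orig, norm) points) for a NormContinuous DerivedField."""
    root = ET.fromstring(pmml_text)
    for df in root.iterfind(".//p:DerivedField", NS):
        if df.get("name") == derived_name:
            nc = df.find("p:NormContinuous", NS)
            points = sorted(
                (float(ln.get("orig")), float(ln.get("norm")))
                for ln in nc.iterfind("p:LinearNorm", NS)
            )
            return nc.get("field"), points
    raise KeyError(derived_name)


def apply_norm(points, x: float) -> float:
    """Linear interpolation between the two LinearNorm points surrounding x.
    Values outside the defined range are rejected (no outlier handling here)."""
    for (o1, n1), (o2, n2) in zip(points, points[1:]):
        if o1 <= x <= o2:
            return n1 + (x - o1) * (n2 - n1) / (o2 - o1)
    raise ValueError(f"{x} is outside the normalization range")


if __name__ == "__main__":
    field, pts = load_norm(PMML_DOC, "income_norm")
    print(field, apply_norm(pts, 25000), apply_norm(pts, 125000))  # income 0.25 0.75
```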






Interoperability and Open Standards

PMML Structure examples (a test data file)

Required (active) data fields

Predicted data field
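In PMML, the distinction between required input fields and the predicted field is recorded in each model's `MiningSchema`: every `MiningField` carries a `usageType` attribute, which defaults to `active`, and the target is marked `predicted` (PMML 4.0 naming). The sketch below splits the fields of a made-up model fragment accordingly; a scoring engine could use the active list to check that incoming records contain every required input.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_0"}

# Illustrative fragment only: a real model element also contains its parameters.
MODEL = """<TreeModel xmlns="http://www.dmg.org/PMML-4_0" functionName="classification">
  <MiningSchema>
    <MiningField name="age"/>
    <MiningField name="income" usageType="active"/>
    <MiningField name="churn" usageType="predicted"/>
  </MiningSchema>
</TreeModel>"""


def split_fields(model_xml: str):
    """Return (active inputs, predicted outputs) from a model's MiningSchema."""
    model = ET.fromstring(model_xml)
    active, predicted = [], []
    for mf in model.iterfind("p:MiningSchema/p:MiningField", NS):
        usage = mf.get("usageType", "active")  # PMML default is "active"
        if usage == "active":
            active.append(mf.get("name"))
        elif usage == "predicted":
            predicted.append(mf.get("name"))
    return active, predicted


if __name__ == "__main__":
    print(split_fields(MODEL))  # (['age', 'income'], ['churn'])
```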


Interoperability and Open Standards

PMML Structure examples



Interoperability and Open Standards

PMML Structure examples


Array of counts of different field values under different class labels
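The count array described here matches how a Naïve Bayes model is stored in PMML: `NaiveBayesModel` keeps, for each input field, `PairCounts` of how often each value occurs under each class label, plus class totals in `TargetValueCounts`. A sketch of scoring a record from such counts follows, computing P(c | x) ∝ (n_c / N) · ∏_j n(x_j, c) / n_c. It uses plain Python dictionaries instead of parsing the XML, the counts are made up, and zero counts are replaced by a small probability, which is the role of the model's `threshold` attribute in PMML.

```python
from math import prod

# Made-up training counts: class label -> number of records.
CLASS_COUNTS = {"yes": 30, "no": 70}
# Made-up pair counts: field -> value -> class label -> number of records.
PAIR_COUNTS = {
    "outlook": {"sunny": {"yes": 6, "no": 28}, "rain": {"yes": 24, "no": 42}},
    "windy": {"true": {"yes": 9, "no": 0}, "false": {"yes": 21, "no": 70}},
}
THRESHOLD = 0.001  # stand-in probability when a pair count is zero


def posterior(record: dict) -> dict:
    """Class probabilities: P(c|x) is proportional to P(c) * prod_j P(x_j|c)."""
    total = sum(CLASS_COUNTS.values())
    scores = {}
    for label, n_label in CLASS_COUNTS.items():
        likelihoods = []
        for field, value in record.items():
            count = PAIR_COUNTS[field][value].get(label, 0)
            likelihoods.append(count / n_label if count > 0 else THRESHOLD)
        scores[label] = (n_label / total) * prod(likelihoods)
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


if __name__ == "__main__":
    print(posterior({"outlook": "sunny", "windy": "false"}))
```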


Interoperability and Open Standards


- PMML model specifics (parameters, architecture) are defined under different model elements, including:
  - Neural Networks
  - Support Vector Machines
  - Regression Models
  - Decision Trees
  - Association Rules
  - Clustering
  - Sequences
  - Naïve Bayes
  - Text Models
  - Rules
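Each model type in this list has its own top-level element inside a PMML document, so a consumer can tell what kind of model it received from the element name. The sketch below maps element names to the slide's categories; the names follow PMML 4.0 and should be checked against the DMG specification for the version in use. A scoring engine could dispatch to a type-specific evaluator in the same way.

```python
import xml.etree.ElementTree as ET

# Top-level model elements (PMML 4.0 naming) mapped to the slide's categories.
MODEL_ELEMENTS = {
    "NeuralNetwork": "Neural Networks",
    "SupportVectorMachineModel": "Support Vector Machines",
    "RegressionModel": "Regression Models",
    "GeneralRegressionModel": "Regression Models",
    "TreeModel": "Decision Trees",
    "AssociationModel": "Association Rules",
    "ClusteringModel": "Clustering",
    "SequenceModel": "Sequences",
    "NaiveBayesModel": "Naïve Bayes",
    "TextModel": "Text Models",
    "RuleSetModel": "Rules",
}


def model_types(pmml_text: str) -> list:
    """List the model categories found as direct children of the PMML root."""
    root = ET.fromstring(pmml_text)
    found = []
    for child in root:
        local = child.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if local in MODEL_ELEMENTS:
            found.append(MODEL_ELEMENTS[local])
    return found


if __name__ == "__main__":
    doc = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
      <Header/><DataDictionary/><TreeModel functionName="classification"/>
    </PMML>"""
    print(model_types(doc))  # ['Decision Trees']
```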


Interoperability and Open Standards


- PMML On-The-Go
  - PMML 4.0: time series, boolean data types, model segmentation, lift/gain charts, an expanded range of built-in functions…
  - More applications support export and import functionality in PMML
  - Open-source environments:
    - KNIME (www.knime.org)
    - The R project (www.R-project.org)


Putting Models to Work




- Amazon EC2
  - Elastic Compute Cloud
  - Powered by Amazon Web Services
- ADAPA scoring engine
  - Uses JDM (Java Data Mining) Web Service calls and therefore allows automated decisions to be embedded into virtually any enterprise system or application
  - Available as a service to minimize total cost



Model Verification and Execution






- Typical tasks in the life cycle of a data mining project:
  - Building, deploying, testing and using data mining models
  - (A cross-platform and multi-vendor environment)


Putting Models to Work



Model Verification and Execution


- Model testing/verification
  - Ensures that the scoring engine and the model development environment produce exactly the same result
  - A test file containing any number of records, with all the necessary input variables and the expected result for each record, can be uploaded for score matching
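Score matching can be sketched as follows: read the uploaded test file, feed each record's inputs to the scoring engine, and report every record whose result differs from the expected one. Here the expected-result column is assumed to be called `expected`, and the `score` callable is a hypothetical stand-in for the deployed engine. The tolerance `tol` is an added assumption for floating-point results; setting it to 0 demands an exact match, as the slide describes.

```python
import csv
import io
import math
from typing import Callable


def match_scores(csv_text: str, score: Callable[[dict], float],
                 expected_col: str = "expected", tol: float = 1e-9) -> list:
    """Score every test record; return (line, expected, actual) for each
    mismatch. An empty list means all scores match."""
    mismatches = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header; assumes no quoted fields spanning several lines.
    for line_no, row in enumerate(reader, start=2):
        expected = float(row.pop(expected_col))  # remaining columns are inputs
        actual = score(row)
        if not math.isclose(actual, expected, rel_tol=0.0, abs_tol=tol):
            mismatches.append((line_no, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Hypothetical stand-in for the deployed model: a linear score.
    def toy_model(rec: dict) -> float:
        return 0.5 * float(rec["x1"]) + 2.0 * float(rec["x2"])

    test_file = "x1,x2,expected\n1,1,2.5\n2,0,1.0\n4,1,9.0\n"
    print(match_scores(test_file, toy_model))  # [(4, 9.0, 4.0)]
```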

Putting Models to Work



Model Verification and Execution


- Model execution
  - Batch mode: via the web console, by uploading a data file containing records (in CSV format or zipped)
  - Real-time mode: via web services, as embedded calls (SOAP requests)
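Batch mode accepts a data file of records, either as plain CSV or zipped. A hedged sketch of how such an upload might be handled: detect a ZIP archive, read every CSV file in it (or the raw CSV otherwise), score each record, and return CSV with an added `score` column. The column name, the UTF-8 encoding and the `model` callable are assumptions for illustration, not ADAPA's actual behavior.

```python
import csv
import io
import zipfile
from typing import Callable, Iterator


def read_records(data: bytes) -> Iterator[dict]:
    """Yield records from raw CSV bytes or from every .csv file in a ZIP archive."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                if name.lower().endswith(".csv"):
                    text = archive.read(name).decode("utf-8")
                    yield from csv.DictReader(io.StringIO(text))
    else:
        yield from csv.DictReader(io.StringIO(data.decode("utf-8")))


def score_batch(data: bytes, model: Callable[[dict], float]) -> str:
    """Score every record and return CSV text with an added 'score' column.
    Assumes all CSV files share the columns of the first record."""
    out = io.StringIO()
    writer = None
    for rec in read_records(data):
        rec["score"] = model(rec)
        if writer is None:  # header is taken from the first record
            writer = csv.DictWriter(out, fieldnames=list(rec))
            writer.writeheader()
        writer.writerow(rec)
    return out.getvalue()
```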

Putting Models to Work




Demo: Excel add-in

Putting Models to Work



Demo: Excel add-in

Putting Models to Work



Security on the Cloud


- Uploading proprietary information to a 3rd-party service → security and control questions
  - The engine should not store any data
  - An instance shares nothing with other instances
  - An instance is private (via authentication)
  - An instance can only be accessed via HTTPS
  - Models and data are deleted after an instance is terminated
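Two of the listed rules, HTTPS-only access and private, authenticated instances, can also be enforced on the client side before any data leaves the enterprise. The sketch below refuses non-HTTPS URLs and attaches an HTTP Basic `Authorization` header. Basic authentication is an assumption (the slide does not name a scheme), and the URL and credentials are hypothetical; Basic credentials are only base64-encoded, which is why the HTTPS check comes first. Server-side rules such as not storing data are outside what a client can check.

```python
import base64
import urllib.parse
import urllib.request


def secure_request(url: str, user: str, password: str, payload: bytes) -> urllib.request.Request:
    """Build a POST to a scoring instance, refusing anything but HTTPS and
    attaching HTTP Basic credentials (assumed scheme, for illustration)."""
    if urllib.parse.urlsplit(url).scheme != "https":  # urlsplit lowercases the scheme
        raise ValueError("instances may only be accessed via HTTPS")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, data=payload, method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
```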

Putting Models to Work


Performance







Instance type reference: http://aws.amazon.com/ec2/


Performance








Conclusion


- Cloud computing
  - Offers a powerful and revolutionary way of putting data mining models to work.
- Open standard (PMML)
  - Allows predictive models to be easily accessed from anywhere in the enterprise (via web-service calls or by uploading data files).
- The combination of both accelerates the deployment of predictive models and makes it more affordable.


Questions


- Security (transmission via the Internet to 3rd-party vendors)
  - Privacy
- High dimensionality / large databases
  - Transmission time + processing time
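For large databases, the turnaround of a cloud scoring job is roughly the time to transmit the data plus the time to score it: T ≈ S/B + N/R, with file size S, upload bandwidth B, N records and engine throughput R. A back-of-envelope sketch follows; all numbers in the example are hypothetical.

```python
def turnaround_seconds(size_mb: float, bandwidth_mbit_s: float,
                       n_records: int, records_per_s: float) -> float:
    """Estimated total time = upload time + scoring time, in seconds.
    Ignores latency, protocol overhead and downloading the results."""
    transmission = size_mb * 8 / bandwidth_mbit_s  # megabytes -> megabits
    processing = n_records / records_per_s
    return transmission + processing


if __name__ == "__main__":
    # Hypothetical: 500 MB file, 10 Mbit/s uplink, 1,000,000 records at 5,000 records/s.
    print(turnaround_seconds(500, 10, 1_000_000, 5_000))  # 600.0
```

Zipping the upload, which batch mode allows, shrinks S and therefore the transmission term.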

