3A. Software and Stack


Software Stack: the processing/workflow/rules backbone that supports all other activities.
- Flexible, rapidly configurable, and easily customized
- Supports all data flow and processing throughout the process
- Cloud-based platforms allowing parallel processing and rapid analytics deployment

Slide 2

We use a "Federated" model for developing software: co-located teams focus on specific solutions and technologies, and a Core Architecture team ensures best practices and software reuse.

Solution areas: Marketing Services; Credit and Risk; Data Services; Supply Chain Services; Global Markets; Common Pool; Ancillary Work.

Team locations: Core Architecture and PMO (San Diego, Boston); Shanghai; Jersey City; San Diego; Boston; New Delhi; TopCoder.

Team focus areas: Analytic Engines; High-Speed Transactions; UI Design; Process/Tools; "Big Batch" Processing; QA Leadership; UI Configuration/Development; ETL Configuration/Development; QA Staff; Operations.

Common Pool assets: Analytic Engines; Analytics Support Team; Data Entry Forms; UI Widgets; Design Ideas; Documents; Utilities; Hadoop Processing; Data Visualization.


Slide 3

We dramatically reduce the time it takes to translate Signals from development to deployment by using the same infrastructure in both environments.

Model Development: review historical data → merge, tag, sample, split data → calculate variables → execute segment logic → determine best variables → build models → package models to deploy.

Model Deployment: receive client data → verify/clean data → calculate variables → execute model(s) → execute business logic → respond to client → report, act, analyze.

Common infrastructure underneath both:
- Signals Creation Engine (Python, SQL, Java)
- ETL / online data structuring (Java, Python, Hadoop, SQL)
- ETL / offline data structuring (SAS, BIQ, Hadoop, SQL, Python)
- User interfaces
- Model training tools (SAS, Matlab, CART, NN, custom)
- Model execution (Java, Python)
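One way to picture the shared-infrastructure claim: both flows above call the same "calculate variables" step. A minimal Python sketch, with hypothetical variable names and toy data (none of this is from the deck), of a single variable library serving both the development loop and live execution:

    # Registry of variable-calculation functions shared by dev and prod.
    VARIABLES = {}

    def variable(name):
        """Register a variable-calculation function under a stable name."""
        def register(fn):
            VARIABLES[name] = fn
            return fn
        return register

    @variable("avg_order_value")
    def avg_order_value(record):
        orders = record.get("orders", [])
        return sum(o["amount"] for o in orders) / len(orders) if orders else 0.0

    @variable("order_count")
    def order_count(record):
        return len(record.get("orders", []))

    def calculate_variables(record):
        """Identical code path in development and deployment."""
        return {name: fn(record) for name, fn in VARIABLES.items()}

    # Development: loop over a historical sample to build training rows.
    history = [{"orders": [{"amount": 40.0}, {"amount": 60.0}]}]
    training_rows = [calculate_variables(r) for r in history]

    # Deployment: score a single incoming client record with the same logic.
    live_row = calculate_variables({"orders": [{"amount": 25.0}]})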


Slide 4

We have built the Opera Stack to support a machine-learning environment in a Big Data world, while also responding to customers' needs and requirements. Components mix Opera-proprietary technology with best-of-breed commercial and open-source tools, delivered as input, analysis, and output services on a service-oriented, highly scalable platform.

DATA SOURCES: enterprise systems (Oracle, SAP, customer systems, etc.); 3rd party (including live feeds); web/unstructured; Opera proprietary databases (e.g., our Zip+4 geo data); in vivo feedback. Client-system agnostic; accepts real-time feeds and unstructured data.

ACCESS (input services): secure browser (HTML and Flex); mobile devices; custom handhelds; web services; FTP and SFTP; MQ, JMS, sockets. Flexible interfaces (client-system agnostic; structured and unstructured).

ETL (extract, structure, data store): SAS, Talend, Hadoop, and Kettle feeding NoSQL, BIQ, Hadoop, and in-memory RDBMS stores, with approaches optimized for record read speed.

SIGNALS CREATION (signals identification → Signals Database): a library of algorithms to create Signals — time series; events; sparse matrix; statistics; geo-location — with proprietary libraries and capabilities for critical signal and modeling areas.

SIGNALS SELECTION: Mahalanobis distance; stepwise regression; classification trees; mutual information; principal component analysis; sensitivity analysis; bivariate charts; clustering covariance.

MODEL DEVELOPMENT / CONTINUOUS LEARNING: neural nets; restricted Boltzmann machines; k-nearest neighbor; SVD; linear regression; matrix factorization; global effects; factor similarity; temporal distance; stochastic gradient descent; Kalman filters.

ANALYTICS (model execution → Analytics Results Database): ensemble models, decision rules, and simulations running against NoSQL and in-memory RDBMS stores; flexibility through optimized approaches.

DIRECTED ACTIONS (output services): scores, decisions, curricula, and alerts delivered as directed actions to humans and to machines, plus visualization and dashboards. OLAP, Memcached, and RDBMS output stores are optimized for high-volume reads, with flexible output into any insertion point over MQ, web services, sockets, SFTP, SQL, and SAP.
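To make the signals-selection list concrete, here is a minimal sketch of one of the listed measures, mutual information, used to rank candidate Signals against a binary target. The toy data and names are illustrative, not from the deck:

    import math
    from collections import Counter

    def mutual_information(xs, ys):
        """I(X;Y) in nats, estimated from paired samples of discrete variables."""
        n = len(xs)
        px, py = Counter(xs), Counter(ys)
        pxy = Counter(zip(xs, ys))
        return sum(
            (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
            for (x, y), c in pxy.items()
        )

    # Rank candidate Signals by how much they tell us about the target.
    target  = [0, 0, 1, 1, 1, 0, 1, 0]
    signals = {
        "signal_a": [0, 0, 1, 1, 1, 0, 1, 0],   # perfectly informative
        "signal_b": [1, 0, 1, 0, 1, 0, 1, 0],   # only weakly informative here
    }
    ranked = sorted(signals,
                    key=lambda s: mutual_information(signals[s], target),
                    reverse=True)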


Slide 5

A common reference model can be found consistently across Solutions, tracing the process from data management to final output. The stack layers — DATA SOURCES; ACCESS; ETL; SIGNALS CREATION; ANALYTICS; DIRECTED ACTIONS; with input, analysis, and output services on the service-oriented, highly scalable platform — map onto a common set of stages:

LISTEN → CAPTURE, INTEGRATE, TRANSFORM → ASSEMBLE → CREATE AND SELECT VARIABLES → DETECT SIGNALS → SYNTHESIZE PATTERNS → TRIAL AND TEST → EVALUATE → DIRECT ACTION TO PEOPLE OR MACHINES.

Slide 6

Quick integration of new data and modification of existing data is critical. We continue to add to a library of connectors, accumulators, and transforms that allow us to do this in batch and real time, and that are adaptable to new data sets and systems.

Batch processing: the platform automatically detects new files and runs them through a configurable series of verification, cleanse, load, and analyze processes.

Real-time processing: messages are run through a similar verification layer to access the core data and analytics services.

REAL-TIME CONNECTORS: a client website or call-center app sends an analysis request through the transaction interface (web server, TCP/IP socket, MQ); analysis and data operations run in parallel, and an analysis response is returned.

BATCH CONNECTORS: data to process (client-hosted or Opera-hosted) is optionally encrypted/compressed, extracted, and moved over FTP; a file monitor detects it, a verify/clean/load step runs, and analysis proceeds in parallel.
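A minimal sketch of the batch pattern described above — a poll-based file monitor pushing each new file through a configurable verify/cleanse/load/analyze series. The paths, stage bodies, and polling interval are illustrative assumptions, not the platform's actual code:

    import time
    from pathlib import Path

    def verify(path):  print(f"verify  {path.name}")
    def cleanse(path): print(f"cleanse {path.name}")
    def load(path):    print(f"load    {path.name}")
    def analyze(path): print(f"analyze {path.name}")

    # The series is configurable per feed; this is one plausible default.
    PIPELINE = [verify, cleanse, load, analyze]

    def watch(inbox, poll_seconds=30):
        """Poll an FTP drop directory; run each newly seen file through the pipeline."""
        seen = set()
        while True:
            for path in Path(inbox).glob("*.csv"):
                if path.name not in seen:
                    seen.add(path.name)
                    for stage in PIPELINE:
                        stage(path)
            time.sleep(poll_seconds)

    # watch("/data/inbound")   # hypothetical drop directory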

Slide 7

Getting Signals from Big Data is highly dependent on how that Big Data is stored. We have sophisticated, complementary signal-extraction tools housed in a flexible architecture that can readily support additional engines; we are "infrastructure agnostic," not tied to a single technology.

Client and external data lands in the store that matches its access pattern; Signals algorithms run against all of them (SQL, MapReduce, MDX, API):
- Relational database: general reporting and ad hoc analysis. SQL Server, Oracle, MySQL, Netezza.
- Network database: tracking "chains" of relationships, like references. Neo Database, custom development.
- Memory cube: high-speed "slice and dice" analysis of a flat set of data. BIQ, Mondrian, custom.
- Distributed database: processing of large data that can be broken down and processed in pieces (Map/Reduce). Hadoop, Vertica.
- Indexed files: accessing data by a single key value, like a customer profile record. Berkeley DB, CTREE.
- Object/transaction store: fast retrieval of transaction history for longitudinal profiling. Custom, MongoDB.
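As an illustration of the Map/Reduce row above, here is a minimal in-process sketch of the map → shuffle/sort → reduce shape, computing a per-customer transaction count and total as a toy Signal. In production this shape would run on Hadoop over data too large for one machine; field names and data are invented:

    from itertools import groupby
    from operator import itemgetter

    def mapper(transaction):
        # Emit (key, value) pairs keyed by the field we aggregate on.
        yield transaction["customer_id"], transaction["amount"]

    def reducer(customer_id, amounts):
        amounts = list(amounts)
        return customer_id, {"txn_count": len(amounts), "txn_total": sum(amounts)}

    transactions = [
        {"customer_id": "c1", "amount": 12.0},
        {"customer_id": "c2", "amount": 5.0},
        {"customer_id": "c1", "amount": 30.0},
    ]

    # Shuffle/sort: group mapped pairs by key, then reduce each group.
    pairs = sorted((p for t in transactions for p in mapper(t)), key=itemgetter(0))
    signals = [reducer(key, (v for _, v in group))
               for key, group in groupby(pairs, key=itemgetter(0))]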

Slide 8

We use a "container approach" that provides access to reusable platform layers while also supporting customizable model and rule execution with dynamic Java and Python plug-ins. Services can support both model development (signal selection, model training) and production (signal creation, model execution).

An Analytic Service receives work from the transaction router (real-time transactions) or from batch processing (looping through records), then:
- Verifies the request type
- Signal creation: retrieves "context" data from the Data Store and calculates variables in-database or in-code
- Production path: combines and executes models, then applies decision rules
- Development/learning path: selects signals, trains models, and evaluates performance
- Saves and delivers results to the Signals DB and Results DB
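A minimal Python sketch of the container idea (the deck names dynamic Java and Python plug-ins; the module and method names here are hypothetical): the service stays generic while models and decision rules load by name at runtime:

    import importlib

    class AnalyticService:
        def __init__(self, plugin_names):
            # Dynamically import each plug-in module and keep its `run` callable,
            # so custom logic deploys without changing the platform layer.
            self.plugins = [importlib.import_module(n).run for n in plugin_names]

        def fetch_context(self, request):
            return {"request": request}     # stand-in for a Data Store read

        def handle(self, request):
            context = self.fetch_context(request)   # "context" data
            for run in self.plugins:                 # models, then decision rules
                context = run(context)
            return context                           # delivered to the Results DB

    # service = AnalyticService(["plugins.fraud_model", "plugins.decision_rules"])
    # result  = service.handle({"type": "score", "customer_id": "c1"})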


Slide 9

We deliver into any insertion point, batch or real time, through any type of interface, taking in feedback via a closed-loop system that allows analytics to learn and adapt without lag time. Advanced visualization tools allow human judgment to provide oversight.

From the Results Database we extract the analytics response and deliver it along two paths.

Client systems, via response connectors:
- Real-time
- Batch
- Websites
- Operational databases
- Customer/sales touch-points

Interactive user interfaces, for reporting and scenarios:
- Multiple form factors (desktop, handheld, iPad)
- Custom dashboards
- "What-if" scenarios
- Rule/parameter changes

Both paths feed back into Analytics: automated feedback from client systems, and manual adjustments from the user interfaces.
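A minimal sketch of that closed loop: each delivered score later receives an outcome through automated feedback, and a human can override through manual adjustment. The toy linear scorer and its update rule are stand-ins for the real analytics, not the platform's method:

    class ClosedLoopScorer:
        def __init__(self, weight=0.5, learning_rate=0.1):
            self.weight = weight
            self.learning_rate = learning_rate

        def score(self, signal_value):
            return self.weight * signal_value

        def feedback(self, signal_value, outcome):
            # Automated feedback: nudge the weight toward the observed outcome
            # immediately, rather than waiting for a retraining cycle.
            error = outcome - self.score(signal_value)
            self.weight += self.learning_rate * error * signal_value

        def adjust(self, new_weight):
            # Manual adjustment: human oversight overrides via the dashboard.
            self.weight = new_weight

    scorer = ClosedLoopScorer()
    delivered = scorer.score(2.0)      # deliver a score to a client system
    scorer.feedback(2.0, 1.5)          # outcome arrives; the model adapts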


Slide 10

Our platform allows for rapid implementation with low IT investment. Key to this is the flexibility to address customers' requirements for both the outbound data streams and the inbound "directed actions."

Deployment options: public cloud, Opera cloud, client cloud, or on premise.

Inbound: data from enterprise operations (enterprise systems, data warehouses, operational databases, websites, POS, events) is extracted over SFTP, SQL, web services, MQ, or sockets, then structured into a structured data store. Opera performs all data structuring and integration, so little new customer IT is required. Opera also adds in new data sources with valuable Signals, including a social media collector and third-party data.

Processing: Signals creation feeds a Signals database; model execution writes to a results database; adaptive learning capabilities (develop/train models, signal selection, learning) close the loop.

Outbound: delivery over SFTP and web services carries directed actions to machines, and a web application server carries directed actions to humans. Opera creates all interfaces (platform agnostic: desktops, tablets, handhelds, kiosks, more), so little new customer IT is required.
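The deck does not show a configuration format, but the inbound and outbound options listed above lend themselves to declarative feed and action definitions. A purely hypothetical sketch of what such declarations might look like, with invented field names:

    # One inbound feed and two outbound insertion points, declared as data.
    FEEDS = [
        {
            "name": "pos_events",
            "transport": "sftp",        # also: sql, web_service, mq, socket
            "schedule": "daily",
            "target": "structured_data_store",
        },
    ]

    ACTIONS = [
        {"name": "reorder_alerts", "audience": "humans",
         "channel": "web_app"},          # web application server
        {"name": "price_updates", "audience": "machines",
         "channel": "web_service"},      # delivery over SFTP / web services
    ]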


Slide 11

Case Study: Building the platform to manage daily feeds for recommendations. Created for Schwan's and reused for Nissan.

SCHWAN'S → OPERA SERVER: customers, products, sales, routes, and inventory arrive over SFTP (auto FTP pull) → data check/load → MySQL RDBMS → recommender (KNN, neural network). Interfaces: SFTP; reports; web server (query tools); handhelds (auto FTP push).

NISSAN → OPERA SERVER: inventory, condition reports, sales, and auctions arrive over SFTP (auto FTP pull) → data check/load → MySQL RDBMS → pricing (KNN, Kalman). Interfaces: SFTP; reports; web server (query tools, real-time web service); auction block (auto FTP push); live dashboard.

Built for Schwan's:
- A generic process configured to continually read and write files from FTP/SFTP servers
- A generic process that checks all files for errors and calculates statistics on all fields prior to loading
- KNN functions and a neural-network train/execute function
- Both Flex and HTML/PDF reports

Reused for Nissan:
- The same programs created for Schwan's to monitor for files, clean them, and load them to an RDBMS
- The KNN function from Schwan's, plus new Kalman train/execute functions
- The HTML/PDF report tools from Schwan's, plus a new iPad application
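A minimal sketch of the KNN pattern reused across both deployments (recommendations for Schwan's, pricing for Nissan): find the k nearest profiles and aggregate their values. The features, prices, and function names are illustrative only:

    import math

    def knn(query, rows, k=3):
        """Return the k rows whose feature vectors are closest to the query."""
        by_distance = sorted(rows, key=lambda r: math.dist(query, r["features"]))
        return by_distance[:k]

    def estimate_price(query, rows, k=3):
        neighbors = knn(query, rows, k)
        return sum(r["price"] for r in neighbors) / len(neighbors)

    # Toy vehicle profiles as (age_years, mileage); in practice features
    # would be normalized so one scale does not dominate the distance.
    vehicles = [
        {"features": (3.0, 42_000.0), "price": 14_500.0},
        {"features": (2.0, 30_000.0), "price": 17_200.0},
        {"features": (5.0, 80_000.0), "price": 9_800.0},
        {"features": (3.5, 45_000.0), "price": 13_900.0},
    ]
    estimate = estimate_price((3.0, 40_000.0), vehicles, k=3)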

Slide 12

Case Study: Building the platform for large batch processing. Created for Mobiuss and reused for the FA Performance Aggregator (and Insight Engine).

DATA PROVIDERS → OPERA SERVER (Mobiuss): loans, borrowers, properties, payments, bonds, deals, and prices arrive over SFTP (auto FTP pull) → data check/load → MSSQL RDBMS and HDFS → forecast models (run in Hadoop) → Intex pricing engine (auto FTP push). Interfaces: SFTP; reports and scenarios; web server (query tools, trigger simulators); BIQ.

MSSB → OPERA SERVER (FA Performance Aggregator): FA, accounts, performance, and benchmarks arrive over FTP (auto FTP pull) → data check/load → MSSQL → aggregator. Interfaces: web server (query tools); FA dashboard.

Built and reused:
- Reused the same FTP monitor job from Schwan's/Nissan, and the same programs created for Schwan's to monitor for files
- Same data check/load process, with an MSSQL interface
- Implemented WoE and other algorithms to create forecast models that feed the pricing engine; first use of Hadoop, running 80,000 deals and Monte Carlo simulation
- Created a Hadoop (HDFS) load process; the same process is used for the Insight Engine
- Created a new aggregation engine to execute complex rollup and benchmarking functions
- Used the same Flex framework to build Mobiuss reports and the performance dashboard; also created a BIQ dataset
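A minimal sketch of the Weight of Evidence (WoE) calculation named above, binning one variable against a binary outcome using the common ln(share of goods / share of bads) form. The loan-to-value data is invented, and the production runs happened in Hadoop at far larger scale:

    import math

    def weight_of_evidence(values, outcomes, bins):
        """WoE per bin: ln(share of 'good' outcomes / share of 'bad' outcomes)."""
        goods = [v for v, y in zip(values, outcomes) if y == 0]
        bads  = [v for v, y in zip(values, outcomes) if y == 1]
        woe = {}
        for lo, hi in bins:
            g = sum(lo <= v < hi for v in goods) / len(goods)
            b = sum(lo <= v < hi for v in bads) / len(bads)
            if g > 0 and b > 0:           # skip bins with no support
                woe[(lo, hi)] = math.log(g / b)
        return woe

    ltv     = [0.55, 0.70, 0.95, 0.60, 0.88, 0.92, 0.75, 0.99]  # loan-to-value
    default = [0,    1,    1,    0,    0,    1,    0,    1]
    print(weight_of_evidence(ltv, default, bins=[(0.0, 0.8), (0.8, 1.01)]))
    # {(0.0, 0.8): ~1.10, (0.8, 1.01): ~-1.10} — low LTV skews good, high skews bad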
