Associative Data Schemes for Cloud Computing

Amir Basirat
PhD Candidate
Amir.Basirat@monash.edu

Supervisor: Dr Asad Khan
Clayton School of IT, Monash University

STINT Workshop, Luleå, Sweden, May 2012

Contents

1. Cloud Computing
2. Hadoop MapReduce
3. Research Objective
4. Pattern Recognition and Distributed Approach
5. Graph Neuron for Scalable Pattern Recognition
6. HGN and DHGN
7. Web-based GN
8. EdgeHGN
9. Simulation Showcase

What is Cloud Computing?

The vision of Cloud Computing encompasses a general shift of computer processing, storage, and software delivery away from the desktop and local servers, across the network, and into the next generation of data centers hosted by large infrastructure companies.

Big Data!

An IDC estimate put the size of the "digital universe" at 0.18 zettabytes back in 2006, and forecast a tenfold growth by 2011, to 1.8 zettabytes.

This flood of data is coming from many sources. Consider the following:

- The New York Stock Exchange generates about one terabyte of new trade data per day.
- Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
- Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
- The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
- The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.

Challenge?

Our existing capability to generate data seems to outstrip our capability to analyze it.

Data Management in Cloud




There are some underlying issues that need to be addressed properly by any data management scheme deployed for clouds (Abadi, 2009), including:

- the capability to parallelise the data workload,
- security concerns as a result of storing data at an untrusted host,
- and data replication functionality.

Thus the question of how to effectively process immense data sets is becoming increasingly urgent.

Hadoop

In a nutshell, what Hadoop provides:

"A reliable shared storage and analysis system. The storage is provided by HDFS and analysis by MapReduce." (Hadoop, 2011)

MapReduce

The MapReduce programming model requires expressing solutions with two functions: Map and Reduce (Hadoop, 2011).

- A map function takes a key/value pair, performs a computation, and emits a set of intermediate key/value pairs as output.

- A reduce function merges all intermediate values associated with the same intermediate key, executes some computation on them, and emits the final output.

Word Count in MapReduce

class MAPPER
    method MAP(docid a, doc d)
        for all term t in doc d do
            EMIT(term t, count 1)

class REDUCER
    method REDUCE(term t, counts [c1, c2, ...])
        sum = 0
        for all count c in counts [c1, c2, ...] do
            sum = sum + c
        EMIT(term t, count sum)

Pseudo-code for the word count algorithm in MapReduce.

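The same logic can be sketched as plain, runnable Python: a minimal single-process illustration of the map, shuffle, and reduce phases (not Hadoop itself; the function names and the tiny corpus below are made up for the example).

from collections import defaultdict

def map_phase(docid, doc):
    # Emit an intermediate (term, 1) pair for every term in the document.
    for term in doc.split():
        yield term, 1

def reduce_phase(term, counts):
    # Sum all partial counts received for one term.
    return term, sum(counts)

def word_count(corpus):
    # Shuffle step: group intermediate pairs by key before reducing.
    grouped = defaultdict(list)
    for docid, doc in corpus.items():
        for term, count in map_phase(docid, doc):
            grouped[term].append(count)
    return dict(reduce_phase(t, c) for t, c in grouped.items())

corpus = {"d1": "cloud data cloud", "d2": "data pattern"}
print(word_count(corpus))   # {'cloud': 2, 'data': 2, 'pattern': 1}
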
Challenges and Hurdles in MapReduce

- The Map function conducts its operation assuming all related data is distributed vertically, i.e. records are uniformly distributed across the network. However, it is possible that some parts of the related records are stored at different physical locations.

- Intermediate records need to be sorted before they are input to the reduce function.

- The solution must be expressed in terms of Map and Reduce functions working on key/value pairs, while in some cases this may not be possible or natural, such as multi-stage processes.

- Moreover, dependency on HDFS for data storage and retrieval can create single points of failure for the Map/Reduce infrastructure, especially at the master nodes.

Existing data management schemes do not work well when data is partitioned dynamically among numerous available nodes.

Approaches towards scalable data management in the cloud, which offer greater portability, manageability, and compatibility of applications and data, are yet to be fully realised.

Solution?

Treat data records as patterns.

As a result, data storage and retrieval is performed using a distributed pattern recognition approach, implemented through the integration of loosely coupled computational networks, followed by a divide-and-distribute approach that allows these networks to be distributed within the cloud dynamically.

The aim: to develop a distributed data access scheme that enables data storage and retrieval by association.

Associative Model of Data

This associative model treats data records as patterns, and hence it does not matter how the data is represented.

The associative model uses a single, common structure for all types of data.

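As a minimal illustration of this idea (the flattening rule below is invented for the example, not the scheme proposed in this work): any record, whatever its fields, can be reduced to one fixed-length symbol pattern before being handed to an associative store.

def record_to_pattern(record, width=32, pad="_"):
    # Flatten any record into one fixed-length string of symbols,
    # so that all data types share a single, common structure.
    flat = "|".join(str(v) for v in record.values())
    return flat[:width].ljust(width, pad)

user = {"id": 42, "name": "amir", "city": "clayton"}
trade = {"symbol": "NYSE:IBM", "qty": 100, "price": 131.2}
print(record_to_pattern(user))    # both print as 32-symbol patterns,
print(record_to_pattern(trade))   # regardless of the record type
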
Distributed Pattern Recognition

A distributed computing approach offers seemingly unlimited scalability with respect to pattern growth, given the rapid advent of network computing technology that enables processing to be performed within the body of a network rather than concentrating on exhaustive single-CPU utilization.

Existing approaches still lag behind, due to the highly complex recognition algorithms being implemented.

The neural network approach offers a promising tool for large-scale pattern recognition. However, there are several issues related to its implementation. These include:

- convergence problems,
- complex iterative learning procedures,
- and low scalability with regard to the training data required for optimum recognition.

Graph Neuron (GN)

An eight-node GN in the process of storing patterns (Khan, 2002): P1 (RED), P2 (BLUE), P3 (BLACK), and P4 (GREEN).

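As a rough illustration of storage and recall by association only (this toy sketch is not the published GN algorithm; the class and the tiny patterns are invented for the example):

class AssociativeStore:
    # Toy illustration: patterns are stored as sets of (position, value) pairs
    # and recalled by finding the stored pattern with the largest overlap.
    def __init__(self):
        self.patterns = {}

    def store(self, name, pattern):
        # Single-pass storage: remember which value occurs at which position.
        self.patterns[name] = {(i, v) for i, v in enumerate(pattern)}

    def recall(self, probe):
        probe_set = {(i, v) for i, v in enumerate(probe)}
        return max(self.patterns, key=lambda n: len(self.patterns[n] & probe_set))

gn = AssociativeStore()
gn.store("P1", "ABAB")
gn.store("P2", "BBBA")
print(gn.recall("ABAA"))   # "P1": 3 matching positions vs. 2 for "P2"
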
Hierarchical Graph Neuron (HGN)

HGN compositions for two-dimensional (7x5) and three-dimensional (7x5x3) pattern sizes.

Distributed Hierarchical Graph Neuron (DHGN)

DHGN distributed pattern recognition architecture (Muhamad Amin and Khan, 2009).

Research Objectives

- Redesigning the data management architecture from a scalable associative computing perspective, creating database-like functionality that can scale up or down dynamically over the available infrastructure without interruption or degradation.

- Investigating a distributed data access scheme that enables data storage and retrieval by association, where data records are treated as patterns.

- Processing the database and handling the dynamic load using a distributed pattern recognition approach.

- Developing an intelligent MapReduce framework that allows complex data representations to be used as keys for Map operations.

- Reducing cloud storage fragmentation by implementing a divide-and-distribute approach.

- Enhancing the existing cloud data management models for scalability.

- Validating the results and finding asymptotic limits of the technique through a rigorously designed computer simulation environment.

Progress to Date

Proposing a Web-based GN for real-time image recognition.

Web-based GN

Image distortion rates vs. rotation degrees:
(a) Total number of positive and negative matches.
(b) Distortion rates for each line of the image (each constructed HGN).

Edge Detecting Hierarchical Graph Neuron (EdgeHGN)

A 7-by-7-bit binary character A and its 7 equally sized DHGN subnets.

Reducing the number of neurons by applying a drop-fall technique.

Drop Fall Scheme

Drop-fall is often used for dividing touching pairs of digits into isolated characters. The drop-fall algorithm simulates the path produced by a drop of water falling from above the character and sliding downwards along the contour under the action of gravity.

When the drop gets stuck in a groove, it melts the character's stroke and then continues to fall. The dividing path produced by the drop-fall algorithm depends on three aspects: a start point, movement rules, and direction.

There are four possible directions that generally produce four different paths to divide touching digits. The drop can start on the left or right side and can evolve downwards or upwards. One of the four is likely to produce the right result.

Therefore, a set of drop-fall algorithms consists of four methods which try to segment a block by simulating a drop-falling process: the descending-left, descending-right, ascending-left, and ascending-right algorithms.

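A simplified sketch of one descending variant is shown below; the movement rules here are an assumption for illustration, not the exact rules of the cited algorithm.

import numpy as np

def descending_drop_fall(img, start_col):
    # Simplified descending drop-fall on a binary image (1 = stroke, 0 = background):
    # fall straight down when possible, otherwise slide diagonally along the
    # contour, and cut through the stroke only when completely stuck.
    rows, cols = img.shape
    path, r, c = [], 0, start_col
    while r < rows - 1:
        path.append((r, c))
        below = img[r + 1, c]
        down_left = img[r + 1, c - 1] if c > 0 else 1
        down_right = img[r + 1, c + 1] if c < cols - 1 else 1
        if below == 0:
            r += 1                  # free space below: keep falling
        elif down_left == 0:
            r, c = r + 1, c - 1     # slide down-left along the contour
        elif down_right == 0:
            r, c = r + 1, c + 1     # slide down-right along the contour
        else:
            r += 1                  # stuck in a groove: cut through the stroke
    path.append((r, c))
    return path

# Two vertical strokes joined by a bar at row 4; the drop cuts through the joint.
img = np.zeros((7, 7), dtype=int)
img[2:6, 2] = 1
img[2:6, 4] = 1
img[4, 2:5] = 1
print(descending_drop_fall(img, start_col=3))
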
EdgeHGN Performance

Disclaimer

I am not proposing any computer vision scheme for image processing here.

I am not suggesting in any way that my scheme is capable of competing against the image processing and face recognition algorithms treated in the literature.

I am doing pattern matching, and I could simply use any form of data representation for the purpose of my research.

Images are complex matrices of values, but people can relate to images very well, and that is why I found them an easy way to illustrate the effectiveness and strength of my proposed model.

Binary Image Recognition

Fifty different individuals in the face image dataset, obtained from the Face Recognition Data.

Sobel Operator

Edge map after applying the Global Binary Signature and Sobel's edge detection.

In simple terms, the Sobel operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from light to dark and the rate of change in that direction.

The result therefore shows how "abruptly" or "smoothly" the image changes at that point, and therefore how likely it is that that part of the image represents an edge, as well as how that edge is likely to be oriented.

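A minimal NumPy/SciPy sketch of this gradient computation (a generic illustration of the operator, not the exact pipeline used in these experiments):

import numpy as np
from scipy.ndimage import convolve

def sobel_gradient(image):
    # Return the gradient magnitude and direction of a 2-D grayscale image.
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)    # response to horizontal change
    ky = kx.T                                   # response to vertical change
    gx = convolve(image.astype(float), kx)
    gy = convolve(image.astype(float), ky)
    magnitude = np.hypot(gx, gy)     # how "abruptly" the intensity changes
    direction = np.arctan2(gy, gx)   # orientation of the strongest change
    return magnitude, direction

# Edge map: threshold the gradient magnitude to obtain a binary edge image.
image = np.zeros((8, 8))
image[:, 4:] = 1.0                   # a vertical light-to-dark boundary
magnitude, _ = sobel_gradient(image)
edge_map = magnitude > 0.5 * magnitude.max()
print(edge_map.astype(int))
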
References

Abadi, D. J. (2009). Data Management in the Cloud: Limitations and Opportunities, Bulletin of the Technical Committee on Data Engineering, pp. 3-12.

Khan, A. I. and Muhamad Amin, A. (2007). One shot associative memory method for distorted pattern recognition, AI 2007: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 705-709.

Muhamad Amin, A. and Khan, A. I. (2009). Collaborative-comparison learning for complex event detection using distributed hierarchical graph neuron (DHGN) approach in wireless sensor network, AI 2009: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 111-120.

Nasution, B. B. and Khan, A. I. (2008). A hierarchical graph neuron scheme for real-time pattern recognition, IEEE Transactions on Neural Networks 19(2): 212-229.

Shiers, J. (2009). Grid today, clouds on the horizon, Computer Physics Communications, pp. 559-563.

Welsh, M., Malan, D., Duncan, B., Fulford-Jones, T. and Moulton, S. (2004). Wireless sensor networks for emergency medical care, GE Global Conference, Harvard University and Boston University School of Medicine, Boston, MA.

Acknowledgement

Thank You.

I would like to thank everyone who helped me to make this possible. The first and foremost person who deserves immense gratitude is my thesis supervisor, Dr Asad Khan, for his support and kind contributions.