19 Oct 2013

www.company.com

MapReduce on Matlab

By: Erum Afzal



MapReduce


MapReduce is a programming model devised at Google to facilitate the processing of large data sets.

For example, it is used at Google for indexing websites.

Matlab


Matlab is a software package that provides a technical computing environment.

It is used for numerical manipulation, simulations and data processing.

MapReduce on Matlab


MapReduce on Matlab allows Matlab users to apply the MapReduce framework to their own data processing requirements, such as data mining tasks or the processing of dense, detailed digital images. Similarly, if Matlab files could be imported into a MapReduce framework, several Matlab functionalities could be run on Hadoop as well.

Working of MapReduce


With MapReduce, data can be processed using multiple processors in parallel. This makes it possible to:

Handle large volumes of input data.

Speed up processing through parallelization of tasks.


Working of MapReduce (continued)

Map:

Each piece of input data, identified by a key and a value, is mapped to one or more intermediate key/value pairs.

Reduce:

Each worker processes a part of the intermediate key/value pairs to generate the final key/value pairs.
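
The Map and Reduce phases can be illustrated with a word count, the usual introductory example. This is a minimal single-process sketch in Python, not the Matlab implementation the slides describe; the names `map_words`, `reduce_counts` and `word_count` are hypothetical.

```python
from collections import defaultdict

def map_words(key, value):
    """Map: one (line number, line text) record -> a list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]

def reduce_counts(key, values):
    """Reduce: all counts collected for one word -> (word, total)."""
    return key, sum(values)

def word_count(lines):
    # Map phase: every input record yields intermediate key/value pairs.
    intermediate = []
    for line_no, text in enumerate(lines):
        intermediate.extend(map_words(line_no, text))
    # Group the intermediate values by key before reducing.
    grouped = defaultdict(list)
    for word, count in intermediate:
        grouped[word].append(count)
    # Reduce phase: one call per distinct key produces the final pairs.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())

counts = word_count(["map reduce on Matlab", "Map and Reduce"])
```

Here `counts` maps each lower-cased word to the number of times it occurs across both lines.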

Working of Matlab


The Matlab Parallel Computing Toolbox offers a framework for writing programs for a cluster of computers. This enables a master computer to dispatch jobs to workers running on McGill’s cluster.

The master creates a MapReduce job and passes the user-defined Map and Reduce functions to the workers.

At each worker, the input key/value pairs are fed into the map function to get intermediate key/value pairs.

At each worker, the intermediate key/value pairs are fed into the reduce function to get the final key/value pairs, which form the output.
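
The three steps above can be sketched as a small driver. This is an illustration only: a Python thread pool stands in for the cluster workers, and `run_job`, `parity_map` and `sum_reduce` are hypothetical names, not part of the Parallel Computing Toolbox.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_job(map_fn, reduce_fn, records, workers=2):
    """Hypothetical master: hands user-defined map/reduce functions to a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers turn input key/value pairs into intermediate key/value pairs.
        mapped = pool.map(lambda kv: map_fn(*kv), records)
        # The master groups the intermediate values by key.
        grouped = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                grouped[key].append(value)
        # Workers reduce the values of each key into a final key/value pair.
        reduced = pool.map(lambda kv: reduce_fn(*kv), sorted(grouped.items()))
        return dict(reduced)

def parity_map(index, number):
    # Emit the square of the number under the key "even" or "odd".
    return [("even" if number % 2 == 0 else "odd", number * number)]

def sum_reduce(key, values):
    return key, sum(values)

result = run_job(parity_map, sum_reduce, list(enumerate([1, 2, 3, 4, 5])))
```

With the input 1 to 5, `result` holds the sum of squares of the even numbers and of the odd numbers. On a real cluster the pool would be replaced by remote workers, and the grouping step would move data between machines.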

Orthogonal Matching Pursuit

As an example:

A sparse signal x can be stored in compressed form by multiplying it with a measurement matrix A, where y = Ax.

x can then be recovered from y using OMP.
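
For reference, the standard OMP iteration can be written as follows. This is the textbook form, not necessarily the exact variant used in the work the slides summarize; $a_j$ denotes column $j$ of $A$, $S_t$ the set of selected columns and $r_t$ the residual, all of which are notation introduced here.

```latex
\begin{aligned}
&r_0 = y, \qquad S_0 = \emptyset \\
&j_t = \arg\max_{j} \, \lvert a_j^{\top} r_{t-1} \rvert \\
&S_t = S_{t-1} \cup \{ j_t \} \\
&x_t = \arg\min_{z} \, \lVert y - A_{S_t} z \rVert_2 \\
&r_t = y - A_{S_t} x_t
\end{aligned}
```

The iteration stops once the residual is small enough or a chosen number of columns has been selected. The column-selection step scans every column of $A$, which is why the cost grows with the size of $A$.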

Application with MapReduce


In its traditional form, OMP becomes slow as A grows larger in size. The problem can be addressed by processing A with MapReduce.

Application with MapReduce (continued)


OMP becomes slow as A grows larger in size. This problem can be solved by processing individual slices of A in parallel, which is what the MapReduce method does.
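
One way to picture the slicing is OMP's column-selection step, which looks for the column of A most correlated with the current residual. The sketch below assumes A is split into column slices, with one map task per slice and a reduce step that keeps the overall best candidate; the slides do not say how the original implementation slices A, and `map_slice`, `reduce_best` and `select_column` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def map_slice(offset, columns, residual):
    """Map: find the column in one slice of A most correlated with the residual."""
    best_index, best_score = offset, -1.0
    for local, column in enumerate(columns):
        score = abs(sum(a * r for a, r in zip(column, residual)))
        if score > best_score:
            best_index, best_score = offset + local, score
    return best_index, best_score

def reduce_best(candidates):
    """Reduce: keep the candidate with the largest correlation overall."""
    return max(candidates, key=lambda candidate: candidate[1])

def select_column(columns, residual, slice_width=2):
    # Split A (stored as a list of columns) into slices of slice_width columns.
    slices = [(start, columns[start:start + slice_width])
              for start in range(0, len(columns), slice_width)]
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: map_slice(s[0], s[1], residual), slices))
    return reduce_best(candidates)

# A has 3 rows and 4 columns; the residual has 3 entries.
A_columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
residual = [0.5, -3.0, 1.0]
best_index, best_score = select_column(A_columns, residual)
```

In this small case the second column (index 1) has the largest absolute correlation with the residual, so it is the one selected.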

Results


MapReduce was implemented on Matlab and was used to run Orthogonal Matching Pursuit.


MapReduce on Matlab has the potential to improve the performance of numerous parallel processing algorithms by bringing the power of the MapReduce programming model to Matlab.

Singular Value Decomposition (SVD)


The Singular Value Decomposition (SVD) is a powerful matrix decomposition frequently used for dimensionality reduction. SVD is widely used in least squares problems, linear systems and finding a low-rank representation of a matrix. A wide range of applications use SVD as their main algorithmic tool.

Problem


Finding patterns in large-scale graphs with millions or billions of edges is increasingly important in computer network security, intrusion detection, spam detection and web applications.


One such setting is the estimation of the clustering coefficients and the transitivity ratio of the graph, which effectively translates into computing the number of triangles that each node participates in, or the total number of triangles in the graph, respectively.


Triangles are a frequently used network statistic in the exponential random graph model and naturally appear in models of real-world network evolution. They have been used in several applications, such as spam detection, uncovering the hidden thematic structure of the web, and link recommendation in online social networks.


It is worth noting that in social networks triangles have a natural interpretation: “friends of friends are frequently friends themselves.”

MATLAB implementation, rank-k approximation


function delta = EigenTriangleLocal(A, k)
% A is the adjacency matrix, k is the required rank approximation.
% delta(j) estimates the number of triangles that node j participates in.

n = size(A, 1);

delta = zeros(n, 1);                 % preallocate space for delta

opts.isreal = 1; opts.issym = 1;     % the matrix is real and symmetric

[u, l] = eigs(A, k, 'LM', opts);     % top k eigenvalues (largest magnitude) and eigenvectors of A

l = diag(l)';                        % eigenvalues as a row vector

for j = 1:n
    delta(j) = sum(l.^3 .* u(j,:).^2) / 2;
end

end
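
The quantity computed in the loop can be written in closed form. With $\lambda_i$ the $i$-th of the top $k$ eigenvalues of $A$ and $u_{j,i}$ the $j$-th entry of the corresponding eigenvector (notation introduced here), the per-node and total triangle estimates are:

```latex
\Delta_j \approx \frac{1}{2} \sum_{i=1}^{k} \lambda_i^{3} \, u_{j,i}^{2},
\qquad
\Delta \approx \frac{1}{6} \sum_{i=1}^{k} \lambda_i^{3}
```

Both become exact when $k = n$: the diagonal entry $(A^3)_{jj} = \sum_i \lambda_i^3 u_{j,i}^2$ counts closed walks of length 3 starting at node $j$, and each triangle through $j$ is walked in two directions, hence the factor $1/2$. Summing over all nodes counts every triangle three times, which gives the factor $1/6$.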

Summary of network data

Results

Results (continued)


In this work the EIGENTRIANGLE and EIGENTRIANGLELOCAL algorithms have been proposed to estimate the total number of triangles and the number of triangles per node, respectively, in an undirected, unweighted graph. The special spectral properties which real-world networks frequently possess make both algorithms efficient for the triangle counting problem. our knowledge, the knowledge …

Fast Randomized Tensor Decompositions


Many real-world problems involve multi-aspect data. For example, fMRI (functional magnetic resonance imaging) scans, one of the most popular neuroimaging techniques, result in multi-aspect data: voxels × subjects × trials × task conditions × timeticks. Monitoring systems result in three-way data: machine id × type of measurement × timeticks. Depending on the setting, the machine can be, for instance, a sensor (sensor networks) or a computer (computer networks). Large data volumes generated by personalized web search are frequently modeled as three-way tensors, i.e., users × queries × web pages.


Processing all of the above is quite time-consuming.



Problem


Ignoring the multi-aspect nature of the data by flattening it into a two-way matrix and applying an exploratory analysis algorithm, e.g., singular value decomposition (SVD), is not optimal and typically hurts performance significantly.


The same problem holds when applying, e.g., SVD to different 2-way slices of the tensor, as observed by [94]. In contrast, multiway data analysis techniques succeed in capturing the multilinear structures in the data, thus achieving better performance than the aforementioned approaches.

Problem Solution



Tensor decompositions have been applied as a solution in many different scientific disciplines, especially computer vision and signal processing, as well as neuroscience, time series anomaly detection, psychometrics, graph analysis and data mining.

Algorithm 8 MACH-HOSVD

Results

Results (continued)


Tensor decompositions are useful in many real-world problems. A simple randomized algorithm, MACH, is proposed which is easily parallelizable and can be adapted to online streaming systems.


This algorithm will be incorporated in the PEGASUS library, a graph and tensor mining system for handling large amounts of data.
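
The slides do not reproduce the MACH algorithm itself. The sketch below assumes that its randomized step keeps each tensor entry independently with probability p and rescales the kept entries by 1/p before a decomposition such as HOSVD is run on the sparsified tensor. That reading, the dictionary storage format and the name `sparsify` are assumptions made for illustration.

```python
import random

def sparsify(tensor, p, seed=0):
    """Keep each entry with probability p and rescale kept entries by 1/p (assumed MACH step)."""
    rng = random.Random(seed)
    sparse = {}
    for index, value in tensor.items():
        if rng.random() < p:
            sparse[index] = value / p
    return sparse

# Hypothetical 3-way tensor (users x queries x web pages) stored as {index: value}.
tensor = {(u, q, w): 1.0 for u in range(10) for q in range(10) for w in range(10)}
sparse = sparsify(tensor, p=0.1)
```

Each entry is handled independently of the others, which is why a step of this kind parallelizes easily and can be applied to entries as they arrive in a stream.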

More Applications


Comparing the Performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations.


Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce (DRAFT).


Map-Reduce for Machine Learning on Multicore.


References


Charalampos E. Tsourakakis, “Data Mining with MAPREDUCE: Graph and Tensor Algorithms with Applications”, March 2010.


Arjita Madan, “MapReduce on Matlab”.