
19 Oct 2013


MapReduce on Matlab


Erum Afzal


MapReduce is a programming model
devised at Google to facilitate the
processing of large data sets.

For example, Google uses it to index websites.


Matlab is software that provides a technical computing environment.

It is used for numerical computation, simulation and data analysis.

MapReduce on Matlab

MapReduce on Matlab allows Matlab users to apply the MapReduce framework to their own data processing requirements, such as data mining tasks and the processing of dense, detailed digital images. Similarly, if Matlab files could be imported into a MapReduce framework, several Matlab functionalities could be processed on Hadoop as well.

Working of MapReduce

With MapReduce, data can be processed by multiple processors in parallel. This makes it possible to:

Handle large volumes of input data.

Speed up processing through parallelization.

Map: each piece of input data, identified by a key and a value, is mapped to one or more intermediate key/value pairs.

Reduce: each worker processes a part of the intermediate key/value pairs to generate the final key/value pairs.
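The Map and Reduce steps can be sketched in a few lines of Python. This is a minimal single-process sketch, not the Matlab implementation discussed in these slides: the word-count task and the names map_fn, reduce_fn and run_mapreduce are hypothetical, chosen only to show how input key/value pairs become intermediate pairs and then final pairs.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: (document id, text) -> intermediate (word, 1) pairs."""
    return [(word, 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: (word, [1, 1, ...]) -> (word, total count)."""
    return key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    intermediate = defaultdict(list)
    # Map phase: each input key/value pair yields intermediate key/value pairs,
    # which are grouped by key (the "shuffle" between the two phases).
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Reduce phase: each key's group of values becomes one final key/value pair.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

docs = [("doc1", "the cat sat"), ("doc2", "the dog sat down")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}
```

In a real deployment the map calls and the per-key reduce calls run on different workers; only the grouping by key has to happen between the two phases.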

Working of Matlab

The Matlab Parallel Computing Toolbox provides a framework for writing programs that run on a cluster of computers. This enables a master computer to dispatch jobs to workers running on McGill's …

The master creates a MapReduce job and passes the user-defined Map and Reduce functions to the workers.

At each worker, the input key/value pairs are fed into the map function to obtain intermediate key/value pairs.

At each worker, the intermediate key/value pairs are fed into the reduce function to obtain the final key/value pairs, which form the output.
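The master/worker flow can be sketched as follows, assuming Python threads as a stand-in for Matlab workers. The slides do not say how intermediate pairs are routed to reducers; this sketch assumes the common hash-partitioning scheme, and master, map_worker, reduce_worker, word_map and word_reduce are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

def master(inputs, map_fn, reduce_fn, n_workers=3):
    """Hypothetical master: splits the input and dispatches map/reduce work to a pool."""
    chunks = [inputs[i::n_workers] for i in range(n_workers)]

    def map_worker(chunk):
        # Feed this worker's input pairs into the map function and route each
        # intermediate pair to reduce partition r, chosen from a hash of its key.
        partitions = [defaultdict(list) for _ in range(n_workers)]
        for key, value in chunk:
            for ikey, ivalue in map_fn(key, value):
                r = zlib.crc32(repr(ikey).encode()) % n_workers
                partitions[r][ikey].append(ivalue)
        return partitions

    def reduce_worker(groups):
        # Feed each key and its grouped values into the reduce function.
        return [reduce_fn(k, vs) for k, vs in groups.items()]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = list(pool.map(map_worker, chunks))
        # Partition r from every map worker goes to reduce worker r, so all
        # values for one key end up at the same reducer.
        reduce_inputs = []
        for r in range(n_workers):
            merged = defaultdict(list)
            for partitions in mapped:
                for k, vs in partitions[r].items():
                    merged[k].extend(vs)
            reduce_inputs.append(merged)
        reduced = list(pool.map(reduce_worker, reduce_inputs))
    # The final output is the union of every reducer's key/value pairs.
    return dict(pair for part in reduced for pair in part)

def word_map(doc_id, text):
    return [(w, 1) for w in text.split()]

def word_reduce(word, counts):
    return word, sum(counts)

print(master([("d1", "a b a"), ("d2", "b c")], word_map, word_reduce))
```

Hashing the key (rather than the input chunk) is what guarantees that every value for a given key reaches the same reduce call.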


Orthogonal Matching Pursuit

As an example, consider a sparse signal x. It can be stored by multiplying it with a measurement matrix A:

y = Ax

The measurement vector y can then be used to recover x using Orthogonal Matching Pursuit (OMP).
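A minimal sketch of OMP in plain Python, assuming A is given as a list of rows and the sparsity level k is known. Each iteration picks the column of A most correlated with the residual, re-fits y on all chosen columns by least squares, and updates the residual. The helper names (dot, solve, omp) are hypothetical, and the normal-equation solve is chosen for brevity rather than numerical robustness.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(M, b):
    """Solve the square system M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (aug[i][n] - sum(aug[i][k] * z[k] for k in range(i + 1, n))) / aug[i][i]
    return z

def omp(A, y, k, tol=1e-12):
    """Recover a k-sparse x from y = A x (A given as a list of rows)."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        if dot(residual, residual) < tol:
            break  # y is already explained exactly
        # 1. Select the column most correlated with the current residual.
        j = max((j for j in range(n) if j not in support),
                key=lambda j: abs(dot(cols[j], residual)))
        support.append(j)
        # 2. Least squares on the chosen columns via the normal equations.
        G = [[dot(cols[a], cols[b]) for b in support] for a in support]
        coef = solve(G, [dot(cols[a], y) for a in support])
        # 3. Update the residual r = y - A_S x_S.
        residual = [y[i] - sum(c * cols[s][i] for c, s in zip(coef, support))
                    for i in range(m)]
    x = [0.0] * n
    for s, c in zip(support, coef):
        x[s] = c
    return x

A = [[1, 0, 0, 0, 0.5],
     [0, 1, 0, 0, 0.5],
     [0, 0, 1, 0, 0.5],
     [0, 0, 0, 1, 0.5]]
y = [0, 3, 0, -2]  # y = A x for x = [0, 3, 0, -2, 0]
print(omp(A, y, k=2))  # [0.0, 3.0, 0.0, -2.0, 0.0]
```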

Application with MapReduce

OMP becomes slow in its traditional form as A grows larger in size. This problem can be solved by processing individual slices of A in parallel, which is performed using the MapReduce method.
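The step of OMP whose cost grows with the size of A is correlating every column with the residual. The sketch below shows how that step could be split into slices, assuming each slice is a contiguous block of columns handled by one worker (threads stand in for Matlab workers; select_atom and best_in_slice are hypothetical names). Each map task returns its best local column, and the reduce step keeps the global best. The least-squares update is left serial here.

```python
from concurrent.futures import ThreadPoolExecutor

def best_in_slice(args):
    """Map task: for one column slice of A, return (|<a_j, r>|, j) of its best column j."""
    A, r, cols = args
    return max((abs(sum(A[i][j] * r[i] for i in range(len(r)))), j) for j in cols)

def select_atom(A, r, n_slices=2):
    """Pick the column of A most correlated with residual r, one slice per worker."""
    n = len(A[0])
    size = -(-n // n_slices)  # ceiling division: columns per slice
    slices = [range(s, min(s + size, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        candidates = list(pool.map(best_in_slice, [(A, r, cols) for cols in slices]))
    # Reduce: the global winner is the best of the per-slice winners.
    return max(candidates)[1]

A = [[1, 0, 0, 0, 0.5],
     [0, 1, 0, 0, 0.5],
     [0, 0, 1, 0, 0.5],
     [0, 0, 0, 1, 0.5]]
print(select_atom(A, [0.0, 3.0, 0.0, -2.0]))  # 1
```

Because each slice only needs its own columns and the current residual, the workers never exchange data during the map step; only one small (score, index) pair per slice reaches the reducer.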


MapReduce was implemented on Matlab and was used to run Orthogonal Matching Pursuit.

MapReduce on Matlab has the potential to improve the performance of numerous parallel processing algorithms by bringing the power of the MapReduce programming model to Matlab.

Singular Value Decomposition (SVD)

The Singular Value Decomposition (SVD) is a powerful matrix decomposition frequently used for dimensionality reduction. SVD is widely used in least squares problems, linear systems and finding a low-rank representation of a matrix. A wide range of applications uses SVD as its main algorithmic tool.
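As an illustration of the low-rank use case, the largest singular triplet can be found by power iteration on A^T A, which gives the best rank-1 approximation sigma_1 u_1 v_1^T (Eckart-Young). This is a minimal sketch assuming a small, nonzero dense matrix stored as a list of rows; power iteration is one simple technique, not necessarily what a MapReduce SVD would use, and the function names are hypothetical.

```python
import math
import random

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def rmatvec(A, u):
    # Computes A^T u without forming the transpose.
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_singular_triplet(A, iters=200, seed=0):
    """Power iteration on A^T A: returns (sigma_1, u_1, v_1) for the largest singular value."""
    rng = random.Random(seed)
    v = normalize([rng.random() for _ in range(len(A[0]))])
    for _ in range(iters):
        v = normalize(rmatvec(A, matvec(A, v)))
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    return sigma, [x / sigma for x in Av], v

def rank1_approximation(A):
    """Best rank-1 approximation sigma_1 * u_1 * v_1^T."""
    sigma, u, v = top_singular_triplet(A)
    return [[sigma * ui * vj for vj in v] for ui in u]

print(round(top_singular_triplet([[3, 0], [0, 1]])[0], 6))  # 3.0
```

Keeping the top k triplets instead of one (for example by deflating A after each step) gives the rank-k representation mentioned above.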


Finding patterns in large-scale graphs with millions or billions of edges is increasingly important in computer network security (intrusion detection), spam detection and web applications.

One such problem is estimating the clustering coefficients and the transitivity ratio of a graph, which effectively translates into computing the number of triangles that each node participates in, or the total number of triangles in the graph, respectively.
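To make these quantities concrete, here is a small exact (non-spectral) baseline in Python. It assumes a simple undirected graph given as an edge list without duplicate edges or self-loops; the function names are hypothetical. It uses the standard definitions: the clustering coefficient of node u is triangles(u) / (deg(u) choose 2), and the transitivity ratio is 3 × triangles / connected triples.

```python
from itertools import combinations

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles_per_node(edges):
    """A triangle at u is a pair of neighbours of u that are themselves adjacent."""
    adj = adjacency(edges)
    return {u: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            for u, nbrs in adj.items()}

def clustering_coefficients(edges):
    adj = adjacency(edges)
    t = triangles_per_node(edges)
    return {u: t[u] / (len(adj[u]) * (len(adj[u]) - 1) / 2) if len(adj[u]) > 1 else 0.0
            for u in adj}

def transitivity_ratio(edges):
    adj = adjacency(edges)
    # Each triangle is counted once at each of its three corners.
    triangles = sum(triangles_per_node(edges).values()) // 3
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * triangles / triples

k4 = list(combinations(range(4), 2))  # complete graph on 4 nodes
print(triangles_per_node(k4))  # {0: 3, 1: 3, 2: 3, 3: 3}
```

This exact count inspects every pair of neighbours of every node, which is what becomes expensive on graphs with millions or billions of edges.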

Triangles are a frequently used network statistic in the exponential random graph model and naturally appear in models of real-world network evolution. Triangles have been used in several applications, such as spam detection, uncovering the hidden thematic structure of the web and link recommendation in online social networks.

It is worth noting that triangles have a natural interpretation in social networks: “friends of friends are frequently friends themselves.”

MATLAB implementation of EigenTriangleLocal (rank-k approximation)

function Delta = EigenTriangleLocal(A, k)
% A: adjacency matrix of an undirected graph; k: required rank of the approximation
n = size(A, 1);
Delta = zeros(n, 1);              % preallocate space for Delta
opts.isreal = 1; opts.issym = 1;  % specify that the matrix is real and symmetric
[u, l] = eigs(A, k, 'LM', opts);  % compute the top k eigenvalues and eigenvectors of A
l = diag(l)';                     % eigenvalues as a row vector
for j = 1:n
    % estimated number of triangles that node j participates in
    Delta(j) = sum(l.^3 .* u(j,:).^2) / 2;
end
end
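The same computation can be sketched in Python for readers without Matlab. MATLAB's eigs is replaced here by a hypothetical jacobi_eigen helper (cyclic Jacobi rotations), which is only suitable for small dense symmetric matrices; the per-node formula is the one used in the MATLAB loop. With k equal to the number of nodes the result is exact, while smaller k gives the approximation.

```python
import math

def jacobi_eigen(A, max_sweeps=50, tol=1e-20):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column j of V is the eigenvector for eigenvalues[j].
    """
    n = len(A)
    a = [[float(x) for x in row] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], V

def eigen_triangle_local(A, k):
    """Approximate triangles per node from the top-k eigenpairs (by |eigenvalue|)."""
    vals, V = jacobi_eigen(A)
    top = sorted(range(len(vals)), key=lambda j: abs(vals[j]), reverse=True)[:k]
    # Same formula as the MATLAB loop: Delta(i) = sum_j lambda_j^3 * u(i, j)^2 / 2.
    return [sum(vals[j] ** 3 * V[i][j] ** 2 for j in top) / 2 for i in range(len(A))]

K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print([round(d, 6) for d in eigen_triangle_local(K4, 4)])  # [3.0, 3.0, 3.0, 3.0]
print([round(d, 6) for d in eigen_triangle_local(K4, 1)])  # [3.375, 3.375, 3.375, 3.375]
```

On the complete graph K4 every node lies in 3 triangles; keeping only the largest eigenvalue (3) already gives 3.375 per node, since the other eigenvalues (-1) contribute little once cubed.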

Summary of network data



In this work, the EigenTriangle and EigenTriangleLocal algorithms have been proposed to estimate the total number of triangles and the number of triangles per node, respectively, in an undirected, unweighted graph. The special spectral properties that real networks frequently possess make both algorithms efficient for the triangle counting problem.
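The spectral identities behind both algorithms can be written out as a short derivation. This is standard linear algebra rather than anything specific to these slides: for a symmetric adjacency matrix A with eigenvalues λ_j and orthonormal eigenvectors u_j, the diagonal entry (A^3)_ii counts closed walks of length three from node i, and each triangle through i contributes two such walks (one per direction).

```latex
\begin{aligned}
A &= U \Lambda U^{\top}, \qquad A^{3} = U \Lambda^{3} U^{\top}\\
\Delta_i &= \tfrac{1}{2}\,(A^{3})_{ii} = \tfrac{1}{2}\sum_{j=1}^{n} \lambda_j^{3}\, u_{ij}^{2}\\
\Delta(G) &= \tfrac{1}{3}\sum_{i=1}^{n} \Delta_i = \tfrac{1}{6}\,\operatorname{tr}(A^{3}) = \tfrac{1}{6}\sum_{j=1}^{n} \lambda_j^{3}
\end{aligned}
```

Restricting the sums to the k eigenvalues of largest magnitude gives the EigenTriangleLocal and EigenTriangle estimates. One reading of the "special spectral properties" is that real networks tend to have a few dominant eigenvalues, and cubing makes those terms dominate the sums; this interpretation is ours, not stated in the slides.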

Fast Randomized Tensor Decompositions

Many real-world problems involve multi-aspect data. For example, fMRI (functional magnetic resonance imaging) scans, one of the most popular neuroimaging techniques, result in multi-aspect data: … × task conditions.

Monitoring systems result in three-way data: machine id × type of measurement × timeticks. Depending on the setting, the machine can be, for instance, a sensor (sensor networks) or a computer (computer networks). Large data volumes generated by personalized web search are frequently modeled as three-way tensors, i.e., users × … × web pages.

All of the above are time-consuming tasks.


Ignoring the multi-aspect nature of the data by flattening them into a two-way matrix and applying an exploratory analysis algorithm, e.g., singular value decomposition (SVD), is not optimal and typically hurts performance significantly. The same problem holds when applying, e.g., SVD to different two-way slices of the tensor, as observed by [94]. In contrast, multiway data analysis techniques succeed in capturing the multilinear structures in the data, thus achieving better performance than the aforementioned approaches.
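To see what "flattening into a two-way matrix" means concretely, the sketch below unfolds a 3-way tensor along its first mode. It assumes the tensor is stored as nested lists T[i][j][k] and uses the column order in which j varies fastest (the Kolda-Bader convention); unfold_mode1 is a hypothetical name. After flattening, each (j, k) pair becomes an unrelated column, so an SVD of the result no longer knows that columns sharing the same j (or the same k) belong together.

```python
def unfold_mode1(T):
    """Flatten an I x J x K tensor (nested lists) into an I x (J*K) matrix."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    # Column index for entry (j, k) is j + k * J, i.e. j varies fastest.
    return [[T[i][j][k] for k in range(K) for j in range(J)] for i in range(I)]

T = [[[100 * i + 10 * j + k for k in range(2)] for j in range(2)] for i in range(2)]
print(unfold_mode1(T))  # [[0, 10, 1, 11], [100, 110, 101, 111]]
```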

Problem Solution

Tensor decompositions have found application in many scientific disciplines, including computer vision, signal processing, neuroscience, time-series anomaly detection, psychometrics, graph analysis and data mining.

Algorithm 8 MACH



Tensor decompositions are useful in many real-world problems. A simple randomized algorithm, MACH, is proposed, which is easily parallelizable and can be adapted to online streaming systems.
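The steps of Algorithm 8 are not reproduced in these slides. As we understand MACH's published description, its core step is random sampling: each nonzero tensor entry is kept with probability p and rescaled by 1/p, after which a standard decomposition (for example Tucker) is run on the much sparser tensor. The sketch below shows only that sampling step, under this assumption; mach_sparsify is a hypothetical name and the decomposition itself is not shown.

```python
import random

def mach_sparsify(entries, p, seed=None):
    """Keep each nonzero tensor entry with probability p and rescale kept entries by 1/p.

    entries: dict mapping index tuples (i, j, k) to values (a sparse 3-way tensor).
    The rescaling makes the sparsified tensor equal to the original in expectation.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    # Each entry is decided independently, so this loop can be split across
    # mappers or applied to entries one at a time as they stream in.
    return {idx: val / p for idx, val in entries.items() if rng.random() < p}

tensor = {(i, j, k): 1.0 for i in range(20) for j in range(20) for k in range(25)}
sparse = mach_sparsify(tensor, p=0.25, seed=1)
print(len(sparse), sum(sparse.values()))
```

With p = 0.25 roughly a quarter of the 10000 entries survive, each scaled to 4.0, so the total stays close to the original 10000 in expectation. The independence of the per-entry decisions is what makes this step easy to parallelize and to run on a stream.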

This algorithm will be incorporated in the
PEGASUS library, a graph and tensor mining
system for handling large amounts of data.

More Applications

Comparing the Performance of Clusters,
Hadoop, and Active Disks on Microarray
Correlation Computations.

Beyond Online Aggregation: Parallel and
Incremental Data Mining with Online Map
Reduce (DRAFT).

Map-Reduce for Machine Learning on Multicore.


Charalampos E. Tsourakakis, “Data Mining with MAPREDUCE: Graph and Tensor Algorithms with Applications”, March 2010.

Arjita Madan, “MapReduce on Matlab”.