Machine Learning with Apache Hama


Oct 16, 2013


1

Machine Learning with Apache Hama

Tommaso Teofili

tommaso [at] apache [dot] org

2

About me


ASF member having fun with:


Lucene / Solr


Hama


UIMA


Stanbol


… some others


SW engineer @ Adobe R&D


3

Agenda


Apache Hama and BSP


Why machine learning on BSP


Some examples


Benchmarks

4

Apache Hama


Bulk Synchronous Parallel computing
framework on top of HDFS for massive
scientific computations


TLP since May 2012


0.6.0 release out soon


Growing community

5

BSP
supersteps


A BSP algorithm is composed of a sequence of supersteps


6

BSP
supersteps


Each task:

Superstep 1

Do some computation

Communicate with other tasks

Synchronize

Superstep 2

Do some computation

Communicate with other tasks

Synchronize

…

Superstep N

Do some computation

Communicate with other tasks

Synchronize
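The superstep pattern above can be sketched in plain Java with no Hama dependency; the per-peer inboxes below are only a stand-in for Hama's message passing and barrier synchronization, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sequential sketch of two BSP supersteps: every peer computes a partial
// result and "sends" it to all others; only after the synchronization
// barrier does each peer read its inbox and aggregate.
public class BspSketch {

    // Each peer sums its own slice, broadcasts the partial sum, then
    // (after the barrier) every peer aggregates to the same global total.
    public static double supersteps(double[][] slices) {
        int peers = slices.length;
        List<List<Double>> inboxes = new ArrayList<>();
        for (int i = 0; i < peers; i++) inboxes.add(new ArrayList<>());

        // Superstep 1: local computation + communication
        for (int p = 0; p < peers; p++) {
            double partial = 0;
            for (double v : slices[p]) partial += v;
            for (int q = 0; q < peers; q++) inboxes.get(q).add(partial);
        }
        // ---- barrier: all messages delivered before the next superstep ----

        // Superstep 2: aggregate received partial sums (shown for peer 0;
        // every peer holds the same inbox contents)
        double total = 0;
        for (double partial : inboxes.get(0)) total += partial;
        return total;
    }

    public static void main(String[] args) {
        double[][] slices = {{1, 2}, {3, 4}, {5}};
        System.out.println(supersteps(slices)); // prints 15.0
    }
}
```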



7

Why BSP


Simple programming model

Superstep semantics are easy

Preserves data locality

Improves performance

Well suited for iterative algorithms


8

Apache Hama architecture


BSP Program execution flow

9

Apache Hama architecture

10

Apache Hama


Features


BSP API


M/R-like I/O API


Graph
API


Job management / monitoring


Checkpoint recovery


Local & (Pseudo) Distributed run modes


Pluggable message transfer architecture


YARN supported


Running in Apache Whirr



11

Apache Hama BSP API


public abstract class BSP<K1, V1, K2, V2,
M extends Writable> …


K1, V1 are key, values for inputs


K2, V2 are key, values for outputs


M is the type of messages used for task communication

12

Apache Hama BSP API


public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws ..

public void setup(BSPPeer<K1, V1, K2, V2, M> peer) throws ..

public void cleanup(BSPPeer<K1, V1, K2, V2, M> peer) throws ..

13

Machine learning on BSP


Lots (most?) of ML algorithms are
inherently iterative


The Hama ML module currently includes


Collaborative filtering


Clustering


Gradient descent

14

Benchmarking architecture

(Diagram: HDFS, Solr, Lucene, DBMS, Hama and Mahout deployed across four nodes)

15

Collaborative filtering


Given user preferences on movies


We want to find users “near” to some
specific user


So that the user can “follow” them


And/or see what they like (which he/she could
like too)

16

Collaborative filtering BSP


Given a specific user


Iteratively (for each task)

Superstep 1*i

Read a new user preference row

Find how near that user is to the current user

That is, find how near their preferences are

Since preferences are given as vectors, we may use vector distance measures like Euclidean or cosine distance

Broadcast the measure output to other peers

Superstep 2*i

Aggregate measure outputs

Update most relevant users

Still to be committed (HAMA-612)
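The per-peer distance computation can be sketched in plain Java; the class and method names are illustrative, not the Hama ML API.

```java
// Vector distance measures a peer could apply to compare one user's
// preference row against the current user's preferences.
public class Distances {

    // Straight-line distance between two preference vectors
    public static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine distance = 1 - cosine similarity; 0 means same direction
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] a = {7, 3, 8, 2, 8.5, 0, 0};
        double[] b = {7, 3, 5.5, 0, 9.5, 6.5, 0};
        System.out.println(euclidean(a, b));
    }
}
```

Each peer would broadcast this measure for its user row; the aggregation superstep then keeps the smallest distances.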



17

Collaborative filtering BSP


Given user ratings about movies

"john" -> 0, 0, 0, 9.5, 4.5, 9.5, 8

"paula" -> 7, 3, 8, 2, 8.5, 0, 0

"jim" -> 4, 5, 0, 5, 8, 0, 1.5

"tom" -> 9, 4, 9, 1, 5, 0, 8

"timothy" -> 7, 3, 5.5, 0, 9.5, 6.5, 0

We ask for the 2 nearest users to “paula” and we get “timothy” and “tom”

(user recommendation)

We can extract highly rated movies from “timothy” and “tom” that “paula” didn’t see

(item recommendation)




18

Benchmarks


Fairly simple algorithm


Highly iterative


Comparing to Apache Mahout


Behaves better than ALS-WR

Behaves similarly to RecommenderJob and ItemSimilarityJob



19

K-Means clustering


We have a bunch of data (e.g. documents)


We want to group those docs into k homogeneous clusters



Iteratively for each cluster


Calculate new cluster center


Add doc nearest to new center to the cluster







20

K-Means clustering






21

K-Means clustering BSP


Iteratively


Superstep 1*i

Assignment phase

Read vector splits

Sum up temporary centers with assigned vectors

Broadcast sum and ingested vector count

Superstep 2*i

Update phase

Calculate the total sum over all received messages and average

Replace old centers with new centers and check for convergence
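The update phase can be sketched in plain Java (names are illustrative, not the Hama ML API): average the (sum, count) messages received for a center, then test convergence against the old center.

```java
// Sketch of the k-means "update" superstep for a single cluster center.
public class KMeansUpdate {

    // sums[p] is peer p's partial vector sum for this center,
    // counts[p] the number of vectors peer p assigned to it.
    public static double[] newCenter(double[][] sums, int[] counts) {
        int dim = sums[0].length;
        double[] center = new double[dim];
        int total = 0;
        for (int p = 0; p < sums.length; p++) {
            for (int d = 0; d < dim; d++) center[d] += sums[p][d];
            total += counts[p];
        }
        // Average the aggregated sum to get the new center
        for (int d = 0; d < dim; d++) center[d] /= total;
        return center;
    }

    // Converged when the center moved less than eps (Euclidean distance)
    public static boolean converged(double[] oldC, double[] newC, double eps) {
        double dist = 0;
        for (int d = 0; d < oldC.length; d++) {
            double diff = oldC[d] - newC[d];
            dist += diff * diff;
        }
        return Math.sqrt(dist) < eps;
    }

    public static void main(String[] args) {
        // Two peers, each assigned one vector: (1,2) and (3,4)
        double[] c = newCenter(new double[][]{{1, 2}, {3, 4}}, new int[]{1, 1});
        System.out.println(c[0] + " " + c[1]); // prints 2.0 3.0
    }
}
```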

22

Benchmarks


One-rack cluster (16 nodes, 256 cores)


10G network


On average faster than Mahout’s impl

23

Gradient descent


Optimization algorithm


Find a (local) minimum of some function


Used for


solving linear systems


solving non-linear systems


in machine learning tasks


linear regression


logistic regression


neural networks backpropagation





24

Gradient descent


Minimize a given (cost) function


Give the function a starting point (set of parameters)


Iteratively change parameters in order to minimize the
function


Stop at the (local) minimum








There’s some math but intuitively:


evaluate derivatives at a given point in order to choose
where to “go” next


25

Gradient descent BSP


Iteratively

Superstep 1*i

each task calculates and broadcasts portions of the cost function with the current parameters

Superstep 2*i

aggregate and update the cost function

check the aggregated cost and iteration count

the cost should always decrease

Superstep 3*i

each task calculates and broadcasts portions of the (partial) derivatives

Superstep 4*i

aggregate and update parameters
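A sequential sketch of the cost-evaluation and parameter-update steps for linear regression; in Hama each task would compute its portion of these sums and the results would be aggregated via messages. All names and data are illustrative.

```java
// Gradient descent for h(x) = theta0 + theta1 * x with squared-error cost.
public class GradientDescent {

    // Mean squared error cost over the dataset (superstep 1*i / 2*i)
    public static double cost(double[] x, double[] y, double t0, double t1) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double err = t0 + t1 * x[i] - y[i];
            sum += err * err;
        }
        return sum / (2 * x.length);
    }

    // One iteration: sum partial derivatives, then update both
    // parameters simultaneously (superstep 3*i / 4*i)
    public static double[] step(double[] x, double[] y,
                                double t0, double t1, double alpha) {
        double g0 = 0, g1 = 0;
        for (int i = 0; i < x.length; i++) {
            double err = t0 + t1 * x[i] - y[i];
            g0 += err;
            g1 += err * x[i];
        }
        int m = x.length;
        return new double[]{t0 - alpha * g0 / m, t1 - alpha * g1 / m};
    }

    public static void main(String[] args) {
        // Tiny made-up "size -> price" data lying on the line y = 2x
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        double[] theta = {0, 0};
        for (int it = 0; it < 500; it++) {
            theta = step(x, y, theta[0], theta[1], 0.1);
        }
        System.out.println(cost(x, y, theta[0], theta[1])); // cost near 0
    }
}
```

Each `step` call plays the role of one full round of supersteps; the aggregated cost is checked every iteration and should decrease monotonically for a small enough learning rate.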




26

Gradient descent BSP


Simplistic example


Linear regression


Given real estate market dataset


Estimate new house prices given known houses’ size, geographic region and prices


Expected output: actual parameters for the
(linear) prediction function




27

Gradient descent BSP


Generate a different model for each region


House item vectors

price -> size

150k -> 80

2-dimensional space

~1.3M-vector dataset




28

Gradient descent BSP


Dataset and model fit






(Plot: price from 0 to 800,000 against size from 0 to 5,000)
29

Gradient descent BSP


Cost checking






30

Gradient descent BSP


Classification


Logistic regression with gradient descent


Real estate market dataset


We want to find which estate listings belong to agencies


To avoid buying from them




Same algorithm


With different cost function and features


Existing items are tagged or not as “belonging to agency”


Create vectors from items’ text


Sample vector


1 -> 1 3 0 0 5 3 4 1
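The classification variant swaps in the logistic hypothesis and log-loss cost while keeping the same gradient-descent machinery. A minimal sketch, with illustrative names and arbitrary example weights (not trained values):

```java
// Logistic regression pieces: the vectors mirror the sample format above,
// a 0/1 label followed by token counts from the listing's text.
public class LogisticSketch {

    public static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability that a listing belongs to an agency, given weights theta
    public static double predict(double[] theta, double[] features) {
        double z = 0;
        for (int i = 0; i < theta.length; i++) z += theta[i] * features[i];
        return sigmoid(z);
    }

    // Log-loss cost for a single labeled example (label is 0 or 1);
    // this replaces the squared-error cost of the linear case
    public static double logLoss(double predicted, int label) {
        return label == 1 ? -Math.log(predicted) : -Math.log(1 - predicted);
    }

    public static void main(String[] args) {
        double[] features = {1, 3, 0, 0, 5, 3, 4, 1};
        double[] theta = {0.1, 0.2, 0, 0, -0.05, 0.1, 0.02, 0.3}; // made up
        System.out.println(predict(theta, features) > 0.5 ? "agency" : "private");
    }
}
```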

31

Gradient descent BSP


Classification

32

Benchmarks


Not directly comparable to Mahout’s
regression algorithms



Both SGD and CGD are inherently better than
plain GD


But Hama GD had on average the same performance as Mahout’s SGD / CGD


Next step is implementing SGD / CGD on top of
Hama





33

Wrap up


Even if


ML module is still “young” / work in progress


and tools like Apache Mahout have better
“coverage”



Apache Hama can be particularly useful in
certain “highly iterative” use cases


Interesting benchmarks

34

Thanks!